Kaoutar El Maghraoui

contact information

Principal Research Staff Member - AI Engineering
Thomas J. Watson Research Center, Yorktown Heights, NY USA


Professional Associations:  ACM  |  ACM SIGOPS  |  IEEE  |  Society of Women Engineers  |  Women in Technology International (WITI)

IBM’s Watson Cognitive Technology and Advanced Analytics Techniques for Next-Generation Technical Support

I am leading a strategic research project that applies IBM’s Watson cognitive technology to systems problem diagnosis and resolution. The goal is to provide a full-scale, Watson-based search system that effectively and efficiently diagnoses customer problems documented by call center agents and identifies the best action plan to resolve them. The project also uses analytics and Natural Language Processing (NLP) techniques to identify trends and critical common problems, and to suggest action items for the problems that IBM customers face on the POWER platform. There is significant potential for business impact: optimized call durations, a higher first-time-fix rate, and increased client satisfaction. One of the tools we have developed, AutoDiag, leverages both deep expertise in systems software and advanced analytics techniques to help customer support personnel resolve operating system downtime faster and more accurately. The tool uses Information Retrieval techniques to mine past solutions and apply them to new customer problem records, based on stack-trace signatures and graph-clustering techniques.
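As a rough illustration of signature-based retrieval, the sketch below ranks past problem records by how closely their stack traces match a new one. It is not AutoDiag's actual algorithm: the Jaccard similarity measure, the signature depth, the function names, and the record IDs are all assumptions made for the example, and the graph-clustering step is left out.

```python
def signature(stack_trace, depth=5):
    """Reduce a stack trace (frame names, innermost first) to its top frames.

    The depth of 5 is an arbitrary illustrative choice.
    """
    return tuple(stack_trace[:depth])


def jaccard(a, b):
    """Jaccard similarity between two collections of frame names."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def rank_past_records(new_trace, past_records, depth=5):
    """Rank past records by signature similarity to a new stack trace.

    past_records is a list of (record_id, stack_trace) pairs.
    Returns (record_id, score) pairs, most similar first.
    """
    new_sig = signature(new_trace, depth)
    scored = [
        (record_id, jaccard(new_sig, signature(trace, depth)))
        for record_id, trace in past_records
    ]
    # Sort by descending score; break ties by record id for determinism.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored


# Hypothetical records and frame names, for illustration only.
past = [
    ("REC-1", ["panic", "vm_fault", "page_in", "sched"]),
    ("REC-2", ["panic", "net_rx", "tcp_input"]),
]
ranking = rank_past_records(["panic", "vm_fault", "page_in", "read_disk"], past)
```

The record whose trace shares the most frames with the new crash is ranked first, so the fix documented for it would be the first candidate a support agent sees.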

Automatic Problem Diagnosis and Recovery Project

I am a member of a small IBM research team that focuses on designing and building novel technologies for automated fault diagnosis and recovery for large servers. We have designed and prototyped a novel OS-hypervisor infrastructure that allows automated and transparent OS crash diagnosis and recovery in a virtual environment. This infrastructure eliminates the need for reboots or checkpoint-restart mechanisms, which require preserving the state of critical applications before the crash happens and also require extensive modifications to those applications. At the core of our invention is a small, hidden OS repair image that is dynamically created from the healthy running OS instance. When an OS crashes, the hypervisor dynamically loads this repair image to perform diagnosis and repair. One repair strategy we have experimented with is to quarantine the offending process and automatically resume the repaired OS without a reboot.

I have led the design and development of the diagnosis and recovery component of the project, which involved fault detection, root cause analysis, and recovery. The project touches on several layers of IBM’s POWER architecture software stack: PHYP (the hypervisor), the firmware, and the operating system.

FLASH SSD Research

I was a key contributor to the modeling, design, and implementation of a novel Flash-based solid-state drive (SSD) simulator. Given SSD wear-leveling characteristics, a Flash simulator is a cost-effective tool that benchmarking centers and research teams can use for initial experimentation and early design decisions. This was a research collaboration with my colleague Gokul Kandiraju. We developed a model for the timing of a request and a method to efficiently simulate the delay of an incoming I/O request based on (i) its size, (ii) its direction (read or write), and (iii) the sequentiality of the I/O requests. These model parameters can be extracted from any device and fed into the simulator.
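A minimal sketch of such a delay model is shown below. It assumes a simple linear form (a fixed overhead plus a per-KiB cost) for each combination of direction and sequentiality; the linear form, the names, and the numeric values are illustrative assumptions, not the simulator's actual model or measured device parameters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DelayProfile:
    """Per-class timing parameters, keyed by (direction, sequential).

    Assumed linear model: delay = base_us + per_kib_us * size_in_KiB.
    """
    base_us: dict
    per_kib_us: dict


def request_delay_us(profile, size_bytes, direction, sequential):
    """Estimated delay in microseconds for a single I/O request."""
    key = (direction, sequential)
    return profile.base_us[key] + profile.per_kib_us[key] * (size_bytes / 1024)


class SequentialityTracker:
    """Flags a request as sequential if it starts where the previous one ended."""

    def __init__(self):
        self._next_offset = None

    def observe(self, offset, size_bytes):
        sequential = self._next_offset is not None and offset == self._next_offset
        self._next_offset = offset + size_bytes
        return sequential


# Placeholder numbers for illustration only, not measurements of any device.
EXAMPLE_PROFILE = DelayProfile(
    base_us={("read", True): 20.0, ("read", False): 80.0,
             ("write", True): 50.0, ("write", False): 200.0},
    per_kib_us={("read", True): 0.5, ("read", False): 1.0,
                ("write", True): 2.0, ("write", False): 4.0},
)

tracker = SequentialityTracker()
first = tracker.observe(offset=0, size_bytes=4096)      # nothing precedes it
second = tracker.observe(offset=4096, size_bytes=4096)  # continues the first
delay = request_delay_us(EXAMPLE_PROFILE, 4096, "read", second)
```

In a sketch like this, swapping in a different `DelayProfile` is all it takes to model a different device, which mirrors the idea of extracting the parameters from a real device and feeding them into the simulator.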

The simulator follows a novel approach that uses a block of memory to emulate all the internal operations of a Flash SSD, including logical-to-physical address mapping, various log-block-based wear-leveling algorithms, and garbage collection. Applications can run on top of the simulator without any changes, as if they were running on a real SSD-based block device. In contrast, existing SSD simulators are trace-based and do not capture the full I/O request behavior in the presence of OS effects.
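To make the internal operations concrete, here is a toy in-memory flash translation layer. It is a deliberately simplified stand-in: it uses page-level mapping with greedy garbage collection rather than the log-block-based schemes mentioned above, and every name in it is hypothetical.

```python
class TinyFTL:
    """Toy flash translation layer: out-of-place writes, greedy GC, erase counts."""

    def __init__(self, num_blocks, pages_per_block):
        self.ppb = pages_per_block
        self.l2p = {}                                   # logical page -> (block, page)
        self.owner = [[None] * pages_per_block for _ in range(num_blocks)]
        self.used = [0] * num_blocks                    # programmed pages per block
        self.erase_count = [0] * num_blocks
        self.store = {}                                 # (block, page) -> payload
        self.active = 0                                 # block currently being filled
        self.free = list(range(1, num_blocks))          # erased blocks

    def read(self, lpn):
        return self.store[self.l2p[lpn]]

    def write(self, lpn, payload):
        if self.used[self.active] == self.ppb:
            self._advance()
        self._program(lpn, payload)

    def _program(self, lpn, payload):
        """Write to the next free page of the active block; invalidate the old copy."""
        old = self.l2p.get(lpn)
        if old is not None:
            self.owner[old[0]][old[1]] = None
            del self.store[old]
        loc = (self.active, self.used[self.active])
        self.owner[loc[0]][loc[1]] = lpn
        self.store[loc] = payload
        self.l2p[lpn] = loc
        self.used[self.active] += 1

    def _advance(self):
        """Switch to a fresh block; reclaim one block if no spare remains."""
        self.active = self.free.pop(0)
        if not self.free:
            self._collect()

    def _collect(self):
        """Greedy GC: erase the full block with the fewest valid pages."""
        def valid(b):
            return sum(o is not None for o in self.owner[b])
        full = [b for b in range(len(self.used))
                if b != self.active and self.used[b] == self.ppb]
        # Prefer less-worn blocks on ties, a crude form of wear leveling.
        victim = min(full, key=lambda b: (valid(b), self.erase_count[b]))
        if valid(victim) == self.ppb:
            raise RuntimeError("device full: no reclaimable pages")
        for page, lpn in enumerate(list(self.owner[victim])):
            if lpn is not None:
                self._program(lpn, self.store[(victim, page)])  # relocate live data
        self.owner[victim] = [None] * self.ppb
        self.used[victim] = 0
        self.erase_count[victim] += 1
        self.free.append(victim)


ftl = TinyFTL(num_blocks=3, pages_per_block=2)
for i in range(10):
    for lpn in range(3):
        ftl.write(lpn, (lpn, i))
```

Because every overwrite lands on a fresh physical page, repeated writes to the same logical page steadily consume blocks, and the `erase_count` list shows how that wear is distributed.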

We have also investigated the impact of a representative set of applications and benchmarks on SSD endurance. This investigation produced interesting results about wear-out characteristics, including logical, physical, and translation characteristics, and how they correlate with application behavior and SSD lifetimes. Another idea that came out of this work is to use the linearity of the logical-to-physical block mapping to tune the parameters of the garbage collection algorithm (e.g., the size of the log blocks). This idea has been patented.

Simultaneous Multi-Threading Optimization Work

The goal of this project is to optimize application performance (e.g., throughput) by automatically tuning IBM’s POWER7 smart thread flexibility features (additional Simultaneous Multithreading (SMT) levels and hardware thread priorities). I studied the sensitivity of various workloads and applications to the multithreading level, using standard benchmarks such as SPEC CPU2006. The aim was to understand the effect of the various SMT modes and to explore the design space of SMT reconfiguration algorithms in a controlled setting. The analysis consisted of a Cycles per Instruction (CPI) breakdown and a study of the effects on data cache misses. This research showed that when there is contention for hardware resources in multi-core systems, lower multithreading modes can give better throughput. I also designed algorithms that improve the performance of software threads by choosing the right SMT level. Some of these algorithms have been implemented as part of the AIX smart thread optimization framework.

We performed a detailed experimental evaluation and micro-architectural analysis of the SMT gains for various benchmarks, with the purpose of developing algorithms for the automatic tuning of SMT levels. We designed an SMT-selection metric that can automatically predict whether it is beneficial for an application or a set of applications to increase or decrease their multithreading level.

I have also collaborated with other researchers on building scheduling algorithms that optimize the mapping of customer workloads onto SMT threads to drive higher throughput out of servers. Software threads consume the core functional units and caches in different ways. Hence, co-locating friendly processes on the same core may improve the overall system throughput.


Memory Compression Modeling Work

A compressed RAM/disk paging system can have a significant impact on the performance of the virtual memory subsystem, because it provides a low-overhead LRU/page-stealing mechanism that does not involve disk I/O.

The goal of this project was to investigate whether memory compression is a beneficial technology for IBM’s POWER systems, weighing its performance against the cost/price reduction. I designed and implemented a memory profiling tool called “mprofiler” that analyzes the memory working set size and estimates the performance of hardware-assisted memory compression for various workloads. The results of mprofiler were instrumental in deciding that memory compression is to be included in the design of IBM’s POWER7+ processor. Mprofiler has also benefited other projects: it was used to estimate the performance of large memory caches (in the range of GBs) and to help analyze SAP customer workloads for properly planning live migration.
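For readers unfamiliar with working set analysis, the sketch below computes the working set size over a page-reference trace using the classic sliding-window definition: the number of distinct pages touched in the last `window` references. It only illustrates the concept; mprofiler's actual method is not described here, and the function name and trace are hypothetical.

```python
from collections import Counter, deque


def working_set_sizes(page_trace, window):
    """Working set size after each reference in the trace.

    The size at position t is the number of distinct pages among the
    most recent `window` references ending at t.
    """
    counts = Counter()   # page -> occurrences inside the current window
    recent = deque()     # references currently inside the window
    sizes = []
    for page in page_trace:
        recent.append(page)
        counts[page] += 1
        if len(recent) > window:
            oldest = recent.popleft()
            counts[oldest] -= 1
            if counts[oldest] == 0:
                del counts[oldest]
        sizes.append(len(counts))
    return sizes


# Hypothetical page numbers, for illustration only.
sizes = working_set_sizes([1, 2, 1, 3, 3, 3], window=3)
```

A workload whose working set stays small relative to physical memory leaves more room for a compressed pool, which is the kind of trade-off a profiling tool would help quantify.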

PhD Thesis Work

The goal of my thesis research was to devise grid middleware services that enable distributed applications to adapt to the constantly changing behavior of dynamic environments. Reconfiguration is supported at the component level of applications for more flexible adaptation, and is based on peer-to-peer protocols to achieve scalable decisions. This dissertation involved:
• Design and implementation of reconfiguration strategies, application-level and resource-level profiling services.
• Design and implementation of the PCM library (Process Checkpointing and Migration), a library that extends iterative Message Passing Interface (MPI) applications with checkpointing, process migration, and split and merge capabilities.

For more details, please check my PhD thesis at: http://wcl.cs.rpi.edu/theses/elmaghraoui-phd.pdf