Alexandre E. Eichenberger

contact information

Compiler optimization for SIMD & multicore parallelism
Thomas J. Watson Research Center, Yorktown Heights, NY USA
+1-914-945-1812

links

Professional Associations:  ACM SIGMICRO  |  IEEE


Biographical Info

I am currently a Research Staff Member in the Advanced Compiler Technologies group of the VLSI Systems department at the IBM T.J. Watson Research Center. My research interests focus on the interaction between compiler technology and micro-architecture design.

While at Watson, I have explored and developed compiler support for many of the micro-architectural tradeoffs present in the Cell Broadband Engine (Cell BE) processor, primarily on the Synergistic Processor Element (SPE). Examples of such tradeoffs include SPE-specific scheduling and bundling constraints, as well as compiler techniques to prevent instruction fetch starvation on the SPEs.
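The dual-issue constraint behind SPE bundling can be sketched with a small example. The C program below models a greedy bundler that pairs an even-pipe (arithmetic) instruction with an odd-pipe (load/store/branch) instruction each cycle, padding with nop/lnop fillers when a pair cannot be formed. The instruction list, mnemonics, and pairing policy are illustrative assumptions, not the actual XL compiler implementation.

```c
/* Minimal, hypothetical sketch of the SPE pairing problem: two
 * instructions can issue per cycle only when the first goes to the
 * even pipe and the second to the odd pipe, so fillers are inserted
 * to keep instructions in the proper slots. */
#include <stdio.h>

typedef enum { EVEN, ODD } pipe_t;
typedef struct { const char *name; pipe_t pipe; } insn_t;

int main(void)
{
    /* a straight-line sequence as produced by the scheduler (made up) */
    insn_t code[] = {
        { "fa   $1,$2,$3",  EVEN }, { "lqd  $4,0($5)",  ODD },
        { "fm   $6,$1,$4",  EVEN }, { "fa   $7,$6,$6",  EVEN },
        { "stqd $7,16($5)", ODD },
    };
    int n = sizeof code / sizeof code[0];

    /* greedy bundler: emit (even, odd) pairs, padding with nop / lnop
     * when the next instruction targets the wrong pipe */
    for (int i = 0; i < n; ) {
        const char *even_slot = "nop";    /* even-pipe filler */
        const char *odd_slot  = "lnop";   /* odd-pipe filler  */
        if (code[i].pipe == EVEN) even_slot = code[i++].name;
        if (i < n && code[i].pipe == ODD) odd_slot = code[i++].name;
        printf("%-18s | %s\n", even_slot, odd_slot);
    }
    return 0;
}
```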

More recently, I have been a primary contributor to the automatic generation of SIMD code targeting the SIMD units found in the Cell (SPE/VMX), Power (VMX), and BlueGene/L (double-precision floating-point) architectures, focusing on data alignment and code generation issues. Prior work includes unroll-and-pack approaches and loop-based approaches; the novel approach that I pioneered combines aspects of both. It systematically minimizes the impact of the data reorganization required by compile-time or runtime data misalignment, and it can perform auto-SIMDization in the presence of data conversion (i.e., conversion from one data type to another). Auto-SIMDization generates this minimum-reorganization code even in the presence of multiple compile-time or runtime alignments, and it also handles induction variables, private variables, and non-stride-one memory reference patterns.
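The alignment problem itself can be sketched as follows: when vector loads may only read from vector-aligned addresses, a stream whose first element is misaligned must be assembled from two adjacent aligned loads plus a shift. The C program below emulates this with a 4-element vector struct; all names (vec4, vec_load, vec_shift_pair) are hypothetical, and the code is a conceptual sketch of the realignment idea, not the generated SIMD code.

```c
/* Sketch of stream-shift realignment: each misaligned input stream is
 * formed from two aligned vector loads and a shift before the vector
 * add, as an auto-SIMDizer would conceptually do. */
#include <stdio.h>

typedef struct { float e[4]; } vec4;          /* emulated 4 x float register */

static vec4 vec_load(const float *p)          /* models an aligned vector load */
{
    vec4 v;
    for (int i = 0; i < 4; i++) v.e[i] = p[i];
    return v;
}

/* Concatenate lo:hi and extract 4 elements starting at 'offset' (0..3);
 * this models the shift/permute inserted to realign a stream. */
static vec4 vec_shift_pair(vec4 lo, vec4 hi, int offset)
{
    vec4 r;
    for (int i = 0; i < 4; i++)
        r.e[i] = (offset + i < 4) ? lo.e[offset + i] : hi.e[offset + i - 4];
    return r;
}

static vec4 vec_add(vec4 a, vec4 b)
{
    vec4 r;
    for (int i = 0; i < 4; i++) r.e[i] = a.e[i] + b.e[i];
    return r;
}

int main(void)
{
    float b[20], c[20], a[20] = {0};
    for (int i = 0; i < 20; i++) { b[i] = (float)i; c[i] = 100.0f + i; }

    /* Scalar loop being SIMDized:  a[i] = b[i+1] + c[i+2], i = 0..15.
     * b and c are read with different misalignments (1 and 2 elements),
     * so each stream needs its own realignment shift before the add. */
    for (int i = 0; i < 16; i += 4) {
        vec4 b_lo = vec_load(&b[i]), b_hi = vec_load(&b[i + 4]);
        vec4 c_lo = vec_load(&c[i]), c_hi = vec_load(&c[i + 4]);
        vec4 vb   = vec_shift_pair(b_lo, b_hi, 1);   /* stream b[i+1] */
        vec4 vc   = vec_shift_pair(c_lo, c_hi, 2);   /* stream c[i+2] */
        vec4 va   = vec_add(vb, vc);
        for (int k = 0; k < 4; k++) a[i + k] = va.e[k];  /* a is aligned */
    }

    for (int i = 0; i < 8; i++) printf("a[%d] = %.0f\n", i, a[i]);
    return 0;
}
```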

Prior to working at IBM, I worked on applying modulo scheduling to new architectures, such as clustered architectures where the compiler explicitly handles communication among clusters of functional units. I also applied modulo scheduling to new domains, such as generating faster instrumentation code for gathering branch profiles, using modulo schedules that are distributed throughout the code. In that work, the best technique reduces the slowdown due to instrumentation by a factor of 10.
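For readers unfamiliar with modulo scheduling (software pipelining), the sketch below shows its effect on a simple loop: operations from different iterations overlap in a steady-state kernel, surrounded by a prologue and an epilogue. The loop and the two-stage split are illustrative assumptions, not taken from the cited work.

```c
/* Sketch of a modulo-scheduled (software-pipelined) loop: the original
 * loop body  b[i] = a[i] * 2.0f + 1.0f  is split into two stages that
 * execute for different iterations in the same kernel cycle. */
#include <stdio.h>

#define N 8

int main(void)
{
    float a[N], b[N];
    for (int i = 0; i < N; i++) a[i] = (float)i;

    float t = a[0] * 2.0f;                 /* prologue: stage 1, i = 0   */
    for (int i = 1; i < N; i++) {
        b[i - 1] = t + 1.0f;               /* kernel: stage 2, iter i-1  */
        t = a[i] * 2.0f;                   /* kernel: stage 1, iter i    */
    }
    b[N - 1] = t + 1.0f;                   /* epilogue: stage 2, last    */

    for (int i = 0; i < N; i++) printf("b[%d] = %.1f\n", i, b[i]);
    return 0;
}
```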

I also extended my area of expertise to straight-line code by investigating scheduling algorithms for superblocks, where the most efficient algorithm (Balance) explicitly delays some branches to reduce average execution time. This algorithm can reduce the slowdown relative to a lower bound by a factor of 2.
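The intuition for delaying a branch can be captured with a small expected-cost computation: weighting each exit's completion cycle by its probability shows that issuing a rarely taken side-exit branch later can pay off if it lets the likely fall-through path finish earlier. The numbers and schedules below are made up for illustration and are not taken from the published results.

```c
/* Expected completion time of a superblock with one side exit (10%)
 * and one fall-through exit (90%), under two hypothetical schedules. */
#include <stdio.h>

int main(void)
{
    double p_side = 0.10, p_fall = 0.90;

    /* Schedule A: side-exit branch issued as early as possible (cycle 3),
     * pushing the fall-through path's completion to cycle 8.            */
    double a_expected = p_side * 3 + p_fall * 8;

    /* Schedule B: side-exit branch delayed to cycle 5, freeing an issue
     * slot so the fall-through path finishes at cycle 6.                */
    double b_expected = p_side * 5 + p_fall * 6;

    printf("expected cycles, branch early  : %.2f\n", a_expected);
    printf("expected cycles, branch delayed: %.2f\n", b_expected);
    return 0;
}
```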

To achieve a higher degree of performance, superblocks that include multiple predicated execution traces (i.e., hyperblocks) have been used extensively. However, prior to my work, hyperblocks could not be optimized to the same extent as single-path regions, because conditions along one path may prevent useful optimizations along other paths. I proposed an approach that enables single-path optimizations in hyperblocks by selectively renaming registers and replicating operations within the hyperblock. Renaming and replication are performed only when they enable an optimization to break a critical dependence. Measurements indicate that large speedups are possible, e.g., up to 66% for wc, 8% for li, and 7% for compress.
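A hedged illustration of the idea, with predication modeled as C conditionals and a select at the region exit: both paths of a hyperblock define the same value, so a merged use cannot be constant-folded; renaming the path-specific values and replicating the use lets the constant path fold while the other path is untouched. The code and names are illustrative only, not the published transformation.

```c
#include <stdio.h>

/* Before: both paths write t, so the merged use "t + 1" cannot be
 * constant-folded even though t is the constant 0 along the p path. */
int hyperblock_before(int p, int c, int d)
{
    int t;
    if (p) t = 0;            /* path 1: t is a compile-time constant */
    else   t = c * d;        /* path 2: t is unknown                 */
    return t + 1;            /* merged use blocks folding on path 1  */
}

/* After selective renaming and replication: the "+ 1" is replicated
 * into each path and the path-specific values are renamed (t1, t2),
 * so constant folding applies on path 1 while both paths remain in
 * one predicated region, merged by a select at the exit. */
int hyperblock_after(int p, int c, int d)
{
    int t1 = 1;              /* renamed path-1 value, 0 + 1 folded   */
    int t2 = c * d + 1;      /* renamed path-2 value, replicated use */
    return p ? t1 : t2;      /* select at the hyperblock exit        */
}

int main(void)
{
    printf("%d %d\n", hyperblock_before(1, 3, 4), hyperblock_after(1, 3, 4));
    printf("%d %d\n", hyperblock_before(0, 3, 4), hyperblock_after(0, 3, 4));
    return 0;
}
```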

I received a Diploma in Computer Science from the Eidgenoessische Technische Hochschule, Zuerich, Switzerland, in 1991. I then studied in the Computer and Electrical Engineering Department at the University of Michigan, Ann Arbor, where I received M.S. and Ph.D. degrees in Computer and Electrical Engineering in 1993 and 1996, respectively. I was an Assistant Professor in the Department of Electrical and Computer Engineering at North Carolina State University before joining IBM Research at the IBM T.J. Watson Research Center in 2001.

I have published more than 30 refereed papers in journals and conferences, including MICRO, PACT, PLDI, CGO, and ICS.