Researcher working on special-purpose accelerators and optimizing compilers. My current work centers on building compilation technology for next-generation GPU-based supercomputers using the LLVM compiler infrastructure. I proposed and implemented a new technique for mapping fork-join parallelism, expressed through the OpenMP programming model, onto the SIMT architecture of NVIDIA GPUs. Our LLVM-based compiler is freely available at https://github.com/clang-ykt/clang/wiki. My work has also been integrated into IBM's commercially supported XL compiler toolchain.
Previously, I worked on loop scheduling optimizations for the Active Memory Cube (AMC), a low-power near-memory processor. I extended swing modulo scheduling to target its exposed-pipeline Vector-VLIW core. Details are available in the following papers: [AMC] [COMPILER] [SCHEDULER] [CO-DESIGN].