Pradip Bose, Chen-yong Cher

Reliability-Aware Microarchitectures


This project focuses on microarchitectural support for reliable operation in the face of hard and soft hardware failures. Technology trends that cause high on-chip temperatures and increased soft-error rates have motivated architects to pursue new approaches that maintain target reliability figures without exceeding power budgets or incurring significant performance degradation. In this project, we focus on the following problems:

  • How can the effects of hard and soft failures, which first manifest at the device or interconnect level, be modeled at the architecture or system level? Can we project the chip-level failure rate (in FITs) or the mean time to failure (MTTF) for various input workloads of interest?
  • How can we validate and calibrate pre-silicon predictive models that project reliability metrics of a target chip or system? What are the limits of applicability of our assumed modeling axioms?
  • What are the right metrics to use?
  • How can the principles of spatial and temporal redundancy best be applied in architecting solutions that provide error tolerance while maintaining performance targets and power budgets?
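
The relationship between the two metrics named in the first question can be illustrated with a minimal sketch. It assumes constant failure rates and a series-system model in which per-component FIT rates simply add; this is a common simplification, not necessarily the model used in this project. The component names, FIT values, and the workload-dependent `derating` factors are hypothetical and chosen only for illustration.

```python
# One FIT is one failure per 10^9 device-hours.
HOURS_PER_FIT_UNIT = 1e9
HOURS_PER_YEAR = 24 * 365


def chip_fit(raw_fit, derating=None):
    """Sum per-component FIT rates into a chip-level FIT rate.

    raw_fit:  dict mapping a component name to its raw FIT rate.
    derating: optional dict mapping a component name to a factor in [0, 1]
              (hypothetical: the fraction of raw failures that affect the
              workload of interest). Missing components default to 1.0.
    """
    derating = derating or {}
    total = 0.0
    for name, fit in raw_fit.items():
        factor = derating.get(name, 1.0)
        if fit < 0 or not 0.0 <= factor <= 1.0:
            raise ValueError(f"invalid FIT or derating for {name!r}")
        total += fit * factor
    return total


def mttf_hours(fit):
    """Convert a FIT rate to MTTF in hours (constant-rate assumption)."""
    if fit <= 0:
        raise ValueError("FIT rate must be positive")
    return HOURS_PER_FIT_UNIT / fit


# Hypothetical component FIT rates, for illustration only.
raw = {"core": 400.0, "cache": 400.0, "interconnect": 200.0}

worst_case = chip_fit(raw)                     # no workload derating
with_workload = chip_fit(raw, {"cache": 0.5})  # cache derated by half

print(worst_case, mttf_hours(worst_case) / HOURS_PER_YEAR)
print(with_workload, mttf_hours(with_workload) / HOURS_PER_YEAR)
```

Under these assumptions, lowering the effective FIT rate of any component for a given workload raises the projected MTTF proportionally, which is why the projected figures depend on the input workload.
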
This research project is built upon strong collaborative ties with external university groups (e.g., Prof. Sarita Adve's group at the University of Illinois, Urbana-Champaign), as well as internal development group partners (e.g., Pia Sanda, Paul Muller, Ron Kalla, Scott Swaney, Lisa Spainhower, Balaram Sinharoy and many others within IBM Systems and Technology Group).