Computer Architecture - Seminars
IBM Research welcomes members of the research community to our seminars. To ensure compliance with IBM security guidelines, we request you to contact the seminar host in advance. When you arrive at the Research lab, please provide the host's name to the receptionist.
|Understanding the memory systems of a modern NUMA Processor|
|Thomas Gross||On:||5-Nov-2010 11:00 AM - 12:00 PM|
|Professor||At:||Watson Research Center (Yorktown), Room YKT 20-001|
We investigate the memory performance of an Intel Xeon E5520 quad-core processor (based on the Nehalem microarchitecture). The programming model of this system is simple (shared memory), yet in a two-processor system the available memory bandwidth is not shared equally among all cores. We report the performance interaction between the location chosen for data memory allocation and the location chosen for a process. Depending on the number of processes that execute, it may be beneficial to move a process away from the processor that holds its data. To demonstrate the practical importance of this analysis (and its corresponding system model), we present a process scheduling algorithm specially adapted to this processor that takes the performance characteristics of the system (the sliding cost of remote vs. local memory access) into account. Whereas the overall system performance improvement for workloads of SPEC CPU2006 benchmark programs is moderate (an average improvement of 3% over the default Linux scheduler), for a parallel program the performance improvement is 60%.
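As a rough illustration of the tradeoff the abstract describes (a hypothetical toy model, not the speaker's actual scheduling algorithm), a placement decision can weigh the lower latency of local memory access against bandwidth contention on the node that holds the data. All latencies and the contention factors below are made-up numbers for illustration only:

```python
# Toy sketch: choose which of two NUMA nodes should run a process, given
# where its data lives and how contended each node's memory system is.

def memory_cost(accesses, local_fraction, local_latency=60, remote_latency=100):
    """Estimated cycles spent in memory for a given placement
    (latency numbers are illustrative, not measured)."""
    return accesses * (local_fraction * local_latency +
                       (1 - local_fraction) * remote_latency)

def best_node(accesses, data_node, contention):
    """Pick the node minimizing estimated memory cost; contention[n]
    scales cost on node n to model bandwidth sharing among processes."""
    costs = {}
    for node in (0, 1):
        local = 1.0 if node == data_node else 0.0
        costs[node] = contention[node] * memory_cost(accesses, local)
    return min(costs, key=costs.get)

# With the data on node 0 but node 0 heavily contended, running the process
# remotely on node 1 is cheaper -- matching the talk's observation that it
# can pay to move a process away from the processor holding its data.
print(best_node(10**6, data_node=0, contention={0: 2.0, 1: 1.0}))  # → 1
```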
Thomas R. Gross is a Professor of Computer Science at ETH Zurich, Switzerland. Thomas Gross joined the Department of Computer Science at Carnegie Mellon University in Pittsburgh, PA, in 1984 after receiving a Ph.D. in Electrical Engineering from Stanford University. In 2000, he became a Full Professor at ETH Zurich.
|Architecture, design, and implementation of a 3D-IC many-core processor|
|Hsien-Hsin Lee||On:||4-Nov-2010 10:00 AM - 11:30 AM|
|Professor||At:||Watson Research Center (Yorktown), Room YKT 20-043|
As device scaling faces several fundamental changes due to physical limitations, die-stacked 3D integration has emerged as the frontrunner technology to continue Gordon Moore's prophecy in the vertical dimension. It enables a true System-on-Chip design style by stacking multiple dies, fabricated with either homogeneous or heterogeneous processes, onto the same package using inter-die vias or through-silicon vias (TSV). This highly anticipated solution not only packs more transistors into a given footprint, but could also offer several potential advantages, e.g., flexibility in integration, high memory bandwidth, low power consumption, and a much smaller system form factor, thereby presenting an ideal prospect for future embedded systems. Nevertheless, there are several unknown and unaddressed technical issues that could relegate this technology to a niche market. In this talk, I will discuss the opportunities, caveats and challenges of 3D stacked IC technology and present my view of its outlook toward system design. Then I will discuss a 3D-IC many-core processor called 3D-MAPS that we, at Georgia Tech, recently designed and taped out using Chartered Semiconductor's 130nm technology node with Tezzaron's 3D process. This particular 3D chip was prototyped with the goal of demonstrating the performance potential brought about by memory-on-logic 3D integration for applications with data-crunching and streaming types of behavior, e.g., data-driven accelerators or GPGPU. Finally, I will discuss other potential architectural innovations enabled by 3D integration that we learned from this exercise.
Hsien-Hsin S. Lee is an Associate Professor in the School of Electrical and Computer Engineering at Georgia Institute of Technology. He received his Ph.D. degree in Computer Science and Engineering from the University of Michigan, Ann Arbor. His main research interests include computer architecture, low-power VLSI, cyber security, and 3D-IC technology.
|Intelligent Compilers (Evaluating Models to Predict Good Compiler Optimizations)|
|John Cavazos||On:||15-Oct-2010 11:04 AM|
|Assistant Professor||At:||Austin Research Lab, Room ARL Conf room|
|University of Delaware|
Choosing the right set of optimizations can make a significant difference in the running time of a program. However, compilers typically have a large number of optimizations to choose from, making it impossible to iterate over a significant fraction of the entire optimization search space. Recent research has proposed "intelligently" iterating over the optimization search space using predictive methods. In particular, state-of-the-art iterative compilation techniques use characteristics of the code being optimized to predict good optimization sequences to evaluate. An important step in developing predictive methods for compilation is deciding how to model the problem of choosing the right optimizations. In this talk, I will discuss three different ways of modeling the problem of choosing the right optimization sequences using machine learning techniques. I will present two novel prediction modeling techniques, namely a speedup predictor and a tournament predictor, which can effectively predict good optimization sequences. I will show that these novel modeling techniques outperform current state-of-the-art predictors and outperform the most aggressive setting of the Open64 compiler (Ofast) by more than 20% on average in just 10 iterations. I will also present results from applying our speedup predictor to drive a polyhedral optimizer, showing that it can effectively determine the right polyhedral optimizations to apply, significantly outperforming the Intel compiler.
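To make the general idea concrete (this is our simplified illustration, not Cavazos's actual models): a "speedup predictor" maps characteristics of a new program to predicted speedups of candidate optimization sequences, so that only the most promising sequences need to be evaluated. The feature vectors, sequences, and speedup numbers below are all invented for the sketch:

```python
# Minimal nearest-neighbor sketch of a speedup predictor: reuse the measured
# speedup table of the most similar previously-seen program.

def distance(a, b):
    """Euclidean distance between two code-feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def predict_speedups(features, training):
    """training: list of (feature_vector, {sequence: measured_speedup}).
    Returns the speedup table of the most similar known program."""
    nearest = min(training, key=lambda t: distance(features, t[0]))
    return nearest[1]

def best_sequences(features, training, k=2):
    """Rank optimization sequences by predicted speedup; evaluate top k."""
    table = predict_speedups(features, training)
    return sorted(table, key=table.get, reverse=True)[:k]

training = [
    ([0.8, 0.1], {"-O2": 1.3, "-O3 -funroll": 1.9, "-Os": 1.0}),  # loop-heavy
    ([0.1, 0.9], {"-O2": 1.2, "-O3 -funroll": 1.1, "-Os": 1.4}),  # size-bound
]
# A new program resembling the loop-heavy one gets unrolling recommended first.
print(best_sequences([0.7, 0.2], training))  # → ['-O3 -funroll', '-O2']
```

Real predictors use richer features and learned models rather than a single nearest neighbor, but the shape of the problem — features in, ranked sequences out — is the same.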
John Cavazos is an Assistant Professor in the Department of Computer & Information Sciences at the University of Delaware. He graduated with a Ph.D. in Computer Science from the University of Massachusetts, Amherst in 2004. Before coming to Delaware, he did post-doctoral research in the School of Informatics at the University of Edinburgh, Scotland, UK. His research interests are in intelligent and iterative compilation and auto-tuning for computer systems, spanning embedded computers to large-scale supercomputers.
|Architecture Highlights 2010: IBM Architecture PIC Student Workshop|
|Click here for Workshop webpage||On:||7-Oct-2010 08:15 AM - 06:00 PM|
|At:||Watson Research Center (Yorktown), Room 26-004/014/024|
Architecture Highlights 2010 is the first student workshop, a two-day event, hosted by the Computer Architecture PIC of the IBM T.J. Watson Research Center. The workshop will include presentations from 11 invited students and IBMers. Topics include, but are not limited to:
|Data Sharing and Performance Isolation Design for Scalable Multi-Core Platforms|
|Sandhya Dwarkadas||On:||1-Oct-2010 10:00 AM - 11:30 AM|
|Professor||At:||Watson Research Center (Yorktown), Room 20-043|
|University of Rochester|
Technology projections indicate the possibility of 50 billion transistors on a chip in a few years, with the processor landscape being dominated by multi-core designs. Developing correct and reliable software that takes advantage of the multiple cores to deliver high performance, while at the same time ensuring performance isolation, remains a growing challenge. In this talk, I will begin by describing our changes to existing coherence mechanisms in order to scale data sharing support and to improve the efficiency of fine-grain sharing. As time permits, I will also describe our efforts in combining resource utilization prediction, resource control mechanisms (page coloring, hardware execution throttling, resource-aware scheduling), and resource-aware policies in order to effect performance isolation at the operating system level and improve multi-core resource utilization.
Sandhya Dwarkadas is currently a Professor of Computer Science at the University of Rochester, with a secondary appointment in Electrical and Computer Engineering. She spent a sabbatical year at IBM Watson in 2002–2003. She received her Bachelor's from the Indian Institute of Technology, Madras, India, in 1986, and her M.S. and Ph.D. in Electrical and Computer Engineering from Rice University in 1989 and 1993, respectively. Her research lies at the interface of hardware and software with a particular focus on concurrency. She is also co-inventor on 7 granted U.S. patents, associate editor for IEEE Computer Architecture Letters (2006–2010), and for IEEE Transactions on Parallel and Distributed Systems (2000–2003).
|Dark Silicon and its Implication on Server Design|
|Babak Falsafi||On:||7-Sep-2010 10:00 AM - 11:30 AM|
|Professor||At:||Watson Research Center (Yorktown), Room 20-043|
Technology forecasts indicate that device scaling will continue well into the next decade. Unfortunately, it is becoming extremely difficult to extract performance from this increase in the number of transistors due to a number of technological, circuit, architectural, methodological and programming challenges. In this talk, I will argue that the ultimate emerging showstopper is power, even for workloads with abundant parallelism. Voltage scaling as a means to maintain a constant power envelope with an increase in transistor numbers has hit diminishing returns, requiring drastic measures to cut power to continue riding Moore's law. I will present results backing this argument based on validated models for future server chips and parameters extracted from real commercial workloads. I will then use these results to project future research directions for server hardware and software.
Babak Falsafi is a Professor in the School of Computer and Communication Sciences at EPFL, and an Adjunct Professor of Electrical and Computer Engineering and Computer Science at Carnegie Mellon. He is the founder and the director of the Parallel Systems Architecture Laboratory (PARSA) at EPFL where he conducts research on architectural support for parallel programming, resilient systems, architectures to break the memory wall, and analytic and simulation tools for computer system performance evaluation. He is a recipient of an NSF CAREER award in 2000, IBM Faculty Partnership Awards between 2001 and 2004, and an Alfred P. Sloan Research Fellowship in 2004. He is a senior member of IEEE and ACM.
|Making Enterprise Computing Green: Energy-Efficiency Challenges in Enterprise Data Centers|
|Thomas Wenisch||On:||17-Aug-2010 10:00 AM - 11:30 AM|
|Assistant Professor||At:||Austin Research Lab, Room 904/6D-000 (simulcast to Yorktown 20-001)|
|University of Michigan|
Architects and circuit designers have made enormous strides in managing the energy efficiency and peak power demands of processors and other silicon systems. Sophisticated power management features and modes are now myriad across system components, from DRAM to processors to disks. And yet, despite these advances, typical data centers today suffer embarrassing energy inefficiencies: it is not unusual for less than 20% of a data center's multi-megawatt total power draw to flow to computer systems actively performing useful work. Managing power and energy is challenging because individual systems and entire facilities are conservatively provisioned for rare utilization peaks, which leads to energy waste in underutilized systems and over-provisioning of physical infrastructure. These inefficiencies lead to worldwide energy waste measured in billions of dollars and tens of millions of metric tons of CO2. Our collective narrow view "inside the box" fails to capture opportunities for energy, power, and thermal management that cut across system components or extend beyond the server to the data center's physical infrastructure. In this talk, I first discuss the massive power wasted due to idleness---servers that are powered on, but not performing useful work---and survey techniques to reduce this waste. I then give examples of opportunities for computer system designers to impact power management of physical infrastructure like power delivery and cooling systems. I close with a call-to-arms for our community to create and disseminate the modeling tools, benchmarks, and characterizations of real-world systems that are needed to make rapid progress on system- and data-center-level power management.
Thomas Wenisch is an Assistant Professor of Computer Science and Engineering at the University of Michigan, specializing in computer architecture. Tom received an NSF CAREER award in 2009. Prior to his academic career, Tom was a software developer at American Power Conversion, where he worked on data center thermal topology estimation. He is co-inventor on three patents. Tom received his Ph.D. in Electrical and Computer Engineering from Carnegie Mellon University.
|Looking back on the Language and Hardware Revolution: Measured Power, Performance, and Scaling|
|Kathryn McKinley||On:||3-Aug-2010 10:00 AM - 11:30 AM|
|Professor||At:||Austin Research Lab, Room 6D-000|
|University of Texas at Austin|
This talk evaluates measured power, energy, performance, and scaling of native and managed, single and multi-threaded workloads across a selection of IA32 processor technology generations from 2003 to 2010: 130nm, 65nm, 45nm, and 32nm. We measure chip power with a Hall effect sensor and report findings in three areas. 1) Methodology: TDP and maximum power are not well correlated with measured power or energy and should not be used to compare processors. For software designers to optimize power and performance, hardware vendors must make accurate, accessible power metering available to software. 2) Power-performance trends: While power improvements stayed constant, performance improvements fell markedly, from 139% in the two generations from 130nm to 65nm to just 74% in the next two generations, 65nm to 32nm, even using parallel workloads. 3) Scaling: A wide variety of resource sharing patterns limit multi-threaded workload scaling, but virtual machines for managed languages offer new opportunities for software scalability. These findings challenge prevalent methodologies, show differences between native and managed workloads, and offer new insight into how microarchitectures have traded power and performance as process technology shrank. We point to new opportunities and challenges that managed runtimes are poised to exploit.
Professor McKinley received her Ph.D. from Rice University. Her research interests include compilers, memory management, runtime systems, programming languages, debugging, and architecture. McKinley is an ACM Fellow.
|Architecture PIC Workshop: Future Challenges and Opportunities|
|Joel Emer (Intel), Janak Patel (UIUC), Trevor Mudge (UMich)||On:||27-May-2010 01:00 PM - 04:45 PM|
|At:||Watson Research Center (Yorktown), Room 20-001|
Spend an afternoon with three distinguished computer architects at this Computer Architecture Workshop, where they will present their recent research. The Workshop will be opened by Dr. Joel Emer, Intel Fellow, with a seminar on "An Evolution of General Purpose Processing: Reconfigurable Logic Computing". Prof. Janak Patel of the University of Illinois at Urbana-Champaign will present his work on "CMOS Process Variations: A ‘Critical Operation Point’ hypothesis", and Prof. Trevor Mudge of the University of Michigan will talk about advances in Computer Architecture. The workshop will conclude with a 45-minute Q&A session and panel discussion, inviting all attendees to engage with our speakers.
Joel Emer is an Intel Fellow and Director of Microarchitecture Research at Intel in Hudson, Massachusetts. Previously he worked at Compaq and Digital Equipment Corporation, where he held various research and advanced development positions investigating processor micro-architecture for a variety of VAX and Alpha processors and developing performance modeling and evaluation techniques. His research included pioneering efforts in simultaneous multithreading and early contributions on the now pervasive quantitative approach to processor evaluation. His current research interests include memory hierarchy design, processor reliability, reconfigurable logic-based computation and performance modeling. In his spare time, he serves as visiting faculty at MIT. He received his PhD in electrical engineering under Edward S. Davidson at the University of Illinois at Urbana-Champaign in 1979. He is a Fellow of both the ACM and the IEEE, and was the 2009 recipient of the Eckert-Mauchly Award for lifetime contributions in computer architecture.
Janak H. Patel is a Research Professor in the Coordinated Science Laboratory and the Department of Electrical and Computer Engineering at the University of Illinois at Urbana-Champaign. Patel's research contributions include Pipeline Scheduling, Cache Coherence, Cache Simulation, Interconnection Networks, On-line Error Detection, Reliability Analysis of Memories with ECC and Scrubbing, Design for Testability, Built-In Self-Test, Fault Simulation, and Automatic Test Generation. Patel has supervised over 85 M.S. and Ph.D. theses, published over 200 technical papers, and is listed as a Highly Cited Researcher. He was a founding technical advisor to Nexgen Microsystems, which gave rise to the entire line of microprocessors from AMD. He was a founder of the successful startup Sunrise Test, a CAD company for chip testing, now owned by Synopsys.
He received a Bachelor of Science degree in Physics from Gujarat University, India, a Bachelor of Technology in Electrical Engineering from the Indian Institute of Technology, Madras, India, and a Master of Science and Ph.D. in Electrical Engineering from Stanford University. He is a fellow of ACM and IEEE and a recipient of the 1998 IEEE Piore Award.
Trevor Mudge received the Ph.D. degree in Computer Science from the University of Illinois, Urbana, in 1977. Since then, he has been on the faculty of the University of Michigan, Ann Arbor. In 2004, he was named the first Bredt Family Professor of Electrical Engineering and Computer Science after concluding a ten-year term as the Director of the Advanced Computer Architecture Laboratory -- a group of about 8 faculty and 80 graduate students. He is the author of numerous papers on computer architecture, programming languages, VLSI design, and computer vision, and has chaired 33 theses in these research areas. In addition to his faculty position, he runs Idiot Savants, a chip design consultancy. Trevor Mudge is a Fellow of the IEEE, and a member of the ACM, the IET, and the British Computer Society.
|Spiral: Program Generation and Automatic Algorithm/Platform Co-Design|
|Franz Franchetti||On:||21-May-2010 10:00 AM - 11:30 AM|
|Assistant Research Professor||At:||Watson Research Center (Yorktown), Room YKT 20-001|
|Carnegie Mellon University|
Spiral (www.spiral.net) is a program and hardware design generation system for linear transforms such as the discrete Fourier transform, discrete cosine transforms, filters, and others. We are currently extending Spiral beyond its original problem domain, using linear algebra, coding, software defined radio, and radar image formation as examples. For a user-selected problem specification, Spiral autonomously generates different algorithms, represented in a declarative form as mathematical formulas, and their implementations to find the best match to the given target platform. Besides the search, Spiral performs deterministic optimizations on the formula level, effectively restructuring the code in ways impractical at the code or design level. The implementations generated by Spiral rival the performance of expertly hand-tuned libraries. Spiral's mathematical approach also enables us to start exploring an "inverse" question: Which architectural features are well suited for a particular algorithm or class of algorithms? What would an "optimal" architecture for a given problem look like? In this talk, we give a short overview of Spiral. We then explain how Spiral generates efficient programs for parallel platforms including vector architectures, shared and distributed memory platforms, and GPUs; as well as hardware designs (Verilog) and automatically partitioned software/hardware implementations. We then discuss our approach towards solving the "inverse" questions and present the Data Pump Architecture, a highly parameterizable non-von Neumann architecture that allows for explicit control of data movement. We discuss how the parametric DPA architecture together with Spiral's program generation capabilities provides a pathway to balanced, domain-specific processor designs.
Franz Franchetti is an Assistant Research Professor with the Department of Electrical and Computer Engineering at Carnegie Mellon University. He received the Dipl.-Ing. (M.Sc.) degree in Technical Mathematics and the Dr. techn. (Ph.D.) degree in Computational Mathematics from the Vienna University of Technology in 2000 and 2003, respectively.
|Scheduling for Highly Multithreaded and SIMD Cores|
|Kevin Skadron||On:||6-May-2010 10:00 AM - 11:30 AM|
|Associate professor||At:||Watson Research Center (Yorktown), Room 20-001|
|University of Virginia|
Multithreaded and SIMD cores require caches to support increasing numbers of threads. Data-parallel workloads exhibit similarity among threads' data-access patterns, and this property can be exploited to improve thread scheduling to maintain high throughput in the presence of severe cache contention. Instead of conventional tiling, or throttling the number of active threads, we propose to partition data sets at the granularity of cores and then assign threads to operate on neighboring data elements. This improves throughput by 69% on average. For SIMD organizations, we also propose new hardware support for divergent branch and cache-hit behavior. This increases utilization and also improves memory level parallelism, and increases throughput by a further 70%.
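A simplified sketch of the partitioning idea (our illustration under assumed parameters, not the speakers' actual scheduler): instead of striding threads across the whole data set, carve the data into per-core blocks and give each core's threads adjacent sub-ranges, so threads that share a cache touch neighboring data:

```python
# Partition a data set at core granularity, then assign each core's threads
# to neighboring index ranges (assumes n_elements divides evenly, for brevity).

def partition_by_core(n_elements, n_cores, threads_per_core):
    """Return assignment[core][thread] -> (start, end) index ranges so that
    threads co-located on a core work on adjacent data and can share
    cache lines instead of contending for them."""
    per_core = n_elements // n_cores
    per_thread = per_core // threads_per_core
    assignment = []
    for c in range(n_cores):
        base = c * per_core
        assignment.append([(base + t * per_thread, base + (t + 1) * per_thread)
                           for t in range(threads_per_core)])
    return assignment

# 4 cores x 2 threads over 80 elements: core 0's two threads get [0,10) and
# [10,20) -- neighboring ranges that stay within one core's cache.
print(partition_by_core(80, 4, 2)[0])  # → [(0, 10), (10, 20)]
```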
Kevin Skadron is an associate professor of computer science at the University of Virginia, where he has been on the faculty since 1999. He received his Ph.D. in Computer Science from Princeton University, and his BS and BA degrees in Computer Engineering and Economics from Rice University. Skadron's research interests focus on power, thermal, reliability, and programming challenges for multicore, manycore, and heterogeneous architectures.
|Circuit, Architectural, and Algorithmic Mitigation of Unreliable Hardware|
|Mikko Lipasti||On:||1-Apr-2010 10:00 AM - 11:30 AM|
|Professor||At:||Watson Research Center (Yorktown), Room 20-001|
|University of Wisconsin - Madison|
Future processors will be built using devices with many unattractive characteristics: they will be increasingly vulnerable to soft errors, their performance will degrade over time due to aging, and they will often fail prematurely due to wearout effects. To combat these effects, researchers must explore potential solutions to these challenges at the circuit, microarchitecture, and algorithmic levels. This talk will summarize recent work in the PHARM group at all three levels. First, I will describe circuit-level approaches for efficiently detecting and correcting logic soft errors in pipelined circuits. Next, I will describe a novel and comprehensive microarchitectural approach for mitigating aging and wearout effects by equalizing device duty cycles. Finally, I will briefly discuss our efforts to develop neurally-inspired computing systems that are inherently tolerant of both transient and permanent component faults.
Mikko Lipasti is currently the Philip Dunham Reed Professor of Electrical and Computer Engineering at the University of Wisconsin-Madison. He earned his BS/MS/PhD degrees in computer engineering at Valparaiso University and Carnegie Mellon University.
|The Road to Many-core Computing|
|David Kaeli||On:||2-Mar-2010 10:00 AM - 11:30 AM|
|Professor||At:||Watson Research Center (Yorktown), Room 20-001|
In high performance computing (HPC) environments, we are beginning to see a rapid migration from multi-core systems to many-core platforms. This transition has been fueled by the introduction of new programming languages such as CUDA and OpenCL to ease application development on Graphics Processing Units (GPUs). These platforms are now being used to accelerate a wide range of critical applications, including remote sensing, environmental monitoring, financial forecasting and medical image analysis. Both GPU hardware and software have been evolving; we fully expect to see GPUs with more than 1000 cores available soon, programmed with a range of general purpose languages based on C/C++, Python and Matlab. In this talk we will review the current state-of-the-art in the use of these powerful platforms for a number of computationally challenging applications. We will also present our recent work on improving performance on these platforms through better utilization of the heterogeneous memory system present, and through the use of multiple GPUs to exploit new degrees of parallelism. We also describe our current work on developing GPU libraries for the biomedical imaging research community.
David Kaeli received a BS and PhD in Electrical Engineering from Rutgers University, and an MS in Computer Engineering from Syracuse University. He is presently a Full Professor on the ECE faculty at Northeastern University, Boston, MA, where he directs the Northeastern University Computer Architecture Research Laboratory (NUCAR). Prior to joining Northeastern in 1993, Kaeli spent 12 years at IBM, the last 7 at T.J. Watson Research Center, Yorktown Heights, NY.
|Logs and Lifeguards: Using Chip Multiprocessors to Help Software Behave Correctly|
|Todd C. Mowry||On:||23-Feb-2010 10:00 AM - 11:30 AM|
|Professor||At:||Watson Research Center (Yorktown), Room 20-001|
|Carnegie Mellon University|
While performance and power-efficiency are both important, correctness is perhaps even more important. In other words, if your software is misbehaving, it is little consolation that it is doing so quickly or power-efficiently. Companies that operate large data centers have already done an impressive job of addressing one of the reasons why software may misbehave, which is that the underlying hardware may fail. In the Log-Based Architectures (LBA) project, however, we are focusing on perhaps an even more challenging source of misbehavior, which is that the application itself contains bugs, including obscure bugs that only cause problems during security attacks. Software bugs are difficult to recognize, and they are particularly problematic because they may cause every node in the system to fail (unlike hardware failures, which tend to be more isolated). To help detect and fix software bugs, we have been exploring techniques for accelerating dynamic program monitoring tools, which we call "lifeguards". Lifeguards are typically written today using dynamic binary instrumentation frameworks such as Valgrind or Pin. Due to the overheads of binary instrumentation, lifeguards that require instruction-grain information typically experience 30X-100X slowdowns, and hence it is only practical to use them during explicit debug cycles. Our goal is to reduce these overheads to the point where lifeguards can run continuously on deployed code. To accomplish this, we create a dynamic log of instruction-level events in the monitored application and stream this information to one or more lifeguards running on separate cores on the same chip multiprocessor (CMP). In our results so far, we have shown that the basic logging approach typically reduces the slowdown by roughly an order of magnitude, from roughly 30X to roughly 3X.
In a recent ISCA paper, we demonstrated several hardware-based techniques that can eliminate redundancy in the event-driven lifeguards and reduce the slowdown to just 20%. In our ongoing research, we are attempting to achieve similar performance through software-only techniques (by extending dynamic compiler optimization techniques to eliminate redundancy within the lifeguards), and we are extending our support to parallel and concurrent environments. We believe that our techniques are applicable to any event-driven lifeguard that processes streams of events, and are compatible with sampling-based techniques that can further reduce the power and performance impacts of monitoring. This talk will describe the work that we have done so far, as well as our plans for future research.
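The log-and-stream structure can be illustrated with a small software analogy (our sketch of the general idea only; the LBA work streams hardware-captured instruction events between CMP cores, not Python queue items). Here a toy "lifeguard" consumes a stream of (operation, address) events and flags loads from addresses that were never stored to:

```python
# Toy lifeguard: the monitored program appends events to a log; the lifeguard,
# conceptually on another core, drains the stream and checks a safety property.
from queue import Queue

def lifeguard(log: Queue):
    """Consume (op, addr) events until a None sentinel; report loads from
    addresses never previously written -- a crude memory checker."""
    initialized, violations = set(), []
    while (event := log.get()) is not None:
        op, addr = event
        if op == "store":
            initialized.add(addr)
        elif op == "load" and addr not in initialized:
            violations.append(addr)
    return violations

log = Queue()
for event in [("store", 0x10), ("load", 0x10), ("load", 0x20), None]:
    log.put(event)
print(lifeguard(log))  # the load from 0x20 was never preceded by a store
```

The decoupling is the point: the application only pays to emit events, while the (expensive) checking runs concurrently on the consumer side.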
Todd C. Mowry is a Professor and the Associate Head for Faculty in the Computer Science Department at Carnegie Mellon University. He received his Ph.D. from Stanford University in 1994. He currently co-leads the Log-Based Architectures project and the Claytronics project. Prof. Mowry served a rotation as the Director of the Intel Research Pittsburgh lab from 2004 through 2007. He is an Associate Editor of ACM Transactions on Computer Systems.
|Towards Efficient, Adaptive Hardware Systems|
|Martha Kim||On:||21-Jan-2010 10:00 AM - 11:30 AM|
|Assistant Professor||At:||Watson Research Center (Yorktown), Room 20-043|
Software applications are highly dynamic entities, and as such demonstrate characteristics that fluctuate in response to numerous factors, including program phase, program input, time of day, software patches, and software environment. In contrast, the hardware executing these applications remains relatively static, showing, at best, narrow adaptation to changing software needs. This limited responsiveness to changing workloads leads to inefficiencies. As computers conduct more of our work and energy usage becomes an increasingly serious concern, efficient computation has become the focus of intense scientific scrutiny. This talk outlines two approaches to adaptive, efficient hardware that we are pursuing. The first approach is a "hardware OS" that will actively manage hardware system settings (be it via voltage and frequency scaling, parameterization, or configuration). Whereas the standard software OS provides a uniform abstraction of the hardware to a diverse collection of applications, the role of a hardware OS is to provide a uniform interface for interaction with the software to a diverse collection of subsystems. The second approach investigates the use of specialized hardware accelerators to meet future efficiency requirements. While hardware accelerators are relatively easy to build—transistors are cheap and getting cheaper—making them useful is another matter. We believe the only successful accelerator architectures will be those that are easy to program. We propose a scheme to make accelerator use easy, in which accelerators are stored-program computers, the programmer’s intent is captured via the use of high-level libraries, and the resulting programs are mapped via a just-in-time compiler to the available accelerators.
Martha Kim is an Assistant Professor in the Computer Science Department at Columbia University. She received her PhD in Computer Science and Engineering from the University of Washington in December 2008. She earned her bachelor's degree in Computer Science from Harvard University in 2002 and a master's degree in embedded systems design from the University of Lugano in Switzerland in 2003.