2016 IBM Research Workshop on Architectures for Cognitive Computing and Datacenters - Talk Details

 Neha Agarwal

Neha Agarwal - University of Michigan, Ann Arbor, advised by Prof. Thomas Wenisch

Talk title: Paving a path for simplifying CPU-GPU coherence

Abstract: Cache coherence is ubiquitous in shared memory multi-processors because it provides a simple, high performance memory abstraction to programmers. Recent work suggests extending hardware cache coherence between CPUs and GPUs to help support programming models with tightly coordinated sharing between CPU and GPU threads. However, implementing hardware cache coherence is particularly challenging in systems with discrete CPUs and GPUs that may not be produced by a single vendor. Instead, we propose selective caching, wherein we disallow GPU caching of any memory that would require coherence updates to propagate between the CPU and GPU, thereby decoupling the GPU from vendor-specific CPU coherence protocols. We propose several architectural improvements to offset the performance penalty of selective caching including aggressive request coalescing, CPU-side coherent caching for GPU-uncacheable requests, and CPU–GPU interconnect optimizations. These optimizations bring a selective caching GPU implementation to within 93% of the throughput of a hardware cache-coherent implementation without the need to integrate CPUs and GPUs under a single hardware coherence protocol.
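The core selective-caching rule described above can be sketched in a few lines. This is an illustrative model, not the paper's implementation; the `Page` class and `gpu_may_cache` helper are hypothetical names for the admission check the abstract describes.

```python
# Sketch of the selective-caching admission check: the GPU caches a line
# only if no CPU coherence traffic could ever target it, so no coherence
# messages need to cross the CPU-GPU interconnect.
# Names (Page, gpu_may_cache) are illustrative, not from the paper.

class Page:
    def __init__(self, addr, cpu_shared):
        self.addr = addr
        self.cpu_shared = cpu_shared  # True if CPU threads may touch this page

def gpu_may_cache(page):
    """Allow GPU caching only for memory invisible to CPU coherence."""
    return not page.cpu_shared

private = Page(0x1000, cpu_shared=False)
shared = Page(0x2000, cpu_shared=True)
print(gpu_may_cache(private))  # True: GPU-private data is cacheable
print(gpu_may_cache(shared))   # False: CPU-visible data is serviced uncached
```

Requests to uncacheable pages are then the ones that benefit from the coalescing, CPU-side caching, and interconnect optimizations the abstract lists.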

Bio: Neha Agarwal is a PhD student at the University of Michigan, Ann Arbor, where she works with Prof. Thomas Wenisch. Her research focuses on heterogeneous memory management at the OS and hardware levels. Currently, she is exploring techniques to make cheaper/slower memory viable for datacenters. Prior to this work, she worked on CPU-GPU systems, addressing cache coherence, data placement, and migration challenges. In her spare time she likes to travel.

Song Han

Song Han - Stanford University, advised by Prof. Bill Dally

Talk title: Deep Compression and EIE: Deep Neural Network Model Compression and Hardware Acceleration

Abstract: Deep neural networks have evolved into the state-of-the-art technique for machine learning tasks ranging from computer vision and speech recognition to natural language processing. However, deep learning algorithms are both computationally intensive and memory intensive, making them difficult to deploy efficiently. In particular, accessing memory consumes more than two orders of magnitude more energy than ALU operations, so it is critical to reduce memory references. To address this problem, this talk first introduces “Deep Compression”, which can compress deep neural networks by 10x-49x without loss of prediction accuracy. The talk then discusses EIE, the "Efficient Inference Engine", which works directly on the deep-compressed DNN model and accelerates inference by taking advantage of weight sparsity, activation sparsity, and weight sharing, making it 13x faster and 3000x more energy-efficient than a TitanX GPU.
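Two of the stages behind Deep Compression can be illustrated on a toy weight matrix: magnitude pruning, then weight sharing via scalar quantization. This is a hedged sketch only; the threshold, cluster count, and binning scheme are illustrative choices, not the paper's (which uses iterative pruning with retraining and k-means clustering).

```python
import numpy as np

# Toy sketch of pruning + weight sharing on a random 8x8 weight matrix.
rng = np.random.default_rng(0)
w = rng.normal(0, 1, size=(8, 8)).astype(np.float32)

# Stage 1: prune weights whose magnitude falls below a threshold.
threshold = 0.5
mask = np.abs(w) >= threshold
pruned = w * mask

# Stage 2: weight sharing -- quantize the surviving weights to a tiny
# codebook so each weight is stored as a short index into shared values.
survivors = pruned[mask]
k = 4  # 4 shared values -> 2-bit indices per surviving weight
edges = np.quantile(survivors, np.linspace(0, 1, k + 1))
idx = np.clip(np.searchsorted(edges, survivors, side="right") - 1, 0, k - 1)
codebook = np.array(
    [survivors[idx == i].mean() if np.any(idx == i) else 0.0 for i in range(k)]
)
shared = np.zeros_like(pruned)
shared[mask] = codebook[idx]

print("sparsity:", 1 - mask.mean())
print("distinct nonzero values:", len(np.unique(shared[mask])))
```

After both stages the matrix is mostly zeros, and the few survivors take only k distinct values — the structure EIE exploits in hardware.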

Bio: Song Han is a 5th-year PhD student with Prof. Bill Dally at Stanford University. His research interests are deep learning and computer architecture; currently he is improving the accuracy and efficiency of neural networks. He worked on Deep Compression, which can compress state-of-the-art CNNs by 10x-49x; he compressed SqueezeNet to only 470KB, which fits fully in on-chip SRAM. He then designed the EIE accelerator, an ASIC that works on the compressed model and is 13x faster and 3000x more energy-efficient than a TitanX GPU. His work has been covered by TheNextPlatform, TechEmergence, Embedded Vision and O’Reilly. His work on Deep Compression won the best paper award at ICLR’16. Before joining Stanford, Song Han graduated from the Institute of Microelectronics, Tsinghua University.

Johann Hauswald

Johann Hauswald - University of Michigan, advised by Profs. Jason Mars, Lingjia Tang, and Trevor Mudge

Talk title: Investigating Emerging Workloads and Their Implications on Future Datacenter Designs

Abstract: As Intelligent Personal Assistants (IPAs) such as Apple Siri, Google Now, Microsoft Cortana, and Amazon Echo continue to gain traction, web-service companies are now providing image, speech, and natural language processing web services as core applications in their datacenters. These emerging applications require machine learning and are known to be significantly more compute-intensive than traditional cloud-based web services, giving rise to a number of questions surrounding the design of server and datacenter architectures for handling this volume of computation.

Bio: Johann Hauswald is a Ph.D. student in Computer Science and Engineering at the University of Michigan, Ann Arbor, MI. His research focuses on system design for emerging cloud workloads.

Keqiang He

Keqiang He - University of Wisconsin-Madison, advised by Prof. Aditya Akella

Talk title: Improving Datacenter Network Performance via Intelligent Edge

Abstract: Low latency and high throughput are required in datacenter networks to support a diverse set of applications. But today’s datacenter networks suffer inflated TCP RTTs and reduced throughput due to congestion and imperfect traffic load balancing. In this talk, I will present how we can use the intelligent network edge to solve these issues. First, I will present work that uses Open vSwitch to perform virtualized congestion control enforcement for multi-tenant clouds. Second, I will present work that uses Open vSwitch and Generic Receive Offload (GRO) to achieve near-perfect traffic load balancing in Clos networks. These two works were published in SIGCOMM’15 and SIGCOMM’16, respectively.
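The load-balancing idea in the second work can be sketched at a high level: the edge slices a flow into fixed-size chunks ("flowcells" in the edge-based load-balancing literature) and spreads them across equal-cost paths, relying on GRO at the receiver to mask any reordering. The chunk size and function names below are assumptions for illustration, not the paper's code.

```python
# Illustrative sketch: assign each TCP segment of a flow to a path,
# switching paths once a fixed-size flowcell worth of bytes has been sent.
FLOWCELL_BYTES = 64 * 1024  # chunk size; an assumption for this sketch

def assign_paths(segment_sizes, num_paths):
    """Return a path id for each segment, rotating paths per flowcell."""
    paths, sent, cell = [], 0, 0
    for size in segment_sizes:
        if sent >= FLOWCELL_BYTES:           # current flowcell is full
            cell, sent = (cell + 1) % num_paths, 0
        paths.append(cell)
        sent += size
    return paths

segments = [1448] * 100                      # ~145 KB of MSS-sized segments
print(assign_paths(segments, num_paths=4)[:50])
```

Because whole flowcells stay on one path, reordering happens only at flowcell boundaries, which the receiver-side GRO logic can repair cheaply.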

Bio: Keqiang He is a final-year Ph.D. student in the Computer Science department of the University of Wisconsin-Madison. He received his B.Eng. and M.S. degrees from Xidian University and Tsinghua University. Keqiang’s Ph.D. thesis is on improving throughput, reducing latency, and improving the robustness of datacenter networks. During his Ph.D., he studied and proposed solutions for datacenter traffic load balancing and congestion control for multi-tenant datacenters, and investigated the control-plane latency issue in SDN switches. He has published several technical papers in top-tier conferences such as ACM SIGCOMM, IMC, and SOSR. Keqiang was a recipient of the UW-Madison Lawrence H. Landweber fellowship in Distributed Systems and the Tsinghua-Morgan Stanley scholarship.

Jason Lowe-Power

Jason Lowe-Power - University of Wisconsin-Madison, advised by Profs. Mark Hill and David Wood

Talk title: Programmable Accelerators

Abstract: Programming accelerators is challenging. Not only must programmers reason about parallelism, but unlike traditional CPU programming, they must also reason about data movement and transformation. Multicore and multiprocessor CPUs hide this movement and transformation through two mechanisms: virtual addressing and cache coherence. In this talk, I focus on one current high-performance accelerator--integrated general-purpose GPUs--as an example. I show that directly applying CPU techniques to high-performance accelerators like the GPU is impractical. GPUs can sustain many more memory requests per cycle, which causes significant performance degradation for both address translation (up to 10x slowdown) and cache coherence (up to 4x slowdown). I present two techniques to bring these vital CPU capabilities to high-performance accelerators.
First, I show that providing virtual address translation for integrated GPUs is feasible by leveraging the GPU architecture to filter most TLB accesses and using a high-bandwidth page table walker. Second, I show that providing fine-grained coherence for integrated GPUs is practical by exploiting the spatial locality of GPU workloads and using a coarse-grained coherence mechanism. With these two techniques, virtual addressing and cache coherence can be feasibly implemented on high-performance accelerators, which will ease programmer adoption of these emerging platforms.
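The TLB-filtering idea rests on a simple observation that can be shown in a few lines: a warp's many memory requests usually fall on few distinct pages, so coalescing by page lets one translation serve many lanes. The page size, warp shape, and function name below are illustrative assumptions, not the talk's implementation.

```python
# Sketch: coalesce a warp's 32 addresses by virtual page before the TLB,
# issuing one probe per distinct page rather than one per lane.
PAGE_SIZE = 4096

def tlb_lookups_needed(addresses):
    """Number of TLB probes after page-granularity coalescing."""
    return len({addr // PAGE_SIZE for addr in addresses})

# Unit-stride access: 32 lanes, 4-byte elements, all within one page.
warp = [0x10000 + 4 * lane for lane in range(32)]
print(tlb_lookups_needed(warp))       # 1 probe instead of 32

# Page-strided access touching a new page per lane defeats the filter.
scattered = [0x10000 + PAGE_SIZE * lane for lane in range(32)]
print(tlb_lookups_needed(scattered))  # 32 probes
```

Workloads with good spatial locality thus send the page table walker only a small fraction of the raw request stream, which is what makes a high-bandwidth walker sufficient.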

Bio: Jason Lowe-Power is a Ph.D. candidate at the University of Wisconsin-Madison in the Computer Sciences department advised by Mark Hill and David Wood. He received a B.Sc. in Computer Science from Georgia Institute of Technology in 2010 and an M.Sc. in Computer Science from UW-Madison in 2013. His research focuses on increasing the energy efficiency of computing systems, particularly on exposing energy-efficient accelerators to all programmers. His research also targets reducing the energy needed for analytic database operations used by companies like Amazon, Netflix, Google, Target, etc. to analyze and deeply understand their customers’ needs. He has completed internships at Advanced Micro Devices Research and Georgia Tech Research Institute. He was awarded the Wisconsin Distinguished Graduate Fellowship Cisco Computer Sciences Award in 2014 and 2015. Jason is scheduled to graduate in May 2016.

Prashant Nair

Prashant Nair - Georgia Institute of Technology, advised by Prof. Moin Qureshi

Talk title: Architectural Techniques to Enable Reliable and Scalable Memory Systems

Abstract: Computer systems are using an increasing number of cores and specialized units that process large amounts of data. To meet this demand, memory systems must grow proportionately to store and deliver this data. However, as memory technology scales, memory cells tend to become weak and unreliable. As a result, the benefits of higher capacity are offset by a reduction in memory reliability. This talk focuses on architectural techniques that enable scalable and resilient memories for systems that span cloud, client, server, and exascale computing. It presents simple and effective techniques for solving pressing problems such as row-hammering, sub-20nm scaling, and efficient Chipkill for conventional and stacked memories.
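As one concrete example of the row-hammer problem mentioned above, a probabilistic mitigation (in the spirit of PARA-style schemes from the literature) can be sketched in a few lines: on every row activation, refresh an adjacent row with small probability, so any aggressor row statistically protects its neighbors. The probability, seed, and function names are illustrative assumptions, not the talk's specific technique.

```python
import random

def activate(row, refresh, p=0.001, rng=random):
    """Open `row`; with probability p also refresh one adjacent row."""
    if rng.random() < p:
        victim = row + rng.choice([-1, 1])
        refresh(victim)

# Simulate an aggressor hammering row 7: neighbors 6 and 8 get refreshed
# roughly p * activations times, well before disturbance errors accumulate.
refreshed = []
rng = random.Random(42)
for _ in range(100_000):
    activate(7, refreshed.append, p=0.001, rng=rng)
print(len(refreshed))   # ~100 neighbor refreshes expected
```

The appeal of such schemes is that they need no per-row counters; the cost is a tiny, tunable amount of extra refresh traffic.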

Bio: Prashant Nair is a 6th-year Ph.D. candidate advised by Prof. Moin Qureshi and currently works on memory reliability. During his Ph.D., he has published 11 papers in top-tier conferences highlighting effective techniques that enable scalable memory systems. His other research interests include the Internet of Things and Quantum Computing, which he would like to explore in the future.

Brandon Reagen

Brandon Reagen - Harvard University, advised by Profs. David Brooks and Gu-Yeon Wei

Talk title: Architectural Support for Deep Learning at Harvard: Minerva, Fathom, Bayesian Optimization, and More

Abstract: In this talk I will give an overview of the major focus areas my lab mates and I are working on at Harvard. Minerva is a framework for designing and optimizing DNN accelerators across the compute stack--from algorithms to circuits. By applying aggressive optimizations (quantization, pruning, and fault mitigation), a Minerva-optimized accelerator dissipates 8x less power than a competing design without compromising model accuracy. Minerva appeared in ISCA 2016. Fathom is a benchmark suite for Deep Learning. It consists of 8 seminal Deep Learning models (everything from AlexNet to the more radical DeepQNet), all implemented in TensorFlow. It appeared at IISWC 2016 and the code is available. Finally, I'll talk about a powerful optimization technique--Bayesian optimization. We have applied Bayesian optimization to efficiently explore unruly DNN hyperparameter and accelerator design spaces. We find it significantly outperforms traditional methods (e.g., genetic algorithms) and identifies Pareto-optimal design points with only dozens of objective-function evaluations. The results that will be presented are currently under submission for publication. If time remains, some details of our chip prototype may also be shown.
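The Bayesian-optimization loop mentioned above can be sketched on a 1-D toy problem: fit a Gaussian-process surrogate to the evaluations so far, then spend each expensive evaluation on the point that maximizes an acquisition function (upper confidence bound here). Everything below -- the objective, kernel length scale, and UCB weight -- is an illustrative assumption, not the group's actual setup.

```python
import numpy as np

def objective(x):                      # stand-in for an expensive evaluation
    return -(x - 0.3) ** 2             # true maximum at x = 0.3

def gp_posterior(X, y, Xs, length=0.2, noise=1e-6):
    """Posterior mean/std of a zero-mean GP with an RBF kernel."""
    k = lambda a, b: np.exp(-0.5 * ((a[:, None] - b[None, :]) / length) ** 2)
    K = k(X, X) + noise * np.eye(len(X))
    Ks = k(X, Xs)
    sol = np.linalg.solve(K, np.column_stack([y, Ks]))
    mu = Ks.T @ sol[:, 0]
    var = 1.0 - np.einsum('ij,ji->i', Ks.T, sol[:, 1:])
    return mu, np.sqrt(np.maximum(var, 0))

grid = np.linspace(0, 1, 201)
X = np.array([0.0, 1.0])               # two seed evaluations
y = objective(X)
for _ in range(10):                    # each step spends one evaluation
    mu, sd = gp_posterior(X, y, grid)
    nxt = grid[np.argmax(mu + 2.0 * sd)]   # UCB acquisition
    X, y = np.append(X, nxt), np.append(y, objective(nxt))

print("best x found:", X[np.argmax(y)])
```

With only a dozen evaluations the loop homes in near the true optimum, which is the property that makes the approach attractive for accelerator design spaces where each "evaluation" is a full synthesis or training run.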

Bio: Brandon Reagen is a 5th-year PhD candidate co-advised by David Brooks and Gu-Yeon Wei at Harvard. During the first half of his PhD he focused on hardware accelerators: their design (quantifying acceleration), modeling (Aladdin), benchmarking (MachSuite), and fabrication (RoboBee SoC). For the past 2 years he has been working on Machine Learning, considering how to co-design hardware around the inherent, unique properties of DNNs to push the achievable efficiency limits and enable wide-scale DNN deployment on even the most constrained devices. His Machine Learning projects include Minerva, Fathom, and Bayesian optimization.

Dario Gil
Dario Gil - Vice President of Science and Solutions, IBM Research

Talk title: The Cognitive Era and the New Frontiers of Information Technology

Abstract:  Ever since humans began trading, and consequently, adding and subtracting numbers, they perceived a need for mechanical assistance to help them keep track of their transactions. Thus, the earliest computers were born. In fundamental ways, and although the mechanisms of how we perform calculations have profoundly changed over the millennia, we have been building calculators ever since. But something fundamental is changing now. For the first time, we are capable of building learning systems that can be deployed at scale. The web and the Internet of Things are providing us with vast amounts of digitized knowledge, knowledge that is being used to train machine-learning algorithms. The power of these algorithms is their ability to learn from data, rather than follow only explicitly programmed instructions. And thanks to our powerful computers, the algorithms now operate at the scale and speed required to tackle really complex problems. Robotics, self-driving cars, speech and image recognition, medical diagnosis; the applications will reach as far as there are patterns to be discovered. The future of knowledge and expertise is a collaborative relationship between humans and computers that we call Cognitive Computing. Cognitive systems can make sense of the 90 percent of the world’s data that computer scientists call “unstructured.” This enables them to keep pace with the volume, complexity, and unpredictability of information and systems in the modern world. None of this involves either sentience or autonomy on the part of machines. Rather, it consists of augmenting the human ability to understand – and act upon – the complex systems of our society. This augmented intelligence is the necessary next step in our ability to harness technology in the pursuit of knowledge, to further our expertise, and to improve the human condition. 
That is why it represents not just a new technology, but the dawn of a new era of technology, business, and society: The Cognitive Era.

In parallel with the fundamental shift of traditional computing to Cognitive Computing, an intensely promising and radically new way of computing is also emerging: Quantum Computing. While Cognitive Computing systems are built on the same types of silicon transistors that have underpinned traditional computing for half a century, Quantum Computing harnesses the unique and non-intuitive properties of quantum devices to compute in entirely new ways that will allow us to solve problems that would be otherwise intractable. While quantum technology is still nascent, it is now in an exciting and formative stage that is simultaneously pushing the boundaries of both physics and information technology.

Together, Cognitive Computing and Quantum Computing represent new frontiers of information technology and promise to usher in an era of unprecedented advances in the power of technology to tackle the world's toughest problems.

Bio: Dr. Gil is a leading technologist and senior executive at IBM. As Vice President of Science and Solutions of IBM Research, Dr. Gil directs a global organization of ~1,500 researchers across 11 laboratories. He has direct responsibility for IBM’s science agenda, with a broad portfolio of activities spanning the physical sciences, the mathematical sciences, healthcare, and the life sciences. Dr. Gil is also responsible for IBM’s cognitive solutions research agenda, which aims to create scientific and technological breakthroughs to differentiate IBM’s solutions businesses and serves as an incubator for future cognitive industry solutions for IBM and its clients. Prior to his current position, Dr. Gil was the Director of Symbiotic Cognitive Systems, where he led the creation of cognitive environments, highly interactive physical spaces designed to improve the quality of decision-making through always-on ambient intelligence. During his tenure he was responsible for the design and creation of three pioneering laboratories and experiential centers: the Cognitive Environments Laboratory, the IBM Research THINK Lab and the IBM Watson Experience Center. Dr. Gil is a passionate advocate of collaborative research business models and is the creator and Founding Director of two research consortia: the IBM Research Frontiers Institute and the Smarter Energy Research Institute. An expert in the field of nanofabrication, he led the team that built the world's first microprocessor with immersion lithography in 2004. Dr. Gil is a frequent speaker at business events, conferences (including TED), universities, research institutions and foundations. His research results have appeared in over 20 international journals and conferences and he is the author of numerous patents. Dr. Gil is a member of the Future Trends Forum, the Industrial Advisory Group of the Institute of Photonic Sciences, and an elected member of the IBM Academy of Technology. He received his Ph.D. in Electrical Engineering and Computer Science from the Massachusetts Institute of Technology.