IBM T.J. Watson NVM Workshop 2010 -- Workshop Program

Talks and the panel will be in Yorktown room 20-001

Introduction to IBM Research, Chung-Sheng Li, IBM Research Director of Commercial Systems and Area Strategy for System Software
9.45-10.05 Overview of NVM Research at IBM, John Karidis, IBM Distinguished Engineer
10.10-10.35 Yoongu Kim, Carnegie Mellon University, "Exploiting differences in threads' memory access behavior to achieve both system throughput and fairness"
10.35-11.00 Vidyabhushan Mohan, University of Virginia, "Tools for Architecture-Level Design and Analysis of Flash Memory Systems"
11.00-11.25 Nak Hee Seong, Georgia Institute of Technology, "Security Refresh: Prevent Malicious Wear-out and Increase Durability for Phase-Change Memory with Dynamically Randomized Address Mapping"
11.25-11.50 Arya Mazumdar, University of Maryland, College Park, "Rank modulation codes for error correction in flash memories"
13.00-14.00 Meet Watson, our local Jeopardy! champion
14.00-14.25 Eitan Yaakobi, University of California, San Diego, "Efficient WOM-Codes Constructions"
14.25-14.50 Dimitris Papailiopoulos, University of Southern California, "Repair of Storage Codes through Interference Alignment"
14.50-15.15 Joel Coburn, University of California, San Diego, "NV-Heaps: Making Persistent Objects Fast and Safe with Next-Generation, Non-Volatile Memories"
15.15-15.40 Anirudh Badam, Princeton University, "SSDAlloc: Hybrid RAM/SSD Memory Management Made Easy"
15.40-16.05 Yue Wang, Texas A&M University, "Rank Modulation with Multiplicity"
16.05-16.20 Break
Panel Discussion

Slides link (requires IBM login)

Session 1

10.10-10.35 Nak Hee Seong, Georgia Institute of Technology

Title: Security Refresh: Prevent Malicious Wear-out and Increase Durability for Phase-Change Memory with Dynamically Randomized Address Mapping


Phase change memory (PCM) is an emerging memory technology for future computing systems. Compared to other non-volatile memory alternatives, PCM is closer to production, has a faster read latency, and offers potentially higher storage density. The main roadblock precluding PCM from being used, in particular in the main memory hierarchy, is its limited write endurance. To address this issue, recent studies have proposed either reducing PCM's write frequency or using wear-leveling to distribute writes evenly. Although these techniques can extend the lifetime of PCM, most of them will not prevent deliberately crafted malicious code from wearing it out quickly. Furthermore, none of the prior techniques considered the circumstances of a compromised OS and its security implications for the overall PCM design. A compromised OS allows adversaries to manipulate processes and exploit side channels to accelerate wear-out.

In this paper, we argue that a PCM design not only has to consider normal wear-out under normal application behavior; most importantly, it must take the worst-case scenario into account, with the presence of malicious exploits and a compromised OS, to address the durability and security issues simultaneously. We propose a novel, low-cost hardware mechanism called Security Refresh that avoids information leaks by constantly migrating data among physical locations inside the PCM, obfuscating the actual data placement from users and system software. It uses a dynamically randomized address mapping scheme that swaps data using random keys whenever a refresh is due. The hardware overhead is tiny, requiring no tables. The best lifetime we can achieve under the worst-case malicious attack is more than six years, and our scheme incurs around 1% performance degradation for normal program operations.
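The flavor of key-based randomized remapping can be sketched in a few lines. This is a simplified illustration only (one region, a single XOR key, stop-the-world migration), not the full Security Refresh design, which remaps incrementally; the region size and class names are invented for the example:

```python
import secrets

REGION_SIZE = 256  # number of blocks in one remapped region (illustrative)

class SecurityRefreshRegion:
    """Toy model of XOR-key address randomization for a PCM region.

    Logical block L lives at physical slot L XOR key. Re-keying moves
    every block to a fresh random slot, so an attacker who hammers one
    logical address spreads its writes across physical cells over time.
    """

    def __init__(self):
        self.key = secrets.randbelow(REGION_SIZE)
        self.cells = [0] * REGION_SIZE   # physical storage
        self.wear = [0] * REGION_SIZE    # per-cell write counts

    def _slot(self, logical):
        return logical ^ self.key

    def write(self, logical, value):
        slot = self._slot(logical)
        self.cells[slot] = value
        self.wear[slot] += 1

    def read(self, logical):
        return self.cells[self._slot(logical)]

    def refresh(self):
        """Pick a new random key and migrate all blocks to their new slots."""
        new_key = secrets.randbelow(REGION_SIZE)
        delta = self.key ^ new_key       # old slot XOR delta = new slot
        moved = [0] * REGION_SIZE
        for slot in range(REGION_SIZE):
            moved[slot ^ delta] = self.cells[slot]
        self.cells = moved
        self.key = new_key
```

Under this model, a malicious loop writing one logical block concentrates wear on a single cell only until the next refresh; the actual scheme amortizes the migration over individual writes to avoid the bulk data movement shown here.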

10.35-11.00 Vidyabhushan Mohan, University of Virginia

Title: Tools for Architecture-Level Design and Analysis of Flash Memory Systems


Flash memory has gained tremendous popularity in recent years. A variant of flash, called NAND flash, is widely used in consumer electronics products such as cell phones and music players, while NAND-flash-based Solid-State Disks (SSDs) are increasingly displacing hard disk drives as the primary storage device in laptops, desktops, and even data centers. Computer architects have recently begun exploring the use of NAND flash, from SSD organizations to disk caches and new flash-based server architectures. In order to study this design space effectively, architects require simulation tools that can provide detailed insights into the behavior of flash memory. In this talk, I will present two such tools that I have developed, FlashPower and FENCE, and outline how they will be used to build a full-fledged NAND flash memory simulation environment.

FlashPower is a microarchitecture-level modeling tool that provides a detailed analytical power model for Single-Level Cell (SLC) NAND flash. FlashPower is built on the CACTI memory modeling tool and estimates the power consumed by a NAND flash chip during its various operating modes. Flash EnduraNCE (FENCE) models the endurance characteristics of NAND flash and captures the impact of stress and recovery on NAND flash memory cells. Using FENCE with a set of real enterprise workloads and a realistic SSD model, we show that the recovery process allows for orders of magnitude higher endurance than the figures given in datasheets. I will also present an overview of how FlashPower and FENCE are being used to facilitate a comprehensive architecture-level analysis of the performance, power, and reliability of NAND flash memory.

11.00-11.25 Yoongu Kim, Carnegie Mellon University

Title: Exploiting differences in threads' memory access behavior to achieve both system throughput and fairness


In a modern chip-multiprocessor system, memory is a shared resource among multiple concurrently executing threads. The memory scheduling algorithm should resolve memory contention by arbitrating memory access in such a way that competing threads progress at a relatively fast and even pace, resulting in high system throughput and fairness. Previously proposed memory scheduling algorithms are predominantly optimized for only one of these objectives and do not provide both high system throughput and high fairness simultaneously.

We present a new memory scheduling algorithm that decouples the two objectives and addresses them separately in order to achieve the best of both worlds. The main idea is to divide threads into two separate clusters and employ a different memory request scheduling policy for each cluster. We dynamically group threads with similar memory access behavior into either the latency-sensitive (memory-non-intensive) cluster or the bandwidth-sensitive (memory-intensive) cluster. We propose to 1) prioritize the latency-sensitive cluster over the bandwidth-sensitive cluster to improve system throughput; 2) introduce a ``niceness'' metric that captures a thread's propensity to interfere with other threads; and 3) use niceness to periodically shuffle the priority order of the threads in the bandwidth-sensitive cluster, providing fair access to each thread in a way that reduces inter-thread interference. On the one hand, prioritizing memory-non-intensive threads significantly improves system throughput without degrading fairness, because such ``light'' threads use only a small fraction of the total available memory bandwidth. On the other hand, shuffling the priority order of memory-intensive threads improves system fairness because it ensures that no thread is disproportionately slowed down or starved.
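The clustering-and-shuffling idea can be sketched as follows. The metric name (MPKI), the threshold value, and the plain random shuffle are illustrative assumptions for this sketch; the actual algorithm classifies threads from measured memory intensity each interval and shuffles according to niceness rather than uniformly at random:

```python
import random

def cluster_threads(mpki, threshold=1.0):
    """Split threads into a latency-sensitive (memory-non-intensive) and a
    bandwidth-sensitive (memory-intensive) cluster by misses per
    kilo-instruction (MPKI). The threshold is illustrative."""
    latency = {t for t, m in mpki.items() if m <= threshold}
    bandwidth = set(mpki) - latency
    return latency, bandwidth

def priority_order(latency, bandwidth, rng=random):
    """Latency-sensitive threads always rank above bandwidth-sensitive ones;
    within the bandwidth cluster the order is reshuffled each scheduling
    quantum so no memory-intensive thread is persistently deprioritized."""
    heavy = sorted(bandwidth)
    rng.shuffle(heavy)
    return sorted(latency) + heavy

# Example: threads A and C are "light", B and D are "heavy".
order = priority_order(*cluster_threads({'A': 0.2, 'B': 5.0, 'C': 0.4, 'D': 12.0}))
```

Here the light threads A and C always come first, while B and D trade places across quanta, which is the mechanism behind the fairness claim above.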

11.25-11.50 Arya Mazumdar, University of Maryland, College Park

Title: Rank modulation codes for error correction in flash memories


Codes for rank modulation have been recently proposed as a means of protecting flash memory devices from errors. We provide a brief overview of the rank modulation scheme. We study basic coding theoretic problems for such codes, representing them as subsets of the set of permutations of n elements equipped with the Kendall tau distance. We derive several lower and upper bounds on the size of codes. These bounds enable us to establish the exact scaling of the size of optimal codes for large values of n. We also show the existence of codes whose size is within a constant factor of the sphere packing bound for any fixed number of errors. Construction of 'good' rank modulation codes of arbitrary lengths and given error correction capability remains a challenge.
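The Kendall tau distance used above counts the pairs ordered differently in two permutations, which equals the minimum number of adjacent transpositions needed to turn one into the other. A minimal sketch:

```python
from itertools import combinations

def kendall_tau_distance(p, q):
    """Kendall tau distance between two permutations of the same elements:
    the number of element pairs on whose relative order p and q disagree."""
    pos = {v: i for i, v in enumerate(q)}
    r = [pos[v] for v in p]  # p re-expressed in q's coordinate system
    # Each inversion in r is one pair ordered differently in p and q.
    return sum(1 for i, j in combinations(range(len(r)), 2) if r[i] > r[j])
```

For example, [1, 3, 2] is one adjacent swap away from [1, 2, 3], so their distance is 1, while the reversal [3, 2, 1] is at distance 3. A rank modulation code correcting e errors is then a set of permutations at pairwise distance greater than 2e.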

Session 2

14.00-14.25 Eitan Yaakobi, University of California, San Diego

Title: Efficient WOM-Codes Constructions

A Write Once Memory (WOM) is a storage device consisting of cells that can take on q possible linearly-ordered values, with the added constraint that rewrites can only increase a cell's value. In the binary case, each cell can change from level zero to level one only once. Examples of WOMs include punch cards, optical disks, and, more recently, flash memories. A length-n, t-write WOM-code is a coding scheme that allows t messages to be stored in n cells. If Mi messages can be stored on the i-th write, then the rate of the i-th write is the ratio of the number of bits written to the WOM to the total number of cells used, i.e., log2(Mi)/n. The rate of the WOM-code is the sum of the individual rates over all writes.
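To make these definitions concrete, here is the classic Rivest-Shamir binary two-write WOM-code (a textbook example, not one of the constructions in this talk): it stores a 2-bit message twice in 3 cells, for a sum rate of 2/3 + 2/3 = 4/3. A sketch:

```python
# First-write codewords have weight <= 1; second-write codewords are their
# bitwise complements (weight >= 2), so cells only ever go from 0 to 1.
FIRST = {0b00: (0, 0, 0), 0b01: (0, 0, 1), 0b10: (0, 1, 0), 0b11: (1, 0, 0)}
SECOND = {m: tuple(1 - b for b in c) for m, c in FIRST.items()}

def decode(cells):
    """Cell weight tells the decoder which write the codeword came from."""
    table = FIRST if sum(cells) <= 1 else SECOND
    return next(m for m, c in table.items() if c == cells)

def write(cells, msg):
    """Store a 2-bit message; cell values may only increase."""
    if decode(cells) == msg:
        return cells  # rewriting the same message: leave the cells alone
    new = FIRST[msg] if cells == (0, 0, 0) else SECOND[msg]
    assert all(n >= c for n, c in zip(new, cells)), "cells can only go 0 -> 1"
    return new
```

Writing 01 and then 10 produces cell states (0, 0, 1) and then (1, 0, 1), each decoding to the message just written, even though no cell was ever reset.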

In this work, we start with a construction of binary two-write WOM-codes. The construction is generalized for two-write WOM-codes with q levels per cell. Then, we show how to use such a code with ternary cells in order to construct three and four-write binary WOM-codes. This construction is used recursively in order to generate a family of t-write binary WOM-codes for all t. Another generalized construction is given which provides us with more ways to construct families of WOM-codes. Finally, we give a comparison between our codes and the best known WOM-codes in order to show that the WOM-codes constructed here outperform all previously known WOM-codes for 3 < t < 11.

14.25-14.50 Dimitris Papailiopoulos, University of Southern California

Title: Repair of Storage Codes through Interference Alignment


Storage systems often introduce redundancy to increase reliability. When coding is used, the repair problem arises: if a storage unit of the system fails, we need to exactly restore the encoded information at a new node. To restore the lost information, the new node has to download a combination of the remaining data and process it appropriately. The size of this download defines the repair bandwidth. To minimize this bandwidth, both the storage code and the repair process have to be carefully selected. What makes the design challenging is the presence of undesired data in the downloaded repair information that interferes with the desired contents. Interestingly, we find surprising connections with interference alignment techniques and use them to achieve information-theoretic minimum repair bandwidth bounds.

14.50-15.15 Joel Coburn, University of California, San Diego

Title: NV-Heaps: Making Persistent Objects Fast and Safe with Next-Generation, Non-Volatile Memories


Persistent, user-defined objects present an attractive abstraction for working with non-volatile program state. However, the slow speed of persistent storage (i.e., disk) has limited their performance. Fast, byte-addressable, non-volatile technologies, such as phase change memory, will remove this constraint and allow programmers to build high-performance, persistent structures in non-volatile storage that is almost as fast as DRAM. However, existing persistent object systems are ill-suited to these memories because the assumption that storage is slow drives many aspects of their design. Creating structures that are flexible and robust in the face of application and system failure, while minimizing software overheads, is challenging. The system must be lightweight enough to expose the performance of the underlying memories, but it also must avoid familiar bugs such as dangling pointers, multiple free()s, and locking errors in addition to unique types of hard-to-find pointer safety bugs that only arise with persistent objects. These bugs are especially dangerous since any corruption they cause will be permanent.

We have implemented a lightweight, high-performance persistent object system called NV-heaps that prevents these errors and provides a model for persistence that is easy to use and reason about. We implement search trees, hash tables, sparse graphs, and arrays using NV-heaps, BerkeleyDB, and Stasis. Our results show that NV-heap performance scales with thread count and that data structures implemented using NV-heaps out-perform BerkeleyDB and Stasis implementations by 32X and 244X, respectively, when running on the same memory technology. We also quantify the cost of enforcing the safety guarantees that NV-heaps provides and measure the costs for NV-heap primitive operations.

15.15-15.40 Anirudh Badam, Princeton University

Title: SSDAlloc: Hybrid RAM/SSD Memory Management Made Easy


Non-volatile memory in the form of flash has challenged system designers at many points during the evolution of virtual memory and storage hierarchy technology. While alternative memory technologies have been championed for more than a decade, their attractiveness has increased as the gap between processor and disk speeds has widened and as their costs have dropped, spurring research activity in this area. Transparent approaches to using non-volatile memory include integrating it into virtual memory or into the storage hierarchy, at the cost of a performance hit from the paging and/or storage subsystems. Others have proposed redesigning applications to use flash-aware data structures for performance, which requires a considerable amount of programmer effort and assumes expertise with non-volatile memory. Our goal in this paper is to provide, with as few modifications as possible, a nearly transparent path for applications to use flash memory, offering ease of migration and usage, high performance, and, when desired, simple ACID transactions.

We introduce SSDAlloc, a hybrid main memory management system that allows developers to treat a solid-state disk (SSD) as an extension of the RAM in a system. SSDAlloc moves the SSD up in the memory hierarchy, making it usable as a larger, slower form of RAM rather than just a cache for the hard drive. Using SSDAlloc, applications can nearly transparently extend their memory footprints to hundreds of gigabytes and beyond without restructuring, well beyond the RAM capacities of most servers. Additionally, application performance is typically an order of magnitude higher than when using the SSD as swap space or a disk cache, and 200 times higher than when using traditional hard drives. Transactional applications can use SSDAlloc's framework for full ACID transaction support at low overhead.

15.40-16.05 Yue Wang, Texas A&M University

Title: Rank Modulation with Multiplicity


Rank modulation is a scheme that uses the relative order of cell levels to represent data. Its applications include flash memories, phase-change memories, etc. An extension of rank modulation is studied in this paper, where multiple cells can have the same rank. We focus on the rewriting of data based on this new scheme, and study its basic properties.