Science of Runtime Bloat       

Nick M. Mitchell

The SORBET project investigates problems caused by pervasive runtime bloat in real, framework-intensive Java applications.

Overview

A program is bloated when its execution time and memory consumption are high compared to what the program actually accomplishes. Based on our 8 years of experience solving performance problems, we have found that runtime bloat in real Java applications is pervasive. The following are typical scenarios:
  • 1 GB of memory consumed to support a few hundred users.
  • 50 KB of session state consumed per user.
  • 100,000 temporary objects created per web hit.

Bloat is a systemic problem. It results from a software engineering culture that encourages abstraction and layering with the goal of rapid application development. Java programs today are assembled from many frameworks and libraries, written by different people at different times and places. Because of the huge number of interfaces that must be digested, and the opacity of the implementations hidden in libraries, developers have little hope of understanding the performance consequences of their design choices.

    We are exploring the following questions:
  • How can bloat be characterized and quantified? What are good application-independent metrics?
  • What are the common bloat patterns?
  • What are good tools for helping users identify bloat patterns in their code?
  • Can we improve compiler optimization to remove bloat?
  • Can we optimize the Java collection class library for storage?

    Characterizing Bloat

    The first step is to understand the nature and magnitude of the bloat problem. We have studied traces and memory snapshots from many real applications to capture the common anti-patterns of bloat. These anti-patterns are useful for programmers, tool designers, and compiler writers. The ultimate goal is to be able to discover these anti-patterns automatically and to optimize them away.
    (See ECOOP-2006.)
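
    As a concrete picture of one kind of bloat, here is a hypothetical Java illustration of redundant data conversion: the same value is formatted to text and parsed back each time it crosses a layer boundary. It is not taken from the cited paper. The class name, method names, and the three-layer setup are invented for the example, and the counter exists only to make the wasted conversions visible.

```java
// Hypothetical illustration: an amount crosses three "layers" that exchange
// values as text, so each value is formatted and parsed repeatedly.
class TransformationBloat {
    // Conversion counter, for illustration only (not thread-safe).
    static int conversions = 0;

    // Bloated path: every layer boundary formats the number to a String and parses it back.
    static long bloatedTotal(long[] amounts) {
        long total = 0;
        for (long a : amounts) {
            String toLayer2 = Long.toString(a);        // layer 1 -> text
            long inLayer2 = Long.parseLong(toLayer2);  // layer 2 parses
            String toLayer3 = Long.toString(inLayer2); // layer 2 -> text again
            total += Long.parseLong(toLayer3);         // layer 3 parses again
            conversions += 4;
        }
        return total;
    }

    // Direct path: the primitive value flows through unchanged.
    static long directTotal(long[] amounts) {
        long total = 0;
        for (long a : amounts) {
            total += a;
        }
        return total;
    }

    public static void main(String[] args) {
        long[] amounts = {10, 20, 30};
        System.out.println(bloatedTotal(amounts) + " " + directTotal(amounts) + " " + conversions);
    }
}
```

Both methods compute the same total. The bloated one pays four conversions and at least two temporary String objects per value, yet does no additional useful work.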

    We have also defined application-neutral metrics to quantify both performance and memory bloat. Performance bloat metrics are based on the number and kinds of data transformations in an execution. For memory bloat, we have defined the notion of a "health signature", which summarizes memory usage based on purpose. (See OOPSLA-2007.)
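
    The following is a simplified sketch of summarizing memory by purpose, in the spirit of a health signature. Each object's bytes are split into actual data, object headers, references, and alignment padding, and the sketch reports what fraction of the total is actual data. The categories, the class name `HealthSummary`, and the layout constants (12-byte header, 4-byte compressed references, 8-byte alignment, typical of a 64-bit HotSpot JVM) are all assumptions, not the published definition.

```java
import java.util.EnumMap;
import java.util.Map;

// Simplified memory-by-purpose summary. Layout constants are ASSUMPTIONS
// (typical 64-bit HotSpot with compressed references); real JVMs vary.
class HealthSummary {
    enum Purpose { DATA, HEADER, POINTER, PADDING }

    static final int HEADER_BYTES = 12; // assumed object header size
    static final int REF_BYTES = 4;     // assumed compressed reference size
    static final int ALIGN = 8;         // assumed object alignment

    private final Map<Purpose, Long> bytes = new EnumMap<>(Purpose.class);

    HealthSummary() {
        for (Purpose p : Purpose.values()) {
            bytes.put(p, 0L);
        }
    }

    // Record one object with the given primitive-field bytes and reference-field count.
    void addObject(int dataBytes, int refFields) {
        long raw = HEADER_BYTES + dataBytes + (long) refFields * REF_BYTES;
        long padded = (raw + ALIGN - 1) / ALIGN * ALIGN; // round up to the alignment
        bytes.merge(Purpose.HEADER, (long) HEADER_BYTES, Long::sum);
        bytes.merge(Purpose.DATA, (long) dataBytes, Long::sum);
        bytes.merge(Purpose.POINTER, (long) refFields * REF_BYTES, Long::sum);
        bytes.merge(Purpose.PADDING, padded - raw, Long::sum);
    }

    long get(Purpose p) {
        return bytes.get(p);
    }

    long total() {
        long t = 0;
        for (long b : bytes.values()) {
            t += b;
        }
        return t;
    }

    // Fraction of all bytes that hold actual data; the rest is overhead.
    double dataFraction() {
        long t = total();
        return t == 0 ? 0.0 : (double) get(Purpose.DATA) / t;
    }

    public static void main(String[] args) {
        HealthSummary h = new HealthSummary();
        h.addObject(4, 0); // a boxed Integer: one 4-byte int, no references
        h.addObject(0, 2); // a hypothetical holder object with two reference fields
        System.out.println("total bytes = " + h.total() + ", data fraction = " + h.dataFraction());
    }
}
```

Under these assumptions, a boxed `Integer` plus a two-reference holder occupies 40 bytes. Only 4 of those bytes are the integer itself, so the data fraction is 0.1.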

    Dynamic Memory Analysis

    Java heaps have been getting bigger over time, and it is not unusual to see heaps with tens of millions of long-lived objects. Common long-lived data structures include various caches, pools, and user sessions. Programmers typically get into trouble sizing these structures, understanding their per-entry costs, and managing their lifetimes. Inefficient and erroneous designs lead to memory leaks and poor scalability.
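
    One generic way to keep a cache's size and lifetime under control is to cap its entry count, which bounds its footprint at the cap times the per-entry cost. The sketch below uses the standard `LinkedHashMap` access-order constructor and its `removeEldestEntry` hook. The class name `BoundedCache` is hypothetical, and this is a common Java idiom rather than a technique from this project.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// A size-bounded LRU cache: capping the entry count bounds its memory
// footprint instead of letting it grow until the heap fills.
class BoundedCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    BoundedCache(int maxEntries) {
        // accessOrder = true: iteration goes from least to most recently accessed.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    // Invoked by put/putAll after a new entry is inserted;
    // returning true evicts the least recently used entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        BoundedCache<String, Integer> cache = new BoundedCache<>(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.get("a");    // "a" becomes the most recently used entry
        cache.put("c", 3); // over budget: evicts "b", the least recently used
        System.out.println(cache.keySet()); // prints [a, c]
    }
}
```

A cap bounds memory only if the per-entry cost is known. At roughly 50 KB per entry, for example, a 10,000-entry cap corresponds to about 500 MB.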

    We have developed technology for summarizing Java heap snapshots to help the programmer understand the consequences of various design decisions. First, we tease apart the heap into separate domains of functionality, which we call data structures. Second, for each data structure, we recover a UML data model and annotate it with memory costs. Third, for each entity in the data model, we show how it is implemented using a type graph.
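
    A minimal sketch of the last step only: collapsing an object graph into a type graph. Each type becomes a node annotated with its instance count and total shallow bytes, and references are aggregated into type-to-type edges. The `HeapObject` snapshot model, the type names, and the byte sizes are hypothetical. Partitioning the heap into data structures and recovering the UML data model are not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Collapses a (hypothetical) heap snapshot into a type graph.
class TypeGraph {
    // Hypothetical snapshot model: a type name, a shallow size, and outgoing references.
    static final class HeapObject {
        final String type;
        final long shallowBytes;
        final List<HeapObject> refs = new ArrayList<>();

        HeapObject(String type, long shallowBytes) {
            this.type = type;
            this.shallowBytes = shallowBytes;
        }
    }

    // Per-type totals for one node of the type graph.
    static final class TypeNode {
        long instances;
        long bytes;
    }

    final Map<String, TypeNode> nodes = new TreeMap<>();
    final Map<String, Long> edges = new TreeMap<>(); // "FromType->ToType" -> reference count

    // One node per type, one edge per (source type, target type) pair.
    static TypeGraph summarize(List<HeapObject> heap) {
        TypeGraph g = new TypeGraph();
        for (HeapObject o : heap) {
            TypeNode n = g.nodes.computeIfAbsent(o.type, t -> new TypeNode());
            n.instances++;
            n.bytes += o.shallowBytes;
            for (HeapObject target : o.refs) {
                g.edges.merge(o.type + "->" + target.type, 1L, Long::sum);
            }
        }
        return g;
    }

    public static void main(String[] args) {
        // Hypothetical snapshot: a map, its bucket array, and three entries (illustrative sizes).
        HeapObject map = new HeapObject("HashMap", 48);
        HeapObject table = new HeapObject("HashMap$Node[]", 80);
        map.refs.add(table);
        List<HeapObject> heap = new ArrayList<>();
        heap.add(map);
        heap.add(table);
        for (int i = 0; i < 3; i++) {
            HeapObject node = new HeapObject("HashMap$Node", 32);
            table.refs.add(node);
            heap.add(node);
        }
        TypeGraph g = summarize(heap);
        for (Map.Entry<String, TypeNode> e : g.nodes.entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue().instances
                    + " objects, " + e.getValue().bytes + " bytes");
        }
        System.out.println(g.edges);
    }
}
```

Collapsing millions of objects into a handful of annotated type nodes is what makes this kind of summary readable at heap scale.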

    Understanding Temporaries

    Creating too many temporary objects causes performance bloat. Even with state-of-the-art garbage collection, programs that create many temporaries are problematic: excessive temporaries lead to too many garbage collections, high object initialization costs, and memory bandwidth problems. We have developed techniques that combine static and dynamic analysis (blended analysis) for understanding temporary object creation. (See blended analysis.)
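
    The analysis itself is not sketched here. As a hypothetical illustration of code that produces excess temporaries, the first method below builds a string with repeated `+=`, allocating a new `String` on every iteration. The second method reuses a single `StringBuilder`. The class and method names are invented for the example.

```java
import java.util.List;

class Temporaries {
    // Temporary-heavy: each += allocates a new String (plus intermediate
    // objects, depending on the compiler), all garbage by the next iteration.
    static String joinWithConcat(List<String> parts) {
        String result = "";
        for (int i = 0; i < parts.size(); i++) {
            if (i > 0) {
                result += ",";
            }
            result += parts.get(i);
        }
        return result;
    }

    // One growable buffer reused across the loop; a single final String is produced.
    static String joinWithBuilder(List<String> parts) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < parts.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(parts.get(i));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> parts = List.of("a", "b", "c");
        System.out.println(joinWithConcat(parts) + " " + joinWithBuilder(parts));
    }
}
```

Both methods return the same string. In everyday code, the standard library's `String.join(",", parts)` does the same job as the builder version.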

    Bloat Awareness Training

    We are giving tutorials on memory-aware Java programming at various conferences: ICSE 2008 and 2009, and OOPSLA 2008.