Science of Runtime Bloat
The SORBET project investigates problems caused by pervasive runtime bloat in real, framework-intensive Java applications.
Overview
A program is bloated when its execution time and memory consumption are high compared to what the program actually accomplishes. Based on our 8 years of experience solving performance problems, we have found that runtime bloat in real Java applications is pervasive. The following are typical scenarios:
Bloat is a systemic problem, resulting from a software engineering culture that encourages abstraction and layering, with a goal of rapid application development. Java programs today are assembled from many frameworks and libraries, written by different people, at different times and places. Because of the huge number of interfaces that must be digested, and the opacity of the implementations hidden in libraries, developers have little hope of understanding the performance consequences of their design choices.
We are exploring the following questions:
Characterizing Bloat
The first step is to understand the nature and magnitude of the bloat problem. We have studied traces and memory snapshots from many real applications to capture the common anti-patterns of bloat. These anti-patterns are useful for programmers, tool designers, and compiler writers. The ultimate goal is to be able to discover these anti-patterns automatically and to optimize them away.
We have also defined application-neutral metrics to quantify both performance and memory bloat. Performance bloat metrics are based on the number and kinds of data transformations in an execution. For memory bloat, we have defined the notion of a "health signature", which summarizes memory usage based on purpose. (See OOPSLA-2007.)
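To make the idea of summarizing memory usage by purpose concrete, the following is a minimal sketch rather than the project's actual metric. It assumes each slice of heap bytes has already been tagged with a purpose; the category names (DATA, OBJECT_HEADER, POINTER, COLLECTION_GLUE) and the class and method names are hypothetical and chosen only for illustration.

```java
import java.util.Arrays;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

public class HealthSignatureSketch {
    /** Hypothetical purpose categories; the text above does not enumerate them. */
    enum Purpose { DATA, OBJECT_HEADER, POINTER, COLLECTION_GLUE }

    /** A number of heap bytes tagged with the purpose they serve. */
    static final class Slice {
        final Purpose purpose;
        final long bytes;

        Slice(Purpose purpose, long bytes) {
            this.purpose = purpose;
            this.bytes = bytes;
        }
    }

    /** Returns, for every purpose, the fraction of all bytes attributed to it. */
    static Map<Purpose, Double> signature(List<Slice> slices) {
        Map<Purpose, Long> totals = new EnumMap<>(Purpose.class);
        long all = 0;
        for (Slice s : slices) {
            totals.merge(s.purpose, s.bytes, Long::sum);
            all += s.bytes;
        }
        Map<Purpose, Double> fractions = new EnumMap<>(Purpose.class);
        for (Purpose p : Purpose.values()) {
            long bytes = totals.getOrDefault(p, 0L);
            // An empty heap yields 0.0 for every purpose instead of dividing by zero.
            fractions.put(p, all == 0 ? 0.0 : (double) bytes / all);
        }
        return fractions;
    }

    public static void main(String[] args) {
        // Made-up byte counts, for illustration only.
        List<Slice> slices = Arrays.asList(
            new Slice(Purpose.DATA, 400),
            new Slice(Purpose.OBJECT_HEADER, 300),
            new Slice(Purpose.POINTER, 200),
            new Slice(Purpose.COLLECTION_GLUE, 100));
        signature(slices).forEach(
            (purpose, fraction) -> System.out.printf("%s %.0f%%%n", purpose, fraction * 100));
    }
}
```

Because the result is a set of fractions rather than absolute sizes, signatures of this kind can be compared across applications of very different scale.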
Dynamic Memory Analysis
Java heaps have been getting bigger with time, and it is not unusual to see heaps with tens of millions of long-lived objects. Common long-lived data structures include various caches, pools, and user sessions. Programmers typically get into trouble sizing these structures, understanding their per-entry costs, and managing their lifetimes. Inefficient and erroneous designs lead to memory leaks and poor scalability.
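The sizing arithmetic behind per-entry costs can be sketched as follows. This is an illustrative calculation only: the class and method names are hypothetical, and the byte counts in `main` are made-up assumptions, not measurements from any JVM.

```java
public class CacheSizingSketch {
    /** Bytes used by a structure with a fixed cost plus a cost per entry. */
    static long totalBytes(long fixedBytes, long perEntryBytes, long entries) {
        return fixedBytes + perEntryBytes * entries;
    }

    /** Largest entry count whose total cost fits within the byte budget. */
    static long maxEntries(long budgetBytes, long fixedBytes, long perEntryBytes) {
        if (perEntryBytes <= 0) {
            throw new IllegalArgumentException("perEntryBytes must be positive");
        }
        if (budgetBytes < fixedBytes) {
            return 0; // the budget does not even cover the empty structure
        }
        return (budgetBytes - fixedBytes) / perEntryBytes;
    }

    public static void main(String[] args) {
        // Assumed, illustrative costs in bytes: map entry + key + cached value.
        long perEntry = 48 + 56 + 200;
        long fixed = 1024;                 // assumed cost of the empty cache
        long budget = 64L * 1024 * 1024;   // a 64 MB budget for this cache

        System.out.println("bytes for 1,000,000 entries: "
            + totalBytes(fixed, perEntry, 1_000_000));
        System.out.println("entries that fit in the budget: "
            + maxEntries(budget, fixed, perEntry));
    }
}
```

The point of the exercise is that the per-entry cost, not the payload alone, determines how many entries a long-lived structure can hold within a given heap budget.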
We have developed technology for summarizing Java heap snapshots to help the programmer understand the consequences of various design decisions. First, we tease apart the heap into separate domains of functionality, which we call data structures. Second, for each data structure, we recover a UML data model and annotate it with memory costs. Third, for each entity in the data model, we use a type graph to show how that entity is implemented.
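The first two steps above can be approximated with a small sketch. It is not the project's algorithm: it assumes a toy heap model in which each data structure is identified by a named root object, assigns every object to the first root that reaches it, and totals bytes per type within each structure. All class, method, and root names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class HeapSummarySketch {
    /** A heap object in the toy model: its type, its size, and the ids it references. */
    static final class Obj {
        final String type;
        final long bytes;
        final int[] refs;

        Obj(String type, long bytes, int... refs) {
            this.type = type;
            this.bytes = bytes;
            this.refs = refs;
        }
    }

    /** Returns structure name -> (type name -> total bytes), visiting roots in map order. */
    static Map<String, Map<String, Long>> summarize(Map<Integer, Obj> heap,
                                                    Map<String, Integer> roots) {
        Map<String, Map<String, Long>> summary = new LinkedHashMap<>();
        Set<Integer> claimed = new HashSet<>();
        for (Map.Entry<String, Integer> root : roots.entrySet()) {
            Map<String, Long> bytesByType = new TreeMap<>();
            Deque<Integer> work = new ArrayDeque<>();
            work.push(root.getValue());
            while (!work.isEmpty()) {
                int id = work.pop();
                if (!claimed.add(id)) {
                    continue; // already charged to this or an earlier structure
                }
                Obj o = heap.get(id);
                if (o == null) {
                    continue; // dangling id in the toy snapshot
                }
                bytesByType.merge(o.type, o.bytes, Long::sum);
                for (int ref : o.refs) {
                    work.push(ref);
                }
            }
            summary.put(root.getKey(), bytesByType);
        }
        return summary;
    }

    /** A made-up snapshot: a session cache and a connection pool sharing one String. */
    static Map<Integer, Obj> demoHeap() {
        Map<Integer, Obj> heap = new HashMap<>();
        heap.put(1, new Obj("HashMap", 48, 2, 3));
        heap.put(2, new Obj("Entry", 32, 4));
        heap.put(3, new Obj("Entry", 32));
        heap.put(4, new Obj("String", 40));
        heap.put(5, new Obj("ArrayList", 24, 6));
        heap.put(6, new Obj("Connection", 100, 4));
        return heap;
    }

    static Map<String, Integer> demoRoots() {
        Map<String, Integer> roots = new LinkedHashMap<>();
        roots.put("sessionCache", 1);
        roots.put("connectionPool", 5);
        return roots;
    }

    public static void main(String[] args) {
        summarize(demoHeap(), demoRoots()).forEach(
            (name, costs) -> System.out.println(name + " -> " + costs));
    }
}
```

Charging a shared object to the first structure that reaches it is a deliberate simplification; a real tool would need a more careful ownership policy for objects referenced from several structures.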