IBM Programming Languages Day - PL Day 2012
The 2012 Programming Languages Day will be held at the IBM Thomas J. Watson Research Center on Thursday, June 28, 2012. The day will be held in cooperation with the New Jersey and New England Programming Languages and Systems Seminars. The main goal of the event is to increase awareness of each other's work, and to encourage interaction and collaboration.
You are welcome from 9AM onwards, and the keynote presentation will start at 10AM sharp. We expect the program to run until 4PM. Programming Languages day will be held in the Auditorium (room GN-F15) in the Hawthorne-1 building of the Thomas J. Watson Research Center at 19 Skyline Drive, Hawthorne, New York 10538.
If you plan to attend the Programming Languages Day, please register by sending an e-mail with your name, affiliation, and contact information to firstname.lastname@example.org. It is important that everyone register in advance, since security regulations at Watson require us to provide a list of attendees in advance; also, registering enables us to plan for lunch and refreshments to be available.
If you plan to attend, please let me know ideally by JUNE 8, but no later than JUNE 14 (2 weeks before the event).
- Patrick Cousot, Ecole Normale Superieure and New York University
- Julian Dolby, IBM Thomas J. Watson Research Center
- John Field, Google
- Martin Rinard, Massachusetts Institute of Technology
- Sign-in, Welcome
- 10:00-11:00 Keynote
- 11:15-12:30 Concurrency
- Stephen Freund
- Cooperative Concurrency for a Multicore World
- Y. Annie Liu
- From Clarity to Efficiency for Distributed Algorithms
- Sebastien Mondet, Ashish Agarwal, Paul Scheid, Aviv Madar, Richard Bonneau, Jane Carlton, Kristin C. Gunsalus
- Managing and Analyzing Big-Data in Genomics
- 1:40-2:55 Program Analysis and Verification
- J. Ian Johnson
- Designing Precise Higher-Order Pushdown Flow Analyses
- Eric Koskinen
- Temporal property verification as a program analysis task
- Gregory Malecha, Adam Chlipala, Patrick Hulin, and Edward Yang
- A Framework for Verifying Low-level Programs
- 3:10-4:00 Runtime Issues
- Kangkook Jee
- A General Approach for Efficiently Accelerating Software-based Dynamic Data Flow Tracking on Commodity Hardware
- Jose Castanos, David Edelsohn, Kazuaki Ishizaki,Toshio Nakatani, Takeshi Ogasawara, Priya Nagpurkar, Peng Wu
- On the Benefits and Pitfalls of Extending a Statically Typed Language JIT Compiler for Dynamic Scripting Languages
Joint work with Arjun Guha, Joe Gibbs Politz, Ben Lerner, Claudiu Saftoiu, Matt Carroll, and Hannah Quay-de la Vallee.
Cooperative Concurrency for a Multicore World
Multithreaded programs are notoriously prone to unintended interference between concurrent threads. To address this problem, we argue that yield annotations in the source code should document all thread interference, and we present a type system for verifying the absence of undocumented interference. Well-typed programs behave as if context switches occur only at yield annotations. Thus, they can be understood using intuitive sequential reasoning, except where yield annotations remind the programmer to account for thread interference.
Experimental results show that our type system for yield annotations is more precise at capturing thread interference than prior techniques based on method-level atomicity specifications and reduces the number of interference points a programmer must reason about by an order of magnitude. The type system is also more precise than prior methods targeting race freedom. In addition, yield annotations highlight all known concurrency defects in the Java programs studied.
This is joint work with Cormac Flanagan, Tim Disney, Caitlin Sadowski, and Jaeheon Yi at UC Santa Cruz.Y. Annie Liu
From Clarity to Efficiency for Distributed Algorithms
This talk presents a very high-level language for clear description of distributed algorithms, compilation to executable code, and optimizations necessary for generating efficient implementations. The language supports high-level control flows where complex synchronization conditions can be expressed using high-level queries, especially logic quantifications, over message history sequences. Unfortunately, these programs would be extremely inefficient, including consuming unbounded memory, if executed straightforwardly.
We present new optimizations that automatically transform complex synchronization conditions into incremental updates of necessary auxiliary values as messages are sent and received. The core of the optimizations is the first general method for transforming logic quantifications into efficient implementations. We have developed an operational semantics of the language, implemented a prototype of the compiler and the optimizations, and successfully used the language and implementation on a variety of important distributed algorithms, including multi-Paxos for distributed consensus.
Our high-level specification and optimization method also helped us discover improvements to some of the algorithms, both for correctness and for optimizations. Our language has also been used by undergraduate and graduate students to easily implement a variety of distributed algorithms used in distributed file sharing, routing, etc., including Chord, Kademlia, Pastry, and Tapestry, AODV, and parts of HDFS and Upright.
(Joined work with Scott Stoller, Bo Lin, and Michael Gorbovitski)Sebastien Mondet, Ashish Agarwal, Paul Scheid, Aviv Madar, Richard Bonneau, Jane Carlton, Kristin C. Gunsalus
Managing and Analyzing Big-Data in Genomics
Biology is an increasingly computational discipline. Rapid advances in experimental techniques, especially DNA sequencing, are generating data at exponentially increasing rates. Aside from the algorithmic challenges this poses, researchers must manage large volumes and innumerable varieties of data, run computational jobs on an HPC cluster, and track the inputs/outputs of the numerous computational tools they employ. Here we describe a software stack fully implemented in OCaml that operates the Genomics Core Facility at NYUﾕs Center for Genomics and Systems Biology.
We define a domain specific language (DSL) that allows us to easily describe the data we need to track. More importantly, the DSL approach provides us with code generation capabilities. From a single description, we generate PostgreSQL schema definitions, OCaml bindings to the database, and web pages and forms for end-users to interact with the database. Strong type safety is provided at each stage. Database bindings check properties not expressible in SQL, and web pages, forms, and links are validated at compile time by the Ocsigen framework. Since the entire stack depends on this single data description, rapid updates are easy; the compiler informs us of all necessary changes.
The application launches compute intensive jobs on a high-performance compute (HPC) cluster, requiring consideration of concurrency and fault-tolerance. We have implemented what we call a ﾒflowﾓ monad that combines error and thread monads. Errors are modeled with polymorphic variants, which get arranged automatically into a hierarchical structure from lower level system calls to high level functions. The net result is extremely precise information in the case of any failures and reasonably straightforward concurrency management.
1:40-2:55 Program Analysis and Verification
J. Ian Johnson
Designing Precise Higher-Order Pushdown Flow Analyses
Formalisms for context-free approaches to higher-order control-flow analysis are complex and require significant effort to prove correct. However, these approaches are enticing because they provide improved precision over finite state approaches. We present a new method for deriving context-free analyses that results in �obviously correct� formalisms that consists of making small changes to the original concrete semantics. We validate this method by using it to derive existing context-free analyses from abstract machines. We further exercise the technique by applying it to abstract machines that more closely represent real language implementations and consequently derive analyses more precise than existing ones. Specifically, we use an escape analysis to derive better stack allocation, and use garbage collection to derive better heap allocation. We also present a novel semantics for call/cc that, when turned into an analysis, handles non-escaping continuations more precisely than prior treatments of first-class control.Eric Koskinen
Temporal property verification as a program analysis task
We describe a reduction from temporal property verification to a program analysis problem. We produce an encoding which, with the use of recursion and nondeterminism, enables off-the-shelf program analysis tools to naturally perform the reasoning necessary for proving temporal properties (e.g. backtracking, eventuality checking, tree counterexamples for branching-time properties, abstraction refinement, etc.). This reduction allows us to prove branching-time properties of programs, as well as trace-based properties such as those expressed in Linear Temporal Logic (LTL) when the reduction is combined with our recently described iterative symbolic determinization procedure. We demonstrate---using examples taken from the PostgreSQL database server, Apache web server, and Windows OS kernel---that our method can yield enormous performance improvements in comparison to known tools, allowing us to automatically prove properties of programs where we could not prove them before.Gregory Malecha, Adam Chlipala, Patrick Hulin, and Edward Yang
A Framework for Verifying Low-level Programs
High level languages provide abstractions that assist programmers; however these abstractions are not always suﬃcient and, in some cases, they get in the way of writing eﬃcient or functioning correct code. In this work we develop Bedrock2, a Coq framework for foundational reasoning about low-level pro- grams using a program logic based on Ni and Shao’s XCAP . Bedrock2 is an extension and rewrite of Chlipala’s Bedrock language . Bedrock2 allows programmers to deﬁne both control and data abstractions and integrate them into the system in a ﬁrst-class way. Control abstractions, e.g. conditionals and function calls, are deﬁned by providing a denotation into the core language and derived inference rules are veriﬁed in a foundational way with respect to the core language semantics. These inference rules are used by the veriﬁcation condition generator simplify the proof obligations provided to the programmer. Veriﬁcation conditions are expressed as pre- and post- conditions on execution traces allowing the bulk of the work to be done by symbolic evaluation. Unlike Bedrock, the Bedrock2 symbolic evaluator incorpo- rates user-deﬁned abstract predicates to enable eﬃcient manipulation of arrays, structures, stack frames, and other data abstractions. Final veriﬁcation condi- tions are discharged using a cancellation-based separation logic prover. Proofs are generated using computational reﬂection to enable good performance that, experiences suggest, will scale to larger programs than previous work. Bedrock2 embraces a more realistic machine model that exposes machine word sizes, byte ordering, and ﬁnite memory. We are working to extend the language to more interesting abstractions, real assembly languages, and concurrency.
3:10-4:00 Runtime Issues
A General Approach for Efficiently Accelerating Software-based Dynamic Data Flow Tracking on Commodity Hardware
Despite the demonstrated usefulness of dynamic data flow tracking (DDFT) techniques in a variety of security applications, the poor performance achieved by available prototypes prevents their widespread adoption and use in production systems. We present and evaluate a novel methodology for improving the performance overhead of DDFT frameworks, by combining static and dynamic analysis. Our intuition is to separate the program logic from the corresponding tracking logic, extracting the semantics of the latter and abstracting them using a Taint Flow Algebra. We then apply optimization techniques to eliminate redundant tracking logic and minimize interference with the target program. Our optimizations are directly applicable to binary-only software and do not require any high level semantics. Furthermore, they do not require additional resources to improve performance, neither do they restrict or remove functionality. Most importantly, our approach is orthogonal to optimizations devised in the past, and can deliver additive performance benefits. We extensively evaluate the correctness and impact of our optimizations, by augmenting a freely available high-performance DDFT framework, and applying it to multiple applications, including command line utilities, server applications, language runtimes, and web browsers. Our results show a speedup of DDFT by as much as 2:23x, with an average of 1:72x across all tested applications.Jose Castanos, David Edelsohn, Kazuaki Ishizaki,Toshio Nakatani, Takeshi Ogasawara, Priya Nagpurkar, Peng Wu
On the Benefits and Pitfalls of Extending a Statically Typed Language JIT Compiler for Dynamic Scripting Languages
Whenever the need to compile a new dynamically typed language arises, the option of reusing an existing statically typed language Just-In-Time (JIT) compiler (reusing JITs) always seems appealing. Existing reusing JITs, however, do not deliver the kind of performance boost as proponents have hoped. The performance of JVM languages, for instance, often lags behind standard interpreter implementations. Even more customized solutions that extend the internals of a JIT compiler for the target language cannot compete with those designed specifically for dynamically typed languages. Our own Fiorano JIT compiler is a living example of such a phenomenon. As a state-of-the-art reusing JIT compiler for Python, Fiorano JIT compiler outperforms two other reusing JITs (Unladen Swallow and Jython), but still has a noticeable performance gap with PyPy, the best performing Python JIT compiler today.
In this talk, we offer an in-depth look on the reusing JITs phenomenon. Based on our Fiorano JIT experience and evaluation of PyPy, Jython, Fiorano JIT, and Unalden-swallow JIT, we discuss techniques that have proved effective and quantify weakness of the reusing JIT approach. More importantly, we identify several common pitfalls of today's reusing JITs, the most important one of them is not focusing sufficiently on specialization, an abundant optimization opportunity unique to dynamically typed languages. These findings, though presented in the context of Python, are applicable to other scripting languages as well.