IBM Programming Languages Day - PL Day 2013


The 2013 Programming Languages Day (PL Day) will be held at the IBM T.J. Watson Research Center on Monday, September 23, 2013. This is a one-day event featuring a keynote speaker followed by several conference-style talks, as well as a poster session for closer interaction among the attendees. The event is open to all IBMers, and we encourage local participation by professors and students from the Northeast corridor.

PL Day is held in cooperation with the New England and New Jersey Programming Languages and Systems Seminars. The main goal of the event is to increase awareness of one another's work and to encourage interaction and collaboration.

Announcements

Program Committee

Call for Submissions

We welcome all topics related to programming languages and systems, including tutorials and joint presentations. We also solicit suggestions for topics or particular presentations that would interest attendees. Talks may vary in length from 15 to 60 minutes so that we can accommodate smaller results, preliminary work, progress reports, and tutorials.

If you would like to present your work, submit a title, abstract (about 300 words), and desired talk duration by August 30. Notification of accepted abstracts will be sent by approximately September 9.

Registration and Logistics

Please register before September 22 if you plan to attend PL Day.

Attendees are welcome to arrive at the IBM T.J. Watson Research Center in Yorktown Heights starting at 9AM. The keynote presentation will start promptly at 9:30AM.

Program

9:00-9:30 BREAKFAST
Keynote
9:30-10:30 Leo Meyerovich, UC Berkeley
The Sociology of Programming Language Adoption
10:30-10:45 BREAK
Session 1
10:45-12:05 Neng-Fa Zhou, CUNY Brooklyn College and GC
Scripting and Modeling in Picat

Vishakha Sharma, Stevens Institute of Technology
Language Design and Implementation for Computational Modeling, Simulation and Visualization

Mandana Vaziri, IBM Research
ActiveSheets: Visualizing Big Data in Stream Processing
12:05-1:15 LUNCH
Session 2
1:15-3:00 Annie Liu, Stony Brook University
High-Level Executable Specifications of Distributed Algorithms

Rashed Bhatti, IBM Research
InfoSphere Streams Processing Language (SPL) Extension for Heterogeneous Computing Architectures

Stephen Fink, IBM Research
Using a High-Level Language to Program an FPGA to Play Blokus Duo

Avraham Shinnar, IBM Research
Increased Performance for In-Memory Hadoop Jobs

Program Details

 

  • 9:30-10:30 Keynote

    The Sociology of Programming Language Adoption [slides]
    Leo Meyerovich, UC Berkeley
    Why do some programming languages succeed and others fail?
    What happens when we stop thinking of adoption as a result and start considering it as a process or even as a resource?

    This talk will explore three directions we have taken to understand the nature of programming language adoption. First, from genetic corn to safe sex to telephones, we found many connections to adoption studies performed by social scientists. Second, we surveyed thousands of programmers and mined hundreds of thousands of open source projects in order to analyze language adoption in practice. We found several surprises, such as which three factors matter most. Finally, as time permits, I'll describe how understanding these ideas changed how I design my own languages.

    This is joint work with Ariel S. Rabkin, Princeton.


  • 10:45-11:15

    Scripting and Modeling in Picat [slides]
    Neng-Fa Zhou, CUNY Brooklyn College and GC

    Picat (picat-lang.org) is a simple yet powerful logic-based multi-paradigm programming language aimed at general-purpose applications. Picat is a rule-based language, in which predicates, functions, and actors are defined with pattern-matching rules. Picat incorporates many declarative language features for better productivity of software development, including explicit non-determinism, explicit unification, functions, list comprehensions, constraints, and tabling. Picat also provides imperative language constructs, such as assignments and loops, for programming everyday things. The Picat implementation, which is based on a well-designed virtual machine and incorporates a memory manager that garbage-collects and expands the stacks and data areas when needed, is efficient and scalable. Picat can be used not only for symbolic computations, a traditional application domain of declarative languages, but also for scripting and modeling tasks.

    Picat offers many advantages over other languages. Compared with functional and scripting languages, its support for explicit unification, explicit non-determinism, tabling, and constraints makes Picat more suitable for symbolic computations. Compared with Prolog, Picat is arguably more expressive and scalable: it is not rare to find problems that Picat can describe in an order of magnitude fewer lines of code than Prolog, and Picat can be significantly faster than Prolog because pattern matching facilitates indexing of rules.

    In this talk, I'll present the features of Picat through examples from scripting, dynamic programming, planning, and constraint optimization with CP, SAT, and MIP.

    This is joint work with Jonathan Fruhman and Hakan Kjellerstrand.
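    To give a concrete feel for tabling, one of the features listed above: tabling memoizes the result of each rule invocation so that every subproblem is solved only once, turning exponential naive recursion into linear dynamic programming. The sketch below is a rough Python analogue, not Picat syntax (Picat enables tabling declaratively, without explicit memoization code).

```python
# A rough Python analogue of Picat's tabling: memoize a recursive
# definition so each subproblem is computed exactly once.
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n: int) -> int:
    """Tabled Fibonacci: without tabling this recursion is exponential."""
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

print(fib(50))  # 12586269025, computed in linear time
```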


  • 11:15-11:45

    Language Design and Implementation for Computational Modeling, Simulation and Visualization [slides]
    Vishakha Sharma, Stevens Institute of Technology, NJ

    We design BioScape, a high-level modeling language for the stochastic simulation of biological and biomaterials processes in a reactive environment in 3D space. BioScape is based on the Stochastic Pi-Calculus, and it is motivated by the need for individual-based, continuous-motion, continuous-space simulation in modeling complex bacteria-materials interactions. The novel aspects of BioScape include high-level syntactic primitives for declaring each species' spatial scope of movement, diffusion rate, shape, and reaction distance, and an operational semantics that deals with the specifics of 3D locations, verifying reaction distance, and featuring random movement. We define a translation from the non-stochastic fragment of BioScape to a low-level pi-calculus with 3D primitives (3pi) and prove its soundness with respect to the operational semantics.

    In order to aid the development of optimal bifunctional surfaces, we build a three dimensional computational model using BioScape. The resulting model is able to simulate varying configurations of surface coatings at a fraction of the time necessary to perform in-vitro experiments. The output of the model not only plots populations over time, but it also produces 3D-rendered videos of bacteria-surface interactions enhancing the visualization of the system's behavior.

    We extend BioScape with a fully parallel semantics. Modeling larger and more realistic systems calls for a semantics that can take advantage of new multi-core and GPU architectures, which motivates the introduction of Parallel BioScape.

    We present an efficient computational framework, enabled by the massively parallel processing capability of GPUs, to study large-scale bacteria-materials interactions via BioScape.

    We define BioScape^L, an extension of BioScape with abstract locations. The motivation for such an extension comes from the need to describe systems whose behavior depends on geometric information and dynamic spatial arrangements of their entities, such as in assembly of polymers, oligomers, and complexes.
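    To illustrate the style of simulation BioScape automates, here is a minimal individual-based sketch in Python: entities take random diffusion steps within a bounded 3D scope, and a reaction fires whenever an entity comes within a reaction distance of a fixed surface site. All names, parameters, and dynamics below are illustrative assumptions, not BioScape syntax or semantics.

```python
import math
import random

def dist(a, b):
    # Euclidean distance between two 3D points
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def step(pos, rate, bound):
    # Random diffusion step, clamped to the species' spatial scope [0, bound]^3
    return tuple(min(bound, max(0.0, x + random.uniform(-rate, rate)))
                 for x in pos)

def simulate(n_bacteria=20, steps=100, bound=10.0, rate=0.5,
             react_dist=1.0, seed=0):
    random.seed(seed)
    bacteria = [tuple(random.uniform(0, bound) for _ in range(3))
                for _ in range(n_bacteria)]
    surface_site = (bound / 2, bound / 2, 0.0)  # a binding site on the material surface
    bound_count = 0
    for _ in range(steps):
        moved = []
        for b in bacteria:
            b = step(b, rate, bound)
            if dist(b, surface_site) < react_dist:
                bound_count += 1      # reaction fires: bacterium binds to the surface
            else:
                moved.append(b)       # otherwise it keeps diffusing
        bacteria = moved
    return bound_count, len(bacteria)
```

A language like BioScape expresses the spatial scope, diffusion rate, and reaction distance declaratively per species, so the bookkeeping above is handled by the runtime rather than written by hand.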


  • 11:45-12:05

    ActiveSheets: Visualizing Big Data in Stream Processing
    Mandana Vaziri, IBM Research

    Stream processing is a computing paradigm that enables continuous analysis of massive volumes of data. We propose using a spreadsheet as a graphical user interface for visualizing and editing live streaming data. Spreadsheets are an accessible platform that non-programmers can use to visualize and analyze data, and they have proven very useful in this context.

    This talk will demonstrate a new integrated capability called "ActiveSheets" that brings the benefits of spreadsheet programming to the development of business analytics applications. ActiveSheets allows a user to debug and understand streaming code more easily, visualize executions, and edit the underlying code. Spreadsheets thus act as sensors and actuators in a streaming program, providing a unique graphical user interface in which code can be edited in the same place where executions of that code are visualized.


  • 1:15-1:45

    High-Level Executable Specifications of Distributed Algorithms [slides]
    Annie Liu, Stony Brook University

    This talk describes a method for specifying complex distributed algorithms at a very high yet executable level, focusing in particular on general principles for making properties and invariants explicit while keeping the control flow clear. This is critical for understanding the algorithms and proving their correctness. It is also critical for generating efficient implementations using invariant-preserving transformations, ensuring the correctness of the optimizations.

    We have studied and experimented with a variety of important distributed algorithms, including well-known difficult variants of Paxos, by specifying them in a very high-level language with an operational semantics. In the specifications that resulted from following our method, critical properties and invariants are explicit, making the algorithms easier to understand and verify. Indeed, this helped us discover improvements to some of the algorithms, for correctness and for optimizations.

    A paper on this topic appeared in SSS '12, and we are continuing to make progress in this area. This is joint work with Scott Stoller and Bo Lin.


  • 1:45-2:15

    InfoSphere Streams Processing Language (SPL) Extension for Heterogeneous Computing Architectures
    Rashed Bhatti, IBM Research

    IBM InfoSphere Streams is an advanced distributed application development platform for very high-throughput, real-time Big Data solutions. It provides complex, multi-discipline analytics over heterogeneous and unstructured data acquired in the form of text, images, audio, voice, VoIP, video, web traffic, email, GPS data, financial transaction data, satellite data, and sensor readings. A developer may use its easy-to-use application development language, the Streams Processing Language (SPL), for rapid application development. The core of the platform is its highly optimized distributed runtime, which automatically parallelizes the composed operators within an application and distributes them over multiple CPU nodes within a cluster.

    Yet in some applications related to life sciences, fluid dynamics, finance, and anomaly detection, there are certain compute-intensive operators that cannot be distributed over multiple nodes. Modern heterogeneous architectures solve this by providing natively available, massively parallel computing resources to accelerate such applications. As silicon technologies continue to scale, computational resources like GPUs, FPGAs, SOCs, SPU/Es, and ASICs have become more prevalent and ubiquitous. These optional components are also relatively cost-efficient, extremely energy-efficient, and significantly reduce the physical footprint of the hardware needed for a particular Big Data application.

    To take advantage of these modern heterogeneous hardware architectures, IBM InfoSphere Streams research started an effort to enable hardware acceleration in its programming environment. SPL provides integration options for user-defined Java and C/C++ native APIs. These ports are being used to establish bindings for languages like OpenCL and Liquid Metal Lime, for programming GPU- or FPGA-based platforms. At PL Day we will present our work in progress in this direction.


  • 2:15-2:35

    Using a High-Level Language to Program an FPGA to Play Blokus Duo [slides]
    Stephen Fink, IBM Research

    The vast majority of hardware designers employ design languages such as VHDL and Verilog, which provide a lower level of abstraction than languages used for software development. Using such tools, FPGA design requires much more time, effort, and expertise than programming equivalent functionality in software.

    To make hardware design easier, many projects have investigated hardware synthesis from higher-level languages. To evaluate whether high-level synthesis is viable, we must answer a key question: using a high-level language, can a developer design hardware whose quality matches hardware designed with standard tools?

    A design competition provides an attractive laboratory for testing this question on a particular challenge. To this end, we have developed an entry for the 2013 ICFPT Blokus Duo Design Competition, built with the Liquid Metal system from IBM Research.

    The Liquid Metal system provides a high-level language and toolchain targeting heterogeneous systems that mix CPUs, GPUs, and FPGAs. Liquid Metal is based on Lime, a Java-like language enhanced with constructs that express parallelism and isolation.


  • 2:35-2:55

    Increased Performance for In-Memory Hadoop Jobs [slides]
    Avraham Shinnar, IBM Research

    Map Reduce is a widely adopted programming paradigm for distributed computing over massive data sets. Map Reduce engines such as Apache Hadoop are engineered to reliably execute Map Reduce computations over massive data sets on scale-out clusters of thousands of nodes. However, standard Map Reduce engines are not designed to obtain optimal performance for pipelined Map Reduce computations that operate over moderate-sized data sets that fit in cluster memory.
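    The Map Reduce paradigm described above can be sketched in a few lines of Python: a map phase emits key/value pairs from each input record, a shuffle groups values by key, and a reduce phase folds each group. Engines such as Hadoop or M3R execute this same pipeline at cluster scale; the version below simply runs in-process for illustration.

```python
from collections import defaultdict

def map_phase(records, mapper):
    # Apply the user-supplied mapper to each record, emitting (key, value) pairs
    for r in records:
        yield from mapper(r)

def shuffle(pairs):
    # Group values by key, as the engine does between the map and reduce phases
    groups = defaultdict(list)
    for k, v in pairs:
        groups[k].append(v)
    return groups

def reduce_phase(groups, reducer):
    # Fold each group of values with the user-supplied reducer
    return {k: reducer(k, vs) for k, vs in groups.items()}

# Word count: the canonical Map Reduce example
docs = ["to be or not to be", "to do is to be"]
mapper = lambda line: [(w, 1) for w in line.split()]
reducer = lambda key, values: sum(values)

counts = reduce_phase(shuffle(map_phase(docs, mapper)), reducer)
print(counts["to"])  # 4
```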

    To better address this important segment of Map Reduce computations, IBM Research has developed the M3R engine (Main Memory Map Reduce). The M3R engine trades fault tolerance for high performance, allowing Map Reduce applications to quickly produce results for data sets that can fit in the combined main memory of a cluster. The M3R engine includes a Hadoop adapter that enables existing Hadoop Map Reduce programs to be executed on the M3R engine unchanged and for individual executions of a program to be scheduled on M3R or Hadoop as appropriate.

    In this talk we will present an overview of the architecture and implementation of the M3R engine and describe some of the scenarios where M3R has been successfully applied.