Group Name

Stream Computing Platforms, Applications, and Analytics


Motivation

As the amount of data available to enterprises and other organizations increases dramatically, more and more companies are looking to turn this data into actionable information and knowledge. Meeting this need requires systems and applications that can efficiently extract knowledge and information from potentially enormous volumes and varieties of continuous data streams.

System S

System S provides a programming model and an execution platform for user-developed applications that ingest, filter, analyze, and correlate potentially massive volumes of continuous data streams. It supports the composition of new applications in the form of stream processing graphs that can be created on the fly, mapped to a variety of hardware configurations, and adapted as requests come and go. System S is designed to scale from a single processing node that acquires, analyzes, interprets, and organizes continuous streams to high-performance clusters of hundreds of processing nodes. System S was designed to address the following data management platform objectives:

  • Parallel, high-performance stream processing software platform capable of scaling over a range of hardware capabilities
  • Agile and automated reconfiguration in response to changing user objectives, available data, and the intrinsic variability of system resource availability
  • Incremental tasking in the face of rapidly changing data forms and types
  • Multi-user, secure, and auditable execution environment

The System S Stream Computing Platform.

The System S stream computing system is shown in the figure above. System S supports the programming model specified by the Streams Processing Language (SPL). Using SPL, users express their analysis needs as applications in the form of dataflow graphs. These applications consist of analytics operators interconnected by streams. The SPL optimizing compiler compiles each application into deployable components, and the distributed runtime then deploys and manages the compiled executables. The runtime manages the requirements of newly submitted and already executing applications on a shared cluster of hosts. It continually monitors the state and utilization of its computing resources and adapts to failures and resource changes.
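The idea of operators interconnected by streams can be illustrated with a minimal sketch. This is not SPL and does not use any System S API; it is plain Python in which generators stand in for streams, and all names (source, filter_op, sliding_average, build_graph) are hypothetical and chosen only for illustration.

```python
from collections import deque
from typing import Callable, Iterable, Iterator


def source(values: Iterable[float]) -> Iterator[float]:
    """Source operator: emits one tuple at a time from an input feed."""
    for value in values:
        yield value


def filter_op(stream: Iterable[float],
              keep: Callable[[float], bool]) -> Iterator[float]:
    """Filter operator: forwards only the tuples that satisfy the predicate."""
    for item in stream:
        if keep(item):
            yield item


def sliding_average(stream: Iterable[float], size: int) -> Iterator[float]:
    """Aggregate operator: emits the mean of the last `size` tuples seen."""
    window = deque(maxlen=size)
    for item in stream:
        window.append(item)
        yield sum(window) / len(window)


def build_graph(values: Iterable[float]) -> Iterator[float]:
    """Wire the operators into a linear graph: source -> filter -> average."""
    readings = source(values)
    valid = filter_op(readings, lambda v: v < 10.0)  # drop outliers (assumed threshold)
    return sliding_average(valid, size=2)
```

Each operator consumes tuples lazily, so nothing is processed until a downstream consumer pulls from the graph, for example with list(build_graph([1.0, 50.0, 3.0, 5.0])). A real stream platform would additionally compile such a graph into deployable components and place them across hosts, which this single-process sketch does not attempt.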

Application Areas

System S technology enables construction of applications that can respond quickly to events and changing requirements, adapt rapidly to changing workloads, and continuously analyze real-time data at rates that are orders of magnitude greater than those of existing systems. Application areas for System S are very broad, including but not limited to telecommunications, financial services, fraud and anomaly detection, manufacturing, health monitoring, environmental monitoring and surveillance, and scientific computing such as radio astronomy.

System Components

The System S platform comprises several components, including:

  • A Stream Computing Programming Language and Library: System S supports applications written using the Streams Processing Language (SPL).
  • A Distributed Runtime: The System S runtime provides an execution substrate for streaming applications, which includes services such as high-performance data transport, resource allocation and scheduling, advanced job management, high availability, and security.
  • Integrated Development Environment: System S provides an Eclipse-based IDE for developing streaming applications in SPL. The development environment also supports interacting with the System S runtime through application launch capabilities and visualization of running jobs.
  • Configuration and Administration: System S includes web-based interfaces as well as command line tooling for configuring and administering System S instances in multi-user environments.

Programming Model

SPL is a language and a compiler for creating distributed data stream processing applications to be deployed on System S. SPL offers:

  • A language for flexible composition of parallel and distributed data-flow graphs, which can be used directly by programmers;
  • A toolkit of type-generic built-in stream processing operators, which include all basic stream-relational operators, as well as a number of plumbing operators (such as stream splitting, demultiplexing, etc.);
  • An extensible operator framework, which supports the addition of new type-generic and configurable operators, including the possibility of wrapping existing, possibly legacy, analytics;
  • A broad range of edge adapters used to ingest data from outside sources and publish data to outside destinations, such as network sockets, databases, file systems, etc.
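
To make the notions of a type-generic plumbing operator and of edge adapters more concrete, the following is a rough sketch in plain Python, not SPL. The names split, line_source, and line_sink are hypothetical and are not part of any System S toolkit. For brevity the split operator collects its outputs into lists, whereas a real streaming operator would forward each tuple to its output port as it arrives.

```python
import io
from typing import Callable, Iterable, Iterator, List, TextIO, TypeVar

T = TypeVar("T")  # the operator works for any tuple type


def split(stream: Iterable[T], num_ports: int,
          route: Callable[[T], int]) -> List[List[T]]:
    """Type-generic split: send each tuple to the output port chosen by `route`."""
    ports: List[List[T]] = [[] for _ in range(num_ports)]
    for item in stream:
        ports[route(item) % num_ports].append(item)
    return ports


def line_source(handle: TextIO) -> Iterator[str]:
    """Ingest adapter: turn a text source (file, socket wrapper) into a stream of lines."""
    for line in handle:
        line = line.strip()
        if line:  # skip blank lines
            yield line


def line_sink(stream: Iterable[object], handle: TextIO) -> int:
    """Publish adapter: write each tuple on its own line and return the count written."""
    count = 0
    for item in stream:
        handle.write(f"{item}\n")
        count += 1
    return count
```

As a usage example, an io.StringIO object can stand in for an external file: reading "3\n4\n7\n" through line_source, converting each line to an integer, and calling split with two ports and the routing function lambda v: v % 2 separates even values from odd ones, and either port can then be published with line_sink. Because the adapters only depend on a file-like handle, the same operators could in principle be reused with other sources and destinations.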