Rania Khalaf
Nirmal K. Mukhi

Group Name

Service Integration and Analytics Group

Business Process Insight (BPI)

BPI is a Software-as-a-Service (SaaS) enabled, collaborative system and approach for gaining insight into semi-structured processes. It contains a process intelligence toolset that brings together techniques from multiple disciplines: business intelligence, business activity monitoring, complex event processing, BPM, and process mining.


Overview Video



Traditional business process management (BPM) has focused on completely structured processes running under the purview of a BPM platform. A large number of business processes, however, execute in the wild, outside the control of a process management platform. In reality, business processes span a spectrum of functionality from fully structured (e.g., Business Process Execution Language processes) to loosely structured (e.g., Case Management) to completely ad hoc (e.g., phone, e-mail, fax). While the less structured areas have traditionally not been considered by BPM, we (and others [1]) see them as part of a single spectrum. We focus on enabling insight into process behavior for the more flexible end of this spectrum. In particular, we work with end-to-end or semi-structured processes: a semi-structured process is a single process from the perspective of the business, but its execution is not coordinated by a single entity (such as a workflow management system) and it may cross enterprise boundaries. A semi-structured process is executed by loosely coupled, heterogeneous, distributed systems and may contain human interactions and decisions.

Tracking and analyzing the execution of such processes is essential for understanding process behavior and its evolution, increasing the effectiveness of business operations, and managing operational risk. This is a complex endeavor because of the flexibility and distributed nature of semi-structured processes. There is currently no mechanism by which to manage these processes holistically with an end-to-end view. Furthermore, a business owner’s assumption of what is happening could be very different from what is actually happening. Discovery of process behavior is currently done either through manual interviews or process mining techniques [2]. The newly formed IEEE Process Mining Task Force is galvanizing the community to make strides in identifying and addressing key challenges in this area, starting with the Process Mining Manifesto [3].

The current approach for managing end-to-end processes, using a BPM platform, requires substantial effort: the first steps are to explicitly model the process and deploy it in a BPM runtime that controls execution. This provides complete control, but results in a rigid model of execution that cannot adapt to or handle activities performed outside the BPM runtime. While this is an acceptable trade-off in situations that require strict automation and sequencing, it becomes less so in others.

Our goal is to enable the management of semi-structured processes by providing improved, semi-automated visibility into their behavior and improved runtime management of their execution, leveraging process intelligence and process-aware analytics. We discover the behavior of end-to-end processes, and subsequently provide live insight into their execution, from a set of historical events and a stream of observed events belonging to these processes. Each instance of a business process is related to a subset of the events that form an end-to-end execution trace for that instance. Runtime management is improved in two ways: root cause analysis, through collaborative exploration of discovered behavior, and continuous monitoring and alerting, through built-in process-aware analytics.

[1] S. Kemsley: It’s Not About BPM vs. ACM, It’s About A Spectrum Of Process Functionality.
[2] W.M.P. van der Aalst et al.: Business process mining: An industrial application. Journal of Information Systems 32(5), Elsevier, 2007.
[3] W.M.P. van der Aalst et al.: Process Mining Manifesto. Business Process Management Workshops (1) 2011, pp. 169-194, 2011.

BPI Life-Cycle


The data from source systems is integrated and processed along either a historical or a live path. The historical path conducts analytics on historical data; for this purpose, data is usually loaded in batches (from databases, log files, etc.). Data may also be integrated continuously, such as by tapping into message queues used by the semi-structured process.

The data is persisted in its raw form in elastic cloud storage. The design behind our approach is to gather as much data as possible for two reasons: first, the knowledge required about the value of the data may not exist at this point; second, users may not know at this stage specifically what they are going to look for in the analytics stage. Following the principle: “store everything and discover later,” the source events and artifacts (including e-mails, documents, etc.) are stored in their original form.

The data sources produce events that represent activities or resources associated with processes. Such events can come in different formats (XML, PDF, JSON, CSV, etc.) and with various structures (XSD, column semantics of CSV files, etc.). Furthermore, the data sources are constantly subject to change. Changes may occur when IT systems are replaced, data structures are improved, errors are fixed, or new components are introduced that add data. Connecting systems directly to the source is therefore rarely an option, as every change brings large integration efforts. Instead, data integration creates an abstraction layer over source events in order to provide a stable representation that applications at higher layers can use.
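To illustrate the idea of an abstraction layer over heterogeneous source events, here is a minimal sketch in Python. It is not BPI's actual integration component: the `CanonicalEvent` type, the adapter functions, and the source field names (`type`, `ts`, `EventName`, `Time`, `OrderNo`) are all assumptions made up for this example.

```python
import csv
import io
import json
from dataclasses import dataclass, field
from typing import Any


@dataclass
class CanonicalEvent:
    """Hypothetical stable representation used by higher layers."""
    event_type: str
    timestamp: str
    attributes: dict[str, Any] = field(default_factory=dict)


def from_json(raw: str) -> CanonicalEvent:
    """Adapter for an assumed JSON source that uses 'type' and 'ts' fields."""
    data = json.loads(raw)
    return CanonicalEvent(
        event_type=data.pop("type"),
        timestamp=data.pop("ts"),
        attributes=data,  # everything else is kept as generic attributes
    )


def from_csv(raw: str) -> list[CanonicalEvent]:
    """Adapter for an assumed CSV source with 'EventName' and 'Time' columns."""
    events = []
    for row in csv.DictReader(io.StringIO(raw)):
        events.append(CanonicalEvent(
            event_type=row.pop("EventName"),
            timestamp=row.pop("Time"),
            attributes=dict(row),
        ))
    return events


json_event = from_json(
    '{"type": "OrderReceived", "ts": "2012-01-05T10:00:00", "order_id": "A17"}'
)
csv_events = from_csv(
    "EventName,Time,OrderNo\nShipmentCreated,2012-01-05T12:30:00,A17\n"
)
```

If a source system changes its structure, only the corresponding adapter needs to change in this sketch; code that consumes `CanonicalEvent` objects is unaffected.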

Once data integration has been defined and configured, a correlation discovery algorithm is applied to detect various dimensions of relationships between events. This discovery algorithm takes events from the storage component and determines correlations by calculating a unique combination of statistics on attributes. Its output consists of correlation rules that express how certain events are related. Correlations can either isolate process instances (e.g. an Order Process) or dimensions (e.g. by Customer, by Product). A correlation engine uses the discovered correlation rules to either group related events together or create a graph of relationships by connecting events through their shared dimensional relationships. Based on the user’s interest, the dimensions of relationships can now be extracted through queries and used for further analytics. In case the dimension of relationship that has been sliced out is a process, the resulting linked data would represent the behavior of each process instance in the form of process traces.
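The two roles described above, discovering correlation rules and then applying them, can be sketched as follows. This is a deliberately simplified heuristic and not the discovery algorithm used in BPI: it assumes that a single statistic (the Jaccard overlap of attribute value sets) is enough to propose a rule, and the event data, function names, and threshold are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical events from two sources that name the order key differently.
events = [
    {"type": "OrderReceived", "order_id": "A17", "customer": "C1"},
    {"type": "OrderReceived", "order_id": "B42", "customer": "C2"},
    {"type": "ShipmentCreated", "OrderNo": "A17", "carrier": "X"},
    {"type": "ShipmentCreated", "OrderNo": "B42", "carrier": "Y"},
]


def attribute_values(events):
    """Collect the set of values seen for each (event type, attribute) pair."""
    values = defaultdict(set)
    for ev in events:
        for key, val in ev.items():
            if key != "type":
                values[(ev["type"], key)].add(val)
    return values


def discover_rules(events, threshold=0.8):
    """Propose a rule when attributes of two event types share most values."""
    values = sorted(attribute_values(events).items(), key=lambda kv: kv[0])
    rules = []
    for (a, vals_a), (b, vals_b) in combinations(values, 2):
        if a[0] == b[0]:
            continue  # only relate attributes across different event types
        jaccard = len(vals_a & vals_b) / len(vals_a | vals_b)
        if jaccard >= threshold:
            rules.append((a, b))
    return rules


def correlate(events, rule):
    """Group events whose rule attributes carry the same value."""
    (type_a, attr_a), (type_b, attr_b) = rule
    groups = defaultdict(list)
    for ev in events:
        if ev["type"] == type_a and attr_a in ev:
            groups[ev[attr_a]].append(ev)
        elif ev["type"] == type_b and attr_b in ev:
            groups[ev[attr_b]].append(ev)
    return dict(groups)


rules = discover_rules(events)
instances = correlate(events, rules[0])
```

Here the only discovered rule links `OrderReceived.order_id` to `ShipmentCreated.OrderNo`, and applying it yields one group of events per order, which corresponds to isolating process instances.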

Process traces can now be used to learn different models, such as mining process models to explore aggregate behavior or training predictive models to make live predictions on possible future behavior. Process models provide an understanding of how semi-structured processes are executed, which is particularly useful for identifying best-practices or process pain points.
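As one small example of exploring aggregate behavior, the sketch below counts how often one activity directly follows another across a set of traces, producing a directly-follows graph. This is a basic building block of many process mining techniques and is shown only as an illustration; the text does not state which mining algorithm BPI uses, and the activity names are invented.

```python
from collections import Counter

# Hypothetical process traces: one ordered list of activities per instance.
traces = [
    ["Receive", "Check", "Ship", "Invoice"],
    ["Receive", "Check", "Reject"],
    ["Receive", "Check", "Ship", "Invoice"],
]


def directly_follows(traces):
    """Count how often activity b immediately follows activity a."""
    counts = Counter()
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            counts[(a, b)] += 1
    return counts


dfg = directly_follows(traces)
for (a, b), n in sorted(dfg.items()):
    print(f"{a} -> {b}: {n}")
```

Edges with high counts indicate the common path through the process, while rare edges (such as `Check -> Reject` here) point to variants worth investigating.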

In the live part of BPI’s life-cycle, we assume that data integration and data extraction have already been deployed and configured, and that new events are continuously integrated into the system and correlated by the correlation engine. Correlation discovery can still be applied to detect potential changes, such as newly introduced event types. Predictive models, such as decision trees, can be trained on historical data and applied to the live path to enable insight and feedback on running process instances. This includes predicting the likelihood of tasks in a running instance, triggering alerts, and injecting actions into source systems in order to seize opportunities or trigger counter-measures to avoid risks.
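The train-on-history, apply-to-live pattern can be sketched with a much simpler model than a decision tree. The example below swaps in a frequency-based next-task estimator to stay within the Python standard library; it is an assumption-laden illustration, not BPI's predictive component, and the traces, function names, and alert threshold are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical historical traces used for training.
historical_traces = [
    ["Receive", "Check", "Ship", "Invoice"],
    ["Receive", "Check", "Reject"],
    ["Receive", "Check", "Ship", "Invoice"],
    ["Receive", "Check", "Ship", "Invoice"],
]


def train(traces):
    """Count, for each task, which tasks followed it in historical traces."""
    successors = defaultdict(Counter)
    for trace in traces:
        for current, nxt in zip(trace, trace[1:]):
            successors[current][nxt] += 1
    return successors


def predict_next(model, running_trace):
    """Estimate the likelihood of each possible next task from the last task."""
    counts = model.get(running_trace[-1], Counter())
    total = sum(counts.values())
    return {task: n / total for task, n in counts.items()} if total else {}


def check_alert(model, running_trace, risky_task, threshold):
    """Signal an alert when a risky task is likely enough to come next."""
    likelihood = predict_next(model, running_trace).get(risky_task, 0.0)
    return likelihood >= threshold


model = train(historical_traces)
likelihoods = predict_next(model, ["Receive", "Check"])
alert = check_alert(model, ["Receive", "Check"],
                    risky_task="Reject", threshold=0.2)
```

In a live setting, `predict_next` would be called each time the correlation engine appends a new event to a running instance, and a positive `check_alert` result could drive an alert or a counter-measure in the source system.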