Graph Analytics - Overview
The Graph Analytics research team focuses on two primary problems: first, developing algorithms that map schema and data between relational and graph platforms; second, developing models, platforms, and mining algorithms for dynamic and versioned graph data.
Mapping Schema/Data to Graphs
Graphs are considered a natural representation of entities and the relationships between them. They are used in a number of domains, ranging from social networks to finance and from chemical interaction models to data center operations. Although graphs are often regarded as schema-free, more often than not they are constructed from multiple, well-curated sources (typically relational tables) that have predefined schemata. For instance, consider a financial institution that wants to run several forms of graph analytics (fraud detection, customer profiling, etc.) on data that is potentially scattered across different databases. We focus on the process of mapping the schema and data from structured sources to graph stores. The goal is to take the attributes present in a relational schema and map them to attributes of a vertex or an edge in a graph schema in a way that enables efficient graph analytics. One of the challenges we address is making the data migration as efficient as possible by using in-memory and cluster computing frameworks such as Spark.
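The mapping idea can be illustrated with a minimal sketch in plain Python. It is not the team's algorithm and does not use Spark; the tables (customers, accounts, transfers), the column names, and the helpers `PropertyGraph`, `rows_to_vertices`, and `rows_to_edges` are all hypothetical, chosen only to show how relational attributes might end up as vertex or edge properties.

```python
from dataclasses import dataclass, field


@dataclass
class PropertyGraph:
    """Tiny in-memory property graph used only for this illustration."""
    vertices: dict = field(default_factory=dict)  # vertex id -> (label, properties)
    edges: list = field(default_factory=list)     # (source id, label, target id, properties)


def rows_to_vertices(graph, label, rows, key, columns):
    """Map each row to a vertex; the listed columns become vertex properties."""
    for row in rows:
        props = {col: row[col] for col in columns}
        graph.vertices[f"{label}:{row[key]}"] = (label, props)


def rows_to_edges(graph, label, rows, source, target, columns=()):
    """Map each row to an edge between the vertices its key columns refer to.

    `source` and `target` are (vertex label, column name) pairs.
    """
    (src_label, src_col), (dst_label, dst_col) = source, target
    for row in rows:
        props = {col: row[col] for col in columns}
        graph.edges.append(
            (f"{src_label}:{row[src_col]}", label, f"{dst_label}:{row[dst_col]}", props)
        )


# Hypothetical relational tables, represented as lists of row dictionaries.
customers = [
    {"customer_id": 1, "name": "Ada"},
    {"customer_id": 2, "name": "Bo"},
]
accounts = [
    {"account_id": 10, "owner_id": 1, "balance": 250.0},
    {"account_id": 11, "owner_id": 2, "balance": 40.0},
]
transfers = [
    {"from_account": 10, "to_account": 11, "amount": 75.0},
]

graph = PropertyGraph()
rows_to_vertices(graph, "Customer", customers, key="customer_id", columns=["name"])
rows_to_vertices(graph, "Account", accounts, key="account_id", columns=["balance"])
# The foreign key accounts.owner_id becomes an OWNS edge rather than a property.
rows_to_edges(graph, "OWNS", accounts, ("Customer", "owner_id"), ("Account", "account_id"))
# Each row of the transfers table becomes a TRANSFER edge carrying its amount.
rows_to_edges(graph, "TRANSFER", transfers,
              ("Account", "from_account"), ("Account", "to_account"),
              columns=["amount"])
```

In this sketch the choice between "property" and "edge" is made by hand for each column. Deciding that automatically, and running the migration at scale, is the part the research described above addresses.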
Dynamic and Versioned Graphs - Platforms and Analytics
Most large-scale graphs evolve rapidly due to the addition and deletion of nodes and edges, changes to the properties associated with nodes or edges, or the activation and deactivation of relationships between nodes (e.g., in telephone call graphs). Apart from the challenges of managing these rapid dynamics, in many analytics applications it is critically important to retain the entire audit trail of changes to the graph, so that one can answer queries about the state of the graph at any given time point or time interval in the past. Such queries, called time-travel queries, are well studied in the traditional database domain and are available in many commercial database systems through temporal extensions, but unfortunately they are not found in modern large-scale graph management systems. In this research, we are working on the resulting challenges in dynamic and versioned graph storage, building solutions that work on top of common graph platforms like JanusGraph (Titan), and developing scalable data mining algorithms.
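To make the notion of a time-travel query concrete, here is a minimal sketch in plain Python. It assumes one simple design, in which every edge carries a half-open validity interval [valid_from, valid_to); the class `VersionedGraph` and its methods are hypothetical and do not correspond to the JanusGraph API or to the storage model used in this research.

```python
import math


class VersionedGraph:
    """Append-only edge log in which each edge has a validity interval."""

    def __init__(self):
        # Each record is [source, target, valid_from, valid_to).
        self._log = []

    def add_edge(self, source, target, t):
        """Record that the edge exists from time t onwards."""
        self._log.append([source, target, t, math.inf])

    def remove_edge(self, source, target, t):
        """Close the open interval of the edge at time t; history is kept."""
        for record in self._log:
            if record[0] == source and record[1] == target and record[3] == math.inf:
                record[3] = t
                return
        raise KeyError(f"no active edge {source!r} -> {target!r}")

    def edges_at(self, t):
        """Time-point query: edges that were valid at time t."""
        return {(s, d) for s, d, start, end in self._log if start <= t < end}

    def edges_during(self, t0, t1):
        """Time-interval query: edges valid at some point within [t0, t1)."""
        return {(s, d) for s, d, start, end in self._log if start < t1 and t0 < end}


# Hypothetical call graph with integer timestamps.
calls = VersionedGraph()
calls.add_edge("alice", "bob", 1)
calls.add_edge("bob", "carol", 3)
calls.remove_edge("alice", "bob", 5)

snapshot_at_4 = calls.edges_at(4)  # both edges were active at time 4
snapshot_at_6 = calls.edges_at(6)  # only bob -> carol remains
```

Because removals only close an interval and never delete a record, the full audit trail is retained. A linear scan of the log is of course not scalable, which is one reason indexing and storage layout for versioned graphs are research problems in their own right.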