My core area of work is information management in distributed systems. The challenges in information management lie in processing large amounts of dynamic data. For faster processing of dynamic data, we can harness large amounts of distributed processing and/or use efficient algorithms to reduce processing and communication requirements. Since joining IBM Research in June 2000, I have worked on both of these aspects of distributed processing. Specifically, my work includes:
- BRIDGE: This project is aimed at discovering and managing metadata for a big data system. Specifically, we define the entity-based relationship between big data assets as the fraction of common entities between two assets. This type of relationship helps in identifying data assets relevant to an analytics need.
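The entity-overlap relationship above can be sketched as follows. This is a minimal illustration, not BRIDGE's actual implementation: it assumes each asset is represented as a set of entity identifiers, and it instantiates "fraction of common entities" as the Jaccard-style ratio of shared entities to all entities seen in either asset.

```python
def entity_overlap(asset_a, asset_b):
    """Fraction of common entities between two data assets.

    Each asset is represented here as a set of entity identifiers
    (e.g. column-level entities such as 'customer_id', 'email').
    Illustrative sketch only; not the BRIDGE implementation.
    """
    a, b = set(asset_a), set(asset_b)
    if not a or not b:
        return 0.0
    # Common entities relative to all entities appearing in either asset
    return len(a & b) / len(a | b)

# Hypothetical assets sharing two of four distinct entities
sales = {"customer_id", "email", "order_total"}
crm   = {"customer_id", "email", "phone"}
print(entity_overlap(sales, crm))  # 0.5
```

A high overlap score suggests two assets describe the same real-world entities, which is what makes this relationship useful for pointing an analyst at candidate assets for a given analytics task.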
- Massive scale analytics: Designed and developed tooling with which big data ETL and analytics flows can be written in a declarative manner and executed over Hadoop-based systems, e.g. IBM BigInsights. The aim of this tooling has been to make the application developer's job easier by providing/enabling better metadata management, debugging, and maintenance of big data applications. I also worked on enabling massive-scale OLAP analytics of archived data at enterprises. We developed Hadoop-based systems to archive warehouse data and derive business insights from it.
- Efficient execution of continuous queries over distributed data sources: In this work we modeled the dynamics of data items, which helped estimate query execution cost in terms of processing and/or networking cost. The data dynamics model was used to develop algorithms that minimize the number of messages required to satisfy client QoS guarantees during continuous query execution. A client specifies one or both of an incoherency bound and a threshold as query parameters; the client needs the query value only if the incoherency bound is about to be violated or the threshold is likely to be crossed. This property is used to reduce the number of data refresh messages sent to the client. We considered various SQL aggregation queries such as SUM, AVG, MIN, MAX, weighted SUM, portfolio queries, ratio queries, etc.
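The refresh-suppression idea behind the incoherency bound can be illustrated with a toy sketch. The class and parameter names below are hypothetical, not from the original system; it assumes the source tracks the last value pushed to the client and sends a new message only when the current value deviates from it by more than the client's bound.

```python
class IncoherencyFilter:
    """Toy sketch of incoherency-bound-based refresh suppression.

    A refresh is pushed to the client only when the new value deviates
    from the last pushed value by more than the client's incoherency
    bound (the maximum deviation the client tolerates).
    Names are illustrative, not from the original system.
    """

    def __init__(self, bound):
        self.bound = bound
        self.last_pushed = None  # no value delivered yet

    def on_update(self, value):
        """Return True if this update must be pushed to the client."""
        if self.last_pushed is None or abs(value - self.last_pushed) > self.bound:
            self.last_pushed = value
            return True
        return False  # still within the incoherency bound: suppress


# With a bound of 5, only updates drifting more than 5 away from the
# last pushed value generate a refresh message.
f = IncoherencyFilter(bound=5)
pushes = [v for v in [100, 103, 104, 107, 99, 112] if f.on_update(v)]
print(pushes)  # [100, 107, 99, 112]
```

Here six source updates produce only four client messages; a looser bound suppresses more messages, which is the processing/networking trade-off the cost model captures.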
- Automation in service desk: We worked on automating various components of the incident management workflow in the service desk. Service desk tickets contain a combination of structured and unstructured information. Further, related information is also available in other operations and maintenance products, including CPU/memory utilization, log events, other tickets, etc. We developed methods to link unstructured, structured, and streaming information from various sources and present it to service desk personnel in a single portal.
- Simulating networking protocols: My other area of interest has been the modeling and simulation of networking layers and protocols. I modeled the Bluetooth physical layer, TCP/IP over 802.11, and DiffServ techniques as part of this work. Using the simulated Bluetooth physical layer, we analyzed the capacity of Bluetooth systems in various scenarios.
PhD from Indian Institute of Technology (IIT) Bombay in Computer Science and Engineering.
Master of Technology (M Tech) from IIT Delhi in Communication Engineering.
Bachelor of Technology (B Tech) from IIT Kharagpur in Electronics and Electrical Communications Engineering.
Local arrangement chair for SRDS 2010; PC member for APWeb 2009 and CACS 2010; publicity chair for COMAD 2008; DASFAA 2009 (helped organize the conference in Delhi).
Reviewed papers for VLDB 2008-09, WWW 2005-09, ICDE 2008-11, Transactions on Mobile Computing (TMC) 2010, and TKDE 2009.