My interest lies in reasoning under uncertainty for analysis and prediction over heterogeneous and relational data using probabilistic models. The tools and techniques that I use include parametric and non-parametric Bayesian models, which allow the incorporation of prior information and domain knowledge, and probabilistic graphical models, which use factorizations and graph structures for efficient learning and inference. I am also interested in the design of scalable inference and learning algorithms over large data. I have explored a wide range of applications, including entity resolution and deduplication in databases, word-sense disambiguation, text mining, natural language processing, sentiment analysis and opinion mining, social interaction analysis, service mining from legacy software, and system log analysis.
|PhD in Computer Science|University of Maryland, College Park|12/2006|
|MS in Computer Science|University of Maryland, College Park|6/2004|
|BTech in Computer Science|Indian Institute of Technology, Kharagpur|6/1999|
9/2012 - current: Research Scientist at the Business Analytics and Mathematical Sciences Department at IBM India Research Lab, Bangalore.
6/2010 - 8/2012: Assistant Professor at the Department of Computer Science and Automation at the Indian Institute of Science, Bangalore.
4/2007 - 5/2010: Research Scientist at the Information Management group at IBM India Research Lab, New Delhi, working on pattern mining over large heterogeneous and noisy information sources. I investigated research challenges around information integration and data cleansing, clustering of heterogeneous data and cross-domain learning.
1/2003 - 2/2007: Research assistant at the Department of Computer Science, University of Maryland, under Lise Getoor, working on models and algorithms for collectively resolving references to real-world entities in structured and semi-structured domains, such as bibliographic and natural language data. I have designed a relational clustering algorithm that takes domain relationships into account for iteratively clustering references into entities. In addition, I have proposed a probabilistic generative model that looks for hidden group structures among domain entities as evidence for resolving references. I have developed an efficient unsupervised inference algorithm for this model using Gibbs sampling techniques. I have shown that both of these approaches improve performance over attribute-based baselines on multiple real-world and synthetic datasets. In addition to collective resolution over an entire database, I have investigated the problem of query-centric entity resolution. For the related problem of word sense disambiguation using multiple languages, I have developed generative models for bilingual corpora and have shown that they outperform existing sense disambiguation approaches on real datasets.
6/2002 - 12/2002: Research assistant at the Graphics Lab, University of Maryland, working on faster rendering techniques and compact representations for 3D models making use of local similarity in datasets.
6/2001 - 8/2001: Research intern at Virtio Corporation, Campbell, California, working on the design and implementation of a translator for Virtio's virtual prototyping language to SystemC.
6/1999 - 5/2000: Project officer at the Department of Computer Science and Engineering, IIT Kharagpur in the National Semiconductor Corporation funded "Virtual Silicon" project, working on verification techniques for the SDL-C prototyping language.
8/1998 - 5/1999: Undergraduate researcher at the Department of Computer Science and Engineering, IIT Kharagpur, working on "Similarity Retrieval from Image Databases" by rank-ordering images in a database with respect to the spatial and topological relations existing between objects. I focussed on developing fuzzy similarity measures to deal with uncertainty/vagueness in images.
- PROGRAM COMMITTEE CO-CHAIR, ACM I-KDD Conference on Data Sciences (CODS), 2015
- PEER REVIEWER FOR FACULTY MEMBER EVALUATION, IIT Bombay, 2014
- GUEST EDITOR for the Journal Track of the European Conference on Machine Learning (ECML-PKDD), 2012 - current
- SENIOR PROGRAM COMMITTEE MEMBER, International Joint Conference on Artificial Intelligence (IJCAI), 2013
- PROGRAM COMMITTEE MEMBER (since 2009)
– Conference on Very Large Databases (VLDB), Industrial Track, 2015
– European Conference on Machine Learning (ECML-PKDD), 2009-2015
– IEEE International Conference on Data Mining (ICDM), 2009-2015
– Annual Meeting of the Association for Computational Linguistics (ACL), 2012
– Conference on Artificial Intelligence (AAAI), 2011
– International Joint Conference on Artificial Intelligence (IJCAI), 2009
– International Conference on Machine Learning (ICML), 2009
- JOURNAL REVIEWER for Journal of Machine Learning Research, IEEE Transactions on Knowledge and Data Engineering, ACM Transactions on Database Systems, IEEE Transactions on Systems, Man, and Cybernetics, IEEE Transactions on Neural Networks, ACM Journal on Data and Information Quality, Pattern Analysis and Applications
- CO-ORGANIZER of the Workshop on Collective Learning and Inference on Structured Data, in conjunction with the European Conference on Machine Learning (ECML-PKDD), 2011
- CO-ORGANIZER of the Workshop on Text Mining, in conjunction with the International Conference on Pattern Recognition and Machine Intelligence (PReMI), 2009
- THESIS COMMITTEE MEMBER
– Ajay Nagesh, PhD, IIT Bombay and Monash University, Australia, June 2015.
– Varish Mulwad, PhD, University of Maryland, Baltimore County, USA, January 2015.
– S. Shivashankar, MSc, IIT Madras, October 2011.
– Rahul Gupta, PhD, School of Information Technology, IIT Bombay, March 2011.