Research Interest
My interest lies in reasoning under uncertainty for analysis and prediction over heterogeneous and relational data using probabilistic models. Tools and techniques that I use include parametric and non-parametric Bayesian models that allow incorporation of prior information and domain knowledge, and probabilistic graphical models that use factorizations and graph structures for efficient learning and inference. I am also interested in the design of scalable inference and learning algorithms over large data. More recently, I have been investigating the problem of learning from weak or indirect supervision, such as interactive learning, partially relevant supervision, and the use of various forms of background knowledge. I have explored a wide range of applications, including entity resolution and deduplication in databases, word-sense disambiguation, text mining, sentiment analysis and opinion mining, social media analysis, service mining from legacy software, and system log analysis.
| PhD in Computer Science | University of Maryland, College Park | 12/2006 |
| MS in Computer Science | University of Maryland, College Park | 6/2004 |
| BTech in Computer Science | Indian Institute of Technology, Kharagpur | 6/1999 |
Affiliations
9/2012 - current: Research Scientist at the Business Analytics and Mathematical Sciences Department at IBM India Research Lab, Bangalore.
6/2010 - 8/2012: Assistant Professor at the Department of Computer Science and Automation at the Indian Institute of Science, Bangalore.
4/2007 - 5/2010: Research Scientist at the Information Management group at IBM India Research Lab, New Delhi, working on pattern mining over large heterogeneous and noisy information sources. I investigated research challenges around information integration and data cleansing, clustering of heterogeneous data and cross-domain learning.
1/2003 - 02/2007: Research assistant at the Department of Computer Science, University of Maryland, under Lise Getoor, working on models and algorithms for collectively resolving references to real-world entities in structured and semi-structured domains, such as bibliographic and natural language data. I designed a relational clustering algorithm that takes domain relationships into account when iteratively clustering references into entities. In addition, I proposed a probabilistic generative model that looks for hidden group structures among domain entities as evidence for resolving references, and developed an efficient unsupervised inference algorithm for this model using Gibbs sampling. I showed that both approaches improve performance over attribute-based baselines on multiple real-world and synthetic datasets. In addition to collective resolution over an entire database, I investigated the problem of query-centric entity resolution. For the related problem of word sense disambiguation using multiple languages, I developed generative models for bilingual corpora and showed that they outperform existing sense disambiguation approaches on real datasets.
6/2002 - 12/2002: Research assistant at the Graphics Lab, University of Maryland, working on faster rendering techniques and compact representations for 3D models making use of local similarity in datasets.
6/2001 - 8/2001: Research intern at Virtio Corporation, Campbell, California, working on the design and implementation of a translator for Virtio's virtual prototyping language to SystemC.
6/1999 - 5/2000: Project officer at the Department of Computer Science and Engineering, IIT Kharagpur in the National Semiconductor Corporation funded "Virtual Silicon" project, working on verification techniques for the SDL-C prototyping language.
8/1998 - 5/1999: Undergraduate researcher at the Department of Computer Science and Engineering, IIT Kharagpur, working on "Similarity Retrieval from Image Databases" by rank-ordering images in a database with respect to the spatial and topological relations existing between objects. I focussed on developing fuzzy similarity measures to deal with uncertainty/vagueness in images.
- PROGRAM COMMITTEE MEMBER
- European Conference on Machine Learning (ECML-PKDD), 2011
- Conference on Artificial Intelligence (AAAI), 2011
- The IEEE International Conference on Data Mining (ICDM), 2010
- Workshop on Analytics for Noisy Unstructured Text Data (AND), 2010
- The IEEE International Conference on Data Mining (ICDM), 2009
- The International Joint Conference on Artificial Intelligence (IJCAI), 2009
- The International Conference on Machine Learning (ICML), 2009
- The ACM WebKDD Workshop, 2008
- The Annual Meeting of the Association for Computational Linguistics (ACL-HLT), 2008
- The ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD), 2008
- The Twelfth International Conference on Database Systems for Advanced Applications (DASFAA) (Demonstrations Track), 2008
- The IEEE International Conference on Data Mining (ICDM), 2007
- The Twenty-Second Conference on Artificial Intelligence (AAAI), 2007
- CONFERENCE REVIEWER (Recent)
- Conference on Neural Information Processing Systems (NIPS), 2011
- IEEE International Conference on Data Mining (ICDM), 2011
- JOURNAL REVIEWER
- IEEE Transactions on Knowledge and Data Engineering, ACM Transactions on Database Systems, IEEE Transactions on Systems, Man, and Cybernetics, IEEE Transactions on Neural Networks, ACM Journal on Data and Information Quality, Pattern Analysis and Applications