Contact Information

Indrajit Bhattacharya
Research Scientist, Business Analytics and Mathematical Sciences
India Research Laboratory, Bangalore, India
indrajitb@in.ibm.com      +91-97391-88764


Research Interest

My interest lies in reasoning under uncertainty for analysis and prediction over heterogeneous and relational data using probabilistic models. Tools and techniques that I use include parametric and non-parametric Bayesian models that allow incorporation of prior information and domain knowledge, and probabilistic graphical models that use factorizations and graph structures for efficient learning and inference. I am also interested in the design of scalable inference and learning algorithms over large data. More recently, I have been investigating the problem of learning from weak or indirect supervision, such as interactive learning, partially relevant supervision and the use of various forms of background knowledge. I have explored a wide range of applications, including entity resolution and deduplication in databases, word-sense disambiguation, text mining, sentiment analysis and opinion mining, social media analysis, service mining from legacy software and system log analysis.

Education

PhD in Computer Science, University of Maryland, College Park, 12/2006
MS in Computer Science, University of Maryland, College Park, 6/2004
BTech in Computer Science, Indian Institute of Technology, Kharagpur, 6/1999

Affiliations

9/2012 - current: Research Scientist at the Business Analytics and Mathematical Sciences Department at IBM India Research Lab, Bangalore.

6/2010 - 8/2012: Assistant Professor at the Department of Computer Science and Automation at the Indian Institute of Science, Bangalore.

4/2007 - 5/2010: Research Scientist at the Information Management group at IBM India Research Lab, New Delhi, working on pattern mining over large heterogeneous and noisy information sources. I investigated research challenges around information integration and data cleansing, clustering of heterogeneous data and cross-domain learning.

1/2003 - 2/2007: Research assistant at the Department of Computer Science, University of Maryland, under Lise Getoor, working on models and algorithms for collectively resolving references to real-world entities in structured and semi-structured domains, such as bibliographic and natural language data. I designed a relational clustering algorithm that takes domain relationships into account for iteratively clustering references into entities. In addition, I proposed a probabilistic generative model that looks for hidden group structures among domain entities as evidence for resolving references. I developed efficient unsupervised inference algorithms for this model using Gibbs sampling techniques. I showed that both of these approaches improve performance over attribute-based baselines on multiple real-world and synthetic datasets. In addition to collective resolution over an entire database, I investigated the problem of query-centric entity resolution. For the related problem of word sense disambiguation using multiple languages, I developed generative models for bilingual corpora and showed that they outperform existing sense disambiguation approaches on real datasets.

6/2002 - 12/2002: Research assistant at the Graphics Lab, University of Maryland, working on faster rendering techniques and compact representations for 3D models making use of local similarity in datasets.

6/2001 - 8/2001: Research intern at Virtio Corporation, Campbell, California, working on the design and implementation of a translator for Virtio's virtual prototyping language to SystemC.

6/1999 - 5/2000: Project officer at the Department of Computer Science and Engineering, IIT Kharagpur, on the "Virtual Silicon" project funded by National Semiconductor Corporation, working on verification techniques for the SDL-C prototyping language.

8/1998 - 5/1999: Undergraduate researcher at the Department of Computer Science and Engineering, IIT Kharagpur, working on "Similarity Retrieval from Image Databases" by rank-ordering images in a database with respect to the spatial and topological relations existing between objects. I focussed on developing fuzzy similarity measures to deal with uncertainty/vagueness in images.

Professional Activities