Amrita Saha

contact information

Research Scientist
India Research Laboratory, Bangalore, India




Research Interest

Amrita Saha has been a Research Scientist at IBM Research since 2012, where she has worked on research problems at the intersection of language and vision. She develops machine learning and deep representation learning techniques for learning to answer questions, converse, or even debate over a mix of structured and unstructured multi-modal data. Her work at IBM began with a futuristic Grand Challenge project on building an artificial Debater, moved on to industry-impacting research on bringing cognitive computing technologies to the world of fashion and, more recently, to leading a broader research agenda on building interactive multimodal AI capabilities in collaboration with top-tier universities. She obtained her Master's degree in Computer Science from the Indian Institute of Technology Bombay, India, in 2012, prior to joining IBM Research. Before her Master's degree, she worked on coding theory for Wireless Sensor Network Security during her Bachelor's course in Information Technology. Over the years she has published in reputed conferences and journals such as TACL, ACL, AAAI, SDM, NeurIPS, and COLING, and has organized workshops or served as a PC member at some of them. She has also filed multiple US patents during her time at IBM Research.

Complete Resume

Research Statement 


Academic Research Collaborations at IBM [2017-Onwards]

IBM AI Horizons: (In collaboration with Prof. Soumen Chakrabarti, Indian Institute of Technology Bombay, India) [2017-Onwards] I led a small team of IBM researchers and mentored external students to drive this collaborative work on the following research problems:

Complex Program Induction for KB based Question Answering [2018]: Proposed and led the very first attempt at training a model, Complex Imperative Program Induction from Terminal Rewards (CIPITR), to answer complex questions from different Complex KBQA datasets, where questions may require up to 12 steps of reasoning. Most importantly, the model was trained with only question-answer pairs as weak supervision: it learned to induce programs from oracle query annotations, without the need for any oracle program during training.

The next step was to explore more challenging settings for training a noise-resilient version of CIPITR, where the query annotations are not oracle annotations but predictions of an unsupervised query annotator. These predictions are so noisy that only 10-20% of the data is left answerable.

Complex KB Embeddings [2017]: Led the following research on improved KB embeddings:

  1) Learning complex structured representations for Knowledge Bases using hyper-rectangular/subspace embeddings and regularization constraints based on structured sparsity. The objective was to obtain more interpretable and semantically aware embeddings for KB artifacts, and representations of functional operations over KB subgraphs.
  2) Joint representation learning of KB artifacts and an unstructured corpus for fine-type tagging tasks. Here we explored structured (hierarchical) attention and learnt dual (global and context-specific) representations of KB artifacts.

IBM Open Science Collaboration: Neural Models and Datasets for Interactive AI (In collaboration with Prof. Mitesh M. Khapra, Indian Institute of Technology Madras, India) [2016-17] I proposed different interactive AI problems on the following tasks, led a small team of IBM Research engineers, and mentored external students towards achieving the objective.

  • Complex Conversational KB based Question Answering
  • Domain Specific Multimodal (Visual) Conversation
  • Question Answering through Paraphrased Reading Comprehension
  • Learning Disentangled Multimodal Representations for the Fashion Domain
  • A Correlational Encoder Decoder Architecture for Pivot Based Sequence Generation

The objective of this research was to understand the limitations of the benchmarks the community has established for the different interactive tasks, and thereby to construct novel and useful datasets that explore aspects and complexities of each task that contemporary datasets lack or contemporary models cannot handle. Beyond datasets, as part of building towards some of these complex models, I also proposed and led work on different paradigms of common representation learning across multimodal and multilingual data.


IBM Debating Technology: A Grand Challenge [2013-2016]

At IBM Research, I have been an integral part of the core team developing a Computational Argumentation Framework for machines to argue and debate with humans over any open-ended topic of controversy. As the lead for Stance (Pro/Con) analysis at the India Research Lab, working on the grand challenge since its inception, I owned the module for Topic-Based Stance (Pro/Con) Classification of arguments in open-domain debates. More specifically, I led the work on:

  • Identifying free-form topics in short claim sentences and long evidence passages, and identifying the semantic relations between these open-domain topics
  • Modeling a Bayesian non-parametric solution for learning a knowledge graph of semantic relations (consistent or contrastive) between open-domain concepts appearing in corpora such as Wikipedia, in a semi-supervised or unsupervised setting


IBM Visual Linguist: A Picture is worth a Thousand Words [2014 - 2015]

This was a far-reaching research project proposed by a very small (3-member) team that worked at a stretch to shape it. The goal was to understand open-domain images and generate a natural language caption crisply describing each one. As an integral part of that team, I owned the following modules:

  • Led the work on an inference framework, based on a probabilistic graphical model, for taxonomy-grounded aggregation of scores from multiple classifiers pre-trained with different label sets
  • Implemented various deep learning modules for image understanding, multimodal representation learning, language models, and corpus co-occurrence based action/attribute prediction in images
  • Built a Visual Search application for e-commerce (especially fashion) using the above image-understanding system. The application was shortlisted in the top 9 out of over 60 submissions for the IBM Cognitive Hackathon, 2015, and later culminated in the project IBM Cognitive Fashion


IBM Cognitive Fashion [2015 - 2016]

This project aims to bring a plethora of cognitive computing technologies (machine learning, deep learning, image and natural language understanding and generation, etc.) to the fashion world, leveraging the vast amounts of structured and unstructured fashion data available there. Again working in a very small team of 3 members, I proposed and owned several modules on multimodal question answering, dialogue systems, recommender systems, representation learning, and cross-domain retrieval. I also:

  • Co-organized a workshop, “Machine Learning Meets Fashion”, at the Knowledge Discovery and Data Mining (KDD) conference, 2016
  • Modeled a deep learning architecture, Fashion2Vec, for learning joint multi-modal representations for cross-modal search in e-commerce in the absence of catalogs
  • Modeled multimodal dialogue systems enriched by structured sources such as knowledge bases and catalog data, and by unstructured sources such as free-form product descriptions
  • Built interactive demos of applications such as multi-modal dialogue systems, cross-modal retrieval, and visual search for different clients in the fashion/jewelry/e-commerce domain


Positions of Organizational Responsibility at IBM Research

  • Co-organized a first-of-a-kind workshop, Machine Learning Meets Fashion, at the Knowledge Discovery and Data Mining (KDD) conference, 2016
  • Lead organizer of the first workshop on Linguistics Meets Image and Video Retrieval at the International Conference on Computer Vision (ICCV), 2019
  • Served as a PC member and on organizing and reviewing committees for workshops at several internationally acclaimed conferences such as KDD and VLDB
  • Organized the regular department meetings at IBM Research Lab for the Cognitive Technology and Services Team, for over two years
  • Handled miscellaneous responsibilities over the last few years: hiring researchers and research software engineers, mentoring undergraduate students from various universities interning at IBM Research Lab, and training annotators to collect relevant data for various text/image analytics applications