Scalable Knowledge Intelligence - Projects

Main Research Projects:

Semantic PDF Understanding

PDF understanding is a general AI problem. In this project, we aim to provide an out-of-the-box solution that programmatically converts PDF documents into a common target representation (e.g., HTML with extensions) for downstream applications. The main research challenges are (1) maintaining formatting, reading order, and structural metadata (e.g., lists, headings, fonts) and (2) capturing non-textual content (e.g., tabular content).

Declarative Text Understanding for the Enterprise

SystemT, a declarative text understanding system for the enterprise, was designed and developed to address the pressing requirements of powering AI applications for business. It is based on the basic principle underlying relational database technology: complete separation of specification from execution. SystemT uses a declarative rule language, AQL, and an optimizer that generates high-performance algebraic execution plans for AQL algorithms. It makes text understanding orders of magnitude more scalable and easier to use, maintain, and customize.
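
The separation of specification from execution can be illustrated with a minimal Python sketch. This is not AQL and not SystemT's optimizer: the `Rule` class, the `compile_plan` and `execute` functions, and the two example patterns are all hypothetical names chosen for illustration. The rules only state what to extract; the toy "optimizer" decides how, here by merging every rule into one regular expression so the text is scanned once.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Declarative specification: what to extract, not how to extract it."""
    name: str      # must be a valid regex group name
    pattern: str   # pattern without capturing groups of its own


def compile_plan(rules):
    """Toy 'optimizer': merge all rules into one regex with named groups,
    so the executor needs a single pass over the text."""
    combined = "|".join(f"(?P<{r.name}>{r.pattern})" for r in rules)
    return re.compile(combined)


def execute(plan, text):
    """Run the compiled plan and return (rule name, matched text) pairs."""
    return [(m.lastgroup, m.group()) for m in plan.finditer(text)]


# Hypothetical rule set, written without any reference to execution details.
RULES = [
    Rule("phone", r"\d{3}-\d{4}"),
    Rule("email", r"[\w.]+@[\w.]+\.\w+"),
]

plan = compile_plan(RULES)
spans = execute(plan, "Call 555-1234 or write to ana@example.com.")
print(spans)
```

Because the rules never mention how matching is performed, the planner could be swapped for a different strategy without touching the rule set, which is the point of the design.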


Human-in-the-Loop Entity Resolution and Structured Knowledge Curation

Entity resolution is the ability to link pieces of semantically related information across large amounts of data. Related, equally important operations include entity attribute normalization, data transformation, and entity fusion, all used to build entity-centric views from the underlying data. The two centerpieces of this project are (1) a high-level language (called HIL) used as a scalable and explainable representation platform for entity-centric operations, and (2) interactive learning techniques (e.g., based on active learning) for mapping the domain expertise of a human into high-accuracy models or artifacts (e.g., in HIL).
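
As a rough illustration of the operations named above (attribute normalization, linking, and fusion), here is a minimal Python sketch. It is not HIL, and the function names, the suffix list, and the sample records are assumptions made for the example. Exact matching on a normalized name stands in for the learned matching models the project targets.

```python
from collections import defaultdict

# Assumption: a small, hand-picked list of company suffixes to ignore.
SUFFIXES = {"inc", "corp", "corporation", "co"}


def normalize(name):
    """Attribute normalization: lowercase, drop punctuation and suffixes."""
    cleaned = "".join(c if c.isalnum() else " " for c in name.lower())
    return " ".join(t for t in cleaned.split() if t not in SUFFIXES)


def resolve(records):
    """Link records that share a normalized name, then fuse each group
    into a single entity-centric view."""
    groups = defaultdict(list)
    for rec in records:
        groups[normalize(rec["name"])].append(rec)

    entities = []
    for key, recs in groups.items():
        fused = {"key": key, "names": sorted({r["name"] for r in recs})}
        # Entity fusion: keep the first non-empty value seen per attribute.
        for rec in recs:
            for attr, value in rec.items():
                if attr != "name" and value and attr not in fused:
                    fused[attr] = value
        entities.append(fused)
    return entities


records = [
    {"name": "Acme Corp.", "city": "Austin"},
    {"name": "ACME Corporation", "phone": "555-0100"},
    {"name": "Globex Inc", "city": "Springfield"},
]
entities = resolve(records)
print(entities)
```

Each fused entity keeps the list of original names that were linked, so the result stays explainable: a reader can see which records contributed to which view.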


SystemML: Scalable Machine Learning

SystemML is a declarative large-scale machine learning system. It aims at flexible specification of ML algorithms and automatic generation of hybrid runtime plans ranging from single-node, in-memory computations to distributed computations on MapReduce or Spark. It is available as Apache SystemML and is designed for machine learning on big data. It can run on top of Apache Spark, where it scales automatically with your data, determining line by line whether your code should run on the driver or on an Apache Spark cluster.
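
The idea of a hybrid runtime plan can be sketched in a few lines of Python. This is a toy heuristic under stated assumptions, not SystemML's actual cost model or API: the `Op` class, the `plan` function, the dense 8-bytes-per-cell estimate, and the 2 GiB driver budget are all hypothetical. Each operation is assigned to a backend depending on whether its estimated input size fits in the driver's memory.

```python
from dataclasses import dataclass

BYTES_PER_CELL = 8  # assumption: dense double-precision matrices


@dataclass(frozen=True)
class Op:
    """One line of a script, with the (rows, cols) shapes of its inputs."""
    name: str            # illustrative label only, never parsed
    input_shapes: tuple  # ((rows, cols), ...)


def estimate_bytes(op):
    """Estimate the memory needed to hold all inputs of one operation."""
    return sum(rows * cols * BYTES_PER_CELL for rows, cols in op.input_shapes)


def plan(ops, driver_budget_bytes):
    """Assign each operation to a backend, line by line."""
    return [
        (op.name,
         "single_node" if estimate_bytes(op) <= driver_budget_bytes
         else "distributed")
        for op in ops
    ]


# Hypothetical two-line script: one large product, one small solve.
SCRIPT = [
    Op("t(X) %*% y", ((100_000_000, 100), (100_000_000, 1))),
    Op("solve(A, b)", ((100, 100), (100, 1))),
]

hybrid_plan = plan(SCRIPT, driver_budget_bytes=2 * 1024**3)
print(hybrid_plan)
```

The large product exceeds the assumed budget and is marked as distributed, while the small solve stays on a single node, so one script yields a mixed plan.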


Content Services: Building and Interacting with Large-Scale Domain-Specific Knowledge Bases

The ability to create and interact with large-scale domain-specific knowledge bases from unstructured and semi-structured data is the foundation for many industry-focused AI systems. Content Services aims to provide cloud services for creating and querying high-quality domain-specific knowledge bases by analyzing and integrating multiple unstructured and semi-structured content sources. We also work with domain experts to build instantiations of the system for different domains (e.g., finance).


LUSTRE: Learning Useful Structured Representations of Entities

Many data analysis and data integration applications need to account for multiple representations of entities. The variations in entity mentions arise in complex ways that are hard to capture using a textual similarity function. More sophisticated functions require knowledge of the underlying structure in the presentation of entities. We have built LUSTRE, an active-learning-based system that can learn the structured representations of entities interactively from a few labels. In the background, it automatically generates programs to map entity mentions to their representations and to standardize them to a unique representation.
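
To make the mapping from mentions to a structured representation concrete, here is a minimal Python sketch for person names. It is not LUSTRE's generated code: the two patterns below are written by hand and stand in for the programs the system would learn from labels, and the `parse` and `standardize` functions and the target format are assumptions for illustration.

```python
import re

# Hand-written stand-ins for learned programs: each pattern exposes the
# structure (first, last) hidden in one style of mention.
PATTERNS = [
    # "Smith, John" or "Smith, J."
    re.compile(r"^(?P<last>[A-Za-z]+),\s*(?P<first>[A-Za-z]+)\.?$"),
    # "John Smith" or "J. Smith"
    re.compile(r"^(?P<first>[A-Za-z]+)\.?\s+(?P<last>[A-Za-z]+)$"),
]


def parse(mention):
    """Map a mention to a structured representation, or None if no
    pattern applies."""
    for pattern in PATTERNS:
        match = pattern.match(mention.strip())
        if match:
            return {"first": match.group("first"), "last": match.group("last")}
    return None


def standardize(mention):
    """Render the structured representation in one unique form:
    upper-case last name, comma, first initial."""
    rep = parse(mention)
    if rep is None:
        return None
    return f"{rep['last'].upper()}, {rep['first'][0].upper()}."


variants = ["John Smith", "Smith, John", "J. Smith", "Smith, J."]
standardized = [standardize(v) for v in variants]
print(standardized)
```

All four variants collapse to the same standardized string even though a plain textual similarity function would score "John Smith" and "Smith, J." as quite different.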



2021 - IBM Outstanding Research Accomplishment Award (upgrade): "Research Contributions to Watson NLP Stack"

2021 - IBM Research Accomplishment Award: "Research Contributions to Watson OneConversion"

2020 - IBM Research Accomplishment Award: "Research Contributions to Watson NLP Stack"

2020 - IBM Research Accomplishment Award: "Deep Thinking Question Answering"

2020 - IBM Special Division Accomplishment Award: "Research Contributions to the IBM COVID-19 Technology Taskforce"

2020 - ISWC Best Poster/Demo Award

2020 - Alonzo Church Award for Outstanding Contributions to Logic and Computation

2019 - IBM Research Accomplishment Award: "Expanded Shallow Semantic Parsing and its Transfer to Watson Products"

2019 - IBM Research Accomplishment Award: "Research Contributions to Document Understanding (Document Conversion, Compare and Table Understanding)"

2019 - IBM Research Accomplishment Award: "IBM Services Solution Advisor and Cognitive Document Risk Analyzer"

2019 - AKBC Best Application Paper Award

2019 - YMCA Silicon Valley Tribute to Women Award

2018 - IBM Research Accomplishment Award: "Foundations for Inverses of Schema Mappings and Applications to Schema Evolution Management"

2018 - IBM Research Accomplishment Award: "Declarative Machine/Deep Learning (SystemML)"

2018 - NAACL Test-of-Time Award

2017 - IBM Research Accomplishment Award: "High-Level Language for Entity Resolution and Integration"

2016 - IBM Research Accomplishment Award

2016 - VLDB Best Paper Award

2015 - ICDT Best Paper Award

2014 - ACM PODS Test-of-Time Award

2013 - IBM Research Outstanding Technical Accomplishment Award

2013 - IBM Research Accomplishment Award

2013 - ICDT Test-of-Time Award

2010 - IBM Research Accomplishment Award

2008 - IBM Research Accomplishment Award