SystemT - overview
New!
- Inside Big Data covers our recent work on HEIDL, describing how we are exploring ways for AI to make tedious tasks like contract review easier, faster, and more accurate
- HEIDL is covered in this blog post, where IBM Research details several techniques that might improve natural language processing in the enterprise domain
- SystemT KDD'19 Hands-on Tutorial [here]
- SystemT Online Course: [here] (Updated version is now live!)
- Blog post on Learning "Chinese Soundex"
- IBM SystemT is now in Dan Jurafsky’s 3rd edition of Speech and Language Processing
Highlights
- State-of-the-art AQL language for expressing NLP algorithms, an optimizer and runtime engine for execution at scale, and an easy-to-use user interface [demo]
- Publications in top NLP, database systems, hardware and HCI conferences
- Winner of multiple IBM Corporate Awards for its contributions to IBM products and clients
- Currently taught at multiple universities
- SystemT explained in 5 minutes
- We are publicly releasing Version 1.0 of the Universal Proposition Banks for multilingual semantic role labeling!
SystemT
Information extraction (IE) refers to the task of extracting structured information from unstructured or semi-structured data. In recent years, IE has become increasingly important to a wide array of enterprise applications, ranging from Business Intelligence to Data-as-a-Service. Such applications drive the following main requirements for IE systems: accuracy, productivity, scalability, expressivity, transparency, and customizability.
SystemT, a declarative IE system, has been designed and developed to address these requirements. It is based on the basic principle underlying relational database technology: complete separation of specification from execution. SystemT uses a declarative rule language, AQL, and an optimizer that generates high-performance algebraic execution plans for AQL rules. It makes IE orders of magnitude more scalable and easier to use, maintain, and customize.
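The principle of separating specification from execution can be illustrated with a minimal Python sketch. This is not AQL and not SystemT's API: the `FollowsRule` class, the two plan functions, and the sample text are all hypothetical names made up for illustration. The rule only states what to extract (a left match followed by a right match within a character gap), and two interchangeable plans decide how to compute it.

```python
import re
from bisect import bisect_left, bisect_right
from dataclasses import dataclass
from typing import Callable, List, Tuple

Span = Tuple[int, int]  # (begin, end) character offsets into the text


@dataclass(frozen=True)
class FollowsRule:
    """Hypothetical declarative spec: `left` followed by `right` within `max_gap` chars."""
    left: str
    right: str
    max_gap: int


def find_spans(pattern: str, text: str) -> List[Span]:
    # re.finditer yields non-overlapping matches in left-to-right order.
    return [m.span() for m in re.finditer(pattern, text)]


def plan_nested_loop(rule: FollowsRule, text: str) -> List[Tuple[Span, Span]]:
    """Naive plan: compare every left span against every right span."""
    lefts = find_spans(rule.left, text)
    rights = find_spans(rule.right, text)
    return sorted(
        (l, r) for l in lefts for r in rights
        if 0 <= r[0] - l[1] <= rule.max_gap
    )


def plan_sorted_probe(rule: FollowsRule, text: str) -> List[Tuple[Span, Span]]:
    """Alternative plan: binary-search the right spans by begin offset."""
    lefts = find_spans(rule.left, text)
    rights = find_spans(rule.right, text)  # already ordered by begin offset
    begins = [r[0] for r in rights]
    out = []
    for l in lefts:
        lo = bisect_left(begins, l[1])                   # first right.begin >= left.end
        hi = bisect_right(begins, l[1] + rule.max_gap)   # past last right.begin within gap
        out.extend((l, r) for r in rights[lo:hi])
    return sorted(out)


Plan = Callable[[FollowsRule, str], List[Tuple[Span, Span]]]


def run(rule: FollowsRule, text: str, plan: Plan) -> List[Tuple[str, str]]:
    """Execute a rule with the chosen plan and return the matched substrings."""
    return [(text[l[0]:l[1]], text[r[0]:r[1]]) for l, r in plan(rule, text)]


if __name__ == "__main__":
    text = "Call Alice at 555-1234 or Bob at 555-9876."
    rule = FollowsRule(left=r"[A-Z][a-z]+", right=r"\d{3}-\d{4}", max_gap=5)
    for plan in (plan_nested_loop, plan_sorted_probe):
        print(plan.__name__, run(rule, text, plan))
```

Because the rule never mentions loops or indexes, an optimizer is free to swap one plan for the other without changing the results. A real optimizer would choose among many more plans using cost estimates, which this sketch does not attempt.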
SystemT ships today with multiple products across 4 IBM Software Brands. Furthermore, SystemT is used in multiple ongoing research projects and is taught at universities. Our ongoing research and development efforts focus on making SystemT more usable for both technical and business users, and on continuing to enhance its core functionality based on natural language processing, machine learning, and database technology.
Awards
2020 - IBM Research Accomplishment Award: "Research Contributions to Watson NLP Stack"
2020 - IBM Research Accomplishment Award: "Deep Thinking Question Answering"
2020 - IBM Special Division Accomplishment Award: "Research Contributions to the IBM COVID-19 Technology Taskforce"
2020 - ISWC Best Poster/Demo Award
2019 - IBM Research A-Level Accomplishment Award: "Expanded Shallow Semantic Parsing and its Transfer to Watson Products"
2019 - IBM Research A-Level Accomplishment Award: "Research Contributions to Document Understanding (Document Conversion, Compare and Table Understanding)"
2019 - IBM Research A-Level Accomplishment Award: "IBM Services Solution Advisor and Cognitive Document Risk Analyzer"
2019 - AKBC Best Application Paper Award
2018 - NAACL Test-of-Time Award
2014 - IBM Corporate Award
2013 - IBM Research Outstanding Technical Accomplishment Award
2008, 2010, 2013 - IBM Research A-Level Accomplishment Award
Recent News
8/2/19
Inside Big Data covers our recent work on HEIDL, describing how we are exploring ways for AI to make tedious tasks like contract review easier, faster, and more accurate. Read it here
8/1/19
HEIDL is covered in this blog post, where IBM Research details several techniques that might improve natural language processing in the enterprise domain
5/9/19
Research paper "NormCo: Deep Disease Normalization for Biomedical Knowledge Base Construction" received the Best Application Paper Award at AKBC 2019
12/13/18
Yunyao gave a talk on Building Domain-Specific Knowledge with Human in the Loop at the Robust Machine Learning Algorithms and Systems: Detection & Mitigation of Adversarial Attacks and Anomalies Workshop, National Academies
11/7/18
Yunyao gave a talk on Building Domain-Specific Knowledge with Human in the Loop at the University of Michigan AI Lab
07/27/18
Research paper "DIMSIM: An Accurate Chinese Phonetic Similarity Algorithm based on Learned High Dimensional Encoding" is accepted at CoNLL 2018 (IBM Research Blog Post).
05/16/18
Research paper "Exploiting Structure in Representation of Named Entities using Active Learning" is accepted at COLING 2018.
05/05/18
Officially joined NSF Center for Big Learning as an Industry Partner.
04/16/18
Demoed LUSTRE, an interactive system for entity understanding and standardization, at ICDE 2018
04/05/18
Hosted Stanford professor Mark Musen's visit to IBM Research - Almaden
03/26/18
Industry track paper on the design and implementation of SystemT is accepted at NAACL-HLT 2018 Industry Track (the very first industry track at a major NLP conference).
10/04/17
Hosted Univ. of Washington professor Luke Zettlemoyer's visit to IBM Research - Almaden
10/02/17
Yunyao is co-chairing the very first NAACL-HLT Industry Track
08/29/17
Demo paper on Creating and Interacting with Large-Scale Domain-Specific Knowledge Bases is presented at VLDB 2017 [video] [poster]
08/06/17
Research paper on Distant Meta-Path Similarities for Text-Based Heterogeneous Information Networks is accepted at CIKM 2017
06/30/17
Research paper on Crowd-in-the-loop: A Hybrid Approach for Annotating Semantic Roles is accepted at EMNLP 2017
06/08/17
Hosted Stanford professor Dan Jurafsky's visit to IBM Research - Almaden
05/31/17
Research paper on Hardware Compilation Framework for Text Analytics Queries is accepted to the Journal of Parallel and Distributed Computing (JPDC)
05/16/17
SEER, a system for learning extractors from examples, presented at CHI and SIGMOD 2017 [video] [paper]
05/16/17
Workshop paper on understanding relationships in the financial domain presented at DSMM 2017 [paper]