SystemT - overview


We are publicly releasing Version 1.0 of the Universal Proposition Banks for multilingual semantic role labeling!

SystemT Online Classes: [here]

Hiring Now!

Multiple positions available. Email your resume to Yunyao Li: yunyaoli{at} us{.}ibm{.}com



  • State-of-the-art AQL language for expressing NLP algorithms, an optimizer and runtime engine for execution at scale, and an easy-to-use user interface [demo]
  • Publications in top NLP, database systems, hardware and HCI conferences
  • Winner of multiple IBM Corporate Awards for its contributions to IBM products and clients
  • Currently taught in multiple universities
  • SystemT explained in 5 minutes



Information extraction (IE) refers to the task of extracting structured information from unstructured or semi-structured data. In recent years, IE has become increasingly important to a wide array of enterprise applications, ranging from Business Intelligence to Data-as-a-Service. Such applications drive the following main requirements for IE systems: accuracy, productivity, scalability, expressivity, transparency, and customizability.

SystemT, a declarative IE system, has been designed and developed to address these requirements. It is based on the basic principle underlying relational database technology: complete separation of specification from execution. SystemT uses a declarative rule language, AQL, and an optimizer that generates high-performance algebraic execution plans for AQL rules. It makes IE orders of magnitude more scalable and easier to use, maintain, and customize.
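The idea of separating specification from execution can be illustrated with a small Python sketch. This is not AQL and not SystemT's optimizer; every name in it (SPEC, plan_nested_loop, plan_indexed, execute, the dictionary and pattern) is hypothetical and chosen only for illustration. The rule is stated once as data, and the executor is free to pick between two evaluation plans that return the same result.

```python
import re
from bisect import bisect_left, bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    begin: int
    end: int
    text: str


# Specification: WHAT to extract (a name followed closely by a phone number).
# All values here are hypothetical illustration data, not SystemT artifacts.
SPEC = {
    "name_dictionary": ["Alice", "Bob"],
    "phone_pattern": r"\d{3}-\d{4}",
    "max_gap": 10,  # phone must start 0..10 characters after the name ends
}


def find_spans(pattern, doc):
    """Return every regex match in doc as a Span, in document order."""
    return [Span(m.start(), m.end(), m.group()) for m in re.finditer(pattern, doc)]


def plan_nested_loop(names, phones, max_gap):
    """Plan A: compare every name against every phone."""
    return [(n.text, p.text) for n in names for p in phones
            if 0 <= p.begin - n.end <= max_gap]


def plan_indexed(names, phones, max_gap):
    """Plan B: binary-search the sorted phone offsets for each name."""
    begins = [p.begin for p in phones]  # finditer yields matches in sorted order
    pairs = []
    for n in names:
        lo = bisect_left(begins, n.end)
        hi = bisect_right(begins, n.end + max_gap)
        pairs.extend((n.text, p.text) for p in phones[lo:hi])
    return pairs


def execute(doc, spec, nested_loop_limit=16):
    """Execution: choose HOW to evaluate the spec from a crude cost estimate."""
    name_pattern = "|".join(re.escape(w) for w in spec["name_dictionary"])
    names = find_spans(name_pattern, doc)
    phones = find_spans(spec["phone_pattern"], doc)
    if len(names) * len(phones) <= nested_loop_limit:
        return plan_nested_loop(names, phones, spec["max_gap"])
    return plan_indexed(names, phones, spec["max_gap"])


doc = "Call Alice at 555-1234 or Bob at 555-9876."
print(execute(doc, SPEC))
```

Because the rule author writes only SPEC, the executor can change its plan selection without any change to the rule, which is the property the paragraph above attributes to the declarative approach.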

SystemT ships today with multiple products across 4 IBM Software Brands. Furthermore, SystemT is used in multiple ongoing research projects and is taught in universities. Our ongoing research and development efforts focus on making SystemT more usable for both technical and business users, and on continuing to enhance its core functionalities based on natural language processing, machine learning, and database technology.


2013 - IBM Research Outstanding Technical Accomplishment Award

2008, 2010, 2013 - IBM Research A-Level Accomplishment Award

Recent News


Hosted Univ. of Washington professor Luke Zettlemoyer's visit to IBM Research - Almaden


Yunyao is co-chairing the very first NAACL-HLT Industry Track


Demo paper on Creating and Interacting with Large-Scale Domain-Specific Knowledge Bases presented at VLDB 2017 [video] [poster]


Research paper on Distant Meta-Path Similarities for Text-Based Heterogeneous Information Networks is accepted at CIKM 2017


Research paper on Crowd-in-the-loop: A Hybrid Approach for Annotating Semantic Roles is accepted at EMNLP 2017


Hosted Stanford professor Dan Jurafsky's visit to IBM Research - Almaden


Research paper on Hardware Compilation Framework for Text Analytics Queries is accepted to the Journal of Parallel and Distributed Computing (JPDC)


SEER, a system for learning extractors from examples, presented at CHI and SIGMOD 2017 [video] [paper]


Workshop paper on understanding relationships in the financial domain presented at DSMM 2017 [paper]


Talk on Crosslingual Text Analytics at the Natural Language and Dialog Systems Lab, UC Santa Cruz


Demo paper on creating and interacting with large-scale knowledge bases is accepted at VLDB 2017


Research paper on Active Learning for Black-box Semantic Role Labeling is accepted at IJCAI 2017


Lecture on SystemT at NYU Abu Dhabi in February 2017


Talk on Declarative Information Extraction and Multilingual SRL at the Stanford Logic Seminar