SystemT - Annotated Publications


Below is a selected list of SystemT publications, with brief annotations and links to each manuscript.

 

SystemT Overview

  1. Latest overview of SystemT, in NAACL 2018 Industry Track [pdf]

  2. ACL 2010 paper giving an overview of SystemT, fundamental results on its expressivity and performance, and a comprehensive evaluation against GATE (quality of results, speed of execution, and memory consumption) [pdf], plus a follow-up technical report comparing with ANNIE/Jape+ in GATE 7 (released Feb 15, 2012) [pdf]

  3. SIGMOD Record 2008 paper on an early overview of SystemT [pdf]

 

SystemT Core

  1. ICDE 2008 paper on the SystemT underlying algebra and optimizer [pdf]

  2. ACL 2010 paper giving an overview of SystemT, fundamental results on its expressivity and performance, and a comprehensive evaluation against GATE (quality of results, speed of execution, and memory consumption) [pdf], plus a follow-up technical report comparing with ANNIE/Jape+ in GATE 7 (released Feb 15, 2012) [pdf]

  3. EMNLP 2010 paper on domain customization of named-entity extractors, demonstrating that extractors built in SystemT outperform the best published state-of-the-art results [pdf]

  4. ICDE 2011 paper on selectivity estimation for extraction operators [pdf]

  5. ACL 2013 paper on preprocessing noisy text to improve the quality of information extraction results [pdf]

  6. NAACL 2015 paper on the impact of normalizing social media text for different applications [pdf]

  7. Springer's Encyclopedia of Database Systems - chapter on Web Information Extraction [pdf]

 

SystemT Multilingual

  1. ACL 2015 paper on automatically generating high-quality training data for multilingual semantic role labeling [pdf]

  2. ACL 2016 demo paper on multilingual SRL using a unified language-invariant abstraction [video] [pdf]

  3. EMNLP 2016 paper on semi-automatic generation of proposition banks for low-resource languages [pdf]

  4. COLING 2016 demo paper on Multilingual Information Extraction in AQL [video] [pdf]

  5. COLING 2016 paper on K-SRL: Instance-based Learning for Semantic Role Labeling [pdf]

  6. COLING 2016 paper on Multilingual Aliasing for Auto-Generating Proposition Banks [pdf]

  7. IJCAI 2017 paper on Active Learning for Black-box Semantic Role Labeling with Neural Factors [pdf]

 

SystemT Tooling 

  1. EMNLP 2008 paper on how to automatically learn high-quality regular expressions based on an initial user-given regular expression and labeled data [pdf]

  2. VLDB 2010 paper on how to automatically refine AQL rules for information extraction [pdf]

  3. CIKM 2011 paper on how to facilitate pattern discovery for relation extraction using a novel clustering algorithm [pdf]

  4. EMNLP 2012 paper on rule induction for AQL [pdf]

  5. CHI 2013 paper on tooling for novice text analytics developers [pdf]

  6. SIGMOD 2013 paper on refining dictionaries for information extraction [pdf]

  7. ACL 2013 paper on detecting ambiguous terms to improve the quality of extraction results [pdf]

  8. CHI 2017 paper on learning declarative NLP specifications from very few examples [video] [pdf] [slides in pdf]

 

Theoretical Foundations

  1. PODS 2013 paper on the theoretical foundations of AQL [pdf]

  2. PODS 2014 paper on the theoretical foundations of the AQL consolidate primitive [pdf]

 

Hardware Acceleration 

  1. JPDC 2017 journal paper on Hardware Compilation Framework for Text Analytics Queries [to appear]

  2. IEEE Micro 2014 journal paper on executing AQL extractors on a combined software-hardware architecture [pdf]

  3. FPL 2013 paper on regular expression evaluation on FPGA [pdf]

  4. FPL 2014 paper on compiling operator graphs to FPGA [pdf]

 

Tutorials and Opinion Paper

  1. SIGMOD 2010 tutorial on Enterprise Information Extraction [pdf] [slides in pdf]

  2. EMNLP 2013 opinion paper on why rule-based information extraction systems are NOT dead [pdf]

  3. EMNLP 2015 tutorial on transparent machine learning for information extraction [link] [pdf] [slides in pdf]
  4. CIKM 2008 tutorial on Evolution of Rule-based Information Extraction: from Grammars to Algebra [slides in pdf]
  5. SIGMOD 2006 tutorial on Managing Information Extraction [pdf] [slides in pdf]

  

Demo 

  1. ACL 2011 SystemT demo [pdf]
  2. SIGMOD 2011 SystemT Tooling demo [pdf]
  3. ACL 2012 SystemT interactive development environment [pdf]
  4. VLDB 2015 VINERy: A Visual IDE for Information Extraction [video] [pdf]
  5. SIGMOD 2017 demo on learning declarative NLP specifications from very few examples [video] [pdf]

 

Applications

  1. DSMM 2017 workshop paper on Re-defining Relation Understanding in Financial Domain [pdf]
  2. VLDB 2017 demo on Building and Interacting with Large-scale Knowledge Bases [pdf]


Awards

2013 - IBM Research Outstanding Technical Accomplishment Award

2008, 2010, 2013 - IBM Research A-Level Accomplishment Award


Recent News

07/27/18

Research paper "DIMSIM: An Accurate Chinese Phonetic Similarity Algorithm based on Learned High Dimensional Encoding" is accepted at CoNLL 2018.

05/16/18

Research paper "Exploiting Structure in Representation of Named Entities using Active Learning" is accepted at COLING 2018.

05/05/18

Officially joined NSF Center for Big Learning as an Industry Partner.

04/16/18

Demoed LUSTRE, an interactive system for entity understanding and standardization, at ICDE 2018

04/05/18

Hosted Stanford professor Mark Musen's visit to IBM Research - Almaden

03/26/18

Industry track paper on the design and implementation of SystemT is accepted at NAACL-HLT 2018 Industry Track (the very first industry track at a major NLP conference).

10/04/17

Hosted Univ. of Washington professor Luke Zettlemoyer's visit to IBM Research - Almaden

10/02/17

Yunyao is co-chairing the very first NAACL-HLT Industry Track

08/29/17

Demo paper on Creating and Interacting with Large-Scale Domain-Specific Knowledge Bases is presented at VLDB 2017 [video] [poster]

08/06/17

Research paper on Distant Meta-Path Similarities for Text-Based Heterogeneous Information Networks is accepted at CIKM 2017

06/30/17

Research paper on Crowd-in-the-loop: A Hybrid Approach for Annotating Semantic Roles is accepted at EMNLP 2017

06/08/17

Hosted Stanford professor Dan Jurafsky's visit to IBM Research - Almaden

05/31/17

Research paper on Hardware Compilation Framework for Text Analytics Queries is accepted to the Journal of Parallel and Distributed Computing (JPDC)

05/16/17

SEER, a system for learning extractors from examples, presented at CHI and SIGMOD 2017 [video] [paper]

05/16/17

Workshop paper on understanding relationships in the financial domain presented at DSMM 2017 [paper]