SystemT - Annotated Publications


Below is a selected list of SystemT publications, each with a brief annotation and a link to the manuscript.

 

SystemT Overview

  1. ACL 2010 paper giving an overview of SystemT, fundamental results on its expressivity and performance, and a comprehensive evaluation against GATE (quality of results, speed of execution, and memory consumption) [pdf], with a follow-up technical report comparing with ANNIE/Jape+ in GATE 7 (released Feb 15 2012) [pdf]

  2. SIGMOD Record 2008 paper on an early overview of SystemT [pdf]

 

SystemT Core

  1. ICDE 2008 paper on the SystemT underlying algebra and optimizer [pdf]

  2. ACL 2010 paper giving an overview of SystemT, fundamental results on its expressivity and performance, and a comprehensive evaluation against GATE (quality of results, speed of execution, and memory consumption) [pdf], with a follow-up technical report comparing with ANNIE/Jape+ in GATE 7 (released Feb 15 2012) [pdf]

  3. EMNLP 2010 paper demonstrating that domain-customized named-entity extractors built in SystemT outperform the best published state-of-the-art results [pdf]

  4. ICDE 2011 paper on selectivity estimation for extraction operators [pdf]

  5. ACL 2013 paper on preprocessing noisy text to improve the quality of information extraction results [pdf]

  6. NAACL 2015 paper on the impact of normalizing social media text for different applications [pdf]

  7. Springer's Encyclopedia of Database Systems - chapter on Web Information Extraction [pdf]

 

SystemT Multilingual

  1. ACL 2015 paper on automatically generating high-quality training data for multilingual semantic role labeling [pdf]

  2. ACL 2016 demo paper on multilingual SRL using a unified language-invariant abstraction [video] [pdf]

  3. EMNLP 2016 paper on semi-automatic generation of proposition banks for low-resource languages [pdf]

  4. COLING 2016 demo paper on Multilingual Information Extraction in AQL [video] [pdf]

  5. COLING 2016 paper on K-SRL: Instance-based Learning for Semantic Role Labeling [pdf]

  6. COLING 2016 paper on Multilingual Aliasing for Auto-Generating Proposition Banks [pdf]

  7. IJCAI 2017 paper on Active Learning for Black-box Semantic Role Labeling with Neural Factors [pdf]

 

SystemT Tooling 

  1. EMNLP 2008 paper on how to automatically learn high-quality regular expressions based on an initial user-given regular expression and labeled data [pdf]

  2. VLDB 2010 paper on how to automatically refine AQL rules for information extraction [pdf]

  3. CIKM 2011 paper on how to facilitate pattern discovery for relation extraction using a novel clustering algorithm [pdf]

  4. EMNLP 2012 paper on rule induction for AQL [pdf]

  5. CHI 2013 paper on tooling for novice text analytics developers [pdf]

  6. SIGMOD 2013 paper on refining dictionaries for information extraction [pdf]

  7. ACL 2013 paper on detecting ambiguous terms to improve the quality of extraction results [pdf]

  8. CHI 2017 paper on learning declarative NLP specifications from very few examples [video] [pdf] [slides in pdf]

 

Theoretical Foundations

  1. PODS 2013 paper on the theoretical foundations of AQL [pdf]

  2. PODS 2014 paper on the theoretical foundations of the AQL consolidate primitive [pdf]

 

Hardware Acceleration 

  1. JPDC 2017 journal paper on Hardware Compilation Framework for Text Analytics Queries [to appear]

  2. IEEE Micro 2014 journal paper on executing AQL extractors on a combined software-hardware architecture [pdf]

  3. FPL 2013 paper on regular expression evaluation on FPGA [pdf]

  4. FPL 2014 paper on compiling operator graphs to FPGA [pdf]

 

Tutorials and Opinion Paper

  1. SIGMOD 2010 tutorial on Enterprise Information Extraction [pdf] [slides in pdf]

  2. EMNLP 2013 opinion paper on why rule-based information extraction systems are NOT dead [pdf]

  3. EMNLP 2015 tutorial on transparent machine learning for information extraction [link] [pdf] [slides in pdf]
  4. CIKM 2008 tutorial on Evolution of Rule-based Information Extraction: from Grammars to Algebra [slides in pdf]
  5. SIGMOD 2006 tutorial on Managing Information Extraction [pdf] [slides in pdf]

  

Demo 

  1. ACL 2011 SystemT demo [pdf]
  2. SIGMOD 2011 SystemT Tooling demo [pdf]
  3. ACL 2012 SystemT interactive development environment [pdf]
  4. VLDB 2015 VINERy: A Visual IDE for Information Extraction [video] [pdf]
  5. SIGMOD 2017 demo on learning declarative NLP specifications from very few examples [video] [pdf]

 

Applications

  1. DSMM 2017 workshop paper on Re-defining Relation Understanding in Financial Domain [pdf]
  2. VLDB 2017 demo on Building and Interacting with Large-scale Knowledge Bases [pdf]

Awards

2013 - IBM Research Outstanding Technical Accomplishment Award

2008, 2010, 2013 - IBM Research A-Level Accomplishment Award


Recent News

10/04/17

Hosted University of Washington professor Luke Zettlemoyer's visit to IBM Research - Almaden

10/02/17

Yunyao is co-chairing the very first NAACL-HLT Industry Track

08/29/17

Demo paper on Creating and Interacting with Large-Scale Domain-Specific Knowledge Bases presented at VLDB 2017 [video] [poster]

08/06/17

Research paper on Distant Meta-Path Similarities for Text-Based Heterogeneous Information Networks is accepted at CIKM 2017

06/30/17

Research paper on Crowd-in-the-loop: A Hybrid Approach for Annotating Semantic Roles is accepted at EMNLP 2017

06/08/17

Hosted Stanford professor Dan Jurafsky's visit to IBM Research - Almaden

05/31/17

Research paper on Hardware Compilation Framework for Text Analytics Queries is accepted to Journal of Parallel and Distributed Computing (JPDC)

05/16/17

SEER, a system for learning extractors from examples, presented at CHI 2017 and SIGMOD 2017 [video] [paper]

05/16/17

Workshop paper on understanding relationships in the financial domain presented at DSMM 2017 [paper]

05/12/17

Talk on Crosslingual Text Analytics at the Natural Language and Dialog Systems Lab, UC Santa Cruz

05/04/17

Demo paper on creating and interacting with large-scale knowledge bases is accepted at VLDB 2017

04/24/17

Research paper on Active Learning for Black-box Semantic Role Labeling is accepted at IJCAI 2017

02/21/17

Lecture on SystemT at NYU Abu Dhabi in February 2017

01/19/17

Talk on Declarative Information Extraction and Multilingual SRL at the Stanford Logic Seminar