SystemT Publications



2017

Crowd-in-the-loop: A hybrid approach for annotating semantic roles
Chenguang Wang, Alan Akbik, Laura Chiticariu, Yunyao Li, Fei Xia, and Anbang Xu
EMNLP, 2017

Active learning for black-Box semantic role labeling with neural factors
Chenguang Wang, Laura Chiticariu, and Yunyao Li
IJCAI, 2017

A Hardware Compilation Framework for Text Analytics Queries
R. Polig, K. Atasu, H. Giefers, C. Hagleitner, L. Chiticariu, F. R. Reiss, H. Zhu, P. H. Hofstee.
Journal of Parallel and Distributed Computing (JPDC), 2017

A Rectangle Mining Method for Understanding the Semantics of Financial Tables
Xilun Chen, Laura Chiticariu, Marina Danilevsky, Alexandre Evfimievski and Prithviraj Sen
International Conference on Document Analysis and Recognition (ICDAR) (to appear), 2017

SEER: Auto-Generating Information Extraction Rules from User-Specified Examples
Maeda Hanafi, Azza Abouzied, Laura Chiticariu, Yunyao Li
ACM CHI Conference on Human Factors in Computing Systems, 2017

Synthesizing Extraction Rules from User Examples with SEER
Maeda F. Hanafi, Azza Abouzied, Laura Chiticariu, Yunyao Li
SIGMOD, pp. 1687--1690, 2017

Towards Re-defining Relation Understanding in Financial Domain
Chenguang Wang, Doug Burdick, Laura Chiticariu, Rajasekar Krishnamurthy, Yunyao Li, Huaiyu Zhu
Data Science for Macro-Modeling with Financial and Economic Datasets (DSMM), co-located with ACM SIGMOD, 2017

Creation and Interaction with Large-scale Domain-Specific Knowledge Bases
Shreyas Bharadwaj et al.
VLDB (demonstrations), 2017


2016

Polyglot: Multilingual Semantic Role Labeling with Unified Labels
Alan Akbik, Yunyao Li
ACL 2016, 54th Annual Meeting of the Association for Computational Linguistics, to appear

Towards Semi-Automatic Generation of Proposition Banks for Low-Resource Languages
Alan Akbik, Kumar Vishwajeet, Yunyao Li
EMNLP 2016, Conference on Empirical Methods in Natural Language Processing, to appear

Multilingual Information Extraction with PolyglotIE
Alan Akbik, Laura Chiticariu, Marina Danilevsky, Yunyao Li, Huaiyu Zhu
COLING 2016, 26th International Conference on Computational Linguistics, to appear


Multilingual Aliasing for Auto-Generating Proposition Banks
Alan Akbik, Xinyu Guan, Yunyao Li
COLING 2016, 26th International Conference on Computational Linguistics, to appear

Web Information Extraction
Laura Chiticariu, Marina Danilevsky, Howard Ho, Rajasekar Krishnamurthy, Yunyao Li, Sriram Raghavan, Frederick Reiss, Shivakumar Vaithyanathan, Huaiyu Zhu
Encyclopedia of Database Systems, pp. 1--9, Springer New York, 2016


2015



VINERy: A Visual IDE for Information Extraction
Yunyao Li, Elmer Kim, Marc A. Touchette, Ramiya Venkatachalam, Hao Wang
PVLDB 8(12), 1948--1959, 2015

Generating High Quality Proposition Banks for Multilingual Semantic Role Labeling
Alan Akbik, Laura Chiticariu, Marina Danilevsky, Yunyao Li, Shivakumar Vaithyanathan, Huaiyu Zhu
ACL, pp. 397--407, 2015


2014

Cleaning inconsistencies in information extraction via prioritized repairs
Ronald Fagin, Benny Kimelfeld, Frederick Reiss, Stijn Vansummeren
Proceedings of the 33rd ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, pp. 164--175, 2014

Understand users’ comprehension and preferences for composing information visualizations
Huahai Yang, Yunyao Li, Michelle X Zhou
ACM Transactions on Computer-Human Interaction (TOCHI) 21(1), 6, ACM, 2014

VLDB 2014 Ph.D. Workshop - An Overview.
Yunyao Li, Erich J. Neuhold
Proceedings of the VLDB Endowment 7(13), 2014

Compiling text analytics queries to FPGAs
Raphael Polig, Kubilay Atasu, Heiner Giefers, Laura Chiticariu
24th International Conference on Field Programmable Logic and Applications, FPL 2014, Munich, Germany, 2-4 September, 2014, pp. 1--6


Giving text analytics a boost
Raphael Polig, Kubilay Atasu, Laura Chiticariu, Christoph Hagleitner, H Peter Hofstee, Frederick R Reiss, Huaiyu Zhu, Eva Sitaridi
IEEE Micro 34(4), 6--14, IEEE, 2014


2013

Hardware-Accelerated Regular Expression Matching for High-Throughput Text Analytics
Kubilay Atasu, Raphael Polig, Christoph Hagleitner and Frederick R. Reiss
23rd International Conference on Field Programmable Logic and Applications, pp. 1--7, IEEE, 2013

I can do text analytics!: designing development tools for novice developers
Huahai Yang, Daina Pupons-Wickham, Laura Chiticariu, Yunyao Li, Benjamin Nguyen, Arnaldo Carreno-Fuentes
CHI, pp. 1599-1608, 2013

OpinionBlocks: a crowd-powered, self-improving interactive visual analytic system for understanding opinion text
Mengdie Hu, Huahai Yang, Michelle X Zhou, Liang Gou, Yunyao Li, Eben Haber
Human-Computer Interaction--INTERACT 2013, pp. 116--134, Springer

Identifying user needs from social media
Huahai Yang, Yunyao Li
IBM Technical Report, goo.gl/2XB7NY, 2013

Automatic Term Ambiguity Detection
Tyler Baldwin, Yunyao Li, Bogdan Alexe, Ioana R Stanoi
Proceedings of ACL, 2013

Adaptive Parser-Centric Text Normalization
Congle Zhang, Tyler Baldwin, Howard Ho, Benny Kimelfeld, Yunyao Li
Proceedings of ACL, pp. 1159--1168, 2013

Spanners: A Formal Framework for Information Extraction
Ronald Fagin, Benny Kimelfeld, Frederick Reiss, Stijn Vansummeren
PODS, 2013



2012

Dictionary refinement for information extraction
Laura Chiticariu, Vitaly Feldman, Frederick R Reiss, Sudeepa Roy, Huaiyu Zhu
US Patent App. 13/480,974

Refining a dictionary for information extraction
Laura Chiticariu, Vitaly Feldman, Frederick R Reiss, Sudeepa Roy, Huaiyu Zhu
US Patent App. 13/598,946

WizIE: A Best Practices Guided Development Environment for Information Extraction
Yunyao Li, Laura Chiticariu, Huahai Yang, Frederick Reiss, Arnaldo Carreno-Fuentes
ACL (Demonstration), pp. 109-114, 2012

Gumshoe quality toolkit: administering programmable search
Zhuowei Bao, Benny Kimelfeld, Yunyao Li, Sriram Raghavan, Huahai Yang
21st ACM International Conference on Information and Knowledge Management, CIKM'12, Maui, HI, USA, October 29 - November 02, 2012, pp. 2716--2718

Towards Efficient Named-Entity Rule Induction for Customizability
Ajay Nagesh, Ganesh Ramakrishnan, Laura Chiticariu, Rajasekar Krishnamurthy, Ankush Dharkar, Pushpak Bhattacharyya
EMNLP-CoNLL, pp. 128-138, 2012

Automatic Suggestion of Query-Rewrite Rules for Enterprise Search
Z. Bao, B. Kimelfeld, Y. Li
In preparation for ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2012


2011

A Graph Approach to Spelling Correction in Domain-Centric Search.
Zhuowei Bao, Benny Kimelfeld, Yunyao Li
ACL, pp. 905--914, 2011

The SystemT IDE: an integrated development environment for information extraction rules
Laura Chiticariu, Vivian Chu, Sajib Dasgupta, Thilo W. Goetz, Howard Ho, Rajasekar Krishnamurthy, Alexander Lang, Yunyao Li, Bin Liu, Sriram Raghavan, Frederick Reiss, Shivakumar Vaithyanathan, Huaiyu Zhu
SIGMOD (Demonstration), pp. 1291-1294, 2011


Rewrite rules for search database systems
R. Fagin, B. Kimelfeld, Y. Li, S. Raghavan, S. Vaithyanathan
Proceedings of the thirtieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, pp. 271--282, 2011

Facilitating pattern discovery for relation extraction with semantic-signature-based clustering
Yunyao Li, Vivian Chu, Sebastian Blohm, Huaiyu Zhu, Howard Ho
Proceedings of the 20th ACM international conference on Information and knowledge management, pp. 1415--1424, 2011

Selectivity estimation for extraction operators over text data
Daisy Zhe Wang, Long Wei, Yunyao Li, Frederick Reiss, Shivakumar Vaithyanathan
2011 IEEE 27th International Conference on Data Engineering (ICDE), pp. 685--696


2010

Enterprise information extraction: recent developments and open challenges
Laura Chiticariu, Yunyao Li, Sriram Raghavan, Frederick Reiss
SIGMOD (Tutorial), pp. 1257-1258, 2010

Understanding queries in a search database system
R Fagin, B Kimelfeld, Y Li, S Raghavan, S Vaithyanathan
Proceedings of the twenty-ninth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, pp. 273--284, ACM, 2010

Refining Information Extraction Rules using Data Provenance
Bin Liu, Laura Chiticariu, Vivian Chu, H. V. Jagadish, Frederick Reiss
IEEE Data Eng. Bull. 33(3), 17-24, 2010

Automatic Rule Refinement for Information Extraction
Bin Liu, Laura Chiticariu, Vivian Chu, H. V. Jagadish, Frederick Reiss
Proceedings of the VLDB Endowment Journal 3(1), 588-597, VLDB Endowment, 2010

Domain adaptation of rule-based annotators for named-entity recognition tasks
Laura Chiticariu, Rajasekar Krishnamurthy, Yunyao Li, Frederick Reiss, Shivakumar Vaithyanathan
Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, pp. 1002--1012

SystemT: an algebraic approach to declarative information extraction
Laura Chiticariu, Rajasekar Krishnamurthy, Yunyao Li, Sriram Raghavan, Frederick R Reiss, Shivakumar Vaithyanathan
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pp. 128--137, Association for Computational Linguistics, 2010


2009

User-Guided Regular Expression Learning
Rajasekar Krishnamurthy, Yunyao Li, Sriram Raghavan, Shivakumar Vaithyanathan
US Patent App. 12/369,216

Towards a Scalable Enterprise Content Analytics Platform
K. Beyer, V. Ercegovac, R. Krishnamurthy, S. Raghavan, J. Rao, F. Reiss, E. J. Shekita, D. Simmen, S. Tata, S. Vaithyanathan, H. Zhu
Data Engineering, 28, 2009

Enabling enterprise mashups over unstructured text feeds with InfoSphere MashupHub and SystemT
David E Simmen, Frederick Reiss, Yunyao Li, Suresh Thalamati
Proceedings of the 2009 ACM SIGMOD International Conference on Management of data, pp. 1123--1126, ACM


2008

An algebraic approach to rule-based information extraction
Frederick Reiss, Sriram Raghavan, Rajasekar Krishnamurthy, Huaiyu Zhu, Shivakumar Vaithyanathan
2008 IEEE 24th International Conference on Data Engineering (ICDE 2008), pp. 933--942, IEEE

Regular expression learning for information extraction
Yunyao Li, Rajasekar Krishnamurthy, Sriram Raghavan, Shivakumar Vaithyanathan, H V Jagadish
Proceedings of the Conference on Empirical Methods in Natural Language Processing, pp. 21--30, Association for Computational Linguistics, 2008

SystemT: a system for declarative information extraction
Rajasekar Krishnamurthy, Yunyao Li, Sriram Raghavan, Frederick Reiss, Shivakumar Vaithyanathan, Huaiyu Zhu
ACM SIGMOD Record 37(4), 7--13, ACM, 2008


2006

Avatar semantic search: a database approach to information retrieval
Eser Kandogan, Rajasekar Krishnamurthy, Sriram Raghavan, Shivakumar Vaithyanathan, Huaiyu Zhu
Proceedings of the 2006 ACM SIGMOD international conference on Management of data, pp. 790--792

Avatar Information Extraction System.
TS Jayram, Rajasekar Krishnamurthy, Sriram Raghavan, Shivakumar Vaithyanathan, Huaiyu Zhu
IEEE Data Eng. Bull. 29(1), 40--48, 2006

Getting work done on the web: supporting transactional queries
Y Li, R Krishnamurthy, S Vaithyanathan, HV Jagadish
Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval, pp. 557--564, 2006


2005

AVATAR: Using text analytics to bridge the structured--unstructured divide
Huaiyu Zhu, Sriram Raghavan, Shivakumar Vaithyanathan, Jayram S Thathachar, Rajasekar Krishnamurthy, Prasad Deshpande, Rahul Gupta, Krishna P Chitrapura
IBM Almaden Research Center technical report. Available: http://www.almaden.ibm.com/cs/projects/avatar/techrep04.pdf, 2005




Awards

2013 - IBM Research Outstanding Technical Accomplishment Award

2008, 2010, 2013 - IBM Research A-Level Accomplishment Award


Recent News

10/04/17

Hosted Univ. of Washington professor Luke Zettlemoyer's visit to IBM Research - Almaden

10/02/17

Yunyao is co-chairing the very first NAACL-HLT Industry Track

08/29/17

Demo paper on Creating and Interacting with Large-Scale Domain-Specific Knowledge Bases is presented at VLDB 2017

08/06/17

Research paper on Distant Meta-Path Similarities for Text-Based Heterogeneous Information Networks is accepted at CIKM 2017

06/30/17

Research paper on Crowd-in-the-loop: A Hybrid Approach for Annotating Semantic Roles is accepted at EMNLP 2017

06/08/17

Hosted Stanford professor Dan Jurafsky's visit to IBM Research - Almaden

05/31/17

Research paper on a Hardware Compilation Framework for Text Analytics Queries is accepted to the Journal of Parallel and Distributed Computing (JPDC)

05/16/17

SEER, a system for learning extractors from examples, is presented at CHI and SIGMOD 2017

05/16/17

Workshop paper on understanding relationships in the financial domain presented at DSMM 2017

05/12/17

Talk on Crosslingual Text Analytics at the Natural Language and Dialog Systems Lab, UC Santa Cruz

05/04/17

Demo paper on creating and interacting with large-scale knowledge bases is accepted at VLDB 2017

04/24/17

Research paper on Active Learning for Black-box Semantic Role Labeling is accepted at IJCAI 2017

02/21/17

Lecture on SystemT at NYU Abu Dhabi in February 2017

01/19/17

Talk on Declarative Information Extraction and Multilingual SRL at the Stanford Logic Seminar.