Knowledge Induction Team @ IBM Research AI Publications
2020
Covering the News with (AI) Style
Michele Merler, Cicero Nogueira dos Santos, Mauro Martino, Alfio M. Gliozzo, John R. Smith
arXiv preprint arXiv:2002.02369, 2020
Abstract
We introduce a multi-modal discriminative and generative framework capable of assisting humans in producing visual content related to a given theme, starting from a collection of documents (textual, visual, or both). This framework can be used by editors to generate images for articles, as well as books or music album covers. Motivated by a request from The New York Times (NYT) seeking help to use AI to create art for their special section on Artificial Intelligence, we demonstrated the application of our system in producing such an image.
special section, natural language processing, generative grammar, discriminative model, computer science, artificial intelligence
Taxonomy Construction of Unseen Domains via Graph-based Cross-Domain Knowledge Transfer
Chao Shang, Sarthak Dash, Md Faisal Mahbub Chowdhury, Nandana Mihindukulasooriya, Alfio Gliozzo
ACL, 2020
2019
Latent Relational Model for Relation Extraction
Gaetano Rossiello, Alfio Gliozzo, Nicolas R. Fauceglia, Giovanni Semeraro
European Semantic Web Conference, pp. 283-297, 2019
Abstract
Analogy is a fundamental component of the way we think and process thought. Solving a word analogy problem, such as mason is to stone as carpenter is to wood, requires capabilities in recognizing the implicit relations between the two word pairs. In this paper, we describe the analogy problem from a computational linguistics point of view and explore its use to address relation extraction tasks. We extend a relational model that has been shown to be effective in solving word analogies and adapt it to the relation extraction problem. Our experiments show that this approach outperforms the state-of-the-art methods on a relation extraction dataset, opening up a new research direction in discovering implicit relations in text through analogical reasoning.
relationship extraction, relational model, natural language processing, information retrieval, information extraction, distributional semantics, computer science, computational linguistics, artificial intelligence, analogy, analogical reasoning
Automatic Taxonomy Induction and Expansion
Nicolas Rodolfo Fauceglia, Alfio Gliozzo, Sarthak Dash, Md Faisal Mahbub Chowdhury, Nandana Mihindukulasooriya
2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing - System Demonstrations, pp. 25-30, Association for Computational Linguistics
Abstract
The Knowledge Graph Induction Service (KGIS) is an end-to-end knowledge induction system. One of its main capabilities is to automatically induce taxonomies from input documents using a hybrid approach that takes advantage of linguistic patterns, the semantic web, and neural networks. KGIS allows the user to semi-automatically curate and expand the induced taxonomy through a component called Smart SpreadSheet by exploiting distributional semantics. In this paper, we describe these taxonomy induction and expansion features of KGIS. A screencast video demonstrating the system is available at https://ibm.box.com/v/emnlp-2019-demo .
2018
Learning Relational Representations by Analogy using Hierarchical Siamese Networks
Gaetano Rossiello, Alfio Gliozzo, Robert Farrell, Nicolas Fauceglia, Michael Glass
2018
Discovering Implicit Knowledge with Unary Relations
Michael Glass, Alfio Gliozzo
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 1585--1594, 2018
A dataset for web-scale knowledge base population
Michael Glass, Alfio Gliozzo
European Semantic Web Conference, pp. 256--271, 2018
Inducing Implicit Relations from Text using Distantly Supervised Deep Nets
Michael Glass, Alfio Gliozzo, Oktie Hassanzadeh, Nandana Mihindukulasooriya, Gaetano Rossiello
Proceedings of International Semantic Web Conference, pp. 38--55, Springer, 2018
Abstract
Knowledge Base Population (KBP) is an important problem in Semantic Web research and a key requirement for successful adoption of semantic technologies in many applications. In this paper we present Socrates, a deep learning based solution for Automated Knowledge Base Population from Text. Socrates does not require manual annotations, which would make the solution hard to adapt to a new domain. Instead, it exploits a partially populated knowledge base and a large corpus of text documents to train a set of deep neural network models. As a result of the training process, the system learns how to identify implicit relations between entities across a highly heterogeneous set of documents from various sources, making it suitable for large-scale knowledge extraction from Web documents. The main contributions of this paper include (a) a novel approach based on composite contexts to acquire implicit relations from Title Oriented Documents, and (b) an architecture for unifying relation extraction using binary, unary, and composite contexts. We provide an extensive evaluation of the system across three different benchmarks with different characteristics, showing that our unified framework can consistently outperform state-of-the-art solutions. Remarkably, Socrates ranked first in both the knowledge base population and attribute validation tracks at the Semantic Web Challenge at ISWC 2017.
Semantic Concept Discovery Over Event Databases
Oktie Hassanzadeh, Shari Trewin and Alfio Gliozzo
The Semantic Web. ESWC 2018. Lecture Notes in Computer Science, pp. 288-303, Springer, Cham
Abstract (Winner of Best In Use Paper Award)
In this paper, we study the problem of identifying certain types of concept (e.g., persons, organizations, topics) for a given analysis question with the goal of assisting a human analyst in writing a deep analysis report. We consider a case where we have a large event database describing events and their associated news articles along with meta-data describing various event attributes such as people and organizations involved and the topic of the event. We describe the use of semantic technologies in question understanding and deep analysis of the event database, and show a detailed evaluation of our proposed concept discovery techniques using reports from Human Rights Watch organization and other sources. Our study finds that combining our neural network based semantic term embeddings over structured data with an index-based method can significantly outperform either method alone.
2017
Towards Comprehensive Noise Detection in Automatically Created Knowledge Graphs
Nandana Mihindukulasooriya, Oktie Hassanzadeh, Sarthak Dash, Alfio Gliozzo
Proceedings of the ISWC 2017 (Posters & Demonstrations and Industry Tracks), CEUR Workshop Proceedings
Abstract
Knowledge Graphs (KGs) play a key role in many artificial intelligence applications. Large KGs are often constructed through a noisy automatic knowledge extraction process. Noise detection is, therefore, an important task for having high-quality KGs. We argue that the current noise detection approaches only focus on a specific type of noise (i.e., fact-checking) whereas knowledge extraction methods result in more than one type of noise. To this end, we propose a classification of noise found in automatically-constructed KGs, and an approach for noise detection focused on specific types of noise.
Semantic Concept Discovery Over Event Data
Oktie Hassanzadeh, Shari Trewin, Alfio Massimiliano Gliozzo
ISWC (Industry Track), 2017
Semantic Concept Discovery Over Event Data
Hassanzadeh, Oktie and Trewin, Shari and Gliozzo, Alfio
16th International Semantic Web Conference (ISWC), 2017
Abstract
Preparing a comprehensive, accurate, and unbiased report on a given topic or question is a challenging task. The first step is often a daunting discovery task that requires searching through an overwhelming number of information sources without introducing bias from the
2016
JOINT LEARNING OF LOCAL AND GLOBAL FEATURES FOR ENTITY LINKING VIA NEURAL NETWORKS
Thien Huu Nguyen, Nicolas Fauceglia, Mariano Rodriguez Muro, Oktie Hassanzadeh, Alfio Massimiliano Gliozzo, Mohammad Sadoghi
recurrent neural network, pattern recognition, machine learning, entity linking, convolution, computer science, computer program, artificial neural network, artificial intelligence
Abstract
A system, method and computer program product for disambiguating one or more entity mentions in one or more documents. The method facilitates the simultaneous linking of entity mentions in a document based on convolution neural networks and recurrent neural networks that model both the local and global features for entity linking. The framework uses the capacity of convolution neural networks to induce the underlying representations for local contexts and the advantage of recurrent neural networks to adaptively compress variable length sequences of predictions for global constraints. The RNN functions to accumulate information about the previous entity mentions and/or target entities, and provides it as the global constraints for the linking process of a current entity mention.
Personalized Tolerance Prediction of Adverse Drug Events
Mohammad Sadoghi, Achille Fokoue-Nkoutche, Ping Zhang, Oktie Hassanzadeh, Meinolf Sellmann
Abstract
Embodiments include methods, systems and computer program products for predicting adverse drug events on a computational system. Aspects include receiving a personalized data set including a plurality of real-time drug doses for a first drug or drug combination and a plurality of corresponding real-time adverse drug reaction tolerance data for the first drug or drug combination for a patient. Aspects also include receiving known drug data for a candidate drug or drug pair. Aspects also include calculating, based upon the known drug data and the personalized data set, a predicted adverse drug reaction tolerance for the candidate drug or drug pair at a candidate dosage, wherein the predicted adverse drug reaction tolerance is personalized to the patient.
Joint Learning of Local and Global Features for Entity Linking via Neural Networks.
Nguyen, Thien Huu and Fauceglia, Nicolas and Rodriguez-Muro, Mariano and Hassanzadeh, Oktie and Gliozzo, Alfio Massimiliano and Sadoghi, Mohammad
COLING, pp. 2310--2320, 2016
Abstract
Previous studies have highlighted the necessity for entity linking systems to capture the local entity-mention similarities and the global topical coherence. We introduce a novel framework based on convolutional neural networks and recurrent neural networks to
An entity-focused approach to generating company descriptions
Saldanha, Gavin and Biran, Or and McKeown, Kathleen and Gliozzo, Alfio
The 54th Annual Meeting of the Association for Computational Linguistics, pp. 243, 2016
Abstract
Finding quality descriptions on the web, such as those found in Wikipedia articles, of newer companies can be difficult: search engines show many pages with varying relevance, while multi-document summarization algorithms find it difficult to distinguish
EXTRACTION OF SEMANTIC RELATIONS USING DISTRIBUTIONAL RELATION DETECTION
Bornea, Mihaela A and Fan, James J and Gliozzo, Alfio M and Welty, Christopher A
US Patent 20,160,148,116
Abstract
According to an aspect, a pair of related entities that includes a first entity and a second entity is received. Distributional relations are detected between the first entity and the second entity. The detecting includes identifying two sets of entities in a corpus, the first
2015
Word sense disambiguation via PropStore and OntoNotes for event mention detection
Fauceglia, Nicolas R and Lin, Yiu-Chang and Ma, Xuezhe and Hovy, Eduard
Proceedings of the 3rd Workshop on EVENTS: Definition, Detection, Coreference, and Representation, pp. 11--15, 2015
Abstract
In this paper, we propose a novel approach for Word Sense Disambiguation (WSD) of verbs that can be applied directly in the event mention detection task to classify event types. By using the PropStore, a database of relations between words, our approach
CMU System for Entity Discovery and Linking at TAC-KBP 2015
Fauceglia, Nicolas and Lin, Yiu-Chang and Ma, Xuezhe and Hovy, Eduard
Proceedings of the Eighth Text Analysis Conference (TAC2015)
Abstract
This paper describes CMU's system for the Tri-lingual Entity Discovery and Linking (TEDL) task at TAC-KBP 2015. Our system is a unified graph-based approach which is able to do concept disambiguation and entity linking simultaneously, leveraging the ontology built
Querying and integrating structured and unstructured data
Bornea, Mihaela Ancuta and Duan, Songyun and Fan, James J and Fokoue-Nkoutche, Achille and Gliozzo, Alfio M and Kalyanpur, Aditya and Kementsietsidis, Anastasios and Srinivas, Kavitha and Ward, Michael J
US Patent 9,037,615
Abstract
A computer-implemented method, system, and article of manufacture for querying and integrating structured and unstructured data. The method includes: receiving entity information that is extracted from a first set of unstructured data using an open domain
2014
Word Semantic Representations using Bayesian Probabilistic Tensor Factorization.
Zhang, Jingwei and Salwen, Jeremy and Glass, Michael R and Gliozzo, Alfio Massimiliano
EMNLP, pp. 1522--1531, 2014
Abstract
Many forms of word relatedness have been developed, providing different perspectives on word similarity. We introduce a Bayesian probabilistic tensor factorization model for synthesizing a single word vector representation and per-perspective linear
Lexical Substitution for the Medical Domain.
Riedl, Martin and Glass, Michael R and Gliozzo, Alfio Massimiliano
EMNLP, pp. 610--614, 2014
Abstract
In this paper we examine the lexical substitution task for the medical domain. We adapt the current best system from the open domain, which trains a single classifier for all instances using delexicalized features. We show significant improvements over a strong
2013
JoBimText visualizer: a graph-based approach to contextualizing distributional similarity
Biemann, Chris and Coppola, Bonaventura and Glass, Michael R and Gliozzo, Alfio and Hatem, Matthew and Riedl, Martin
Proceedings of TextGraphs-8 Graph-based Methods for Natural Language Processing, pp. 6--10, 2013
Abstract
We introduce an interactive visualization component for the JoBimText project. JoBimText is an open source platform for large-scale distributional semantics based on graph representations. First we describe the underlying technology for computing a
2012
When did that happen?: linking events and relations to timestamps
Hovy, Dirk and Fan, James and Gliozzo, Alfio and Patwardhan, Siddharth and Welty, Chris
Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics, pp. 185--193, 2012
Abstract
We present work on linking events and fluents (i.e., relations that hold for certain periods of time) to temporal information in text, which is an important enabler for many applications such as timelines and reasoning. Previous research has mainly focused on
Structured term recognition
Glass, Michael R and Gliozzo, Alfio M
US Patent App. 13/667,729
Abstract
A method, system and computer program product for recognizing terms in a specified corpus. In one embodiment, the method comprises providing a set of known terms t ∈ T, each of the known terms t belonging to a set of types Γ(t) = {γ1, ...}, wherein each of the terms
2009
Semantic domains in computational linguistics
Gliozzo, Alfio and Strapparava, Carlo
Springer Science & Business Media, 2009
Abstract
Semantic fields are lexically coherent: the words they contain co-occur in texts. In this book the authors introduce and define semantic domains, a computational model for lexical semantics inspired by the theory of semantic fields. Semantic domains allow us to exploit
2008
LMM: an OWL-DL MetaModel to Represent Heterogeneous Lexical Knowledge.
Picca, Davide and Gliozzo, Alfio Massimiliano and Gangemi, Aldo
LREC, 2008
Abstract
In this paper we present a Linguistic Meta-Model (LMM) allowing a semiotic-cognitive representation of knowledge. LMM is freely available and integrates the schemata of linguistic knowledge resources, such as WordNet and FrameNet, as well as foundational
2007
Instance Based Lexical Entailment for Ontology Population.
Giuliano, Claudio and Gliozzo, Alfio Massimiliano
EMNLP-CoNLL, pp. 248--256, 2007
Abstract
In this paper we propose an instance based method for lexical entailment and apply it to automatic ontology population from text. The approach is fully unsupervised and based on kernel methods. We demonstrate the effectiveness of our technique largely surpassing
2005
Crossing Parallel Corpora and Multilingual Lexical Databases for WSD.
Gliozzo, Alfio Massimiliano and Ranieri, Marcello and Strapparava, Carlo
CICLing, pp. 242--245, 2005
Abstract
Word Sense Disambiguation (WSD) is the task of selecting the correct sense of a word in a context from a sense repository. Typically, WSD is approached as a supervised classification task to get state-of-the-art performance (e.g., [1]), and thus a large amount of sense-tagged
Automatic Assessment of Students' Free-Text Answers Underpinned by the Combination of a BLEU-Inspired Algorithm and Latent Semantic Analysis.
Pérez, Diana and Gliozzo, Alfio Massimiliano and Strapparava, Carlo and Alfonseca, Enrique and Rodriguez, Pilar and Magnini, Bernardo
FLAIRS conference, pp. 358--363, 2005
Abstract
In previous work we have proved that the BLEU algorithm (Papineni et al. 2001), originally devised for evaluating Machine Translation systems, can be applied to assessing short essays written by students. In this paper we present a comparative evaluation between
Instance pruning by filtering uninformative words: an information extraction case study
Gliozzo, Alfio Massimiliano and Giuliano, Claudio and Rinaldi, Raffaella
International Conference on Intelligent Text Processing and Computational Linguistics, pp. 498--509, 2005
Abstract
In this paper we present a novel instance pruning technique for Information Extraction (IE). In particular, our technique filters out uninformative words from texts on the basis of the assumption that very frequent words in the language do not provide any specific
Automatic acquisition of domain specific lexicons
Gliozzo, Alfio and Strapparava, Carlo and d’Avanzo, Ernesto and Magnini, B
The IST Programme Shared-Cost RTD MEANING Developing Multilingual Web-scale Language Technologies, 2005
Abstract
In this paper we present the results of three years of experiments about automatic acquisition of domain specific terminology from corpora. We present an analysis of the potentiality and limitations of the Term Categorization approach to lexical acquisition, and
2004
Unsupervised Domain Relevance Estimation for Word Sense Disambiguation.
Gliozzo, Alfio Massimiliano and Magnini, Bernardo and Strapparava, Carlo
EMNLP, pp. 380--387, 2004
2001
Using domain information for word sense disambiguation
Magnini, Bernardo and Strapparava, Carlo and Pezzulo, Giovanni and Gliozzo, Alfio
The Proceedings of the Second International Workshop on Evaluating Word Sense Disambiguation Systems, pp. 111--114, 2001
Abstract
The major goal in ITC-irst's participation at Senseval-2 was to test the role of domain information in word sense disambiguation. The underlying working hypothesis is that domain labels, such as Medicine, Architecture and Sport provide a natural way to
Year Unknown
Towards Comprehensive Noise Detection in Automatically-Created Knowledge Graphs
Mihindukulasooriya, Nandana and Hassanzadeh, Oktie and Dash, Sarthak and Gliozzo, Alfio
iswc2017.ai.wu.ac.at
Abstract
Knowledge Graphs (KGs) play a key role in many artificial intelligence applications. Large KGs are often constructed through a noisy automatic knowledge extraction process. Noise detection is, therefore, an important task for having high-quality KGs. We argue that