NLP for Knowledge Extraction, Active Learning, and Explainable Argumentation Mining

Description: Natural Language Processing (NLP) is today one of the key areas of AI research. At the Dublin Research Lab, we apply NLP in several projects and are interested in exploring novel, competitive solutions to NLP tasks. From research to product, we aim for scientific contributions to the field as well as innovative real-world solutions.

Our researchers contribute to and participate in the main NLP conferences, such as ACL, EMNLP, and COLING, as well as related conferences such as SIGIR and AAAI, on topics ranging from Information Extraction, Argumentation Mining, and Sentiment Analysis to Information Retrieval and many others.

Several internship opportunities are open in this area:


1. Active learning for complex annotation tasks

Active learning has been a well-documented success in reducing the annotation effort needed to train supervised machine learning models, thereby reducing the time and expense of obtaining high-quality annotated data. Active learning can help supervised machine learning tasks in general, but we focus in particular on natural language processing and text mining in medical documents, because of the difficulty of the annotation and the added expense of expert medical annotators.

The active learning literature in NLP has produced very positive results across NLP tasks (e.g., part-of-speech tagging, named entity recognition, parsing, information extraction) and with different annotation selection strategies (e.g., uncertainty, query-by-committee, information density). However, the literature typically uses simulated annotation scenarios, in which part of a labelled dataset is treated as unlabelled and the labels are selectively added back to mimic annotation. This simulation is not realistic, however, as the selected units of annotation are presented out of context. In addition, state-of-the-art approaches tend to treat machine-learning instances as the units of annotation, whereas real-world annotation requires at least the surrounding sentence and likely more, which corresponds to multiple instances for a machine learning algorithm (e.g., a paragraph may be needed for accurate annotation, but the trained classifier handles one sentence at a time). These more complex units of annotation are often necessary in practice, but have not really been explored in the literature.
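To make the simulated-annotation loop concrete, here is a minimal sketch of pool-based active learning with uncertainty sampling. Everything in it is illustrative: the synthetic two-class data stands in for annotated documents, and a simple nearest-centroid classifier stands in for a real NLP model; the query strategy picks the pool example the current model is least confident about.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two Gaussian clusters stand in for two document classes.
X = np.vstack([rng.normal(-1.0, 1.0, (200, 5)),
               rng.normal(1.0, 1.0, (200, 5))])
y = np.array([0] * 200 + [1] * 200)

# Seed set: five labelled examples per class; the rest form the unlabelled pool.
labelled = list(rng.choice(200, 5, replace=False)) + \
           list(200 + rng.choice(200, 5, replace=False))
pool = [i for i in range(len(X)) if i not in labelled]

def class_centroids(idx):
    """Mean vector of the labelled examples of each class."""
    return {c: X[[i for i in idx if y[i] == c]].mean(axis=0) for c in (0, 1)}

for _ in range(20):  # twenty simulated annotation rounds
    cents = class_centroids(labelled)
    d0 = np.linalg.norm(X[pool] - cents[0], axis=1)
    d1 = np.linalg.norm(X[pool] - cents[1], axis=1)
    # Uncertainty sampling: query the pool point whose distances to the two
    # class centroids are most similar, i.e. the least confident prediction.
    pick = pool[int(np.argmin(np.abs(d0 - d1)))]
    labelled.append(pick)   # "annotate" it by revealing its gold label
    pool.remove(pick)

# Evaluate the final nearest-centroid classifier on the whole dataset.
cents = class_centroids(labelled)
preds = (np.linalg.norm(X - cents[1], axis=1)
         < np.linalg.norm(X - cents[0], axis=1)).astype(int)
accuracy = float((preds == y).mean())
```

The internship's setting differs in exactly the way the paragraph above describes: here each queried unit is a single classifier instance, whereas a real annotator would need the surrounding context, so a selection strategy would have to score documents or entities rather than isolated instances.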

This internship aims to tackle these complex annotation tasks with active learning and to develop new selection strategies that work over documents, entities, or other annotation units. The results of the internship will mainly target AI/NLP conferences (e.g., ACL, EMNLP), but could extend to UI or HCI conferences (e.g., IUI).

Required Skills:
- Research expertise or experience in NLP, Machine Learning
- Strong programming skills in Python
- Good communication skills


2. Constructing Knowledge Graphs for Scientific Literature

The scientific literature holds our understanding of every field of research. With the exponential growth in research output, it has become difficult for researchers to keep track of the quickly expanding knowledge even in their own field, so it is crucial to provide tools that help scientists and practitioners organize and integrate this vast amount of information. In NLP, research on Information Extraction (IE) and Knowledge Base Population (KBP) has mainly focused on the news domain and on public common knowledge bases such as Freebase and Yago; constructing knowledge graphs for scientific domains is less well studied.

This internship aims to develop new IE algorithms to extract and understand information from scientific papers, working on a large corpus of Computer Science/NLP papers as well as in the medical domain. This task presents several challenges that can drive interesting research questions, such as: 1) the identification of outcomes in scientific papers; 2) the interpretation of tables; 3) cross-document entity resolution; 4) cross-sentence relation extraction; 5) concept description extraction from scientific literature; and 6) constructing knowledge graphs from scientific papers.
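As a toy illustration of the last challenge, the sketch below turns sentences into subject-relation-object triples with hand-written patterns and indexes them as a tiny knowledge graph. The sentences, patterns, entity names (BERT, RoBERTa, SQuAD), and relation labels are all illustrative placeholders, not the project's intended method, which would use learned extractors rather than regular expressions.

```python
import re

# Illustrative sentences; a real system would process full papers.
sentences = [
    "BERT is evaluated on the SQuAD dataset.",
    "BERT achieves 93.2 F1 on SQuAD.",
    "RoBERTa is evaluated on the SQuAD dataset.",
]

# Each pattern captures a (subject, object) pair for one relation type.
patterns = [
    (re.compile(r"(\w+) is evaluated on the (\w+) dataset"), "evaluated_on"),
    (re.compile(r"(\w+) achieves [\d.]+ F1 on (\w+)"), "reports_result_on"),
]

triples = []
for sentence in sentences:
    for pattern, relation in patterns:
        for m in pattern.finditer(sentence):
            triples.append((m.group(1), relation, m.group(2)))

# The knowledge graph here is just the extracted triples indexed by subject.
graph = {}
for subj, rel, obj in triples:
    graph.setdefault(subj, []).append((rel, obj))
```

Even this toy version surfaces the research questions listed above: merging the two mentions of SQuAD across sentences is entity resolution, and linking a reported score to the right evaluation setting requires cross-sentence (and often table-level) reasoning.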

The end goal of the internship is a research prototype and submissions to top NLP conferences (ACL, EMNLP, NAACL, COLING).

Required Skills:
- Research expertise or experience in NLP, Machine Learning, Information Extraction
- Strong programming skills in Java/Python
- Good communication skills


3. Explainable Argumentation Mining

Computational argumentation focuses on analysing argumentation in text and on developing tools to automatically extract, aggregate, summarize, and reason about arguments in natural language. Recent advances in computational argumentation mainly focus on "prediction", such as identifying argument units (e.g., claims, premises) and predicting argumentative relations (e.g., support, attack) between argument units from the same or different texts. However, the inferences required to explain the relations between argument units in natural language are still not well explored. For instance, why does a premise support a claim? And why does claim A (genetically modified foods reduce the damage caused by pesticides to wildlife) attack claim B (genetically modified foods mix with native plants and reduce genetic diversity)?

This project aims to explore frameworks to "explain" the explicit and implicit reasoning occurring in natural language argumentation. We will create a topic-specific knowledge base for argumentation and apply it to argumentation mining and argumentation generation. The research topics include: 1) argumentation knowledge graph construction and evaluation; 2) argument relation explanation using knowledge graphs; 3) counter-argument generation.
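A minimal sketch of the underlying data structure: an argument graph with typed support/attack edges, reusing the genetically-modified-foods claims from the text as toy nodes. The premise P and the helper functions are illustrative assumptions; a real system would predict these edges from text and attach the knowledge needed to explain each one.

```python
from collections import defaultdict

edges = defaultdict(list)  # source claim -> [(relation, target claim)]

def add_edge(src, relation, dst):
    edges[src].append((relation, dst))

# Toy argument units (A and B are the claims quoted in the text above).
A = "GM foods reduce the damage caused by pesticides to wildlife"
B = "GM foods mix with native plants and reduce genetic diversity"
P = "GM crops need fewer pesticide applications"  # illustrative premise

add_edge(P, "supports", A)
add_edge(A, "attacks", B)

def attackers(claim):
    """All argument units with an 'attacks' edge into the given claim."""
    return [s for s, rels in edges.items()
            for (r, t) in rels if r == "attacks" and t == claim]

def supporters(claim):
    """All argument units with a 'supports' edge into the given claim."""
    return [s for s, rels in edges.items()
            for (r, t) in rels if r == "supports" and t == claim]
```

The project's "explanation" goal goes beyond this structure: rather than only recording that A attacks B, the graph's edges would be justified by background knowledge (e.g., that pesticide reduction and genetic-diversity loss are competing environmental outcomes), which is what the topic-specific argumentation knowledge base is for.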

Required Skills:
- Strong programming skills in Java/Python
- Knowledge of Natural Language Processing and Machine Learning
- Good communication skills