AAAI 2019 Reasoning and Complex QA Workshop - Schedule
Detailed Agenda (Draft)
Workshop Date: January 28, 2019 (Monday)
Venue: Hilton Hawaiian Village, Honolulu
Room: Nautilus 1, Sixth Floor, Mid Pacific Conference Center
Proceedings: PDF Proceedings
- 09:00 - 09:15 Workshop welcome, preliminaries, etc.
- 09:15 - 10:15 Invited Talk: Eduard Hovy
- 10:15 - 10:30 Paper 5: Translating Natural Language to SQL using Pointer-Generator Networks and How Decoding Order Matters
- 10:30 - 11:00 Coffee Break (30 mins) -- Poster Session
- 11:00 - 11:15 Paper 7: An Automated Question-Answering Framework Based On Evolution Algorithm
- 11:15 - 11:30 Paper 6: TallyQA: Answering Complex Counting Questions
- 11:30 - 12:00 Invited Talk: Ken Forbus
- 12:00 - 12:30 Invited Talk: Ashish Sabharwal
- 12:30 - 14:00 Lunch Break
- 14:00 - 14:15 Paper 8: Natural Language Question Answering over BI data using a Pipeline of Multiple Sequence tagging Networks
- 14:15 - 14:30 Paper 2: Understanding Complex Multi-sentence Entity seeking Questions
- 14:30 - 15:00 Invited Talk: Chitta Baral
- 15:00 - 15:15 Paper 4: Specific Question Generation for Reading Comprehension
- 15:15 - 15:45 Coffee Break (30 mins) -- Poster Session
- 15:45 - 16:00 Paper 10: Question Relatedness on Stack Overflow: The Task, Dataset, and Corpus-inspired Models
- 16:00 - 16:30 Invited Talk: Michael Witbrock
- 16:30 - 17:00 Panel
- From Simple to Complex QA [slides]
- Speaker: Eduard Hovy
- Abstract: In modern automated QA research, what are the criteria that differentiate simple/shallow from complex/deep QA? Early QA research developed pattern-learning and -matching techniques to identify the appropriate factoid answer(s), and this work has been taken a step further by recent neural architectures that seem to learn and apply more-flexible generalized word/type-sequence ‘patterns’. But the point of this workshop is that there is more to QA than patterns: crucially, that many QA tasks require some sort of intermediate reasoning or other inference procedures more complex than word and phrase generalization. Typical current approaches focus on the automated construction of small procedures to access the answer in structured resources like tables or databases. But much (or most) knowledge is not structured, and what to do in this case is unclear. The main problem facing this line of research is the difficulty in defining exactly what kinds of reasoning are relevant, and what knowledge resources are required to support them. If all relevant knowledge is apparent on the surface of the question material, then shallow pattern-matching techniques (perhaps involving combinations of patterns) can surely be developed using simple/shallow methods. But if not, then (at least some of) the relevant knowledge is either internal to the QA system or resides in some additional, external resource (like the web), which makes the design and construction of general comprehensive datasets and evaluations very difficult. (In fact, the same problem faces all in-depth semantic analysis research topics, including entailment, machine reading, semantic information extraction, and more.) How should the Complex-QA community respond to this conundrum? In this talk I outline the problem and propose a general direction for future research.
- Analogical Training for Question-Answering [slides]
- Speaker: Ken Forbus
- Abstract: Human question answering capabilities remain more sophisticated and flexible than today’s AI systems, and human learning is far more data-efficient than deep learning. Our hypotheses are that (a) people use multiple layers of rich, relational representations to encode language and knowledge, and (b) they use analogical processing heavily in learning and reasoning. Building on these hypotheses, we have developed a technique, analogical QA training, which can provide state-of-the-art (or near state-of-the-art) performance on multiple tasks. This talk will outline the basic approach to analogical QA and some prior experiments, summarize step semantics, a layer of representation for processes that combines discrete and continuous models, and describe how we have applied these ideas to the ProPara dataset. This work has been done in collaboration with Max Crouse, Danilo Ribeiro, Tom Hinrichs, Maria Chang, and Michael Witbrock.
- Bio: Kenneth D. Forbus is the Walter P. Murphy Professor of Computer Science and Professor of Education at Northwestern University. He received his degrees from MIT (Ph.D. in 1984). His research interests include qualitative reasoning, analogical reasoning and learning, spatial reasoning, sketch understanding, natural language understanding, cognitive architecture, reasoning system design, intelligent educational software, and the use of AI in interactive entertainment. He is a Fellow of the Association for the Advancement of Artificial Intelligence, the Cognitive Science Society, and the Association for Computing Machinery. He is the inaugural recipient of the Herbert A. Simon Prize and a recipient of the Humboldt Research Award, and he served as Chair of the Cognitive Science Society.
- Multihop Reasoning: Datasets, Models, and Leaderboards [slides]
- Speaker: Ashish Sabharwal
- Abstract: Despite remarkable success of neural models on several question answering (QA) datasets, especially in the reading comprehension setting, reliably performing multihop reasoning over text (i.e., combining multiple pieces of textual knowledge) remains a formidable challenge. This is especially true when the question requires background knowledge or a simple theory of how the world operates. This talk will summarize three pieces of recent work at AI2 that focus on this challenge, and end with an open discussion of a fourth, often-neglected aspect. I'll start with OpenBookQA (including in-progress v2), a dataset designed to probe the understanding of a basic scientific principle by asking indirectly about it, via a combination with a piece of common knowledge. I'll describe MulTeE, a recent neural model that fulfills the promise of Textual Entailment models (in particular, ESIM) being a valuable sub-module for solving challenging end-tasks, in this case OpenBookQA and MultiRC. I'll then briefly describe ProPara, a challenge dataset for reasoning about states and actions in procedural text, ranging from photosynthesis to how a dishwasher works. Finally, I'll end with an open discussion of an increasingly prevalent yet little discussed piece of the QA ecosystem: design choices for Leaderboards.
- Bio: As a research scientist at the Allen Institute for AI (AI2), Ashish Sabharwal works on semi-structured knowledge representations and scalable reasoning mechanisms. He is interested in building systems that have a formal underpinning and that can incorporate human insight and biases, learn from limited amounts of data, leverage large knowledge sources, and provide some form of explanation or reasoning supporting a conclusion. His research pushes the boundaries of inference techniques in combinatorial and probabilistic spaces, graphical models, and discrete optimization, and uses advances in these core methods to help solve key challenges in machine intelligence, particularly involving natural language understanding and reasoning. Prior to joining AI2, Ashish spent over three years at IBM Watson and five years at Cornell University, after obtaining his Ph.D. from the University of Washington in 2005. Ashish has co-authored over 90 publications, was on the IBM team that won in SAT Competitions in 2011-2013, and has been the recipient of six best paper awards and runner-up prizes at venues such as AAAI, UAI, and IJCAI.
- Question Answering that requires reasoning, common-sense and deeper understanding of the world [slides]
- Speaker: Chitta Baral
- Abstract: In recent years, several question answering challenges have been proposed to benchmark progress in specific AI disciplines (such as natural language understanding and image understanding) as well as general AI. These include the Winograd Schema Challenge, which has been proposed as an alternative to the Turing test. It seems many of these challenges require reasoning, common sense, and a deeper understanding of the world. In this talk we will discuss some of these challenges and point out aspects of them that necessitate reasoning and common-sense knowledge. We will also discuss our thoughts on common-sense knowledge acquisition and the kinds of reasoning needed for different forms of knowledge.
- Integrating Careful Reasoning and Learning for CQA [slides]
- Speaker: Michael Witbrock
- Abstract: Despite tremendous progress in AI, our most advanced systems are relatively narrow and far from their full potential. At IBM Research AI, our goal is to push the boundaries of AI by moving away from opaque, task-specific, end-to-end models, and toward explainable, multi-task models that learn and reason in human-like ways. To advance toward this goal, we are exploring prospects for integrating deep learning with careful and deliberate reasoning. Such hybrid systems can combine the benefits of deep learning (highly specialized skills from the abundance of data available to us) with the benefits of reasoning, including the acquisition of portable, explainable skills that can be shared and communicated with human collaborators. We believe that this area, and reasoning in particular, is ripe for rapid progress. In this talk, we share our progress combining these approaches in complementary ways. Specifically, we have found that deep learning over structured knowledge representations (such as graphs or logical clauses) can be used not only for end-to-end question answering systems, but also for question answering or reasoning subtasks, like natural language entailment, link prediction, and premise selection. We are also using neural networks for rule induction and theory learning, which can be used for downstream explainable symbolic reasoning. Moving forward, we believe that a commitment to broad, composable skills and representations will be essential to tackling other forms of complex problem solving.