The DeepQA Research Team - DeepQA

Real language is real hard for computers to grasp. The meaning behind the words is implicit, ambiguous and highly contextual.

The underlying philosophy of our research approach is that true intelligence will emerge from the development and integration of many different algorithms, each looking at the data from a different perspective. No one programmer and no one top-to-bottom program design will have all it needs to understand language. Rather, a system must evolve from the continuous contribution of many different algorithms. These must all balance and combine to form a holistic and accurate interpretation of the intended meaning.

DeepQA is a software architecture for deep content analysis and evidence-based reasoning that embodies that philosophy.

DeepQA Architecture

DeepQA combines advanced natural language processing, semantic analysis, information retrieval, automated reasoning and machine learning. It deeply analyzes natural language input to better find, synthesize, deliver and organize relevant answers and their justifications from the wealth of knowledge available in a combination of existing natural language text and databases.

The DeepQA architecture views the problem of Automatic Question Answering as a massively parallel hypothesis generation and evaluation task. As a result, DeepQA is not just about question-in/answer-out. Rather, it can be viewed as a system that performs differential diagnosis: it generates a wide range of possibilities and, for each, develops a level of confidence by gathering, analyzing and assessing evidence based on available data.

Given a question, a topic, a case or a set of related questions, DeepQA finds the important concepts and relations in the input language, builds a representation of the user’s information need and then, through search, generates many possible responses. For each possible response, it spawns independent and competing threads that gather, evaluate and combine different types of evidence from structured and unstructured sources. It can deliver a ranked list of responses, each associated with an Evidence Profile describing the supporting evidence and how it was weighted by DeepQA’s internal algorithms.
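The flow described above can be illustrated with a toy sketch in Python. This is not DeepQA's code: the passages, the fact table, the two scorers, the fixed weights and every function name below are assumptions made up for illustration, and a real system would learn its weights rather than hard-code them. The sketch only shows the shape of the idea: generate candidate responses, evaluate each one independently against structured and unstructured evidence, and return a ranked list in which every response carries an evidence profile.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

# Toy evidence sources (hypothetical, for illustration only).
PASSAGES = [  # unstructured text
    "Paris is the capital of France.",
    "Lyon is a large city in France.",
    "The capital of France is Paris.",
]
FACTS = {("France", "capital"): "Paris"}  # structured data

# Fixed illustrative weights; assumed here, not taken from DeepQA.
WEIGHTS = {"passage_support": 0.6, "structured_match": 0.4}


@dataclass
class Response:
    candidate: str
    confidence: float
    evidence_profile: dict  # scorer name -> weighted contribution


def generate_candidates(need):
    """Hypothesis generation: capitalized tokens from passages mentioning the entity."""
    candidates = set()
    for passage in PASSAGES:
        if need["entity"] in passage:
            for token in passage.rstrip(".").split():
                if token[0].isupper() and token not in (need["entity"], "The"):
                    candidates.add(token)
    return sorted(candidates)


def passage_support(need, candidate):
    """Fraction of passages that mention both the candidate and the relation word."""
    hits = sum(1 for p in PASSAGES if candidate in p and need["relation"] in p)
    return hits / len(PASSAGES)


def structured_match(need, candidate):
    """1.0 if the fact table agrees with the candidate, else 0.0."""
    known = FACTS.get((need["entity"], need["relation"]))
    return 1.0 if known == candidate else 0.0


SCORERS = {"passage_support": passage_support, "structured_match": structured_match}


def evaluate(need, candidate):
    """Gather evidence for one candidate and combine it into a confidence."""
    profile = {name: WEIGHTS[name] * fn(need, candidate) for name, fn in SCORERS.items()}
    return Response(candidate, sum(profile.values()), profile)


def answer(need):
    """Evaluate all candidates in parallel and rank them by confidence."""
    candidates = generate_candidates(need)
    with ThreadPoolExecutor() as pool:
        responses = list(pool.map(lambda c: evaluate(need, c), candidates))
    return sorted(responses, key=lambda r: r.confidence, reverse=True)
```

Calling answer({"entity": "France", "relation": "capital"}) on this toy data ranks "Paris" ahead of "Lyon", and each returned response exposes the per-scorer contributions that produced its confidence.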

There are many different tasks that can help drive the technology of automatic, open-domain question answering. Many lessons were learned from prior attempts at Question Answering systems directed at specific tasks. They all ultimately influenced DeepQA’s directions on what to do and on what not to do.

This white paper discusses several challenge problems. The overarching theme, however, is that the best way to achieve rapid and general advancement in the field is to evolve a common architecture that can easily adapt to many different challenge problems.

To rapidly and efficiently advance the field of question answering and, more generally, the field of natural language understanding, we took on the Jeopardy! Challenge. Jeopardy! forced us to break the mold and create something novel because it requires answering questions expressed in rich language over a very broad domain with high precision, accurate confidence estimation and speed.