The DeepQA Research Team - Real Language is Real Hard


Understanding natural language, the language we humans use to communicate with one another every day, is a notoriously difficult challenge for computers. To us, language is simple and intuitive, and ambiguity is often a source of humor rather than frustration.

"One morning I shot an elephant in my pajamas. How he got in my pajamas, I don't know."
- Groucho Marx, Animal Crackers

When we read "I shot an elephant in my pajamas," our brains don't even consider that the pajamas are on the elephant because it's absurd. But why is it absurd? Without knowledge of the relative sizes of humans and elephants, and the fact that humans are far more likely than elephants to wear pajamas, the alternative interpretations seem equally likely. So why is language easy for us?

As we humans process language, we pare down alternatives using our incredible abilities to reason based on our knowledge. We also use whatever context surrounds the language to favor certain interpretations. These abilities allow us to deal with the implicit, highly contextual, ambiguous and often imprecise nature of language.

So is the solution to build a knowledge representation of the world and use it when processing natural language with the computer? In limited subject domains this can be very useful, and in the Watson project we integrate this approach. For example, in the Jeopardy! category "The Northernmost Capital City," we use precise, readily accessible geographic data to determine that the northernmost of Manila, Kathmandu and Jakarta is Kathmandu. (We're ignoring that we first had to decide on Manila, Philippines rather than the same-named city in Utah or Arizona!)
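For the structured-data case, the lookup really is this simple. Here is a minimal sketch, assuming we already have the capitals' latitudes in hand (the values below are approximate and the dictionary is purely illustrative, not Watson's actual data source):

```python
# Approximate latitudes in degrees north (illustrative values only).
latitudes = {
    "Manila": 14.6,      # Manila, Philippines -- not the same-named U.S. towns
    "Kathmandu": 27.7,
    "Jakarta": -6.2,     # south of the equator
}

def northernmost(cities):
    """Return the city with the greatest latitude."""
    return max(cities, key=latitudes.get)

print(northernmost(["Manila", "Kathmandu", "Jakarta"]))  # Kathmandu
```

The hard part, of course, is everything before this lookup: recognizing that the clue is asking about latitude at all, and mapping "Manila" to the right Manila.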

But what about this clue?

"If you're standing, it's the direction you should look to check out the wainscoting."

How extensive would our knowledge representation of the world need to be before we could reason that people are usually taller than wainscoting, and so "down" is the direction you would look to see it? Far too extensive for us to be able to compete in The Jeopardy! Challenge this year. So we can't rely solely on structured data for our performance; it would simply be too limiting.

Another way to look at the problem of natural language is to put the shoe on the other foot. Let's consider a question that a computer finds very simple but that we find difficult.

ln((12546798π)^2)/34567.46

Here's a question that a computer can almost instantaneously tell you evaluates to about 0.00101. We humans would be hard pressed just to say whether the answer is greater or less than one!
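For a machine, the evaluation is a direct transcription of the expression:

```python
import math

# Direct transcription of the clue's expression: ln((12546798*pi)^2) / 34567.46
value = math.log((12546798 * math.pi) ** 2) / 34567.46
print(value)  # ≈ 0.00101
```

The computer evaluates this in microseconds; the difficulty for it lies entirely in questions that are not already written in its native notation.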

When a computer is presented with a very precisely described question and can compute the answer in a very precisely defined way, its performance is unbeatable. But if either the question or the answer is in natural language, the computer's strengths are no longer directly applicable. With Watson and The Jeopardy! Challenge this is always the case: the question is always in natural language, and often we have to find the answer among the natural language texts Watson has read. For example, consider this very clear question:

Where was Albert Einstein born?

With a database of birthplaces, an example of structured data, this would be very simple to answer. We'd simply find "Albert Einstein" in the column of names and reply with whatever we found in the corresponding column of birthplaces:

Name             Birthplace
Albert Einstein  Ulm
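Against structured data, the whole question-answering task collapses to a table lookup. A minimal sketch, using a toy dictionary in place of a real database (the entries are illustrative only):

```python
# A toy structured "database" of birthplaces (illustrative entries only).
birthplaces = {
    "Albert Einstein": "Ulm",
    "Marie Curie": "Warsaw",
}

def birthplace(name):
    """Answer 'Where was X born?' by direct table lookup."""
    return birthplaces.get(name)

print(birthplace("Albert Einstein"))  # Ulm
```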

But what if the answer were embedded in natural language:

One day, from among his city views of Ulm, Otto chose a water color to send to Albert Einstein as a remembrance of Einstein's birthplace.

The answer is in here, but what is it exactly? First, how did we even choose to pay attention to this sentence? We had to read many sentences to decide this one might be relevant. And even once we have a passage that may contain the correct answer, extracting that answer precisely is hardly easy: the sentence mentions several people and places, and unwinding their relationships to place Einstein's birthplace in Ulm is difficult.
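To make the two sub-problems concrete, here is a deliberately naive sketch (not Watson's actual method) of passage retrieval followed by candidate extraction:

```python
import re

passage = ("One day, from among his city views of Ulm, Otto chose a water "
           "color to send to Albert Einstein as a remembrance of Einstein's "
           "birthplace.")

# Step 1: passage retrieval -- a crude relevance test: does the sentence
# mention both the entity we care about and a birth-related keyword?
relevant = "Einstein" in passage and "birthplace" in passage

# Step 2: answer extraction -- naively collect capitalized spans as candidate
# entities. The machine is still left to decide which candidate is the
# birthplace (here: Ulm, not Otto and not Albert Einstein himself).
candidates = re.findall(r"[A-Z][a-z]+(?: [A-Z][a-z]+)*", passage)
print(relevant, candidates)
```

Even this toy extractor surfaces several candidates, which is precisely the point: deciding among them requires understanding the relationships the sentence expresses, not just spotting names.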

Of course, not every text is so difficult to understand. From this passage, for example, it's quite clear that Jack Welch was a heckuva painter.

If leadership is an art then surely Jack Welch has proved himself a master painter during his tenure at GE.

The approaches to this difficult problem are myriad, and so our first task was to build a system that would allow us to organize and coordinate many different algorithms in a way that would ultimately lead us to a solution to the Question Answering task in general and to The Jeopardy! Challenge specifically. That solution is what we call DeepQA.