Conversational UX Design       


2. Conversation Navigation

With any computer interface, users must learn how to navigate the space. In command-line interfaces, users learn to navigate directories through cryptic commands. In graphical interfaces, users learn to drag files on a desktop to folders. In web interfaces, users learn to jump from page to page with URLs and hypertext. And in mobile interfaces, users learn to touch the screen, rotate it and "pinch" the images. But how should users navigate a conversational interface? What are the basic actions that they can always rely on at any point to navigate the conversation space?

Natural human conversation contains devices for its own management, as we see with sequence expansions. We propose a subset of these as 6 basic actions for Conversation Navigation:

6 Basic Navigation Actions
1. What can you do?     help (global)
2. What did you say?    back
3. What do you mean?    help (local)
4. Okay/Thanks          forward
5. Never mind           escape
6. Goodbye              end
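
The table above can be read as a lookup from user phrases to navigation actions. The following is a minimal sketch under stated assumptions, not the framework's actual implementation: the names `NavAction`, `TRIGGERS` and `classify` are hypothetical, and exact phrase matching stands in for whatever intent recognition a real agent would use.

```python
from enum import Enum
from typing import Optional


class NavAction(Enum):
    """The six basic navigation actions, valued by their interface analogue."""
    CAPABILITY = "help (global)"
    REPEAT = "back"
    PARAPHRASE = "help (local)"
    CLOSE_SEQUENCE = "forward"
    ABORT_SEQUENCE = "escape"
    CLOSE_CONVERSATION = "end"


# Hypothetical trigger phrases (an assumption for illustration only).
TRIGGERS = {
    NavAction.CAPABILITY: ("what can you do",),
    NavAction.REPEAT: ("what did you say", "say again"),
    NavAction.PARAPHRASE: ("what do you mean",),
    NavAction.CLOSE_SEQUENCE: ("okay", "ok", "thanks"),
    NavAction.ABORT_SEQUENCE: ("never mind",),
    NavAction.CLOSE_CONVERSATION: ("goodbye", "bye"),
}


def classify(utterance: str) -> Optional[NavAction]:
    """Return the navigation action the utterance exactly matches, or None."""
    normalized = utterance.lower().strip(" ?!.")
    for action, phrases in TRIGGERS.items():
        if normalized in phrases:
            return action
    return None
```

Because the match is exact, an utterance that combines a closing with a new request, such as "ok who invented the hard disk?", returns None here and would be left to the agent's ordinary request handling.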

Capability: Users should always be able to ask the agent what it can do. "What can you do?" is perhaps the most general request for a description of the system's scope and functionality. It should give the user enough guidance to use the app or to ask more specific questions about capability.

Repeat: In voice interfaces, unlike text, utterances are transient; therefore, users must be able to elicit repeats of all or part of the agent's utterances. "What did you say?" is the natural way to request a full repeat of the prior utterance. In voice interfaces, requesting repeats is like 'going back' in other interfaces.

Paraphrase: While capability checks provide global help to the user, paraphrase devices provide local help on a turn-level basis. "What do you mean?" elicits an elaboration of the prior utterance. Elaborations should use plainer language to express the same intent as the more efficient, standard response.
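
Repeat and paraphrase both depend on the agent remembering its prior utterance, and paraphrase also needs a second, plainer wording of each response. The sketch below shows one possible way to support both. `Response` and `TurnMemory` are hypothetical names, and the fallback sentence for an empty history is an assumption, not taken from any real agent.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Response:
    standard: str    # the efficient, default wording
    elaborated: str  # plainer, fuller wording of the same intent


class TurnMemory:
    """Remembers the agent's last response so it can be repeated or paraphrased."""

    # Assumed fallback wording for when nothing has been said yet.
    NOTHING_YET = "I haven't said anything yet."

    def __init__(self) -> None:
        self.last: Optional[Response] = None

    def say(self, response: Response) -> str:
        """Deliver the standard wording and remember the response."""
        self.last = response
        return response.standard

    def repeat(self) -> str:
        """Answer "What did you say?" with the prior utterance verbatim."""
        return self.last.standard if self.last else self.NOTHING_YET

    def paraphrase(self) -> str:
        """Answer "What do you mean?" with the elaborated wording."""
        return self.last.elaborated if self.last else self.NOTHING_YET
```

Authoring the elaboration alongside the standard response, rather than generating it on demand, keeps the two wordings tied to the same intent.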

Close Sequence: Users should be able to close the current sequence when they receive an adequate response and move on to the next sequence. "Okay" or "thanks" are natural ways for the user to signal the completion of the current sequence and invite the agent to move on. This can also be a good place to reset context variables that may conflict with subsequent requests.

Abort Sequence: When users fail to elicit an adequate response from the agent, they should be able to abort the current sequence and move on to the next. "Never mind" in a conversation functions somewhat like 'escape' in a computer interface. It enables the user to give up and move on.
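
Closing and aborting a sequence differ in what the user signals (success versus giving up), but both are points where leftover context can be cleared. The sketch below assumes a hypothetical `SequenceContext` class that keeps slot values in a dictionary. The agent wording is borrowed from the Alma excerpts later in this section, and everything else is illustrative.

```python
from typing import Dict


class SequenceContext:
    """Holds the context variables (slots) of the sequence in progress."""

    def __init__(self) -> None:
        self.slots: Dict[str, str] = {}

    def close_sequence(self) -> str:
        """User signalled success ("okay", "thanks"): reset and invite more."""
        self.slots.clear()
        return "You're welcome! Anything else for you?"

    def abort_sequence(self) -> str:
        """User gave up ("never mind"): reset and offer global help."""
        self.slots.clear()
        return "Whatever you say, friend. Want to know what I can do?"
```

Clearing the slots in both paths keeps a value from the finished or abandoned sequence, such as a previously chosen genre, from leaking into the next request.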

Close Conversation: As in a human conversation, users should be encouraged to close their interaction with the system. "Goodbye" is the natural way to move to end a conversation. The agent should treat the user's attempt to close the conversation as a pre-closing. The pre-closing gives the agent the opportunity to bring up a last topic before returning a "goodbye" to the user.
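
The pre-closing behavior can be modeled as a small two-step state machine: the first "goodbye" raises a last topic, and only the next step ends the conversation. This is a sketch under assumptions. `ClosingHandler` and its method names are hypothetical, and the reply for an unsuccessful session is invented for illustration.

```python
class ClosingHandler:
    """Treats the user's first "goodbye" as a pre-closing, not a hard exit."""

    def __init__(self) -> None:
        self.pre_closing = False  # True once the last topic has been raised
        self.closed = False       # True once the agent has said goodbye

    def on_goodbye(self) -> str:
        """First goodbye raises a success check; a second one closes."""
        if not self.pre_closing:
            self.pre_closing = True
            return "Did you find what you were looking for?"
        self.closed = True
        return "Goodbye."

    def on_last_topic_answer(self, found: bool) -> str:
        """Complete the closing after the user answers the success check."""
        self.closed = True
        if found:
            return "Great! Goodbye."
        # Assumed wording; the source only shows the successful case.
        return "I'm sorry to hear that. Goodbye."
```

A caller would invoke `on_last_topic_answer` only after `on_goodbye` has raised the success check.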

After any agent utterance in a conversation, users should be able to do any of the above actions. At first they must be taught that, unlike Google or even Siri, your conversational agent will recognize these 6 basic actions and respond appropriately and usefully. Because they are based on human conversation, they should already be familiar to the user and natural to perform.

Take the following two working examples from Alma...

Conversation Navigation A
01 U:  what can you do?
02 A:  I haven't been trained yet. I can only make small
03     talk, answer or ask some trivia questions and
04     tell some jokes.
05 U:  ok who invented the hard disk?
06 A:  It was invented by IBM in 1956.
07 U:  say again
08 A:  It was invented by IBM in 1956.
09 U:  what do you mean?
10 A:  The hard disk was invented in 1956 by engineers
11     at IBM's laboratory in San Jose, California.
12 U:  thanks
13 A:  You're welcome!
14 A:  Anything else for you?
15 U:  goodbye
16 A:  Did you find what you were looking for?

In excerpt A, the user relies on 5 of the basic actions to navigate the conversation. At the beginning, he checks the agent's capabilities (lines 1-4) and then does an action within the scope of that response, a technology trivia question (line 5). In response to the answer, the user then requests a repeat (line 7), followed by an elaboration request (line 9). The user then closes the trivia-question sequence with an appreciation (line 12) and moves to close the conversation (line 15). Instead of completing the closing sequence, the agent treats it as a pre-closing and brings up a last topic, a success check (line 16). Now contrast excerpt B...

Conversation Navigation B
01 U:  show me flights
02 A:  What do you mean?
03 U:  show me flights from SFO to JFK
04 A:  I'm sorry, I still don't understand.
05 U:  never mind
06 A:  Whatever you say, friend.
07 A:  Want to know what I can do?
08 U:  yes please
09 A:  Okay.
10 A:  I haven't been trained yet. I can only 
11     make small talk, answer or ask some 
12     trivia questions and tell some jokes.

The user in excerpt B fails to check the agent's capabilities at the beginning of the conversation and instead launches into a flight request (line 1). This time the agent responds with an elaboration request (line 2) to which the user offers an elaboration (line 3). This still fails to enable the agent to understand (line 4) so the user aborts the flight request sequence (line 5). In response, the agent offers to describe its capabilities (line 7), which the user accepts (line 8) and can use to regain alignment with the agent.

The 6 basic navigation actions enable the user to get in and out of sequences and the conversation itself. They also enable the user to get help globally and on a local, turn-by-turn basis. With this set of actions, users can thus navigate the conversation space of the application and recover when they get stuck. And because the 6 basic actions reflect corresponding actions in natural human conversation, they are already familiar to users.

Our first implementation of the Natural Conversation Framework was for IBM's What's in Theaters app in 2015. What's in Theaters was built on the Watson Dialog service (the precursor to Watson Conversation) as a simple demonstration of how to integrate the service with other components into a web app. But it also demonstrates an early version of our Conversation Navigation method. Take the following working script, for example...

What's in Theaters (2015)
01 U:  what can you do?
02 A:  I can look up current and upcoming movies by Genre or
03     MPAA Rating and show you trailers for them. But I'm
04     afraid I cannot search by number of stars or by movie
05     titles or actor and director names at this time.

06 U:  ok show me current movies
07 A:  Are you in the mood for a specific genre?
08 U:  say again
09 A:  I said... Are you in the mood for a specific genre?
10 U:  what are my choices?
11 A:  Action, adventure, animated, comedy, crime, documentary,
12     drama, family, fantasy, foreign, historical, horror,
13     music, mystery, romance, science fiction, TV movie,
14     thriller, war movies and western.

15 U:  science fiction
16 A:  Do you prefer a certain movie rating?
17 U:  what do you mean by movie rating?
18 A:  The rating is a recommendation by the Motion Picture
19     Association of America about the suitability of a
20     movie's content for particular age groups. For example,
21     G is for general audiences, while R is restricted to
22     people 17 and older. Do you prefer a certain movie rating?

23 U:  I don't care
24 A:  Good choice, Bob! I found 24 results for Current Science
25     Fiction movies.

26     ((first movie results))
27 U:  no I mean PG-13
28 A:  Good choices, Bob! I found 1 results for Current
29     PG-13-rated Science Fiction movies.

30     ((Justice League))
31 U:  thanks
32 A:  Is there anything else I can help you with?
33 U:  where is AMC Saratoga?
34 A:  I'm sorry, I don't understand. Try What's playing?
35 U:  what's the address for AMC Saratoga?
36 A:  I'm afraid I don't understand. I can't recognize movie
37     titles at this time.
38 U:  never mind
39 A:  Okay. Whatever you say, Bob! Is there anything else
40     I can help you with?
41 U:  no
42 A:  Okay. Did you find what you were looking for, Bob?
43 U:  yes
44 A:  Great! Goodbye.
45 U:  bye
46 A:  Say anything to continue.

In What's in Theaters, we can see demonstrations of all 6 conversation navigation actions: capability check (line 1), repeat request (line 8), paraphrase request (line 17), sequence closing (line 31), sequence aborting (line 38) and conversation closing (line 45). It also supports selected detail elicitations (line 10), no-answer responses (line 23) and self-corrections (line 27). Although the functional scope of What's in Theaters was always limited because it was a proof-of-concept app, it nonetheless demonstrates expandable sequences and Conversation Navigation.


Project Members

Dr. Robert J. Moore
Conversation Analyst, Lead

Eric Young Liu
Design Technologist

Saurabh Mishra
Software Engineer

Dr. Guang-Jie Ren
