Conversational UX Design     


Conversation Navigation

With any computer interface, users must learn how to navigate the space. In command-line interfaces, users learn to navigate directories through cryptic commands. In graphical interfaces, users learn to drag files on a desktop to folders. In web interfaces, users learn to jump from page to page with URLs and hypertext. And in mobile interfaces, users learn to touch the screen, rotate it and "pinch" the images. But how should users navigate a conversational interface? What are the basic actions that they can always rely on at any point to navigate the conversation space?

Natural human conversation contains devices for its own management, as we see with sequence expansions. We propose a subset of these as 6 basic actions for Conversation Navigation:

6 Basic Navigation Actions                    Similar to...
1. What can you do?     capability           global help
2. What did you say?    repeat               back
3. What do you mean?    paraphrase           local help
4. Okay/Thanks          close sequence       close window
5. Never mind           abort sequence       escape
6. Goodbye              close conversation   quit


Capability: Users should always be able to ask the agent what it can do. "What can you do?" is perhaps the most general request for a description of the system's scope and functionality. It should give the user enough guidance to use the app or to ask more specific questions about capability.

Repeat: In voice interfaces, unlike text, utterances are transient; therefore, users must be able to elicit repeats of all or part of the agent's utterances. "What did you say?" is the natural way to request a full repeat of the prior utterance. In voice interfaces, requesting repeats is like 'going back' in other interfaces.

Paraphrase: While capability checks provide global help to the user, paraphrase devices provide local help on a turn-level basis. "What do you mean?" elicits an elaboration of the prior utterance. Elaborations should use plainer language to express the same intent as the more efficient, standard response.

Close Sequence: Users should be able to close the current sequence when they receive an adequate response and move on to the next sequence. "Okay" or "thanks" are natural ways for the user to signal the completion of the current sequence and invite the agent to move on. This can also be a good place to reset context variables that may conflict with subsequent requests.

Abort Sequence: When users fail to elicit an adequate response from the agent, they should be able to abort the current sequence and move on to the next. "Never mind" in a conversation functions somewhat like 'escape' in a computer interface. It enables the user to give up and move on.

Close Conversation: As in a human conversation, users should be encouraged to close their interaction with the system. "Goodbye" is the natural way to initiate the closing of a conversation. The agent should treat the user's attempt to close the conversation as a pre-closing. The pre-closing gives the agent the opportunity to bring up a last topic before returning a "goodbye" to the user.

After any agent utterance in a conversation, users should be able to perform any of the above actions. At first, they must be taught that, unlike Google or even Siri, your conversational agent will recognize these 6 basic actions and respond appropriately and usefully. Because they are based on human conversation, they should already be familiar to the user and natural to perform.
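As a rough sketch, the 6 actions can be implemented as a navigation layer that intercepts them before any domain logic runs. The class and handler names below are hypothetical, not Alma's actual implementation, and the keyword matching stands in for a trained intent classifier:

```python
# Hypothetical sketch: a navigation layer in front of an agent's domain
# logic. A real agent would detect these intents with a classifier, not
# exact string matches.

CAPABILITIES = ("I haven't been trained yet. I can only make small talk, "
                "answer or ask some trivia questions and tell some jokes.")

class NavigationLayer:
    """Handles the 6 basic navigation actions; defers everything else."""

    def __init__(self, domain_handler):
        self.domain_handler = domain_handler  # handles non-navigation turns
        self.last_reply = ""                  # supports "What did you say?"
        self.last_paraphrase = ""             # plainer wording of last_reply
        self.context = {}                     # slots reset when a sequence closes

    def respond(self, utterance):
        text = utterance.lower().strip().rstrip("?")
        if text == "what can you do":                     # 1. capability (global help)
            return self._say(CAPABILITIES, CAPABILITIES)
        if text in ("what did you say", "say again"):     # 2. repeat (back)
            return self.last_reply
        if text == "what do you mean":                    # 3. paraphrase (local help)
            return self.last_paraphrase
        if text in ("ok", "okay", "thanks"):              # 4. close sequence
            self.context.clear()                          #    reset conflicting slots
            return self._say("Anything else for you?",
                             "Is there something else you want to do?")
        if text == "never mind":                          # 5. abort sequence (escape)
            self.context.clear()
            return self._say("Okay, moving on.", "I'll drop that topic.")
        if text in ("goodbye", "bye"):                    # 6. close conversation (quit)
            # Treat as a pre-closing: raise one last topic before closing.
            return self._say("Did you find what you were looking for?",
                             "Before you go: was I helpful?")
        reply, paraphrase = self.domain_handler(text, self.context)
        return self._say(reply, paraphrase)

    def _say(self, reply, paraphrase):
        self.last_reply, self.last_paraphrase = reply, paraphrase
        return reply

# A toy domain handler, mirroring the trivia exchange in excerpt A below.
def trivia(text, context):
    if "hard disk" in text:
        return ("It was invented by IBM in 1956.",
                "The hard disk was invented in 1956 by engineers at IBM.")
    return ("I'm sorry, I don't understand.", "I can't answer that yet.")
```

Note that the repeat action returns `last_reply` without updating it, so repeated repeats stay stable, while every substantive reply stores both its standard wording and a plainer paraphrase for the local-help action.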

Take the following two working examples from Alma...

Conversation Navigation A
01 U:  what can you do?
02 A:  I haven't been trained yet. I can only make small
03     talk, answer or ask some trivia questions and
04     tell some jokes.
05 U:  ok who invented the hard disk?
06 A:  It was invented by IBM in 1956.
07 U:  say again
08 A:  It was invented by IBM in 1956.
09 U:  what do you mean?
10 A:  The hard disk was invented in 1956 by engineers
11     at IBM's laboratory in San Jose, California.
12 U:  thanks
13 A:  You're welcome!
14 A:  Anything else for you?
15 U:  goodbye
16 A:  Did you find what you were looking for?

In excerpt A, the user relies on 5 of the basic actions to navigate the conversation. At the beginning, the user checks the agent's capabilities (lines 1-4) and then performs an action within the scope of that response, a technology trivia question (line 5). In response to the answer, the user then requests a repeat (line 7), followed by an elaboration request (line 9). The user then closes the trivia-question sequence with an appreciation (line 12) and moves to close the conversation (line 15). Instead of completing the closing sequence, the agent treats it as a pre-closing and brings up a last topic, a success check (line 16). Now contrast excerpt B...

Conversation Navigation B
01 U:  show me flights
02 A:  What do you mean?
03 U:  show me flights from SFO to JFK
04 A:  I'm sorry, I still don't understand.
05 U:  never mind
06 A:  Whatever you say, friend.
07 A:  Want to know what I can do?
08 U:  yes please
09 A:  Okay.
10 A:  I haven't been trained yet. I can only 
11     make small talk, answer or ask some 
12     trivia questions and tell some jokes.

The user in excerpt B fails to check the agent's capabilities at the beginning of the conversation and instead launches into a flight request (line 1). This time the agent responds with an elaboration request (line 2) to which the user offers an elaboration (line 3). This still fails to enable the agent to understand (line 4) so the user aborts the flight request sequence (line 5). In response, the agent offers to describe its capabilities (line 7), which the user accepts (line 8) and can use to regain alignment with the agent.

The 6 basic navigation actions enable the user to get in and out of sequences and the conversation itself. They also enable the user to get help globally and on a local, turn-by-turn basis. With this set of actions, users can thus navigate the conversation space of the application and recover when they get stuck. And because the 6 basic actions reflect corresponding actions in natural human conversation, they are already familiar to users.

Our first implementation of the Natural Conversation Framework was for IBM's What's in Theaters app in 2015 (try it here if it's still up). What's in Theaters was built on the Watson Dialog service (the precursor to Watson Conversation) as a simple demonstration of how to integrate the service with other components into a web app. But it also demonstrates an early version of our Conversation Navigation method. Take the following working script for example...

What's in Theaters (2015)
01 U:  what can you do?
02 A:  I can look up current and upcoming movies by Genre or
03     MPAA Rating and show you trailers for them. But I'm
04     afraid I cannot search by number of stars or by movie
05     titles or actor and director names at this time.

06 U:  ok show me current movies
07 A:  Are you in the mood for a specific genre?
08 U:  say again
09 A:  I said... Are you in the mood for a specific genre?
10 U:  what are my choices?
11 A:  Action, adventure, animated, comedy, crime, documentary,
12     drama, family, fantasy, foreign, historical, horror,
13     music, mystery, romance, science fiction, TV movie,
14     thriller, war movies and western.

15 U:  science fiction
16 A:  Do you prefer a certain movie rating?
17 U:  what do you mean by movie rating?
18 A:  The rating is a recommendation by the Motion Picture
19     Association of America about the suitability of a
20     movie's content for particular age groups. For example,
21     G is for general audiences, while R is restricted to
22     people 17 and older. Do you prefer a certain movie rating?

23 U:  I don't care
24 A:  Good choice, Bob! I found 24 results for Current Science
25     Fiction movies.

26     ((first movie results))
27 U:  no I mean PG-13
28 A:  Good choices, Bob! I found 1 results for Current
29     PG-13-rated Science Fiction movies.

30     ((Justice League))
31 U:  thanks
32 A:  Is there anything else I can help you with?
33 U:  where is AMC Saratoga?
34 A:  I'm sorry, I don't understand. Try What's playing?
35 U:  what's the address for AMC Saratoga?
36 A:  I'm afraid I don't understand. I can't recognize movie
37     titles at this time.
38 U:  never mind
39 A:  Okay. Whatever you say, Bob! Is there anything else
40     I can help you with?
41 U:  no
42 A:  Okay. Did you find what you were looking for, Bob?
43 U:  yes
44 A:  Great! Goodbye.
45 U:  bye
46 A:  Say anything to continue.

In What's in Theaters, we can see demonstrations of all 6 conversation navigation actions: capability check (line 1), repeat request (line 8), paraphrase request (line 17), sequence closing (line 31), sequence aborting (line 38) and conversation closing (line 45). It also supports selected detail elicitations (line 10), no-answer responses (line 23) and self corrections (line 27). Although the functional scope of What's in Theaters, a proof-of-concept app, was always limited, it nonetheless demonstrates expandable sequences and Conversation Navigation.

Because users will not necessarily assume that a conversational agent can do the 6 navigation actions, it is helpful to provide a tutorial. Alma provides the following interactive tutorial to the user.

Navigation Tutorial
1 A:  Hi there. Would you like to take a quick tutorial?

2 U:  no thanks
3 A:  Okay. Let me know if you change your mind.
4 U:  let's do the tutorial
5 A:  Okay. Say 'okay' to get me to continue or just wait.
6 U:  ok
7 A:  You can always ask me about what I can do.
8 U:  what can you do?
9 A:  I haven't been trained yet. I can only make small talk,
      tell some jokes or ask some trivia questions. I can tell
      you more about each of these.

10     (4.0)
11 A:  You can always ask me to repeat what I just said.
12 U:  repeat what you just said
13 A:  You can always ask me to repeat what I just said.
14     (4.0)
15 A:  Or you can ask me to clarify what I just said.
16 U:  clarify what you just said
17 A:  If you didn't understand what I said, you can ask
       me to say it a different way.

18     (4.0)
19 A:  You can close a sequence when it is successful.
20 U:  what do you mean by close a sequence?
21 A:  Oh, sorry! Closing a sequence means saying 'ok' or
       'thanks' when my answer or response is adequate.

22 U:  ok
23 A:  You can also abort a sequence when we get stuck.
24 U:  what do you mean?
25 A:  Say 'never mind' when I fail to answer your question 
       and you want to give up.

26 U:  never mind
27 A:  Shall I stop?
28 U:  no
29 A:  Okay. I'll continue. And you can say 'goodbye' at
       any time to close the whole conversation.

30 U:  goodbye
31 A:  Good job! That's it!
32 U:  thanks
33 A:  You're welcome!

Note: this tutorial uses primarily the Extended Telling (A3) and User Repair (B2) patterns from the Natural Conversation Framework.
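The Extended Telling pattern behind this tutorial can be sketched as a simple stepper: the agent delivers one installment at a time, and each continuation is triggered either by the user's "okay" (a continuer) or by a silence timeout, like the (4.0) pauses in the transcript. The class below is a hypothetical illustration, not the framework's actual code; a real agent would run the timeout on a timer rather than have the caller invoke it.

```python
# Hypothetical sketch of the Extended Telling (A3) pattern: a multi-step
# telling where each next installment is released by a user continuer
# ("ok") or by an elapsed pause. The timeout is simulated: the dialog
# runtime is assumed to call advance() when the pause elapses.

class ExtendedTelling:
    def __init__(self, steps, pause=4.0):
        self.steps = list(steps)  # the installments, in order
        self.pause = pause        # seconds of silence before auto-continuing
        self.index = 0

    def start(self):
        """Deliver the first installment."""
        self.index = 0
        return self.steps[0]

    def advance(self):
        """Called on 'ok' from the user or when the pause elapses."""
        self.index += 1
        if self.index < len(self.steps):
            return self.steps[self.index]
        return None  # telling complete

tutorial = ExtendedTelling([
    "You can always ask me about what I can do.",
    "You can always ask me to repeat what I just said.",
    "Or you can ask me to clarify what I just said.",
])
```

Because "ok" and the timeout drive the same `advance()` method, the user can either actively co-construct the telling or simply wait, exactly as the tutorial's opening instruction promises.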

Project Members

Dr. Robert J. Moore
Conversation Analyst, Lead

Eric Young Liu
Design Technologist

Saurabh Mishra
Software Engineer

Dr. Guang-Jie Ren
