Conversational UX Design - Navigation
With any computer interface, users must learn how to navigate the space. In command-line interfaces, users learn to navigate directories through cryptic commands. In graphical interfaces, users learn to drag files on a desktop to folders. In web interfaces, users learn to jump from page to page with URLs and hypertext. And in mobile interfaces, users learn to touch the screen, rotate it and "pinch" the images. But how should users navigate a conversational interface? What are the basic actions that they can always rely on at any point to navigate the conversation space?
Natural human conversation contains devices for its own management, as we see with sequence expansions. We propose a subset of these as 6 basic actions for Conversation Navigation:
6 Basic Navigation Actions Similar to...
Capability: Users should always be able to ask the agent what it can do. "What can you do?" is perhaps the most general request for a description of the system's scope and functionality. It should give the user enough guidance to use the app or to ask more specific questions about capability.
Repeat: In voice interfaces, unlike text, utterances are transient; therefore, users must be able to elicit repeats of all or part of the agent's utterances. "What did you say?" is the natural way to request a full repeat of the prior utterance. In voice interfaces, requesting repeats is like 'going back' in other interfaces.
Paraphrase: While capability checks provide global help to the user, paraphrase devices provide local help on a turn-level basis. "What do you mean?" elicits an elaboration of the prior utterance. Elaborations should use plainer language to express the same intent as the more efficient, standard response.
Close Sequence: Users should be able to close the current sequence when they receive an adequate response and move on to the next sequence. "Okay" or "thanks" are natural ways for the user to signal the completion of the current sequence and invite the agent to move on. This can also be a good place to reset context variables that may conflict with subsequent requests.
Abort Sequence: When users fail to elicit an adequate response from the agent, they should be able to abort the current sequence and move on to the next. "Never mind" in a conversation functions somewhat like 'escape' in a computer interface. It enables the user to give up and move on.
Close Conversation: As in a human conversation, users should be encouraged to close their interaction with the system. "Goodbye" is the natural way to move to end a conversation. The agent should treat the user's attempt to close the conversation as a pre-closing. The pre-closing gives the agent the opportunity to bring up a last topic before returning a "goodbye" to the user.
After any agent utterance in a conversation, users should be able to do any of the above actions. At first they must be taught that, unlike Google or even Siri, your conversational agent will recognize these 6 basic actions and respond appropriately and usefully. Because they are based on human conversation, they should already be familiar to the user and natural to perform.
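To make the six actions concrete, here is a minimal Python sketch of how an agent might recognize them and serve the two local-help actions (repeat and paraphrase). It is an illustration under assumptions, not Alma's implementation: the names NAVIGATION_PHRASES, detect_navigation_action, AgentTurn and respond_to_local_help are hypothetical, and exact phrase matching stands in for the trained intent classifier a real agent would use.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical trigger phrases taken from the examples in the text.
# A production agent would classify intents rather than match strings.
NAVIGATION_PHRASES = {
    "capability": ("what can you do",),
    "repeat": ("what did you say",),
    "paraphrase": ("what do you mean",),
    "close_sequence": ("okay", "ok", "thanks", "thank you"),
    "abort_sequence": ("never mind", "nevermind"),
    "close_conversation": ("goodbye", "bye"),
}


def normalize(utterance: str) -> str:
    """Lowercase, drop punctuation and collapse whitespace."""
    kept = "".join(ch for ch in utterance.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def detect_navigation_action(utterance: str) -> Optional[str]:
    """Return the name of the navigation action, or None for any other input."""
    text = normalize(utterance)
    for action, phrases in NAVIGATION_PHRASES.items():
        if text in phrases:
            return action
    return None


@dataclass
class AgentTurn:
    """The agent's previous turn, kept so it can be repeated or elaborated."""
    standard: str     # the efficient, default wording
    elaboration: str  # plainer wording that expresses the same intent


def respond_to_local_help(action: str, last_turn: AgentTurn) -> str:
    """Serve a repeat or paraphrase request from the stored previous turn."""
    if action == "repeat":
        return last_turn.standard
    if action == "paraphrase":
        return last_turn.elaboration
    raise ValueError(f"not a local-help action: {action}")
```

Storing both wordings with every agent turn is the design choice that makes repeat and paraphrase available after any utterance, rather than only where a dialog author remembered to add them.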
Take the following two working examples from Alma...
Conversation Navigation A
In excerpt A, the user relies on 5 of the basic actions to navigate the conversation. At the beginning, he checks the agent's capabilities (lines 1-4) and then does an action within the scope of that response, a technology trivia question (line 5). In response to the answer, the user then requests a repeat (line 7), followed by an elaboration request (line 9). The user then closes the trivia-question sequence with an appreciation (line 12) and moves to close the conversation (line 15). Instead of completing the closing sequence, the agent treats it as a pre-closing and brings up a last topic, a success check (line 16). Now contrast excerpt B...
Conversation Navigation B
The user in excerpt B fails to check the agent's capabilities at the beginning of the conversation and instead launches into a flight request (line 1). This time the agent responds with an elaboration request (line 2) to which the user offers an elaboration (line 3). This still fails to enable the agent to understand (line 4) so the user aborts the flight request sequence (line 5). In response, the agent offers to describe its capabilities (line 7), which the user accepts (line 8) and can use to regain alignment with the agent.
The 6 basic navigation actions enable the user to get in and out of sequences and the conversation itself. They also enable the user to get help globally and on a local, turn-by-turn basis. With this set of actions, users can thus navigate the conversation space of the application and recover when they get stuck. And because the 6 basic actions reflect corresponding actions in natural human conversation, they are already familiar to users.
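The sequence-level behavior described above can be sketched as a small state handler: closing a sequence resets context variables, aborting a sequence leads to an offer of capabilities (as in excerpt B), and a "goodbye" is treated as a pre-closing that raises one last topic before the agent returns the closing (as in excerpt A). This is a rough Python sketch under assumptions; ConversationState, handle_sequence_action and all agent wordings are hypothetical and do not come from Alma.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ConversationState:
    # Context variables collected during the current sequence.
    context: dict = field(default_factory=dict)
    # True after the agent has raised its last topic in a pre-closing.
    awaiting_last_topic_reply: bool = False
    closed: bool = False


def handle_sequence_action(action: Optional[str], state: ConversationState) -> str:
    """Handle close-sequence, abort-sequence and close-conversation turns."""
    if state.awaiting_last_topic_reply:
        # Whatever the user replies to the last topic, complete the closing.
        state.awaiting_last_topic_reply = False
        state.closed = True
        return "Thank you. Goodbye!"
    if action == "close_sequence":
        # Reset variables that could conflict with the next request.
        state.context.clear()
        return "You're welcome! Anything else?"
    if action == "abort_sequence":
        state.context.clear()
        # Offer global help so the user can regain alignment with the agent.
        return "Okay. Would you like to hear what I can do?"
    if action == "close_conversation":
        # Treat the user's goodbye as a pre-closing: raise a last topic first.
        state.awaiting_last_topic_reply = True
        return "Did you find what you were looking for?"
    raise ValueError(f"not a sequence-level action: {action}")
```

For example, calling the handler with "close_conversation" returns the success check and leaves the conversation open; the next call, whatever the user answers, returns the goodbye and marks the conversation closed.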
Our first implementation of the Natural Conversation Framework was for IBM's What's in Theaters app in 2015. What's in Theaters was built on the Watson Dialog service (the precursor to Watson Conversation) as a simple demonstration of how to integrate the service with other components into a web app. But it also demonstrates an early version of our Conversation Navigation method. Take the following working script for example...
What's in Theaters (2015)
In What's in Theaters, we can see demonstrations of all 6 conversation navigation actions: capability check (line 1), repeat request (line 8), paraphrase request (line 17), sequence closing (line 31), sequence aborting (line 38) and conversation closing (line 45). It also supports selected detail elicitations (line 10), no-answer responses (line 23) and self-corrections (line 27). Although the functional scope of What's in Theaters was always limited, as a proof-of-concept app, it nonetheless demonstrates expandable sequences and Conversation Navigation.
Because users will not necessarily assume that a conversational agent can do the 6 navigation actions, it is helpful to provide a tutorial. Alma provides the following interactive tutorial to the user.
Note: this tutorial uses primarily the Extended Telling (A3) and User Repair (B2) patterns from the Natural Conversation Framework.