Conversational UX Design - NC Framework
The Natural Conversation Framework
Although current chatbot and conversational agent platforms offer powerful natural language support, they do not provide much support for natural conversation. In other words, they leave it to designers to create their own interaction style. But just as natural language is a complex system to which whole scientific disciplines are devoted, so is natural conversation. The systems of how humans take turns and sequentially organize conversations are formally studied in the social sciences, especially in the field of Conversation Analysis (CA). At IBM Research-Almaden, our scientists trained in Conversation Analysis are applying the formal, qualitative models and findings from their field to the design of conversational agents. The result of their work is the Natural Conversation Framework.
The Natural Conversation Framework (NCF) is a design framework for conversational user experience. It provides a library of generic conversational UX patterns that are independent of any particular technology platform and that are inspired by natural human conversation patterns documented in the Conversation Analysis literature. The Natural Conversation Framework so far has been implemented on the IBM Watson Dialog service (now shuttered) and the current IBM Watson Conversation service. But in principle it can be implemented on other platforms as well. These implementations provide a starting point for designers so they don't have to reinvent the nuts and bolts of conversational structure.
The Natural Conversation Framework consists of four main components: 1) an Interaction Model, 2) Conversation Navigation, 3) Common Activities and 4) Sequence Metrics.
1. Interaction Model
The building blocks of human conversation are sequences. They are like tools or devices that can be used and reused in all kinds of different situations and settings, for all kinds of different purposes. There is more than one type of conversational sequence. "Storytelling" is one type, but by far the most prominent is the "adjacency pair."1 This type of sequence consists of recognizable pairs of social actions: for example, greeting-greeting, farewell-farewell, inquiry-answer, offer-accept/reject, request-grant/deny, invitation-accept/decline and more. When someone initiates the pair, it creates an expectation, and an obligation, for someone else to complete it. Sequences are the primary vehicles through which we build up conversations, turn by turn, and achieve a wide range of social activities.
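As a rough illustration of the adjacency-pair idea, the pair types listed above can be written down as a lookup from an initiating action to the responses that would complete it. This is only a sketch: the action labels and the `completes_pair` helper are hypothetical names chosen for this example, not part of the Natural Conversation Framework or any Watson API.

```python
# Hypothetical action labels; the pair types themselves come from the text above.
ADJACENCY_PAIRS = {
    "greeting": {"greeting"},
    "farewell": {"farewell"},
    "inquiry": {"answer"},
    "offer": {"accept", "reject"},
    "request": {"grant", "deny"},
    "invitation": {"accept", "decline"},
}


def completes_pair(first_part: str, second_part: str) -> bool:
    """Return True if second_part is an expected completion of first_part."""
    return second_part in ADJACENCY_PAIRS.get(first_part, set())


if __name__ == "__main__":
    print(completes_pair("request", "grant"))   # a request can be granted
    print(completes_pair("inquiry", "accept"))  # an inquiry expects an answer
```

A table like this captures only the expectation that an initiating action sets up; it says nothing yet about what happens when the recipient does something else in the next turn.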
Now when someone initiates a sequence, the recipient does not always complete it in the next turn. This is because conversational sequences are expandable.2 Expansions are sequences that operate on other sequences. While social action pairs can stand on their own, expansion pairs cannot. For example...
An utterance like "thanks!" (line 4) cannot stand on its own. It is inherently responsive to something else, something prior, in this case the agent's answer (line 3). And it does a particular job in the conversation: it closes the prior sequence.
Sequence expansions enable speakers to manage the conversation itself. In addition to closing a sequence, they may be used for screening, eliciting, repeating or paraphrasing. The following excerpt demonstrates each of these expansion types in a single sequence. It is a working example from "Alma," our implementation of the Natural Conversation Framework on the Watson Conversation service. (A refers to the automated agent. U refers to the user.)
5 Sequence Expansion Types
We can see an example of screening (line 1) in which the user does a preliminary inquiry into the capabilities of the agent. Such pre-expansions check conditions upon which the first part of the base sequence (line 5) depends. If the agent had instead responded, "I can look up current and upcoming movies," the user would presumably not have gone on to ask for a restaurant recommendation (line 5).
In between the two parts of the base sequence, we see two expansions that do eliciting (lines 6-9). First, the agent proposes that it needs an additional detail, a cuisine preference (line 6), as a condition for granting the user's request. Second, as a condition for answering the elicitation of a cuisine preference, the user proposes that he needs to know the cuisine choices (line 7).
The remaining sequence expansions (lines 3 & 11) are examples of what conversation analysts call "repair."3 They are used to remedy normal troubles in hearing or speaking. In the first case, the user requests a repeat of part of the agent's prior response (line 3), namely, the part that came after the "a few." In the second case, the user requests a paraphrase of all of the agent's prior response (line 11). Repairs of hearing or understanding troubles can come after any utterance in a conversation.
The interaction model of the Natural Conversation Framework thus consists of expandable sequences, like an accordion. Compact sequences are common, but each sequence can be expanded by either party as needed. In the excerpt above, the whole thing is one sequence (all 15 lines), an expanded sequence. The expansions are parts of the base sequence (lines 5 & 10). The expansion types screening, repeating and closing enable basic coordination in and out of sequences, while eliciting and paraphrasing enable the parties to compensate for emergent asymmetries in their knowledge. Taken together, these sequences that operate on other sequences enable conversation management. Because they are designed to manage the interaction itself, these actions are unique to conversation and not found in other forms of natural language use.
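One way to picture the accordion is as a small recursive data structure: a base pair with optional expansions before, between and after its two parts. The sketch below is an assumption-laden illustration, not the framework's implementation. The `Turn` and `Sequence` classes and the action labels are hypothetical, the slot names follow the pre-, insert- and post-expansion terminology used in the Conversation Analysis literature, and the utterances are invented for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Turn:
    speaker: str  # "U" for user, "A" for agent
    action: str   # hypothetical label, e.g. "request" or "grant"
    text: str


@dataclass
class Sequence:
    """A base pair plus optional expansions, each of which is itself a sequence."""
    first_part: Turn
    second_part: Optional[Turn] = None  # a closing such as "thanks!" has no second part
    pre: List["Sequence"] = field(default_factory=list)     # e.g. screening
    insert: List["Sequence"] = field(default_factory=list)  # e.g. eliciting
    post: List["Sequence"] = field(default_factory=list)    # e.g. repair, closing

    def turns(self) -> List[Turn]:
        """Flatten the sequence into the order in which the turns are spoken."""
        ordered: List[Turn] = []
        for seq in self.pre:
            ordered.extend(seq.turns())
        ordered.append(self.first_part)
        for seq in self.insert:
            ordered.extend(seq.turns())
        if self.second_part is not None:
            ordered.append(self.second_part)
        for seq in self.post:
            ordered.extend(seq.turns())
        return ordered


if __name__ == "__main__":
    # Compact form: just the base pair (illustrative utterances).
    base = Sequence(
        Turn("U", "request", "Can you recommend a restaurant?"),
        Turn("A", "grant", "Sure, here is one nearby."),
    )
    # Expanded form: the agent elicits a detail, and the user closes the sequence.
    base.insert.append(Sequence(
        Turn("A", "elicit", "What kind of food do you want?"),
        Turn("U", "detail", "Mexican."),
    ))
    base.post.append(Sequence(Turn("U", "close", "thanks!")))
    for turn in base.turns():
        print(f"{turn.speaker}: {turn.text}")
```

Because each expansion is itself a `Sequence`, an expansion can be expanded in turn, which mirrors the case where the user asks for the cuisine choices before answering the agent's elicitation.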
Sequence expansions enable conversational systems to adapt to the particular user on a local, turn-by-turn basis. For example, depending on how detailed the user's initial request is, the agent can elicit additional details as needed, rather than accepting only a complete request.
Here the user does not mention the type of cuisine she prefers in her initial request (lines 1-2) so the agent elicits that detail instead (line 3). The user then provides the detail in a separate turn (line 4). This makes the agent flexible and more like a human speaker than a database.
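The eliciting behavior described above can be sketched as a check for missing details before the request is granted. Everything here is a hypothetical illustration: the `REQUIRED_DETAILS` tuple, the detail names and the `next_agent_action` function are assumptions made for this example, and a real agent would obtain the details from its platform's entity extraction.

```python
from typing import Dict, Optional, Tuple

# Hypothetical details the agent needs before it can grant a restaurant request.
REQUIRED_DETAILS = ("cuisine", "distance")


def next_agent_action(details: Dict[str, str]) -> Tuple[str, Optional[str]]:
    """Elicit the first missing detail, or grant the request once nothing is missing."""
    for name in REQUIRED_DETAILS:
        if name not in details:
            return ("elicit", name)
    return ("grant", None)


if __name__ == "__main__":
    # A bare request leads to an elicitation; a fully detailed one is granted at once.
    print(next_agent_action({}))
    print(next_agent_action({"cuisine": "Mexican", "distance": "walking"}))
```

The same function handles both the terse and the detailed user, so the number of turns adapts to how much the user said up front.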
In addition, some users may need more help than others. This may be due to differences in their knowledge or just to idiosyncratic confusions local to the conversation. Sequence expansions enable users to get localized help. For example...
User Elaboration Request
In this case, the user requests a paraphrase (line 3) of the agent's request for a distance preference (line 2). Perhaps it is a question he did not expect or perhaps "walking distance" is not a phrase with which he is familiar. The agent then paraphrases its prior question (lines 4-5) and that enables the user to answer it (line 6). Rather than designing every response of the agent in the simplest, elaborated form, which would be long and cumbersome, especially for voice interfaces, sequence expansions enable the agent's initial responses to be shorter. This makes the conversation faster and more efficient. Then if a few users encounter trouble responding, understanding or hearing these more streamlined responses, they can expand the sequence as needed. This is how natural human conversation is organized: with a preference for minimization.4 That is, speakers should try the shortest utterance that they think the recipient can understand first, see if it succeeds and then expand only if necessary.
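The preference for minimization can be sketched by storing a short and an elaborated form of each agent response and producing the longer one only when the user asks for it. This is a minimal sketch under stated assumptions: the `AgentResponse` class, the `handle_repair` function, the action labels and the example wording are all hypothetical, and recognizing that the user asked for a repeat or a paraphrase is left to whatever intent classifier the platform provides.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AgentResponse:
    short: str       # tried first
    elaborated: str  # used only if the user requests a paraphrase


def handle_repair(last: AgentResponse, user_action: str) -> Optional[str]:
    """Return the agent's repair of its previous turn, or None if no repair was requested."""
    if user_action == "repeat_request":      # e.g. "what did you say?"
        return last.short
    if user_action == "paraphrase_request":  # e.g. "what do you mean?"
        return last.elaborated
    return None


if __name__ == "__main__":
    # Illustrative wording only; not taken from the Alma transcripts.
    question = AgentResponse(
        short="Walking distance or short drive?",
        elaborated="Do you want a place close enough to walk to, "
                   "or are you willing to drive for a few minutes?",
    )
    print(question.short)
    print(handle_repair(question, "paraphrase_request"))
```

Most users only ever see the short form; the elaborated form costs nothing unless someone needs it.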
Natural Conversation Understanding: Support for sequence expansion is critical in Conversational UX Design. One of the distinctive goals of conversation is mutual understanding. Accurate information alone is not enough. If the user or the agent cannot understand what the other said, the conversation has failed. Analyzing the user's utterance with Natural Language Understanding tools (NLC and entity extraction) is only the first step! Mutual understanding can only be determined when the recipient responds in "third position." For example, if a user makes an inquiry, the agent answers and the user says, "thanks!," then there is an indication of mutual understanding. But if the user says, "what do you mean?," then the user does not understand the agent's answer. If the user says, "no, I mean X," then the agent did not understand the user's inquiry. And if the user says, "never mind," then mutual understanding has failed and the user is giving up. Sequence expansions provide organic indicators of understanding on a turn-by-turn basis. Mutual understanding cannot be achieved in one turn; it requires dialog and support for sequence expansion. We use the term "Natural Conversation Understanding," then, to refer to the sequence-expansion and repair features that enable user and agent to achieve mutual understanding.
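The four third-position cases above can be sketched as a mapping from the user's third turn to an understanding signal. The prefix matching below is a deliberately naive stand-in for a real intent classifier, and the function name and signal labels are hypothetical; only the four example utterances come from the text.

```python
def third_position_signal(utterance: str) -> str:
    """Classify the user's turn that follows the agent's answer (naive prefix matching)."""
    text = utterance.lower().strip(" !?.,")
    if text.startswith("thanks"):
        return "mutual_understanding"
    if text.startswith("what do you mean"):
        return "user_did_not_understand_agent"
    if text.startswith("no, i mean"):
        return "agent_did_not_understand_user"
    if text.startswith("never mind"):
        return "understanding_abandoned"
    return "no_signal"


if __name__ == "__main__":
    for reply in ["thanks!", "what do you mean?", "no, I mean X", "never mind"]:
        print(reply, "->", third_position_signal(reply))
```

Signals like these are only available after the agent has answered, which is why a single-turn analysis of the user's first utterance cannot establish mutual understanding.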
1 Schegloff, Emanuel A. and Harvey Sacks. 1973. Opening up closings, Semiotica 7: 289–327.
2 For a comprehensive introduction to conversational sequences, see Schegloff, Emanuel A. 2007. Sequence Organization in Interaction: A Primer in Conversation Analysis, vol. 1. Cambridge: Cambridge University Press.
3 Schegloff, Emanuel A., Gail Jefferson and Harvey Sacks. 1977. The preference for self-correction in the organization of repair in conversation, Language 53: 361–382.
4 Sacks, Harvey and Emanuel A. Schegloff. 1979. Two preferences in the organization of reference to persons in conversation and their interaction. In G. Psathas (ed.), Everyday Language: Studies in Ethnomethodology, 15–21. New York: Irvington.