Voice assistants that are more conversational deliver improved experiences and greater user engagement. Bixby has a built-in conversational model designed to allow users to interact with the assistant in a natural, conversational way. This blog explains the various features of Bixby’s conversational model using specific implementation examples from the earthquakeFinder sample capsule.
There are times when Bixby needs additional information in order to complete a task. In those cases, Bixby prompts the user in order to elicit the required information. Prompts are an integral part of Bixby’s conversational model. Bixby will automatically prompt the user in either of the following scenarios:
- If some information is required but missing – this is called a value prompt. In this case, Bixby asks the user to supply the missing value.
- If there is more than one option in context, but at most one is allowed – this is called a selection prompt. In this case, Bixby asks the user to choose one of the options in context.
To generate a value prompt, simply add the constraint `min(Required)` to an action’s input. If that input value is missing, a prompt will be generated asking the user for the missing value.
To generate a selection prompt, add the constraint `max(One)` to an action’s input. If more than one value is present in context, `max(One)` will prompt the system to ask which of the contextual values is desired.
The Earthquake Finder capsule allows users to search for earthquakes around the globe based on various search criteria, such as geographic location, date-time, and minimum magnitude. The `findEarthquakes` action accepts several inputs that serve as search parameters. One of these inputs is the minimum magnitude (`minMagnitude`). For example, a user can say “find earthquakes with a minimum magnitude of 3.0 or greater,” and the value 3.0 is used as an input search parameter for the API call.

If `minMagnitude` is defined as `min(Optional)`, then by definition it is not a required input. If the user doesn’t specify a `minMagnitude`, Bixby will omit it from the list of input search terms. Conversely, if `minMagnitude` is defined as `min(Required)`, the user will be prompted for a `minMagnitude` whenever it is not included in the natural language request. See below for the code snippet and UI screen for the “find earthquakes” command with `minMagnitude` set as `min(Required)` in the `findEarthquakes` action model.
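The sketch below illustrates the idea in Bixby modeling language. The type names (`MinMagnitude`, `EarthquakeResult`) and the surrounding action structure are illustrative reconstructions rather than verbatim code from the capsule; the point is the `min (Required)` constraint on the input.

```
action (FindEarthquakes) {
  type (Search)
  collect {
    // min (Required): if the user omits the magnitude,
    // Bixby generates a value prompt asking for it.
    input (minMagnitude) {
      type (MinMagnitude)
      min (Required) max (One)
    }
  }
  output (EarthquakeResult)
}
```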
There is a cost associated with every turn in the conversation, so developers should minimize the number of turns as much as possible. One way to accomplish this is by setting a default value. If your capsule always requires a certain input, such as `minMagnitude`, but you do not want to prompt the user every time, you can provide a default initialization for that input, which sets it to a specific value. This is implemented with a `default-init` block. See below for the `minMagnitude` example, where `minMagnitude` defaults to a value of 3.0 if the user does not provide an alternate input.
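A minimal sketch of what such a `default-init` block could look like, assuming a `MinMagnitude` concept; the exact intent syntax in the actual capsule may differ:

```
input (minMagnitude) {
  type (MinMagnitude)
  min (Optional) max (One)
  // If the user gives no magnitude, initialize it to 3.0
  // instead of prompting.
  default-init {
    intent {
      goal: MinMagnitude
      value: MinMagnitude (3.0)
    }
  }
}
```

With this in place, the input is effectively always present, so the value prompt never fires unless the default is removed.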
The previous example described inputs with single cardinality, meaning each input holds only a single value. Alternatively, inputs can be defined as multi-cardinal using `max(Many)`. These inputs can take multiple values. In the `findEarthquakes` action, there is an input called `eventType` defined as `max(Many)`, which allows users to search for earthquakes based on one or more of the event types defined in the USGS API. Examples of event types include earthquakes, quarry blasts, explosions, and ice quakes. When a user specifies multiple event types in a search query, for example “what quarry blasts and explosions happened this year,” the action receives both `eventType` values and the implementation must loop over them, issuing an API call for each.

Having to loop through multiple API calls can become increasingly complicated, especially when your model has multiple, multi-cardinal inputs. Bixby has a built-in feature to handle this: mark the input as `iterable` rather than `max(Many)` in the action model. This results in the action being called multiple times, once for each value of that input that presents itself. In the `findEarthquakes` action, `eventType` is marked as `iterable`, so Bixby calls the action once for each of the `eventType` inputs, which simplifies the code. See below for the implementation.
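The contrast might look roughly like the following sketch. The exact placement of the `iterable` marker is an assumption on my part; check the sample capsule source for the precise syntax.

```
// With max (Many), the action receives all event types at once,
// and the endpoint code must loop over them itself:
//   input (eventType) { type (EventType) min (Optional) max (Many) }

// Marked iterable instead (hypothetical placement), Bixby calls
// the action once per eventType value, so the endpoint handles
// exactly one event type per call:
input (eventType) {
  type (EventType)
  min (Optional)
  iterable
}
```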
Another built-in attribute of Bixby’s conversational model is the concept of replacement: previous values for an action’s input are automatically replaced when the user specifies a new value in a follow-up utterance. This replacement mechanism is implemented using continuations in your training. For example, if a user says “find earthquakes with a minimum magnitude of 4.5 or greater” and then issues a follow-up command of “how about ones with minimum magnitude of 6.0 instead,” Bixby will automatically replace the `minMagnitude` input of 4.5 with 6.0 and rerun the search. In the case of a multi-cardinal input, all new contextual inputs will replace any prior values.
See below for example continuation training from Bixby Developer Studio:
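As a rough illustration of what that training pair looks like in Bixby’s Aligned NL notation (the goal and value names here are assumptions, not the capsule’s actual training data):

```
// Top-level utterance:
[g:EarthquakeResult] find earthquakes with a minimum magnitude
  of (4.5)[v:MinMagnitude] or greater

// Follow-up, trained as a continuation of the same goal; the new
// contextual value replaces the earlier 4.5:
[g:EarthquakeResult:continue] how about ones with minimum
  magnitude of (6.0)[v:MinMagnitude] instead
```

In Bixby Developer Studio this is authored through the Training UI rather than written by hand, but the continuation relationship is the key part.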
Input constraints are designed to allow users to refine their search results, but as more and more constraints are contextually applied to a query, this can sometimes lead to an empty list of results. To handle these cases gracefully and avoid Bixby responding with “I couldn’t find any results that meet your criteria,” Bixby supports a feature called relaxation. When a search action returns no results, the search constraints can be relaxed by either dropping an input value or replacing an input value with a less-restrictive one. The user experience then produces something like “I couldn’t find any <constraint3>, but here are <new constraint list>.”
To implement relaxation, you must add an `on-empty` block on the output of an action. Three separate relaxation techniques are described below for the `findEarthquakes` action.
The first method utilizes `drop-contextual-inputs`. If the action returns zero results based on the search constraints, context is cleared, all previous inputs are dropped, and the most recent utterance is treated like a new, top-level query. For example, if the user says “find earthquakes in Los Angeles last week,” a list of earthquakes is returned. They then follow up with “with greater than 3.0 magnitude,” which filters the list to only show last week’s Los Angeles earthquakes matching the magnitude constraint. A subsequent follow-up, “how about in San Francisco,” returns no results, because there were no earthquakes in San Francisco greater than 3.0 last week. With the `drop-contextual-inputs` tag, Los Angeles, last week, and 3.0 magnitude are all removed from the search constraints, the query/API call is reissued with only the San Francisco input, and Bixby responds: “I didn’t find any 3.0+ earthquakes last week in San Francisco, but here are earthquakes in San Francisco.” See below for the syntax.
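A sketch of the `on-empty` block, assuming the output concept is named `EarthquakeResult` (the concept name is illustrative):

```
output (EarthquakeResult) {
  on-empty {
    // Clear all contextual inputs and re-run the most recent
    // utterance as a fresh, top-level query.
    drop-contextual-inputs
  }
}
```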
Instead of dropping all contextual inputs, developers can selectively decide which inputs to drop, and in which order. For example, if the user says “search for earthquakes in Los Angeles with 6.0 magnitude or greater” and there are no earthquakes with such a high magnitude, then Bixby will remove the `minMagnitude` search constraint and rerun the query/API call with Los Angeles as the search region. If instead no earthquakes were found in Los Angeles at all (highly unlikely), then the `searchRegion` would also be dropped and the search rerun with no input constraints. See below for the implementation.
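The shape of this might resemble the sketch below, listing the inputs to drop in order. The nested element names here are my assumption of how an ordered drop is expressed; only `on-empty` and `drop-contextual-inputs` are named by this post, so consult the capsule source for the real syntax.

```
output (EarthquakeResult) {
  on-empty {
    // Hypothetical ordered form: drop minMagnitude first and
    // re-run the query; if still empty, drop searchRegion too.
    drop-contextual-inputs {
      input (minMagnitude)
      input (searchRegion)
    }
  }
}
```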
Another way to implement relaxation is to replace an input value with a less-restrictive one. In the `findEarthquakes` action, if the user specifies an earthquake search radius, such as “find earthquakes within 3 miles,” and no results are returned, the code below replaces the user’s specified search radius with 25 miles.
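Conceptually, the replacement variant looks something like this sketch. The element names (`replace-input`, `SearchRadius`) are hypothetical placeholders for illustration; the actual keywords live in the sample capsule.

```
output (EarthquakeResult) {
  on-empty {
    // Hypothetical syntax: swap the user's radius for a
    // less-restrictive 25-mile default and re-run the search.
    replace-input (searchRadius) {
      value (SearchRadius (25))
    }
  }
}
```

Either way, the user should be told the results reflect the widened radius, not the one they asked for.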
In general, relaxation is designed to prevent Bixby from returning an empty list of results. It is often better to return some kind of result rather than nothing, even if that result is not exactly aligned with the user’s specific search constraints, as long as the user is informed that what is being shown is not what they originally asked for.
This blog described the various features of Bixby’s conversational model, including:
- Value prompts for required missing inputs or selection prompts for disambiguation of multiple contextual inputs;
- Default initialization to prevent unnecessary user prompting for required inputs;
- Replacement functionality for action inputs; and,
- Relaxing constraints when an empty list of results is returned, by dropping inputs or replacing inputs with less-restrictive ones.
Developers should factor in the conversational behaviors described above when designing their user interaction models. In the next blog, I will present an alternate approach to Bixby conversations and overall context management that gives the developer even more control over how contextual inputs are managed during a conversation. You can download the complete sample code for the earthquakeFinder capsule from GitHub here. For more in-depth tutorials, sample capsules, guides, and videos, hop on over to the Bixby Developer Center.
If you’re a developer who thinks they have what it takes, and this tutorial helped you develop a killer capsule, we want to hear from you! Get in on the very first wave of the Bixby Marketplace and apply to the Premier Developer Program today.