Machine Learning

Concerning Chatbots

By Daniel Maschmeier

They’re Everywhere

With COVID seemingly lurking around every corner, the demand for online help has exploded. Companies have had to find ways to protect customer service employees, either through remote work or reduced in-office capacity. Even before COVID, chatbots were becoming a popular choice for interacting with users. The current environment appears to have accelerated chatbot adoption, especially in areas like hotel bookings, insurance claims, search & FAQ interfaces, and general customer support.

Even though chatbots are increasing in popularity, there’s still a fair amount that is not understood about these little apps. Companies often approach them as magic bullets aiming to solve poor user experience issues elsewhere on their websites, and developers see them as daunting constructs that don’t conform to the typical “if/then” design of most applications. This knowledge gap can unfortunately result in some poor experiences. Users have high expectations, but often end up disappointed by the results. So in this piece, I’m going to attempt to get everybody on the same page about what chatbots are and what they’re not, what they can and cannot do.

Like most internet users, I have encountered chatbots quite often. While there’s a wow factor when interacting with them initially, I often find that after the first minute or so, I’m left disappointed. I usually find them to be about as helpful as an interactive voice response (IVR) system with a web interface.

Being a curious developer, I pulled back the curtain on Amazon Lex to peek at how some common chatbots are made, and I built a better appreciation for some of the challenges that make so many real-world chatbots so maddening. Amazon Lex is a service within the larger Amazon Web Services cloud that supports basic chat and natural language processing capabilities. In other words, it tries to work out what you mean when you type to it or talk at it. Even though my most recent experience is with Amazon Lex, chatbots tend to be pretty similar in usage and design, so many of the concepts will carry over even if the terms may differ a little.

Chatbots and Their Work

To start with, let’s take a look at what chatbots are not. They are not endless fonts of wisdom and support. Despite their growing complexity, they cannot pass a Turing Test and they are certainly not a place for reports or large outputs of search results. Some day they will be fully backed by an AI and will be able to do all of those things, but contrary to what marketing videos might tell you, we’re not quite there yet.

That being said, they can get pretty close — if done right. Chatbots can be quite powerful, and given the right industry and a tailored implementation, good ones can still result in a very impressive return on investment, wide user adoption, and high user satisfaction.

So now let’s level-set on what chatbots are. Put simply, a chatbot is a conversationally based interface for user interaction. As mentioned above, they can be used for anything where having a conversation might be more useful to a user than a series of navigational links or search filters. Usually that means the user asks a question or two, and the bot uses what was asked to find the information the user needs.

The easier the question — or the shorter the conversation — the more useful a chatbot can be. For example, asking about the status of an order is something that is a much better fit for a chatbot than asking for a list of all orders for the last year. In general, chatbots are great for short, succinct interactions, and not so great for larger, data-heavy responses.

Context & Statements (Intent & Utterances)

To get started understanding chatbots and their limits, let’s begin with context. Context is of primary importance when it comes to programming. It’s why when writing code we have squiggly braces, specific rules about indenting, semi-colons, etc. With chatbots, context takes on a slightly different form — but it’s a form that everyone is likely familiar with, whether you are a technologist or not: conversational context.

What do I mean by conversational context? You are already familiar with it if you’ve conversed with other people. Say you happen to catch part of a conversation between two people and you hear one of them say “No, thank you.” You have no idea what that refers to, as you are lacking the context of the whole conversation. But suppose that just before “No, thank you,” you heard the other person say “Would you like to try a free sample?” Now you have context; you know what the person is saying no to.

Context also matters with chatbots, especially because computers (unlike people) cannot read body language or environmental clues. Amazon Lex calls this “intent” but we can stick with context for now. Context is set based on the first interaction a user has with the chatbot.

In some implementations, if the user clicks on a “Virtual Chat Support” link, the chatbot knows that the context will revolve around support. When that happens, any questions or statements made by the user will be used by the chatbot to search against a database for FAQs, known issues, knowledge base articles, etc.

At other times, a user can also trigger a context by typing in a question or statement. Amazon Lex calls this an “utterance” which, in my opinion, is a good term for this concept. An utterance is essentially whatever the user typed into the chatbot (or sometimes a choice the user made via button selection). In linguistic terms, it is minimally a fragment of a sentence.

For instance, say that we are on a hotel booking site and the user opens up the window for a “Virtual Concierge” chatbot. Unlike a support bot, there isn’t really an implied context. Just “Hotels” is a little too vague, so the user will need to provide something else. Often, there will be a prompt or message of some sort, but until you provide more context, the bot can’t be sure what you want. To start the conversation and set context, you might type something like “I would like to book a hotel” or “confirm a reservation.” Each of those utterances would trigger a different context for the chatbot, and each context would steer the conversation in a different direction, to keep things relevant to the given context.

Further chatbot interactions will depend on the context set previously. When you’ve indicated that you’re booking a hotel, for example, following up with “Paris, France” is likely to refer to where you want to stay. If you’ve indicated that you’re checking on a reservation, then instead it likely refers to an existing reservation for a hotel in Paris that the system can look up. The context will also tell the chatbot which conversation flow to use when responding and prompting you with additional questions. For example, having the bot ask “What is your reservation number?” only makes sense in the latter context.
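To make the idea concrete, here is a minimal C# sketch of an utterance setting the context. The intent names, the sample phrases, and the IntentRouter class are all made up for illustration, and the exact-match lookup is only a stand-in: a service like Amazon Lex generalizes from its sample utterances rather than requiring a literal match.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: the intent names and sample phrases are invented for illustration.
public static class IntentRouter
{
    // Returned when no sample matches, so the bot knows to ask for more context.
    public const string Fallback = "Fallback";

    // Each context ("intent" in Lex terms) lists sample utterances that trigger it.
    private static readonly Dictionary<string, string[]> SampleUtterances =
        new Dictionary<string, string[]>
        {
            ["BookHotel"] = new[] { "i would like to book a hotel", "book a room", "make reservation" },
            ["CheckReservation"] = new[] { "confirm a reservation", "check my reservation" },
        };

    // Picks the context whose samples contain the normalized utterance.
    // Exact matching only; a real NLP engine also handles phrasings it was never given.
    public static string Match(string utterance)
    {
        string normalized = utterance.Trim().TrimEnd('.', '!', '?').ToLowerInvariant();
        foreach (KeyValuePair<string, string[]> entry in SampleUtterances)
        {
            if (Array.IndexOf(entry.Value, normalized) >= 0)
            {
                return entry.Key;
            }
        }
        return Fallback;
    }
}
```

With this table, “I would like to book a hotel” maps to BookHotel, “confirm a reservation” maps to CheckReservation, and a bare “Hotels” falls through to Fallback, which is the bot’s cue to ask for more context.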

Conversation Flow

Now that we understand the basics of context, let’s move on to the next bit: conversation flow. Chatbots are expected to be conversational, and to make interactions feel as if you’re talking to a person. Computers aren’t quite as dynamic as we would like to believe, however, so we need to think about how a conversation with one might go and plan it out ahead of time.

Take a moment and think about what you might need to ask when a person wants to book a hotel. Your list probably includes asking for city, room size, check-in date, check-out date, and maybe a few other details. Now let’s think about how a user might respond.

The user is going to respond by typing something. Hopefully, it will be something like “London, England,” “Queen,” “Mar 1,” and “Mar 10” respectively, and for now let’s assume it is. As with first setting the context with “I’d like to book a hotel,” responses to all other chatbot prompts are also called utterances. Now, however, we already have a context; two of them, as a matter of fact. The first context is the general one of booking a hotel. The second context is a specific question, e.g. “What city would you like to go to?”

Continuing with the idea that we’ve been asked about the city, we can generally expect an utterance along the lines of a location like “London.” As a note, what the user types is still referred to as an utterance, even though we know the context. This is because the user might say something that changes the context and we should keep that in mind. In general, you can’t always assume that what a user answers is specific to the current context. We’ll get into that more later, but for now just remember that anything that the user types is considered to be an utterance.

We humans generally refer to the above exchange as a question and answer. As we’ve seen so far, the chatbot will consider the answer an utterance. To a developer, though, the answer becomes a parameter, or “slot” in Lex parlance. Filling in these answers/parameters/slots allows us to write methods like the one below:

public Reservation BookHotel(string city, string roomSize, DateTime checkin, DateTime checkout) { ... }

Looking at this hypothetical method signature, it becomes apparent why we need context and why we need to understand the conversation flow; we have to be able to determine which user responses are intended to answer which questions. For programmers, this determines how each answer maps to a specific method parameter.
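Here is one rough way the flow could be tracked in code. This is a sketch under my own assumptions, not how Lex works internally: the BookingConversation class, the slot names, and the prompts (other than the city question) are hypothetical, and it assumes the user answers each question in order. Answers are kept as raw strings; turning “Mar 1” into a DateTime is a separate step.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: slot names and prompts are invented to mirror the BookHotel parameters.
public class BookingConversation
{
    // Slots in the order the bot asks for them, each paired with its prompt.
    private static readonly (string Slot, string Prompt)[] Flow =
    {
        ("city", "What city would you like to go to?"),
        ("roomSize", "What size room would you like?"),
        ("checkin", "What day would you like to check in?"),
        ("checkout", "What day would you like to check out?"),
    };

    private readonly Dictionary<string, string> _filled = new Dictionary<string, string>();

    // True once every slot has an answer and the booking method could be called.
    public bool IsComplete => _filled.Count == Flow.Length;

    // Read-only view of the answers collected so far, keyed by slot name.
    public IReadOnlyDictionary<string, string> Slots => _filled;

    // The question for the first unfilled slot.
    public string NextPrompt()
    {
        return IsComplete ? "Thanks, I have everything I need." : Flow[_filled.Count].Prompt;
    }

    // Stores the utterance as the answer to the question currently being asked.
    public void Answer(string utterance)
    {
        if (IsComplete)
        {
            throw new InvalidOperationException("All slots are already filled.");
        }
        _filled[Flow[_filled.Count].Slot] = utterance.Trim();
    }
}
```

Feeding it “London, England,” “Queen,” “Mar 1,” and “Mar 10” in turn fills the four slots in order, at which point IsComplete is true and the values in Slots line up one-to-one with the method parameters.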

Utterances and Natural Language Processing (NLP)

Reviewing everything above, you might be thinking that a chatbot would actually be a pretty easy thing to put together, and on the surface you would be right. After all, when booking a hotel, there are only a handful of relevant questions to ask. You can plug in those answers as method arguments and run it like any other method, but that’s only half of it. The other, more complex part is understanding what a user means. This gets very complicated, very quickly.

How many ways are there to ask for a hotel room? How about “book a room,” “get hotel,” “make reservation,” “reserve two nights,” etc.

Consider that if a person says “Paris” when asked for a city, a chatbot will not necessarily know what that means. Even though it assumes Paris is a location, there are multiple locales named Paris — Illinois, Texas, Maine, and 20 more in the US alone! Our bot either needs to ask for clarification, or further assume that the user meant a specific Paris.
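A sketch of the “ask for clarification” option might look like the following. The CityResolver class and its tiny lookup table are invented for this example; a real bot would query a proper location service instead.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: a tiny hard-coded table stands in for a real location lookup.
public static class CityResolver
{
    private static readonly Dictionary<string, string[]> KnownCities =
        new Dictionary<string, string[]>(StringComparer.OrdinalIgnoreCase)
        {
            ["Paris"] = new[] { "Paris, France", "Paris, Illinois", "Paris, Texas", "Paris, Maine" },
            ["London"] = new[] { "London, England" },
        };

    // Returns the bot's reply: a confirmation, a clarifying question, or an apology.
    public static string Reply(string utterance)
    {
        string text = utterance.Trim();
        string name = text.Split(',')[0].Trim();

        if (!KnownCities.TryGetValue(name, out var candidates))
        {
            return "Sorry, I don't recognize that city.";
        }

        // A fully qualified answer such as "Paris, Texas" selects one candidate directly.
        foreach (string candidate in candidates)
        {
            if (string.Equals(candidate, text, StringComparison.OrdinalIgnoreCase))
            {
                return "Got it: " + candidate + ".";
            }
        }

        if (candidates.Length == 1)
        {
            return "Got it: " + candidates[0] + ".";
        }

        // More than one match: ask instead of guessing.
        return "Which " + name + " did you mean? " + string.Join("; ", candidates);
    }
}
```

Here a bare “Paris” gets a clarifying question listing the candidates, while “London” or “Paris, Texas” is accepted as-is. The alternative from the text, assuming a specific Paris, would amount to always picking one default candidate.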

When asking for a date, it would be ideal if the user provided specifically formatted input, such as “01-Mar-2021,” but they likely won’t. Because this is a chatbot, we also want to stay away from requiring input via a date selector widget, and instead stick to plain text. Unfortunately, this forces us to handle all sorts of inputs, even something like “next Tuesday.”
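As a taste of what handling free-text dates involves, here is a small sketch that accepts just three shapes of input: “tomorrow,” “next” followed by a weekday name, and the strict “01-Mar-2021” format. DateSlotParser is a hypothetical helper, and it assumes that “next Tuesday” means the first Tuesday strictly after today, which not every user will agree with.

```csharp
using System;
using System.Globalization;

// Hypothetical sketch: handles only a few input shapes, relative to a caller-supplied "today".
public static class DateSlotParser
{
    public static bool TryParse(string utterance, DateTime today, out DateTime date)
    {
        string text = utterance.Trim().ToLowerInvariant();

        if (text == "tomorrow")
        {
            date = today.Date.AddDays(1);
            return true;
        }

        // "next <weekday>": assumed to mean the first such day strictly after today.
        foreach (DayOfWeek day in Enum.GetValues(typeof(DayOfWeek)))
        {
            if (text == "next " + day.ToString().ToLowerInvariant())
            {
                int daysAhead = ((int)day - (int)today.DayOfWeek + 7) % 7;
                if (daysAhead == 0)
                {
                    daysAhead = 7;
                }
                date = today.Date.AddDays(daysAhead);
                return true;
            }
        }

        // Fall back to the explicit day-month-year format, e.g. "01-Mar-2021".
        return DateTime.TryParseExact(
            utterance.Trim(), "d-MMM-yyyy", CultureInfo.InvariantCulture,
            DateTimeStyles.None, out date);
    }
}
```

Passing today in as a parameter keeps the relative phrases predictable: with today set to Wednesday, 24 February 2021, “next Tuesday” resolves to 2 March 2021. Anything the parser doesn’t recognize returns false, which is the bot’s cue to ask again.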

Let’s also consider how a chatbot will handle responses that don’t fit the question being asked. For example, think about a chatbot’s behavior when trying to parse a date out of a response such as “can we change the city to London?” Or what happens if the user at some point just types “Cancel,” “help,” “change city,” or anything that isn’t part of a nice orderly conversation flow? We need to understand that the user wants to cancel out of their current process or redirect the conversation elsewhere, rather than answering the question that the chatbot asked.
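Done by hand, that check might start out like the sketch below, which looks at every utterance before treating it as an answer. The UtteranceKind values and the keyword rules are my own invention, and they are deliberately naive: they only catch the exact phrasings listed.

```csharp
using System;

// Hypothetical sketch: keyword rules invented for illustration, not a real NLP approach.
public enum UtteranceKind
{
    SlotAnswer,
    Cancel,
    Help,
    ChangeSlot,
}

public static class UtteranceClassifier
{
    // Decides whether the user is answering the question or steering the conversation.
    public static UtteranceKind Classify(string utterance)
    {
        string text = utterance.Trim().ToLowerInvariant();

        if (text == "cancel")
        {
            return UtteranceKind.Cancel;
        }
        if (text == "help")
        {
            return UtteranceKind.Help;
        }
        // Catches "change city" as well as "can we change the city to London?".
        if (text.StartsWith("change ", StringComparison.Ordinal) || text.Contains("change the "))
        {
            return UtteranceKind.ChangeSlot;
        }

        // Anything else is assumed to answer the current question.
        return UtteranceKind.SlotAnswer;
    }
}
```

Here “Cancel,” “help,” “change city,” and “can we change the city to London?” are all caught before the bot tries to read them as a date, while “Mar 1” passes through as an ordinary answer. Every new phrasing needs another rule, though, which is the problem with doing this by hand.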

It’s ok. Relax and take a deep breath. This is where the power of chatbot software like Amazon Lex comes into play, using a very powerful natural language processor. You don’t want to have to type in every variation on every way to ask for a room or specify a date. You don’t want to have to check every utterance to see if the user meant to jump over to a totally different context or cancel the process altogether.

As mentioned earlier, Amazon Lex uses natural language processing (NLP) and some fun machine learning behind the scenes to handle much of this for us. Any real chatbot platform will use NLP to make easy work of normalizing user input. By handling things like parsing through bad spelling and deciphering typos, it will ultimately provide cleaner data to our chatbot’s backend. There will still be some work for the developer to do, but at least some of the really hard stuff can be done for us.

Summary

Where does this leave us?

First and foremost, chatbots are concerned with context. Cues and responses will always be defined within a particular context, and must conform to the expectations of the context in which they’re defined. Make sure you understand what information you need from your users, and then try to put together flows which feel natural when the chatbot asks questions to gather this info. Make sure you consider various meanings a given utterance might have, and think through different ways a user may want to switch context. Furthermore, keep in mind what type of interaction you’re trying to share with a user, and whether a chatbot is even the best way to provide a positive experience.

With people spending more time at home and customer service use increasing, technology will continue to be the main public face of many brands. Fewer in-person interactions put pressure on companies to design software that provides a better automated experience. Websites need to work and chatbots need to chat; if you really want the wow factor, they need to do it together seamlessly.