At their best, chatbots help you get things done. At their worst, they spew toxic nonsense. Whether we call them chatbots, intelligent agents, or virtual agents, the basic idea is that you shouldn’t need to bother with human interaction for things that computers can do quickly and efficiently: answering questions about a flight, managing your expenses, ordering a pizza, telling you the weather, or helping you apply for a job. A lot of these are handy but may not feel quite like artificial intelligence. Later in this post, we’ll tackle detecting intentions, having conversations, and building trust, the core pieces that make a chatbot feel more like artificial intelligence.
But first, here is an example of a simple chatbot for flight information. Back in March, Facebook released a new Messenger feature that lets KLM Royal Dutch Airlines customers receive flight details through the airline’s Messenger bot.
Plenty of chatbots are very simple: they help with search or match natural-language questions to already-written FAQs. One level up are bots that can answer questions about accounts, and above those are bots that can make changes to accounts.
This post focuses on virtual agents in customer support, which are among the most sophisticated. By Gartner’s estimates, about 60% of customer service interactions needed a human in 2014, and Gartner predicted that share would fall to 33% by 2017. To reach that number (and to give customers good experiences while making human effort more efficient), systems need to understand what customers are trying to do. That means understanding customer intents. Intentions are at the heart of artificial intelligence.
If you’re at a close friend’s house and they say, “There’s beer in the fridge,” you probably don’t wonder why they’re making existential observations. You know it’s an invitation. For Paul Grice, to mean something is to have an intention, and to understand what someone means is to recognize that intention.
The default assumption in conversation is that there is some intention behind your words that makes them relevant at the moment you say them. So you can create havoc by messing with these expectations. An example with dinosaurs:
There are a lot of intentions in the world, which is part of what makes it hard for computers to understand humans. Computers do much better in smaller domains, which is usually fine: if you sell pizzas, people don’t call you to order panda onesies, tell you jokes, or ask why on earth people like French bulldogs.
Trust requires understanding
Communication relies on people generally believing that others are not villainously saying things they know aren’t true. Disrupting that basic expectation has huge consequences. Financial services firms are regulated so that they don’t lie or make promises they can’t keep (“this stock is going to skyrocket, you’ll make a ton of money!”). Romantic relationships where you can’t trust “I love you” are chaotic.
Part of the reason people hate interactive voice response (IVR) systems is that what they said wasn’t reliably understood; if you can’t even understand someone’s words, you can’t understand their intentions. Speech recognition is much better now, and many IVR systems are better designed, so you don’t have to sit through a menu of buttons. But early and frequent bad experiences turned a lot of people off, and those negative feelings persist into new contexts.
There are many efforts to make it easier to build lightweight chatbots, and most of them focus on text, but there are plenty of ways to misunderstand intentions even when acoustics and accents aren’t a problem. One of the featured chatbots for Facebook tells you about the weather.
Facebook explicitly says that its platform makes it possible to build bots “to engage in conversation with people.” In the example above, the response “Cool.” to my question means something like ‘I recognize you just said something,’ though in this context it has the breezy quality of a friend who wasn’t really listening but knows it’s their turn to say something. The chatbot doesn’t know enough to understand my specific intention, but it also isn’t really acknowledging that I’ve asked a question.
We’ve had the ‘talk’ meaning of conversation since the 1570s, but when the word came into English in the mid-1300s it meant living together and having dealings with each other. In the 1700s, “criminal conversation” was even the legal term for adultery. In other words, conversation is intimate. It involves the back-and-forth recognition of intentions, which is part of how people build connection and trust.
For what it’s worth, I never got the bot in this example to give me forecasts for Thursday to Saturday. The stakes of this particular exchange were very low. They’re much higher when you’re trying to help someone choose a refrigerator or contest a service fee. In those cases, even if economics and user desires are pushing you toward automation, you need the right automation.
The most reliable approach is good machine learning that also knows when to bring in a human. Until we have chatbots talking to chatbots, there will always be at least one human in the loop. The designers and engineers of a system are inherently humans-in-the-loop, and when chatbots use machine learning, so are the people who create the training data. So when the goal is trust, it makes sense for systems to back off to humans, too.
Finding intentions in financial services
I’ve seen financial services firms that have about 3,000 categories for classifying customer communications. That’s almost certainly overkill for a quarterly report to executives, but if you’re building a system that helps customers get information and take action, you may find yourself tracking between 30,000 and 300,000 intentions.
Let’s jump into some data: a tiny sample of a few thousand calls to a credit card company. Most of these are very short (a median of 4 words). The most frequent terms include bill, charge, payment, and representative.
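To make those two summary statistics concrete, here is a minimal Python sketch of how you might compute a median utterance length and the most frequent terms. The utterances, the whitespace tokenizer, and the tiny stopword list are all hypothetical stand-ins, not the credit card data described here.

```python
from collections import Counter
from statistics import median

# Hypothetical toy utterances, invented for illustration only;
# they are NOT the credit card calls described in the post.
utterances = [
    "pay my bill",
    "question about a charge",
    "talk to a representative",
    "make a payment",
    "why is my bill so high",
]

# Crude tokenization (an assumption): lowercase and split on whitespace.
tokenized = [u.lower().split() for u in utterances]

# Median utterance length, in words.
median_len = median(len(tokens) for tokens in tokenized)

# Term frequencies, skipping a tiny hypothetical stopword list.
STOPWORDS = {"a", "my", "to", "is", "about", "so", "why"}
term_counts = Counter(
    tok for tokens in tokenized for tok in tokens if tok not in STOPWORDS
)

print(median_len)
print(term_counts.most_common(3))
```

On real call logs you would want a proper tokenizer and stopword list, but the shape of the computation stays the same.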
One way to know your customers’ intentions is to look at what you’re already doing. How many different flags do customer support agents have? How many actions do they take? If you’re building a chatbot, it makes sense to plan for it to take over the things that your system already makes it possible for humans to do.
You can also use topic modeling to make sure you’re detecting all the intents that are in the data or that are emerging in real time. One of my favorite clusters for customer support groups together calls with words like god, goddamn, fucking, talk, stupid, and representative. This cluster is almost inevitable: a lot of people really don’t like talking to computers. We could debate whether the emotion is really an intention at the same level as “connect me to a human”, but it can be useful to distinguish people who want to talk to a person from people who want to talk to a person and are fuming.
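Topic modeling itself is beyond a short example, but once a cluster like this surfaces, even a crude rule can split it into the two groups just described. The sketch below is exactly that: a keyword heuristic, not a topic model. The function name `label_escalation` and the human-request cue list are hypothetical; the anger words are the ones from the cluster above.

```python
import re

# Hypothetical cue phrases signalling "connect me to a human".
HUMAN_CUES = ("talk to a", "speak to a", "representative", "agent", "operator")
# Anger words taken from the cluster described in the post; a real list would be longer.
ANGER_WORDS = {"god", "goddamn", "fucking", "stupid"}


def label_escalation(utterance: str) -> str:
    """Return 'human_request_angry', 'human_request', or 'other' (keyword heuristic)."""
    text = utterance.lower()
    # Substring match on cue phrases: crude, but enough to flag the request.
    if not any(cue in text for cue in HUMAN_CUES):
        return "other"
    tokens = set(re.findall(r"[a-z']+", text))
    if tokens & ANGER_WORDS:
        return "human_request_angry"
    return "human_request"


for u in ["I want to talk to a representative",
          "this stupid machine, let me talk to a person",
          "what is my balance"]:
    print(u, "->", label_escalation(u))
```

Substring matching is deliberately naive here (for example, "agent" would also match "management"); a trained classifier learns these distinctions from annotated data instead.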
Companies want to automate customer service to save money, but it’s not in their interest to create friction and frustration. Machine learning systems do best when they are trained on the same kind of data they are supposed to classify. Happily, many customer service categories can be classified with a precision of 0.80 to 0.95 after just 300 human annotations. Again, that’s because even a bank with mortgages, credit cards, ATMs, airline miles, foreign exchange services, and thousands of other products is still more constrained than “everything”.
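Precision here means: of all the requests a model assigns to a category, what fraction actually belong there. A minimal sketch of the calculation, using a hypothetical helper name and made-up labels:

```python
def precision(y_true, y_pred, label):
    """Share of items predicted as `label` whose true label is also `label`."""
    # Keep only the true labels of items the model assigned to `label`.
    predicted = [t for t, p in zip(y_true, y_pred) if p == label]
    if not predicted:
        raise ValueError(f"no items were predicted as {label!r}")
    return sum(t == label for t in predicted) / len(predicted)


# Made-up example: the model routes 100 requests to "representative",
# and 95 of them really are representative requests.
y_pred = ["representative"] * 100
y_true = ["representative"] * 95 + ["billing"] * 5
print(precision(y_true, y_pred, "representative"))
```

Precision says nothing about requests the model missed entirely; that is recall, which you would track separately.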
Another helpful constraint is that people dealing with their mortgage don’t usually use phrases like “credit card”. Part of the reason it’s easy to tell when someone wants to talk to a representative/agent/operator is that “talk to a” is usually among the top five trigrams. Machine learning algorithms pick up on that and similar patterns (e.g., “to speak to”) automatically and easily. By the time you have 300 annotations for customer representative requests, 95 out of every 100 requests your model routes to human agents will be solidly correct for that category.
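Counting word trigrams is a quick way to watch a phrase like “talk to a” rise to the top. This sketch assumes simple lowercase whitespace tokenization; the function `top_trigrams` and the sample utterances are invented for illustration.

```python
from collections import Counter


def top_trigrams(utterances, k=5):
    """Return the k most common word trigrams as (phrase, count) pairs."""
    counts = Counter()
    for utterance in utterances:
        tokens = utterance.lower().split()
        # Zipping three offset views of the token list yields consecutive word triples.
        counts.update(zip(tokens, tokens[1:], tokens[2:]))
    return [(" ".join(gram), n) for gram, n in counts.most_common(k)]


# Invented sample utterances, for illustration only.
sample = [
    "I want to talk to a representative",
    "let me talk to a person",
    "can I talk to a human",
    "I need to pay my bill",
]
print(top_trigrams(sample, k=3))
```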
What about something harder, like reactivating a credit card? While some people call in knowing that this is what they want to do, mostly the intention is “Make my card work.” From the company’s perspective, this customer intent needs diagnosing. It’s usually a mistake to think that customer intents themselves are as granular as the-kinds-of-actions-a-company-can-actually-take. The bridge between a customer intent and a company action is conversation.
Humans are great at managing this, but the easy and repetitive tasks can be taken over by computers. So the steps are: get training data, build a machine learning model, use it to classify incoming requests, and back off to human judgment whenever the model is unsure. Statistical models can tell you how confident they are about their classifications, and that information is important for building a system that instills trust. Without a human in the loop you won’t build that trust, since the model will make too many mistakes, especially at the beginning.
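The classify-then-back-off loop can be sketched in a few dozen lines. This is a minimal illustration, not the system described in this post: it swaps in a toy multinomial naive Bayes classifier, and the class name `NaiveBayesRouter`, the 0.8 threshold, and the training examples are all hypothetical. Any request whose top intent falls below the threshold is routed to a human.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesRouter:
    """Toy multinomial naive Bayes intent classifier that backs off to humans.

    Hypothetical class for illustration; `threshold` is an assumed cutoff.
    """

    def __init__(self, threshold: float = 0.8):
        self.threshold = threshold

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_words = {lab: sum(c.values()) for lab, c in self.word_counts.items()}
        return self

    def predict_proba(self, text):
        """Posterior probability of each intent given the text."""
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        log_scores = {}
        for label, count in self.class_counts.items():
            score = math.log(count / n_docs)  # log prior
            for tok in text.lower().split():
                if tok not in self.vocab:
                    continue  # ignore words never seen in training
                # Add-one (Laplace) smoothing so unseen (word, class) pairs are not zero.
                num = self.word_counts[label][tok] + 1
                score += math.log(num / (self.total_words[label] + v))
            log_scores[label] = score
        # Normalize in log space (log-sum-exp) to avoid underflow.
        top = max(log_scores.values())
        z = sum(math.exp(s - top) for s in log_scores.values())
        return {lab: math.exp(s - top) / z for lab, s in log_scores.items()}

    def route(self, text):
        """Return (intent, confidence), or ("HUMAN_REVIEW", confidence) if unsure."""
        probs = self.predict_proba(text)
        label, p = max(probs.items(), key=lambda kv: kv[1])
        return ("HUMAN_REVIEW", p) if p < self.threshold else (label, p)


# Made-up training data, for illustration only.
texts = ["talk to a representative", "speak to an agent", "i want a human",
         "pay my bill", "make a payment", "bill payment due"]
labels = ["agent", "agent", "agent", "billing", "billing", "billing"]

router = NaiveBayesRouter(threshold=0.8).fit(texts, labels)
for query in ["talk to a representative please", "pay my bill", "hello there"]:
    print(query, "->", router.route(query))
```

Naive Bayes probabilities tend to be overconfident, so in practice you would pick the threshold by checking precision on held-out human annotations rather than trusting the raw scores.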
This post was mostly about customer support. In the next one, I’ll look more at intelligent agents like Siri, Tay, Ana, Nina, Amelia, Samantha, and Eva. And we can ask: why are so many of these services portrayed as women?