
COTA: Improving Uber Customer Care with NLP & Machine Learning

January 3, 2018 / Global

To facilitate the best end-to-end experience possible for users, Uber is committed to making customer support easier and more accessible. Working toward this goal, Uber’s Customer Obsession team leverages five different customer-agent communication channels powered by an in-house platform that integrates customer support ticket context for easy issue resolution. With hundreds of thousands of tickets surfacing daily on the platform across 400+ cities worldwide, this team must ensure that agents are empowered to resolve them as accurately and quickly as possible.
Enter COTA, our Customer Obsession Ticket Assistant, a tool that uses machine learning and natural language processing (NLP) techniques to help agents deliver better customer support. Leveraging our Michelangelo machine learning-as-a-service platform on top of our customer support platform, COTA enables quick and efficient issue resolution for more than 90 percent of our inbound support tickets.
In this article, we discuss our motivations behind creating COTA, outline its backend architecture, and showcase how the powerful tool has led to increased customer satisfaction.
 
Customer support before COTA
When customers contact Uber for support, it is important that we route them to the best possible resolution in a timely manner. One way to facilitate this is to have users click through a hierarchy of issue types when they report an issue; this provides our agents with additional context around the issue, thereby enabling them to solve it more quickly, as detailed in Figure 1, below:
Figure 1: Uber’s customer in-app support flow presents users with an intuitive and easy-to-use interface that highlights trip details and suggests issue types to help with routing.
Although this provides important context, not all of the information needed for solving an issue is obtainable through this process, particularly given the wide variety of possible solutions available. Moreover, the diversity of ways a customer can describe an issue associated with a ticket further complicates the ticket resolution process.
As Uber continues to grow at scale, support agents must be able to handle an ever-increasing volume and diversity of support tickets, from technical errors to fare adjustments. In fact, when an agent opens a ticket, the first thing they need to do is determine the issue type out of thousands of possibilities, which is no easy task! Reducing the amount of time agents spend identifying issue types is important because it also decreases the time it takes to resolve issues for users.
Once an issue type is chosen, the next step is to identify the right resolution, with each ticket type possessing a different set of protocols and solutions. With thousands of possible resolutions to choose from, identifying the proper fix to each issue is also a time-intensive process.
 
Introducing COTA: Customer Obsession Ticket Assistant 
We designed COTA to help our customer support representatives improve their speed and accuracy, resulting in an improved customer experience. 
In short, COTA leverages Michelangelo to simplify, expedite, and standardize the ticket-resolution workflow. While the current version of COTA comprises a set of models that recommend solutions to agents for English-language support tickets, we are in the process of building models that can process Spanish- and Portuguese-language tickets, too.
Built on top of our support platform, our Michelangelo-powered models suggest the three most likely issue types and solutions based on ticket content and trip context, depicted below: 
Figure 2: The COTA system architecture is composed of a seven-step workflow.
As depicted in Figure 2, the general COTA architecture follows a seven-step flow:
1. Once a new ticket enters the customer support platform (CSP), the back-end service collects all relevant features of the ticket.
2. The back-end service then sends these features to the machine learning model in Michelangelo.
3. The model predicts scores for each possible solution.
4. The back-end service receives the predictions and scores, and saves them to our Schemaless data store.
5. Once an agent opens a given ticket, the front-end service triggers the back-end service to check if there are any updates to the ticket. If there are no updates, the back-end service will retrieve the saved predictions; if there are updates, it will fetch the updated features and go through steps 2-4 again.
6. The back-end service returns the list of solutions ranked by the predicted score to the front-end service.
7. The top three ranked solutions are suggested to agents; from there, agents make a selection and resolve the support ticket.
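The seven-step flow above can be sketched in a few lines of Python. This is only an illustration of the caching logic in step 5, not COTA's actual service code: the class name, the `extract_features` and `predict` callables, the integer ticket `version`, and the in-memory dictionary standing in for the data store are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Features = Dict[str, float]
Scores = Dict[str, float]  # solution id -> predicted score


@dataclass
class TicketBackend:
    """Toy stand-in for the back-end service described in the seven steps."""
    extract_features: Callable[[str], Features]  # hypothetical feature collector
    predict: Callable[[Features], Scores]        # hypothetical model call
    # In-memory stand-in for the data store: ticket id -> (version, scores).
    store: Dict[str, Tuple[int, Scores]] = field(default_factory=dict)

    def on_new_ticket(self, ticket_id: str, version: int) -> None:
        # Steps 1-4: collect features, score every solution, save the result.
        scores = self.predict(self.extract_features(ticket_id))
        self.store[ticket_id] = (version, scores)

    def on_agent_open(self, ticket_id: str, version: int, top_k: int = 3) -> List[str]:
        # Step 5: reuse saved predictions unless the ticket has changed.
        saved = self.store.get(ticket_id)
        if saved is None or saved[0] != version:
            self.on_new_ticket(ticket_id, version)
        scores = self.store[ticket_id][1]
        # Steps 6-7: rank by predicted score and surface the top suggestions.
        ranked = sorted(scores, key=scores.get, reverse=True)
        return ranked[:top_k]
```

Comparing a version number is just one simple way to detect that a ticket was updated; any change marker (a timestamp, a hash of the features) would serve the same purpose.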
Results are promising; COTA can reduce ticket resolution time by over 10 percent while delivering service with similar or higher levels of customer satisfaction, as measured by customer service surveys. By empowering customer support agents to deliver quicker and more accurate solutions, COTA’s powerful ML models make the Uber support experience more enjoyable.
 
Building the COTA backend with NLP and ML
From the outside looking in, COTA takes in contextual information about a support issue and returns possible solutions, but there is much more going on behind the scenes. At its core, the COTA backend is responsible for accomplishing two tasks: identifying the ticket issue type and determining the most sensible solutions for it. To accomplish this, our machine learning model leverages features extracted from customer support messages, trip information, and customer selections in the ticket issue submission hierarchy outlined earlier.
According to the feature importance scores generated by our model (and unsurprisingly), the most valuable feature for identifying issue type is the message customers send to agents about their issue before officially submitting their ticket through the hierarchy. Since the messages users send are useful for understanding what issue they are dealing with, we built an NLP pipeline to transform text across several different languages into useful features for our machine learning models downstream.
NLP models can be built to translate and interpret different elements of text, including phonology, morphology, grammar, syntax, and semantics. Depending on the building units, NLP models can also operate at the character, word, phrase, or sentence/document level. Traditional NLP models are built by leveraging human expertise in linguistics to engineer handcrafted features. With the recent upsurge in end-to-end training for deep learning models, researchers have even begun to develop models that can decipher full chunks of text without having to explicitly parse out relationships between different words within a sentence, instead using raw text directly.
For our use case, we decided to first build an NLP model that analyzes text at the word level to better understand the semantics of text data. One popular approach to NLP is topic modeling, which aims to understand the meaning of sentences using the counting statistics of the words. Although topic modeling does not take into account word ordering, it has proven very powerful for tasks such as information retrieval and document classification.
Figure 3: The NLP pipeline we built for ticket issue identification and solution selection is composed of three distinct steps: preprocessing, feature engineering, and computation via pointwise ranking algorithm.
In COTA, we use the following topic-modeling-based NLP pipeline to handle text messages, as outlined in Figure 3:
Preprocessing
We first clean the text by removing HTML tags. Next, we tokenize the message’s sentences and remove stopwords. Then, we conduct lemmatization to convert words in different inflected forms into the same base form. Finally, we convert the documents into a collection of words (a so-called bag of words) and build a dictionary of those words.
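The preprocessing steps can be illustrated with a small standard-library sketch. The stopword list and the lemma lookup table below are tiny, made-up stand-ins chosen for the example; a real pipeline would rely on a full stopword list and a proper lemmatizer, and the function names are hypothetical.

```python
import html
import re
from collections import Counter

# Tiny illustrative resources (assumptions, not the lists used in COTA).
STOPWORDS = {"a", "an", "the", "i", "my", "me", "was", "is", "for", "to", "and", "of"}
LEMMAS = {"charged": "charge", "charges": "charge", "trips": "trip",
          "cancelled": "cancel", "drivers": "driver"}


def strip_html(text):
    """Remove tags, then decode entities such as &amp;."""
    return html.unescape(re.sub(r"<[^>]+>", " ", text))


def preprocess(text):
    """Clean, tokenize, drop stopwords, and map inflected forms to a base form."""
    tokens = re.findall(r"[a-z]+", strip_html(text).lower())
    return [LEMMAS.get(t, t) for t in tokens if t not in STOPWORDS]


def build_dictionary(docs):
    """Map each distinct word to an integer id, in order of first appearance."""
    vocab = {}
    for doc in docs:
        for token in doc:
            vocab.setdefault(token, len(vocab))
    return vocab


def bag_of_words(doc, vocab):
    """Sparse bag of words for one document: {word id: count}."""
    return dict(Counter(vocab[t] for t in doc if t in vocab))
```

For example, `preprocess("<p>I was charged twice for my trips</p>")` yields the tokens `charge`, `twice`, and `trip`, which `build_dictionary` and `bag_of_words` then turn into word ids and counts.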
Topic modeling
To understand our user intent, we then perform topic modeling on the bag of words after preprocessing. Specifically, we use TF-IDF (term frequency-inverse document frequency) and LSA (latent semantic analysis) to extract topics. Figure 4a, below, shows some examples of the types of topics we might obtain from topic modeling:
Figure 4: a) Topic modeling: we use TF-IDF and LSA to extract topics from rich text data in customer support tickets processed by our customer support platform. b) Feature engineering: all the solutions and tickets are mapped to the topic vector space, and the cosine similarity between each solution and ticket pair is computed.
Feature engineering
Topic modeling enables us to directly use the topic vectors as features to perform downstream classifications for issue type identification and solution selection. However, this direct approach suffers from a sparsity of topic vectors; in order to form a meaningful representation of these topics, we typically need to keep hundreds or even thousands of dimensions of topic vectors with many dimensions having values close to zero. With a very high-dimensional feature space and large amount of data to process, training these models becomes quite challenging.
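To make the topic vectors discussed above concrete, here is a minimal sketch of TF-IDF weighting followed by LSA, assuming NumPy is available. LSA is implemented here as a truncated singular value decomposition of the TF-IDF matrix; the function names, the smoothed IDF formula, and the choice of raw counts as term frequency are assumptions for illustration, not necessarily the settings used in COTA.

```python
import numpy as np


def tfidf_matrix(docs, vocab):
    """Rows are documents, columns are vocabulary words, values are TF-IDF weights."""
    counts = np.zeros((len(docs), len(vocab)))
    for row, doc in enumerate(docs):
        for token in doc:
            counts[row, vocab[token]] += 1
    doc_freq = np.count_nonzero(counts, axis=0)          # documents containing each word
    idf = np.log((1 + len(docs)) / (1 + doc_freq)) + 1   # smoothed inverse document frequency
    return counts * idf


def fit_lsa(tfidf, n_topics):
    """Truncated SVD: keep the n_topics strongest directions in word space."""
    _, _, vt = np.linalg.svd(tfidf, full_matrices=False)
    return vt[:n_topics]  # shape: (n_topics, vocabulary size)


def to_topic_space(tfidf_rows, components):
    """Project TF-IDF rows onto the learned topic directions."""
    return tfidf_rows @ components.T
```

Each row returned by `to_topic_space` is a dense topic vector with one value per retained topic. On a realistic corpus, `n_topics` would need to be in the hundreds or thousands to preserve meaning, which is the dimensionality problem this section describes.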
With these considerations in mind, we decided to use topic modeling in an indirect fashion: performing further feature engineering by computing cosine similarity features, as illustrated in Figure 4b. Using solution selection as an example, we collect the historical tickets of each solution and form the bag-of-word representation of such a solution. 
In this scenario, a topic modeling transformation is carried out on the bag-of-word representation, which gives us a vector T_i for solution i. We conduct this transformation across all of our solutions. We can map any new incoming ticket, j, to the topic vector space of the solutions, T_1, T_2, …, T_m, where m is the total number of possible solutions to use. This results in a vector t_j for ticket j. A cosine similarity score s_ij can be computed between T_i and t_j to represent the similarity between solution i and ticket j, which reduces the feature space from hundreds or thousands of dimensions to a handful.
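The cosine similarity feature can be written out directly. The sketch below assumes the solution and ticket topic vectors are already available as plain lists of floats; the function names and the dictionary layout are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_features(ticket_vec, solution_vecs):
    """Return {solution id: similarity score} for one ticket against every solution."""
    return {sid: cosine_similarity(vec, ticket_vec) for sid, vec in solution_vecs.items()}
```

Whatever the dimensionality of the topic space, each solution-ticket pair collapses to a single number, which is what keeps the downstream feature space small.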
Pointwise ranking algorithm
Again, we illustrate how our ML algorithm works using solution selection as an example. To design this algorithm, we combined cosine similarity features with other ticket and trip features that match tickets to solutions. With over 1,000 possible solutions for hundreds of ticket types, COTA’s large solution space makes it challenging for our algorithm to distinguish the fine differences between these solutions.
To identify the best possible recommendations for support agents, we apply a learning-to-rank approach and build a retrieval-based pointwise ranking algorithm. Specifically, we label the correct match between solution and ticket pair as positive (1), and we sample a random subset of solutions that do not match with the ticket and label the pairs negative (0). Using the cosine similarity as well as ticket and trip features, we can build a binary classification algorithm that leverages the random forest technique to classify whether or not each solution-ticket combination matches. Once the algorithm scores each possible match, we can rank the scores and suggest the three top-ranked solutions.
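The labeling and ranking scheme described above can be sketched as follows. The classifier itself is deliberately left out: `score_pair` is a hypothetical callable standing in for whatever model produces a match score for a solution-ticket pair (in the article, a random forest), and the function names and tuple layout are assumptions for the example.

```python
import random


def build_training_pairs(ticket_id, correct_solution, all_solutions, n_negatives, rng):
    """One positive pair plus randomly sampled non-matching solutions as negatives."""
    candidates = [s for s in all_solutions if s != correct_solution]
    negatives = rng.sample(candidates, min(n_negatives, len(candidates)))
    pairs = [(ticket_id, correct_solution, 1)]            # correct match -> label 1
    pairs += [(ticket_id, s, 0) for s in negatives]       # sampled mismatch -> label 0
    return pairs


def rank_solutions(ticket_features, all_solutions, score_pair, top_k=3):
    """Score every (ticket, solution) pair independently, then keep the best top_k."""
    scored = [(score_pair(ticket_features, s), s) for s in all_solutions]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [s for _, s in scored[:top_k]]
```

Because each pair is scored on its own, the model only has to answer a yes/no question per pair, and sampling a subset of negatives keeps the training set far smaller than enumerating every wrong solution for every ticket.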
Figure 5, below, compares the performance of a classical multi-class classification algorithm using topic vector features directly against the pointwise ranking algorithm using engineered cosine-similarity features:  
Figure 5: Pointwise ranking is 25 percent more accurate than the multi-class classification on the solution selection task.
The cosine-similarity-based pointwise ranking algorithm outperforms the multi-class classification algorithm with direct topic vectors, with a 25 percent relative improvement in accuracy. This comparison, conducted on the same dataset using the same type of algorithm (random forest) with the same hyperparameters, highlights the benefit of using engineered cosine-similarity features in a ranking framework. As evidenced in Figure 5, using the pointwise ranking algorithm not only speeds up the training process by 70 percent, but also significantly improves model performance.
 
Easier and faster ticket solving = better customer support
COTA’s promising results are only meaningful if they translate to a real-world setting. To measure COTA’s impact on our customer support experience, we conducted several controlled online A/B test experiments on English-language tickets. In those experiments, we included thousands of agents and randomly assigned them to either control or treatment groups. Agents in the control group were exposed to the original workflow, while agents in the treatment group were shown a modified user interface containing suggestions on issue types and solutions. We collected tickets solved solely by agents in either the control or the treatment group, and measured a few key metrics, including model accuracy, average handle time, and customer satisfaction score.
The test proceeded as follows: 
We first measured the online model performance for both groups and compared them with offline performance. We found that the model performance is consistent from offline to online. 
Then, we measured customer satisfaction scores and compared them across control and treatment groups. In general, we found that customer satisfaction often increased by a few percentage points. This finding indicates that COTA delivers the same or slightly higher quality of customer service. 
Finally, to determine how much COTA affected ticket resolution speed, we compared the average ticket handling time between the control and treatment groups. We determined that, on average, this new feature reduced ticket handling time by about 10 percent.
By improving agent performance and speeding up ticket resolution times, COTA helps our Customer Obsession team better serve our users, leading to increased customer satisfaction. Moreover, COTA’s ability to expedite ticket resolution saves Uber tens of millions of dollars every year. 
 
Deep learning for the next generation of COTA
The success of COTA convinced us to continue experimenting with our machine learning stack to improve system accuracy and provide an even better experience for both agents and end users. 
Recent advancements in text classification, summarization, machine translation, and many auxiliary NLP tasks (syntactic and semantic parsing, recognizing textual entailment, named entity recognition, and linking) have been obtained using deep learning architectures, so it seemed like a natural fit to start experimenting with them for our own models.
Deep learning experiments with various architectures
With the support of researchers from Uber AI Labs, we experimented with applying deep learning to the next generation of models for issue type identification and solution suggestion. We implemented several architectures based on convolutional neural networks (CNNs), recurrent neural networks (RNNs), and several different combinations of the two, including hierarchical and attention-based architectures. 
Using deep learning frameworks, we were able to train our models in a multi-task learning fashion, with a single model capable of both identifying the issue type and suggesting the best possible solution. Since issue types are organized into a hierarchy, we determined that we could train the model to predict the path in the hierarchy with a recurrent decoder using beam search, similar to the decoding component of a sequence-to-sequence model, which allowed for even more accurate predictions.
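To illustrate the idea of decoding a path through the issue-type hierarchy with beam search, here is a small self-contained sketch. It is not the model described above: the recurrent decoder is replaced by a hypothetical `step_log_prob` callable that returns the log-probability of moving to a child node given the path so far, and the hierarchy is assumed to be a plain dictionary from each node to its children.

```python
import math


def beam_search_paths(children, step_log_prob, root, beam_width=2):
    """Return (score, path) for complete root-to-leaf paths, best first."""
    beams = [(0.0, [root])]
    finished = []
    while beams:
        expanded = []
        for score, path in beams:
            kids = children.get(path[-1], [])
            if not kids:  # leaf reached: the path names a full issue type
                finished.append((score, path))
                continue
            for kid in kids:
                expanded.append((score + step_log_prob(path, kid), path + [kid]))
        # Prune to the beam_width most likely partial paths before the next level.
        expanded.sort(key=lambda item: item[0], reverse=True)
        beams = expanded[:beam_width]
    finished.sort(key=lambda item: item[0], reverse=True)
    return finished
```

With `beam_width=1` this reduces to greedy decoding; a wider beam lets a path that starts with a less likely top-level category still win if its lower levels are scored confidently.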
Hyperparameter optimization to select the best model
To nail down the best deep learning architecture, we performed large-scale hyperparameter optimization for all types of architectures, training them in parallel on our GPU cluster. The final results suggest that the most accurate architecture is one that applies both CNNs and RNNs, but for the purpose of our research, we decided to pursue a simpler CNN architecture that was slightly less accurate but had better computational properties in terms of training and inference time. In the end, the model we settled on provides about 10 percent greater accuracy with respect to the original random forest model.
In Figure 6, below, we show the tradeoff between data coverage (in other words, the percentage of tickets that a model is processing, on the x-axis) and accuracy on that subset of tickets (on the y-axis). Both models became more accurate as the data coverage decreased, but our deep learning model exhibited higher accuracy for the same coverage and higher coverage for the same accuracy.
Figure 6: A comparison between the ability of our deep learning model and classical model (random forest) to identify issue type reveals that the deep learning model achieves greater data coverage and accuracy.
In collaboration with Uber’s Michelangelo team, we are in the final phase of productization of these deep learning models.
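A coverage-versus-accuracy curve like the one in Figure 6 can be computed along the following lines. The sketch assumes coverage is controlled by a confidence threshold, with the model only answering on tickets whose top prediction meets the threshold; the article does not spell out the mechanism, so treat this, along with the function name and input layout, as an assumption.

```python
def coverage_accuracy_curve(predictions, thresholds):
    """predictions: list of (confidence, is_correct) pairs, one per ticket.

    Returns one (threshold, coverage, accuracy) row per threshold, where coverage
    is the share of tickets the model answers and accuracy is measured on that share.
    """
    rows = []
    for threshold in thresholds:
        kept = [correct for confidence, correct in predictions if confidence >= threshold]
        coverage = len(kept) / len(predictions)
        accuracy = sum(kept) / len(kept) if kept else None  # undefined when nothing is kept
        rows.append((threshold, coverage, accuracy))
    return rows
```

Raising the threshold trades coverage for accuracy, so comparing two models fairly means comparing their whole curves rather than a single operating point.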
 
Next steps 
Needless to say, we are excited about the opportunity to further leverage these technologies to make the customer support experiences of our agents and users even more seamless. Stay tuned for future updates on our analyses and experiments as we continue exploring the world of deep learning for NLP!
If you are interested in tackling engineering challenges that drive business impact at scale, consider applying for a role on our Applied Machine Learning team or our San Francisco and Palo Alto-based Customer Obsession Engineering teams. If you are interested in machine learning and natural language processing research, learn about job opportunities with Uber AI Labs. 



Huaixiu Zheng and Yi-Chia Wang are data scientists on Uber’s Applied Machine Learning team, and Piero Molino is a research scientist with Uber AI Labs. COTA is a cross-functional collaboration between Customer Support Platform, Applied Machine Learning, Michelangelo, and Uber AI Labs. Hongwei Li, Andy Harris, Monis Ahmed Khan, Alexandru Grigoras, Viresh Gehlawat, Basab Maulik, Chinmay Maheshwari, and Ron Tal also made important contributions to this project.

Huaixiu Zheng

Huaixiu Zheng is a senior data scientist at Uber, working on projects in the domains of deep learning, reinforcement learning, natural language processing and conversational AI systems.

Yi-Chia Wang

Yi-Chia Wang is a research scientist at Uber AI, focusing on conversational AI. She received her Ph.D. from the Language Technologies Institute in the School of Computer Science at Carnegie Mellon University. Her research combines language processing technologies, machine learning methodologies, and social science theories to statistically analyze large-scale data and model human-human and human-bot behaviors. She has published more than 20 peer-reviewed papers in top-tier conferences and journals and received awards, including the CHI Honorable Mention Paper Award, the CSCW Best Paper Award, and the AIED Best Student Paper Nomination.

Piero Molino

Piero is a Staff Research Scientist in the Hazy research group at Stanford University. He is a former founding member of Uber AI where he created Ludwig, worked on applied projects (COTA, Graph Learning for Uber Eats, Uber’s Dialogue System) and published research on NLP, Dialogue, Visualization, Graph Learning, Reinforcement Learning and Computer Vision.

Posted by Huaixiu Zheng, Yi-Chia Wang, Piero Molino
