AI, Culture

First Uber Science Symposium: Discussing the Next Generation of RL, NLP, ConvAI, and DL

February 12, 2019 / Global

At Uber, hundreds of data scientists, economists, AI researchers and engineers, product analysts, behavioral scientists, and other practitioners leverage scientific methods to solve challenges on our platform. From modeling and experimentation to data analysis, algorithm development, and fundamental research, the methods used by our teams reflect a commitment to advancing scientific discovery beyond the scope of our own use cases.

On November 28, 2018, members of Uber’s scientific community hosted the first-ever Uber Science Symposium, a periodic event dedicated to discussing and collaborating on the latest innovations in data science, machine learning (ML), artificial intelligence (AI), economics, applied behavioral science, and other scientific fields. For our inaugural symposium, attendees from both academia and industry met for a full day of presentations, workshops, and conversations around ML and AI focused on three tracks: reinforcement learning (RL), natural language processing (NLP) and conversational AI, and deep learning and deep learning infrastructure.

With our first symposium and those that follow, we hope to further engage with the external community to forge connections and spread ideas at the forefront of science.

Zoubin Ghahramani, Uber’s Chief Scientist, speaks during the opening session of the symposium.


First Uber Science Symposium Flyer


Reinforcement learning

Reinforcement learning (RL) is a hugely active area of research driving many exciting recent breakthroughs in ML and AI. Despite these successes, several major challenges, such as sample efficiency, reward sparsity, and safety, have prevented its more widespread adoption in industry. The importance of these topics was reflected in the speakers’ research.

Many talks at the Symposium addressed challenges related to sample efficiency: the ability of agents to learn high-quality policies from few real-world interactions. Research areas like hierarchical skill training, transfer learning, and meta-learning all aim to help agents learn faster.

Another prominent theme was the use of alternative reward mechanisms to tackle reward sparsity (the condition where positive rewards are rarely experienced and can be hard to initially discover). Some of these approaches try to provide easier and more reliable learning, while others encourage the agent to explore previously unseen regions of the state space. Additionally, speakers discussed how to help align the behaviors of RL agents with our intentions regarding what tasks we want them to perform.
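
One well-known way to encourage an agent to visit unseen regions of the state space is a count-based exploration bonus. The sketch below is an illustration only, not a method attributed to any of the speakers: the class name `CountBonus`, the `scale` parameter, and the 1/sqrt(count) form of the bonus are all assumptions chosen for simplicity.

```python
import math
from collections import Counter


class CountBonus:
    """Adds an exploration bonus that shrinks as a state is revisited."""

    def __init__(self, scale=1.0):
        self.scale = scale          # hypothetical bonus scale
        self.visits = Counter()     # visit count per (hashable) state

    def reward(self, state, extrinsic_reward):
        # Count this visit, then add scale / sqrt(visit count) to the reward.
        self.visits[state] += 1
        bonus = self.scale / math.sqrt(self.visits[state])
        return extrinsic_reward + bonus


shaper = CountBonus(scale=1.0)
first = shaper.reward("room_a", 0.0)   # novel state: full bonus
for _ in range(2):
    shaper.reward("room_a", 0.0)
fourth = shaper.reward("room_a", 0.0)  # fourth visit: bonus has halved
```

Even when the environment's own reward is zero almost everywhere, the agent still receives a learning signal that favors rarely visited states.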

Ilya Sutskever of OpenAI presents on his team’s latest research.


Ilya Sutskever, Chief Scientist of OpenAI, kicked off the track by highlighting three exciting new research projects coming out of OpenAI: their world-class DOTA-playing agents, their work on complex dexterous robotic manipulation, and curiosity-driven agents that have made progress on notoriously challenging domains like Montezuma’s Revenge.

He ended his presentation by making the case for why people shouldn’t dismiss the possibility of near-term development of artificial general intelligence, despite high uncertainty, arguing that in many domains, such as image generation, machine translation, and game playing, progress over the past six years has been astonishingly rapid. He concluded that if algorithmic progress continues, combined with orders of magnitude increases in available compute power, near-term Artificial General Intelligence (AGI) is a serious possibility. He argued that this necessitates proactively planning for risks that AGI might create.

Anca Dragan of UC Berkeley delivered the next talk in this track, titled “Optimizing Robot Action for and around People.” She framed her work on AI safety through the paradigm of building artificial agents with a “theory of mind” for people. In this framework, agents maintain a view of humans as approximately optimizing some utility function, which helps them better understand and anticipate human behavior, enabling agents to better coordinate with humans.

To depict this, Anca gave the example of an autonomous driver who needs to be able to edge into a busy lane on a highway. She also made the point that if the robot estimates online to what extent the model captures the current person’s behavior, it will also be able to stay safe in situations where people deviate significantly from its assumptions of approximate optimality.
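
A common way to formalize "approximately optimizing some utility function" is a Boltzmann-rational model, in which a person picks an action with probability proportional to exp(beta * utility). The sketch below assumes that model (the talk summary does not name a specific one) and shows how an agent could update, online, its belief about how well the model fits a person. The function names, utilities, and candidate `beta` values are all hypothetical.

```python
import math


def boltzmann_policy(utilities, beta):
    """Probability of each action, proportional to exp(beta * utility)."""
    scaled = [beta * u for u in utilities]
    top = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


def update_beta_belief(belief, utilities, observed_action):
    """Bayesian update over candidate beta values after one observed action."""
    posterior = {
        beta: prior * boltzmann_policy(utilities, beta)[observed_action]
        for beta, prior in belief.items()
    }
    norm = sum(posterior.values())
    return {beta: p / norm for beta, p in posterior.items()}


utilities = [1.0, 0.0, -1.0]    # hypothetical utilities of three actions
belief = {0.1: 0.5, 5.0: 0.5}   # low beta: model fits poorly; high beta: fits well
# The person takes the action the model considers worst (index 2).
belief = update_beta_belief(belief, utilities, observed_action=2)
```

After this surprising observation, nearly all of the belief mass moves to the low `beta` value, which is the signal an agent could use to trust its predictions less and act more conservatively.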

The final portion of her talk focused on the challenge of specifying a cost function to achieve the desired behavior. She made the case that cost design is itself a human-robot interaction: robots must understand that the given cost function is only a proxy for true desired behavior and should collaborate with humans to optimize what the humans want.

Anca Dragan of UC Berkeley discusses AI safety.


Next we had a panel discussion moderated by Jeff Clune of Uber AI Labs featuring RL experts Satinder Singh of the University of Michigan, Karol Hausman of Google Brain, and Abhishek Gupta of UC Berkeley. The discussion focused on some of the key existential questions in the RL community: What explains the rarity of public examples of RL being used in industry? How can we improve sample efficiency? Why do we not see more hierarchical and model-based RL? Is RL a type of algorithm or a type of problem? There were interesting discussions on all of these points.

Abhishek Gupta of UC Berkeley, Karol Hausman of Google Brain, Satinder Singh of the University of Michigan, and moderator Jeff Clune of Uber AI Labs discuss topics related to RL research.


The afternoon RL session began with back-to-back talks from Satinder Singh and John Schulman of OpenAI. The first centered around the reward stream, a fundamental component of reinforcement learning. Professor Singh started by proposing a reformulation of the RL problem: Instead of treating rewards as purely exogenous, agents instead have an internal critic which supplies them a private reward signal. He then introduced the Optimal Reward Problem where an agent designer must choose a reward stream to induce a given agent to accomplish the agent designer’s goals. An example of this would be a very sparse reward environment where the agent designer can provide breadcrumbs that lead the agent toward the desired behavior. He gave an overview of one approach to this problem from his previous research, which used policy gradients for reward design. He closed this portion of the talk with a description of a recent paper in the same vein that was presented at NeurIPS last December.

John Schulman’s talk was titled “Faster Reinforcement Learning via Transfer.” He explained the motivation for work in the field of meta-learning by pointing out the large quantity of data needed by RL algorithms like policy gradients. Meta RL attempts to minimize the amount of training required on a new task by learning to maximize performance over a distribution of related tasks. John then described OpenAI’s Sonic contest, a competition in which he and his colleagues invited the machine learning community to train agents on a large body of Sonic video game levels, then tested the agents on an unseen set of brand new levels that were built for the contest. This setup stresses the need for agents to be able to meta-learn how to quickly learn to play different instances of a game.

Interestingly, a broad takeaway was that, despite much interest in meta-learning and belief in its ultimate promise, current meta-learning algorithms do not seem to gain much from first meta-learning on a distribution of different versions of a task compared with simply learning the new task from scratch. This suggests that much important research remains to be done to harness the benefits of meta-learning.

John Schulman of OpenAI presents his research in reinforcement learning.


Abhishek Gupta then gave an exciting talk titled “Simplifying Supervision in Deep Reinforcement Learning,” which detailed ways in which the AI community can reduce the burden of human supervision in RL. For example, specifying reward functions can be a challenge: rewards for high-level requests (e.g., putting a glass on a shelf) are often sparse and hard for an agent to learn to maximize. Reward shaping can make this easier, but can also be burdensome and prone to unintended consequences. The work of Abhishek (and others) to combine sparse rewards with human demonstrations has helped alleviate this problem. He also discussed the work he has done on imitating human behavior from raw video, guiding policies with natural language, and learning skills without a reward function.

Concluding the day’s RL talks, Karol Hausman spoke on “Discovering Latent Structure in Deep Robotic Learning.” According to Karol, the key idea is to learn multiple re-usable skills and embed them in a skill space using RL with variational inference. This approach has the potential to allow agents to use, combine, and hybridize rich, different skills from a large distribution of skills, which helps robots learn to solve complex tasks. Karol showed impressive results on a variety of simulated tasks. His recent work has further demonstrated the capacity to use task representations to improve sim-to-real transfer. Karol concluded by highlighting the exciting future directions for improving and exploiting this approach.

Natural language processing and conversational AI

Our symposium’s second track, Natural Language Processing and Conversational AI, was organized by Gokhan Tur and Mahdi Namazifar, the director and the tech lead of the NLP and ConvAI team at Uber, respectively, and had a jam-packed schedule of 15 presentations. Due to the high number of presentations in this track, we will not go too deep into the content of each in this article, instead offering a high-level view of what was presented.

There were three general themes around the presentations in this track:

  1. Dialog systems
  2. Automatic Speech Recognition (ASR)
  3. Natural Language Processing (NLP)

Gokhan opened the track with a presentation that provided an overview of the past, present, and future of the field, as well as discussing some of Uber’s use cases for NLP and Conversational AI.

Dialog systems

The event hosted several presentations focused on dialog systems. Professor Mari Ostendorf of the University of Washington presented her team’s Alexa Prize competition project, which won first place in 2017.

Jianfeng Gao from Microsoft Research presented work on Deep Dyna-Q, which applies model-based and model-free RL to train dialog systems. He also presented the current and future directions that his team is taking with regard to this line of research.

Raefer Gabriel from Amazon Alexa presented Amazon’s efforts around the Alexa Prize competitions and the advances these efforts have produced in metrics for evaluating NLP and dialog systems.

Ian Lane from Carnegie Mellon University presented his lab’s research around joint modeling of users and agents in end-to-end training of dialog systems. Ian’s former PhD student, Bing Liu, who is currently with the Facebook Conversational AI team, presented his research on end-to-end training of task-oriented dialog systems using GANs.

Ryan Lowe from McGill University talked about challenges in evaluating the performance of dialog systems, highlighting the issues with current approaches and the necessity of developing new evaluation methods.

Adwait Ratnaparkhi, Director of Conversational AI at Roku, presented Roku’s efforts on providing dialog-based user interfaces in its products. Trung Bui from Adobe Research presented work on multi-modal conversational systems for image editing in Adobe products through conversations as opposed to the conventional means of traversing menus, clicking, etc.

Antoine Raux, the CTO of, presented the company’s approach to building high-coverage, flexible, and context-aware conversational systems through a hierarchy of conversational skills.

Siva Reddy from the Stanford NLP group presented CoQA, a new dataset for conversational question answering and the group’s work on building conversational question answering systems.

Abhinav Rastogi from Google AI presented approaches to address scalability and efficiency issues in language understanding and dialog state tracking as well as new approaches for response generation.

Ryan Lowe from McGill University discusses challenges in evaluating the performance of dialog systems.

Automatic speech recognition

In the automatic speech recognition (ASR) portion of the track, Chiori Hori of Mitsubishi Research Labs presented a brief history of ASR as well as her newer work on multi-modal dialog systems and audio visual scene-aware dialogs.

Patrick Nguyen of Google AI presented the past, present, and potential future of ASR. Patrick gave an overview of classical approaches to ASR as well as modern sequence-to-sequence approaches and their challenges.

Patrick Nguyen from Google AI discusses his team’s work to develop automatic speech recognition technologies.

Natural language processing

Ves Stoyanov from Facebook’s Applied Machine Learning team talked about the challenges of building NLP products for a vastly multilingual platform, such as Facebook, and their approaches to building universal word embeddings and universal sentence embeddings across many different languages.

Deep learning and deep learning infrastructure

Our final track, Deep Learning and Deep Learning Infrastructure, featured eight talks by researchers from across industrial labs and academia. The topics they discussed interweaved to paint a cogent picture of the most exciting ideas in deep learning, from the theoretical side of developing robust and multi-task learning with sensible choices of priors and losses, to the engineering side of building AI data centers and designing flexible software frameworks.

Deep learning infrastructure

The track started off with Greg Diamos, a research lead at Baidu’s Silicon Valley AI Lab (SVAIL), who delivered a comprehensive overview of recent advancements in AI. In his view, deep learning is driven by breakthroughs in algorithms that can harness massive datasets and powerful compute accelerators like GPUs. He closed by making the case that specialized AI datacenters are powerful tools for confronting some of the hardest open problems with deep learning, and are the way to reach beyond human accuracy on those problems.

ImageNet classification is an important challenge that has been used to mark progress in deep learning. The talk by Yaroslav Bulatov of South Park Commons, “How to train ImageNet in 18 minutes,” focused on the practical side of building the compute facilities from the ground up and training with only $40 worth of compute. Yaroslav has been working on open source software to simplify and democratize large-scale deep learning. The key audience for this talk, and for his work at large, is, in his words, “people who don’t work at Google or Uber but are interested in large-scale training.”

Yaroslav Bulatov answers questions with Rosanne Liu, organizer of the Deep Learning and Deep Learning Infrastructure track.


James Bradbury, a research software engineer at Google Brain, walked us through dozens of deep learning frameworks from the past ten years of accelerating changes in his exceptionally insightful and informative summary of “The Past and Future of Deep Learning Software.” Beginning with the pioneering effort of Yann LeCun’s Lush in 1987, frameworks like Theano, Caffe, Torch, TensorFlow, PyTorch, and MXNet all have explored a variety of ways to implement, compile, and run deep learning models, but are all essentially connected by two core design concepts: automatic differentiation and accelerator offloading.

In James’ opinion, a conceptual synthesis is emerging in the field as software packages move towards hybrid execution models that provide the best of two main threads in history: imperative and graph-based approaches. During his talk, James described a number of active open-source projects that leverage techniques from programming language research and compiler engineering to reconcile the sometimes contradictory goals of a flexible and dynamic programming model and efficient compilation to specialized accelerators, including the PyTorch JIT, TensorFlow AutoGraph, Flux.jl, and Swift for TensorFlow.

Deep learning

The first talk about the theoretical side of deep learning was delivered by Zachary Lipton of CMU. His presentation, “Robust Deep Learning with Distribution Shift,” began by noting that machine learning models break under distribution shift. According to Zachary, while we might hope that, when faced with unexpected inputs, well-designed software systems would fire off warnings, current machine learning models tend to fail in such scenarios.

Zachary’s work highlights several approaches for tackling this distribution shift. In one case, motivated by medical diagnosis in which diseases (targets) cause symptoms (observations), the focus is on label shift, where the label marginal p(y) changes but the conditional p(x|y) does not. In another case, he examined shift detection more broadly, focusing on cases including structured output and noisy inputs.
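
The label shift assumption above has a useful consequence that follows from Bayes' rule: if only p(y) changes to a new marginal q(y) while p(x|y) stays fixed, a classifier's predicted probabilities can be corrected by reweighting each class by q(y)/p(y) and renormalizing. The sketch below illustrates that textbook correction, not the specific estimators from the talk. It assumes the new label marginal is already known (estimating it from unlabeled data is the harder part), and the function name and numbers are hypothetical.

```python
def adjust_posterior(source_posterior, source_prior, target_prior):
    """Correct p(y|x) for label shift: q(y|x) is proportional to
    p(y|x) * q(y) / p(y) when p(x|y) is unchanged."""
    weighted = [
        p_y_given_x * (q_y / p_y)
        for p_y_given_x, p_y, q_y in zip(source_posterior, source_prior, target_prior)
    ]
    total = sum(weighted)
    return [w / total for w in weighted]


# Hypothetical diagnosis example with classes [healthy, disease].
source_prior = [0.9, 0.1]      # disease was rare in the training data
target_prior = [0.5, 0.5]      # disease is common during an outbreak
model_output = [0.8, 0.2]      # classifier's p(y|x) for one patient
adjusted = adjust_posterior(model_output, source_prior, target_prior)
```

With these numbers, the same symptoms that suggested a 20% chance of disease under the training distribution imply roughly a 69% chance once the disease has become more prevalent.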

Zachary Lipton from Carnegie Mellon University explains why machine learning models break under distribution shift.


The second theoretical deep learning talk, by Bryan McCann, a research scientist at Salesforce Research, touched on deep learning and NLP tasks. While deep learning has improved performance on many NLP tasks, each task is often tackled individually. In his work, he introduced the Natural Language Decathlon (decaNLP), a challenge that spans ten tasks: question answering, machine translation, summarization, natural language inference, sentiment analysis, semantic role labeling, zero-shot relation extraction, goal-oriented dialogue, semantic parsing, and commonsense pronoun resolution.

He also presented a new Multitask Question Answering Network (MQAN) that jointly learns all tasks in decaNLP without any task-specific modules or parameters in the multitask setting. MQAN shows improvements in transfer learning for machine translation and named entity recognition, domain adaptation for sentiment analysis and natural language inference, and zero-shot capabilities for text classification.

Bryan McCann from Salesforce Research introduces the new ten-task natural language challenge.


Next, Andrew Rabinovich, head of machine learning at Magic Leap, discussed multitask learning as it applies to the computer vision problem domain. In his talk, “Multi Task Learning for Computer Vision,” Andrew argued that deep multitask networks, in which one neural network produces multiple predictive outputs, are more scalable and often better regularized than their single-task counterparts, but are also difficult to train without finding the right balance between tasks. Andrew went on to present novel gradient-based methods that automatically balance the multitask loss function by directly tuning the gradients to equalize task training rates.
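
To make the idea of balancing a multitask loss through its gradients more concrete, here is a heavily simplified sketch in the spirit of gradient-normalization approaches. It is not the method from the talk: the function name `balance_step`, the update rule, and the `alpha` and `lr` parameters are illustrative assumptions. Each task weight is nudged so that its weighted gradient norm moves toward a shared target, scaled up for tasks whose loss has fallen the least.

```python
def balance_step(weights, grad_norms, losses, initial_losses, alpha=1.0, lr=0.01):
    """One simplified update of per-task loss weights.

    weights:        current loss weight per task
    grad_norms:     norm of each task's unweighted loss gradient on shared layers
    losses:         current loss per task
    initial_losses: loss per task at the start of training
    """
    n = len(weights)
    weighted_norms = [w * g for w, g in zip(weights, grad_norms)]
    mean_norm = sum(weighted_norms) / n
    # Loss ratio: a larger value means the task has trained more slowly so far.
    ratios = [cur / init for cur, init in zip(losses, initial_losses)]
    mean_ratio = sum(ratios) / n
    targets = [mean_norm * (r / mean_ratio) ** alpha for r in ratios]
    new_weights = []
    for w, g, wn, t in zip(weights, grad_norms, weighted_norms, targets):
        # Subgradient of |w * g - target| with respect to w, target held fixed.
        sign = (wn > t) - (wn < t)
        new_weights.append(max(w - lr * sign * g, 1e-8))
    # Renormalize so the weights keep summing to the number of tasks.
    scale = n / sum(new_weights)
    return [w * scale for w in new_weights]


# Hypothetical two-task example: task 0 dominates the shared gradient.
new_weights = balance_step(
    weights=[1.0, 1.0],
    grad_norms=[10.0, 1.0],
    losses=[0.5, 0.5],
    initial_losses=[1.0, 1.0],
)
```

In this example both tasks have made equal progress, so the step lowers the weight of the task with the larger gradient and raises the other, moving their contributions to the shared layers closer together.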

Generative Adversarial Networks (GANs) have excited numerous researchers with their promise for unsupervised learning and generative modeling. However, despite generating highly valued art, GANs have yet to deliver real breakthroughs in those paradigms, said Justin Johnson, a FAIR researcher and a soon-to-be faculty member at University of Michigan-Ann Arbor. In response, Justin proposed considering GANs as a new type of tool in the so-called deep learning toolbox for supervised learning problems with multimodal outputs.

In his talk “GANs as Perceptual Losses,” he walked through three of his recent projects, drawn from related yet distinct areas of work, showing that GANs can really shine when applied to supervised vision tasks. According to Justin, while loss functions like Euclidean distance can’t capture what’s perceptually important in images, one trick is to employ GANs to capture perceptually relevant features. In each of his examples, incorporating GANs as part of the loss function led to improvements: in image synthesis from scene graphs, pedestrian trajectory prediction, and information hiding with robust watermarking.

Closing out the track, Google Brain research scientist Sam Schoenholz talked about “Priors for Deep Infinite Networks,” where he posed a thought-provoking research question about two conflated concepts in the current practice of ML. On the one hand, models have a theoretically optimal performance on a given dataset or task. In principle, we would like to perform model selection to maximize this peak performance. On the other hand, we have the volume of hyperparameters over which the model will achieve reasonable performance. This leads to a scenario where it becomes impossible to tell whether an architectural tweak actually improves model performance or whether it just makes the model more trainable, and a large amount of computation goes towards hyperparameter tuning.

In his talk, Sam described an ongoing effort to make hyperparameter selection more systematic, showing that, firstly, for a variety of architectures this prior may be precisely quantified, and secondly, understanding the properties of this prior leads to more interesting discoveries, like obtaining theoretically motivated initialization schemes that outperform those used in practice.

Sam Schoenholz from Google Brain encourages machine learning practitioners to share their hyperparameter tuning experiences including, most importantly, their failures.


Final remarks

The first Uber Science Symposium was a huge success and we are grateful to the many fantastic speakers and organizers who made it so. We are excited to put together future editions and to continue to bring leading external researchers together with our science community at Uber. Although AI was the general theme of the first Uber Science Symposium, in upcoming symposia we will cover other areas relevant to Uber’s science community.

If you’re interested in tackling AI and data science challenges at scale, consider applying for a role on one of our science-focused teams.