AI, Engineering

Uber Goes to NeurIPS 2019

December 4, 2019 / Global
Visualization of the sampling paths drawn from a Sequential Monte Carlo (SMC) algorithm. Each node represents a weighted sample, connected to its parent from the previous generation. Nodes with a surviving descendant at the final generation are shown in black (others in gray), coalescing back in time onto a common ancestral lineage. This effect, known as path degeneracy, degrades the performance of an SMC algorithm. A single path, drawn at the final generation, is shown in blue.
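
To make the path degeneracy effect above concrete, here is a minimal sketch (numpy only; the 1-D linear-Gaussian state-space model and all settings are illustrative, not the paper's setup) of a bootstrap particle filter that records ancestor indices, so the surviving lineages can be traced back and their coalescence counted:

```python
import numpy as np

rng = np.random.default_rng(0)
T, N = 25, 100                                    # time steps, particles
x_true = np.zeros(T)
y = np.zeros(T)
for t in range(1, T):                             # simulate toy data
    x_true[t] = 0.9 * x_true[t - 1] + rng.normal(0, 1)
    y[t] = x_true[t] + rng.normal(0, 0.5)

particles = rng.normal(0, 1, size=N)
ancestors = np.zeros((T, N), dtype=int)
for t in range(1, T):
    # propagate, weight by the likelihood, then multinomial resampling
    particles = 0.9 * particles + rng.normal(0, 1, size=N)
    logw = -0.5 * ((y[t] - particles) / 0.5) ** 2
    w = np.exp(logw - logw.max())
    w /= w.sum()
    idx = rng.choice(N, size=N, p=w)              # parent index of each survivor
    ancestors[t] = idx
    particles = particles[idx]

# trace every particle at the final generation back in time; path degeneracy
# shows up as the number of distinct ancestors collapsing toward one
lineage = np.arange(N)
for t in range(T - 1, 0, -1):
    lineage = ancestors[t][lineage]               # step back to generation t-1
    print(f"generation {t - 1:2d}: {np.unique(lineage).size} distinct surviving ancestors")
```

Running it prints how quickly the surviving lineages coalesce onto a handful of common ancestors as we step back in time.
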
The computational graph of the generative model for Bayesian structure and parameter learning contains three types of nodes: sums (S), products (P), and leaves (L).
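
As a rough illustration of that three-node-type graph, here is a minimal sum-product network evaluated in log space; the structure, leaf distributions, and mixture weights are toy assumptions rather than anything learned by the paper's method:

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import norm

class Leaf:                                   # L: univariate Gaussian over one variable
    def __init__(self, var, mean, std):
        self.var, self.mean, self.std = var, mean, std
    def log_prob(self, x):
        return norm.logpdf(x[self.var], self.mean, self.std)

class Product:                                # P: children have disjoint scopes, log-probs add
    def __init__(self, children):
        self.children = children
    def log_prob(self, x):
        return sum(c.log_prob(x) for c in self.children)

class Sum:                                    # S: weighted mixture of children
    def __init__(self, children, weights):
        self.children, self.logw = children, np.log(weights)
    def log_prob(self, x):
        return logsumexp([lw + c.log_prob(x) for lw, c in zip(self.logw, self.children)])

# a tiny two-variable network: a sum over two products of two leaves each
spn = Sum(
    [Product([Leaf(0, -1.0, 0.5), Leaf(1, -1.0, 0.5)]),
     Product([Leaf(0,  1.0, 0.5), Leaf(1,  1.0, 0.5)])],
    weights=[0.3, 0.7],
)
print(spn.log_prob(np.array([0.8, 1.1])))     # log-density of one observation
```
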
With multiple GPUs and conjugate gradients, exact GPs can be trained on significantly larger datasets than is possible with the standard Cholesky decomposition.
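
The key point is that conjugate gradients only need matrix-vector products with the kernel matrix, never a factorization. A minimal single-machine sketch (numpy and scipy; the RBF kernel and hyperparameters are illustrative, and this is not the authors' multi-GPU implementation) looks like this:

```python
import numpy as np
from scipy.sparse.linalg import cg, LinearOperator

rng = np.random.default_rng(0)
n = 2000
X = rng.uniform(-3, 3, size=(n, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=n)

def rbf(A, B, lengthscale=0.7):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale ** 2)

K = rbf(X, X) + 0.1 ** 2 * np.eye(n)              # kernel matrix plus noise term

# CG only ever touches K through matrix-vector products, so K is never factorized
op = LinearOperator((n, n), matvec=lambda v: K @ v)
alpha, info = cg(op, y)
print("CG converged" if info == 0 else f"CG stopped early (info={info})")

# posterior mean at a few test points: k(X*, X) @ alpha
X_test = np.linspace(-3, 3, 5)[:, None]
print(rbf(X_test, X) @ alpha)
```
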
When experimental designs are shrewdly chosen, our uncertainty about the world—as measured by posterior entropy—rapidly decreases as we conduct more experiments. Our variational approach, depicted in blue, reduces uncertainty faster than the baselines in this adaptive experiment inspired by behavioral economics.
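
For contrast with the variational estimator, the baseline nested Monte Carlo estimate of expected information gain (EIG) can be written in a few lines. The logistic response model and sample sizes below are illustrative assumptions, not the behavioral-economics experiment itself:

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def eig_nmc(design, n_outer=5000, n_inner=5000):
    """Nested Monte Carlo estimate of E_{theta,y}[log p(y|theta,d) - log p(y|d)]."""
    theta = rng.normal(0.0, 1.0, size=n_outer)            # draws from the prior
    p = sigmoid(theta * design)
    y = rng.binomial(1, p)                                 # simulated binary outcomes
    log_lik = np.where(y == 1, np.log(p), np.log1p(-p))
    # marginal likelihood p(y | d) estimated from a fresh set of prior draws
    p_marg = sigmoid(rng.normal(0.0, 1.0, size=n_inner) * design).mean()
    log_marg = np.where(y == 1, np.log(p_marg), np.log1p(-p_marg))
    return np.mean(log_lik - log_marg)

for d in [0.1, 0.5, 1.0, 2.0, 5.0]:
    print(f"design d={d:4.1f}  estimated EIG ~ {eig_nmc(d):.3f}")
```

Designs with higher EIG are the "shrewd" choices: they are expected to shrink the posterior entropy the most per experiment.
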
(Left) The true contours of the unknown black-box function. (Middle left) The contours of the surrogate model fitted to the observations (black dots); the current local region is shown as a red square and the global optima as green stars. (Middle right) During the execution of the algorithm, the local region has moved toward the global optimum and shrunk, so the area around the optimum has, in effect, been sampled more densely. (Right) A zoom-in of the local region shows that the surrogate model fits the underlying function almost exactly there, despite having a poor global fit.
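
A minimal sketch of this local, trust-region flavor of Bayesian optimization is below: a scikit-learn GP surrogate, candidates proposed only inside a local box around the incumbent, and simple success/failure counters for resizing the box. The toy objective, thresholds, and acquisition rule are illustrative assumptions, not the paper's algorithm:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def toy_objective(x):                             # hypothetical 2-D black-box function
    return np.sum((x - 0.3) ** 2) + 0.1 * np.sum(np.sin(8 * x))

rng = np.random.default_rng(0)
dim, n_init = 2, 8
X = rng.uniform(0, 1, size=(n_init, dim))         # initial design in [0, 1]^2
y = np.array([toy_objective(x) for x in X])

side, succ, fail = 0.4, 0, 0                      # trust-region edge length and counters
for it in range(30):
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    gp.fit(X, y)
    center = X[np.argmin(y)]                      # center the box on the incumbent
    lo = np.clip(center - side / 2, 0, 1)
    hi = np.clip(center + side / 2, 0, 1)
    cand = rng.uniform(lo, hi, size=(256, dim))   # candidates inside the box only
    mu, sd = gp.predict(cand, return_std=True)
    x_next = cand[np.argmin(mu - sd)]             # lower-confidence-bound pick
    y_next = toy_objective(x_next)
    if y_next < y.min():                          # success: move toward expanding
        succ, fail = succ + 1, 0
    else:                                         # failure: move toward shrinking
        succ, fail = 0, fail + 1
    if succ >= 3:
        side, succ = min(2 * side, 1.0), 0
    if fail >= 5:
        side, fail = side / 2, 0
    X = np.vstack([X, x_next])
    y = np.append(y, y_next)

print("best value found:", y.min(), "at", X[np.argmin(y)])
```
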
The proposed Two-Step Bayesian optimization algorithm places its next sample in an area of large posterior uncertainty and thus obtains useful information for future samples, whereas Expected Improvement is overly exploitative.
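
For reference, the Expected Improvement baseline mentioned here has a simple closed form under a Gaussian posterior. A minimal sketch for minimization, with toy numbers purely for illustration:

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, f_best, xi=0.01):
    """EI(x) = E[max(f_best - f(x) - xi, 0)] under a Gaussian posterior N(mu, sigma^2)."""
    sigma = np.maximum(sigma, 1e-12)              # guard against zero uncertainty
    z = (f_best - mu - xi) / sigma
    return (f_best - mu - xi) * norm.cdf(z) + sigma * norm.pdf(z)

# toy usage: three candidates trading off predicted mean against uncertainty;
# mu and sigma would come from a GP surrogate's posterior in practice
mu = np.array([0.2, 0.0, 0.1])
sigma = np.array([0.05, 0.01, 0.5])
print(expected_improvement(mu, sigma, f_best=0.05))
```

Because EI scores each point one step ahead, it tends to favor exploitation; the two-step criterion above instead values what a sample will teach the model about future decisions.
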
Lottery tickets can be found using many mask criteria (left), some of which yield Supermasks (right): masks that produce better-than-chance performance when overlaid on untrained weights.
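
A minimal sketch of the masking mechanics in PyTorch is below: a mask derived from one copy of a small network is overlaid on the untrained initial weights of another copy. The two-layer MLP, the large-final-magnitude criterion, and the 50% keep ratio are illustrative assumptions, and no training or evaluation is included, so this shows only the overlay step rather than the full Supermask experiment:

```python
import copy
import torch
import torch.nn as nn

torch.manual_seed(0)
net_init = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
net_final = copy.deepcopy(net_init)            # stand-in for the network after training

def magnitude_mask(weight, keep_frac=0.5):
    """Binary mask keeping the largest-magnitude entries of a weight tensor."""
    k = int(weight.numel() * keep_frac)
    threshold = weight.abs().flatten().kthvalue(weight.numel() - k + 1).values
    return (weight.abs() >= threshold).float()

# overlay the mask, derived from the (trained) final weights, on the *untrained*
# initial weights; this masked-but-untrained network is what a Supermask scores
masked = copy.deepcopy(net_init)
for layer_f, layer_m in zip(net_final, masked):
    if isinstance(layer_f, nn.Linear):
        layer_m.weight.data *= magnitude_mask(layer_f.weight.data)

x = torch.randn(4, 20)
print(masked(x))                               # forward pass through the masked init
```
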
Loss Change Allocation (LCA) decomposes the change in loss at each training iteration into per-parameter contributions, showing which parameters helped the network learn at that step (green) and which hurt its learning (red).
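
A simple version of this decomposition is easy to sketch: the loss change of each SGD step is allocated per parameter as gradient times parameter change. The tiny linear-regression model below is an illustrative assumption (the paper integrates the gradient along the update path more carefully), but for a quadratic loss the trapezoid-rule allocation used here sums exactly to the true loss change:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 5))
w_true = rng.normal(size=5)
y = X @ w_true + 0.1 * rng.normal(size=256)

def loss_and_grad(w):
    err = X @ w - y
    return 0.5 * np.mean(err ** 2), X.T @ err / len(y)

w = np.zeros(5)
lr = 0.05
loss_start, _ = loss_and_grad(w)
lca_total = np.zeros(5)                        # per-parameter loss-change allocation

for step in range(50):
    _, grad_before = loss_and_grad(w)
    delta = -lr * grad_before                  # plain SGD step
    w = w + delta
    _, grad_after = loss_and_grad(w)
    # allocate this step's loss change with the midpoint (trapezoid-rule) gradient
    lca_total += 0.5 * (grad_before + grad_after) * delta

loss_end, _ = loss_and_grad(w)
print("actual total loss change:", loss_end - loss_start)
print("sum of allocations      :", lca_total.sum())
print("per-parameter allocation:", lca_total)  # negative entries helped learning
```
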
Graph Recurrent Attention Networks (GRANs) auto-regressively generate one block of nodes and edges at a time.
The Attention Attractor Network meta-learns how to learn novel classes while remembering old classes without the need to review the original training set.
Hamiltonian Neural Networks learn exact conservation laws. As a result, their predictions over time do not decay or explode as happens in vanilla feed-forward neural networks.
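
The core mechanism can be sketched briefly: a network outputs a scalar H(q, p), and the predicted time derivatives are its symplectic gradient, dq/dt = dH/dp and dp/dt = -dH/dq, so the learned H is conserved along predicted trajectories. The ideal-spring training data and hyperparameters below are illustrative assumptions, not the released code:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

class HNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.H = nn.Sequential(nn.Linear(2, 64), nn.Tanh(), nn.Linear(64, 1))

    def time_derivative(self, x):               # x = (q, p), shape (batch, 2)
        x = x.requires_grad_(True)
        H = self.H(x).sum()
        dH = torch.autograd.grad(H, x, create_graph=True)[0]
        dHdq, dHdp = dH[:, :1], dH[:, 1:]
        return torch.cat([dHdp, -dHdq], dim=1)  # (dq/dt, dp/dt)

# training data from an ideal spring: dq/dt = p, dp/dt = -q
x = torch.randn(1024, 2)
dxdt = torch.stack([x[:, 1], -x[:, 0]], dim=1)

model = HNN()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(1000):                        # short illustrative training run
    opt.zero_grad()
    pred = model.time_derivative(x.clone())
    loss = ((pred - dxdt) ** 2).mean()
    loss.backward()
    opt.step()

print("final fit loss:", loss.item())
```

Because predictions are rolled out by integrating this symplectic vector field, the learned energy stays (approximately) constant instead of drifting the way an unconstrained feed-forward predictor's does.
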
Matthias Poloczek

Matthias Poloczek leads the Bayesian optimization team at Uber AI. His team focuses on fundamental research and its applications at the intersection of machine learning and optimization.

Posted by Matthias Poloczek, Molly Spaeth

Category: AI, Engineering