Uber AI, Engineering

Estimating Q(s,s′) with Deep Deterministic Dynamics Gradients

June 8, 2020 / Global

Abstract

In this paper, we introduce a novel form of value function, Q(s,s′), that expresses the utility of transitioning from a state s to a neighboring state s′ and then acting optimally thereafter. In order to derive an optimal policy, we develop a forward dynamics model that learns to make next-state predictions that maximize this value. This formulation decouples actions from values while still learning off-policy. We highlight the benefits of this approach in terms of value function transfer, learning within redundant action spaces, and learning off-policy from state observations generated by sub-optimal or completely random policies. Code and videos are available at this http URL.
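To make the idea of a value defined over state transitions concrete, here is a minimal tabular sketch. It is not the paper's method, which learns Q(s,s′) and a forward dynamics model with deep networks. It assumes a hypothetical five-state chain with deterministic transitions, a reward of 1 for reaching the last state, and a discount of 0.9; the names `neighbors`, `reward`, `learn_q`, and `best_next_state` are invented for illustration. The table is updated toward r(s,s′) + γ max over s″ of Q(s′,s″), using only (s, s′) pairs sampled from a random behaviour policy, so no action labels are ever stored.

```python
import random

N_STATES = 5            # hypothetical toy task: states 0..4 on a line
GOAL = N_STATES - 1     # reaching state 4 ends the episode
GAMMA = 0.9             # assumed discount factor


def neighbors(s):
    """States reachable from s in one step (move left, stay, or move right)."""
    return sorted({max(s - 1, 0), s, min(s + 1, GOAL)})


def reward(s, s_next):
    """Reward depends only on the transition, not on the action behind it."""
    return 1.0 if s_next == GOAL else 0.0


def learn_q(num_samples=5000, lr=0.5, seed=0):
    """Learn a table Q[(s, s_next)] from randomly generated transitions."""
    rng = random.Random(seed)
    q = {(s, n): 0.0 for s in range(N_STATES) for n in neighbors(s)}
    for _ in range(num_samples):
        s = rng.randrange(GOAL)              # random non-goal state
        s_next = rng.choice(neighbors(s))    # random behaviour, no action recorded
        target = reward(s, s_next)
        if s_next != GOAL:
            # Bootstrap from the best transition available out of s_next.
            target += GAMMA * max(q[(s_next, n)] for n in neighbors(s_next))
        q[(s, s_next)] += lr * (target - q[(s, s_next)])
    return q


def best_next_state(q, s):
    """Table lookup standing in for a model that proposes the highest-value s'."""
    return max(neighbors(s), key=lambda n: q[(s, n)])


if __name__ == "__main__":
    q = learn_q()
    for s in range(GOAL):
        print(s, "->", best_next_state(q, s))
```

In this sketch `best_next_state` plays the role the abstract assigns to the forward dynamics model: it proposes the neighboring state that maximizes the learned value. Turning that proposed state into an executable action would need a separate mapping from (s, s′) to actions, which the toy example omits.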

Authors

Ashley D. Edwards, Himanshu Sahni, Rosanne Liu, Jane Hung, Ankit Jain, Rui Wang, Adrien Ecoffet, Thomas Miconi, Charles Isbell, Jason Yosinski

Publication

37th International Conference on Machine Learning (ICML), 2020

Full Paper

Estimating Q(s,s′) with Deep Deterministic Dynamics Gradients (PDF)
