Uber AI

Montezuma’s Revenge Solved by Go-Explore, a New Algorithm for Hard-Exploration Problems (Sets Records on Pitfall, Too)

November 26, 2018 / Global
Example of detachment in intrinsic motivation (IM) algorithms. Green areas indicate intrinsic reward, white indicates areas where no intrinsic reward remains, and purple areas indicate where the algorithm is currently exploring.
High-level overview of the Go-Explore algorithm.
Example downsampled cell representation. The full observable state, a color image, is downscaled to an 11 by 8 grayscale image with 8 pixel intensities.
Number of rooms found by Go-Explore during the exploration phase without domain knowledge (via a downscaled pixel representation).
Comparison of Go-Explore without domain knowledge against other RL algorithms on Montezuma’s Revenge. Each point in the plot represents a different algorithm that was tested on Montezuma’s Revenge.
Number of rooms found by Phase 1 of Go-Explore with a cell-representation based on easy-to-provide domain knowledge derived from pixels only.
Comparison of Go-Explore with domain knowledge against other RL algorithms on Montezuma’s Revenge. Red dots indicate algorithms given the solution in the form of a human demonstration of how to solve the game.
Rooms found (left) and rewards obtained (right) in the exploration phase of Go-Explore on Pitfall.
Comparison of deep neural network policies produced by Go-Explore (after robustification) with domain knowledge against other RL algorithms on Pitfall.
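
The downscaled cell representation described in the captions above (a color image reduced to an 11 by 8 grayscale image with 8 pixel intensities) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `downscale_to_cell`, the channel-mean grayscale conversion, the block-averaging resize, and the 210 by 160 input frame size are all assumptions made for the example.

```python
import numpy as np


def downscale_to_cell(frame, width=11, height=8, levels=8):
    """Reduce an RGB frame (H, W, 3, uint8) to a (height, width) grid of coarse intensities."""
    # Grayscale via a plain channel mean (an assumption; other weightings are possible).
    gray = frame.astype(np.float64).mean(axis=2)

    # Bin edges that split rows and columns into roughly equal blocks,
    # so frame sizes that are not exact multiples of the grid still work.
    row_edges = np.linspace(0, gray.shape[0], height + 1).astype(int)
    col_edges = np.linspace(0, gray.shape[1], width + 1).astype(int)

    small = np.empty((height, width))
    for i in range(height):
        for j in range(width):
            block = gray[row_edges[i]:row_edges[i + 1], col_edges[j]:col_edges[j + 1]]
            small[i, j] = block.mean()  # average intensity of this block

    # Quantize the 0..255 range down to `levels` discrete values (0..levels-1).
    return np.minimum((small * levels / 256).astype(np.uint8), levels - 1)


# Hypothetical usage: two nearly identical frames collapse to the same cell key.
frame_a = np.full((210, 160, 3), 100, dtype=np.uint8)
frame_b = np.full((210, 160, 3), 101, dtype=np.uint8)
key_a = downscale_to_cell(frame_a).tobytes()
key_b = downscale_to_cell(frame_b).tobytes()
same_cell = key_a == key_b
```

Because the result is converted to bytes, it is hashable and can serve as a dictionary key, which is one simple way to group visually similar frames into the same cell.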
Adrien Ecoffet

Adrien Ecoffet is a research scientist with Uber AI Labs.

Joel Lehman

Joel Lehman was previously an assistant professor at the IT University of Copenhagen, and researches neural networks, evolutionary algorithms, and reinforcement learning.

Kenneth O. Stanley

Before joining Uber AI Labs full time, Ken was an associate professor of computer science at the University of Central Florida (he is currently on leave). He is a leader in neuroevolution (combining neural networks with evolutionary techniques), where he helped invent prominent algorithms such as NEAT, CPPNs, HyperNEAT, and novelty search. His ideas have also reached a broader audience through the recent popular science book, Why Greatness Cannot Be Planned: The Myth of the Objective.

Jeff Clune

Jeff Clune is the former Loy and Edith Harris Associate Professor in Computer Science at the University of Wyoming, a Senior Research Manager and founding member of Uber AI Labs, and currently a Research Team Leader at OpenAI. Jeff focuses on robotics and training neural networks via deep learning and deep reinforcement learning. He has also researched open questions in evolutionary biology using computational models of evolution, including studying the evolutionary origins of modularity, hierarchy, and evolvability. Prior to becoming a professor, he was a Research Scientist at Cornell University, received a PhD in computer science and an MA in philosophy from Michigan State University, and received a BA in philosophy from the University of Michigan. More about Jeff’s research can be found at JeffClune.com

Posted by Adrien Ecoffet, Joel Lehman, Kenneth O. Stanley, Jeff Clune
