AI, Engineering

Fiber: Distributed Computing for AI Made Simple

30 June 2020 / Global
Figure 1: Fiber starts many different job-backed processes, then runs different Fiber components and user processes inside them. Fiber Master is the main process that manages all the other processes. Some processes, like Ring Node, maintain communication between ring members.
Figure 2: Fiber’s architecture consists of an API layer, back-end layer, and cluster layer, allowing it to run on different cluster management systems.
Figure 3: Each job-backed process in Fiber is a containerized job running on the computer cluster. Each job-backed process also has its own allocation of CPU, GPU, and other computing resources. The code that runs inside the container is self-contained.
Figure 4: Fiber can share queues across different Fiber processes. In this example, one Fiber process is located on the same machine as the queue and the other two processes are located on another machine. One process is writing to the queue and the other two are reading from the queue.
Figure 5: In this example of a pool with three workers, two workers are located on one machine and the third on a different machine. They collectively work on tasks that are submitted to the task queue in the master process and send results to the result queue.
Figure 6: In a Fiber Ring with four nodes, Ring node 0 and Ring node 3 run on the same machine but in two different containers. Ring nodes 1 and 2 both run on a separate machine. All these processes collectively run a copy of the same function and communicate with each other during the run.
Figure 8: In testing framework overhead for Fiber, the Python multiprocessing library, Apache Spark, and ipyparallel, we ran five workers locally and adjusted the batch size so each framework would finish in roughly one second.
Figure 9: Our overhead test showed that Fiber performed similarly to the Python multiprocessing library, while ipyparallel and Apache Spark took longer to process one-millisecond tasks. The optimal finishing time was one second.
Figure 10: Over 50 iterations of evolution strategies (ES), Fiber scales better than ipyparallel when running ES with different numbers of workers. Each worker runs on a single CPU.
Jiale Zhi

Jiale Zhi

Jiale Zhi is a senior software engineer with Uber AI. His area of interest is distributed computing, big data, scientific computation, evolutionary computing, and reinforcement learning. He is also interested in real-world applications of machine learning in traditional software engineering. He is the creator of the Fiber project, a scalable, distributed framework for large scale parallel computation applications. Before Uber AI, he was a Tech Lead in Uber's edge team, which manages Uber's global mobile network traffic and routing.

Rui Wang

Rui Wang

Rui Wang is a senior research scientist with Uber AI. He is passionate about advancing the state of the art of machine learning and AI, and connecting cutting-edge advances to the broader business and products at Uber. His recent work at Uber was published at leading international conferences in machine learning and AI (ICML, IJCAI, GECCO, etc.), won a Best Paper Award at GECCO 2019, and was covered by media outlets such as Science, Wired, VentureBeat, and Quanta Magazine.

Jeff Clune

Jeff Clune

Jeff Clune is the former Loy and Edith Harris Associate Professor in Computer Science at the University of Wyoming, a Senior Research Manager and founding member of Uber AI Labs, and currently a Research Team Leader at OpenAI. Jeff focuses on robotics and training neural networks via deep learning and deep reinforcement learning. He has also researched open questions in evolutionary biology using computational models of evolution, including studying the evolutionary origins of modularity, hierarchy, and evolvability. Prior to becoming a professor, he was a Research Scientist at Cornell University, received a PhD in computer science and an MA in philosophy from Michigan State University, and received a BA in philosophy from the University of Michigan. More about Jeff's research can be found at JeffClune.com.

Kenneth O. Stanley

Kenneth O. Stanley

Before joining Uber AI Labs full time, Ken was an associate professor of computer science at the University of Central Florida (he is currently on leave). He is a leader in neuroevolution (combining neural networks with evolutionary techniques), where he helped invent prominent algorithms such as NEAT, CPPNs, HyperNEAT, and novelty search. His ideas have also reached a broader audience through the recent popular science book, Why Greatness Cannot Be Planned: The Myth of the Objective.

Posted by Jiale Zhi, Rui Wang, Jeff Clune, Kenneth O. Stanley
