Since launching in March 2009, Uber has been dedicated to fulfilling its mission of making transportation as reliable as running water, everywhere, for everyone. In 2017, we are well on our way.
At the start of 2014, the Uber app was available in 100 cities; by early 2016, this number had grown to more than 400 cities worldwide, offering transportation services beyond ride-sharing. Around that same time, just before New Year’s Eve 2015, we hit the 1 billion ride mark; not longer after in June 2016, we hit 2 billion rides. These numbers only increase as we continue to expand to more cities, lighting up the globe with access to reliable transportation. To grow Uber’s services at scale, however, we need to ensure that our operations meet the proper Internet Protocol (IP).
Uber’s current infrastructure is built on the Internet Protocol version 4 (IPv4) address standard and consists of multiple data centers, located in a few network Point-of-Presences (POPs) and the cloud. However, Uber is rapidly outgrowing IPv4; to best support our expansion, our infrastructure needs to keep pace with user growth by adopting the next IP: Internet Protocol version 6 (IPv6).
In 2016, Uber began rolling out an IPv6 data center architecture by adjusting our existing infrastructure to facilitate this expansion. In this article, we discuss best practices in designing this new network to accommodate Uber Engineering’s growth, as well as the lessons learnt during our infrastructure update to provide IPv6 support.
Making the Jump from IPv4 to IPv6
According to the Internet Society (ISOC), the world ran out of IPv4’s 4.3 billion addresses in February 2011. The total number of IPv4 addresses is over 4 billion, which is far less than the total number of mobile devices in the world. Factor this in with the growth of the Internet-of-Things (IoT) and we quickly find ourselves with a massive dearth of IP addresses.
By 2011, three of the five Regional Internet Registries (RIRs), including the Asia-Pacific Network Information Centre (APNIC), the Réseaux IP Européens (RIPE), and the Latin American and Caribbean Network Information Centre (LACNIC), had exhausted allocation of all their free IPv4 addresses. On September 24, 2015, the American Registry for Internet Numbers (ARIN) announced they, too, had run out of available IPv4 addresses.
First defined in 1996, Internet Protocol version 6 (IPv6) is the most current version of the Internet Protocol (IP) address standard and incorporates features that improve upon many of the challenges of using IPv4, including larger address space, a multicasting base specification, and stateless address autoconfiguration (SLAAC). IPv6 can hold over 340 undecillion addresses, more than enough to handle the world’s current IP needs—and certainly Uber’s.
A map created by APNIC (above) shows IPv6 capability across the globe, ranging from zero in many countries to over 56 percent in Belgium. Measurements regarding the global usage of IPv6 taken by the Internet Society in 2011 reveal that since 2012, major ISP carriers around the world, especially in the United States, have made significant progress on deploying IPv6 in their networks. Among the North American carriers, current IPv6 deployment ranges from 27.93 percent (Cox Communications) to 84.36 percent (Verizon Wireless).
These measurements suggest that IPv6 traffic on the internet is steadily increasing, yet remains much lower than IPv4 traffic. Moreover, as of April 2017, Google reports that the availability of IPv6 connectivity among Google users is 16 percent versus 84 percent IPv4 connectivity; similarly, web information company Alexa revealed that as of March 8, 2017, less than 20 percent of users across its top-ranked 1,000 websites had access to IPv6 connectivity.
Measuring IPv6 Adoption, written for the 2014 Association for Computing Machinery Conference, predicts that “by 2019 the number of IPv6 prefixes allocated will be about .25-.50 of IPv4, while the IPv6 to IPv4 traffic ratio will be somewhere between .03 and 5.0. In other words, IPv6 appears headed to be a significant fraction of traffic.” These findings suggest that at at its current rate of growth, IPv6 adoption will be too slow to accommodate the world’s rapidly augmenting connectivity needs.
IPv6 Deployment at Uber
Currently, Uber operates tens of thousands of servers and hosts over 8 million IPv4 addresses across its network.
Before 2014, Uber’s data center was hosted in a managed hosting facility. In 2014, we built our first North American data center to meet our capacity needs and provide more reliable service to our users; in 2015, Uber expanded their North American data center footprint, building many more on both coasts; and in 2016, Uber built a few network POPs and expanded into the cloud. As our user base grows throughout 2017 and beyond, IPv6 deployment will be mission critical for our expansion.
Three key factors made it clear to us that deploying IPv6 across our networks was going to be critical for maintaining our architecture’s stability at scale:
- Generous IP allocation: The size of our network has grown rapidly over the past few years, supporting thousands of server racks in our data centers. Each rack is allocated a /24 IPv4 subnet out of our Request for Comment (RFC) 1918 IP space, which includes 256 IPv4 addresses per rack. In most of our rack deployments, we host no more than 48 servers.
- Resource limitation: At this stage in our growth, we have used more than 50 percent of our 10.0.0.0/8 IPv4 subnet for internal usage. If we do not transition to IPv6, it is possible that our RFC1918 (the Internet Engineering Task Force (IETF) memorandum on methods of assigning of private IP addresses) space could be exhausted in the foreseeable future.
- Overlapping IP addresses: Traditionally, Uber’s networks defined their own IP addresses for their resources. When Uber began merging with other companies, some IPv4 addresses overlapped between two internal networks of different organizations.
After conducting extensive research, measurements, and other analysis, we realized there were also three major areas of our infrastructure that would need to be updated in order to support IPv6 deployment:
- Network architecture
- Software support
- Vendor support
First, we will discuss the makeup of each of these areas in Uber’s data center network, and then we will address how we are readying them for IPv6.
Uber’s network architecture is composed of three major sections:
Uber’s hardware is uniform and consistent, allowing for a modular and scalable data center design. Each device is typically applied with the same hardware model and is easy to scale up or down based on our needs. Our network devices support 100G/50G/25G ethernet as downlinks to the servers.
With as large a system as Uber’s, automation tooling is a must-have to build, manage, and operate our network. Our network data center builds are automated using zero-touch provisioning tools, and daily network management is aided with our in-house automation tools to manage our network configurations and IP management, along with smart monitoring tools to alert us if any issues arise.
Our data center design follows Clos design defined in the IETF RFC 7938, “Use of BGP for Routing in Large-Scale Data Centers,” which outlines how to use Border Gateway Protocol (BGP) as the dynamic routing protocol. Based on our network architecture, our bandwidth is bisectional with fast convergence and limited failure domain. Vendor interoperability is also achieved via the feature set we used to build our network, depicted below:
On top of the Clos design, we segmented our data center into modular pods and clusters. Each pod consists of the same number of server racks, and each cluster consists of several server pods. The network is thus broken into small, identical units and follows uniform high-performance connectivities between all pods. Uber’s data centers are composed of multiple clusters connected to our edge backbone network, which then connect to the internet.
In order to best support the bandwidth consumers in Uber’s network, including primarily internal east-west technologies like Hadoop, our cluster architecture follows a 1:1 unblocking network model, as demonstrated below for our IPv4 fabric network architecture:
Over time, a fabric plane concept was introduced to our network design to maintain redundancy while still keeping large shared bandwidth at each network layer. On the other hand, the 1:1 unblocking oversubscription rate means that any server host can maintain its uplink bandwidth end-to-end while communicating with another host in this IP fabric network.
To deploy IPv6 on top of this network architecture, we designated IPv6 assignments to each server rack and cluster; while server racks are allocated with a /64 IPv6 subnet, clusters are assigned and aggregated with subnet /n, n<64. These subnets are assigned from a /32 global unique IPv6 address block allocated from a Regional Internet Registry (RIR), which is kept for internal network usage. IPv6 assignments and management is handled using the aforementioned automation processes.
Since our data center network is modular with templated configurations and our Clos design uses only one top-down protocol, External Border Gateway Protocol (eBGP), enabling native IPv6 allocation across our racks is quick and painless compared to designs that use protocols that need to be reconfigured when migrating from IPv4 to IPv6. Each component of our data center cluster, for instance rack subnets, pod subnets, loopbacks, out-of-band subnets, and point-to-point subnets, follow the same IPv6 assignment process. These automated IPv6 address generation tools use similar logic to our IPv4 address allocation scheme.
Finally, our backbone network uses routing protocols such as BGP and IS-IS that can easily be updated to support IPv6 without much operational interruption.
Our network’s ability to support software also needed to be updated for IPv6 deployment, particularly as it related to facilitating IPv6 traffic on our services. Implementing software support for IPv6 was a hugely collaborative effort, spanning multiple teams across Uber, including Engineering Security and Site Reliability Engineering.
Some engineers were trained to write IPv4-only code, so their IPv6 compatibility applications and tools could be IPv4 capable as well. IPv4 and IPv6 addresses differ in both length of address, prefix and how they are represented. Some examples of common application issues we faced when transferring from IPv4-only code to code that supports IPv6 includes:
- Code with prefix lengths set to common IPv4 prefixes, e.g. /24, are no longer suitable for IPv6 prefix lengths.
- Code that is written to split “a.b.c.d” into a tuple of (“a”,”b”,”c”,”d”) will fail to catch IPv6 addresses because they are separated by a colon (“:”) instead of a period (“.”).
- Code that expects to split ‘host:port’, e.g. “a.b.c.d:8080” into (“a.b.c.d”,”80″) will fail when “[2002:a:b:c::ff]:8080” is passed to it. Similarly, code that expects to join (“a.b.c.d”,”80″) into “a.b.c.d:8080” will create invalid IPv6 addresses in the format “2002:a:b:c::ff:8080”.
- Code with regex that used to match an IPv4 address will yield an invalid result when receiving an IPv6 address.
- Databases with hard-coded column sizes that store IPv4 addresses as “varchar(16)” due to “a.b.c.d ” will be unable to store an IPv6 address.
Uber’s infrastructure uses haproxy when routing between regions. Configuration (config) files such as haproxy.cfg yaml that are widely used store their IPv4 addresses along with their hostnames. All service config files will also have to be vetted and scoped to update to include IPv6 addresses while we enable IPv6 across all the hosts.
Instead of hard-coding IPv4 addresses, we now use hostnames in the code to leverage DNS to solve the problem of transition. We are in the process of training our developers how to use IPv6-supported tools, for instance, getaddrinfo(3), to facilitate this transition.
Major hardware vendors of network devices and servers have been actively involved in supporting IPv6 for years, and a lot of IPv6 features have been implemented in them as a result. However, given the short time IPv6 has been in production and its lack of operational experience, there are still a number of bugs being reported while more organizations start to deploy IPv6 in production. Quality assurance (QA) of IPv6 needs to catch up with IPv4’s QA controls.
As more and more organizations deploy their services in the cloud, providers are accelerating their support for IPv6, including Amazon AWS, Google GCP, and Microsoft Azure.
IPv6 Deployment at Uber: 2017 and Beyond
Uber is currently lab testing their IPv6 deployment with guidance from design documentations, including IPv6 addressing and feature RFCs. Before we deploy IPv6 into production at full scale, in-depth load testing needs to be conducted in a staging environment to uncover any possible issues.
Some immediate benefits we expect to receive from full scale IPv6 deployment include but are not limited to:
- Front end: Our front end will be able to serve native IPv6 traffic.
- Organizational merge: We will have a more scalable solution to deal with overlapping IPv4 addresses using IPv6.
- Car offloading: Currently, Uber’s car docking stations for parking autonomous vehicles are resolving to one /24 IPv4 subnet. This current design is for preserving IPv4 resources while keeping configurations consistent for engineers. Moreover, IPv6 can provide a more scalable solution for our car offloading network architecture at docking stations. Of note, our IPv6 solution is only available for car offloading when a user’s iOS or Android software is able to support it.
Call-to-Action: Adopt IPv6
IPv6 deployment in an enterprise takes a lot of planning—it is never a ‘zero or all’ implementation. To support the growth of connectivity and technological innovation more broadly, we encourage developers to write code that supports both IPv4 and IPv6 on the application level. We also hope that you join us in advocating for the deployment of IPv6 in your own networks.
Over time, this growing community of early adopters can gather statistics and metrics that will help us optimize the design and performance of IPv6 before complete IPv4 exhaustion sets in. Together, we can work towards building a better Internet.
If this type of work interests you, consider applying for a role on Uber’s Core Infrastructure team.
Jean He is a network engineer on Uber’s Core Infrastructure team.