
Freight Pricing with a Controlled Markov Decision Process

April 27, 2021 / Global


Uber Freight was launched in 2017 to revolutionize the business of matching shippers and carriers in the huge and inefficient freight trucking industry (around $800B annual spend in the US). We believe, and have demonstrated, that a technology-first freight broker and marketplace can provide better opportunities to carriers, and superior outcomes to shippers and communities alike. 

One of the wasteful processes we set out to eliminate through technology is the lengthy haggling between traditional freight brokers and carriers over the price of a load (a shipment, in freight lingo). This practice stems from a lack of transparency on freight prices and on the carrier's willingness-to-get-paid. Inspired by the role pricing innovation played in Uber’s massive growth, we decided to be the first freight broker to offer transparent, dynamic carrier pricing that “clears the market” through advanced algorithms, rather than old-school haggling that wastes hours and drains liquidity from the market.

In this post, we describe our framework for generating the optimal sequence of upfront prices for tens of thousands of loads per day.


Business problem

Uber Freight accepts new loads to be hauled from shippers at all times, most often without human intervention. Each load has a pickup location and time, a drop-off location and time, and additional requirements (e.g., weight) that we will ignore in this write-up.

Once Uber Freight commits to haul these loads, we have until the pickup time to find a carrier to move the load. This window between the current time and the load’s pickup time, referred to as lead time, can be as short as 3 hours and averages 4 to 5 days.

We use this time window to digitally match carriers to loads through our carrier app, web portal, and API, using search, recommendations, and notifications, with the goal of covering the load — having a carrier book it for the upfront price. The pricing decision, guided by the algorithm, plays a key role in regulating the load booking rate.

If we fail to automatically cover the load, we have two potential fallbacks. We can use proactive human outreach (our operations team) to cover the load manually, or we can “roll” the load — reschedule it to a later pickup date. Both have additional operational costs, and the latter also has a lifetime value cost with the shipper.

Data science framework

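Essentially, our upfront pricing algorithm needs to find, for each load $i$, the sequence of prices $p_{i,t}$ over time $t$ that minimizes the sum of the expected cost for all loads:

$$\min_{\{p_{i,t}\}} \; \sum_{i} \mathbb{E}\big[\mathrm{cost}_i\big(\{p_{i,t}\}_t\big)\big] \qquad \text{(Eq. 1)}$$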

In Operations Research terms, we face a problem of optimal pricing:

  1. of perishable goods: we incur operational costs and penalties if a load is not covered by pickup time,
  2. with a replenishing inventory: loads constantly arrive in the system,
  3. with cancellations: carriers and shippers sometimes cancel loads,
  4. under uncertainty: driver and load arrivals are uncertain, and so is the price that will result in a booking,
  5. with goods which are partially substitutable: each carrier is interested in a subset of loads that are similar in time and space,
  6. with a global constraint: the capacity of our manual coverage arm is capped.

If we model our problem as a Markov Decision Process, this problem is very similar to the problem studied in Optimal Dynamic Pricing of Inventories with Stochastic Demand over Finite Horizons (Gallego and van Ryzin, 1994) and the associated literature.

To understand the basic dynamics, it is useful to consider the case of a single load and put contexts 2 to 6 aside. By doing so, we notably ignore that loads compete for the same drivers and for Ops capacity, which we will revisit in a future blog post.


Single load case

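Let $V_\tau(x, s)$ denote the expected cost of covering a load with characteristics $x$ that is still available at lead time $\tau$. Let $q_\tau(p \mid x, s)$ denote the probability that it will be booked in period $\tau$ if we set its price to $p$, given the state of the system $s$, which notably represents market conditions.

Under the optimal pricing policy, $V_\tau$ satisfies the following recursive Bellman optimality equation (Eq. 2), where the expectation is taken over the next state $s'$:

$$V_\tau(x, s) = \min_{p} \Big\{ q_\tau(p \mid x, s)\, p + \big(1 - q_\tau(p \mid x, s)\big)\, \mathbb{E}\big[V_{\tau-1}(x, s')\big] \Big\}$$

with terminal condition $V_0(x, s) = \min\{C_{\text{manual}}, C_{\text{roll}}\}$, where

  • $C_{\text{manual}}$ corresponds to i) the expected carrier cost negotiated manually and ii) the operational cost associated with manual coverage; and
  • $C_{\text{roll}}$ corresponds to i) the expected carrier cost of the following day, together with ii) the operational cost of rescheduling and iii) the cost to the shipper relationship.

We can take the minimum of the two, assuming we choose optimally between these options at $\tau = 0$.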
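The recursion above can be solved by backward induction from the terminal condition. The following is a minimal sketch under simplifying assumptions that are not from the original post: a fixed market state, a discrete price grid, and a hypothetical logistic booking curve (`booking_probability`, with made-up `market_price` and `scale` parameters). It illustrates the mechanics only and is not the production solver.

```python
import math


def booking_probability(price, lead_time, market_price=1000.0, scale=100.0):
    """Hypothetical booking curve: logistic and increasing in price.

    Assumption for illustration only: the curve does not depend on lead_time
    or on market state, unlike the model described in the post.
    """
    return 1.0 / (1.0 + math.exp(-(price - market_price) / scale))


def solve_price_path(horizon, terminal_cost, price_grid, book_prob):
    """Backward induction on the single-load Bellman equation.

    values[tau] is the expected cost of covering the load at lead time tau.
    prices[tau] is the cost-minimizing price at lead time tau (None at tau = 0).
    """
    values = [terminal_cost]  # terminal condition at lead time zero
    prices = [None]
    for tau in range(1, horizon + 1):
        continuation = values[tau - 1]
        # Expected cost of posting price p: pay p if booked, else continue.
        best_cost, best_price = min(
            (book_prob(p, tau) * p + (1.0 - book_prob(p, tau)) * continuation, p)
            for p in price_grid
        )
        values.append(best_cost)
        prices.append(best_price)
    return values, prices


price_grid = [600 + 10 * k for k in range(101)]  # candidate prices 600..1600
values, prices = solve_price_path(
    horizon=10,
    terminal_cost=1500.0,  # assumed min of manual-coverage and roll costs
    price_grid=price_grid,
    book_prob=booking_probability,
)
for tau in range(10, 0, -1):
    print(f"lead time {tau:2d}: price {prices[tau]:7.1f}, expected cost {values[tau]:7.1f}")
```

In this toy setting the expected cost falls as lead time grows, since more pricing attempts remain, and the posted price rises toward the terminal cost as the pickup time approaches.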

Before we jump into the modeling side, we illustrate the outcome of the model. Below are the price trajectories of 2 “identical” loads picking up on Thanksgiving Eve, one going from Miami, FL to Atlanta, GA and the other going the reverse way. Notice the differences in i) the price level and ii) the amplitude of the price variation over time. Both mostly stem from the freight network topology — carriers in Miami have fewer options than those in Atlanta and outbound Florida freight is usually cheaper than identical inbound freight. Optimizing pricing in the context of these patterns is a critical task for our solver.

Pricing for pickup on Thanksgiving eve (2020-11-24) for Atlanta to Miami, and Miami to Atlanta


Going back to Eq. 2, there are two key quantities that one needs to determine in order to find the optimal price path:

  1. Booking probability, which captures the time-varying price elasticity of the load
  2. Terminal value, which captures the unique characteristics of the load

A common challenge in estimating these quantities is that the data is censored and subject to survival bias — we observe loads only until they are booked. Over the past few years, we have developed several best practices to handle these prediction problems.

Booking probability

At its heart, this is a binary classification problem. However, there are some characteristics of the problem which are worth highlighting:

  • Measure price elasticity. Logloss or pROC are natural ways to measure the performance of this type of model. However, the predicted booking probability is used inside a price-based solver, so any loss of price elasticity (the sensitivity of booking probability to price) should be weighed against accuracy improvements, as low elasticities will produce unrealistically aggressive price curves.
  • Enforce price monotonicity. This is obvious but worth remembering: booking probability should increase with price. Fortunately, over time, multiple libraries such as XGBoost have gained the ability to enforce monotonicity constraints, allowing new tradeoffs between complexity and stability.
  • Augment dataset. On top of the upfront price, we also receive bids automatically or manually. These bids provide a useful counterfactual of what could have happened if we had changed the price – and only the price, which makes them valuable for training.
  • Correct for censoring. The distribution of the data recording load prices and booking outcomes does not match what the solver uses: we observe few datapoints at lead time zero, whereas the Bellman recursion always starts from lead time zero. This means that the data may have to be reweighted to reflect the importance of short lead times in the solver.

Distribution of datapoints used during prediction vs observed
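As an illustration of the monotonicity point, the sketch below forces a noisy booking-probability curve to be non-decreasing in price. Note the substitution: the post mentions monotonicity constraints inside tree models (XGBoost exposes these through its `monotone_constraints` parameter), whereas this example uses isotonic regression via the pool-adjacent-violators algorithm, applied after the fact, so that it stays dependency-free. The input numbers are invented for the example.

```python
def pool_adjacent_violators(y, w=None):
    """Return the non-decreasing sequence closest to y in weighted least squares.

    y holds raw booking-probability estimates ordered by increasing price.
    w holds optional positive weights (for example, observation counts).
    """
    if w is None:
        w = [1.0] * len(y)
    blocks = []  # each block is [weighted mean, total weight, number of points]
    for yi, wi in zip(y, w):
        blocks.append([yi, wi, 1])
        # Merge backwards while the last two blocks violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, n1 + n2])
    smoothed = []
    for mean, _, count in blocks:
        smoothed.extend([mean] * count)
    return smoothed


# Hypothetical raw estimates at six increasing price points.
raw = [0.10, 0.30, 0.25, 0.60, 0.55, 0.90]
smoothed = pool_adjacent_violators(raw)
print(smoothed)
```

Each pair of adjacent violations is pooled to its average, so the curve handed to the solver never implies that raising the price lowers the chance of booking.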



Terminal value

One of the largest determinants of our pricing trajectory is the terminal value that the Markov Decision Process will approach as lead time shrinks. This problem is mostly ignored in the literature of optimal dynamic pricing where the terminal value is most often assumed to be known.

In our case, the terminal value is uncertain and difficult to observe:

  • The freight industry, of which Uber Freight is only one participant, is volatile, and rates can move by double-digit percentages within days.
  • Only a small fraction of Uber Freight loads are manually booked or rolled, with most loads covered well ahead of their terminal stage.
  • The observations of those terminal values are subject to survival bias: underpriced loads are more likely to reach the terminal stage (lead time zero).

Over time, we developed several techniques to improve performance.

  • Use non-terminal values. We have found that the terminal value is best estimated through tree-based models that incorporate not only deals done at lead time zero but also deals done at longer lead times, with lead time used as a feature.
  • Correct offline training data for survival bias. Past deals that resulted in very quick booking can bias the training sets. Based on simulation it is possible to calculate the expected bias for a given booking speed and correct the price accordingly.
  • Capture the impact of service levels on shipper LTV. Some shippers have explicit penalties for lack of performance, but most do not. We approximate the “penalty” of rolling a shipper’s load to the next day to make the tradeoff between service and margin explicit.
  • Explore the lowest market price. As in multi-armed bandit problems, we are uncertain of a key value and interested in maximizing the outcome over the cumulative life of a load. Taking inspiration from “Upper Confidence Bound” approaches, we found it useful to discount based on estimated variance and time spent available.
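To make the survival bias concrete, here is a small simulation sketch. Everything in it is an assumption made for illustration: pricing errors are drawn from a normal distribution, and the per-period booking probability is a logistic function of that error. It only shows the direction of the bias (loads that remain unbooked at the terminal stage are disproportionately the underpriced ones) and is not the correction procedure used in production.

```python
import math
import random
from statistics import mean


def simulate_survivors(n_loads=20000, periods=5, seed=0):
    """Compare the average pricing error of all loads vs. loads never booked."""
    rng = random.Random(seed)
    all_errors, survivor_errors = [], []
    for _ in range(n_loads):
        # Posted price minus the (unobserved) fair market price.
        error = rng.gauss(0.0, 100.0)
        # Assumed booking curve: a higher carrier price books faster.
        book_prob = 1.0 / (1.0 + math.exp(-error / 50.0))
        booked = any(rng.random() < book_prob for _ in range(periods))
        all_errors.append(error)
        if not booked:
            survivor_errors.append(error)  # load reached the terminal stage
    return mean(all_errors), mean(survivor_errors)


all_mean, survivor_mean = simulate_survivors()
print(f"mean pricing error, all loads: {all_mean:8.1f}")
print(f"mean pricing error, survivors: {survivor_mean:8.1f}")
```

A model trained only on the survivors would learn from loads that were systematically underpriced, which is why the training data needs a correction of this kind.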


Building a resilient system

The efficacy of the approach shared above hinges on the accuracy of the predictive models and the fit of the Markov Decision Process to the problem at hand. These can be compromised when situations arise that are far from historical behaviors, for example when the market shifted dramatically during COVID-19 related events in 2020.

Hence, early on, we found it useful to add a monitoring algorithm to absorb shocks our model might miss. It effectively creates a form of PID controller which limits the drift of errors in the system, and helps enforce the booking speed targets set by the solvers.
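A controller of this kind can be sketched as a textbook PID loop on the gap between target and observed booking rates. The class name, gains, output limits, and the choice of a fractional price adjustment as the control signal are all assumptions made for this example, and the post does not describe the actual controller design.

```python
class PIDController:
    """Textbook PID loop producing a clamped fractional price adjustment."""

    def __init__(self, kp, ki, kd, output_limits=(-0.2, 0.2)):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.low, self.high = output_limits
        self.integral = 0.0
        self.prev_error = None

    def update(self, target, observed, dt=1.0):
        # Positive error: bookings are slower than targeted, so raise prices.
        error = target - observed
        self.integral += error * dt
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error
        raw = self.kp * error + self.ki * self.integral + self.kd * derivative
        # Clamp so that a single shock cannot move prices without bound.
        return max(self.low, min(self.high, raw))


controller = PIDController(kp=0.5, ki=0.1, kd=0.0)
first = controller.update(target=0.30, observed=0.20)   # booking too slowly
second = controller.update(target=0.30, observed=0.30)  # back on target
print(f"adjustments: {first:+.3f}, {second:+.3f}")
```

The integral term keeps a small positive adjustment in place after the booking rate returns to target, which is how accumulated error is worked off. A production version would likely also need anti-windup handling, which this sketch omits.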

As you can see below, our controller was particularly active in Q1 of 2020, when demand briefly surged before decreasing significantly.

Average controller values for pickup between February and May 2020

Final thoughts

Above we discussed a base version of the problem of optimally pricing freight in a dynamic environment, putting a lot of context aside and focusing on a single load without impact on the rest of the system, especially the supply of drivers or the operational capacity.

We were extremely pleased to see that a Markov Decision Process produced superior pricing outcomes, not only because it is exciting to see theory working in practice, but also because the theoretical foundation allows us to extend the framework. For example, Uber Freight actually has multiple channels beyond booking through upfront price, notably bidding and committed capacity. The framework naturally generalizes to handle channels with different levels of friction and intent.

If you enjoyed thinking through this problem, reach out. We are hiring.