Using Triggers On Schemaless, Uber Engineering’s Datastore Using MySQL
19 January 2016 / Global
The details and examples of Schemaless triggers, a key feature of the datastore that’s kept Uber Engineering scaling since October 2014. This is the third installment of a three-part series on Schemaless; the first part is a design overview and the second part is a discussion of architecture.
Schemaless triggers is a scalable, fault-tolerant, and lossless technique for listening to changes to a Schemaless instance. It is the engine behind our trip processing workflow, from the driver partner pressing “End Trip” and billing the rider to data entering our warehouse for analytics. In this last installment of our Schemaless series, we will dive into the features of Schemaless triggers and how we made the system scalable and fault-tolerant.
To recap, the basic entity of data in Schemaless is called a cell. It is immutable, and once written, it cannot be overwritten. (In special cases, we can delete old records.) A cell is referenced by a row key, column name, and ref key. A cell’s content is updated by writing a new version with a higher ref key but same row key and column name. Schemaless does not enforce any schema on the data stored within it; hence the name. From Schemaless’s point of view, it just stores JSON objects.
Schemaless Triggers Example
Let’s see how Schemaless triggers works in practice. The code below shows a simplified version of how we do billing asynchronously (UPPERCASE denotes Schemaless column names). The example is in Python:
We define the trigger by adding a decorator, @trigger, on a function and specifying a column in the Schemaless Instance. This tells the Schemaless triggers framework to call the function—in this case, bill_rider—whenever a cell is written to the given column. Here, the column is BASE, and a new cell written in BASE indicates that a trip has finished. This fires the trigger, and the row key—here, it’s the trip UUID