Cinnamon Auto-Tuner: Adaptive Concurrency in the Wild
Figure 1: The relationship between the maximum number of concurrent requests and throughput. Beyond a certain point the service cannot handle more concurrent requests, and throughput drops quickly.
Figure 2: Architecture diagram of Cinnamon, with the scheduler and auto-tuner highlighted.
Figure 3: Prioritized request scheduling.
Figure 4: The lower the limit, the greater the tolerance to latency increases.
Figure 5: The aggregation process from individual request timings to a smoothed value.
Figure 6: The ever-drifting targetLatency issue in effect. When targetLatency is reset, the new targetLatency is captured at a higher limit, which causes both targetLatency and the limit to drift upward. Note that the overload stopped at ~14:10.
Figure 7A: Positive covariance between the number of inflight requests and throughput.
Figure 7B: Negative covariance between the number of inflight requests and throughput.
Figure 8: The effect of using covariance when resetting the inflight limit. Both the inflight limit and the latency samples (i.e., request timings) are now stable.
Figure 9: Overloading one node in production using Ballast.
Figure 10A: Throughput and latency during overload.
Figure 10B: Inflight limit (top); the ratio between the number of inflight requests and the inflight limit (middle); CPU usage (bottom).