Introduction
The goal of this post is to determine how the latency of a service changes when its service time is cut in half. I was watching a great presentation by Martin Thompson in which he mentioned this example, and I wanted to freshen up my queuing theory a bit and do the calculations myself.
Lots of formulas
Let’s first get some formulas out of the way.
The mean latency as a function of the utilization:
\(R\) is latency (residency time)
\(\mu\) is service rate
\(\rho\) is utilization
\[R=\dfrac{1}{\mu(1-\rho)}\]
\(\mu\) is the reciprocal of service time \(S\).
\[\mu=\dfrac{1}{S}\]
And if we fill this in and simplify the formula a bit, we get the following:
\[R=\dfrac{1}{\dfrac{1}{S}(1-\rho)}\] \[R=\dfrac{S}{(1-\rho)}\]
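As a quick sanity check, the simplified formula can be expressed as a small Python function (a sketch; the name `latency` is mine):

```python
def latency(service_time: float, utilization: float) -> float:
    """Mean residence time R = S / (1 - rho)."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return service_time / (1 - utilization)

# A 0.1 s service time at 90% utilization gives a mean latency of 1 second.
print(round(latency(0.1, 0.9), 6))  # 1.0
```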
Another part we need is the calculation of the utilization:
\(\rho\) is the utilization
\(\lambda\) is the arrival rate.
\(S\) is the service time.
\[\rho = S \lambda\]
This formula is known as the Utilization Law or Little’s Microscopic Law.
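The utilization law is equally simple to check in code (a sketch; the function name is mine):

```python
def utilization(service_time: float, arrival_rate: float) -> float:
    """Utilization law: rho = S * lambda."""
    return service_time * arrival_rate

# A 0.1 s service time at 9 requests/second gives 90% utilization.
print(round(utilization(0.1, 9), 6))  # 0.9
```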
And this can be converted into a function that calculates the arrival rate as a function of utilization and service time:
\[\lambda = \dfrac{\rho}{S}\]
Comparing the latencies
In Martin’s example he has 2 services:
an unoptimized service with \(S=0.1\) seconds.
an optimized service with \(S=0.05\) seconds.
To make it a bit more tangible let’s plot latency curves as function of utilization for both services.
green: unoptimized service
blue: optimized service
The x-axis is the utilization and the y-axis is the latency.
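The curves are easy to generate; this sketch computes the latency at a few utilization points for both services (plotting with e.g. matplotlib is left out):

```python
def latency(service_time: float, utilization: float) -> float:
    # R = S / (1 - rho)
    return service_time / (1 - utilization)

for rho in [0.5, 0.8, 0.9, 0.95, 0.99]:
    unoptimized = latency(0.10, rho)  # S = 0.10 s
    optimized = latency(0.05, rho)    # S = 0.05 s
    print(f"rho={rho:.2f}  unoptimized={unoptimized:.3f}s  optimized={optimized:.3f}s")
```

Note that at any fixed utilization the optimized curve is exactly half the unoptimized one; the bigger win comes from the drop in utilization, as the calculations below show.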
The first thing we can calculate is the latency for a service time of 0.10 seconds at a utilization of 0.90, because that is the utilization Martin picked in his example:
\(R=\dfrac{S}{(1-\rho)}=\dfrac{0.10}{(1.0-0.90)}=1.0\) second.
The next thing we need to determine is the arrival rate, based on the utilization and the service time. With a service time of 0.10 and a utilization of 0.90 we get the following arrival rate:
\(\lambda = \dfrac{\rho}{S} = \dfrac{0.9}{0.1} = 9\) requests/second.
So if the service time is cut in half and we keep the same arrival rate, the utilization is also cut in half:
\[\rho = S \lambda = 0.05 \times 9 = 0.45\]
And if we fill this into the latency formula for the optimized service, we get:
\(R=\dfrac{S}{(1-\rho)} = \dfrac{0.05}{1-0.45}=0.091\) seconds.
We can see both the calculated latencies in the plot below:
So we went from a latency of 1 second to 0.091 seconds, which is a \(\dfrac{1}{0.091} \approx 11\) times improvement.
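The whole comparison can be verified numerically with the formulas above (a sketch; the variable and function names are mine):

```python
def latency(service_time: float, utilization: float) -> float:
    # R = S / (1 - rho)
    return service_time / (1 - utilization)

S_old, S_new, rho_old = 0.10, 0.05, 0.90

arrival = rho_old / S_old        # lambda = rho / S = 9 requests/second
rho_new = S_new * arrival        # rho = S * lambda = 0.45
R_old = latency(S_old, rho_old)  # ~1.0 second
R_new = latency(S_new, rho_new)  # ~0.091 seconds
print(f"improvement: {R_old / R_new:.1f}x")  # improvement: 11.0x
```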
Discrepancy
Martin’s claim is that the latency improvement is 20x. But the results based on the above formulas don’t agree with that. He makes a remark that the unit of processing is half the calculated latency, but I don’t understand what he means or how this is relevant for the latency calculation.
I haven’t configured comments yet on this new website. So if you want to comment, send me a message on Twitter.