Here I will be taking a closer look at a branch of mathematics known as queueing theory. I will describe it, explain its basic laws. Explore its potential applications in the world of software engineering. End with a short example of calculating service capacity with all the things explained earlier.

Let’s start with a quick recap of what was described in the previous part, which can be found here.

**Contention and Coherency — Recap**

Before, I explained coherency delay and their impact on our systems performance. I brought up concepts like Amdahl’s Law, which is focused around measuring the impact of contention and Universal Scalability Law (Gunther’s Law) which additionally accounts for coherency delay. I described them, presented their equations and plotted graphs to show the difference in numbers between both Laws.

**How Both Parts Are Related?**

As you may have noticed while reading through the previous part, getting real numbers from either Amdahl’s Law or Universal Scalability Law is quite a task. It may require lots of checks, tests and preparations. Fortunately, the queueing theory seems to help us solve this problem.

It contains laws, equations and relations which can be used to calculate the same metrics. As in the case of Universal Scalability Law and their arguments are way easier to get than ones of USL.

As a side note, I can add that Universal Scalability Law can be derived from one of the models from queueing theory. The machine repairman model — according to Neil Gunther himself.

**What Is Queueing Theory?**

Unsurprisingly, queueing theory is a branch of mathematics, focused around studying and describing queues (or, in more professional terms, lines). The whole theory is all about how lines are created, how they behave and, most important, how and why they malfunction.

It is one of these branches of mathematics which are useful in real life, e.g. it can be used in many branches of industry. From general logistics — to estimate and calculate flow of goods between stores, through telecoms — quality of service assurance or teleco flags require capacity to even our world of software — all the message queues around like Kafka or RabbitMQ.

**What is Kendall Notation — Describing Queuing Systems?**

Kendall’s notation or Kendall notion is a system for describing and classifying queuing systems proposed in 1953 by D.G.Kendall. Originally it was based on three parameters A/S/c but later another three parameters were added and notation ended up as A/S/c/K/N/D. Of course, each one of the parameters has its unique meaning and represents a different aspect of a particular system.

Below you can see the description of all the parameters and their meaning:

- A is the time between arrivals of customers to the queue
- S is the service time distribution
- c is the number of servers
- K is the capacity of the queue
- N is overall maximum number of customers
- D is the queuing discipline

Parameters** A**, **B** and **D** have a finite set of possible values — you can find more information about the most common values of these parameters here. On the other hand, parameters **c**, **K** and **N** have an infinite set of possible values. We can assign values from the full set of whole numbers.

Examples of queuing systems descriptions in Kendall notation:

**M/M/c****M/M/2****G/M/c/5****D/D/3/5/PS**

If any of the last three elements is missing, for example in **M/M/c** queue, then one should assume that **K** and **N** equal to infinity and **D** equal to FIFO.

**Queueing Theory In Mathematical Symbols**

Before we move to describing laws and relationships between various concepts within queueing theory, let’s make a short description of metrics and symbols that are commonly used to represent them.As you can see, most of the metrics are averaged calculations. It works this way to simplify the computation as otherwise it will end up with different output based on temporal traffic, while in queueing theory it is more about the long term averaged results.

Another important thing is that most of the metrics are, in fact, unitless and do not have any explainable unit like meters or grams. They are just pure values so we do not have to deal with verifying units and unit correctness.

What is more, we must remember to use time periods everywhere — for every metric based on time. For example, if our arrival rate is 4000 customers per hour and service rate is 5 per second, we have to normalize the values either to hours or seconds but we cannot compute them before such operation as our results would not be logically correct.

**Queueing Theory Laws**

I will start with the Little’s Law which is one of the most important laws concerning queueing theory. It was presented by John Little in 1954 and for many years taken for granted and used without any formal mathematical proof, which Little finally published in 1961.

The law describes the relation between the long term average number of customers in a stable system, the arrival time and residence time — the number of customers is equal to the arrival rate multiplied by residence time. In symbolic notation:

**L = AR**

It can be applied to any system including systems within systems. Another notable thing related to this law is that “it is not influenced by the arrival process distribution, the service distribution, the service order, or practically anything else.” — quoting from Simchi-Levi, D.; Trick, M. A. (2013) *Introduction to Little’s Law as Viewed on Its 50th Anniversary*.

**Be aware**

This law can only be applied to stable (stationary) systems — systems whose “unconditional joint probability distribution does not change when shifted in time. Consequently, parameters such as mean and variance also do not change over time”.

For our needs we can assume that it means that our system is not affected by any outside factors while it is up and running, e.g. changing resources it uses.

**Utilization Law**

Describes the relationship between, unsurprisingly, utilization, throughput, service time and the number of servers. Utilization is equal to throughput multiplied by service time. In symbolic notation:

**p = λS/M**

There is also an alternative equation of utilization law, which states that utilization is equal to arrival rate divided by service rate.

**p = λ/u**

In the second form of the equation we can notice that when customers arrive faster than they are processed, the utilization will be higher than 100% and the queue will grow infinitely.

The last important equation is not a law per se. However,it describes quite a significant relation,namely, the relationship between residence time, waiting time and service time. The resident time is equal to waiting time plus service time. In mathematical notation:

**R = Wq + S**

There is yet another equation for **R** which we can use for the **M/M/c** queue and which I will be using quite extensively later on. In mathematical notation:

**R = S/(1-(p)^M)**

here **M** is the number of servers taken from system description in Kendall’s notation.

Proving it would require a separate article focused only on this topic so I will not do it in this article. For those of you who are more interested in mathematics here is the link to an article with a formal proof but beware, the math there has the potential to burn one’s brain at once.

**How Queueing Theory Relates To Software?**

Basically, we can treat most of the systems like a queue. When users send requests, the request process and response return to the user. When the system is too busy to process the request right away, the request waits until some arbitrary timeout is reached or it will be processed.

The real problem is to correctly identify the class of the system we are working on. In most cases it will be the variation of **M/M/c** or** Mx/m/c**. Unfortunately, it can result in our calculations not being very in line with real life.

As long as we are taking care of long term average system performance then **M/M/c** is an appropriate description and most of the inconsistencies should be kept in line by averaged results.

With all this theoretical introduction done we can move to the most important part — the example.

**Example**

Assume that there is a service somewhere and that it is doing some arbitrary work. The team responsible for its development and maintenance estimated that average traffic will be around 100 requests per minute, not great, not terrible. Through some simple performance tests they learnt that processing a single request takes around 200 milliseconds.

At the beginning they want to check how long the service will be processing requests in the long term. Then they want to check what will happen when the number of requests increases to 200 per minute. They make the assumption that their service is an **M/M/1** queue.

1. Let’s start with normalizing both values to use the same time period.

λ = 100 RPM = 1.6 RPS => λ2 = 3.2 RPS

S = 200 ms = 0.2 s

M = 1

2. Calculate p (utilization) as it is needed later in the R (residence time) equation.

p = (A * S)/M

p = 1.6 * 0.2 / 1= 0.32

utilization is 32 %

3. Calculating R (latency or residence time)

R = S/(1-p^M) R = 0.2/(1- 0.32)= 0.2/(0.68)=0.29 R = 0.29

The average long term latency in case of 100 requests per minute is 0,29 second or 290 milliseconds.

Now we can check and see what happens when a number of requests will be increased to 3.2 RPS

1. Recalculation utilization

p = 3.2 * 0.2 = 0.64

Utilization is 2 times higher than what was expected before because we doubled the number of requests.

2. Recalculating latency for new value of utilization

R = 0.2/(1–0.64)= 0.2/(0.36)=0.56

R = 0.56

The average long term latency was increased to 0,56 seconds or 560 milliseconds which is slightly better than utilization.

Our maintainers were able to successfully calculate how their system will behave so they decided to check how many additional instances of service will be needed to keep latency for 200 RPM on the same level as for 100 RPM while keeping utilization at 64%.

First, I will have to make a few transformations to our R equation.

Given that R = S/(1-p^M) and we need to find M we need such equation;

M = (ln (1-S/R))/ln(p)M = ln(1–0.2/0.29)/ln(0.64) = ln(0.31)/ln(0.64) = 2.62

M is equal to 2.62 but we have to round it up to whole numbers because we cannot add 2,6 nodes to only 2 or 3. We need at least 2 additional servers to keep up with latency after the number of requests doubled.

One last thing I want to show you is what will happen if the number of RPS is huge 1000 RPS. Like previously, I will start with calculating utilization.

P = 1000 * 0,2 = 200

Utilization is equal to 200 while it should be equal to, at most, 1. It means that the system utilization is 200 times bigger than it should be. The system is down, dead and without any redeeming qualities.

**Summary**

It was quite a brief and simple introduction to queueing theory and its applications in the world of software engineering.

I hope that my description was clear, cohesive or that I showed you something new or/and interesting.

Thank you for your time.

Comments are closed.