Availability

Last modified Apr 20, 2023

What is ‘up’?

First, we determine whether a service is up or not. A service is considered available if:

  • it has at least 1 available replica

Let us consider a cluster with 2 services, both with two desired replicas. If a service has no available replicas, we consider it down; otherwise it is up.

Service     Available replicas  Desired replicas  Up
ServiceOne  2                   2                 yes
ServiceOne  1                   2                 yes
ServiceOne  0                   2                 no
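
The rule above can be sketched in a few lines of Python. This is only an illustration: the `ServiceStatus` type and the `is_up` function are hypothetical names, not part of any real API, and the sketch assumes the replica counts have already been collected.

```python
from dataclasses import dataclass


@dataclass
class ServiceStatus:
    """Hypothetical snapshot of one service's replica counts."""
    name: str
    available_replicas: int
    desired_replicas: int


def is_up(status: ServiceStatus) -> bool:
    # A service is up as soon as at least one replica is available.
    # The desired replica count plays no part in the rule.
    return status.available_replicas >= 1


if __name__ == "__main__":
    for available in (2, 1, 0):
        status = ServiceStatus("ServiceOne", available, desired_replicas=2)
        print(status.name, available, "up" if is_up(status) else "down")
```

Note that a service running 1 of 2 desired replicas still counts as up; under this rule, degraded capacity is not treated as downtime.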

Uptime calculation

If we have two services, A and B, we calculate the current uptime as the share of services that are up at that moment:

A status  B status  Up
up        up        100%
one up, one down    50%
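
As a rough sketch, the current uptime can be computed as the percentage of services that have at least one available replica. The function name `current_uptime` and the dictionary input format are assumptions made for this example.

```python
def current_uptime(available_replicas: dict[str, int]) -> float:
    """Return the percentage of services with at least one available replica.

    `available_replicas` maps a service name to its available replica count
    (an input format assumed for this sketch).
    """
    if not available_replicas:
        raise ValueError("no services to measure")
    up = sum(1 for count in available_replicas.values() if count >= 1)
    return 100.0 * up / len(available_replicas)


if __name__ == "__main__":
    print(current_uptime({"A": 2, "B": 1}))  # 100.0
    print(current_uptime({"A": 2, "B": 0}))  # 50.0
```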

Next, we calculate the average over a time span:

Time  A status  B status  Uptime
1     up        up        100%
2     one up, one down    50%
3     up        up        100%
4     down      down      0%
Total uptime              62.5%

The total uptime for the entire cluster over this period of 4 time units is therefore 62.5%, the average of the four values: (100% + 50% + 100% + 0%) / 4.
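
The averaging step can be sketched as follows. The function name `average_uptime` is hypothetical, and the sketch assumes every time unit carries equal weight, which matches the example above.

```python
def average_uptime(samples: list[float]) -> float:
    """Return the mean of per-time-unit uptime percentages.

    Assumes every sample covers a time unit of equal length.
    """
    if not samples:
        raise ValueError("no samples to average")
    return sum(samples) / len(samples)


if __name__ == "__main__":
    # Uptime at time units 1 to 4, taken from the example above.
    samples = [100.0, 50.0, 100.0, 0.0]
    print(average_uptime(samples))  # 62.5
```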

What we don’t measure

Replicas not used

Let’s say that an ingress is misconfigured, but the replica is still up and running. From the user’s perspective the service is down, but since there are available replicas, we will report it as up.