Sorting through the nines to compare reliability or availability in designs

When comparing architectures for availability and reliability, it is often difficult to comprehend the small differences in the estimates.

It is tempting to compare designs in terms of downtime, but doing that is a slippery slope: people like to think of minutes per year as an actual outcome, treating it as more real than any estimate can be. Remember there is no such thing as an average outcome. No system experiences exactly the average downtime in a given year, and no prediction is as accurate as people would like it to be. But that doesn't stop decision makers from thinking of the estimate as a tangible number.
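To make the point concrete, here is a small Monte Carlo sketch. It assumes, purely for illustration, a system with 0.9999 availability, failures arriving as a Poisson process, and a fixed 4-hour repair for every failure; none of these numbers come from a real design, and `simulate_yearly_downtime` is a hypothetical helper. The long-run mean lands near the 52.6 minutes per year that the availability implies, yet no simulated year has anything like 52.6 minutes of downtime: most years have none, and the rest have 4 hours or more.

```python
import random

HOURS_PER_YEAR = 8760  # assumes a 365-day year


def simulate_yearly_downtime(availability, mttr_hours, years, seed=1):
    """Return simulated downtime in minutes for each of `years` independent years.

    Illustrative assumptions: failures arrive as a Poisson process whose MTBF is
    implied by the availability, every repair takes exactly mttr_hours, and
    repair time does not delay the next failure.
    """
    rng = random.Random(seed)
    mtbf = mttr_hours * availability / (1.0 - availability)
    results = []
    for _ in range(years):
        failures = 0
        t = rng.expovariate(1.0 / mtbf)  # time of the first failure, in hours
        while t < HOURS_PER_YEAR:
            failures += 1
            t += rng.expovariate(1.0 / mtbf)  # time to the next failure
        results.append(failures * mttr_hours * 60.0)
    return results


downtime = simulate_yearly_downtime(0.9999, mttr_hours=4.0, years=10_000)
expected = (1 - 0.9999) * HOURS_PER_YEAR * 60  # the "average" minutes per year
zero_years = sum(1 for d in downtime if d == 0) / len(downtime)

print(f"expected downtime:      {expected:.1f} min/yr")
print(f"simulated mean:         {sum(downtime) / len(downtime):.1f} min/yr")
print(f"years with no downtime: {zero_years:.0%}")
print(f"distinct outcomes seen: {sorted(set(downtime))[:4]} ...")
```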

A couple of tricks I like to use are Data Bars in Excel and a log conversion that compares the “number of nines.”

Under Conditional Formatting in Excel, the Data Bars capability is a very simple way of presenting a grid of numbers, such as when plotting the effect of two different variables in a system design, or combinations of connections in a network design. Figure 1 is an example of the availability of node-to-node connections in a network, taken in pairs. Each row is a particular node pair, and so is each column, so the diagonal is the availability of a single node-to-node connection, and each off-diagonal cell is the availability of the pair of node-to-node connections. Notice that clusters and patterns are easier to see, such as the higher availability along the diagonal, as expected, or the relatively high-availability cluster at the bottom right for nodes 3, 4, and 5.
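Excel draws the bars itself, but the idea is easy to sketch outside Excel. The Python below is a rough text approximation, not Excel's implementation. It builds a hypothetical 5×5 grid in which the diagonal holds made-up single-connection availabilities and each off-diagonal cell is their product (assuming the pair counts as available only when both connections are up and the connections fail independently), then scales each bar linearly between the grid's minimum and maximum. `CONNECTIONS`, `build_grid`, and `data_bars` are illustrative names.

```python
# Hypothetical availabilities of five node-to-node connections (illustrative only).
CONNECTIONS = [0.9990, 0.9995, 0.99990, 0.99995, 0.99992]


def build_grid(conn):
    """Diagonal: one connection; off-diagonal: both connections up (assumes independence)."""
    n = len(conn)
    return [[conn[i] if i == j else conn[i] * conn[j] for j in range(n)]
            for i in range(n)]


def data_bars(grid, width=10):
    """Render a numeric grid as text bars scaled linearly from the grid's min to its max."""
    values = [v for row in grid for v in row]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid dividing by zero when every value is equal
    lines = []
    for row in grid:
        cells = []
        for v in row:
            filled = round(width * (v - lo) / span)
            cells.append("#" * filled + "." * (width - filled))
        lines.append(" ".join(cells))
    return "\n".join(lines)


grid = build_grid(CONNECTIONS)
print(data_bars(grid))
```

Scaling from the grid's minimum rather than from zero is a deliberate choice in this sketch: with every value close to 1, bars anchored at zero would all come out the same length.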

Figure 1. Node-to-node availability grid for an arbitrary network.

When decision makers are comfortable thinking in terms of the number of nines, a log scale can be helpful, because it makes the smaller differences easier to see alongside the larger ones. But that can lead to unfair comparisons on a graph, where a small difference is made to look as large as a much larger one. So I like to use this type of comparison against a standard or target, say “five nines” (0.99999 availability). Table I below gives a few of the common conversions. Note that “four and a half nines” is not 0.99995, but rather about 0.9999684. Precisely, the “number of nines” in the table is calculated as -log10(unavailability), where unavailability = 1 - availability.
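Written out, with A the availability and n the number of nines, the conversion and its inverse are as follows; the two worked values show why 0.99995 falls short of four and a half nines.

```latex
\begin{aligned}
n &= -\log_{10}(1 - A), \qquad A = 1 - 10^{-n} \\
n = 4.5 &\;\Rightarrow\; A = 1 - 10^{-4.5} \approx 1 - 3.162\times10^{-5} \approx 0.9999684 \\
A = 0.99995 &\;\Rightarrow\; n = -\log_{10}(5\times10^{-5}) \approx 4.301
\end{aligned}
```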

Table I

Conversions of availability, downtime, and fair weighting of nines.
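Conversions like those in Table I can be recomputed directly from the formula. The Python below is a sketch that assumes a 365-day year (525,600 minutes); the rows it prints are computed values chosen for illustration, not a copy of Table I, and the function names are hypothetical.

```python
import math

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; assumes a 365-day year


def nines(availability):
    """Number of nines: -log10 of the unavailability."""
    return -math.log10(1.0 - availability)


def availability_from_nines(n):
    """Inverse conversion: A = 1 - 10**(-n)."""
    return 1.0 - 10.0 ** (-n)


def downtime_minutes_per_year(availability):
    """Expected (average) downtime per year, in minutes."""
    return (1.0 - availability) * MINUTES_PER_YEAR


print(f"{'nines':>6} {'availability':>12} {'downtime (min/yr)':>18}")
for n in (2, 3, 3.5, 4, 4.5, 5, 5.5, 6):
    a = availability_from_nines(n)
    print(f"{n:>6} {a:>12.7f} {downtime_minutes_per_year(a):>18.2f}")
```

Switching to a 365.25-day year shifts the downtime column slightly; whichever convention is used, it is worth keeping it the same for every design being compared.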

These methods have different uses, and neither suits every situation. The first method, Data Bars, is broadly useful for comparing large amounts of data visually. Because bar length scales linearly with the value, the weighting it provides will be fair as long as the numbers are, so the comparison should be both useful and fair. Edward Tufte might approve (http://www.edwardtufte.com/tufte/). The second method, log conversion to “the number of nines,” can be misleading when comparing a large number of results, because it is nonlinear. So I reserve it for comparisons against the targets and goals for a design.
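A quick calculation shows the nonlinearity. In the Python sketch below (the availabilities and the five-nines target are illustrative choices, and `gap_to_target` is a hypothetical helper), moving from 0.999 to 0.9999 and from 0.9999 to 0.99999 each gains exactly one nine, yet the first step removes ten times as many minutes of expected downtime per year as the second. Expressing each design as a gap in nines relative to one fixed target avoids lining up many such steps side by side.

```python
import math

MINUTES_PER_YEAR = 365 * 24 * 60  # assumes a 365-day year


def nines(availability):
    """Number of nines: -log10 of the unavailability."""
    return -math.log10(1.0 - availability)


def gap_to_target(availability, target_nines=5.0):
    """Hypothetical helper: positive beats the target, negative falls short."""
    return nines(availability) - target_nines


# Equal steps in nines, very unequal steps in downtime.
for a, b in [(0.999, 0.9999), (0.9999, 0.99999)]:
    gained = nines(b) - nines(a)
    saved = (b - a) * MINUTES_PER_YEAR
    print(f"{a} -> {b}: +{gained:.2f} nines, {saved:.1f} fewer min/yr of downtime")

# Comparing illustrative designs against a single five-nines target.
for design in (0.99990, 0.99995, 0.999995):
    print(f"A = {design}: {gap_to_target(design):+.2f} nines vs. five nines")
```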