Network availability and performability modeling tricks

I read a lot, and I mean a lot, of network reliability, availability, and performability papers. The ones we publish in the IEEE Transactions on Reliability, I read multiple times. So there are a lot of relationships that go unnoticed, or at least unacknowledged, yet are important to understand nonetheless. Some of these tricks I outline below mean that some common assumptions in these papers are not limiting at all, while others are limiting only in that they impact scale. Some of these tricks are ones I use myself when I encode network models for analysis so that I don’t have to  do a lot of head-spinning algorithm re-development. I’m lazy. I’d rather use something I know works rather than build something new that might work a bit better, but take much more time and effort to get there. If you too are lazy, read on, and save some time and effort overall!

1)      A link is a node is a network. It is common for solutions to network reliability problems (including availability and performability, for example) to assume only nodes or links can fail. That presents no problem, as the two are convertible.  A link that can fail can be easily converted to a node that can fail by simply placing a node on the link, rendering the link no longer the failure item, as in Fig. 1 below.  Likewise, a node that can fail can be converted to a link by reversing that process, as long as the node has only two bidirectional links connecting to it, or two unidirectional links (one in, one out). Handling the more general node case, however, gets more complicated.

Fig. 1. Converting a fail able link to a fail able node.

a)      Because a node that can fail can be modeled as a set of links that fail concurrently (with correlation of unity), simply by failing the connecting links whenever the node fails, allowing correlation is the easiest way to handle it. See Fig. 2 for a good picture of this replacement. Because most analysis approaches (heuristics, Monte Carlo simulations, structure functions) can be easily augmented to handle correlated parts, this method is usually easiest and best. See trick 2) below, as well.

Fig. 2. Replacing a failing node with a set of links that fail concurrently.

b)      If links are unidirectional, then place a new node at all incoming links of the old node, and a new node at all outgoing links of the old node, then add a link between the two new nodes, resulting in a single link that fails just like the old node you are replacing. See Fig. 3.

Fig. 3. Replacing a fail able node with a fail able link, when links are directional.

c)      Now it can get complicated (too complicated for a good figure), but extend case b) for each possible combination of a single inbound link to all other links being outbound, replicating each case, and you have the equivalent sub-network that replaces a node with multiple bi-directional links.  This approach assures a bidirectional link is not used in both directions at the node, but could allow both directions to be utilized in the overall network, so this approach is limited, and may not address all possible analysis approaches.

2)      Bidirectional is equivalent to parallel unidirectional, with correlation. Take a single  bidirectional link, split it into two parallel, unidirectional links flowing in opposite directions, and force the two new links to only fail concurrently (unity correlation). The new configuration is equivalent to the original. But cycles are still possible, so must be handled by the algorithm you use.

3)      Complex correlations can be handled with correlated and uncorrelated parts. Links or nodes that only fail as a group are fully correlated, while most networks are composed of parts that are independent (or assumed to be, at least). Between these two situations of completely correlated to uncorrelated, we have partially correlated links or nodes.  In static networks (where the reliabilities or availabilities or performabilities do not change), it is sufficient to split the part into fully correlated and uncorrelated parts, and assign the probabilities appropriately.  Positive correlation is handled by a fully correlated part in series with each uncorrelated part, whereas negative correlation is handled by a fully correlated part in parallel with each uncorrelated part. The former case is shown in Fig. 4, and the latter case is similar.

Fig. 4. Top: replacing a link with one that fails correlated (blue) with another, and a link that is independent (orange). Bottom: replacing a node with correlated and uncorrelated nodes.

About Rupe

Dr. Jason Rupe wants to make the world more reliable, even though he likes to break things. He received his BS (1989), and MS (1991) degrees in Industrial Engineering from Iowa State University; and his Ph.D. (1995) from Texas A&M University. He worked on research contracts at Iowa State University for CECOM on the Command & Control Communication and Information Network Analysis Tool, and conducted research on large scale systems and network modeling for Reliability, Availability, Maintainability, and Survivability (RAMS) at Texas A&M University. He has taught quality and reliability at these universities, published several papers in respected technical journals, reviewed books, and refereed publications and conference proceedings. He is a Senior Member of IEEE and of IIE. He has served as Associate Editor for IEEE Transactions on Reliability, and currently works as its Managing Editor. He has served as Vice-Chair'n for RAMS, on the program committee for DRCN, and on the committees of several other reliability conferences because free labor is always welcome. He has also served on the advisory board for IIE Solutions magazine, as an officer for IIE Quality and Reliability division, and various local chapter positions for IEEE and IIE. Jason has worked at USWEST Advanced Technologies, and has held various titles at Qwest Communications Intl., Inc, most recently as Director of the Technology Modeling Team, Qwest's Network Modeling and Operations Research group for the CTO. He has always been those companies' reliability lead. Occasionally, he can be found teaching as an Adjunct Professor at Metro State College of Denver. Jason is the Director of Operational Modeling (DOM) at Polar Star Consulting where he helps government and private industry to plan and build highly performing and reliable networks and services. He holds two patents. If you read this far, congratulations for making it to the end!
This entry was posted in Engineering Consulting, IT and Telecommunications, ORMS, RAMS - all the -ilities and tagged , , , , , , , . Bookmark the permalink.

Comments are closed.