What is the cure for Diversity Fever?

Everyone seems to have diversity fever. We want to improve the reliability of our engineered solution, or the availability of a network, so we add redundancy, make sure we have diverse access to the service, and so on.

Diversity is expensive, not only in the real money spent duplicating systems, but also in all the management of resources that goes with it: repair, maintenance, power, space, perhaps monitoring, and so on.

How often do we really need that? Sure, adding diversity is easy. But it isn’t always beneficial, and it isn’t always the best option. What is the goal of this diversity, and are you meeting it?

  • If it is to improve reliability, consider that you just introduced two things that can fail where you used to have one. Double the rate of occurrence of failures, and you may have just decreased the reliability of what you wanted to improve. Or you may not have impacted it at all. And depending on the protection system, you may have created a reliance on the redundant system yet be blind to its failure, resulting in lower reliability. At best, you may have a good solid protection system in place, but have you analyzed the system to know if it is now better than the single system? Did you get enough improvement to make the case for the redundant system?
  • If you are trying to improve availability, is the problem dominated by failure frequency or by repair time? Perhaps your resources are better spent improving repair time, or better yet installing PHM (prognostics and health management) to improve proactive maintenance.
  • Are you trying to improve the reliability of one thing, or many things? If you are going to use the new redundancy in a load sharing situation, that means you might be able to do more than you did before with the new resources. For example, having two batteries for your cell phone means if one goes bad you can use the other one and make another call. But what if you really need two phones? Are you improving the availability of a phone, or trying to accomplish a mission that requires two phones to work? It could be that you have not really built redundancy, but rather built dependency. The added resources are not redundant, but rather critical. Where you once needed one, you now need two, or three. And having anything less than all the units working at one time means you can’t accomplish your goal. In this case, what you thought was redundancy became a system with lower reliability and availability. The two units are not in parallel, but in series instead.
  • Are the units really independent?  If the failure mode of one unit leads to a higher probability of failure in the other unit, then the overall system may be worse than before.  What good are those two batteries for your phone when they are both in your pocket when you fall in the pool?
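
The questions above can be put into rough numbers. The following is a minimal sketch, not a full reliability model: it assumes two identical units with the same mission reliability `r`, and the names `coverage` (the chance a failure is detected and the spare takes over) and `beta` (the fraction of unit failures that take out both units at once) are parameters introduced here for illustration only.

```python
def redundant_pair(r, coverage=1.0):
    """Reliability of a primary unit backed by one spare.

    r        -- probability a single unit works for the mission
    coverage -- assumed probability that a primary failure is detected
                and the spare takes over (1.0 means a perfect switch)
    """
    # Works if the primary works, or if it fails, the failure is
    # covered, and the spare works.
    return r + (1.0 - r) * coverage * r


def both_required(r, n=2):
    """Reliability when all n units must work (units in series)."""
    return r ** n


def pair_with_common_cause(r, beta):
    """Two parallel units under a simplified beta-factor assumption.

    beta -- assumed fraction of a unit's failure probability that comes
            from a shared cause hitting both units (the pool, say)
    """
    q = 1.0 - r                    # failure probability of one unit
    common = beta * q              # shared-cause part, fails both units
    independent = (1.0 - beta) * q # part that fails one unit on its own
    # Works if no shared-cause failure occurs and the two units
    # do not both fail independently.
    return (1.0 - common) * (1.0 - independent ** 2)


if __name__ == "__main__":
    r = 0.90  # illustrative value, not from any real system
    print("single unit:            ", r)
    print("pair, perfect switch:   ", redundant_pair(r))
    print("pair, blind to failure: ", redundant_pair(r, coverage=0.0))
    print("both units required:    ", both_required(r))
    print("pair, all common cause: ", pair_with_common_cause(r, beta=1.0))
```

With `r = 0.90`, the perfectly switched pair reaches 0.99, but a pair that is blind to the primary's failure, or whose failures are entirely common cause, is no better than the single unit at 0.90. When both units are required, reliability drops to 0.81. The sketch leaves out the cost and failure behavior of the switching mechanism itself, which can push a redundant design below the single unit.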

To be fair, redundancy in systems is often a good way to enhance design to improve reliability and availability. It isn’t always complicated, but it can be. It is important to think about what you are trying to achieve by enhancing the reliability and availability of a system or network or process. There are many options beyond redundancy to consider, and sometimes redundancy is a very bad choice.
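
One of those options, shortening repair time, can be compared against redundancy with the standard steady-state availability relation, availability = MTBF / (MTBF + MTTR). The sketch below assumes independent units and perfect failover for the redundant case, and the MTBF and MTTR figures are made up for illustration.

```python
HOURS_PER_YEAR = 8760


def availability(mtbf, mttr):
    """Steady-state availability from mean time between failures
    and mean time to repair (same time units for both)."""
    return mtbf / (mtbf + mttr)


def redundant_availability(a, n=2):
    """Availability of n parallel units, assuming they fail and are
    repaired independently and failover is perfect."""
    return 1.0 - (1.0 - a) ** n


def annual_downtime_hours(a):
    """Expected hours of downtime per year at availability a."""
    return (1.0 - a) * HOURS_PER_YEAR


if __name__ == "__main__":
    mtbf, mttr = 1000.0, 10.0  # illustrative hours, not measured data

    options = {
        "baseline": availability(mtbf, mttr),
        "repair time halved": availability(mtbf, mttr / 2),
        "redundant pair": redundant_availability(availability(mtbf, mttr)),
    }
    for name, a in options.items():
        print(f"{name:20s} A = {a:.6f}  "
              f"downtime = {annual_downtime_hours(a):.2f} h/yr")
```

Under these idealized assumptions the redundant pair comes out ahead, which is exactly why the assumptions deserve scrutiny: if failover is imperfect or the units share a failure cause, the gap narrows, and the cheaper repair-time improvement may be the better buy.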

About Rupe

Dr. Jason Rupe wants to make the world more reliable, even though he likes to break things. He received his BS (1989) and MS (1991) degrees in Industrial Engineering from Iowa State University, and his Ph.D. (1995) from Texas A&M University. He worked on research contracts at Iowa State University for CECOM on the Command & Control Communication and Information Network Analysis Tool, and conducted research on large scale systems and network modeling for Reliability, Availability, Maintainability, and Survivability (RAMS) at Texas A&M University. He has taught quality and reliability at these universities, published several papers in respected technical journals, reviewed books, and refereed publications and conference proceedings.

He is a Senior Member of IEEE and of IIE. He has served as Associate Editor for IEEE Transactions on Reliability, and currently works as its Managing Editor. He has served as Vice-Chair’n for RAMS, on the program committee for DRCN, and on the committees of several other reliability conferences because free labor is always welcome. He has also served on the advisory board for IIE Solutions magazine, as an officer for IIE Quality and Reliability division, and various local chapter positions for IEEE and IIE.

Jason has worked at USWEST Advanced Technologies, and has held various titles at Qwest Communications Intl., Inc, most recently as Director of the Technology Modeling Team, Qwest’s Network Modeling and Operations Research group for the CTO. He has always been those companies’ reliability lead. Occasionally, he can be found teaching as an Adjunct Professor at Metro State College of Denver. Jason is the Director of Operational Modeling (DOM) at Polar Star Consulting where he helps government and private industry to plan and build highly performing and reliable networks and services. He holds two patents. If you read this far, congratulations for making it to the end!
