Everyone seems to have diversity fever. We want to improve the reliability of our engineered solution, or the availability of a network, so we add redundancy, make sure we have diverse access to the service, and so on.
Diversity is expensive, not only in the real money spent duplicating systems, but also in all the resource management that goes with it: repair, maintenance, power, space, perhaps monitoring, and so on.
How often do we really need that? Sure, adding diversity is easy. But it isn’t always beneficial, and it isn’t always the best option. What is the goal of this diversity, and are you meeting it?
- If it is to improve reliability, consider that you just introduced two things that can fail where you used to have one. Double the rate of occurrence of failures, and you may have just decreased the reliability of what you wanted to improve. Or you may not have impacted it at all. And depending on the protection system, you may have created a reliance on the redundant system yet be blind to its failure, resulting in lower reliability. At best, you may have a good solid protection system in place, but have you analyzed the system to know if it is now better than the single system? Did you get enough improvement to make the case for the redundant system?
- If you are trying to improve availability, is the problem dominated by failure frequency or by repair time? Perhaps your resources are better spent improving repair time or, better yet, installing prognostics and health management (PHM) to support proactive maintenance.
- Are you trying to improve the reliability of one thing, or many things? If you are going to use the new redundancy in a load sharing situation, that means you might be able to do more than you did before with the new resources. For example, having two batteries for your cell phone means if one goes bad you can use the other one and make another call. But what if you really need two phones? Are you improving the availability of a phone, or trying to accomplish a mission that requires two phones to work? It could be that you have not really built redundancy, but rather built dependency. The added resources are not redundant, but rather critical. Where you once needed one, you now need two, or three. And having anything less than all the units working at one time means you can’t accomplish your goal. In this case, what you thought was redundancy became a system with lower reliability and availability. The two units are not in parallel, but in series instead.
- Are the units really independent? If the failure mode of one unit leads to a higher probability of failure in the other unit, then the overall system may be worse than before. What good are those two batteries for your phone when they are both in your pocket when you fall in the pool?
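To put rough numbers on the questions above, here is a minimal Python sketch. It is only an illustration under stated assumptions: the function names, the 0.9 unit reliability, the MTBF and MTTR figures, and the simplified beta-factor style treatment of common-cause failure are all hypothetical choices, not values or methods from any particular system. It also assumes perfect failure detection and switchover, which a real protection system rarely achieves.

```python
def parallel(r: float, n: int = 2) -> float:
    """Reliability of n independent units when any one working unit is enough."""
    return 1.0 - (1.0 - r) ** n


def series(r: float, n: int = 2) -> float:
    """Reliability of n independent units when every unit must work."""
    return r ** n


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability from mean time between failures and mean time to repair."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def parallel_common_cause(r: float, beta: float) -> float:
    """Two-unit parallel reliability with a simplified common-cause split.

    Assumption: a fraction `beta` of each unit's failure probability comes
    from a shared cause that takes out both units at once (the pool), and
    the remainder fails independently.
    """
    q = 1.0 - r
    q_common = beta * q
    q_independent = (1.0 - beta) * q
    # The pair survives only if the shared cause does not occur and the
    # two independent failures do not both occur.
    return (1.0 - q_common) * (1.0 - q_independent ** 2)


if __name__ == "__main__":
    r = 0.9  # hypothetical single-unit reliability over the mission
    print(f"single unit:            {r:.4f}")
    print(f"two in parallel:        {parallel(r):.4f}")
    print(f"two in series:          {series(r):.4f}")
    print(f"parallel, beta = 0.1:   {parallel_common_cause(r, 0.1):.4f}")

    # Availability: halving repair time versus doubling time between failures.
    print(f"baseline availability:  {availability(1000, 10):.6f}")
    print(f"half the repair time:   {availability(1000, 5):.6f}")
    print(f"double the MTBF:        {availability(2000, 10):.6f}")
```

Under these assumptions, two units in series are less reliable than one, common-cause failures erode part of the gain from a parallel pair, and halving repair time buys the same steady-state availability as doubling the time between failures.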
To be fair, redundancy in systems is often a good way to enhance design to improve reliability and availability. It isn’t always complicated, but it can be. It is important to think about what you are trying to achieve by enhancing the reliability and availability of a system or network or process. There are many options beyond redundancy to consider, and sometimes redundancy is a very bad choice.