Capacity management’s slippery slope, and into analytics

Capacity Management (CM), from a telecommunications and IT perspective, has a fairly broad meaning that sometimes overlaps with related terms. That overlap is understandable, but unfortunate because of the confusion it causes. No worries, though, as we can sort it out pretty well right here.

Capacity Management often refers to a larger context than just the capacity of physical entities such as CPU, memory, ports, or even routers and switches. This broad usage is both understandable and useful. It's understandable because once you identify an entity such as a piece of software, a service, or a network resource as critical, even when it is not physical, you create the need to manage its capacity. So related terms such as Network Performance Monitoring (NPM), Network Capacity Management (NCM), and even Application Performance Monitoring/Management (APM) can all be considered subsets of CM, and reasonably so. In fact, ITIL takes this approach: http://www.itlibrary.org/index.php?page=Capacity_Management.

As a result, it can be convenient to refer to the entire set simply as CM. But that becomes inconvenient when we need to talk about managing just the physical entities. The simplest way out is to be clear and rely on context.

If that is not sufficient, then let's agree to use CM specifically for the management of physical entities, which may include software if it is isolated to a physical location (not spread around as in the applications and services described below).

Now we can let NPM refer to the management of network performance, which can span elements falling under CM. Likewise, NCM can refer to managing the capacity of those network resources, which again span multiple CM elements.
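
To make that distinction concrete, here is a minimal sketch in Python. Everything in it is hypothetical (element names, numbers, thresholds), and it is not any particular tool's API; it just shows the per-element capacity check that is the classic CM question next to a path-level roll-up that is more of an NCM question, because it spans several CM elements.

```python
# Minimal sketch: element-level capacity (CM) vs. network-level capacity (NCM).
# All names, numbers, and thresholds are hypothetical illustrations.

# CM view: utilization of individual physical elements (CPU, memory, links).
elements = {
    "router1_cpu": {"used": 62,  "capacity": 100},  # percent
    "router1_mem": {"used": 48,  "capacity": 64},   # GB
    "link_r1_r2":  {"used": 8.5, "capacity": 10},   # Gbps
    "link_r2_r3":  {"used": 3.1, "capacity": 10},   # Gbps
}

CM_THRESHOLD = 0.80  # flag any single element above 80% utilization

def cm_report(elements):
    """Per-element capacity check: the classic CM question."""
    for name, e in elements.items():
        util = e["used"] / e["capacity"]
        status = "NEAR CAPACITY" if util >= CM_THRESHOLD else "ok"
        print(f"{name:12s} {util:6.1%}  {status}")

# NCM view: capacity of a network path that spans several CM elements.
def ncm_path_headroom(elements, path):
    """A path is only as big as its most utilized link (simplified view)."""
    utils = [elements[link]["used"] / elements[link]["capacity"] for link in path]
    return 1.0 - max(utils)  # remaining headroom on the constraining element

cm_report(elements)
print("Path r1->r3 headroom:",
      f"{ncm_path_headroom(elements, ['link_r1_r2', 'link_r2_r3']):.1%}")
```

The point of the sketch is only the split in perspective: each element is fine or not on its own (CM), while the path's headroom depends on whichever element along it is the constraint (NCM).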

Beyond that, APM can refer to the performance monitoring of applications and services spanning across the network(s) and utilizing many different CM elements.

A term I've not seen, but which would be convenient to add, is Application Capacity Management, which refers to managing the capacity of applications, as you would expect. I wouldn't mind coining a couple more terms while we're at it: Network Reliability Management and Application Reliability Management, for obvious reasons.

Now, to achieve success with APM and NPM, analytics is often leveraged, and this is an emerging area as well. While there are tools in existence today that do a great job of finding causes of network and application problems before a person even has a chance to investigate, many more are being created that take it to the next level. And outside the IT and telecommunications arenas, we have the developing engineering space of Prognostics and Health Management (PHM). PHM is all about utilizing telemetry about a component or system to estimate its risk of failure. Because lack of capacity is a form of system failure, there really is no difference between the concepts behind PHM and those behind CM, NPM, or APM. So while the various camps of engineering develop within their focused areas, we shall see eventual cross-pollination, which can lead to exceptional capabilities in the IT and telecommunications network and system analytics arena.
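
To illustrate the overlap that the PHM point implies, here is a toy sketch with synthetic telemetry and a deliberately simple linear trend (real PHM models are far richer than this). If we declare "runs out of capacity" to be the failure event, then estimating when a utilization trend crosses that level is exactly a PHM-style prognostic.

```python
# Toy sketch of the PHM idea applied to capacity: treat "runs out of capacity"
# as the failure event and use telemetry to estimate how soon it may happen.
# Synthetic data and a simple linear trend; not a production PHM model.

# Hypothetical daily utilization telemetry for one link (fraction of capacity).
telemetry = [0.52, 0.54, 0.53, 0.57, 0.58, 0.61, 0.60, 0.64, 0.66, 0.67]

def linear_fit(ys):
    """Ordinary least squares for y = a + b*x with x = 0, 1, 2, ..."""
    n = len(ys)
    xs = range(n)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    b = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / \
        sum((x - x_mean) ** 2 for x in xs)
    a = y_mean - b * x_mean
    return a, b

def days_to_exhaustion(ys, failure_level=0.95):
    """Extrapolate the trend to the level we declare a capacity 'failure'."""
    a, b = linear_fit(ys)
    if b <= 0:
        return None  # no growth trend, so no predicted exhaustion
    return (failure_level - (a + b * (len(ys) - 1))) / b  # days past last sample

eta = days_to_exhaustion(telemetry)
print("Estimated days until capacity exhaustion:",
      "no exhaustion predicted" if eta is None else f"{eta:.0f}")
```

Swap "utilization" for any other health indicator and "capacity exhausted" for any other failure definition, and the same shape of calculation is what PHM, CM, NPM, and APM analytics all end up doing.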

 

About Rupe

Dr. Jason Rupe wants to make the world more reliable, even though he likes to break things. He received his BS (1989) and MS (1991) degrees in Industrial Engineering from Iowa State University, and his Ph.D. (1995) from Texas A&M University. He worked on research contracts at Iowa State University for CECOM on the Command & Control Communication and Information Network Analysis Tool, and conducted research on large-scale systems and network modeling for Reliability, Availability, Maintainability, and Survivability (RAMS) at Texas A&M University. He has taught quality and reliability at these universities, published several papers in respected technical journals, reviewed books, and refereed publications and conference proceedings. He is a Senior Member of IEEE and of IIE. He has served as Associate Editor for IEEE Transactions on Reliability, and currently works as its Managing Editor. He has served as Vice-Chair for RAMS, on the program committee for DRCN, and on the committees of several other reliability conferences because free labor is always welcome. He has also served on the advisory board for IIE Solutions magazine, as an officer for the IIE Quality and Reliability division, and in various local chapter positions for IEEE and IIE. Jason has worked at USWEST Advanced Technologies, and has held various titles at Qwest Communications Intl., Inc., most recently as Director of the Technology Modeling Team, Qwest's Network Modeling and Operations Research group for the CTO. He has always been those companies' reliability lead. Occasionally, he can be found teaching as an Adjunct Professor at Metro State College of Denver. Jason is the Director of Operational Modeling (DOM) at Polar Star Consulting, where he helps government and private industry plan and build highly performing and reliable networks and services. He holds two patents. If you read this far, congratulations on making it to the end!