FMECA with Wireshark

Returning from Sharkfest, I found that my laptop’s configuration must have changed. The laptop, running a home edition of Windows 7, could no longer see a few of the non-Windows machines on my network, including the NAS. Assuming the cause was one of the many changes I had made while at Sharkfest, whether to access the internet on campus or to tie into the ad hoc networks for the hands-on sessions, I decided to do some digging. And what better place to start than with Wireshark?

Looking at the packet traffic, I could immediately see a problem. The computer was talking to the gateway, but its request for a given IPv4 address was being redirected to a location it could no longer find. I flushed the ARP tables, but nothing changed. After more research and a couple of regedit operations, some things got better, but not everything: the computer still thought it needed to append a Home prefix that didn’t exist. Then I corrected an option on the network adapter, but the problem was still not fixed. Finally I gave up and switched to a static IP address. That change solved the problem, and everything behaved better than ever before, including what showed up in the packet capture file. A comparison of packets before and after in Wireshark showed that much had improved; the calls to the false home location stopped, as did a lot of the other chatter.
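A before-and-after comparison of this kind can be roughed out in a few lines of Python. The sketch below is only an illustration: it assumes each capture was exported from Wireshark as CSV with the default column set (which includes a Protocol column), the helper names count_by and compare are hypothetical, and the sample rows are made up rather than taken from the captures described above.

```python
import csv
import io
from collections import Counter


def count_by(csv_text, column):
    """Count packets per value of one column in a Wireshark CSV export."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[column] for row in reader)


def compare(before, after):
    """Return {key: (before_count, after_count)} for every key whose count changed."""
    keys = set(before) | set(after)
    # Counter returns 0 for missing keys, so vanished or new entries show up too.
    return {k: (before[k], after[k]) for k in keys if before[k] != after[k]}


# Made-up sample rows standing in for two exported captures (assumption).
BEFORE = '''"No.","Time","Source","Destination","Protocol","Length","Info"
"1","0.0","192.168.1.10","192.168.1.1","DNS","74","Standard query"
"2","0.1","192.168.1.10","192.168.1.1","DNS","74","Standard query"
"3","0.2","192.168.1.10","192.168.1.255","NBNS","92","Name query"
"4","0.3","192.168.1.10","192.168.1.20","TCP","66","SYN"
'''

AFTER = '''"No.","Time","Source","Destination","Protocol","Length","Info"
"1","0.0","192.168.1.10","192.168.1.1","DNS","74","Standard query"
"2","0.1","192.168.1.10","192.168.1.20","TCP","66","SYN"
'''

if __name__ == "__main__":
    changes = compare(count_by(BEFORE, "Protocol"), count_by(AFTER, "Protocol"))
    for proto, (b, a) in sorted(changes.items()):
        print(f"{proto}: {b} -> {a}")
```

Passing "Destination" instead of "Protocol" to count_by gives the same comparison per destination address, which is one way to spot traffic to a location that should no longer be contacted.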

But it bothered me that DHCP was not working properly. I probably just wasn’t patient enough: after a couple of days I switched back to DHCP, and everything remains better than before I left for Sharkfest. The packet trace file looks much better too, and the computer’s network response time is faster. It looks like time would have cleared the issue, with or without the cleanup I did.

There are a few things this experience teaches:

  • Having a baseline of your computer’s performance is always useful. Critical in fact. How else do you know what normal operation looks like in a packet capture?
  • Knowing your network is always important for reliability and security, and Wireshark is a great, free, community-supported tool for doing that. Do a packet capture every once in a while and look at the network’s behavior, both to learn and to diagnose.
  • Application Performance Monitoring (APM) is tricky, and can be very dependent on the environment.  How well would such a system pick up on issues with computers not being able to communicate fully, when users work around the problem anyway? Would it even be able to tell if it never caused a problem with a specific application?  And if not, how useful is the solution if that is a critical operation for users? Tools that dive deep, like Wireshark, will always have a role.
  • Networks are resilient these days, and sometimes all they need is a bit of time to sort themselves out; patience is a virtue here as well, no matter how fast those packets fly.
  • Could someone categorize some common performance issues found with Wireshark, automate some of it, and build indicators to help teach the Wireshark novices? Automation is useful, but only when it accompanies user education for validation and verification.
  • There is no substitute for having a good Failure Modes, Effects, and Criticality Analysis (FMECA), or good Root Cause Analysis (RCA) when the former is lacking.  In fact, that is a recurring theme in all systems and networks, in all disciplines, in all things important. The packet baseline is strongly related to the FMECA or RCA, as it provides clues for which failure occurred, and how the service is affected.
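As one small illustration of the baseline and indicator ideas in the list above, the sketch below flags protocols whose share of total traffic has drifted from a baseline. It is a sketch under stated assumptions, not a finished tool: the function names, the 10-percentage-point threshold, and the packet counts are all hypothetical, and any flag it raises still needs a person to look at the capture and judge what it means.

```python
def shares(counts):
    """Convert per-protocol packet counts into fractions of total traffic."""
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()} if total else {}


def flag_deviations(baseline, current, threshold=0.10):
    """List (protocol, change in share) where the change exceeds the threshold.

    The threshold is an arbitrary illustrative value, not a recommendation.
    """
    base, cur = shares(baseline), shares(current)
    flagged = []
    for proto in sorted(set(base) | set(cur)):
        delta = cur.get(proto, 0.0) - base.get(proto, 0.0)
        if abs(delta) > threshold:
            flagged.append((proto, round(delta, 2)))
    return flagged


if __name__ == "__main__":
    # Made-up packet counts per protocol (assumption, for illustration only).
    baseline = {"TCP": 80, "DNS": 15, "ARP": 5}
    current = {"TCP": 50, "DNS": 20, "ARP": 30}
    for proto, delta in flag_deviations(baseline, current):
        print(f"{proto}: share changed by {delta:+.2f}")
```

Comparing shares rather than raw counts keeps captures of different lengths comparable, at the cost of hiding a change that scales all traffic equally.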

About Rupe

Dr. Jason Rupe wants to make the world more reliable, even though he likes to break things. He received his BS (1989) and MS (1991) degrees in Industrial Engineering from Iowa State University, and his Ph.D. (1995) from Texas A&M University. He worked on research contracts at Iowa State University for CECOM on the Command & Control Communication and Information Network Analysis Tool, and conducted research on large-scale systems and network modeling for Reliability, Availability, Maintainability, and Survivability (RAMS) at Texas A&M University. He has taught quality and reliability at these universities, published several papers in respected technical journals, reviewed books, and refereed publications and conference proceedings. He is a Senior Member of IEEE and of IIE. He has served as Associate Editor for IEEE Transactions on Reliability, and currently works as its Managing Editor. He has served as Vice-Chair'n for RAMS, on the program committee for DRCN, and on the committees of several other reliability conferences because free labor is always welcome. He has also served on the advisory board for IIE Solutions magazine, as an officer for the IIE Quality and Reliability division, and in various local chapter positions for IEEE and IIE. Jason has worked at USWEST Advanced Technologies, and has held various titles at Qwest Communications Intl., Inc, most recently as Director of the Technology Modeling Team, Qwest's Network Modeling and Operations Research group for the CTO. He has always been those companies' reliability lead. Occasionally, he can be found teaching as an Adjunct Professor at Metro State College of Denver. Jason is the Director of Operational Modeling (DOM) at Polar Star Consulting, where he helps government and private industry to plan and build highly performing and reliable networks and services. He holds two patents. If you read this far, congratulations on making it to the end!