1) Believing the “Intelligence” of the Software without understanding how it makes determinations.
Software default settings are very seldom correct for YOU. For example, a tool may assume that a SQL server should respond within 50ms. But if that server sits across a WAN with a 200ms ping time, a 50ms response is highly unlikely, and the result is false SLOW SQL alerts. This is only one example; many alerts and messages are driven by default “thresholds” within this type of software tool’s configuration.
Particulars of your environment may create false alerts or other messages. The definitions of what counts as an “excessive” delay, latency, or broadcast level are up to you, not the tool.
It’s important for you to know the default settings driving alerts and messages. Then ignore, or better, alter the alerts whose settings are wrong for your enterprise; tuning them to fit your environment is the best strategy. Too many false alarms either numb you into ignoring the important ones or push you into serious errors and incorrect decisions that can be very, very expensive.
Properly used, those features can save enormous amounts of time and show things your own eye would likely miss.
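To make the threshold point concrete, here is a minimal sketch in Python, with invented numbers; `effective_threshold` is a hypothetical helper, not any monitoring tool’s actual API:

```python
DEFAULT_THRESHOLD_MS = 50  # illustrative out-of-the-box "slow SQL" threshold

def effective_threshold(base_ms: float, rtt_ms: float) -> float:
    """A more honest alert threshold: server processing budget plus the round trip."""
    return base_ms + rtt_ms

lan_rtt_ms, wan_rtt_ms = 1, 200
print(effective_threshold(DEFAULT_THRESHOLD_MS, lan_rtt_ms))  # 51: the default is roughly right
print(effective_threshold(DEFAULT_THRESHOLD_MS, wan_rtt_ms))  # 250: the 50ms default fires falsely
```

The arithmetic is trivial, and that is the point: the tool’s default ignores a term (the WAN round trip) that dominates the measurement.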
2) Not understanding the Protocols used, such as TCP, HTTP, etc.
What good is a tool that tells you how a protocol is behaving if you do not understand the underlying technology? By this I mean the RFCs for the protocols that are relevant to your concerns.
—What is the impact of various protocols working differently for the same application doing the same transaction–in different locations?
—What is expected according to the specs, and how does your trace file show different, or less optimal, behavior?
—Why would there be 2 TCP connections from one location and 10 from another–for the same application doing the same transaction?
This short article cannot answer all these questions, but it can show you the types of information you will need to understand in order to make sense of the data a trace file shows you. Know the protocols well. Deep understanding of TCP is the basic price of admission. While you may consider this a matter of skill sets, my point is that attempting to troubleshoot a problem with a packet-sniffer while not understanding the protocols is a mistake, and a common one. Add this point to the first one, about not believing all the standard settings on tools, and you find that the tool cannot answer anything for you by itself. You need to know what you are looking at. You are the analyst; the tool is just an aid.
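One concrete instance of protocols behaving differently for the same transaction in different locations: a single TCP connection’s throughput is capped by its window size divided by the round-trip time. A minimal sketch, in Python with invented numbers:

```python
def max_tcp_throughput_bps(window_bytes: int, rtt_s: float) -> float:
    """Upper bound for one TCP connection: at most one window can be in flight per RTT."""
    return window_bytes * 8 / rtt_s

# Same application, same 64 KB receive window, two locations:
print(max_tcp_throughput_bps(65535, 0.005))  # ~105 Mbit/s on a 5 ms LAN
print(max_tcp_throughput_bps(65535, 0.200))  # ~2.6 Mbit/s on a 200 ms WAN
```

An application that needs more throughput from the far site must scale its window or open parallel connections, which is one plausible reason for seeing 2 connections at one location and 10 at another.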
3) Not understanding the layer 1 and layer 2 aspects of the topology you are sniffing.
Ethernet and all other topologies have many different specifications, which are altered or outright ignored by many switch and other network device manufacturers. You must know the specs and how the hardware you are working with applies those specs, or doesn’t apply them.
A classic example is Spanning Tree. There are IEEE specifications for Spanning Tree, but those specifications are a model, not a law. Each manufacturer has tweaked it to create some proprietary advancement that gives them a competitive advantage.
Sometimes, those advances become the new spec. However, you need to know what is standard and how your equipment varies on that theme. What good is seeing the BPDUs in a trace file if you don’t understand what they contain or how they relate to the problem at hand? Again, this may be looked at as a skill-set issue, but expecting to solve critical problems with a packet-sniffer while not knowing this about your network is a mistake.
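As an illustration of knowing what BPDUs contain, here is a minimal sketch in Python of decoding the standard IEEE 802.1D configuration-BPDU layout; real captures may carry RSTP or vendor variants such as PVST+ with different framing, and the sample bytes below are invented:

```python
import struct

def parse_bpdu(payload: bytes) -> dict:
    """Decode an IEEE 802.1D configuration BPDU (the bytes after the LLC header)."""
    (proto_id, version, bpdu_type, flags,
     root_id, root_path_cost, bridge_id, port_id,
     msg_age, max_age, hello, fwd_delay) = struct.unpack("!HBBB8sI8sHHHHH", payload[:35])
    return {
        "root_priority": struct.unpack("!H", root_id[:2])[0],
        "root_mac": root_id[2:].hex(":"),
        "root_path_cost": root_path_cost,
        "hello_time_s": hello / 256,   # 802.1D timer fields are in 1/256-second units
        "max_age_s": max_age / 256,
    }

# Invented sample: priority-0x8000 root bridge, path cost 4, default 802.1D timers
sample = struct.pack("!HBBB8sI8sHHHHH", 0, 0, 0, 0,
                     bytes.fromhex("8000aabbccddeeff"), 4,
                     bytes.fromhex("8001001122334455"), 0x8001,
                     0, 20 * 256, 2 * 256, 15 * 256)
print(parse_bpdu(sample))
```

If your trace shows a hello time or max age that differs from what you configured, or a root bridge you don’t expect, the BPDU fields tell you directly, but only if you know what they mean.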
4) Uni-directional SPANs or Port Mirroring & Single-sided trace files.
Often the switch port used by a server you need to monitor is incapable of providing a bi-directional SPAN (port mirror). If so, you cannot get answers from such a trace, as it will miss critical information. Sometimes this is an oversight by the engineer doing the trace; sometimes it is simply not understood to be such a critical concern, and it is ignored.
Either way, when you have a situation like this you need to bite the bullet and put in a Change Order to move the server to a port that can be mirrored in both directions before any serious analysis can be done.
Here is a good example of why this is so. Picture a Client and a Server. The Server wants to end a specific TCP connection and keeps sending FINs. Yet we never see the Client send back a FIN ACK. We do see other traffic between them, so we know there is connectivity. So, here are the questions:
–Are the FINs not arriving at the Client, or is the Client receiving them and appropriately sending back a FIN ACK that is not getting back successfully?
—If so, then it is most likely a network issue.
–Are the FINs arriving successfully but being ignored by the Client?
—If so, then it is most likely a Server, OS, or Data Center issue.
These questions cannot be answered with a trace file that sees only one side of the conversation. Two traces, synchronized, are needed to answer them.
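The reasoning above can be sketched in code. This is a toy model in Python, with invented packet tuples standing in for two synchronized trace files, not a real capture parser:

```python
# Each packet: (timestamp, src, dst, tcp_flags)
server_side = [(0.000, "server", "client", "FIN")]
client_side = [(0.100, "server", "client", "FIN"),
               (0.101, "client", "server", "FIN,ACK")]

def diagnose(server_trace, client_trace):
    """Classify the stuck FIN; the point is that no single trace can do this alone."""
    fin_sent      = any(s == "server" and f == "FIN" for _, s, _, f in server_trace)
    fin_arrived   = any(s == "server" and f == "FIN" for _, s, _, f in client_trace)
    reply_sent    = any(s == "client" and "FIN" in f for _, s, _, f in client_trace)
    reply_arrived = any(s == "client" and "FIN" in f for _, s, _, f in server_trace)
    if fin_sent and not fin_arrived:
        return "network: FIN lost on the way to the client"
    if fin_arrived and not reply_sent:
        return "client/OS: FIN arrived but was ignored"
    if reply_sent and not reply_arrived:
        return "network: client's reply lost on the way back"
    return "close handshake completed"

print(diagnose(server_side, client_side))  # network: client's reply lost on the way back
```

Each branch of `diagnose` needs evidence from both traces; with only the server-side trace, the first three outcomes are indistinguishable.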