T&M: Help! The Network's Down

VoicenData Bureau

We've all heard this, or maybe even said it. There are many tools and testers to assist administrators in identifying when a network is down, and several approaches to reacting to the alarms. Which method is best? The short answer is: none of them. No single method works in every situation. There are basically two approaches to troubleshooting, top-down and bottom-up. There is also one rule for both: at some point, you will use the one only to realize you should have used the other!

According to a recent survey by Infonetics Research, the top three threats to your network are network products, security, and cabling and connectors, in that order. Gartner also released a study stating that roughly 20 percent of all IT investments are for things that don't work.

Top-down Approach



In a top-down approach, the network manager begins at the upper layers of the OSI protocol stack. The administrator tests the application to make sure it is working, then pings the servers, and keeps working down until reaching the bottom of the stack, the physical layer. This approach is best when multiple users initiate the help desk calls. It is very rare for physical layer problems to affect all users, unless, of course, the fault is in the only server connection. This methodology allows the administrator to determine whether the application or server is down, slow, or for some reason non-responsive to network commands. To be effective, it is generally aided by a tool or network monitoring application that can provide the network manager with some type of trending and actionable data.
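To make the top-down walk concrete, here is a minimal sketch in Python, assuming a hypothetical web application; the URL, hostname, and health-check path are placeholders, and the ping flags shown are the Linux/macOS form.

```python
# Minimal top-down sketch: application -> name resolution -> ICMP ping.
# The host and URL below are placeholders, not real services.
import socket
import subprocess
import urllib.request

APP_URL = "http://app.example.com/health"   # hypothetical application check
HOST = "app.example.com"                    # hypothetical server name

def check_application(url: str) -> bool:
    """Layer 7: does the application answer an HTTP request?"""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status == 200
    except OSError:
        return False

def check_dns(host: str) -> bool:
    """Can we even resolve the server's name?"""
    try:
        socket.gethostbyname(host)
        return True
    except socket.gaierror:
        return False

def check_ping(host: str) -> bool:
    """Layer 3: is the host reachable at all? (Linux/macOS 'ping -c')."""
    return subprocess.run(["ping", "-c", "3", host],
                          capture_output=True).returncode == 0

if __name__ == "__main__":
    if check_application(APP_URL):
        print("Application responds; no need to go lower.")
    elif check_dns(HOST) and check_ping(HOST):
        print("Host is up but the application is not answering; look at the server or application.")
    else:
        print("Host unreachable; keep working down toward the physical layer.")
```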

Actionable data can be as simple as a ping that comes back "host unreachable", all the way up to bit errors and other errors delivered via SNMP (Simple Network Management Protocol) traps. The real trick, however, is determining the cause of the errors. To do that effectively, a methodical troubleshooting plan should be used, and it should certainly include more than rebooting a server. If a server keeps going down, something is causing it to do so. It may be a memory leak, processor over-utilization, or some other issue, but rebooting should be considered a bandage, not a solution. So, what exactly is actionable data? It is data that provides enough information to be useful and is clear enough to determine a plan of action.
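As one hedged illustration of collecting such data, the sketch below reads an interface's ifInErrors counter over SNMPv2c, assuming the third-party pysnmp library is installed; the switch address, community string, and interface index are placeholders, not values from this article.

```python
# Sketch: pull ifInErrors for one interface via SNMPv2c (requires pysnmp).
# The switch address, community string, and interface index are placeholders.
from pysnmp.hlapi import (SnmpEngine, CommunityData, UdpTransportTarget,
                          ContextData, ObjectType, ObjectIdentity, getCmd)

SWITCH = "192.0.2.10"          # placeholder management address
COMMUNITY = "public"           # placeholder read community
IF_INDEX = 1                   # placeholder interface index
IF_IN_ERRORS = "1.3.6.1.2.1.2.2.1.14"   # ifInErrors from standard MIB-2

error_indication, error_status, _, var_binds = next(
    getCmd(SnmpEngine(),
           CommunityData(COMMUNITY, mpModel=1),                  # SNMPv2c
           UdpTransportTarget((SWITCH, 161), timeout=2, retries=1),
           ContextData(),
           ObjectType(ObjectIdentity(f"{IF_IN_ERRORS}.{IF_INDEX}")))
)

if error_indication or error_status:
    print(f"SNMP query failed: {error_indication or error_status}")
else:
    # The raw counter alone is not actionable; record it with a timestamp
    # so the rate of errors can be compared against a baseline.
    for var_bind in var_binds:
        print(" = ".join(x.prettyPrint() for x in var_bind))
```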

Most management packages and monitoring tools allow a network administrator to set thresholds that flag performance outside an acceptable range. Knowing where to set these for specific issues will require a bit of trial and error: set too low, they will flood a pager or cell phone with messages; set too high, they will result in unemployment. Blindly accepting the defaults can leave the tools underutilized. Any time you deploy management software, be sure to spend the money and get trained. The best training is ideally done on site, in your environment, by someone certified in the software. That way you can eliminate the modules you don't want or need and tune the ones that will provide you with the best information. Bandwidth-heavy applications and heavily utilized servers will require the most tuning to be of benefit.
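The toy sketch below illustrates the trial-and-error nature of thresholds; the 70 and 90 percent figures and the alert wording are invented examples, not recommendations from any particular management package.

```python
# Toy illustration of threshold tuning: too low floods the pager,
# too high hides real trouble. Numbers here are invented examples.
WARN_UTILIZATION = 0.70    # start watching at 70% link utilization
CRIT_UTILIZATION = 0.90    # page someone at 90%

def classify(utilization: float) -> str:
    if utilization >= CRIT_UTILIZATION:
        return "CRITICAL: page the on-call engineer"
    if utilization >= WARN_UTILIZATION:
        return "WARNING: log and watch the trend"
    return "OK"

# Simulated samples from a busy uplink, e.g. five-minute averages.
for sample in (0.42, 0.68, 0.74, 0.91, 0.88):
    print(f"{sample:.0%} -> {classify(sample)}")
```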

Another benefit of management software is the ability to query disparate equipment and retain statistics and trends in one reporting tool. In the old days, and still in many environments today, the network manager is stuck clicking through each switch in a wide variety of interfaces, depending on the server software and active electronics. With a single tool, trending and overall traffic reports can be exported, sorted, and so on. These can also be used to justify new equipment and upgrades (just a little side perk). The advantage of trending and utilization models is that they let you determine, for instance, which servers could benefit from multiple network cards. They also let you segment your switches so that traffic is balanced and no one switch is over-utilized while the others sit under-utilized. They also help you see what types of packets are moving where, so that traffic can be optimized as well.
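The reporting side can be illustrated with a rough sketch like the one below; the switch names and utilization figures are made up, and collection is assumed to have been done already by the management tool.

```python
# Sketch of a consolidated trending report: sort per-switch utilization
# and export it to CSV. The data here is invented; in practice it would
# come from the management tool's collectors.
import csv

utilization = {           # hypothetical 24-hour average utilization
    "core-sw-1": 0.81,
    "floor2-sw-3": 0.23,
    "floor1-sw-2": 0.64,
    "dc-sw-4": 0.92,
}

ranked = sorted(utilization.items(), key=lambda kv: kv[1], reverse=True)

with open("switch_utilization.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["switch", "avg_utilization"])
    writer.writerows(ranked)

# The busiest switches are candidates for rebalancing or upgrades.
for name, util in ranked:
    print(f"{name}: {util:.0%}")
```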

Bottom-up Approach



In a bottom-up approach, the cabling is checked first and then troubleshooting moves up the protocol stack. When a single user goes down, it is far easier to start at the physical layer and move up. Idiosyncrasies can develop when EMI and/or environmental factors are causing the problem. Physical layer testers are a bit different: they can be field testers, smart bit testers, and/or spectrum analyzers for radio frequency work. What you are testing will determine what type of tester you need. The key here is that the tester be calibrated just prior to the test and certified by an independent agency. Test and Measurement World (http://www.reed-electronics.com/tmworld/) has a listing of testers, ratings for how well they perform, and certifications for each variety. You will want to be sure that your tester is certified by an independent testing source.

When there are errors, either continuous or intermittent, it is a good idea to look at your physical layer. Field-terminated patch cords are a particular culprit, but other environmental conditions can also be to blame. When walls are moved, cables that were once routed away from fluorescent light fixtures may no longer be at an acceptable distance, new power panels may be located too close, and so on. It is important to note that you should not rely on a link light on your switch port to determine whether a cable is good or bad. Just as with your electronics, there are conditions where you may have a link, but the signal is so degraded from sender to receiver that the packets are useless. Remember the expression "the lights are on but no one's home": it holds for copper or fiber that is not performing but still causes the link light on the switch to illuminate.
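On a Linux host, one quick way to look past the link light is to read the kernel's per-interface error counters, as in this sketch; the interface name is a placeholder and the paths are Linux-specific.

```python
# A link light only proves electrical or optical link, not clean signal.
# On Linux, per-interface error counters live under /sys/class/net/.
# "eth0" is a placeholder interface name.
from pathlib import Path

IFACE = "eth0"
STATS = Path(f"/sys/class/net/{IFACE}/statistics")

for counter in ("rx_errors", "rx_crc_errors", "tx_errors", "rx_dropped"):
    path = STATS / counter
    if path.exists():
        value = int(path.read_text().strip())
        flag = "  <-- investigate" if value else ""
        print(f"{counter}: {value}{flag}")
```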

Another thing that can happen is performance degradation through autonegotiation down to a lower speed or to half duplex in an attempt to maintain the connection. If you have deployed Gigabit Ethernet and your cabling was installed before the new parameters for channel performance were adopted, you will also want to have the cabling recertified to the new parameters; this is recommended by the cabling standards bodies. Note that when equipment is tested for operation with any physical layer medium, the testing is done in a lab in a pristine environment, and actual installations may vary for a number of reasons. If you are taking the bottom-up approach, check all of the physical medium; don't skip this step just because you can ping a device or see its link light. On the other hand, if you don't have a link light at all, the problem is obvious.

Then you work your way up: checking the network card diagnostics, then the switch port statistics, and so on up to the application. If only one application is not working, start at the top. If several applications are not working, or nothing is working for one workstation, start from the bottom and work up. And remember, once in a while the problem will be in the middle, or this rule will work backwards.
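Confirming what the network card actually negotiated is a sensible first rung on that climb. The sketch below simply shells out to the standard Linux ethtool utility and prints its Speed, Duplex, and Link lines; the interface name is a placeholder and root privileges may be required.

```python
# Sketch: confirm negotiated speed/duplex on Linux by parsing `ethtool`
# output. A gigabit port silently running at 100 Mb/s half duplex is a
# classic sign of a marginal cable. "eth0" is a placeholder; ethtool must
# be installed and may need root on some systems.
import subprocess

IFACE = "eth0"

result = subprocess.run(["ethtool", IFACE], capture_output=True, text=True)
if result.returncode != 0:
    print(f"ethtool failed: {result.stderr.strip()}")
else:
    for line in result.stdout.splitlines():
        line = line.strip()
        if line.startswith(("Speed:", "Duplex:", "Link detected:")):
            print(line)
```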

Pre-Installation Testing



This step can be one of the best tools for eliminating problems before applications and networks go live. It should include thorough testing of all components under load, and "all components" means the physical layer, the network layer and, where possible, the applications. It is unwise to assume that standards-based components will be problem free, and this is particularly true at the physical layer. Installation anomalies, poor installation practices, EMI or RF interference, and marginally compliant components can all cause errors, especially in combination. The higher the frequency, the worse the problems can become.
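A very rough way to put load on a path before go-live, in the absence of a dedicated tester, is sketched below: one copy of the script receives, the other sends a fixed amount of data and times it. The address, port, and payload size are arbitrary placeholders, and this is a smoke test, not a substitute for certified test equipment.

```python
# Very rough load sketch: push a fixed amount of data over TCP and time it.
# Runs against loopback by default; point HOST at a lab machine running the
# same script with "server" to exercise a real path. Start the server first.
import socket
import sys
import time

HOST, PORT = "127.0.0.1", 50007   # placeholders for a lab test pair
PAYLOAD = b"x" * 65536            # 64 KB chunks
CHUNKS = 4096                     # roughly 256 MB total

def server() -> None:
    with socket.create_server((HOST, PORT)) as srv:
        conn, _ = srv.accept()
        with conn:
            while conn.recv(65536):   # drain until the sender closes
                pass

def client() -> None:
    with socket.create_connection((HOST, PORT)) as sock:
        start = time.monotonic()
        for _ in range(CHUNKS):
            sock.sendall(PAYLOAD)
    elapsed = time.monotonic() - start
    mbits = len(PAYLOAD) * CHUNKS * 8 / 1_000_000
    print(f"{mbits / elapsed:.0f} Mb/s over {elapsed:.1f} s")

if __name__ == "__main__":
    server() if "server" in sys.argv else client()
```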

Many manufacturers make cables and connectivity with margin above the minimum standards. This provides a bit of a forgiveness factor for installation issues, but proper installation is still the main key to error-free performance of any system, active or passive.

Many larger companies maintain test networks for just this reason. When you are troubleshooting a problem, you can move the components to a test lab where the physical layer is certified and known to be trouble free. Network electronics can then be tested in that test bed, in a controlled environment, either before implementation or afterwards when problems are found.

Carrie Higbie of The Siemon Company
