NETWORK MANAGEMENT: Be Proactive, Keep Business Ticking


In today’s service-driven business environment, success hinges on effective
management of business-critical services. For example, a management application
for a Web portal can no longer take pride merely in detecting and repairing
downtime efficiently. Instead, it must aim to detect and repair cases where the
service is up but the backbone network is operating at full capacity and might
bring the service down in the near future through dropped packets.

In such scenarios, monitoring the performance (and performance trends) of
services and of the underlying components that make them up becomes critical.
Analyzing, correlating and reporting on the collected performance data is no
longer a luxury but a necessity. This article explains the role of performance
monitoring and shows how network performance monitoring plays a key role in
today’s service management.

Service Management

A managed service typically comprises three main components: networking
infrastructure, hardware systems and software applications. The service is
offered along with a service level agreement (SLA) that forms a contract between
the provider of the service and its users. For example, an IT Department in an
enterprise would offer network infrastructure as a service to its internal users
with a guarantee of 99.99 percent uptime; an application service provider would
offer to provide a banking portal to banks with an agreement that the response
time will always be within five seconds; a storage provider would offer bulk
storage over the Internet with a ‘no data loss’ guarantee.

Market demands…

  • Minimize business losses because of poor availability and high response
    time of services
  • Detect service degradation before it affects end customers
  • Deploy new services without affecting existing services
  • Control cost by effectively utilizing existing resources
  • SLA negotiation and verification

Translates to…

  • Service-driven Monitoring
  • Business Planning
  • Maximize Return on Investment
  • Documenting SLA reports

Traditionally, management of these services focused mainly on managing faults
in the service, whether it was a fault in the network infrastructure, a problem
in the hardware systems or a defect in the software applications. Mechanisms
were put in place to identify the fault as quickly as possible and resolve it
efficiently. The problem with such an approach is that it is reactive and does
not prevent the fault from occurring; the managed service is affected
regardless of how quickly the fault is identified and resolved. Today, the
challenges are even greater. Customers are demanding service-level guarantees
and expecting documented evidence of compliance. Competitive pressure is
compelling businesses to introduce differentiators by adding new features and
services without affecting already deployed services. Meanwhile, businesses are
demanding maximum return on investment.

To meet these growing challenges, it is critical for enterprises and service
providers to move from a reactive management mode to proactive management. They
must identify conditions in the infrastructure that will affect the services
provided to end customers before they actually do so; they must minimize
service downtime and increase performance to the fullest possible extent; and
they must predict the effects of introducing new services into the environment.

Performance Monitoring and Reporting

To meet the challenges mentioned above, the management application, in addition
to monitoring faults, must also periodically collect performance metrics from
all the components that make up the managed service; correlate and analyze the
data to detect conditions that will lead to a fault; and collate the analysis
and correlation in the form of performance reports.

One can’t prevent a hardware failure, but if CPU utilization has been growing
continuously, it is a cause for concern and must be analyzed immediately to
prevent failures. This analysis forms the basis for remedial action, and the
reports form the basis for SLA documentation.
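
As a rough illustration of this kind of analysis, a monitoring script might fit
a simple trend line to recent CPU-utilization samples and flag sustained
growth. The sample data and thresholds below are assumptions for the sketch,
not values from any particular product:

    # Minimal sketch: flag a continuously growing CPU utilization trend.
    # In practice the samples would come from the management application's
    # performance database rather than a hard-coded list.

    def trend_slope(samples):
        """Least-squares slope of utilization (percent) per sample interval."""
        n = len(samples)
        xs = range(n)
        mean_x = sum(xs) / n
        mean_y = sum(samples) / n
        num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, samples))
        den = sum((x - mean_x) ** 2 for x in xs)
        return num / den

    cpu_samples = [52, 55, 58, 61, 66, 70, 74, 79]    # percent, one per hour

    slope = trend_slope(cpu_samples)
    if slope > 2 and cpu_samples[-1] > 70:            # assumed alert criteria
        print("WARNING: CPU utilization rising ~%.1f%%/hour, now at %d%%"
              % (slope, cpu_samples[-1]))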

Service-driven Monitoring

This is the core of proactive service management. Apart from monitoring faults,
the management application needs to look for abnormal conditions in the
network. A software application, for example, might be processing data as
expected while its memory consumption grows continuously and may soon cause a
memory fault; a router might be routing data as expected while one of its
interfaces operates at full capacity and may soon start dropping packets.

To enable this kind of analysis, the management application
must look at the service environment in multiple ways.

  • Top-down view where a user’s perception of the service
    is verified against the pre-defined SLAs.

  • Bottom-up view where the infrastructure supporting the
    service is examined for possible degradation or abnormalities.

  • Trend analysis for the infrastructure behavior.

Why both top-down and bottom-up views? Because no single view gives a complete
picture of the problems in the environment. For instance, a top-down view might
show that the service is up and performing as expected, while in reality the
network underneath is operating on a fail-over path.
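
A minimal sketch of how the two views complement each other: a top-down check
compares user-perceived response time against the five-second SLA from the
earlier example, while a bottom-up check inspects link utilization. The names,
thresholds and measurements here are illustrative assumptions:

    # Sketch: why both views are needed. All figures are assumed.

    SLA_RESPONSE_SECONDS = 5.0       # five-second SLA from the earlier example
    UTILIZATION_ALARM_PCT = 90.0     # assumed alarm threshold for a link

    def top_down_ok(measured_response_seconds):
        """Top-down view: does the user-perceived response time meet the SLA?"""
        return measured_response_seconds <= SLA_RESPONSE_SECONDS

    def bottom_up_alarms(interface_utilization):
        """Bottom-up view: which links are dangerously close to capacity?"""
        return [name for name, pct in interface_utilization.items()
                if pct >= UTILIZATION_ALARM_PCT]

    # The portal answers within the SLA, yet a backbone link is nearly
    # saturated -- exactly the case a top-down view alone would miss.
    response_time = 1.8                                   # seconds (measured)
    links = {"RouterA/if1": 96.0, "RouterA/if2": 40.0}    # percent utilization

    print("Top-down : SLA met" if top_down_ok(response_time)
          else "Top-down : SLA violated")
    for link in bottom_up_alarms(links):
        print("Bottom-up: %s operating near full capacity" % link)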

Why trends? By collecting statistics and generating trends, one can predict
future behaviour. For example, if busy-hour service accesses have been growing
continuously over the past few months, they might cross the acceptable
threshold next month.
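
As a simple illustration of such a projection (the figures and the acceptable
threshold are assumed, and the growth model is a plain linear extrapolation),
the collected busy-hour statistics can be carried forward to estimate when the
threshold would be crossed:

    # Sketch: extrapolate busy-hour service accesses to predict when the
    # acceptable threshold will be crossed. All figures are illustrative.

    monthly_busy_hour_accesses = [4200, 4600, 5100, 5500, 6000]   # past months
    ACCEPTABLE_THRESHOLD = 7000                                   # assumed limit

    # Average month-on-month growth over the observed period.
    growth = ((monthly_busy_hour_accesses[-1] - monthly_busy_hour_accesses[0])
              / (len(monthly_busy_hour_accesses) - 1))

    level, months_ahead = monthly_busy_hour_accesses[-1], 0
    while level < ACCEPTABLE_THRESHOLD:
        level += growth
        months_ahead += 1

    print("Threshold of %d likely crossed in about %d month(s)"
          % (ACCEPTABLE_THRESHOLD, months_ahead))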

Business Planning

What additional disks will you need three months from now? Do you need to
increase your network bandwidth in the near future? What will happen if a new
service is put on the existing infrastructure? These questions are becoming
increasingly critical. Using statistical methods and historical usage data, the
management application must calculate future needs and predict the effect of
changing existing configurations.
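
A minimal what-if sketch of this kind of planning, using assumed numbers for
link capacity, observed growth and the expected load of the planned service:

    # Sketch of a what-if calculation for business planning. All figures
    # (link capacity, growth rate, new-service load) are assumptions.

    LINK_CAPACITY_MBPS = 100.0
    current_peak_mbps = 72.0
    monthly_growth_pct = 5.0        # observed from historical usage data
    new_service_peak_mbps = 12.0    # expected load of the planned service

    def projected_peak(months):
        """Current peak grown for `months`, plus the new service's load."""
        grown = current_peak_mbps * (1 + monthly_growth_pct / 100.0) ** months
        return grown + new_service_peak_mbps

    peak = projected_peak(3)
    headroom = LINK_CAPACITY_MBPS - peak
    print("Projected peak in 3 months: %.1f Mbps (headroom %.1f Mbps)"
          % (peak, headroom))
    if headroom < 0.1 * LINK_CAPACITY_MBPS:    # keep ~10% spare, an assumption
        print("Plan for additional bandwidth before deploying the new service")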

Maximize Return on Investment

The market economy is increasingly forcing businesses to control their
operational costs, and strict justification for any new investment is becoming
the norm. To enable this, the management application must collect the required
performance metrics and identify under-utilized and over-utilized areas in the
service environment. This is useful to:

  • Get hints on the best ways to configure the environment
    for maximum utilization

  • Justify budgetary spending on additional resources
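
The same collected metrics can drive a simple utilization report that separates
idle capacity from hot spots. The resources and the utilization bands below are
assumptions chosen for the sketch:

    # Sketch: classify resources as under- or over-utilized from collected
    # average utilization (percent). The bands chosen are assumptions.

    avg_utilization = {
        "RouterA/if1": 93.0, "RouterB/if0": 12.0,
        "WebServer-CPU": 38.0, "DBServer-disk": 81.0,
    }

    def classify(pct, low=20.0, high=80.0):
        if pct < low:
            return "under-utilized"
        if pct > high:
            return "over-utilized"
        return "normal"

    for resource, pct in sorted(avg_utilization.items()):
        print("%-15s %5.1f%%  %s" % (resource, pct, classify(pct)))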

Today, by using monitoring techniques for the managed service, hardware systems
and applications, and correlating the information thus obtained, one can
pinpoint the source of trouble in a service environment before it affects end
users.

Ajay Chitale, HP
OpenView Business Unit

Proactive Network Congestion Control

How can the collection and correlation of performance data be used effectively
to identify and isolate conditions in the network that will eventually affect
the end service?

Given alongside is a high-level architecture of a Web portal service. In this
example, customers access the Web service hosted on the Web/app server via the
access/core network, across the firewall. The service uses the database server
for backend processing. Also in the picture are the provider’s enterprise
network and some legacy applications that run on the database server.

The top-down measurement of the Web service indicates that everything is in
order: the service is up and performing as expected. However, the underlying
network is over-utilized and may soon start dropping packets, eventually
affecting the Web service. Proactive bottom-up performance monitoring helps
identify and isolate this case. Let’s see how:

Monitor

The performance monitoring application polls the networking infrastructure for
usage data, looks for any utilization threshold violations and generates a
utilization threshold alarm for Router A. This alarm prompts the network
administrator to take a detailed look at the performance reports, which provide
the utilization trends and a comparison with a baseline.

If the trend shows that utilization is continuously increasing, it is a
concern, since the same router provides access to the DB server used by the Web
service, and it could well affect the Web service in future.
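
A hedged sketch of the monitor step is given below. The poller function is a
placeholder standing in for whatever mechanism the deployment uses (SNMP
counters, an agent, etc), and the threshold, interval and device names are
assumed values:

    import random
    import time

    UTILIZATION_THRESHOLD_PCT = 85.0     # assumed alarm threshold
    POLL_INTERVAL_SECONDS = 1            # shortened for the sketch

    def poll_interface_utilization(router, interface):
        """Placeholder poller. A real implementation would query the device
        (for example via SNMP octet counters) and compute utilization."""
        return random.uniform(60.0, 100.0)

    def monitor(router="RouterA", interface="if1", polls=5):
        for _ in range(polls):
            pct = poll_interface_utilization(router, interface)
            if pct >= UTILIZATION_THRESHOLD_PCT:
                # The alarm would normally go to the management console and
                # prompt a look at the utilization-trend reports.
                print("ALARM: %s %s at %.1f%% utilization"
                      % (router, interface, pct))
            time.sleep(POLL_INTERVAL_SECONDS)

    monitor()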

Correlate

At this point, the first task for the network administrator is to find the
source of this heavy utilization. The performance monitoring application can
collect flow data from the router, analyze the flows and generate reports that
give a detailed break-up (source, destination, protocol, etc) of the traffic on
the router.

This analysis helps the administrator find the cause of the heavy utilization
on that router interface.
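
As a sketch of this correlation step, the flow records below are made-up tuples
standing in for exported flow data; aggregating them by source, destination and
protocol shows what is actually loading the interface:

    from collections import defaultdict

    # Made-up flow records: (source, destination, protocol, bytes).
    flows = [
        ("10.1.1.5",  "DBServer",  "legacy-app", 800_000_000),
        ("10.1.1.9",  "DBServer",  "legacy-app", 650_000_000),
        ("WebServer", "DBServer",  "sql",        120_000_000),
        ("10.1.2.7",  "WebServer", "http",        40_000_000),
    ]

    # Aggregate bytes per (source, destination, protocol).
    totals = defaultdict(int)
    for src, dst, proto, nbytes in flows:
        totals[(src, dst, proto)] += nbytes

    # Report the biggest contributors first, with their share of the traffic.
    total_bytes = sum(totals.values())
    for (src, dst, proto), nbytes in sorted(totals.items(),
                                            key=lambda kv: kv[1], reverse=True):
        share = 100.0 * nbytes / total_bytes
        print("%-10s -> %-10s %-11s %5.1f%%" % (src, dst, proto, share))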

Repair

The analysis obtained above helps determine that some nodes in the enterprise
network are interacting with the legacy application on the DB server. Since
this application is not required for the Web service, it can be moved to
another server, thereby removing the potential threat to the Web service.