Remote Monitoring

The common platform from which to monitor multivendor wireless and wired networks is the Simple Network Management Protocol’s (SNMP’s) Remote Monitoring (RMON) Management Information Base (MIB). Although a variety of SNMP MIBs collect performance statistics to provide a snapshot of events, RMON enhances this monitoring capability by keeping a past record of events that can be used for fault diagnosis, performance tuning, and network planning.

RMON works on wired, wireless and hybrid networks. Hardware- and/or software-based RMON-compliant devices (i.e., probes) placed on each network segment monitor all data packets sent and received. The probes view every packet and produce summary information on various types of packets, such as undersized packets, and events, such as packet collisions. The probes also can capture packets according to predefined criteria set by the network manager or test technician.

At any time, the RMON probe can be queried for this information by a network management application or an SNMP-based management console so that detailed analysis can be performed in an effort to pinpoint where and why an error occurred. The original Remote Network Monitoring MIB defined a framework for the remote monitoring of Ethernet. Subsequent RMON MIBs have extended this framework to Token Ring and other types of networks.

RMON Applications

A management application that views the internetwork, for example, gathers data from RMON agents running on each segment in the network. The data are integrated and correlated to produce various internetwork views that give end-to-end visibility of network traffic across both the local area network (LAN) and the wide area network (WAN).

The operator can switch between a variety of views. For example, the operator can switch between a Media Access Control (MAC) view (which shows traffic going through routers and gateways) and a network view (which shows end-to-end traffic) or can apply filters to see only traffic of a given protocol or suite of protocols. These traffic matrices provide the information necessary to configure or partition the internetwork to optimize LAN and WAN utilization.

In selecting the MAC level view, for example, the network map shows each node of a segment separately, indicating intrasegment node-to-node data traffic. It also shows total intersegment data traffic from routers and gateways. This combination allows the operator to see consolidated internetwork traffic and how each end node contributes to it. In selecting the network level view, the network map shows end-to-end data traffic between nodes across segments.

By connecting source and ultimate destination without clouding the view with routers and gateways, the operator can immediately identify specific areas contributing to an unbalanced traffic load. Another type of application allows the network manager to consolidate and present multiple segment information, configure RMON alarms, and provide complete Token Ring RMON information, as well as perform baseline measurements and long-term reporting.

Alarms can be set on any RMON variable. Notification via traps can be sent to multiple management stations. Baseline statistics allow long-term trend analysis of network traffic patterns that can be used to plan for network growth.

Ethernet Object Groups

The RMON specification consists of nine Ethernet/Token Ring groups and ten specific Token Ring RMON extensions.

Ethernet Statistics Group The Statistics Group provides segment-level statistics. These statistics show packets, octets (or bytes), broadcasts, multicasts, and collisions on the local segment, as well as the number of occurrences of packets dropped by the agent. Each statistic is maintained in its own 32-bit cumulative counter. Real-time packet size distribution is also provided.
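Because each statistic is a cumulative 32-bit counter, a management application normally works with the difference between two polls, and that difference has to allow for the counter wrapping back to zero. The following is a minimal sketch of that arithmetic; the function name is hypothetical, and it assumes the counter wrapped at most once between the two readings.

```python
COUNTER32_MODULUS = 2 ** 32  # a 32-bit counter wraps from 4294967295 back to 0


def counter_delta(previous: int, current: int) -> int:
    """Return how far a 32-bit cumulative counter advanced between two polls.

    Assumption: the counter wrapped at most once between the readings,
    which holds as long as the polling interval is short enough.
    """
    return (current - previous) % COUNTER32_MODULUS


# Ordinary case: the counter simply grew.
normal = counter_delta(100, 250)

# Wrap case: the counter passed its maximum and restarted near zero.
wrapped = counter_delta(4_294_967_290, 5)
```

The modulo operation handles both cases with one expression, so the caller does not need a separate branch for a wrapped counter.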

Ethernet History Group With the exception of packet size distribution, which is provided only on a real-time basis, the History Group provides historical views of the statistics provided in the Statistics Group. The History Group can respond to user-defined sampling intervals and bucket counters, allowing for some customization in trend analysis.

The RMON MIB comes with two defaults for trend analysis. The first provides for 50 buckets (or samples) of 30-second sampling intervals over a period of 25 minutes. The second provides for 50 buckets of 30-minute sampling intervals over a period of 25 hours. Users can modify either of these or add additional intervals to meet specific requirements for historical analysis. The sampling interval can range from 1 second to 1 hour.
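The relationship between bucket count, sampling interval, and the period covered can be checked with a short sketch. The class and function names below are hypothetical, and the model simply assumes that the oldest sample is discarded once all buckets are in use.

```python
from collections import deque


def coverage_seconds(buckets: int, interval_seconds: int) -> int:
    """Return the time span covered by a history study, in seconds."""
    if not 1 <= interval_seconds <= 3600:
        raise ValueError("sampling interval must be between 1 second and 1 hour")
    return buckets * interval_seconds


class HistoryStudy:
    """Hypothetical model of one history study: a fixed number of buckets."""

    def __init__(self, buckets: int, interval_seconds: int) -> None:
        self.coverage = coverage_seconds(buckets, interval_seconds)
        # Assumption: once every bucket is used, the oldest sample is dropped.
        self.samples = deque(maxlen=buckets)

    def record(self, sample: dict) -> None:
        self.samples.append(sample)


short_term = HistoryStudy(buckets=50, interval_seconds=30)    # 1500 s = 25 minutes
long_term = HistoryStudy(buckets=50, interval_seconds=1800)   # 90000 s = 25 hours
```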

Host Table Group The RMON MIB specifies a host table that includes node traffic statistics: packets sent and received, octets sent and received, as well as broadcasts, multicasts, and errored packets sent. In the host table, the classification “errors sent” is the combination of packet undersizes, fragments, cyclic redundancy check (CRC)/alignment errors, collisions, and oversizes sent by each node.

The RMON MIB also includes a host timetable that shows the relative order in which the agent discovered each host. This feature is useful for network management purposes and also lets the management station upload only those nodes of which it is not yet aware, which reduces unnecessary SNMP traffic on the network.

Host Top N Group The Host Top N Group extends the host table by providing sorted host statistics, such as the top 10 nodes sending packets or an ordered list of all nodes according to the errors sent over the last 24 hours. The data selected and the duration of the study are both defined at the network management station. The number of studies that can be run depends on the resources of the monitoring device.

When a set of statistics is selected for study, only the selected statistics are maintained in the Host Top N counters; other statistics over the same time intervals are not available for later study. This processing—performed remotely in the RMON MIB agent—reduces SNMP traffic on the network and the processing load on the management station, which would otherwise need to use SNMP to retrieve the entire host table for local processing.
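The kind of processing a Host Top N study performs can be sketched as follows: take one selected statistic per host at the start and end of the study, compute the change, and return the N largest. This is an illustrative model only; the function name and the dictionary representation of the host table are assumptions, not part of the RMON MIB.

```python
from typing import Dict, List, Tuple


def top_n(start: Dict[str, int], end: Dict[str, int], n: int) -> List[Tuple[str, int]]:
    """Rank hosts by how much one selected counter grew during the study.

    `start` and `end` map a host address to the counter value at the
    beginning and end of the study. Hosts first seen during the study are
    treated as starting from zero.
    """
    deltas = {host: end[host] - start.get(host, 0) for host in end}
    ranked = sorted(deltas.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]


# Example: packets sent per host at the start and end of a study.
packets_at_start = {"host-a": 10, "host-b": 5, "host-c": 0}
packets_at_end = {"host-a": 15, "host-b": 50, "host-c": 20}
busiest = top_n(packets_at_start, packets_at_end, n=2)
```

Only the short ranked list needs to cross the network, which is the saving described above compared with retrieving the whole host table.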

Alarms Group The Alarms Group provides a general mechanism for setting thresholds and sampling intervals to generate events on any counter or integer maintained by the agent, such as segment statistics, node traffic statistics defined in the host table, or any user-defined packet match counter defined in the Filters Group. Both rising and falling thresholds can be set, each of which can indicate network faults. Thresholds can be established for both the absolute value of a statistic and its delta value, enabling the manager to be notified of rapid spikes or drops in a monitored value.
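A threshold pair of this kind is usually evaluated with hysteresis, so that a value hovering near one threshold does not generate a stream of events. The sketch below is a simplified model with hypothetical names: it supports absolute and delta sampling, and it assumes that after a rising event no further rising event fires until a falling event has occurred, and vice versa. Start-up behavior for the first sample is simplified.

```python
from typing import Optional


class ThresholdAlarm:
    """Simplified rising/falling threshold alarm with hysteresis."""

    def __init__(self, rising: int, falling: int, use_delta: bool = False) -> None:
        self.rising = rising
        self.falling = falling
        self.use_delta = use_delta
        self._last_raw: Optional[int] = None
        self._armed = "both"  # which event types may fire next

    def sample(self, raw: int) -> Optional[str]:
        """Feed one reading; return "rising", "falling", or None."""
        if self.use_delta:
            if self._last_raw is None:
                self._last_raw = raw  # need two readings to form a delta
                return None
            value = raw - self._last_raw
            self._last_raw = raw
        else:
            value = raw

        if value >= self.rising and self._armed in ("both", "rising"):
            self._armed = "falling"
            return "rising"
        if value <= self.falling and self._armed in ("both", "falling"):
            self._armed = "rising"
            return "falling"
        return None


absolute_alarm = ThresholdAlarm(rising=100, falling=20)
absolute_events = [absolute_alarm.sample(v) for v in (50, 120, 130, 10, 5, 100)]

delta_alarm = ThresholdAlarm(rising=50, falling=5, use_delta=True)
delta_events = [delta_alarm.sample(v) for v in (1000, 1010, 1100, 1102)]
```

With delta sampling the alarm reacts to how fast a counter changes rather than to its total, which is what makes sudden spikes or drops visible.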

Filters Group The Filters Group provides a generic filtering engine that implements all packet capture functions and events. The packet capture buffer is filled with only those packets that match the user-specified filtering criteria. Filtering conditions can be combined using the Boolean parameters “and” or “not.” Multiple filters are combined with the Boolean “or” parameter.
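The Boolean structure described above can be illustrated with a small sketch: conditions inside one filter are combined with "and" (each optionally negated with "not"), and a packet is accepted if any one of several filters matches ("or"). The predicate-based representation and all names here are assumptions made for illustration; an actual agent matches on packet data and status bits rather than Python functions.

```python
from typing import Callable, Dict, List, Tuple

Packet = Dict[str, object]
# One condition is a predicate plus a flag saying whether to negate it.
Condition = Tuple[Callable[[Packet], bool], bool]
Filter = List[Condition]


def filter_matches(conditions: Filter, packet: Packet) -> bool:
    """All conditions in one filter must hold ("and"), honoring "not"."""
    return all(predicate(packet) != negate for predicate, negate in conditions)


def channel_accepts(filters: List[Filter], packet: Packet) -> bool:
    """A packet is accepted if any single filter matches ("or")."""
    return any(filter_matches(conditions, packet) for conditions in filters)


def is_ip(packet: Packet) -> bool:
    return packet["protocol"] == "ip"


def is_broadcast(packet: Packet) -> bool:
    return packet["dst"] == "ff:ff:ff:ff:ff:ff"


def is_undersized(packet: Packet) -> bool:
    return packet["size"] < 64


# Filter 1: IP and not broadcast. Filter 2: undersized packets.
filters = [
    [(is_ip, False), (is_broadcast, True)],
    [(is_undersized, False)],
]

unicast_ip = {"protocol": "ip", "dst": "00:11:22:33:44:55", "size": 100}
broadcast_ip = {"protocol": "ip", "dst": "ff:ff:ff:ff:ff:ff", "size": 100}
short_arp = {"protocol": "arp", "dst": "ff:ff:ff:ff:ff:ff", "size": 60}
```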

Packet Capture Group The types of packets collected depend on the Filter Group. The Packet Capture Group allows the user to create multiple capture buffers and to control whether the trace buffers will wrap (overwrite) when full or stop capturing. The user may expand or contract the size of the buffer to fit immediate needs for packet capturing rather than permanently commit memory that will not always be needed.
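The two full-buffer behaviors and the ability to resize can be modeled in a few lines. This is a hypothetical sketch that counts capacity in packets rather than octets and is not tied to any particular agent implementation.

```python
from collections import deque


class CaptureBuffer:
    """Capture buffer that either wraps (overwrites) or stops when full."""

    def __init__(self, capacity: int, wrap: bool) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.wrap = wrap
        self.packets = deque()

    def capture(self, packet: object) -> bool:
        """Store a packet; return False if it was refused because the buffer is full."""
        if len(self.packets) >= self.capacity:
            if not self.wrap:
                return False            # stop-when-full: keep the oldest packets
            self.packets.popleft()      # wrap: discard the oldest packet
        self.packets.append(packet)
        return True

    def resize(self, capacity: int) -> None:
        """Expand or contract the buffer, dropping the oldest packets if needed."""
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        while len(self.packets) > capacity:
            self.packets.popleft()


wrapping = CaptureBuffer(capacity=3, wrap=True)
stopping = CaptureBuffer(capacity=3, wrap=False)
for number in range(1, 6):
    wrapping.capture(number)
    stopping.capture(number)
```

A wrapping buffer always holds the most recent traffic, while a stopping buffer preserves the packets seen first, which suits capturing the start of a fault.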

Notifications (Events) Group In a distributed management environment, the RMON MIB agent can deliver traps to multiple management stations that share a single community name destination specified for the trap. In addition to the three traps already mentioned—rising threshold and falling threshold (see Alarms Group) and packet match (see Packet Capture Group)—seven additional traps can be specified:

  • coldStart This trap indicates that the sending protocol entity is reinitializing itself such that the agent’s configuration or the protocol entity implementation may be altered.
  • warmStart This trap indicates that the sending protocol entity is reinitializing itself such that neither the agent configuration nor the protocol entity implementation is altered.
  • linkDown This trap indicates that the sending protocol entity recognizes a failure in one of the communication links represented in the agent’s configuration.
  • linkUp This trap indicates that the sending protocol entity recognizes that one of the communication links represented in the agent’s configuration has come up.
  • authenticationFailure This trap indicates that the sending protocol entity is the addressee of a protocol message that is not properly authenticated. While implementations of the SNMP must be capable of generating this trap, they also must be capable of suppressing the emission of such traps via an implementation-specific mechanism.
  • egpNeighborLoss This trap indicates that an Exterior Gateway Protocol (EGP) neighbor for whom the sending protocol entity was an EGP peer has been marked down and the peer relationship is no longer valid.
  • enterpriseSpecific This trap indicates that the sending protocol entity recognizes that some enterprise-specific event has occurred.

The Notifications (Events) Group allows users to specify the number of events that can be sent to the monitor log. From the log, any specified event can be sent to the management station. The log includes the time of day for each event and a description of the event written by the vendor of the monitor. The log overwrites when full, so events may be lost if not uploaded to the management station periodically.
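The effect of a fixed-size log that overwrites when full can be sketched as below. The class, its overwrite counter, and the `drain` method are hypothetical illustrations of why periodic uploading matters; they are not objects defined by the RMON MIB.

```python
from collections import deque
from typing import List, Tuple

LogEntry = Tuple[float, str]  # (time of the event, description)


class EventLog:
    """Fixed-size event log in which the oldest entry is overwritten when full."""

    def __init__(self, capacity: int) -> None:
        self.entries = deque(maxlen=capacity)
        self.overwritten = 0  # events lost before they could be uploaded

    def add(self, timestamp: float, description: str) -> None:
        if len(self.entries) == self.entries.maxlen:
            self.overwritten += 1
        self.entries.append((timestamp, description))

    def drain(self) -> List[LogEntry]:
        """Simulate an upload to the management station and empty the log."""
        uploaded = list(self.entries)
        self.entries.clear()
        return uploaded


log = EventLog(capacity=2)
log.add(1.0, "rising threshold crossed")
log.add(2.0, "falling threshold crossed")
log.add(3.0, "packet match")  # the log is full, so the oldest entry is lost
uploaded = log.drain()
```

Uploading before the log fills keeps the overwrite count at zero; how often that is needed depends on the event rate and the log size configured.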

Traffic Matrix Group The RMON MIB includes a traffic matrix at the MAC layer. A traffic matrix shows the amount of traffic and number of errors between pairs of nodes—one source and one destination address per pair. For each pair, the RMON MIB maintains counters for the number of packets, number of octets, and error packets between the nodes. Users can sort this information by source or destination address.
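A traffic matrix of this kind is essentially a table keyed by source and destination address. The sketch below uses hypothetical names and assumes that an errored packet still counts toward the packet and octet totals for its pair.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (source address, destination address)


class TrafficMatrix:
    """Per-conversation counters keyed by (source, destination)."""

    def __init__(self) -> None:
        self.pairs: Dict[Pair, Dict[str, int]] = defaultdict(
            lambda: {"packets": 0, "octets": 0, "errors": 0}
        )

    def record(self, src: str, dst: str, octets: int, errored: bool = False) -> None:
        counters = self.pairs[(src, dst)]
        counters["packets"] += 1
        counters["octets"] += octets
        if errored:
            counters["errors"] += 1

    def by_source(self) -> List[Pair]:
        """Pairs ordered by source address, then destination."""
        return sorted(self.pairs)

    def by_destination(self) -> List[Pair]:
        """Pairs ordered by destination address, then source."""
        return sorted(self.pairs, key=lambda pair: (pair[1], pair[0]))


matrix = TrafficMatrix()
matrix.record("a", "b", octets=100)
matrix.record("a", "b", octets=200, errored=True)
matrix.record("b", "a", octets=64)
matrix.record("c", "a", octets=64)
```

Traffic from "a" to "b" and from "b" to "a" is kept as two separate pairs, matching the one-source, one-destination definition above.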

Applying remote monitoring and statistics-gathering capabilities to the Ethernet environment offers a number of benefits. The availability of critical networks is maximized, since remote capabilities allow for more timely resolution of problems. With the capability to resolve problems remotely, operations staff can avoid costly travel to troubleshoot problems on site. With the capability to analyze data collected at specific intervals over a long period of time, operations staff can track down intermittent problems that would normally go undetected and unresolved.