Using SAA
Problem
You want to configure the routers to automatically poll one another to collect performance statistics.
Solution
Cisco supplies a feature called the Service Assurance Agent (SAA) in IOS Version 12.0(5)T and higher, which allows the routers to automatically poll one another to collect end-to-end performance statistics:
Router1#configure terminal
Enter configuration commands, one per line. End with CNTL/Z.
Router1(config)#rtr responder
Router1(config)#rtr 10
Router1(config-rtr)#type echo protocol ipIcmpEcho 10.1.2.3
Router1(config-rtr)#tag ECHO_TEST
Router1(config-rtr)#threshold 1000
Router1(config-rtr)#frequency 300
Router1(config-rtr)#exit
Router1(config)#rtr schedule 10 life 2147483647 start-time now
Router1(config)#rtr 20
Router1(config-rtr)#type jitter dest-ipaddr 10.1.2.3 dest-port 99 num-packets 100
Router1(config-rtr)#tag JITTER_TEST
Router1(config-rtr)#frequency 300
Router1(config-rtr)#exit
Router1(config)#rtr schedule 20 life 100000 start-time now ageout 3600
Router1(config)#exit
Router1#
The target router, 10.1.2.3, which is specified as the destination in both of these tests, must be configured to respond to SAA tests as follows:
Router2#configure terminal
Enter configuration commands, one per line. End with CNTL/Z.
Router2(config)#rtr responder
Router2(config)#exit
Router2#
Discussion
The SAA feature replaces the earlier Response Time Reporter (RTR) and Round Trip Time Monitor (RTTMON) facilities, which were available in IOS Version 11.3, and uses the same basic syntax. However, where RTR includes only some simple round-trip PING and SNA tests, SAA includes several more interesting and useful features.
The first line in the example is the rtr responder command. This is required on all routers that will be taking part in SAA, including the targets of these tests. You will notice, for example, that both of the tests use a target IP address of 10.1.2.3. This destination must be another Cisco router that is also configured with the rtr responder command.
In this example, we have configured two tests. We have given the first test the arbitrary number 10 and the name ECHO_TEST. The second test is number 20, and is called JITTER_TEST. Note that you don't actually need to give your SAA tests names, but it is a good idea if you have several of them. This name, or tag, is included in the SAA SNMP MIB table for this test. So, if you intend to download the test data via SNMP for performance management purposes, it can be extremely useful to name your tests.
Let's look at both of these example tests in more detail.
The first test does an ICMP echo (PING) to the destination device, 10.1.2.3:
Router1(config)#rtr 10
Router1(config-rtr)#type echo protocol ipIcmpEcho 10.1.2.3
Router1(config-rtr)#tag ECHO_TEST
Router1(config-rtr)#threshold 1000
Router1(config-rtr)#frequency 300
Router1(config-rtr)#exit
The threshold command defines the round-trip time above which a result counts as over threshold, which in this case is set to 1,000 milliseconds. This allows you to count the number of PING tests where the round-trip time was greater than one second, in addition to keeping track of the PING times and number of PING failures, which we will show in a moment.
Next is the frequency command, which defines how often this test will be run in seconds. In this case, we want the test to run every five minutes (300 seconds).
Then, once you have defined the test in the rtr configuration block, you have to tell the router when to run it. This is done with the rtr schedule command:
Router1(config)#rtr schedule 10 life 2147483647 start-time now
This command defines the schedule for running test number 10. It sets a lifetime for this test of 2,147,483,647 seconds (a very long time), which is the maximum value. This effectively means that this test will continue to run indefinitely. It is scheduled to start immediately.
Note that when we scheduled the second test, we used slightly different parameters:
Router1(config)#rtr schedule 20 life 100000 start-time now ageout 3600
In this case, the test is scheduled to run only for 100,000 seconds, which is just under 28 hours. We have also configured an ageout value of 3,600 seconds for this test. This says that the router will keep this test rule in memory for this length of time after it expires. This allows you to restart the test if you want to, without needing to reconfigure it.
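To get a feel for the two life values used in the schedule commands, the following minimal Python sketch converts a lifetime in seconds into hours or years. The function name describe_life and the 365.25-day year are assumptions made for illustration; nothing here is part of IOS.

```python
SECONDS_PER_HOUR = 3600
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumed average year length


def describe_life(seconds):
    """Return a rough human-readable duration for an 'rtr schedule ... life' value."""
    if seconds >= SECONDS_PER_YEAR:
        return f"{seconds / SECONDS_PER_YEAR:.1f} years"
    return f"{seconds / SECONDS_PER_HOUR:.1f} hours"


# Life values taken from the two schedule commands in this recipe.
print("test 20:", describe_life(100000))
print("test 10:", describe_life(2147483647))
```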
You can view the data for the first test as follows:
Router1#show rtr operational-state 10
Current Operational State
Entry Number: 10
Modification Time: 18:51:53.000 EST Tue Dec 17 2002
Diagnostics Text:
Last Time this Entry was Reset: Never
Number of Octets in use by this Entry: 1910
Connection Loss Occurred: FALSE
Timeout Occurred: FALSE
Over Thresholds Occurred: FALSE
Number of Operations Attempted: 203
Current Seconds Left in Life: 2147483647
Operational State of Entry: active
Latest Completion Time (milliseconds): 54
Latest Operation Start Time: 11:41:53.000 EST Wed Dec 18 2002
Latest Operation Return Code: ok
Latest 10.1.2.3
In this output, you can see that the router has run this test 203 times, and that the last run took 54 milliseconds and completed successfully. Note that it doesn't give a running average PING time. However, one of the nicest features of SAA is that you can configure a network management station to download this data using SNMP, provided you have the SAA MIB loaded on your server.
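If you capture this output as text rather than polling via SNMP, a short script can pull out the fields and keep the running average that the router itself does not provide. The following Python sketch assumes the "Key: value" layout shown above; the names parse_operational_state and history_ms are hypothetical, and how you collect the text from the router is left open.

```python
def parse_operational_state(text):
    """Split 'Key: value' lines of captured output into a dictionary of strings."""
    fields = {}
    for line in text.splitlines():
        # Split at the first ": " so that timestamps such as 18:51:53 stay intact.
        key, sep, value = line.partition(": ")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


# Sample lines copied from the output above.
sample = """\
Entry Number: 10
Modification Time: 18:51:53.000 EST Tue Dec 17 2002
Number of Operations Attempted: 203
Latest Completion Time (milliseconds): 54
Latest Operation Return Code: ok
"""

state = parse_operational_state(sample)
latest_ms = int(state["Latest Completion Time (milliseconds)"])

# Hypothetical history list: a real collector would append one value per test cycle.
history_ms = [latest_ms]
running_average = sum(history_ms) / len(history_ms)
print(state["Entry Number"], latest_ms, running_average)
```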
The second test is considerably more interesting. This test measures jitter between the routers by sending a series of UDP packets and looking at the time differences between consecutive packets at both ends:
Router1(config)#rtr 20
Router1(config-rtr)#type jitter dest-ipaddr 10.1.2.3 dest-port 99 num-packets 100
Router1(config-rtr)#tag JITTER_TEST
Router1(config-rtr)#frequency 300
Router1(config-rtr)#exit
Router1(config)#rtr schedule 20 life 100000 start-time now ageout 3600
The type command defines a jitter test to the same destination IP address as the previous test. In this case, we have decided to use UDP port 99 for our test, and each test run will consist of 100 packets. The frequency command tells the router to run this test every five minutes. Here is some sample output from this test:
Router1#show rtr operational-state 20
Current Operational State
Entry Number: 20
Modification Time: 10:25:36.000 EST Wed Dec 18 2002
Diagnostics Text:
Last Time this Entry was Reset: Never
Number of Octets in use by this Entry: 1742
Number of Operations Attempted: 22
Current Seconds Left in Life: 93400
Operational State of Entry: active
Latest Operation Start Time: 12:10:36.000 EST Wed Dec 18 2002
RTT Values:
NumOfRTT: 98 RTTSum: 6063 RTTSum2: 384317
Packet Loss Values:
PacketLossSD: 0 PacketLossDS: 2
PacketOutOfSequence: 2 PacketMIA: 0 PacketLateArrival: 0
InternalError: 0 Busies: 0
Jitter Values:
MinOfPositivesSD: 4 MaxOfPositivesSD: 14
NumOfPositivesSD: 32 SumOfPositivesSD: 175 Sum2PositivesSD: 1111
MinOfNegativesSD: 1 MaxOfNegativesSD: 5
NumOfNegativesSD: 60 SumOfNegativesSD: 175 Sum2NegativesSD: 547
MinOfPositivesDS: 1 MaxOfPositivesDS: 45
NumOfPositivesDS: 20 SumOfPositivesDS: 78 Sum2PositivesDS: 2166
MinOfNegativesDS: 1 MaxOfNegativesDS: 16
NumOfNegativesDS: 21 SumOfNegativesDS: 69 Sum2NegativesDS: 693
There is clearly a lot more information in this test output. This is because measuring jitter is not a simple single variable test. What you want from a jitter measurement is to characterize the statistical distribution of packet-by-packet variation in latency in the forward and backward directions, as well as for the round trip. All of this information is here. Note that as with the SAA PING test we discussed earlier, the router only records the results of the most recent test. If you want to keep historical records, you need to poll and download the SAA MIB tables once per poll cycle.
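Because the jitter output packs several counters onto each line, a regular expression is a convenient way to extract them from captured text. The following Python sketch is one possible approach; the function name parse_counters is hypothetical, and it is intended only for the counter lines, not the header lines of the output.

```python
import re

# A name, a colon, spaces, then an integer that is followed by whitespace or end of text.
COUNTER_PATTERN = re.compile(r"(\w+):[ \t]+(\d+)(?!\S)")


def parse_counters(text):
    """Return every 'Name: integer' pair found in the text as a dict of ints."""
    return {name: int(value) for name, value in COUNTER_PATTERN.findall(text)}


# Counter lines copied from the jitter output above.
sample = """\
RTT Values:
NumOfRTT: 98 RTTSum: 6063 RTTSum2: 384317
Packet Loss Values:
PacketLossSD: 0 PacketLossDS: 2
"""

counters = parse_counters(sample)
print(counters)
```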
The first set of numbers includes the Round Trip Time (RTT) values. You can see that this sample included 98 packets. The total of all of the round trip times of all of these packets was 6,063 milliseconds, and the sum of the squares of all of these times was 384,317 milliseconds squared. These values are not very meaningful in themselves, but if you divide the RTTSum value by the number of measurements, you get the average latency for this set of packets, roughly 62 milliseconds.
Applying some simple statistics, you can use the sum of the squares to understand how the actual values are spread around this average. The mean of the squares of the round-trip times is 3,922 milliseconds squared (the sum of the squares divided by the number of samples). If you subtract the square of the average from this value and take the square root, you get the standard deviation, a statistical estimate of the variation in milliseconds. The higher this value, the greater the spread. In this case, the spread is roughly 10 milliseconds. If the round-trip times are roughly normally distributed, this means that about two-thirds of the time the round trip latency is within the range 62 ± 10 ms. Note that the ± symbol is a standard mathematical notation that, in this case, indicates a range from 52 ms (62 − 10) to 72 ms (62 + 10).
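The calculation described here can be sketched in a few lines of Python. The function name mean_and_spread is hypothetical; the three inputs are the NumOfRTT, RTTSum, and RTTSum2 values from the sample output.

```python
import math


def mean_and_spread(count, total, total_sq):
    """Return (mean, standard deviation) from a count, a sum, and a sum of squares."""
    mean = total / count
    # Mean of the squares minus the square of the mean gives the variance.
    variance = total_sq / count - mean ** 2
    # Guard against tiny negative values caused by floating-point rounding.
    return mean, math.sqrt(max(variance, 0.0))


# NumOfRTT, RTTSum, and RTTSum2 from the sample output.
mean, spread = mean_and_spread(98, 6063, 384317)
print(f"round-trip time: {mean:.1f} ± {spread:.1f} ms")
```

Because all three inputs are plain totals, the values from several polling cycles can be added together before calling the function to get statistics over a longer period.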
The next set of data records dropped packets. Recall that the sample size is 100 packets, but the NumOfRTT value is only 98. So the network must have dropped two of our test packets. SAA separately keeps track of packets lost in both directions, source to destination (PacketLossSD) and destination to source (PacketLossDS). This router is the source; the other router is the destination. So in this example, both of the lost packets happened on the way back. Notice also that the output claims that there were two out-of-sequence packets during this test, which is consistent with the number of dropped packets.
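The same bookkeeping can be expressed as a small Python check. This is only a sketch: the function name loss_summary is hypothetical, and it compares the missing round-trip measurements against the two directional loss counters, as the paragraph above does.

```python
def loss_summary(packets_sent, num_rtt, loss_sd, loss_ds):
    """Summarize packet loss for one jitter test run."""
    lost = packets_sent - num_rtt
    return {
        "lost": lost,
        "loss_percent": 100.0 * lost / packets_sent,
        # True when the directional counters explain every missing measurement.
        "matches_direction_counters": lost == loss_sd + loss_ds,
    }


# num-packets 100, NumOfRTT 98, PacketLossSD 0, PacketLossDS 2 from the sample output.
summary = loss_summary(100, 98, 0, 2)
print(summary)
```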
The next group of numbers includes the actual jitter measurements. There are two groups of numbers here. The variables that end with "SD" are measured from the source to the destination, and "DS" are for the return path. Within each of these groups there are two subgroups, one for "positives" and the other for "negatives." Positives are events where the spacing between two packets has increased since the last pair of successive packets. The "Negatives" counters record all of the times that the interpacket spacing decreased. Now let's look a little bit more closely at one set of values:
MinOfPositivesSD: 4 MaxOfPositivesSD: 14
NumOfPositivesSD: 32 SumOfPositivesSD: 175 Sum2PositivesSD: 1111
This says that of the 100 packets the router sent in this polling interval, there were 32 cases when the jitter in the forward direction had a positive value. Of these, the largest value was 14 milliseconds, and the smallest was 4 milliseconds. Then we can use the sum and the sum of the squares to calculate the average and spread of values in precisely the same way as we did to calculate the average latency a moment ago. The result here is a positive jitter in this direction of 5.5 ± 2.2 ms, that is, a mean of 5.5 ms with a spread of 2.2 ms.
Applying this same technique to the other jitter measurements gives the following statistics. The negative jitter from source to destination was 2.9 ± 0.8 ms, with a maximum of 5 ms and a minimum value of 1 ms. In the other direction, the positive jitter was 3.9 ± 9.6 ms, and the negative jitter was 3.3 ± 4.7 ms. These last two values might look a little bit funny because the spread is larger than the mean. This is not a sign of a problem, though, because the output also shows that the maximum positive jitter in this direction was 45 ms, and 16 ms for negative jitter. So clearly, the spread is very large, but the mean jitter values are relatively small. This is a fairly typical result.
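The four jitter figures quoted above can be reproduced from the counters in the sample output with a short Python sketch. The names JITTER and mean_and_spread are hypothetical; each tuple holds the Num, Sum, and Sum2 values for one direction and sign.

```python
import math

# (count, sum, sum of squares) copied from the jitter section of the sample output.
JITTER = {
    "positive SD": (32, 175, 1111),
    "negative SD": (60, 175, 547),
    "positive DS": (20, 78, 2166),
    "negative DS": (21, 69, 693),
}


def mean_and_spread(count, total, total_sq):
    """Return (mean, standard deviation) from a count, a sum, and a sum of squares."""
    mean = total / count
    variance = total_sq / count - mean ** 2
    # Guard against tiny negative values caused by floating-point rounding.
    return mean, math.sqrt(max(variance, 0.0))


results = {name: mean_and_spread(*values) for name, values in JITTER.items()}
for name, (mean, spread) in results.items():
    print(f"{name}: {mean:.1f} ± {spread:.1f} ms")
```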