Monitoring

As you have seen, failover can be triggered either because one FWSM unit fails or because the standby unit becomes healthier than the active unit. An FWSM unit's health depends on the health of its interfaces. So to detect unit failure, or to decide which unit is healthier, both unit monitoring and interface monitoring need to be performed, as follows:

  • Unit Monitoring: Each FWSM monitors the health of its failover peer by sending Hello messages to the other unit. If a unit has not received a Hello message from the other unit for 30 seconds, it performs the following test:

    - Send ARP requests on the failover interface and on each Firewall interface

    If a reply is heard on a Firewall interface but not on the failover interface, no failover takes place, and the Active/Standby state remains as it is. If no replies are heard on any of the Firewall interfaces or on the failover interface, the other unit is marked as "Failed", and this unit becomes Active if it is not already. Conditions that can cause this type of failure include:

    - Removal of peer FWSM module

    - Reboot of peer FWSM module

    - Removal or failure of the physical medium carrying failover interface and Firewall interface traffic for FWSMs in separate chassis

    - Shutdown of failover interface and Firewall interface VLANs

    - Traffic overload on the failover interface and Firewall interfaces causing packet loss
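    The decision described above can be sketched in Python as follows. This is a minimal, hypothetical model, not FWSM code: the function name, its arguments, and the returned strings are illustrative assumptions. It also assumes that a reply heard only on the failover interface keeps the peer from being declared Failed, a case the text does not spell out.

```python
from typing import Sequence


def unit_monitor_verdict(failover_reply: bool, firewall_replies: Sequence[bool]) -> str:
    """Hypothetical model of the test run after 30 seconds without a Hello message.

    failover_reply:   True if an ARP reply was heard on the failover interface.
    firewall_replies: one flag per Firewall interface, True if a reply was heard there.
    """
    if failover_reply or any(firewall_replies):
        # The peer still answers on at least one interface, so no failover
        # takes place and the Active/Standby state stays as it is.
        # (Treating a failover-interface-only reply this way is an assumption.)
        return "no change"
    # No reply anywhere: the peer is marked Failed and this unit becomes Active.
    return "peer failed"
```

    For example, unit_monitor_verdict(False, [True, False, False]) returns "no change", while unit_monitor_verdict(False, [False, False, False]) returns "peer failed".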

  • Interface Monitoring: Each unit monitors the health of its own Firewall interfaces and those of its failover peer by sending Hello messages on the Firewall interfaces. If a unit does not receive any messages on a particular interface for three consecutive poll intervals, it runs the following tests on that interface for 30 seconds:

    1. Check the link status of that interface

    2. Check for any incoming traffic on that interface

    3. Send ARP requests to the most recently learned hosts (up to two) on that interface

    4. Send a broadcast ping to the interface's subnet

    If all of these tests fail, and the corresponding interface on the other FWSM is receiving traffic or is able to ARP or ping a host on that interface, this unit's interface is marked as Failed.
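    The interface tests and the final decision can be modeled as below. This is an illustrative sketch under stated assumptions: the class, field, and function names are hypothetical, and each test is reduced to a single pass/fail flag, whereas the module actually runs the tests over the 30-second window.

```python
from dataclasses import dataclass

MISSED_POLLS_BEFORE_TESTING = 3  # three consecutive poll intervals without messages


@dataclass
class InterfaceTestResults:
    """Hypothetical pass/fail outcome of each test on one local interface."""
    link_up: bool       # 1. link status of the interface
    saw_incoming: bool  # 2. any incoming traffic on the interface
    arp_reply: bool     # 3. reply from one of the (up to two) most recently learned hosts
    ping_reply: bool    # 4. reply to the broadcast ping on the interface's subnet


def should_start_tests(missed_polls: int) -> bool:
    """Testing starts once no messages have arrived for three consecutive poll intervals."""
    return missed_polls >= MISSED_POLLS_BEFORE_TESTING


def local_interface_failed(results: InterfaceTestResults, peer_interface_ok: bool) -> bool:
    """Mark the local interface Failed only if every local test failed while the
    peer's corresponding interface is receiving traffic or can ARP/ping a host."""
    any_test_passed = (results.link_up or results.saw_incoming
                       or results.arp_reply or results.ping_reply)
    return not any_test_passed and peer_interface_ok
```

    With every test failing, local_interface_failed returns True only when peer_interface_ok is True; if the peer's interface is not working either, the local interface is not marked Failed, because the rule requires the peer's side to be healthy.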

    Remember that in FWSM version 1.x, if a unit finds that it has fewer than half as many healthy interfaces as the other unit, it marks itself as Failed. The Standby takes over if it determines that the Active has failed. This is referred to as the 50 percent rule.

    The biggest drawback of the 50 percent rule is that the failure of an important link does not necessarily trigger a unit failover. To understand this point, look at an example. Assume that the FWSM failover pair has three interfaces: inside, outside, and DMZ. If the outside link goes down, you lose all connectivity to the outside world. Although this is an important link, the Primary unit, which is currently active, does not fail over: it still has two healthy interfaces, which is not fewer than half of the three healthy interfaces on the Secondary unit, even though the Secondary unit is healthier. So you need to rely on Layer 1 or Layer 2 redundancy schemes, such as EtherChannel and the Spanning Tree Protocol, to avoid this type of failure.
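    The 50 percent rule reduces to a simple comparison, sketched here in Python using the three-interface example. The function name is hypothetical; the comparison is done in integers (twice the local count against the peer's count) to avoid fractional halves.

```python
def fails_fifty_percent_rule(my_healthy: int, peer_healthy: int) -> bool:
    """FWSM 1.x rule as described in the text: a unit marks itself Failed
    when it has fewer than half as many healthy interfaces as its peer."""
    # my_healthy < peer_healthy / 2, rewritten to stay in integer arithmetic
    return 2 * my_healthy < peer_healthy


# Three-interface example (inside, outside, DMZ) with the outside link down:
active_healthy = 2   # inside and DMZ still up on the Primary (active) unit
standby_healthy = 3  # all three interfaces up on the Secondary unit
print(fails_fifty_percent_rule(active_healthy, standby_healthy))  # False: 4 < 3 does not hold
```

    Only when the active unit is down to a single healthy interface (2 < 3) does the rule mark it Failed, which is why losing the outside link alone does not cause a failover.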

    To address this shortcoming of the 50 percent rule, FWSM release 2.x introduces the capability to modify the fixed 50 percent rule that applies to the 1.1(x) release. This is achieved using interface tracking. You can designate monitored interfaces across contexts. Each time a monitored interface fails, a counter (N) is incremented. N is compared against a global threshold (M) that the user specifies. Whenever N exceeds M, a module failover is triggered. Using this capability, a module failover can be initiated when, for instance, all the interfaces of a context fail, or when the interface leading to the ISP router fails. This feature is available in both single and multiple context modes.
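    The interface-tracking comparison can be sketched the same way. This is a hypothetical model, not FWSM configuration syntax: the class name, the context/interface labels, and the choice to count each monitored interface at most once are assumptions. The failover condition follows the text (N exceeds M).

```python
from typing import Iterable, Set


class InterfaceTracker:
    """Hypothetical model of FWSM 2.x interface tracking."""

    def __init__(self, monitored: Iterable[str], threshold_m: int) -> None:
        self.monitored: Set[str] = set(monitored)  # interfaces designated for monitoring, across contexts
        self.threshold_m = threshold_m             # user-specified value M
        self.failed: Set[str] = set()              # monitored interfaces that have failed

    @property
    def n(self) -> int:
        """Counter N: number of failed monitored interfaces."""
        return len(self.failed)

    def report_failure(self, interface: str) -> bool:
        """Record an interface failure; return True if a module failover is triggered."""
        if interface in self.monitored:  # failures on unmonitored interfaces are ignored
            self.failed.add(interface)   # a set, so a repeated report does not count twice
        return self.n > self.threshold_m


tracker = InterfaceTracker(["ctx1/inside", "ctx1/outside", "ctx2/outside"], threshold_m=1)
print(tracker.report_failure("ctx1/outside"))  # False: N = 1 does not exceed M = 1
print(tracker.report_failure("ctx1/inside"))   # True: N = 2 exceeds M = 1
```

    With threshold_m=0, the failure of a single monitored interface, such as the one leading to the ISP router, would be enough to trigger a module failover in this model.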

    Interface failure might occur for one of the following reasons:

    - The VLAN interface is shut down or the VLAN is cleared from the switch.

    - The ports or the cable carrying the interface's VLAN between a pair of FWSMs in separate chassis are removed or become faulty.

    - The interface is overloaded with traffic and experiencing packet drops.

    - No failover IP address is configured.

    Interface failure can be determined by using either the show failover or the show interface command.