Problem-Solving Model
When troubleshooting a production network problem, it is best to use a systematic approach. It is not uncommon to see a network security troubleshooter take an unsystematic approach, which might work sometimes but can introduce other problems into the network. Moreover, troubleshooting takes an enormous amount of time unless you know the product and technology and work systematically.
This section explains the systematic troubleshooting approach that you can use to help isolate your problems.
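Step 1. Define the problem.
Step 2. Gather the facts.
Step 3. Consider possible problems.
Step 4. Create an action plan.
Step 5. Implement the action plan.
Step 6. Observe results.
Step 7. Repeat if necessary.
Step 8. Document the changes.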
The sections that follow provide additional information about each of these steps.
Step 1: Define the Problem
This is the first step in the process. Based on the information available, you must define your problem precisely, in terms of a set of symptoms and potential causes. Every problem definition should name an element (for example, PIX firewall, translation, connection, and so on) and include one or two lines outlining the problem statement. Sometimes you have to deal with multiple problems. In that case, it is important to define each problem individually and address each one in order of priority. This is especially important because without solving the problem of highest priority, you might not be able to address the next issue.
For instance, users might complain that the PIX firewall is not allowing their traffic from the inside to the Internet. There could be multiple reasons for this. It could be that the translation required for the connection is not being built, which in turn results in an unsuccessful connection across the PIX firewall. Or it could be that the translation is built but the connection is not. Under those circumstances, you might define two problem statements. The first could be this: "The PIX is not building up translation." The second could be this: "The PIX is not building up a connection." Now you need to decide which is more important. Without translation, there are no connections, so you need to work on translation first, because resolving translation may also resolve the connection. As this example shows, setting priorities is important.
To properly analyze the problem, identify the general symptoms, and then ascertain what kinds of problems (causes) could result in these symptoms. The problem statement should follow this format: What is wrong with what?
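The translation-before-connection ordering in this example can be written down as data. The following is only a minimal sketch: the `ProblemStatement` class, its fields, and the numeric priority scale are hypothetical names chosen for illustration, not part of any Cisco tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProblemStatement:
    element: str   # what is affected, for example "translation"
    symptom: str   # what is wrong with it
    priority: int  # hypothetical scale: 1 is addressed first

    def describe(self) -> str:
        # Follows the "What is wrong with what?" format.
        return f"{self.element}: {self.symptom}"


def in_priority_order(problems):
    """Return the problems sorted so the lowest priority number comes first."""
    return sorted(problems, key=lambda p: p.priority)


problems = [
    ProblemStatement("connection", "The PIX is not building up a connection", 2),
    ProblemStatement("translation", "The PIX is not building up translation", 1),
]

for p in in_priority_order(problems):
    print(p.priority, p.describe())
```

Keeping the priority explicit makes the dependency visible: the connection problem is only examined after the translation problem it depends on.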
Here are some of the questions that will help you define a good problem statement:
1. Identify the concerns end users have by asking these questions:
2. Once you have answers to these preliminary questions, you might need to prioritize the problem by answering the following questions:
3. After going through steps 1 and 2, you should be able to define the problems and prioritize them if needed. The problem statement can now be narrowed to the following: "Translation is not getting built up on the PIX firewall for inside host x."
Step 2: Gather the Facts
Gathering facts helps you isolate the possible causes. Take the following actions to uncover the facts:
1. What are the reported problems? To find out the problems reported, take the following steps:
2. Where is the problem reported? Once you identify the symptoms of the problem, you need to find out exactly where it is reported. This concerns both the location of the device and the problems on the device. The following questions, along with a topology diagram of the network, will help you find information about where the problem is reported:
3. When is the problem reported? It is very important to know when the problem is reported, because this might help identify what changed at the time the problem occurred. The following question will help in getting this information:
4. What is the scope of the problem? Understanding the magnitude of the problem is important. The following questions will assist you in identifying the magnitude of the problem:
5. After collecting as many facts as possible, use the baseline information (configuration, statistics, and so on) to find out what has changed in terms of configurations and statistics. For example, you might have baselined the number of Port Address Translation (PAT) translations at 30K during a busy hour. If you find that the translation count exceeds 30K at any given time, this could indicate a potential problem.
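The baseline comparison described above amounts to checking each observed statistic against its recorded value. Here is a minimal sketch, assuming the statistics have already been collected into dictionaries. The metric name `pat_translations` and the observed value are hypothetical; only the 30K baseline comes from the text.

```python
def exceeds_baseline(observed: dict, baseline: dict) -> dict:
    """Return the metrics whose observed value is above the recorded baseline."""
    return {
        name: (value, baseline[name])
        for name, value in observed.items()
        if name in baseline and value > baseline[name]
    }


baseline = {"pat_translations": 30_000}  # busy-hour baseline from the text
observed = {"pat_translations": 34_500}  # hypothetical reading

flagged = exceeds_baseline(observed, baseline)
for name, (value, limit) in flagged.items():
    print(f"{name}: observed {value}, baseline {limit}")
```

A flagged metric is not proof of a fault. It is one more fact that narrows where to look next.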
Step 3: Consider Possible Problems
If you do a good job in fact-finding, know your network well, and have topology and baseline information on hand, this step is easier. In other words, the success or failure of this step depends heavily upon the previous steps.
Using the facts, you can eliminate some of the potential problems from the list you defined in Step 1. Depending on the data, for example, you might be able to identify whether a problem involves software or configuration. At every opportunity, try to decrease the number of potential problems, so that you can create an efficient plan of action, which is discussed next.
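One way to picture this elimination is as a filter over the candidate list. The sketch below is an illustration under assumptions: the candidate causes, the facts, and the `ruled_out_by` field are all hypothetical, and in practice deciding which fact rules out which cause takes product knowledge.

```python
def remaining_causes(candidates, facts):
    """Keep only the candidate causes that none of the gathered facts rule out."""
    return [
        c["cause"]
        for c in candidates
        if c["ruled_out_by"].isdisjoint(facts)
    ]


# Hypothetical candidates, each with the facts that would eliminate it.
candidates = [
    {"cause": "translation not configured for host x",
     "ruled_out_by": {"translation entry exists for host x"}},
    {"cause": "access list blocks the traffic",
     "ruled_out_by": {"access list permits the traffic"}},
]

# Hypothetical fact gathered in Step 2.
facts = {"access list permits the traffic"}

print(remaining_causes(candidates, facts))
```

Each new fact can only shrink the list, which is what keeps the action plan in the next step short.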
Step 4: Create an Action Plan
Based on the remaining potential problems deduced from the previous step, prioritize the issues. Then start making the changes one by one, based on the list you have created, with the highest priority first. Working with only one variable at a time enables you to reproduce a given solution to a specific problem. If you alter more than one variable simultaneously, you might solve the problem, but identifying the specific change that eliminated the symptom becomes far more difficult, which will not help you solve the same problem if it occurs again.
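The one-variable-at-a-time rule can be sketched as a loop over the prioritized plan. This is a simulation under stated assumptions, not a procedure from the text: the `apply`, `revert`, and `symptom_present` callbacks and the change names are hypothetical, and reverting a change that did not help is an added assumption that keeps each test limited to a single variable.

```python
def try_changes_one_at_a_time(changes, apply, revert, symptom_present):
    """Apply changes in priority order, one at a time.

    Returns the first change after which the symptom is gone,
    or None if no change in the plan resolves it.
    """
    for change in changes:
        apply(change)
        if not symptom_present():
            return change
        revert(change)  # undo so the next test still varies only one thing
    return None


# Simulated network state: the set of changes currently in effect.
state = set()


def apply(change):
    state.add(change)


def revert(change):
    state.discard(change)


def symptom_present():
    # Hypothetical: only this one change removes the symptom.
    return "add missing translation rule" not in state


plan = [
    "clear stale translations",
    "add missing translation rule",
    "upgrade software",
]
fix = try_changes_one_at_a_time(plan, apply, revert, symptom_present)
print(fix)
```

Because only one change is in effect when the symptom disappears, the loop identifies exactly which change fixed the problem.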
Step 5: Implement the Action Plan
Perform each step carefully while testing to see whether the symptom disappears.
Step 6: Observe Results
Whenever you change a variable, be sure to gather results. Generally, you should use the same method of gathering facts that you used in Step 2 (that is, working with the key people affected, in conjunction with using your diagnostic tools).
Step 7: Repeat if Necessary
Analyze the results to determine whether the problem has been resolved. If it has not, create an action plan based on the next most likely problem in your list. Return to Step 4, change one variable at a time, and repeat the process until the problem is solved.
Step 8: Document the Changes
Once the problem is resolved, be sure to document the changes you made. This step is very important and is often ignored. Every change you make to the network has the potential to create another problem (although this does not often happen), so if you have documented your changes, you can always refer back to them. This process also builds a good knowledge base for others in the department who were not involved in the specific troubleshooting that you performed.
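A change record does not need special tooling. Below is a minimal sketch of a structured log entry that others in the department could search later. The `ChangeRecord` fields, the device name, and the change description are hypothetical placeholders, and JSON is just one possible format.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class ChangeRecord:
    device: str     # hypothetical device name
    problem: str    # the problem statement from Step 1
    change: str     # what was changed
    resolved: bool  # whether the symptom disappeared afterward


def to_log_line(record: ChangeRecord) -> str:
    """Serialize one change as a single JSON line for a shared log."""
    return json.dumps(asdict(record), sort_keys=True)


record = ChangeRecord(
    device="pix-edge-1",
    problem="Translation is not getting built up on the PIX firewall for inside host x",
    change="hypothetical change description",
    resolved=True,
)
line = to_log_line(record)
print(line)
```

Recording the problem statement next to the change ties the fix back to the symptom it addressed, which is what makes the log useful as a knowledge base.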