Archive for April 2011

State the problem, specify the problem

Anyone working in a services / performance tuning role will quickly understand the value of the title.

For any problem solving to start, it’s critical that you have a clear idea of what you think the problem actually is. That might sound obvious, but it’s very easy to get lost in someone’s description of what’s happening.

Imagine the scene where you get a phone call and someone says “you need to fix the network, it seems broken…”

What does that sentence really mean? You won’t find out until you start asking sensible questions about what the stakeholder is seeing. If the user is on a 10/100 Mbit link and is seeing about 10-12 MB per second, then that’s probably about as fast as it’s going to go (note Mbit vs MB: 100 Mbit per second is at most 12.5 MB per second). To get over this “problem” we have to make a decision: compress the files, use a different machine, and so on.
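The Mbit vs MB arithmetic can be sketched in a few lines of Python. This is only a rough illustration: the function names and the 80% "near the ceiling" threshold are my own assumptions, and real links lose a little more to protocol overhead.

```python
def link_ceiling_mb_per_s(link_mbit_per_s: float) -> float:
    """Theoretical maximum throughput in megabytes per second (8 bits per byte)."""
    return link_mbit_per_s / 8


def is_near_ceiling(observed_mb_per_s: float, link_mbit_per_s: float,
                    threshold: float = 0.8) -> bool:
    """True if the observed rate is at least `threshold` of the link's ceiling.

    The 0.8 default is an arbitrary assumption, not a standard figure.
    """
    return observed_mb_per_s >= threshold * link_ceiling_mb_per_s(link_mbit_per_s)


# A 100 Mbit link tops out at 12.5 MB per second.
print(link_ceiling_mb_per_s(100))

# A user seeing 11 MB per second on that link is already close to the limit.
print(is_near_ceiling(11, 100))
```

If the second call prints True, the network is not “broken”; the link is simply full, and the conversation moves on to what to do about that.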

Until you know the root cause of the problem, you can’t suggest a fix. You can’t even mitigate the issue.

Let’s say the above network was using an ACME switch, known to not be fully non-blocking. If you don’t know how much bandwidth is in use, can you say for sure that swapping out the switch for a brand new one (of the same model) is going to fix it? No. If you did that, you’d be swapping out a perfectly OK switch for a brand new one that is going to show exactly the same behaviour.

Below is a video of a fictitious, but fairly realistic, scenario. A user calls up and complains about the website being “down”. The webserver seems to be working, but the user insists that an action be taken. The analyst doesn’t take the time to determine what problem the user is actually seeing, and instead simply complies with the request for action. Watch the video to see the rest!

Jumping to cause broke the website completely. That could have been averted if the analyst had stuck to his guns and produced a proper problem statement and description.

Keep in mind that the true root cause will explain the symptoms that are seen. So when someone makes a suggestion about what the cause could be, test it against the problem specification and see whether it accounts for every symptom.