Tuesday, October 02, 2007

Will there be a claxon when it's time to panic?

When asked for the gist of what we do in Engineering, I generally have to think about it, because the answer differs from situation to situation. In some cases it's about technology and how it is applied. In some cases it's about ramp up plans and safe rates of growth. In some cases it's about algorithms built for scalability and consistency versus short term function. At the heart of all of these things, though, is Risk Management.

Risk Management is an art and not a science. If it were easy to tell everything that would go wrong, then nothing ever would. Since we don't have perfect information, there is a balance to strike between safety and progress. We need to look at the risk and its potential cost (PLOP factor), weigh both against previous experience, and make a call that will later be judged a good call or a bad call. Or... if everything does what it is supposed to, it's a decision that just fades into the background.

So when you encounter a risk, what do you do with it? It actually boils down to a few simple choices. A Cutter article a while ago laid out a basic framework that I wrote on a sticky and refer to now and then.

  1. Accept it - It's a risk. It's understood. There is not much you can do about it so move on with life and be prepared if it happens.

  2. Avoid it - Sometimes a risk, once found, can simply be avoided. The example I think of here is a volume test scheduled in a time window that overlaps with a system change. Move one of them and the risk goes away.

  3. Transfer it - This is the "get someone else to do it" approach. For it to work, the other party needs to be aware that they are getting the risk (no email volleys please). It makes sense when someone else is better qualified, or is in the business of handling the type of thing you are dealing with. It may cost money, but it mitigates the risk.

  4. Reduce it - This approach is commonly used when it is a risk we have to face and work through, and we can't transfer it or otherwise avoid it. A good example is a ramp up plan that is overly optimistic or doesn't account for transition. We mitigate this risk by slowing the ramp down so that the problems we do hit are smaller.
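
For those who think better in code, the four choices above could be sketched as a tiny risk register. This is only an illustration under my own assumptions: the names `Response`, `Risk`, `undecided` and `bad_transfers`, the `owner` field, and the sample entries are all hypothetical and do not come from the Cutter article.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Response(Enum):
    ACCEPT = "accept"      # understood; be prepared if it happens
    AVOID = "avoid"        # change the plan so the risk cannot occur
    TRANSFER = "transfer"  # hand it to a party that knows they are getting it
    REDUCE = "reduce"      # make the problem smaller, e.g. slow the ramp


@dataclass
class Risk:
    description: str
    response: Optional[Response] = None  # None means nobody has decided yet
    owner: Optional[str] = None          # hypothetical field: who holds the risk


def undecided(risks):
    """Return risks that have no conscious response recorded."""
    return [r for r in risks if r.response is None]


def bad_transfers(risks):
    """Return transferred risks with no named party to receive them."""
    return [r for r in risks
            if r.response is Response.TRANSFER and not r.owner]


# Made-up sample entries, loosely based on the examples in the list above.
register = [
    Risk("Volume test overlaps a system change", Response.AVOID),
    Risk("Ramp up plan is overly optimistic", Response.REDUCE),
    Risk("Specialist work outside our expertise", Response.TRANSFER),
    Risk("Short term algorithm may not scale"),
]

for r in undecided(register):
    print("No decision yet:", r.description)
for r in bad_transfers(register):
    print("Transferred to nobody:", r.description)
```

The two checks are the part that matters: a risk with no recorded response has not been consciously decided, and a transfer with no named owner is just an email volley.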

While it is not my own list, I have found that it provides a nicely structured way to think through risks and what you need to do about them. If nothing else, it helps us acknowledge that there are risks. Even if we choose not to do anything, that needs to be a conscious choice.
