How to deal with issues in production – the short version
Okay, imagine the following scenario: You’re happily going about your day, coding away, without a worry in the world, and then this hits you: “There is something wrong with the production app” (well, maybe not this exact message, but you get the idea). What do you do?! How do you go about fixing the problem and getting back to your everyday life?
1. Calm down
Yes, things are bad at the moment. However, this is not the time to lose your head and run around with your hair on fire. You need to have a clear mind in order to focus on the problem and easily find the solution. Go to your happy place and come back when you are ready to get to work.
2. Calm down some more
If you are anything like me then you probably are not calm enough to think clearly. You should calm down some more.
3. Investigate the issue and understand it
OK, now we are ready to start doing something about the problem. The first step is to understand the root cause of the issue. Maybe someone deployed something they shouldn’t have, maybe the data became corrupted, or maybe, and this only happens once in a blue moon, it is just a misunderstanding and there is no issue at all, in which case: Congratulations, you solved it.
At the end of this stage you should be able to at least answer the following question: When exactly does this happen?
4. Estimate the impact
Now that we know when the issue occurs, we can start thinking about whom it affects. A quick and dirty estimate should be enough. Some useful classifications are: critical (everyone is affected and everything is broken), medium (there is some impact, but the project will probably survive), low (this issue affects only a handful of clients/users), minimal (yes, there is an issue, but unless someone looks really, really hard they’re going to miss it).
Based on the impact of the issue you need to decide if you should fix this now, or later.
At the end of this stage you should be able to at least answer the following question: How bad is it?
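If it helps to make the classification above concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the function names, the 90% and 5% thresholds, the `easy_to_notice` flag, and the "fix now" rule are not part of any standard, so adjust them to whatever makes sense for your project.

```python
from enum import Enum


class Impact(Enum):
    CRITICAL = "critical"  # everyone is affected and everything is broken
    MEDIUM = "medium"      # some impact, but the project will probably survive
    LOW = "low"            # only a handful of clients/users are affected
    MINIMAL = "minimal"    # nobody notices unless they look really, really hard


def estimate_impact(affected_users: int, total_users: int,
                    easy_to_notice: bool = True) -> Impact:
    """Quick and dirty estimate; the thresholds are illustrative assumptions."""
    if total_users <= 0:
        raise ValueError("total_users must be positive")
    if not easy_to_notice:
        return Impact.MINIMAL
    share = affected_users / total_users
    if share >= 0.9:   # assumed cut-off for "everyone"
        return Impact.CRITICAL
    if share >= 0.05:  # assumed cut-off for "more than a handful"
        return Impact.MEDIUM
    return Impact.LOW


def fix_now(impact: Impact) -> bool:
    """Assumed policy: critical and medium issues get fixed right away."""
    return impact in (Impact.CRITICAL, Impact.MEDIUM)
```

Even a rough rule like this forces you to write down how many users are affected before deciding whether to drop everything.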
5. Calm down some more
OK, things are starting to clear up a bit now. You know what the problem is and how bad it is. Take a deep breath and decide when this needs to be fixed: either now or later.
6. Track down when the issue was introduced
If you use a version control system (if you don’t, you really, really should), use it to track down the commit that caused the issue. Look through the code and use a sandbox environment (if one is available) to figure that out.
At the end of this stage you should be able to answer the following question: What is the exact cause of the issue?
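One common way to track down the offending commit is a binary search over the history. Git, for example, ships a `git bisect` command that automates this. The Python sketch below only illustrates the idea and is not a replacement for that tool. The `first_bad_commit` function, the `is_bad` check and the commit names are hypothetical, and the sketch assumes that the commits are ordered from oldest to newest, that the newest one is broken, and that the issue stays in every commit after the one that introduced it.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str],
                     is_bad: Callable[[str], bool]) -> str:
    """Return the earliest commit for which is_bad() returns True.

    Assumes commits are ordered oldest to newest, the last commit is bad,
    and every commit after the first bad one is bad as well.
    """
    if not commits:
        raise ValueError("no commits to search")
    lo, hi = 0, len(commits) - 1  # invariant: commits[hi] is bad
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid       # the culprit is at mid or earlier
        else:
            lo = mid + 1   # the culprit is after mid
    return commits[lo]


# Hypothetical history; in real life is_bad() would check out the commit
# in the sandbox environment and try to reproduce the issue.
history = ["a1", "b2", "c3", "d4", "e5"]
broken_since = history.index("c3")
culprit = first_bad_commit(history, lambda c: history.index(c) >= broken_since)
print(culprit)  # prints "c3"
```

With this approach the number of commits you have to test grows with the logarithm of the history length, so even a thousand suspect commits take about ten checks.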
7. Fix the issue
Now that you know what causes the issue, you are in the best position to start fixing it. So go do it: be a hero and save the day.
Congratulations!! You fixed it! You can go back to your happy life – crisis averted.