On-call done right: how even a developer can help
It is another night on call, and a customer is reporting a problem with one of your key services. The logs and the monitoring systems tell you nothing, so it is time to wake up one of the on-call developers. You can already predict what they are going to say: that they “don’t see anything on their end” and that they “can’t understand why you woke them up, again, for what is clearly a problem on your end”. Same old, same old… You have to remember, though, that while you have all the tools at your disposal, the developers don’t. They wrote and pushed the code, and at that point handed the responsibility over to you, so they don’t have the same context you do. This talk discusses this gap and what we can do to close it.
Tom is a developer advocate at Lightrun, where he works on re-shaping what production observability looks like. Tom was previously a site reliability engineer for a distributed systems startup, teaches technological prototyping for creatives at a local college’s media lab, and is an avid explainer of all things tech. You can follow him @TomGranot on Twitter.