Keeping Application Support Human
You’ve built it; now you need to keep your users happy. Doing this without sacrificing your own happiness, or that of your teammates, takes planning. The intersection of monitoring and bother-the-humans is central to team happiness. We will go over this intersection and provide ways to navigate it humanely while keeping your users happy.
Any product with live users needs some degree of continual monitoring for trouble, from infrastructure wobbles and usage spikes to harassment campaigns that violate the code of conduct. Balancing these 24/7 needs with a healthy work/life balance for the team should be a priority. The Operations team is not the only one facing this; the teams who answer the phones and email when problems arise face the same challenges.
Large companies may be able to afford follow-the-sun support, where all reported problems arrive in someone’s 9-5 day shift; smaller companies often have their employees in closely grouped timezones and have to resort to on-call schedules to handle out-of-hours problems. Being lazy about this often leads to poor customer success, as well as poor employee satisfaction. Being intentional makes everyone happier.
In this talk, we will cover:
- Analyzing the risk for out-of-hours problems.
- Deciding if you even need on-call.
- Using the risk analysis to determine your call-rate probability curve.
- Reviewing how call frequency impacts responder quality of life.
- Picking which monitoring-system alerts will bother a human, and which can wait until morning.
- Implementing feedback processes so you can maintain the humane nature of your off-hours work.
culture, monitoring, on-call
I am new on the scene, with three talks in 2016 at RailsConf, DevOpsDays Minneapolis, and LISA.
Jamie Riedesel is a DevOps Engineer at HelloSign and has been performing acts of systems administration and engineering since 1997. She moved from corporate IT to the startup space in 2010 and experienced the good kind of culture shock. Jamie has been blogging as sysadmin1138 since 2004, has been a community-elected moderator on ServerFault since 2010, and was awarded the Chuck Yerkes community award by LOPSA in 2015.