A day in the life of Facebook Operations
A look at the tools and practices used at Facebook to support the #2 site in the world.
Facebook is now the #2 global website, responsible for billions of photos, conversations, and interactions between people all around the world, running on tens of thousands of servers spread across multiple geographically separated datacenters. When problems arise in the infrastructure behind the scenes, they directly affect people's ability to connect and share with those they care about around the world.
Facebook’s Technical Operations team has to balance this need for constant availability with a fast-moving and experimental engineering culture. We release code every day. Additionally, we are supporting exponential user growth while still managing an exceptionally high ratio of users per employee within engineering and operations.
This talk will go into how Facebook is “run” day-to-day, with particular focus on the actual tools in use (configuration management systems, monitoring, automation, etc.), how we detect anomalies and respond to them, and the processes we use internally for rapidly pushing out changes while still keeping a handle on site stability.
Tom is a Systems Engineer on the Technical Operations team at Facebook, where he is responsible for a variety of low-level services and systems within the production environment. During his time at Facebook, the systems footprint has expanded over 10×. Prior to joining the company, Tom worked for a number of smaller tech companies in Texas.