Welcome to MesosCon 2018, which will be held at The Village (969 Market St, San Francisco) from November 5th to 7th, bringing together users and developers to share and learn about the project and its growing ecosystem.

Monday, November 5 • 1:00pm - 1:40pm
Shipping Reliably at Scale

Apache Mesos and Apache Aurora enable engineering teams at Twitter to run production services at scale, without the pain associated with traditional management of bare-metal hardware. Within Ads Serving, we face the interesting challenge of running (deploying, operating, and staffing on-call for) production systems that contain code from many teams. Others depend on us to ship their features, and we carry the pager for their code in addition to ours.

Historically, we have invested in release-testing tooling to provide signal on the viability of a future deployment. While this was helpful, it ultimately had gaps in coverage that resulted in failed canaries and delayed deployments. For example, our ML-based ads serving system uses a feedback control mechanism to trade off ad quality for service availability, which makes it harder to validate the health of a change list or a release candidate.

In this talk, we will discuss the multi-faceted problems we were facing, our design for tackling these challenges, and the results we have observed since going live. We will cover Aurora-based solutions for automated load-testing of code reviews (each diff, before merge), release candidate load testing (multiple diffs in a deployment), canary analysis, and deployment of multiple logical clusters across multiple datacenters … all with two clicks :)

Brian Brophy

Staff Site Reliability Engineer, Twitter
Brian Brophy is a Staff Site Reliability Engineer at Twitter with a passion for music, puzzles, security, automation, performance, and scale.

Jianhang Gao

Software Engineer, Twitter
Jianhang Gao is a Staff Software Engineer at Twitter, focusing on Adserver architecture and performance. Jianhang holds a Ph.D. in Electrical and Computer Engineering from UC Davis.

Monday November 5, 2018 1:00pm - 1:40pm PST
  Breakout Session