MesosCon18 has ended
Welcome to MesosCon 2018 which will be held in The Village (969 Market St, San Francisco) between November 5th-7th, bringing together users and developers to share and learn about the project and its growing ecosystem.

Tuesday, November 6 • 1:50pm - 2:30pm
Automating large scale cluster management

At Uber, we manage a Mesos fleet of tens of thousands of hosts. As we scale up to span multiple datacenters and cloud platforms, we have developed a system called CLM (Cluster Lifecycle Manager) to automatically manage host maintenance, cluster operations, and infrastructure upgrades without impacting the SLAs of running services. CLM serves as a missing layer between service orchestration and host/resource management. Using an extensible system that gathers host issues, such as hardware failures or misconfigurations, from multiple sources, we are able to repair or remove those hosts before they impact production. We use a goal-state configuration to specify the expected state of clusters and automatically converge on it, while integrating with the orchestration layer to ensure we operate safely without disrupting service health.
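
The goal-state idea in the abstract can be illustrated with a minimal sketch. This is not CLM's implementation, which the abstract does not describe in detail. The `Host` fields, the `plan_actions` function, and the `max_unavailable` budget are all hypothetical names chosen for illustration. The sketch compares observed host state with a desired version and plans a bounded number of repair or upgrade steps per pass, so that most hosts keep serving while the cluster converges.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    name: str
    version: str   # hypothetical: currently installed software version
    healthy: bool  # hypothetical: False if any issue source reported a problem


def plan_actions(hosts, goal_version, max_unavailable):
    """Return (action, host_name) steps that move hosts toward the goal state.

    Unhealthy hosts are considered first, then out-of-date hosts.
    At most `max_unavailable` hosts are touched per pass; repeated
    passes converge the cluster on the goal state.
    """
    actions = []
    # False sorts before True, so unhealthy hosts come first.
    for host in sorted(hosts, key=lambda h: (h.healthy, h.name)):
        if len(actions) >= max_unavailable:
            break
        if not host.healthy:
            actions.append(("repair", host.name))
        elif host.version != goal_version:
            actions.append(("upgrade", host.name))
    return actions


hosts = [
    Host("host-a", "1.6", True),
    Host("host-b", "1.7", True),
    Host("host-c", "1.6", False),
    Host("host-d", "1.6", True),
]
plan = plan_actions(hosts, goal_version="1.7", max_unavailable=2)
print(plan)
```

A real system would presumably derive the per-pass budget from the orchestration layer's view of service health rather than from a fixed number; the constant here only keeps the sketch short.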

Iain Becker

Staff Software Engineer, Uber
Iain is a staff engineer at Uber working on cluster management automation. Before Uber, he worked on deployment and test automation for search infra at Facebook and on search infra at Google.

Yunpeng Liu

Sr Software Engineer, Uber
Leads compute cluster lifecycle management at Uber. Currently working on efficiency and federation projects in Uber Compute.

Tuesday November 6, 2018 1:50pm - 2:30pm PST
  Breakout Session
  • Host Organization Uber