MesosCon18 has ended
Welcome to MesosCon 2018, which will be held at The Village (969 Market St, San Francisco) from November 5th to 7th, bringing together users and developers to share and learn about the project and its growing ecosystem.

Tuesday, November 6 • 1:50pm - 2:30pm
Automating large scale cluster management

At Uber, we manage a Mesos fleet of tens of thousands of hosts. As we scale up to span multiple datacenters and cloud platforms, we've developed a system called CLM (Cluster Lifecycle Manager) to automatically manage host maintenance, cluster operations, and infra upgrades without impacting running services' SLAs. CLM serves as the missing layer between service orchestration and host/resource management. By using an extensible system to gather issues on our hosts from multiple sources, such as hardware failures or misconfigurations, we are able to repair or remove those hosts before they impact production. We use a goal-state config to specify the expected state of clusters and automatically converge on it, while integrating with the orchestration layer to ensure we operate safely without disrupting service health.
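
The goal-state idea in the abstract can be illustrated with a minimal sketch. This is not CLM's actual design or API: the function `plan_actions`, the `Action` type, the `agent_version` and `healthy` fields, and the `max_disruptions` cap (a stand-in for the safety check against the orchestration layer) are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    host: str
    kind: str    # "repair" or "upgrade" (hypothetical action kinds)
    detail: str


def plan_actions(goal, observed, max_disruptions):
    """Plan one convergence pass toward the goal state.

    goal: expected state for every host, e.g. {"agent_version": "1.7"}
          (hypothetical schema, not CLM's real config format).
    observed: {host: {"agent_version": str, "healthy": bool}}.
    max_disruptions: cap on hosts touched per pass; an assumed stand-in
          for asking the orchestration layer what is safe to disrupt.
    """
    actions = []
    for host in sorted(observed):  # sorted for a deterministic plan
        state = observed[host]
        if not state["healthy"]:
            actions.append(Action(host, "repair", "reported unhealthy"))
        elif state["agent_version"] != goal["agent_version"]:
            actions.append(Action(host, "upgrade", goal["agent_version"]))
    # Hosts beyond the cap are left for a later pass.
    return actions[:max_disruptions]


goal = {"agent_version": "1.7"}
observed = {
    "h1": {"agent_version": "1.7", "healthy": True},
    "h2": {"agent_version": "1.6", "healthy": True},
    "h3": {"agent_version": "1.7", "healthy": False},
}
for action in plan_actions(goal, observed, max_disruptions=2):
    print(action)
```

Running the planner repeatedly, applying its actions, and re-observing the fleet is what lets such a loop converge: once every host matches the goal and is healthy, the plan comes back empty.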

Speakers
Iain Becker

Staff Software Engineer, Uber
Iain is a staff engineer at Uber working on cluster management automation. Before Uber, he worked on deployment and test automation for search infra at Facebook and on search infra at Google.
Yunpeng Liu

Sr Software Engineer, Uber



Tuesday November 6, 2018 1:50pm - 2:30pm
B
Host Organization: Uber