Modeling and synthesizing task placement constraints in Google compute clusters

Bikash Sharma, Victor Chudnovsky, Joseph L. Hellerstein, Rasekh Rifaat, Chita R. Das

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

120 Scopus citations

Abstract

Evaluating the performance of large compute clusters requires benchmarks with representative workloads. At Google, performance benchmarks are used to obtain performance metrics such as task scheduling delays and machine resource utilizations to assess changes in application codes, machine configurations, and scheduling algorithms. Existing approaches to workload characterization for high performance computing and grids focus on task resource requirements for CPU, memory, disk, I/O, network, etc. Such resource requirements address how much resource is consumed by a task. However, in addition to resource requirements, Google workloads commonly include task placement constraints that determine which machine resources are consumed by tasks. Task placement constraints arise because of task dependencies such as those related to hardware architecture and kernel version. This paper develops methodologies for incorporating task placement constraints and machine properties into performance benchmarks of large compute clusters. Our studies of Google compute clusters show that constraints increase average task scheduling delays by a factor of 2 to 6, which often results in tens of minutes of additional task wait time. To understand why, we extend the concept of resource utilization to include constraints by introducing a new metric, the Utilization Multiplier (UM). UM is the ratio of the resource utilization seen by tasks with a constraint to the average utilization of the resource. UM provides a simple model of the performance impact of constraints in that task scheduling delays increase with UM. Last, we describe how to synthesize representative task constraints and machine properties, and how to incorporate this synthesis into existing performance benchmarks. 
Using synthetic task constraints and machine properties generated by our methodology, we accurately reproduce performance metrics for benchmarks of Google compute clusters with a discrepancy of only 13% in task scheduling delay and 5% in resource utilization.
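The Utilization Multiplier defined in the abstract can be made concrete with a short Python sketch. It rests on hypothetical assumptions that do not come from the paper. Each machine exposes a dictionary of properties (for example architecture and kernel version) plus the capacity and usage of a single resource. A constraint is an (attribute, operator, value) triple. "Utilization seen by tasks with a constraint" is taken to mean the aggregate utilization of the machines that satisfy that constraint. The names `Machine`, `satisfies`, `utilization`, and `utilization_multiplier` are illustrative only. Under these assumptions, UM above 1 means a constrained task competes for a busier-than-average slice of the cluster.

```python
"""Sketch: Utilization Multiplier (UM) for one resource and one constraint.

Hypothetical assumptions (not the paper's implementation): machine properties
are a dict, a constraint is an (attribute, operator, value) triple, and
utilization is aggregate usage divided by aggregate capacity.
"""
import operator
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical table of relational operators a constraint may use.
OPS: Dict[str, Callable[[Any, Any], bool]] = {
    "==": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    ">": operator.gt,
}

Constraint = Tuple[str, str, Any]  # (attribute, operator, value)


@dataclass
class Machine:
    properties: Dict[str, Any]  # e.g. {"arch": "x86", "kernel": 3}
    capacity: float             # capacity of one resource, e.g. CPU cores
    usage: float                # amount of that resource currently in use


def satisfies(machine: Machine, constraint: Constraint) -> bool:
    """True if the machine meets the constraint; a missing attribute fails."""
    attr, op, value = constraint
    if attr not in machine.properties:
        return False
    return OPS[op](machine.properties[attr], value)


def utilization(machines: List[Machine]) -> float:
    """Aggregate usage divided by aggregate capacity over a set of machines."""
    capacity = sum(m.capacity for m in machines)
    if capacity == 0:
        raise ValueError("machine set has no capacity")
    return sum(m.usage for m in machines) / capacity


def utilization_multiplier(machines: List[Machine], constraint: Constraint) -> float:
    """UM = utilization of eligible machines / utilization of all machines."""
    eligible = [m for m in machines if satisfies(m, constraint)]
    if not eligible:
        raise ValueError("no machine satisfies the constraint")
    return utilization(eligible) / utilization(machines)


# Invented toy cluster: four machines with 8 cores each.
cluster = [
    Machine({"arch": "x86", "kernel": 3}, capacity=8, usage=6),
    Machine({"arch": "x86", "kernel": 2}, capacity=8, usage=2),
    Machine({"arch": "arm", "kernel": 3}, capacity=8, usage=4),
    Machine({"arch": "arm", "kernel": 2}, capacity=8, usage=4),
]

print(utilization(cluster))                                   # 0.5
print(utilization_multiplier(cluster, ("kernel", ">", 2)))    # 1.25
print(utilization_multiplier(cluster, ("arch", "==", "x86"))) # 1.0
```

In this invented example the cluster as a whole is 50% utilized. The machines satisfying the kernel constraint are 62.5% utilized, giving UM = 1.25. The architecture constraint selects machines at exactly average utilization, giving UM = 1.0. Under the abstract's claim that scheduling delays increase with UM, tasks carrying the kernel constraint would be the ones expected to wait longer.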

Original language: English (US)
Title of host publication: Proceedings of the 2nd ACM Symposium on Cloud Computing, SOCC 2011
State: Published - 2011
Event: 2nd ACM Symposium on Cloud Computing, SOCC 2011 - Cascais, Portugal
Duration: Oct 26, 2011 to Oct 28, 2011

Publication series

Name: Proceedings of the 2nd ACM Symposium on Cloud Computing, SOCC 2011


All Science Journal Classification (ASJC) codes

  • Software
