TetriSched: Global rescheduling with adaptive plan-ahead in dynamic heterogeneous clusters

Alexey Tumanov, Timothy Zhu, Jun Woo Park, Michael A. Kozuch, Mor Harchol-Balter, Gregory R. Ganger

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

Abstract

TetriSched is a scheduler that works in tandem with a calendaring reservation system to continuously re-evaluate the immediate-term scheduling plan for all pending jobs (including those with reservations and best-effort jobs) on each scheduling cycle. TetriSched leverages information supplied by the reservation system about jobs' deadlines and estimated runtimes to plan ahead in deciding whether to wait for a busy preferred resource type (e.g., a machine with a GPU) or fall back to less preferred placement options. Plan-ahead affords significant flexibility in handling mis-estimates in job runtimes specified at reservation time. Integrated with the main reservation system in Hadoop YARN, TetriSched is experimentally shown to achieve significantly higher SLO attainment and cluster utilization than the best-configured YARN reservation and CapacityScheduler stack deployed on a real 256-node cluster.
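
The wait-or-fall-back decision described in the abstract can be illustrated with a toy sketch. This is not TetriSched's actual algorithm or API: the `Option` class, the `choose_placement` function, and all numbers are hypothetical, and the sketch considers a single job with a greedy rule rather than a global plan over all pending jobs. It only shows how a deadline and estimated runtimes can be combined to choose between waiting for a busy preferred resource and starting now on a less preferred one.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Option:
    """A hypothetical placement option for one job (illustrative only)."""
    name: str
    free_at: float      # estimated time at which the resource becomes available
    est_runtime: float  # estimated job runtime on this resource type


def choose_placement(now: float, deadline: float,
                     options: List[Option]) -> Optional[Option]:
    """Pick the option with the earliest estimated finish that meets the deadline.

    Returns None when no option is estimated to finish by the deadline.
    """
    best: Optional[Option] = None
    best_finish = float("inf")
    for opt in options:
        start = max(now, opt.free_at)      # wait if the resource is still busy
        finish = start + opt.est_runtime
        if finish <= deadline and finish < best_finish:
            best, best_finish = opt, finish
    return best


# Assumed numbers: the GPU is busy until t=30 but runs the job faster.
gpu = Option("gpu", free_at=30, est_runtime=40)
cpu = Option("cpu", free_at=0, est_runtime=90)
decision = choose_placement(now=0, deadline=100, options=[gpu, cpu])
print(decision.name if decision else "no feasible placement")
```

Calling such a function again on every scheduling cycle with refreshed `free_at` estimates is one simple way to picture how re-evaluation absorbs runtime mis-estimates: if the job occupying the GPU overruns so that waiting would miss the deadline, the next cycle falls back to the CPU option.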

Original language: English (US)
Title of host publication: Proceedings of the 11th European Conference on Computer Systems, EuroSys 2016
Publisher: Association for Computing Machinery, Inc
ISBN (Electronic): 9781450342407
State: Published - Apr 18 2016
Event: 11th European Conference on Computer Systems, EuroSys 2016 - London, United Kingdom
Duration: Apr 18 2016 to Apr 21 2016

All Science Journal Classification (ASJC) codes

  • Information Systems
  • Computer Networks and Communications
  • Hardware and Architecture
  • Software
