HybridMR: A hierarchical MapReduce scheduler for hybrid data centers

Bikash Sharma, Timothy Wood, Chita R. Das

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

45 Citations (Scopus)

Abstract

Virtualized environments are attractive because they simplify cluster management while facilitating cost-effective workload consolidation. As a result, virtual machines in public clouds or private data centers have become the norm for running transactional applications like web services and virtual desktops. On the other hand, batch workloads like MapReduce are typically deployed in a native cluster to avoid the performance overheads of virtualization. While both virtual and native environments have their own strengths and weaknesses, we demonstrate in this work that it is feasible to provide the best of these two computing paradigms in a hybrid platform. In this paper, we make a case for a hybrid data center consisting of native and virtual environments, and propose a 2-phase hierarchical scheduler, called HybridMR, for the effective resource management of interactive and batch workloads. In the first phase, HybridMR classifies incoming MapReduce jobs based on their expected virtualization overheads, and uses this information to automatically guide placement between physical and virtual machines. In the second phase, HybridMR manages the run-time performance of MapReduce jobs collocated with interactive applications in order to provide best-effort delivery to batch jobs while complying with the Service Level Agreements (SLAs) of interactive applications. By consolidating batch jobs with over-provisioned foreground applications, available unused resources are better utilized, resulting in improved application performance and energy efficiency. Evaluations on a hybrid cluster consisting of 24 physical servers and 48 virtual machines, with a diverse workload mix of interactive and batch MapReduce applications, demonstrate that HybridMR can achieve up to 40% improvement in the completion times of MapReduce jobs over the virtual-only case, while complying with the SLAs of interactive applications. Compared to the native-only cluster, HybridMR boosts resource utilization by 45% and achieves up to 43% energy savings, at the cost of a minimal performance penalty. These results indicate that a hybrid data center with an efficient scheduling mechanism can provide a cost-effective solution for hosting both batch and interactive workloads.
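The two scheduling phases described in the abstract can be illustrated with a small sketch. The abstract does not specify HybridMR's job classifier or its run-time controller, so everything below is a hypothetical stand-in: the `io_fraction` job profile, the linear overhead model with its `cpu_penalty` and `io_penalty` constants, the 20% placement threshold, and the additive-increase/multiplicative-decrease (AIMD) controller on the batch VM's resource share are all assumptions chosen for illustration, not the authors' method.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    # Hypothetical profile: fraction of runtime spent on disk/network I/O (0..1).
    io_fraction: float


def estimate_overhead(job: Job, cpu_penalty: float = 0.05, io_penalty: float = 0.40) -> float:
    """Hypothetical linear model of the fractional slowdown a job would see in a VM.

    Assumes I/O-bound work suffers more from virtualization than CPU-bound work.
    """
    return (1.0 - job.io_fraction) * cpu_penalty + job.io_fraction * io_penalty


def place(job: Job, threshold: float = 0.20) -> str:
    """Phase 1 (sketch): jobs expected to suffer high overhead go to native servers."""
    return "native" if estimate_overhead(job) > threshold else "virtual"


def adjust_batch_share(share: float, observed_ms: float, sla_ms: float,
                       step: float = 0.05, backoff: float = 0.5,
                       min_share: float = 0.05, max_share: float = 0.9) -> float:
    """Phase 2 (sketch): AIMD control of a collocated batch VM's resource share.

    If the interactive application's observed latency violates its SLA, the batch
    share is cut multiplicatively; otherwise it grows additively to absorb idle
    capacity. The result is clamped to [min_share, max_share].
    """
    if observed_ms > sla_ms:
        share *= backoff   # SLA violated: back off quickly
    else:
        share += step      # SLA met: reclaim unused resources slowly
    return max(min_share, min(max_share, share))


if __name__ == "__main__":
    # Illustrative jobs; the I/O fractions are made-up values, not measurements.
    for job in (Job("terasort", 0.8), Job("pi-estimator", 0.1)):
        print(job.name, "->", place(job))

    share = 0.5
    for latency in (80, 90, 130, 95):  # hypothetical interactive latencies in ms
        share = adjust_batch_share(share, latency, sla_ms=100)
        print(f"latency={latency} ms, batch share={share:.2f}")
```

AIMD is used here only because it is a simple, well-known way to back off quickly when the interactive SLA is violated and to reclaim idle capacity gradually otherwise; the controller in the paper may work differently.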

Original language: English (US)
Title of host publication: Proceedings - 2013 IEEE 33rd International Conference on Distributed Computing Systems, ICDCS 2013
Pages: 102-111
Number of pages: 10
DOI: https://doi.org/10.1109/ICDCS.2013.31
State: Published - Dec 1 2013
Event: 2013 IEEE 33rd International Conference on Distributed Computing Systems, ICDCS 2013 - Philadelphia, PA, United States
Duration: Jul 8 2013 to Jul 11 2013

Publication series

Name: Proceedings - International Conference on Distributed Computing Systems

Other

Other: 2013 IEEE 33rd International Conference on Distributed Computing Systems, ICDCS 2013
Country: United States
City: Philadelphia, PA
Period: 7/8/13 to 7/11/13

Fingerprint

Information use
Consolidation
Web services
Virtual reality
Energy efficiency
Costs
Energy conservation
Servers
Scheduling
Virtual machine
Virtualization

All Science Journal Classification (ASJC) codes

  • Software
  • Hardware and Architecture
  • Computer Networks and Communications

Cite this

Sharma, B., Wood, T., & Das, C. R. (2013). HybridMR: A hierarchical MapReduce scheduler for hybrid data centers. In Proceedings - 2013 IEEE 33rd International Conference on Distributed Computing Systems, ICDCS 2013 (pp. 102-111). [6681580] (Proceedings - International Conference on Distributed Computing Systems). https://doi.org/10.1109/ICDCS.2013.31
Sharma, Bikash ; Wood, Timothy ; Das, Chita R. / HybridMR: A hierarchical MapReduce scheduler for hybrid data centers. Proceedings - 2013 IEEE 33rd International Conference on Distributed Computing Systems, ICDCS 2013. 2013. pp. 102-111 (Proceedings - International Conference on Distributed Computing Systems).
@inproceedings{4e9bc99bb1534ed293fb4e61c86a3074,
title = "HybridMR: A hierarchical MapReduce scheduler for hybrid data centers",
author = "Bikash Sharma and Timothy Wood and Das, {Chita R.}",
year = "2013",
month = "12",
day = "1",
doi = "10.1109/ICDCS.2013.31",
language = "English (US)",
isbn = "9780769550008",
series = "Proceedings - International Conference on Distributed Computing Systems",
pages = "102--111",
booktitle = "Proceedings - 2013 IEEE 33rd International Conference on Distributed Computing Systems, ICDCS 2013",

}

Sharma, B, Wood, T & Das, CR 2013, HybridMR: A hierarchical MapReduce scheduler for hybrid data centers. in Proceedings - 2013 IEEE 33rd International Conference on Distributed Computing Systems, ICDCS 2013, 6681580, Proceedings - International Conference on Distributed Computing Systems, pp. 102-111, 2013 IEEE 33rd International Conference on Distributed Computing Systems, ICDCS 2013, Philadelphia, PA, United States, 7/8/13. https://doi.org/10.1109/ICDCS.2013.31

HybridMR: A hierarchical MapReduce scheduler for hybrid data centers. / Sharma, Bikash; Wood, Timothy; Das, Chita R.

Proceedings - 2013 IEEE 33rd International Conference on Distributed Computing Systems, ICDCS 2013. 2013. p. 102-111. 6681580 (Proceedings - International Conference on Distributed Computing Systems).

TY - GEN

T1 - HybridMR

T2 - A hierarchical MapReduce scheduler for hybrid data centers

AU - Sharma, Bikash

AU - Wood, Timothy

AU - Das, Chita R.

PY - 2013/12/1

Y1 - 2013/12/1

UR - http://www.scopus.com/inward/record.url?scp=84893230913&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84893230913&partnerID=8YFLogxK

U2 - 10.1109/ICDCS.2013.31

DO - 10.1109/ICDCS.2013.31

M3 - Conference contribution

AN - SCOPUS:84893230913

SN - 9780769550008

T3 - Proceedings - International Conference on Distributed Computing Systems

SP - 102

EP - 111

BT - Proceedings - 2013 IEEE 33rd International Conference on Distributed Computing Systems, ICDCS 2013

ER -

Sharma B, Wood T, Das CR. HybridMR: A hierarchical MapReduce scheduler for hybrid data centers. In Proceedings - 2013 IEEE 33rd International Conference on Distributed Computing Systems, ICDCS 2013. 2013. p. 102-111. 6681580. (Proceedings - International Conference on Distributed Computing Systems). https://doi.org/10.1109/ICDCS.2013.31