Multiverse: Dynamic VM Provisioning for Virtualized High Performance Computing Clusters

Jashwant Raj Gunasekaran, Michael Cui, Prashanth Thinakaran, Josh Simons, Mahmut T. Kandemir, Chita R. Das

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Traditionally, HPC workloads have been deployed on bare-metal clusters, but advances in virtualization have paved the way for these workloads to be deployed on virtualized clusters. However, HPC cluster administrators and providers still face challenges in resource elasticity and large-scale virtual machine (VM) provisioning, due to the lack of coordination between a traditional HPC scheduler and the VM hypervisor (the resource management layer). This lack of interaction leads to low cluster utilization and low job completion throughput. Furthermore, VM provisioning delays directly impact the overall performance of jobs in the cluster. Hence, there is a need to provision virtualized HPC clusters effectively, making the best use of the physical hardware with minimal provisioning overhead. To this end, we propose Multiverse, a VM provisioning framework that dynamically spawns VMs for incoming jobs in a virtualized HPC cluster by integrating the HPC scheduler with the VM resource manager. We have implemented this framework on the Slurm scheduler together with the vSphere VM resource manager. To reduce VM provisioning overhead, we use instant cloning, which shares both disk and memory with the parent VM, in contrast to full VM cloning, which has to boot a new VM from scratch. Measurements with real-world HPC workloads demonstrate that instant cloning is 2.5× faster than full cloning in terms of VM provisioning time. Further, it improves resource utilization by up to 40% and cluster throughput by up to 1.5× compared to full cloning in bursty job arrival scenarios.
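To make the link between provisioning delay, utilization, and throughput concrete, the following is a minimal toy model, not the Multiverse implementation. It assumes a burst of identical jobs arriving at once, a fixed number of VM slots, and that each job holds its slot for the provisioning time plus its run time. The function name `simulate_burst`, the 100-second full-clone time, the 60-second run time, and the job and slot counts are all hypothetical; only the 2.5× ratio between full and instant cloning comes from the abstract.

```python
import heapq


def simulate_burst(num_jobs, slots, run_time, provision_time):
    """Toy model of a bursty arrival: all jobs arrive at t=0.

    Each job occupies one slot for provision_time + run_time seconds.
    Returns (makespan, throughput in jobs/s, useful utilization).
    """
    # Min-heap of the times at which each slot next becomes free.
    free_at = [0.0] * slots
    heapq.heapify(free_at)
    makespan = 0.0
    for _ in range(num_jobs):
        start = heapq.heappop(free_at)           # earliest free slot
        end = start + provision_time + run_time  # VM is cloned, then the job runs
        heapq.heappush(free_at, end)
        makespan = max(makespan, end)
    throughput = num_jobs / makespan
    # Only time spent running jobs counts as useful work.
    utilization = (num_jobs * run_time) / (slots * makespan)
    return makespan, throughput, utilization


# Assumed absolute numbers; only the 2.5x ratio is taken from the abstract.
FULL_CLONE_S = 100.0
INSTANT_CLONE_S = FULL_CLONE_S / 2.5

full = simulate_burst(num_jobs=8, slots=4, run_time=60.0, provision_time=FULL_CLONE_S)
instant = simulate_burst(num_jobs=8, slots=4, run_time=60.0, provision_time=INSTANT_CLONE_S)

print("full clone:   ", full)
print("instant clone:", instant)
```

The figures this toy model produces depend entirely on the assumed inputs and are not the paper's measurements. It only illustrates why a shorter provisioning delay raises both the share of slot time spent on useful work and the number of jobs completed per unit time.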

Original language: English (US)
Title of host publication: Proceedings - 20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing, CCGRID 2020
Editors: Laurent Lefevre, Carlos A. Varela, George Pallis, Adel N. Toosi, Omer Rana, Rajkumar Buyya
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 131-141
Number of pages: 11
ISBN (Electronic): 9781728160955
DOI: https://doi.org/10.1109/CCGrid49817.2020.00-80
State: Published - May 2020
Event: 20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing, CCGRID 2020 - Melbourne, Australia
Duration: May 11 2020 to May 14 2020

Publication series

Name: Proceedings - 20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing, CCGRID 2020

Conference

Conference: 20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing, CCGRID 2020
Country: Australia
City: Melbourne
Period: 5/11/20 to 5/14/20

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
  • Hardware and Architecture
  • Information Systems and Management
  • Safety, Risk, Reliability and Quality

Cite this

Gunasekaran, J. R., Cui, M., Thinakaran, P., Simons, J., Kandemir, M. T., & Das, C. R. (2020). Multiverse: Dynamic VM Provisioning for Virtualized High Performance Computing Clusters. In L. Lefevre, C. A. Varela, G. Pallis, A. N. Toosi, O. Rana, & R. Buyya (Eds.), Proceedings - 20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing, CCGRID 2020 (pp. 131-141). [9139712] (Proceedings - 20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing, CCGRID 2020). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/CCGrid49817.2020.00-80