Provisioning a multi-tiered data staging area for extreme-scale machines

Ramya Prabhakar, Sudharshan S. Vazhkudai, Youngjae Kim, Ali R. Butt, Min Li, Mahmut Kandemir

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

26 Citations (Scopus)

Abstract

Massively parallel scientific applications, running on extreme-scale supercomputers, produce hundreds of terabytes of data per run, driving the need for storage solutions to improve their I/O performance. Traditional parallel file systems (PFS) in high-performance computing (HPC) systems are unable to keep up with such high data rates, creating a storage wall. In this work, we present a novel multi-tiered storage architecture comprising hybrid node-local resources to construct a dynamic data staging area for extreme-scale machines. Such a staging ground serves as an impedance-matching device between applications and the PFS. Our solution combines diverse resources (e.g., DRAM, SSD) in such a way as to approach the performance of the fastest component technology and the cost of the least expensive one. We have developed an automated provisioning algorithm that aids in meeting the checkpointing performance requirement of HPC applications, by using a least-cost storage configuration. We evaluate our approach using both an implementation on a large-scale cluster and a simulation driven by six years' worth of Jaguar supercomputer job logs, and show that our approach, by choosing an appropriate storage configuration, achieves 41.5% cost savings with only negligible impact on performance.
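The least-cost provisioning idea in the abstract can be sketched roughly as follows. This is an illustrative toy, not the paper's actual algorithm: the per-device bandwidth and cost figures for the DRAM and SSD tiers are assumed numbers, and a brute-force search stands in for whatever optimization the authors use.

```python
# Hypothetical sketch of least-cost storage provisioning for a checkpoint
# staging area. Tier specs (GB/s and $ per device) are made-up numbers,
# not values from the paper.
from itertools import product

TIERS = {
    # name: (bandwidth in GB/s per device, cost in $ per device) -- assumptions
    "dram": (10.0, 80.0),
    "ssd":  (1.5,  10.0),
}

def least_cost_config(required_bw_gbs, max_devices=64):
    """Brute-force the cheapest (n_dram, n_ssd) mix whose aggregate
    bandwidth meets the checkpoint drain-rate requirement."""
    best = None
    for n_dram, n_ssd in product(range(max_devices + 1), repeat=2):
        bw = n_dram * TIERS["dram"][0] + n_ssd * TIERS["ssd"][0]
        if bw < required_bw_gbs:
            continue  # this mix cannot absorb the checkpoint fast enough
        cost = n_dram * TIERS["dram"][1] + n_ssd * TIERS["ssd"][1]
        if best is None or cost < best[0]:
            best = (cost, n_dram, n_ssd)
    return best  # (cost, n_dram, n_ssd), or None if infeasible

# Example: meet a 30 GB/s aggregate checkpoint rate at least cost.
print(least_cost_config(30.0))  # prints (200.0, 0, 20)
```

With these assumed prices the SSD tier is cheaper per unit of bandwidth, so the search settles on an all-SSD mix; under different price/performance ratios, or with capacity constraints added, a DRAM/SSD blend would win instead, which is the trade-off the paper's provisioning algorithm navigates.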

Original language: English (US)
Title of host publication: Proceedings - 31st International Conference on Distributed Computing Systems, ICDCS 2011
Pages: 1-12
Number of pages: 12
DOI: 10.1109/ICDCS.2011.33
State: Published - Aug 25 2011
Event: 31st International Conference on Distributed Computing Systems, ICDCS 2011 - Minneapolis, MN, United States
Duration: Jun 20 2011 - Jul 24 2011

Publication series

Name: Proceedings - International Conference on Distributed Computing Systems

Other

Other: 31st International Conference on Distributed Computing Systems, ICDCS 2011
Country: United States
City: Minneapolis, MN
Period: 6/20/11 - 7/24/11

All Science Journal Classification (ASJC) codes

  • Software
  • Hardware and Architecture
  • Computer Networks and Communications

Cite this

Prabhakar, R., Vazhkudai, S. S., Kim, Y., Butt, A. R., Li, M., & Kandemir, M. (2011). Provisioning a multi-tiered data staging area for extreme-scale machines. In Proceedings - 31st International Conference on Distributed Computing Systems, ICDCS 2011 (pp. 1-12). [5961683] (Proceedings - International Conference on Distributed Computing Systems). https://doi.org/10.1109/ICDCS.2011.33
@inproceedings{6962d20bda12466aaa754fdae0710daa,
title = "Provisioning a multi-tiered data staging area for extreme-scale machines",
abstract = "Massively parallel scientific applications, running on extreme-scale supercomputers, produce hundreds of terabytes of data per run, driving the need for storage solutions to improve their I/O performance. Traditional parallel file systems (PFS) in high performance computing (HPC) systems are unable to keep up with such high data rates, creating a storage wall. In this work, we present a novel multi-tiered storage architecture comprising hybrid node-local resources to construct a dynamic data staging area for extreme-scale machines. Such a staging ground serves as an impedance matching device between applications and the PFS. Our solution combines diverse resources (e.g., DRAM, SSD) in such a way as to approach the performance of the fastest component technology and the cost of the least expensive one. We have developed an automated provisioning algorithm that aids in meeting the checkpointing performance requirement of HPC applications, by using a least-cost storage configuration. We evaluate our approach using both an implementation on a large scale cluster and a simulation driven by six-years worth of Jaguar supercomputer job-logs, and show that our approach, by choosing an appropriate storage configuration, achieves 41.5{\%} cost savings with only negligible impact on performance.",
author = "Ramya Prabhakar and Vazhkudai, {Sudharshan S.} and Youngjae Kim and Butt, {Ali R.} and Min Li and Mahmut Kandemir",
year = "2011",
month = "8",
day = "25",
doi = "10.1109/ICDCS.2011.33",
language = "English (US)",
isbn = "9780769543642",
series = "Proceedings - International Conference on Distributed Computing Systems",
pages = "1--12",
booktitle = "Proceedings - 31st International Conference on Distributed Computing Systems, ICDCS 2011",

}


TY - GEN

T1 - Provisioning a multi-tiered data staging area for extreme-scale machines

AU - Prabhakar, Ramya

AU - Vazhkudai, Sudharshan S.

AU - Kim, Youngjae

AU - Butt, Ali R.

AU - Li, Min

AU - Kandemir, Mahmut

PY - 2011/8/25

Y1 - 2011/8/25

N2 - Massively parallel scientific applications, running on extreme-scale supercomputers, produce hundreds of terabytes of data per run, driving the need for storage solutions to improve their I/O performance. Traditional parallel file systems (PFS) in high performance computing (HPC) systems are unable to keep up with such high data rates, creating a storage wall. In this work, we present a novel multi-tiered storage architecture comprising hybrid node-local resources to construct a dynamic data staging area for extreme-scale machines. Such a staging ground serves as an impedance matching device between applications and the PFS. Our solution combines diverse resources (e.g., DRAM, SSD) in such a way as to approach the performance of the fastest component technology and the cost of the least expensive one. We have developed an automated provisioning algorithm that aids in meeting the checkpointing performance requirement of HPC applications, by using a least-cost storage configuration. We evaluate our approach using both an implementation on a large scale cluster and a simulation driven by six-years worth of Jaguar supercomputer job-logs, and show that our approach, by choosing an appropriate storage configuration, achieves 41.5% cost savings with only negligible impact on performance.

AB - Massively parallel scientific applications, running on extreme-scale supercomputers, produce hundreds of terabytes of data per run, driving the need for storage solutions to improve their I/O performance. Traditional parallel file systems (PFS) in high performance computing (HPC) systems are unable to keep up with such high data rates, creating a storage wall. In this work, we present a novel multi-tiered storage architecture comprising hybrid node-local resources to construct a dynamic data staging area for extreme-scale machines. Such a staging ground serves as an impedance matching device between applications and the PFS. Our solution combines diverse resources (e.g., DRAM, SSD) in such a way as to approach the performance of the fastest component technology and the cost of the least expensive one. We have developed an automated provisioning algorithm that aids in meeting the checkpointing performance requirement of HPC applications, by using a least-cost storage configuration. We evaluate our approach using both an implementation on a large scale cluster and a simulation driven by six-years worth of Jaguar supercomputer job-logs, and show that our approach, by choosing an appropriate storage configuration, achieves 41.5% cost savings with only negligible impact on performance.

UR - http://www.scopus.com/inward/record.url?scp=80051867217&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=80051867217&partnerID=8YFLogxK

U2 - 10.1109/ICDCS.2011.33

DO - 10.1109/ICDCS.2011.33

M3 - Conference contribution

SN - 9780769543642

T3 - Proceedings - International Conference on Distributed Computing Systems

SP - 1

EP - 12

BT - Proceedings - 31st International Conference on Distributed Computing Systems, ICDCS 2011

ER -
