Galaxy CloudMan: Delivering cloud compute clusters

Enis Afgan, Dannon Baker, Nate Coraor, Brad Chapman, Anton Nekrutenko, James Taylor

Research output: Contribution to journal (Article)

122 Citations (Scopus)

Abstract

Background: Widespread adoption of high-throughput sequencing has greatly increased the scale and sophistication of computational infrastructure needed to perform genomic research. An alternative to building and maintaining local infrastructure is "cloud computing", which, in principle, offers on-demand access to flexible computational infrastructure. However, cloud computing resources are not yet suitable for immediate "as is" use by experimental biologists.

Results: We present a cloud resource management system that makes it possible for individual researchers to compose and control an arbitrarily sized compute cluster on Amazon's EC2 cloud infrastructure without any informatics requirements. Within this system, an entire suite of biological tools packaged by the NERC Bio-Linux team (http://nebc.nerc.ac.uk/tools/bio-linux) is available for immediate consumption. The provided solution makes it possible, using only a web browser, to create a completely configured compute cluster ready to perform analysis in less than five minutes. Moreover, we provide an automated method for building custom deployments of cloud resources. This approach promotes reproducibility of results and, if desired, allows individuals and labs to add to or customize an otherwise available cloud system to better meet their needs.

Conclusions: The knowledge and effort required to deploy a compute cluster in the Amazon EC2 cloud are not trivial. The solution presented in this paper eliminates these barriers, making it possible for researchers to deploy exactly the amount of computing power they need, combined with a wealth of existing analysis software, to handle the ongoing data deluge.

Original language: English (US)
Article number: S4
Journal: BMC Bioinformatics
Volume: 11
Issue number: SUPPL. 12
DOI: https://doi.org/10.1186/1471-2105-11-S12-S4
State: Published - Dec 21 2010

Fingerprint

Galaxies
Cloud computing
Web Browser
Research Personnel
Infrastructure
Informatics
Reproducibility of Results
Linux
Software
Throughput
Research
Resources
Reproducibility
Resource Management
Sequencing
High Throughput
Genomics
Trivial

All Science Journal Classification (ASJC) codes

  • Structural Biology
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Applied Mathematics

Cite this

Afgan, E., Baker, D., Coraor, N., Chapman, B., Nekrutenko, A., & Taylor, J. (2010). Galaxy CloudMan: Delivering cloud compute clusters. BMC bioinformatics, 11(SUPPL. 12), [S4]. https://doi.org/10.1186/1471-2105-11-S12-S4
@article{38852c6a12b14fa3afb01533fa37f051,
title = "Galaxy CloudMan: Delivering cloud compute clusters",
author = "Enis Afgan and Dannon Baker and Nate Coraor and Brad Chapman and Anton Nekrutenko and James Taylor",
year = "2010",
month = "12",
day = "21",
doi = "10.1186/1471-2105-11-S12-S4",
language = "English (US)",
volume = "11",
journal = "BMC Bioinformatics",
issn = "1471-2105",
publisher = "BioMed Central",
number = "SUPPL. 12",
}

TY - JOUR
T1 - Galaxy CloudMan
T2 - Delivering cloud compute clusters
AU - Afgan, Enis
AU - Baker, Dannon
AU - Coraor, Nate
AU - Chapman, Brad
AU - Nekrutenko, Anton
AU - Taylor, James
PY - 2010/12/21
Y1 - 2010/12/21
UR - http://www.scopus.com/inward/record.url?scp=78650841579&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=78650841579&partnerID=8YFLogxK
U2 - 10.1186/1471-2105-11-S12-S4
DO - 10.1186/1471-2105-11-S12-S4
M3 - Article
C2 - 21210983
AN - SCOPUS:78650841579
VL - 11
JO - BMC Bioinformatics
JF - BMC Bioinformatics
SN - 1471-2105
IS - SUPPL. 12
M1 - S4
ER -
