A framework for collaborative analysis of ENCODE data: Making large-scale analyses biologist-friendly

Daniel James Blankenberg, James Taylor, Ian Schenck, Jianbin He, Yi Zhang, Matthew Ghent, Narayanan Veeraraghavan, Istvan Albert, Webb Miller, Kateryna Dmytrivna Makova, Ross Cameron Hardison, Anton Nekrutenko

Research output: Contribution to journal, Article

104 Citations (Scopus)

Abstract

The standardization and sharing of data and tools are the biggest challenges of large collaborative projects such as the Encyclopedia of DNA Elements (ENCODE). Here we describe a compact Web application, Galaxy2ENCODE, that effectively addresses these issues. It provides an intuitive interface for the deposition and access of data, and features a vast number of analysis tools including operations on genomic intervals, utilities for manipulation of multiple sequence alignments, and molecular evolution algorithms. By providing a direct link between data and analysis tools, Galaxy2ENCODE allows addressing biological questions that are beyond the reach of existing software. We use Galaxy2ENCODE to show that the ENCODE regions contain >2000 unannotated transcripts under strong purifying selection that are likely functional. We also show that the ENCODE regions are representative of the entire genome by estimating the rate of nucleotide substitution and comparing it to published data. Although each of these analyses is complex, none takes more than 15 min from beginning to end. Finally, we demonstrate how new tools can be added to Galaxy2ENCODE with almost no effort. Every section of the manuscript is supplemented with QuickTime screencasts. Galaxy2ENCODE and the screencasts can be accessed at http://g2.bx.psu.edu.

Original language: English (US)
Pages (from-to): 960-964
Number of pages: 5
Journal: Genome Research
Volume: 17
Issue number: 6
DOIs: https://doi.org/10.1101/gr.5578007
State: Published - Jun 1 2007

Fingerprint

Encyclopedias
DNA
Molecular Evolution
Manuscripts
Information Dissemination
Sequence Alignment
Software
Nucleotides
Genome

All Science Journal Classification (ASJC) codes

  • Genetics
  • Genetics (clinical)

Cite this

Blankenberg, Daniel James ; Taylor, James ; Schenck, Ian ; He, Jianbin ; Zhang, Yi ; Ghent, Matthew ; Veeraraghavan, Narayanan ; Albert, Istvan ; Miller, Webb ; Makova, Kateryna Dmytrivna ; Hardison, Ross Cameron ; Nekrutenko, Anton. / A framework for collaborative analysis of ENCODE data : Making large-scale analyses biologist-friendly. In: Genome research. 2007 ; Vol. 17, No. 6. pp. 960-964.
@article{8a4fd250b8d54d20a4c2b03961858243,
title = "A framework for collaborative analysis of ENCODE data: Making large-scale analyses biologist-friendly",
abstract = "The standardization and sharing of data and tools are the biggest challenges of large collaborative projects such as the Encyclopedia of DNA Elements (ENCODE). Here we describe a compact Web application, Galaxy2ENCODE, that effectively addresses these issues. It provides an intuitive interface for the deposition and access of data, and features a vast number of analysis tools including operations on genomic intervals, utilities for manipulation of multiple sequence alignments, and molecular evolution algorithms. By providing a direct link between data and analysis tools, Galaxy2ENCODE allows addressing biological questions that are beyond the reach of existing software. We use Galaxy2ENCODE to show that the ENCODE regions contain >2000 unannotated transcripts under strong purifying selection that are likely functional. We also show that the ENCODE regions are representative of the entire genome by estimating the rate of nucleotide substitution and comparing it to published data. Although each of these analyses is complex, none takes more than 15 min from beginning to end. Finally, we demonstrate how new tools can be added to Galaxy2ENCODE with almost no effort. Every section of the manuscript is supplemented with QuickTime screencasts. Galaxy2ENCODE and the screencasts can be accessed at http://g2.bx.psu.edu.",
author = "Blankenberg, {Daniel James} and James Taylor and Ian Schenck and Jianbin He and Yi Zhang and Matthew Ghent and Narayanan Veeraraghavan and Istvan Albert and Webb Miller and Makova, {Kateryna Dmytrivna} and Hardison, {Ross Cameron} and Anton Nekrutenko",
year = "2007",
month = "6",
day = "1",
doi = "10.1101/gr.5578007",
language = "English (US)",
volume = "17",
pages = "960--964",
journal = "Genome Research",
issn = "1088-9051",
publisher = "Cold Spring Harbor Laboratory Press",
number = "6",
}

Blankenberg, DJ, Taylor, J, Schenck, I, He, J, Zhang, Y, Ghent, M, Veeraraghavan, N, Albert, I, Miller, W, Makova, KD, Hardison, RC & Nekrutenko, A 2007, 'A framework for collaborative analysis of ENCODE data: Making large-scale analyses biologist-friendly', Genome research, vol. 17, no. 6, pp. 960-964. https://doi.org/10.1101/gr.5578007

A framework for collaborative analysis of ENCODE data : Making large-scale analyses biologist-friendly. / Blankenberg, Daniel James; Taylor, James; Schenck, Ian; He, Jianbin; Zhang, Yi; Ghent, Matthew; Veeraraghavan, Narayanan; Albert, Istvan; Miller, Webb; Makova, Kateryna Dmytrivna; Hardison, Ross Cameron; Nekrutenko, Anton.

In: Genome research, Vol. 17, No. 6, 01.06.2007, p. 960-964.


TY - JOUR

T1 - A framework for collaborative analysis of ENCODE data

T2 - Making large-scale analyses biologist-friendly

AU - Blankenberg, Daniel James

AU - Taylor, James

AU - Schenck, Ian

AU - He, Jianbin

AU - Zhang, Yi

AU - Ghent, Matthew

AU - Veeraraghavan, Narayanan

AU - Albert, Istvan

AU - Miller, Webb

AU - Makova, Kateryna Dmytrivna

AU - Hardison, Ross Cameron

AU - Nekrutenko, Anton

PY - 2007/6/1

Y1 - 2007/6/1

N2 - The standardization and sharing of data and tools are the biggest challenges of large collaborative projects such as the Encyclopedia of DNA Elements (ENCODE). Here we describe a compact Web application, Galaxy2ENCODE, that effectively addresses these issues. It provides an intuitive interface for the deposition and access of data, and features a vast number of analysis tools including operations on genomic intervals, utilities for manipulation of multiple sequence alignments, and molecular evolution algorithms. By providing a direct link between data and analysis tools, Galaxy2ENCODE allows addressing biological questions that are beyond the reach of existing software. We use Galaxy2ENCODE to show that the ENCODE regions contain >2000 unannotated transcripts under strong purifying selection that are likely functional. We also show that the ENCODE regions are representative of the entire genome by estimating the rate of nucleotide substitution and comparing it to published data. Although each of these analyses is complex, none takes more than 15 min from beginning to end. Finally, we demonstrate how new tools can be added to Galaxy2ENCODE with almost no effort. Every section of the manuscript is supplemented with QuickTime screencasts. Galaxy2ENCODE and the screencasts can be accessed at http://g2.bx.psu.edu.

AB - The standardization and sharing of data and tools are the biggest challenges of large collaborative projects such as the Encyclopedia of DNA Elements (ENCODE). Here we describe a compact Web application, Galaxy2ENCODE, that effectively addresses these issues. It provides an intuitive interface for the deposition and access of data, and features a vast number of analysis tools including operations on genomic intervals, utilities for manipulation of multiple sequence alignments, and molecular evolution algorithms. By providing a direct link between data and analysis tools, Galaxy2ENCODE allows addressing biological questions that are beyond the reach of existing software. We use Galaxy2ENCODE to show that the ENCODE regions contain >2000 unannotated transcripts under strong purifying selection that are likely functional. We also show that the ENCODE regions are representative of the entire genome by estimating the rate of nucleotide substitution and comparing it to published data. Although each of these analyses is complex, none takes more than 15 min from beginning to end. Finally, we demonstrate how new tools can be added to Galaxy2ENCODE with almost no effort. Every section of the manuscript is supplemented with QuickTime screencasts. Galaxy2ENCODE and the screencasts can be accessed at http://g2.bx.psu.edu.

UR - http://www.scopus.com/inward/record.url?scp=34250376884&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=34250376884&partnerID=8YFLogxK

U2 - 10.1101/gr.5578007

DO - 10.1101/gr.5578007

M3 - Article

C2 - 17568012

AN - SCOPUS:34250376884

VL - 17

SP - 960

EP - 964

JO - Genome Research

JF - Genome Research

SN - 1088-9051

IS - 6

ER -