Computational pan-genomics: Status, promises and challenges

The Computational Pan-Genomics Consortium

Research output: Contribution to journal › Article

70 Citations (Scopus)

Abstract

Many disciplines, from human genetics and oncology to plant breeding, microbiology and virology, commonly face the challenge of analyzing rapidly increasing numbers of genomes. In the case of Homo sapiens, the number of sequenced genomes will approach hundreds of thousands in the next few years. Simply scaling up established bioinformatics pipelines will not be sufficient to leverage the full potential of such rich genomic data sets. Instead, novel, qualitatively different computational methods and paradigms are needed. We will witness the rapid extension of computational pan-genomics, a new sub-area of research in computational biology. In this article, we generalize existing definitions and understand a pan-genome as any collection of genomic sequences to be analyzed jointly or to be used as a reference. We examine already available approaches to construct and use pan-genomes, discuss the potential benefits of future technologies and methodologies and review open challenges from the vantage point of the above-mentioned biological disciplines. As a prominent example of a computational paradigm shift, we particularly highlight the transition from representing reference genomes as strings to representing them as graphs. We outline how this and other challenges from different application domains translate into common computational problems, point out relevant bioinformatics techniques and identify open problems in computer science. With this review, we aim to increase awareness that a joint approach to computational pan-genomics can help address many of the problems currently faced in various domains.

Original language: English (US)
Article number: bbw089
Pages (from-to): 118-135
Number of pages: 18
Journal: Briefings in Bioinformatics
Volume: 19
Issue number: 1
DOIs: https://doi.org/10.1093/bib/bbw089
State: Published - Jan 1 2018

All Science Journal Classification (ASJC) codes

  • Information Systems
  • Molecular Biology

Cite this

The Computational Pan-Genomics Consortium (2018). Computational pan-genomics: Status, promises and challenges. Briefings in Bioinformatics, 19(1), 118-135. [bbw089]. https://doi.org/10.1093/bib/bbw089
The Computational Pan-Genomics Consortium. / Computational pan-genomics: Status, promises and challenges. In: Briefings in Bioinformatics. 2018; Vol. 19, No. 1. pp. 118-135.
@article{fedd7e172dd24a2d871f199d165d0977,
title = "Computational pan-genomics: Status, promises and challenges",
abstract = "Many disciplines, from human genetics and oncology to plant breeding, microbiology and virology, commonly face the challenge of analyzing rapidly increasing numbers of genomes. In the case of Homo sapiens, the number of sequenced genomes will approach hundreds of thousands in the next few years. Simply scaling up established bioinformatics pipelines will not be sufficient to leverage the full potential of such rich genomic data sets. Instead, novel, qualitatively different computational methods and paradigms are needed. We will witness the rapid extension of computational pan-genomics, a new sub-area of research in computational biology. In this article, we generalize existing definitions and understand a pan-genome as any collection of genomic sequences to be analyzed jointly or to be used as a reference. We examine already available approaches to construct and use pan-genomes, discuss the potential benefits of future technologies and methodologies and review open challenges from the vantage point of the above-mentioned biological disciplines. As a prominent example of a computational paradigm shift, we particularly highlight the transition from representing reference genomes as strings to representing them as graphs. We outline how this and other challenges from different application domains translate into common computational problems, point out relevant bioinformatics techniques and identify open problems in computer science. With this review, we aim to increase awareness that a joint approach to computational pan-genomics can help address many of the problems currently faced in various domains.",
author = "{The Computational Pan-Genomics Consortium} and Tobias Marschall and Manja Marz and Thomas Abeel and Louis Dijkstra and Dutilh, {Bas E.} and Ali Ghaffaari and Paul Kersey and Kloosterman, {Wigard P.} and Veli M{\"a}kinen and Novak, {Adam M.} and Benedict Paten and David Porubsky and Eric Rivals and Can Alkan and Baaijens, {Jasmijn A.} and {De Bakker}, {Paul I.W.} and Valentina Boeva and Bonnal, {Raoul J.P.} and Francesca Chiaromonte and Rayan Chikhi and Ciccarelli, {Francesca D.} and Robin Cijvat and Erwin Datema and {Van Duijn}, {Cornelia M.} and Eichler, {Evan E.} and Corinna Ernst and Eleazar Eskin and Erik Garrison and Mohammed El-Kebir and Klau, {Gunnar W.} and Korbel, {Jan O.} and Lameijer, {Eric Wubbo} and Benjamin Langmead and Marcel Martin and Paul Medvedev and Mu, {John C.} and Pieter Neerincx and Klaasjan Ouwens and Pierre Peterlongo and Nadia Pisanti and Sven Rahmann and Ben Raphael and Knut Reinert and {de Ridder}, Dick and {de Ridder}, Jeroen and Matthias Schlesner and Ole Schulz-Trieglaff and Sanders, {Ashley D.} and Siavash Sheikhizadeh and Carl Shneider",
year = "2018",
month = "1",
day = "1",
doi = "10.1093/bib/bbw089",
language = "English (US)",
volume = "19",
pages = "118--135",
journal = "Briefings in Bioinformatics",
issn = "1467-5463",
publisher = "Oxford University Press",
number = "1",

}

The Computational Pan-Genomics Consortium 2018, 'Computational pan-genomics: Status, promises and challenges', Briefings in Bioinformatics, vol. 19, no. 1, bbw089, pp. 118-135. https://doi.org/10.1093/bib/bbw089

Computational pan-genomics: Status, promises and challenges. / The Computational Pan-Genomics Consortium.

In: Briefings in Bioinformatics, Vol. 19, No. 1, bbw089, 01.01.2018, p. 118-135.

TY - JOUR

T1 - Computational pan-genomics

T2 - Status, promises and challenges

AU - The Computational Pan-Genomics Consortium

AU - Marschall, Tobias

AU - Marz, Manja

AU - Abeel, Thomas

AU - Dijkstra, Louis

AU - Dutilh, Bas E.

AU - Ghaffaari, Ali

AU - Kersey, Paul

AU - Kloosterman, Wigard P.

AU - Mäkinen, Veli

AU - Novak, Adam M.

AU - Paten, Benedict

AU - Porubsky, David

AU - Rivals, Eric

AU - Alkan, Can

AU - Baaijens, Jasmijn A.

AU - De Bakker, Paul I.W.

AU - Boeva, Valentina

AU - Bonnal, Raoul J.P.

AU - Chiaromonte, Francesca

AU - Chikhi, Rayan

AU - Ciccarelli, Francesca D.

AU - Cijvat, Robin

AU - Datema, Erwin

AU - Van Duijn, Cornelia M.

AU - Eichler, Evan E.

AU - Ernst, Corinna

AU - Eskin, Eleazar

AU - Garrison, Erik

AU - El-Kebir, Mohammed

AU - Klau, Gunnar W.

AU - Korbel, Jan O.

AU - Lameijer, Eric Wubbo

AU - Langmead, Benjamin

AU - Martin, Marcel

AU - Medvedev, Paul

AU - Mu, John C.

AU - Neerincx, Pieter

AU - Ouwens, Klaasjan

AU - Peterlongo, Pierre

AU - Pisanti, Nadia

AU - Rahmann, Sven

AU - Raphael, Ben

AU - Reinert, Knut

AU - de Ridder, Dick

AU - de Ridder, Jeroen

AU - Schlesner, Matthias

AU - Schulz-Trieglaff, Ole

AU - Sanders, Ashley D.

AU - Sheikhizadeh, Siavash

AU - Shneider, Carl

PY - 2018/1/1

Y1 - 2018/1/1

N2 - Many disciplines, from human genetics and oncology to plant breeding, microbiology and virology, commonly face the challenge of analyzing rapidly increasing numbers of genomes. In the case of Homo sapiens, the number of sequenced genomes will approach hundreds of thousands in the next few years. Simply scaling up established bioinformatics pipelines will not be sufficient to leverage the full potential of such rich genomic data sets. Instead, novel, qualitatively different computational methods and paradigms are needed. We will witness the rapid extension of computational pan-genomics, a new sub-area of research in computational biology. In this article, we generalize existing definitions and understand a pan-genome as any collection of genomic sequences to be analyzed jointly or to be used as a reference. We examine already available approaches to construct and use pan-genomes, discuss the potential benefits of future technologies and methodologies and review open challenges from the vantage point of the above-mentioned biological disciplines. As a prominent example of a computational paradigm shift, we particularly highlight the transition from representing reference genomes as strings to representing them as graphs. We outline how this and other challenges from different application domains translate into common computational problems, point out relevant bioinformatics techniques and identify open problems in computer science. With this review, we aim to increase awareness that a joint approach to computational pan-genomics can help address many of the problems currently faced in various domains.

AB - Many disciplines, from human genetics and oncology to plant breeding, microbiology and virology, commonly face the challenge of analyzing rapidly increasing numbers of genomes. In the case of Homo sapiens, the number of sequenced genomes will approach hundreds of thousands in the next few years. Simply scaling up established bioinformatics pipelines will not be sufficient to leverage the full potential of such rich genomic data sets. Instead, novel, qualitatively different computational methods and paradigms are needed. We will witness the rapid extension of computational pan-genomics, a new sub-area of research in computational biology. In this article, we generalize existing definitions and understand a pan-genome as any collection of genomic sequences to be analyzed jointly or to be used as a reference. We examine already available approaches to construct and use pan-genomes, discuss the potential benefits of future technologies and methodologies and review open challenges from the vantage point of the above-mentioned biological disciplines. As a prominent example of a computational paradigm shift, we particularly highlight the transition from representing reference genomes as strings to representing them as graphs. We outline how this and other challenges from different application domains translate into common computational problems, point out relevant bioinformatics techniques and identify open problems in computer science. With this review, we aim to increase awareness that a joint approach to computational pan-genomics can help address many of the problems currently faced in various domains.

UR - http://www.scopus.com/inward/record.url?scp=85041170263&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85041170263&partnerID=8YFLogxK

U2 - 10.1093/bib/bbw089

DO - 10.1093/bib/bbw089

M3 - Article

C2 - 27769991

AN - SCOPUS:85041170263

VL - 19

SP - 118

EP - 135

JO - Briefings in Bioinformatics

JF - Briefings in Bioinformatics

SN - 1467-5463

IS - 1

M1 - bbw089

ER -

The Computational Pan-Genomics Consortium. Computational pan-genomics: Status, promises and challenges. Briefings in Bioinformatics. 2018 Jan 1;19(1):118-135. bbw089. https://doi.org/10.1093/bib/bbw089