TY - JOUR
T1 - Computational pan-genomics
T2 - Status, promises and challenges
AU - The Computational Pan-Genomics Consortium
AU - Marschall, Tobias
AU - Marz, Manja
AU - Abeel, Thomas
AU - Dijkstra, Louis
AU - Dutilh, Bas E.
AU - Ghaffaari, Ali
AU - Kersey, Paul
AU - Kloosterman, Wigard P.
AU - Mäkinen, Veli
AU - Novak, Adam M.
AU - Paten, Benedict
AU - Porubsky, David
AU - Rivals, Eric
AU - Alkan, Can
AU - Baaijens, Jasmijn A.
AU - De Bakker, Paul I.W.
AU - Boeva, Valentina
AU - Bonnal, Raoul J.P.
AU - Chiaromonte, Francesca
AU - Chikhi, Rayan
AU - Ciccarelli, Francesca D.
AU - Cijvat, Robin
AU - Datema, Erwin
AU - Van Duijn, Cornelia M.
AU - Eichler, Evan E.
AU - Ernst, Corinna
AU - Eskin, Eleazar
AU - Garrison, Erik
AU - El-Kebir, Mohammed
AU - Klau, Gunnar W.
AU - Korbel, Jan O.
AU - Lameijer, Eric Wubbo
AU - Langmead, Benjamin
AU - Martin, Marcel
AU - Medvedev, Paul
AU - Mu, John C.
AU - Neerincx, Pieter
AU - Ouwens, Klaasjan
AU - Peterlongo, Pierre
AU - Pisanti, Nadia
AU - Rahmann, Sven
AU - Raphael, Ben
AU - Reinert, Knut
AU - de Ridder, Dick
AU - de Ridder, Jeroen
AU - Schlesner, Matthias
AU - Schulz-Trieglaff, Ole
AU - Sanders, Ashley D.
AU - Sheikhizadeh, Siavash
AU - Shneider, Carl
N1 - Funding Information:
The Netherlands Organization for Scientific Research (NWO) Vidi (639.072.309 to A.S., 864.14.004 to B.E.D.); CAPES/BRASIL (to B.E.D.); the Academy of Finland (284598 [CoECGR] to V.M. and D.V.); the Russian Scientific Foundation (14–11–00826 to L.D.); Institut de Biologie Computationnelle (ANR-11-BINF-0002 to E.R.); and the French Colib’read project (ANR–12–BS02–0008 to E.R.). NSFC 31671372 (to K.Y.); the Dutch Graduate School for Experimental Plant Sciences (054EPS15 to S.S.); the EMGO Institute for Health and Care Research (EMGO+) to K.O.; the National Human Genome Research Institute (1U54HG007990 [BD2K] to B.P. and A.M.N., 5U41HG007234 [GENCODE] to B.P.); the W. M. Keck Foundation (DT06172015 to B.P. and A.M.N.); the Simons Foundation (SFLIFE# 351901 to B.P. and A.M.N.); the ARCS Foundation (2014–15 ARCS fellowship to A.M.N.); Edward Schulak (Edward Schulak Fellowship in Genomics to A.M.N.)
Funding Information:
We are deeply grateful to the Lorentz Center for hosting the workshop ‘Future Perspectives in Computational Pan-Genomics’ (8–12 June 2015), which gave rise to this article. In particular, we would like to thank the Lorentz Center staff, who turned organizing and attending the workshop into a great pleasure. The workshop received additional financial support from KNAW, Bina Technologies, ERIBA, PacBio and Genalice. E.E.E. is an investigator of the Howard Hughes Medical Institute.
Publisher Copyright:
© The Author 2016.
PY - 2018/1/1
Y1 - 2018/1/1
AB - Many disciplines, from human genetics and oncology to plant breeding, microbiology and virology, commonly face the challenge of analyzing rapidly increasing numbers of genomes. In the case of Homo sapiens, the number of sequenced genomes will approach hundreds of thousands in the next few years. Simply scaling up established bioinformatics pipelines will not be sufficient for leveraging the full potential of such rich genomic data sets. Instead, novel, qualitatively different computational methods and paradigms are needed. We will witness the rapid extension of computational pan-genomics, a new sub-area of research in computational biology. In this article, we generalize existing definitions and understand a pan-genome as any collection of genomic sequences to be analyzed jointly or to be used as a reference. We examine already available approaches to construct and use pan-genomes, discuss the potential benefits of future technologies and methodologies and review open challenges from the vantage point of the above-mentioned biological disciplines. As a prominent example of a computational paradigm shift, we particularly highlight the transition from the representation of reference genomes as strings to representations as graphs. We outline how this and other challenges from different application domains translate into common computational problems, point out relevant bioinformatics techniques and identify open problems in computer science. With this review, we aim to increase awareness that a joint approach to computational pan-genomics can help address many of the problems currently faced in various domains.
UR - http://www.scopus.com/inward/record.url?scp=85041170263&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85041170263&partnerID=8YFLogxK
U2 - 10.1093/bib/bbw089
DO - 10.1093/bib/bbw089
M3 - Article
C2 - 27769991
AN - SCOPUS:85041170263
SN - 1467-5463
VL - 19
SP - 118
EP - 135
JO - Briefings in Bioinformatics
JF - Briefings in Bioinformatics
IS - 1
M1 - bbw089
ER -