Modern phylogenomics: Building phylogenetic trees using the multispecies coalescent model

Liang Liu, Christian Anderson, Dennis Keith Pearl, Scott V. Edwards

Research output: Chapter in Book/Report/Conference proceeding › Chapter

Abstract

The multispecies coalescent (MSC) model provides a compelling framework for building phylogenetic trees from multilocus DNA sequence data. The pure MSC is best thought of as a special case of so-called “multispecies network coalescent” models, in which gene flow is allowed among branches of the tree, whereas MSC methods assume there is no gene flow between diverging species. Early implementations of the MSC, such as “parsimony” or “democratic vote” approaches to combining information from multiple gene trees, as well as concatenation, in which DNA sequences from multiple gene trees are combined into a single “supergene,” were quickly shown to be inconsistent in some regions of tree space, in so far as they converged on the incorrect species tree as more gene trees and sequence data were accumulated. The anomaly zone, a region of tree space in which the most frequent gene tree is different from the species tree, is one such region where many so-called “coalescent” methods are inconsistent. Second-generation implementations of the MSC employed Bayesian or likelihood models; these are consistent in all regions of gene tree space, but Bayesian methods in particular are incapable of handling the large phylogenomic data sets currently available. Two-step methods, such as MP-EST and ASTRAL, in which gene trees are first estimated and then combined to estimate an overarching species tree, are currently popular in part because they can handle large phylogenomic data sets. These methods are consistent in the anomaly zone but can sometimes provide inappropriate measures of tree support or apportion error and signal in the data inappropriately. MP-EST in particular employs a likelihood model which can be conveniently manipulated to perform statistical tests of competing species trees, incorporating the likelihood of the collected gene trees on each species tree in a likelihood ratio test. Such tests provide a useful alternative to the multilocus bootstrap, which only indirectly tests the appropriateness of competing species trees. We illustrate these tests and implementations of the MSC with examples and suggest that MSC methods are a useful class of models effectively using information from multiple loci to build phylogenetic trees.
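
As a minimal illustration of the "democratic vote" idea mentioned in the abstract, the sketch below computes the standard gene tree topology probabilities under the MSC for three taxa, assuming a species tree ((A,B),C) whose internal branch has length T in coalescent units. The function name and the example branch lengths are illustrative assumptions and are not taken from the chapter. With three taxa the gene tree matching the species tree is never less probable than either discordant tree, so picking the most frequent gene tree works here; the anomaly zone described above requires at least four taxa.

```python
import math


def triplet_gene_tree_probs(t: float) -> dict[str, float]:
    """Gene tree topology probabilities for the species tree ((A,B),C).

    t is the internal branch length in coalescent units (hypothetical input).
    The two lineages from A and B fail to coalesce on that branch with
    probability exp(-t); the three topologies are then equally likely.
    """
    if t < 0:
        raise ValueError("branch length must be non-negative")
    discordant = math.exp(-t) / 3.0
    return {
        "((A,B),C)": 1.0 - 2.0 * discordant,  # matches the species tree
        "((A,C),B)": discordant,
        "((B,C),A)": discordant,
    }


# Illustrative branch lengths only; not values from the chapter.
for t in (0.1, 1.0, 3.0):
    probs = triplet_gene_tree_probs(t)
    winner = max(probs, key=probs.get)
    print(f"T={t}: most probable gene tree {winner}, P={probs[winner]:.3f}")
```

Shorter internal branches push all three probabilities toward 1/3, which is why many loci are needed to resolve rapid divergences even in this simplest case.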

Original language: English (US)
Title of host publication: Methods in Molecular Biology
Publisher: Humana Press Inc.
Pages: 211-239
Number of pages: 29
DOIs: https://doi.org/10.1007/978-1-4939-9074-0_7
State: Published - Jan 1 2019

Publication series

Name: Methods in Molecular Biology
Volume: 1910
ISSN (Print): 1064-3745
ISSN (Electronic): 1940-6029

Fingerprint

Genes
Gene Flow
Expressed Sequence Tags
Bayes Theorem
Datasets

All Science Journal Classification (ASJC) codes

  • Molecular Biology
  • Genetics

Cite this

Liu, L., Anderson, C., Pearl, D. K., & Edwards, S. V. (2019). Modern phylogenomics: Building phylogenetic trees using the multispecies coalescent model. In Methods in Molecular Biology (pp. 211-239). (Methods in Molecular Biology; Vol. 1910). Humana Press Inc. https://doi.org/10.1007/978-1-4939-9074-0_7
Liu, Liang ; Anderson, Christian ; Pearl, Dennis Keith ; Edwards, Scott V. / Modern phylogenomics : Building phylogenetic trees using the multispecies coalescent model. Methods in Molecular Biology. Humana Press Inc., 2019. pp. 211-239 (Methods in Molecular Biology).
@inbook{32368aba26d74886ab6b0d7ad6c84d46,
title = "Modern phylogenomics: Building phylogenetic trees using the multispecies coalescent model",
abstract = "The multispecies coalescent (MSC) model provides a compelling framework for building phylogenetic trees from multilocus DNA sequence data. The pure MSC is best thought of as a special case of so-called “multispecies network coalescent” models, in which gene flow is allowed among branches of the tree, whereas MSC methods assume there is no gene flow between diverging species. Early implementations of the MSC, such as “parsimony” or “democratic vote” approaches to combining information from multiple gene trees, as well as concatenation, in which DNA sequences from multiple gene trees are combined into a single “supergene,” were quickly shown to be inconsistent in some regions of tree space, in so far as they converged on the incorrect species tree as more gene trees and sequence data were accumulated. The anomaly zone, a region of tree space in which the most frequent gene tree is different from the species tree, is one such region where many so-called “coalescent” methods are inconsistent. Second-generation implementations of the MSC employed Bayesian or likelihood models; these are consistent in all regions of gene tree space, but Bayesian methods in particular are incapable of handling the large phylogenomic data sets currently available. Two-step methods, such as MP-EST and ASTRAL, in which gene trees are first estimated and then combined to estimate an overarching species tree, are currently popular in part because they can handle large phylogenomic data sets. These methods are consistent in the anomaly zone but can sometimes provide inappropriate measures of tree support or apportion error and signal in the data inappropriately. MP-EST in particular employs a likelihood model which can be conveniently manipulated to perform statistical tests of competing species trees, incorporating the likelihood of the collected gene trees on each species tree in a likelihood ratio test. 
Such tests provide a useful alternative to the multilocus bootstrap, which only indirectly tests the appropriateness of competing species trees. We illustrate these tests and implementations of the MSC with examples and suggest that MSC methods are a useful class of models effectively using information from multiple loci to build phylogenetic trees.",
author = "Liu, Liang and Anderson, Christian and Pearl, {Dennis Keith} and Edwards, {Scott V.}",
year = "2019",
month = "1",
day = "1",
doi = "10.1007/978-1-4939-9074-0_7",
language = "English (US)",
series = "Methods in Molecular Biology",
publisher = "Humana Press Inc.",
pages = "211--239",
booktitle = "Methods in Molecular Biology",
volume = "1910",
}

Liu, L, Anderson, C, Pearl, DK & Edwards, SV 2019, Modern phylogenomics: Building phylogenetic trees using the multispecies coalescent model. in Methods in Molecular Biology. Methods in Molecular Biology, vol. 1910, Humana Press Inc., pp. 211-239. https://doi.org/10.1007/978-1-4939-9074-0_7

Modern phylogenomics : Building phylogenetic trees using the multispecies coalescent model. / Liu, Liang; Anderson, Christian; Pearl, Dennis Keith; Edwards, Scott V.

Methods in Molecular Biology. Humana Press Inc., 2019. p. 211-239 (Methods in Molecular Biology; Vol. 1910).

TY - CHAP

T1 - Modern phylogenomics

T2 - Building phylogenetic trees using the multispecies coalescent model

AU - Liu, Liang

AU - Anderson, Christian

AU - Pearl, Dennis Keith

AU - Edwards, Scott V.

PY - 2019/1/1

Y1 - 2019/1/1

N2 - The multispecies coalescent (MSC) model provides a compelling framework for building phylogenetic trees from multilocus DNA sequence data. The pure MSC is best thought of as a special case of so-called “multispecies network coalescent” models, in which gene flow is allowed among branches of the tree, whereas MSC methods assume there is no gene flow between diverging species. Early implementations of the MSC, such as “parsimony” or “democratic vote” approaches to combining information from multiple gene trees, as well as concatenation, in which DNA sequences from multiple gene trees are combined into a single “supergene,” were quickly shown to be inconsistent in some regions of tree space, in so far as they converged on the incorrect species tree as more gene trees and sequence data were accumulated. The anomaly zone, a region of tree space in which the most frequent gene tree is different from the species tree, is one such region where many so-called “coalescent” methods are inconsistent. Second-generation implementations of the MSC employed Bayesian or likelihood models; these are consistent in all regions of gene tree space, but Bayesian methods in particular are incapable of handling the large phylogenomic data sets currently available. Two-step methods, such as MP-EST and ASTRAL, in which gene trees are first estimated and then combined to estimate an overarching species tree, are currently popular in part because they can handle large phylogenomic data sets. These methods are consistent in the anomaly zone but can sometimes provide inappropriate measures of tree support or apportion error and signal in the data inappropriately. MP-EST in particular employs a likelihood model which can be conveniently manipulated to perform statistical tests of competing species trees, incorporating the likelihood of the collected gene trees on each species tree in a likelihood ratio test. 
Such tests provide a useful alternative to the multilocus bootstrap, which only indirectly tests the appropriateness of competing species trees. We illustrate these tests and implementations of the MSC with examples and suggest that MSC methods are a useful class of models effectively using information from multiple loci to build phylogenetic trees.

UR - http://www.scopus.com/inward/record.url?scp=85068871668&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85068871668&partnerID=8YFLogxK

U2 - 10.1007/978-1-4939-9074-0_7

DO - 10.1007/978-1-4939-9074-0_7

M3 - Chapter

C2 - 31278666

AN - SCOPUS:85068871668

T3 - Methods in Molecular Biology

SP - 211

EP - 239

BT - Methods in Molecular Biology

PB - Humana Press Inc.

ER -

Liu L, Anderson C, Pearl DK, Edwards SV. Modern phylogenomics: Building phylogenetic trees using the multispecies coalescent model. In Methods in Molecular Biology. Humana Press Inc. 2019. p. 211-239. (Methods in Molecular Biology). https://doi.org/10.1007/978-1-4939-9074-0_7