Phyrn: A robust method for phylogenetic analysis of highly divergent sequences

Gaurav Bhardwaj, Kyung Dae Ko, Yoojin Hong, Zhenhai Zhang, Ngai Lam Ho, Sree V. Chintapalli, Lindsay A. Kline, Matthew Gotlin, David Nicholas Hartranft, Morgen E. Patterson, Foram Dave, Evan J. Smith, Edward C. Holmes, Randen L. Patterson, Damian B. van Rossum

Research output: Contribution to journal (Article)

11 Citations (Scopus)

Abstract

Both multiple sequence alignment and phylogenetic analysis are problematic in the "twilight zone" of sequence similarity (≤25% amino acid identity). Herein we explore the accuracy of phylogenetic inference at extreme sequence divergence using a variety of simulated data sets. We evaluate four leading multiple sequence alignment (MSA) methods (MAFFT, T-COFFEE, CLUSTAL, and MUSCLE) and six commonly used programs of tree estimation (Distance-based: Neighbor-Joining; Character-based: PhyML, RAxML, GARLI, Maximum Parsimony, and Bayesian) against a novel MSA-independent method (PHYRN) described here. Strikingly, at "midnight zone" genetic distances (~7% pairwise identity and 4.0 gaps per position), PHYRN returns high-resolution phylogenies that outperform traditional approaches. We reason this is due to PHYRN's capability to amplify informative positions, even at the most extreme levels of sequence divergence. We also assess the applicability of the PHYRN algorithm for inferring deep evolutionary relationships in the divergent DANGER protein superfamily, for which PHYRN infers a more robust tree compared to MSA-based approaches. Taken together, these results demonstrate that PHYRN represents a powerful mechanism for mapping uncharted frontiers in highly divergent protein sequence data sets.

Original language: English (US)
Article number: e34261
Journal: PLoS One
Volume: 7
Issue number: 4
DOI: https://doi.org/10.1371/journal.pone.0034261
State: Published - Apr 13 2012

All Science Journal Classification (ASJC) codes

  • Biochemistry, Genetics and Molecular Biology(all)
  • Agricultural and Biological Sciences(all)

Cite this

Bhardwaj, G., Ko, K. D., Hong, Y., Zhang, Z., Ho, N. L., Chintapalli, S. V., ... van Rossum, D. B. (2012). Phyrn: A robust method for phylogenetic analysis of highly divergent sequences. PloS one, 7(4), [e34261]. https://doi.org/10.1371/journal.pone.0034261
@article{7e400d73f1ec437ea9a1bddf37667b62,
title = "Phyrn: A robust method for phylogenetic analysis of highly divergent sequences",
abstract = "Both multiple sequence alignment and phylogenetic analysis are problematic in the {"}twilight zone{"} of sequence similarity (≤25{\%} amino acid identity). Herein we explore the accuracy of phylogenetic inference at extreme sequence divergence using a variety of simulated data sets. We evaluate four leading multiple sequence alignment (MSA) methods (MAFFT, T-COFFEE, CLUSTAL, and MUSCLE) and six commonly used programs of tree estimation (Distance-based: Neighbor-Joining; Character-based: PhyML, RAxML, GARLI, Maximum Parsimony, and Bayesian) against a novel MSA-independent method (PHYRN) described here. Strikingly, at {"}midnight zone{"} genetic distances (~7{\%} pairwise identity and 4.0 gaps per position), PHYRN returns high-resolution phylogenies that outperform traditional approaches. We reason this is due to PHYRN's capability to amplify informative positions, even at the most extreme levels of sequence divergence. We also assess the applicability of the PHYRN algorithm for inferring deep evolutionary relationships in the divergent DANGER protein superfamily, for which PHYRN infers a more robust tree compared to MSA-based approaches. Taken together, these results demonstrate that PHYRN represents a powerful mechanism for mapping uncharted frontiers in highly divergent protein sequence data sets.",
author = "Gaurav Bhardwaj and Ko, {Kyung Dae} and Yoojin Hong and Zhenhai Zhang and Ho, {Ngai Lam} and Chintapalli, {Sree V.} and Kline, {Lindsay A.} and Matthew Gotlin and Hartranft, {David Nicholas} and Patterson, {Morgen E.} and Foram Dave and Smith, {Evan J.} and Holmes, {Edward C.} and Patterson, {Randen L.} and {van Rossum}, {Damian B.}",
year = "2012",
month = "4",
day = "13",
doi = "10.1371/journal.pone.0034261",
language = "English (US)",
volume = "7",
journal = "PLoS One",
issn = "1932-6203",
publisher = "Public Library of Science",
number = "4",

}

TY - JOUR
T1 - Phyrn
T2 - A robust method for phylogenetic analysis of highly divergent sequences
AU - Bhardwaj, Gaurav
AU - Ko, Kyung Dae
AU - Hong, Yoojin
AU - Zhang, Zhenhai
AU - Ho, Ngai Lam
AU - Chintapalli, Sree V.
AU - Kline, Lindsay A.
AU - Gotlin, Matthew
AU - Hartranft, David Nicholas
AU - Patterson, Morgen E.
AU - Dave, Foram
AU - Smith, Evan J.
AU - Holmes, Edward C.
AU - Patterson, Randen L.
AU - van Rossum, Damian B.
PY - 2012/4/13
Y1 - 2012/4/13
AB - Both multiple sequence alignment and phylogenetic analysis are problematic in the "twilight zone" of sequence similarity (≤25% amino acid identity). Herein we explore the accuracy of phylogenetic inference at extreme sequence divergence using a variety of simulated data sets. We evaluate four leading multiple sequence alignment (MSA) methods (MAFFT, T-COFFEE, CLUSTAL, and MUSCLE) and six commonly used programs of tree estimation (Distance-based: Neighbor-Joining; Character-based: PhyML, RAxML, GARLI, Maximum Parsimony, and Bayesian) against a novel MSA-independent method (PHYRN) described here. Strikingly, at "midnight zone" genetic distances (~7% pairwise identity and 4.0 gaps per position), PHYRN returns high-resolution phylogenies that outperform traditional approaches. We reason this is due to PHYRN's capability to amplify informative positions, even at the most extreme levels of sequence divergence. We also assess the applicability of the PHYRN algorithm for inferring deep evolutionary relationships in the divergent DANGER protein superfamily, for which PHYRN infers a more robust tree compared to MSA-based approaches. Taken together, these results demonstrate that PHYRN represents a powerful mechanism for mapping uncharted frontiers in highly divergent protein sequence data sets.
UR - http://www.scopus.com/inward/record.url?scp=84859708669&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84859708669&partnerID=8YFLogxK
U2 - 10.1371/journal.pone.0034261
DO - 10.1371/journal.pone.0034261
M3 - Article
C2 - 22514627
AN - SCOPUS:84859708669
VL - 7
JO - PLoS One
JF - PLoS One
SN - 1932-6203
IS - 4
M1 - e34261
ER -
