Using homolog groups to create a whole-genomic tree of free-living organisms

An update

Christopher H. House, Sorel T. Fitz-Gibbon

Research output: Contribution to journal (Article)

73 Citations (Scopus)

Abstract

Genomic trees have been constructed based on the presence and absence of families of protein-encoding genes observed in 27 complete genomes, including genomes of 15 free-living organisms. This method does not rely on the identification of suspected orthologs in each genome, nor on the specific alignment used to compare gene sequences, because the protein-encoding gene families are formed by grouping any protein with a pairwise similarity score greater than a preset value. Because of this all-inclusive grouping, this method is resilient to some effects of lateral gene transfer because transfers of genes are masked when the recipient genome already has a homolog (not necessarily an ortholog) of the incoming gene. Of 71 genes suspected to have been laterally transferred to the genome of Aeropyrum pernix, only approximately 7 to 15 represent genes where a lateral gene transfer appears to have generated homoplasy in our character dataset. The genomic tree of the 15 free-living taxa includes six different bacterial orders, six different archaeal orders, and two different eukaryotic kingdoms. The results are remarkably similar to results obtained by analysis of rRNA. Inclusion of the other 12 genomes resulted in a tree only broadly similar to that suggested by rRNA with at least some of the differences due to artifacts caused by the small genome size of many of these species. Very small genomes, such as those of the two Mycoplasma genomes included, fall to the base of the Bacterial domain, a result expected due to the substantial gene loss inherent to these lineages. Finally, artificial "partial genomes" were generated by randomly selecting ORFs from the complete genomes in order to test our ability to recover the tree generated by the whole genome sequences when only partial data are available. The results indicated that partial genomic data, when sampled randomly, could robustly recover the tree generated by the whole genome sequences.
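
The grouping and character-coding steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy protein IDs, the similarity scores, and the threshold of 50 are all hypothetical, and reading "grouping any protein with a pairwise similarity score greater than a preset value" as single-linkage clustering is an assumption made here for illustration.

```python
import random
from collections import defaultdict


def group_homologs(proteins, scores, threshold):
    """Single-linkage grouping (an assumed reading of the abstract):
    two proteins share a group if a chain of pairwise scores above
    `threshold` connects them."""
    parent = {p: p for p in proteins}

    def find(p):
        # Follow parent links to the group representative (path halving).
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for (a, b), score in scores.items():
        if score > threshold:
            parent[find(a)] = find(b)

    groups = defaultdict(set)
    for p in proteins:
        groups[find(p)].add(p)
    # Sort groups so the column order of the character matrix is stable.
    return sorted(groups.values(), key=sorted)


def build_matrix(genomes, scores, threshold):
    """Return (groups, matrix); matrix maps each genome to a tuple of 0/1
    characters, one per homolog group (1 = at least one member present)."""
    proteins = sorted(set().union(*genomes.values()))
    groups = group_homologs(proteins, scores, threshold)
    matrix = {
        name: tuple(int(bool(orfs & group)) for group in groups)
        for name, orfs in genomes.items()
    }
    return groups, matrix


def partial_genome(orfs, fraction, rng):
    """Randomly keep roughly `fraction` of a genome's ORFs (at least one)."""
    k = max(1, round(len(orfs) * fraction))
    return set(rng.sample(sorted(orfs), k))


# Hypothetical toy data: three genomes and a few pairwise similarity scores.
genomes = {"A": {"a1", "a2", "a3"}, "B": {"b1", "b2"}, "C": {"c1"}}
scores = {
    ("a1", "b1"): 120, ("b1", "c1"): 95, ("a2", "b2"): 80,
    ("a1", "a2"): 20, ("a3", "c1"): 10,
}
groups, matrix = build_matrix(genomes, scores, threshold=50)

# Simulated lateral transfer: genome A gains x1, a homolog of b2. A already
# holds a member of that group (a2), so its presence/absence row is unchanged.
genomes_lgt = dict(genomes, A=genomes["A"] | {"x1"})
scores_lgt = dict(scores)
scores_lgt[("x1", "b2")] = 90
_, matrix_after_transfer = build_matrix(genomes_lgt, scores_lgt, threshold=50)

# A "partial genome" made by randomly sampling half of A's ORFs.
partial = partial_genome(genomes["A"], 0.5, random.Random(0))
```

In this toy case the transferred gene joins a group that genome A already occupies, so the character matrix row for A does not change, which is the masking effect the abstract describes. A tree would then be inferred from the presence/absence matrix by a phylogenetic method of choice; that step is outside this sketch.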

Original language: English (US)
Pages (from-to): 539-547
Number of pages: 9
Journal: Journal of Molecular Evolution
Volume: 54
Issue number: 4
DOIs: 10.1007/s00239-001-0054-5
State: Published - Apr 13 2002

Fingerprint

genomics
genome
Genome
organisms
gene
Genes
Horizontal Gene Transfer
genes
gene transfer
organism
Aeropyrum
protein
ribosomal RNA
Genome Size
Proteins
Mycoplasma
proteins
Artifacts
Open Reading Frames
artifact

All Science Journal Classification (ASJC) codes

  • Ecology, Evolution, Behavior and Systematics
  • Molecular Biology
  • Genetics

Cite this

@article{325ac9452d654c9fb2e819b12e7b39c2,
title = "Using homolog groups to create a whole-genomic tree of free-living organisms: An update",
abstract = "Genomic trees have been constructed based on the presence and absence of families of protein-encoding genes observed in 27 complete genomes, including genomes of 15 free-living organisms. This method does not rely on the identification of suspected orthologs in each genome, nor the specific alignment used to compare gene sequences because the protein-encoding gene families are formed by grouping any protein with a pairwise similarity score greater than a preset value. Because of this all inclusive grouping, this method is resilient to some effects of lateral gene transfer because transfers of genes are masked when the recipient genome already has a homolog (not necessarily an ortholog) of the incoming gene. Of 71 genes suspected to have been laterally transferred to the genome of Aeropyrum pernix, only approximately 7 to 15 represent genes where a lateral gene transfer appears to have generated homoplasy in our character dataset. The genomic tree of the 15 free-living taxa includes six different bacterial orders, six different archaeal orders, and two different eukaryotic kingdoms. The results are remarkably similar to results obtained by analysis of rRNA. Inclusion of the other 12 genomes resulted in a tree only broadly similar to that suggested by rRNA with at least some of the differences due to artifacts caused by the small genome size of many of these species. Very small genomes, such as those of the two Mycoplasma genomes included, fall to the base of the Bacterial domain, a result expected due to the substantial gene loss inherent to these lineages. Finally, artificial {"}partial genomes{"} were generated by randomly selecting ORFs from the complete genomes in order to test our ability to recover the tree generated by the whole genome sequences when only partial data are available. The results indicated that partial genomic data, when sampled randomly, could robustly recover the tree generated by the whole genome sequences.",
author = "House, {Christopher H.} and Fitz-Gibbon, {Sorel T.}",
year = "2002",
month = "4",
day = "13",
doi = "10.1007/s00239-001-0054-5",
language = "English (US)",
volume = "54",
pages = "539--547",
journal = "Journal of Molecular Evolution",
issn = "0022-2844",
publisher = "Springer New York",
number = "4",
}

Using homolog groups to create a whole-genomic tree of free-living organisms: An update. / House, Christopher H.; Fitz-Gibbon, Sorel T.

In: Journal of Molecular Evolution, Vol. 54, No. 4, 13.04.2002, p. 539-547.

TY - JOUR

T1 - Using homolog groups to create a whole-genomic tree of free-living organisms

T2 - An update

AU - House, Christopher H.

AU - Fitz-Gibbon, Sorel T.

PY - 2002/4/13

Y1 - 2002/4/13

N2 - Genomic trees have been constructed based on the presence and absence of families of protein-encoding genes observed in 27 complete genomes, including genomes of 15 free-living organisms. This method does not rely on the identification of suspected orthologs in each genome, nor the specific alignment used to compare gene sequences because the protein-encoding gene families are formed by grouping any protein with a pairwise similarity score greater than a preset value. Because of this all inclusive grouping, this method is resilient to some effects of lateral gene transfer because transfers of genes are masked when the recipient genome already has a homolog (not necessarily an ortholog) of the incoming gene. Of 71 genes suspected to have been laterally transferred to the genome of Aeropyrum pernix, only approximately 7 to 15 represent genes where a lateral gene transfer appears to have generated homoplasy in our character dataset. The genomic tree of the 15 free-living taxa includes six different bacterial orders, six different archaeal orders, and two different eukaryotic kingdoms. The results are remarkably similar to results obtained by analysis of rRNA. Inclusion of the other 12 genomes resulted in a tree only broadly similar to that suggested by rRNA with at least some of the differences due to artifacts caused by the small genome size of many of these species. Very small genomes, such as those of the two Mycoplasma genomes included, fall to the base of the Bacterial domain, a result expected due to the substantial gene loss inherent to these lineages. Finally, artificial "partial genomes" were generated by randomly selecting ORFs from the complete genomes in order to test our ability to recover the tree generated by the whole genome sequences when only partial data are available. The results indicated that partial genomic data, when sampled randomly, could robustly recover the tree generated by the whole genome sequences.

UR - http://www.scopus.com/inward/record.url?scp=0036218557&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0036218557&partnerID=8YFLogxK

U2 - 10.1007/s00239-001-0054-5

DO - 10.1007/s00239-001-0054-5

M3 - Article

VL - 54

SP - 539

EP - 547

JO - Journal of Molecular Evolution

JF - Journal of Molecular Evolution

SN - 0022-2844

IS - 4

ER -