Revealing mammalian evolutionary relationships by comparative analysis of gene clusters

Giltae Song, Cathy Riemer, Benjamin Dickins, Hie Lim Kim, Louxin Zhang, Yu Zhang, Chih Hao Hsu, Ross Cameron Hardison, Eric D. Green, Webb Miller

Research output: Contribution to journal (Article)

7 Citations (Scopus)

Abstract

Many software tools for comparative analysis of genomic sequence data have been released in recent decades. Despite this, it remains challenging to determine evolutionary relationships in gene clusters due to their complex histories involving duplications, deletions, inversions, and conversions. One concept describing these relationships is orthology. Orthologs derive from a common ancestor by speciation, in contrast to paralogs, which derive from duplication. Discriminating orthologs from paralogs is a necessary step in most multispecies sequence analyses, but doing so accurately is impeded by the occurrence of gene conversion events. We propose a refined method of orthology assignment based on two paradigms for interpreting its definition: by genomic context or by sequence content. X-orthology (based on context) traces orthology resulting from speciation and duplication only, while N-orthology (based on content) includes the influence of conversion events. We developed a computational method for automatically mapping both types of orthology on a per-nucleotide basis in gene cluster regions studied by comparative sequencing, and we make this mapping accessible by visualizing the output. All of these steps are incorporated into our newly extended CHAP 2 package. We evaluate our method using both simulated data and real gene clusters (including the well-characterized α-globin and β-globin clusters). We also illustrate use of CHAP 2 by analyzing four more loci: CCL (chemokine ligand), IFN (interferon), CYP2abf (part of cytochrome P450 family 2), and KIR (killer cell immunoglobulin-like receptors). These new methods facilitate and extend our understanding of evolution at these and other loci by adding automated accurate evolutionary inference to the biologist's toolkit. The CHAP 2 package is freely available from http://www.bx.psu.edu/miller-lab.
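The ortholog/paralog distinction stated in the abstract can be illustrated with a toy sketch. This is not the CHAP 2 algorithm and does not model gene conversion or the X-/N-orthology distinction; it only applies the textbook rule that two genes are orthologs when their most recent common ancestor is a speciation event and paralogs when it is a duplication. The tree, the gene names (`human_A`, `mouse_B`, ...) and the `Node`, `path_to` and `relationship` helpers are all hypothetical, chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """A node in a toy gene tree (hypothetical structure, not CHAP 2 output)."""
    name: str
    # "speciation" or "duplication" for internal nodes; None for leaves (genes).
    event: Optional[str] = None
    children: list[Node] = field(default_factory=list)


def path_to(root: Node, target: str) -> Optional[list[Node]]:
    """Return the nodes from root down to the node named target, or None."""
    if root.name == target:
        return [root]
    for child in root.children:
        sub = path_to(child, target)
        if sub is not None:
            return [root] + sub
    return None


def relationship(root: Node, a: str, b: str) -> str:
    """Classify two distinct genes by the event at their most recent common ancestor."""
    if a == b:
        raise ValueError("need two distinct genes")
    pa, pb = path_to(root, a), path_to(root, b)
    if pa is None or pb is None:
        raise KeyError("gene not found in tree")
    mrca = root
    # Walk both root-to-leaf paths in step; the last shared node is the MRCA.
    for x, y in zip(pa, pb):
        if x is not y:
            break
        mrca = x
    return "orthologs" if mrca.event == "speciation" else "paralogs"


# Toy history: one ancestral duplication (copies A and B), followed by a
# human/mouse speciation in each copy's lineage.
tree = Node("dup", "duplication", [
    Node("spec_A", "speciation", [Node("human_A"), Node("mouse_A")]),
    Node("spec_B", "speciation", [Node("human_B"), Node("mouse_B")]),
])

print(relationship(tree, "human_A", "mouse_A"))
print(relationship(tree, "human_A", "human_B"))
print(relationship(tree, "human_A", "mouse_B"))
```

In this toy tree, `human_A` and `mouse_A` separate at a speciation node, while any A/B pair traces back to the duplication at the root. A conversion event, which the abstract identifies as the main obstacle to accurate assignment, would overwrite sequence content without changing this genomic-context history, and a tree this simple cannot represent that.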

Original language: English (US)
Pages (from-to): 586-601
Number of pages: 16
Journal: Genome Biology and Evolution
Volume: 4
Issue number: 4
DOI: https://doi.org/10.1093/gbe/evs032
State: Published - Sep 24 2012

Fingerprint

Multigene Family
Globins
gene
Sequence Analysis
genomics
KIR Receptors
Gene Conversion
loci
common ancestry
Interferons
Chemokines
methodology
cytochrome P-450
immunoglobulins
ligand
biologists

All Science Journal Classification (ASJC) codes

  • Ecology, Evolution, Behavior and Systematics
  • Genetics

Cite this

Song, G., Riemer, C., Dickins, B., Kim, H. L., Zhang, L., Zhang, Y., ... Miller, W. (2012). Revealing mammalian evolutionary relationships by comparative analysis of gene clusters. Genome biology and evolution, 4(4), 586-601. https://doi.org/10.1093/gbe/evs032
Song, Giltae; Riemer, Cathy; Dickins, Benjamin; Kim, Hie Lim; Zhang, Louxin; Zhang, Yu; Hsu, Chih Hao; Hardison, Ross Cameron; Green, Eric D.; Miller, Webb. / Revealing mammalian evolutionary relationships by comparative analysis of gene clusters. In: Genome Biology and Evolution. 2012; Vol. 4, No. 4. pp. 586-601.
@article{6c533af8da1b40d1af048b92a193bb06,
title = "Revealing mammalian evolutionary relationships by comparative analysis of gene clusters",
abstract = "Many software tools for comparative analysis of genomic sequence data have been released in recent decades. Despite this, it remains challenging to determine evolutionary relationships in gene clusters due to their complex histories involving duplications, deletions, inversions, and conversions. One concept describing these relationships is orthology. Orthologs derive from a common ancestor by speciation, in contrast to paralogs, which derive from duplication. Discriminating orthologs from paralogs is a necessary step in most multispecies sequence analyses, but doing so accurately is impeded by the occurrence of gene conversion events. We propose a refined method of orthology assignment based on two paradigms for interpreting its definition: by genomic context or by sequence content. X-orthology (based on context) traces orthology resulting from speciation and duplication only, while N-orthology (based on content) includes the influence of conversion events. We developed a computational method for automatically mapping both types of orthology on a per-nucleotide basis in gene cluster regions studied by comparative sequencing, and we make this mapping accessible by visualizing the output. All of these steps are incorporated into our newly extended CHAP 2 package. We evaluate our method using both simulated data and real gene clusters (including the well-characterized α-globin and β-globin clusters). We also illustrate use of CHAP 2 by analyzing four more loci: CCL (chemokine ligand), IFN (interferon), CYP2abf (part of cytochrome P450 family 2), and KIR (killer cell immunoglobulin-like receptors). These new methods facilitate and extend our understanding of evolution at these and other loci by adding automated accurate evolutionary inference to the biologist's toolkit. The CHAP 2 package is freely available from http://www.bx.psu.edu/miller-lab.",
author = "Giltae Song and Cathy Riemer and Benjamin Dickins and Kim, {Hie Lim} and Louxin Zhang and Yu Zhang and Hsu, {Chih Hao} and Hardison, {Ross Cameron} and Green, {Eric D.} and Webb Miller",
year = "2012",
month = "9",
day = "24",
doi = "10.1093/gbe/evs032",
language = "English (US)",
volume = "4",
pages = "586--601",
journal = "Genome Biology and Evolution",
issn = "1759-6653",
publisher = "Oxford University Press",
number = "4",
}

Song, G, Riemer, C, Dickins, B, Kim, HL, Zhang, L, Zhang, Y, Hsu, CH, Hardison, RC, Green, ED & Miller, W 2012, 'Revealing mammalian evolutionary relationships by comparative analysis of gene clusters', Genome biology and evolution, vol. 4, no. 4, pp. 586-601. https://doi.org/10.1093/gbe/evs032

Revealing mammalian evolutionary relationships by comparative analysis of gene clusters. / Song, Giltae; Riemer, Cathy; Dickins, Benjamin; Kim, Hie Lim; Zhang, Louxin; Zhang, Yu; Hsu, Chih Hao; Hardison, Ross Cameron; Green, Eric D.; Miller, Webb.

In: Genome biology and evolution, Vol. 4, No. 4, 24.09.2012, p. 586-601.

TY - JOUR
T1 - Revealing mammalian evolutionary relationships by comparative analysis of gene clusters
AU - Song, Giltae
AU - Riemer, Cathy
AU - Dickins, Benjamin
AU - Kim, Hie Lim
AU - Zhang, Louxin
AU - Zhang, Yu
AU - Hsu, Chih Hao
AU - Hardison, Ross Cameron
AU - Green, Eric D.
AU - Miller, Webb
PY - 2012/9/24
Y1 - 2012/9/24

AB - Many software tools for comparative analysis of genomic sequence data have been released in recent decades. Despite this, it remains challenging to determine evolutionary relationships in gene clusters due to their complex histories involving duplications, deletions, inversions, and conversions. One concept describing these relationships is orthology. Orthologs derive from a common ancestor by speciation, in contrast to paralogs, which derive from duplication. Discriminating orthologs from paralogs is a necessary step in most multispecies sequence analyses, but doing so accurately is impeded by the occurrence of gene conversion events. We propose a refined method of orthology assignment based on two paradigms for interpreting its definition: by genomic context or by sequence content. X-orthology (based on context) traces orthology resulting from speciation and duplication only, while N-orthology (based on content) includes the influence of conversion events. We developed a computational method for automatically mapping both types of orthology on a per-nucleotide basis in gene cluster regions studied by comparative sequencing, and we make this mapping accessible by visualizing the output. All of these steps are incorporated into our newly extended CHAP 2 package. We evaluate our method using both simulated data and real gene clusters (including the well-characterized α-globin and β-globin clusters). We also illustrate use of CHAP 2 by analyzing four more loci: CCL (chemokine ligand), IFN (interferon), CYP2abf (part of cytochrome P450 family 2), and KIR (killer cell immunoglobulin-like receptors). These new methods facilitate and extend our understanding of evolution at these and other loci by adding automated accurate evolutionary inference to the biologist's toolkit. The CHAP 2 package is freely available from http://www.bx.psu.edu/miller-lab.

UR - http://www.scopus.com/inward/record.url?scp=84864077221&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84864077221&partnerID=8YFLogxK
U2 - 10.1093/gbe/evs032
DO - 10.1093/gbe/evs032
M3 - Article
VL - 4
SP - 586
EP - 601
JO - Genome Biology and Evolution
JF - Genome Biology and Evolution
SN - 1759-6653
IS - 4
ER -