Computational and statistical analyses of insertional polymorphic endogenous retroviruses in a non-model organism

Le Bao, Daniel Elleder, Raunaq Malhotra, Michael DeGiorgio, Theodora Maravegias, Lindsay Horvath, Laura Carrel, Colin Gillin, Tomáš Hron, Helena Fábryová, David R. Hunter, Mary Poss

Research output: Contribution to journal › Article

4 Citations (Scopus)

Abstract

Endogenous retroviruses (ERVs) are a class of transposable elements found in all vertebrate genomes that contribute substantially to genomic functional and structural diversity. A host species acquires an ERV when an exogenous retrovirus infects a germ cell of an individual and becomes part of the genome inherited by viable progeny. ERVs that colonized ancestral lineages are fixed in contemporary species. However, in some extant species, ERV colonization is ongoing, which results in variation in ERV frequency in the population. To study the consequences of ERV colonization of a host genome, methods are needed to assign each ERV to a location in a species' genome and determine which individuals have acquired each ERV by descent. Because well-annotated reference genomes are not widely available for all species, de novo clustering approaches provide an alternative to reference mapping that is insensitive to differences between query and reference and is amenable to mobile element studies in both model and non-model organisms. However, there is substantial uncertainty in both identifying ERV genomic position and assigning each unique ERV integration site to individuals in a population. We present an analysis suitable for detecting ERV integration sites in species without the need for a reference genome. Our approach is based on improved de novo clustering methods and statistical models that take the uncertainty of assignment into account and yield a probability matrix of shared ERV integration sites among individuals. We demonstrate that polymorphic integrations of a recently identified endogenous retrovirus in deer reflect contemporary relationships among individuals and populations.
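
The abstract does not spell out the statistical model, so the following is only a minimal illustration of what a pairwise sharing matrix built from uncertain assignments could look like, not the authors' method. It assumes a hypothetical input table presence_prob in which entry [i][s] is the probability that individual i carries the ERV at integration site s, and it further assumes that presence is independent between individuals given those probabilities. The function name and the toy values are invented for this sketch.

```python
from typing import List, Sequence


def expected_shared_sites(
    presence_prob: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Expected number of ERV integration sites shared by each pair of individuals.

    presence_prob[i][s] is a hypothetical probability that individual i
    carries the ERV at site s. Under the independence assumption stated in
    the lead-in, the probability that individuals i and j both carry site s
    is presence_prob[i][s] * presence_prob[j][s]; summing over sites gives
    the expected count of shared sites.
    """
    n = len(presence_prob)
    for row in presence_prob:
        if len(row) != len(presence_prob[0]):
            raise ValueError("every individual needs one probability per site")
        if any(not 0.0 <= p <= 1.0 for p in row):
            raise ValueError("probabilities must lie in [0, 1]")

    shared = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                # An individual "shares" a site with itself whenever it
                # carries it, so the diagonal is the expected site count.
                shared[i][j] = sum(presence_prob[i])
            else:
                shared[i][j] = sum(
                    a * b for a, b in zip(presence_prob[i], presence_prob[j])
                )
    return shared


# Toy input: 3 individuals, 3 candidate integration sites (made-up values).
toy = [
    [1.0, 0.5, 0.0],
    [1.0, 0.5, 1.0],
    [0.0, 0.0, 1.0],
]
shared = expected_shared_sites(toy)
```

In this toy table the first two individuals are expected to share more sites with each other than either shares with the third, which is the kind of pairwise signal a matrix of shared integration sites is meant to expose.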

Original language: English (US)
Pages (from-to): 221-245
Number of pages: 25
Journal: Computation
Volume: 2
Issue number: 4
DOIs: https://doi.org/10.3390/computation2040221
State: Published - Jan 1 2014

Fingerprint

Genome
Genes
Functional Genomics
Uncertainty
Descent
Clustering Methods
Statistical Model
Genomics
Assign
Assignment
Cells
Clustering
Query
Alternatives
Cell
Demonstrate

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science (all)
  • Modeling and Simulation
  • Applied Mathematics

Cite this

Bao, Le ; Elleder, Daniel ; Malhotra, Raunaq ; DeGiorgio, Michael ; Maravegias, Theodora ; Horvath, Lindsay ; Carrel, Laura ; Gillin, Colin ; Hron, Tomáš ; Fábryová, Helena ; Hunter, David R. ; Poss, Mary. / Computational and statistical analyses of insertional polymorphic endogenous retroviruses in a non-model organism. In: Computation. 2014 ; Vol. 2, No. 4. pp. 221-245.
@article{b818ac3010c1481ab6baa7d28a9e9854,
title = "Computational and statistical analyses of insertional polymorphic endogenous retroviruses in a non-model organism",
abstract = "Endogenous retroviruses (ERVs) are a class of transposable elements found in all vertebrate genomes that contribute substantially to genomic functional and structural diversity. A host species acquires an ERV when an exogenous retrovirus infects a germ cell of an individual and becomes part of the genome inherited by viable progeny. ERVs that colonized ancestral lineages are fixed in contemporary species. However, in some extant species, ERV colonization is ongoing, which results in variation in ERV frequency in the population. To study the consequences of ERV colonization of a host genome, methods are needed to assign each ERV to a location in a species' genome and determine which individuals have acquired each ERV by descent. Because well annotated reference genomes are not widely available for all species, de novo clustering approaches provide an alternative to reference mapping that are insensitive to differences between query and reference and that are amenable to mobile element studies in both model and non-model organisms. However, there is substantial uncertainty in both identifying ERV genomic position and assigning each unique ERV integration site to individuals in a population. We present an analysis suitable for detecting ERV integration sites in species without the need for a reference genome. Our approach is based on improved de novo clustering methods and statistical models that take the uncertainty of assignment into account and yield a probability matrix of shared ERV integration sites among individuals. We demonstrate that polymorphic integrations of a recently identified endogenous retrovirus in deer reflect contemporary relationships among individuals and populations.",
author = "Le Bao and Daniel Elleder and Raunaq Malhotra and Michael DeGiorgio and Theodora Maravegias and Lindsay Horvath and Laura Carrel and Colin Gillin and Tom{\'a}{\v{s}} Hron and Helena F{\'a}bryov{\'a} and Hunter, {David R.} and Mary Poss",
year = "2014",
month = "1",
day = "1",
doi = "10.3390/computation2040221",
language = "English (US)",
volume = "2",
pages = "221--245",
journal = "Computation",
issn = "2079-3197",
publisher = "Multidisciplinary Digital Publishing Institute (MDPI)",
number = "4",

}

Bao, L, Elleder, D, Malhotra, R, DeGiorgio, M, Maravegias, T, Horvath, L, Carrel, L, Gillin, C, Hron, T, Fábryová, H, Hunter, DR & Poss, M 2014, 'Computational and statistical analyses of insertional polymorphic endogenous retroviruses in a non-model organism', Computation, vol. 2, no. 4, pp. 221-245. https://doi.org/10.3390/computation2040221

Computational and statistical analyses of insertional polymorphic endogenous retroviruses in a non-model organism. / Bao, Le; Elleder, Daniel; Malhotra, Raunaq; DeGiorgio, Michael; Maravegias, Theodora; Horvath, Lindsay; Carrel, Laura; Gillin, Colin; Hron, Tomáš; Fábryová, Helena; Hunter, David R.; Poss, Mary.

In: Computation, Vol. 2, No. 4, 01.01.2014, p. 221-245.

TY - JOUR

T1 - Computational and statistical analyses of insertional polymorphic endogenous retroviruses in a non-model organism

AU - Bao, Le

AU - Elleder, Daniel

AU - Malhotra, Raunaq

AU - DeGiorgio, Michael

AU - Maravegias, Theodora

AU - Horvath, Lindsay

AU - Carrel, Laura

AU - Gillin, Colin

AU - Hron, Tomáš

AU - Fábryová, Helena

AU - Hunter, David R.

AU - Poss, Mary

PY - 2014/1/1

Y1 - 2014/1/1

AB - Endogenous retroviruses (ERVs) are a class of transposable elements found in all vertebrate genomes that contribute substantially to genomic functional and structural diversity. A host species acquires an ERV when an exogenous retrovirus infects a germ cell of an individual and becomes part of the genome inherited by viable progeny. ERVs that colonized ancestral lineages are fixed in contemporary species. However, in some extant species, ERV colonization is ongoing, which results in variation in ERV frequency in the population. To study the consequences of ERV colonization of a host genome, methods are needed to assign each ERV to a location in a species' genome and determine which individuals have acquired each ERV by descent. Because well annotated reference genomes are not widely available for all species, de novo clustering approaches provide an alternative to reference mapping that are insensitive to differences between query and reference and that are amenable to mobile element studies in both model and non-model organisms. However, there is substantial uncertainty in both identifying ERV genomic position and assigning each unique ERV integration site to individuals in a population. We present an analysis suitable for detecting ERV integration sites in species without the need for a reference genome. Our approach is based on improved de novo clustering methods and statistical models that take the uncertainty of assignment into account and yield a probability matrix of shared ERV integration sites among individuals. We demonstrate that polymorphic integrations of a recently identified endogenous retrovirus in deer reflect contemporary relationships among individuals and populations.

UR - http://www.scopus.com/inward/record.url?scp=84937830621&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84937830621&partnerID=8YFLogxK

U2 - 10.3390/computation2040221

DO - 10.3390/computation2040221

M3 - Article

AN - SCOPUS:84937830621

VL - 2

SP - 221

EP - 245

JO - Computation

JF - Computation

SN - 2079-3197

IS - 4

ER -