Information on Ancestry from Genetic Markers

Carrie Lynn Pfaff, Jill Barnholtz-Sloan, Jennifer Kristin Wagner, Jeffrey C. Long

Research output: Contribution to journal (Article)

57 Citations (Scopus)

Abstract

It is possible to estimate the proportionate contributions of ancestral populations to admixed individuals or populations using genetic markers, but different loci and alleles vary considerably in the amount of information that they provide. Conventionally, the allele frequency difference between parental populations (δ) has been used as the criterion to select informative markers. However, it is unclear how to use δ for multiallelic loci, or populations formed by the mixture of more than two groups. Moreover, several other factors, including the actual ancestral proportions and the relative genetic diversities of the parental populations, affect the information provided by genetic markers. We demonstrate here that using δ as the sole criterion for marker selection is inadequate, and we propose, instead, to use Fisher's information, which is the inverse of the variance of the estimated ancestral contributions. This measure is superior because it is directly related to the precision of ancestry estimates. Although δ is related to Fisher's information, the relationship is neither linear nor simple, and the information can vary widely for markers with identical δs. Fortunately, Fisher's information is easily computed and formally extends to the situation of multiple alleles and/or parental populations. We examined the distribution of information for SNP and microsatellite loci available in the public domain for a variety of model admixed populations. The information, on average, is higher for microsatellite loci, but exceptional SNPs exceed the best microsatellites. Despite the large number of genetic markers that have been identified for admixture analysis, it appears that information for estimating admixture proportions is limited, and estimates will typically have wide confidence intervals.
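The abstract's point that markers with identical δ can carry different amounts of information can be illustrated with a minimal sketch. It assumes the simplest case only: two parental populations, a biallelic locus, known parental allele frequencies, and independently sampled alleles from the admixed population. Under those assumptions the admixed allele frequency is p = m·p1 + (1 − m)·p2, and the standard binomial result gives a per-allele Fisher information about m of δ² / (p(1 − p)). This is a textbook formulation, not necessarily the exact expression used in the article, and the function names `allele_information` and `approx_standard_error` are hypothetical.

```python
import math


def allele_information(m, p1, p2):
    """Per-allele Fisher information about the admixture proportion m.

    Assumes a biallelic locus, two parental populations with known allele
    frequencies p1 and p2, and binomial sampling of alleles from the
    admixed population (illustrative assumptions, not taken from the article).
    """
    for name, value in (("m", m), ("p1", p1), ("p2", p2)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1]")
    delta = p1 - p2                      # allele frequency difference
    p_mix = m * p1 + (1.0 - m) * p2      # allele frequency in the admixed group
    variance = p_mix * (1.0 - p_mix)     # binomial variance for one allele
    if variance == 0.0:
        # Boundary case: the allele is fixed or absent in the admixed group.
        return 0.0 if delta == 0.0 else math.inf
    return delta ** 2 / variance


def approx_standard_error(m, p1, p2, n_alleles):
    """Large-sample standard error of the estimate of m from n_alleles
    independent alleles: the square root of the inverse total information."""
    total = n_alleles * allele_information(m, p1, p2)
    return math.inf if total == 0.0 else 1.0 / math.sqrt(total)


if __name__ == "__main__":
    # Two hypothetical markers with the same delta (0.4), evaluated at m = 0.5.
    for label, p1, p2 in (("marker A", 0.7, 0.3), ("marker B", 0.4, 0.0)):
        info = allele_information(0.5, p1, p2)
        se = approx_standard_error(0.5, p1, p2, n_alleles=100)
        print(f"{label}: delta={p1 - p2:.2f} info={info:.3f} SE(100 alleles)={se:.3f}")
```

In this sketch both hypothetical markers have δ = 0.4, yet the per-allele information is about 0.64 for marker A and about 1.0 for marker B, because the admixed allele frequency (and therefore the sampling variance) differs. Even with 100 sampled alleles the approximate standard error stays at or above 0.1, which is in line with the abstract's remark that estimates will typically have wide confidence intervals.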

Original language: English (US)
Pages (from-to): 305-315
Number of pages: 11
Journal: Genetic Epidemiology
Volume: 26
Issue number: 4
DOI: https://doi.org/10.1002/gepi.10319
State: Published - May 1 2004

Fingerprint

Genetic Markers
Microsatellite Repeats
Population
Single Nucleotide Polymorphism
Alleles
Information Dissemination
Public Sector
Population Genetics
Gene Frequency
Patient Selection
Confidence Intervals

All Science Journal Classification (ASJC) codes

  • Epidemiology
  • Genetics (clinical)

Cite this

Pfaff, C. L., Barnholtz-Sloan, J., Wagner, J. K., & Long, J. C. (2004). Information on Ancestry from Genetic Markers. Genetic Epidemiology, 26(4), 305-315. https://doi.org/10.1002/gepi.10319
Pfaff, Carrie Lynn ; Barnholtz-Sloan, Jill ; Wagner, Jennifer Kristin ; Long, Jeffrey C. / Information on Ancestry from Genetic Markers. In: Genetic Epidemiology. 2004 ; Vol. 26, No. 4. pp. 305-315.
@article{f01d423f0f9a46838422f5dbebd58186,
title = "Information on Ancestry from Genetic Markers",
author = "Pfaff, {Carrie Lynn} and Barnholtz-Sloan, Jill and Wagner, {Jennifer Kristin} and Long, {Jeffrey C.}",
year = "2004",
month = "5",
day = "1",
doi = "10.1002/gepi.10319",
language = "English (US)",
volume = "26",
pages = "305--315",
journal = "Genetic Epidemiology",
issn = "0741-0395",
publisher = "Wiley-Liss Inc.",
number = "4",

}

Pfaff, CL, Barnholtz-Sloan, J, Wagner, JK & Long, JC 2004, 'Information on Ancestry from Genetic Markers', Genetic Epidemiology, vol. 26, no. 4, pp. 305-315. https://doi.org/10.1002/gepi.10319

TY - JOUR

T1 - Information on Ancestry from Genetic Markers

AU - Pfaff, Carrie Lynn

AU - Barnholtz-Sloan, Jill

AU - Wagner, Jennifer Kristin

AU - Long, Jeffrey C.

PY - 2004/5/1

Y1 - 2004/5/1

UR - http://www.scopus.com/inward/record.url?scp=2342488000&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=2342488000&partnerID=8YFLogxK

U2 - 10.1002/gepi.10319

DO - 10.1002/gepi.10319

M3 - Article

C2 - 15095390

AN - SCOPUS:2342488000

VL - 26

SP - 305

EP - 315

JO - Genetic Epidemiology

JF - Genetic Epidemiology

SN - 0741-0395

IS - 4

ER -

Pfaff CL, Barnholtz-Sloan J, Wagner JK, Long JC. Information on Ancestry from Genetic Markers. Genetic Epidemiology. 2004 May 1;26(4):305-315. https://doi.org/10.1002/gepi.10319