TY - JOUR
T1 - Information on Ancestry from Genetic Markers
AU - Pfaff, Carrie Lynn
AU - Barnholtz-Sloan, Jill
AU - Wagner, Jennifer K.
AU - Long, Jeffrey C.
PY - 2004/5
Y1 - 2004/5
AB - It is possible to estimate the proportionate contributions of ancestral populations to admixed individuals or populations using genetic markers, but different loci and alleles vary considerably in the amount of information that they provide. Conventionally, the allele frequency difference between parental populations (δ) has been used as the criterion to select informative markers. However, it is unclear how to use δ for multiallelic loci, or populations formed by the mixture of more than two groups. Moreover, several other factors, including the actual ancestral proportions and the relative genetic diversities of the parental populations, affect the information provided by genetic markers. We demonstrate here that using δ as the sole criterion for marker selection is inadequate, and we propose, instead, to use Fisher's information, which is the inverse of the variance of the estimated ancestral contributions. This measure is superior because it is directly related to the precision of ancestry estimates. Although δ is related to Fisher's information, the relationship is neither linear nor simple, and the information can vary widely for markers with identical δs. Fortunately, Fisher's information is easily computed and formally extends to the situation of multiple alleles and/or parental populations. We examined the distribution of information for SNP and microsatellite loci available in the public domain for a variety of model admixed populations. The information, on average, is higher for microsatellite loci, but exceptional SNPs exceed the best microsatellites. Despite the large number of genetic markers that have been identified for admixture analysis, it appears that information for estimating admixture proportions is limited, and estimates will typically have wide confidence intervals.
UR - http://www.scopus.com/inward/record.url?scp=2342488000&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=2342488000&partnerID=8YFLogxK
U2 - 10.1002/gepi.10319
DO - 10.1002/gepi.10319
M3 - Article
C2 - 15095390
AN - SCOPUS:2342488000
VL - 26
SP - 305
EP - 315
JO - Genetic Epidemiology
JF - Genetic Epidemiology
SN - 0741-0395
IS - 4
ER -
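The abstract notes that Fisher's information is easily computed. The sketch below illustrates that point under the simplest case the abstract mentions, and it rests on several assumptions:

- there are two parental populations with known allele frequencies `p1` and `p2`;
- the admixed population's allele frequencies are the mixture q_k = m·p1[k] + (1 − m)·p2[k];
- the admixed sample consists of n independently drawn gene copies.

Under this model, the expected information about m at one locus is n Σ_k (p1[k] − p2[k])² / q_k. For a biallelic locus this reduces to n δ² / (q(1 − q)).

The function name `fisher_information` and its parameters are hypothetical. This is an illustration of the criterion, not the authors' implementation or their multi-population formulation.

```python
from typing import Sequence


def fisher_information(p1: Sequence[float], p2: Sequence[float],
                       m: float, n_copies: int = 1) -> float:
    """Expected Fisher information about admixture proportion m at one locus.

    Assumed model (a simplification, not necessarily the paper's exact setup):
    two parental populations with known allele frequencies p1 and p2, an
    admixed population with frequencies q_k = m * p1[k] + (1 - m) * p2[k],
    and n_copies independently sampled gene copies (multinomial counts).
    Returns n_copies * sum_k (p1[k] - p2[k])**2 / q_k.
    """
    if len(p1) != len(p2):
        raise ValueError("parental frequency vectors must have the same length")
    if not 0.0 < m < 1.0:
        # Restrict to the interior so every allele with q_k == 0 also has d_k == 0.
        raise ValueError("m must lie strictly between 0 and 1")
    info = 0.0
    for a, b in zip(p1, p2):
        d = a - b                   # dq_k/dm: allele frequency difference
        q = m * a + (1.0 - m) * b   # allele frequency in the admixed population
        if q > 0.0:
            info += d * d / q
    return n_copies * info


if __name__ == "__main__":
    # Two hypothetical SNPs, both with delta = 0.5.
    snp_a = ([0.75, 0.25], [0.25, 0.75])
    snp_b = ([1.00, 0.00], [0.50, 0.50])
    for m in (0.5, 0.9):
        ia = fisher_information(*snp_a, m)
        ib = fisher_information(*snp_b, m)
        print(f"m={m}: I_A={ia:.3f}, I_B={ib:.3f}")
```

Both hypothetical SNPs have δ = 0.5, yet at m = 0.9 the second carries more than four times the information of the first (about 5.263 versus 1.190 per gene copy). This illustrates why δ alone is an incomplete selection criterion.

Under this model, two further properties hold:

- Information adds across unlinked loci.
- 1/√I approximates the standard error of the estimated m. For example, 100 gene copies of the first SNP at m = 0.5 give I = 100, and hence a standard error of about 0.1.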