While there is considerable appeal to the idea of selecting a few SNPs to represent all, or much, of the DNA sequence variability in a local chromosomal region, it is also important to quantify what detail is lost in adopting such an approach. To address this issue, we compared high- and low-resolution depictions of sequence diversity for the same genomic region, the APOA1/C3/A4/A5 gene cluster on chromosome 11. First, extensive re-sequencing identified all nucleotide and sequence haplotype variation of the linked apolipoprotein genes in 72 individuals from three populations: African-Americans from Jackson, Miss., Europeans from North Karelia, Finland, and European-Americans from Rochester, Minn. We identified 124 SNPs in 17.7 kb and significant differences in variation among genes. APOC3 gene diversity was particularly distinctive at high resolution, showing large allele frequency differences (FST values >0.250) between Jackson and the other two samples, and divergent population-specific haplotype lineages. Next, we selected haplotype-tagging SNPs (htSNPs) for each gene, at a density of approximately one SNP per kb, using an algorithm suggested by Stram et al. (2003). The 17 htSNPs identified were then used to reconstruct low-resolution haplotypes, from which inferences about the structure of variation were also drawn. This comparison showed that while the htSNPs successfully tagged common haplotype variation, they also left much underlying sequence diversity undetected and failed, in some cases, to co-classify groups of closely related haplotypes. The implications of these findings for other haplotype-based descriptions of human variation are discussed.