Clustering of gene locations

Eli Walters, Naomi S. Altman, Laura Elnitski

Research output: Contribution to journal › Article › peer-review


Genes that are more closely spaced on the chromosome than expected by chance are said to be spatially clustered. Standard tests of clustering versus uniformity do not take into account two important features of genes: the high variability of gene length and the low probability that gene locations overlap (exclusion). We show by simulation that the standard null distributions, which ignore length and exclusion, do not appropriately approximate the true null distributions of standard tests such as the chi-squared test. We therefore recommend bootstrap sampling to estimate the null distributions. Simulations demonstrate that the chi-squared goodness-of-fit test is a more powerful test of clustering than two other commonly used tests, Kolmogorov and Cramér-von Mises, when the distribution of gene lengths and locations is modeled by a mixture of exponentials and there is a single cluster. The chi-squared test requires binning the gene locations; the number of genes in each bin can be compared to the expected maximum number under random distribution to determine the location of gene clusters and gene deserts. The bootstrap method to test clustering is illustrated using data from human chromosome 22.
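The bootstrap idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the resampling scheme, so the sketch assumes that gene lengths are resampled with replacement from the observed lengths, that genes are placed uniformly at random subject to no overlap, and that a gene's location is its start coordinate. The function names (`place_without_overlap`, `chi2_stat`, `bootstrap_pvalue`) and the defaults for `n_bins` and `n_boot` are hypothetical.

```python
import numpy as np


def place_without_overlap(lengths, chrom_len, rng):
    """Place genes uniformly at random on [0, chrom_len] with no overlap.

    Assumption: exclusion is modeled by distributing the free space
    (chromosome length minus total gene length) uniformly between genes.
    """
    lengths = rng.permutation(np.asarray(lengths, dtype=float))
    free = chrom_len - lengths.sum()
    if free < 0:
        raise ValueError("genes do not fit on the chromosome")
    # Sorted uniform points in the free space, then shift each gene
    # right by the total length of the genes placed before it.
    gaps = np.sort(rng.uniform(0.0, free, size=len(lengths)))
    offsets = np.concatenate(([0.0], np.cumsum(lengths)[:-1]))
    return gaps + offsets, lengths


def chi2_stat(starts, chrom_len, n_bins):
    """Chi-squared goodness-of-fit statistic against equal expected counts."""
    counts, _ = np.histogram(starts, bins=n_bins, range=(0, chrom_len))
    expected = len(starts) / n_bins
    return ((counts - expected) ** 2 / expected).sum()


def bootstrap_pvalue(starts, lengths, chrom_len, n_bins=20, n_boot=999, seed=0):
    """Estimate the null distribution of the statistic by simulation."""
    rng = np.random.default_rng(seed)
    lengths = np.asarray(lengths, dtype=float)
    observed = chi2_stat(starts, chrom_len, n_bins)
    null = np.empty(n_boot)
    for b in range(n_boot):
        # Resample lengths with replacement (an assumed scheme).
        boot_lengths = rng.choice(lengths, size=len(lengths), replace=True)
        sim_starts, _ = place_without_overlap(boot_lengths, chrom_len, rng)
        null[b] = chi2_stat(sim_starts, chrom_len, n_bins)
    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (1 + np.sum(null >= observed)) / (n_boot + 1)
    return observed, p_value
```

Calling `bootstrap_pvalue(starts, lengths, chrom_len)` with arrays of observed start coordinates and gene lengths returns the observed statistic and a simulated p-value. The p-value comes from the simulated null rather than from the chi-squared reference distribution, which is the point the abstract makes about length and exclusion.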

Original language: English (US)
Pages (from-to): 2920-2932
Number of pages: 13
Journal: Computational Statistics and Data Analysis
Issue number: 10
State: Published - Jun 20 2006

All Science Journal Classification (ASJC) codes

  • Statistics and Probability
  • Computational Mathematics
  • Computational Theory and Mathematics
  • Applied Mathematics

