Genes that are more closely spaced on the chromosome than expected by chance are said to be spatially clustered. Standard tests of clustering versus uniformity do not take into account two important features of genes: the high variability of gene length and the low probability that gene locations overlap (exclusion). We show by simulation that the standard null distributions, which ignore length and exclusion, do not adequately approximate the true null distributions of standard test statistics such as the chi-squared statistic. We therefore recommend bootstrap sampling to estimate the null distributions. Simulations demonstrate that the chi-squared goodness-of-fit test is a more powerful test of clustering than two other commonly used tests, the Kolmogorov and Cramér-von Mises tests, when the distribution of gene lengths and locations is modeled by a mixture of exponentials and there is a single cluster. The chi-squared test requires binning the gene locations; the number of genes in each bin can be compared to the expected maximum number under random distribution to determine the location of gene clusters and gene deserts. The bootstrap method to test clustering is illustrated using data from human chromosome 22.
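The bootstrap idea can be illustrated with a minimal Python sketch. It is not the authors' procedure: the placement scheme (uniform gaps in the free space between genes), the use of gene start positions as locations, the equal-width bins, and all function names are assumptions made for illustration. Gene lengths are resampled with replacement from the observed lengths, the resampled genes are placed at random without overlap, and the chi-squared statistic of the binned starts is recomputed to build an empirical null distribution.

```python
import random

def place_without_overlap(lengths, chrom_len, rng):
    """Place genes at random with no overlaps; return sorted start positions.

    Assumed scheme: draw sorted uniform points in the free space left once
    all gene lengths are set aside, then shift each point right by the total
    length of the genes placed before it.
    """
    free = chrom_len - sum(lengths)
    if free < 0:
        raise ValueError("genes do not fit on the chromosome")
    order = list(lengths)
    rng.shuffle(order)  # random assignment of lengths to positions
    gaps = sorted(rng.uniform(0, free) for _ in order)
    starts, used = [], 0.0
    for gap, length in zip(gaps, order):
        starts.append(gap + used)
        used += length
    return starts

def chi_squared(starts, chrom_len, n_bins):
    """Chi-squared goodness-of-fit statistic against equal expected counts."""
    counts = [0] * n_bins
    for s in starts:
        idx = min(int(s / chrom_len * n_bins), n_bins - 1)
        counts[idx] += 1
    expected = len(starts) / n_bins
    return sum((c - expected) ** 2 / expected for c in counts)

def bootstrap_p_value(starts, lengths, chrom_len, n_bins=10, n_boot=500, seed=0):
    """Estimate a p-value from a bootstrap null that respects length and exclusion."""
    rng = random.Random(seed)
    observed = chi_squared(starts, chrom_len, n_bins)
    exceed = 0
    for _ in range(n_boot):
        # Resample gene lengths with replacement (raises if they no longer fit).
        resampled = rng.choices(lengths, k=len(lengths))
        null_starts = place_without_overlap(resampled, chrom_len, rng)
        if chi_squared(null_starts, chrom_len, n_bins) >= observed:
            exceed += 1
    # Add-one correction keeps the estimate strictly positive.
    return (1 + exceed) / (1 + n_boot)
```

For example, `bootstrap_p_value(starts, lengths, chrom_len=10_000)` takes the observed start positions and lengths of the genes on a hypothetical chromosome of length 10,000. A small p-value suggests the binned counts are less uniform than the length-aware, exclusion-aware null would produce.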
All Science Journal Classification (ASJC) codes
- Statistics and Probability
- Computational Mathematics
- Computational Theory and Mathematics
- Applied Mathematics