Statistical and Computational Methods for Systematically Mining the SNP and Gene

Project: Research project


DESCRIPTION (provided by applicant): This application is submitted in response to PAR-04-159, "Small Grants Program for Cancer Epidemiology", specifically 'Analyzing existing data that otherwise may have gone unexplored, such as pooled analysis of data from multiple studies coordinated into consortia.' A quick review of the current literature turns up only tens of papers on cancer studies that integrate SNP and gene expression data. The lack of algorithms and user-friendly software for mining existing data may be partly to blame. The proposed study will provide useful tools for mining genetic data from different sources. This pilot project focuses on developing efficient algorithms for clustering, molecular network construction, and biomarker discovery with integrated SNP and gene expression data. With the efficient data mining methods we develop, cancer researchers may extract much more useful information from otherwise unexplored data. The work therefore has broad implications for analysis in all high-priority areas of cancer epidemiology research identified by the Progress Review Groups, such as multiple myeloma and cancers of the breast, colon/rectum, prostate, lung, pancreas, and brain, and for linking genetic polymorphisms with other variables related to cancer risk. Upon completion of the proposed research, the methods and algorithms developed can potentially be applied to other mixed data sources, such as methylation data, gene expression data, and others. We hope our research will encourage more people to contribute to this challenging problem.
Effective start/end date: 5/1/07 to 4/30/09


  • National Institutes of Health: $74,250.00
  • National Institutes of Health: $74,250.00


Computational methods
Statistical methods
Gene expression
Data mining