On demand phenotype ranking through subspace clustering

Xiang Zhang, Wei Wang, Jun Huan

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

High-throughput biotechnologies have enabled scientists to collect a large number of genetic and phenotypic attributes for a large collection of samples. Computational methods are needed to analyze these data, both to discover genotype-phenotype associations and to infer possible phenotypes from genotypic attributes. In this paper, we study the problem of on-demand phenotype ranking. Given a query sample for which only genetic information is available, we want to predict the phenotypes it may have, ranked in descending order of likelihood. This problem is challenging because genotype-phenotype databases are updated often, and explicitly mining and maintaining all patterns is impractical. We propose an on-demand ranking algorithm that uses a modified pattern-based subspace clustering algorithm to effectively identify the subspaces where the relevant clusters may reside. Using this algorithm, we can compute the clusters and their prediction significance for any phenotype on the fly. Our experiments demonstrate the efficiency and effectiveness of our algorithm.
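
The problem statement above (a query with genotype only, phenotypes returned in ranked order, nothing precomputed) can be illustrated with a toy sketch. This is not the paper's pattern-based subspace clustering algorithm or its significance measure; it swaps in a brute-force search over small attribute subsets and scores each phenotype by its frequency among samples close to the query. The function name `rank_phenotypes`, the parameters `tol`, `max_dim`, and `min_support`, and the data are all hypothetical.

```python
from itertools import combinations

def rank_phenotypes(samples, query, tol=0.5, max_dim=2, min_support=2):
    """Toy on-demand ranking (an assumption-laden illustration, not the paper's method).

    samples: list of (genotype_tuple, set_of_phenotype_labels)
    query:   genotype tuple with no phenotype information
    """
    phenotypes = sorted({p for _, labels in samples for p in labels})
    best = {p: 0.0 for p in phenotypes}
    # Enumerate every attribute subset ("subspace") of size 1..max_dim.
    for k in range(1, max_dim + 1):
        for dims in combinations(range(len(query)), k):
            # Samples within tol of the query on every attribute of this subspace.
            members = [labels for genotype, labels in samples
                       if all(abs(genotype[d] - query[d]) <= tol for d in dims)]
            if len(members) < min_support:
                continue  # too few samples to say anything
            for p in phenotypes:
                frac = sum(p in labels for labels in members) / len(members)
                best[p] = max(best[p], frac)
    # Highest score first; ties broken alphabetically.
    return sorted(best.items(), key=lambda kv: (-kv[1], kv[0]))

# Hypothetical data: three genotype attributes, two phenotypes.
samples = [
    ((1.0, 0.0, 3.0), {"A"}),
    ((1.1, 5.0, 3.1), {"A"}),
    ((1.0, 9.0, 0.0), {"B"}),
    ((4.0, 0.1, 0.2), {"B"}),
]
ranking = rank_phenotypes(samples, query=(1.05, 0.05, 3.0))
print(ranking)
```

Because every score is computed at query time from the current sample list, adding or removing samples needs no pattern maintenance, which is the motivation the abstract gives for an on-demand approach. Exhaustive enumeration is only workable for a handful of attributes.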

Original language: English (US)
Title of host publication: Proceedings of the 7th SIAM International Conference on Data Mining
Pages: 623-628
Number of pages: 6
Publication status: Published - Dec 1 2007
Event: 7th SIAM International Conference on Data Mining - Minneapolis, MN, United States
Duration: Apr 26 2007 → Apr 28 2007

Publication series

Name: Proceedings of the 7th SIAM International Conference on Data Mining

Other

Other: 7th SIAM International Conference on Data Mining
Country: United States
City: Minneapolis, MN
Period: 4/26/07 → 4/28/07

All Science Journal Classification (ASJC) codes

  • Engineering(all)

Cite this

Zhang, X., Wang, W., & Huan, J. (2007). On demand phenotype ranking through subspace clustering. In Proceedings of the 7th SIAM International Conference on Data Mining (pp. 623-628). (Proceedings of the 7th SIAM International Conference on Data Mining).