On demand phenotype ranking through subspace clustering

Xiang Zhang, Wei Wang, Jun Huan

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

High-throughput biotechnologies have enabled scientists to collect a large number of genetic and phenotypic attributes for a large collection of samples. Computational methods are needed to analyze these data, both to discover genotype-phenotype associations and to infer possible phenotypes from genotypic attributes. In this paper, we study the problem of on-demand phenotype ranking. Given a query sample for which only genetic information is available, we want to predict the phenotypes it may have, ranked in descending order of likelihood. This problem is challenging because genotype-phenotype databases are updated often, so explicitly mining and maintaining all patterns is impractical. We propose an on-demand ranking algorithm that uses a modified pattern-based subspace clustering algorithm to effectively identify the subspaces where the relevant clusters may reside. Using this algorithm, we can compute the clusters and their prediction significance for any phenotype on the fly. Our experiments demonstrate the efficiency and effectiveness of our algorithm.
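
To make the problem statement concrete, here is a minimal Python sketch of ranking phenotypes for a query genotype. It is not the authors' algorithm: it swaps in a plain k-nearest-neighbor vote over Hamming distance in place of pattern-based subspace clustering, and it uses the neighbor fraction as a stand-in for prediction significance. The sample data, the phenotype labels, and the names `hamming`, `rank_phenotypes`, and `SAMPLES` are all hypothetical.

```python
from collections import Counter


def hamming(a, b):
    """Number of positions at which two equal-length genotype vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def rank_phenotypes(query, samples, k=3):
    """Rank phenotypes for `query` by how often they occur among its k nearest samples.

    `samples` is a list of (genotype tuple, set of phenotype labels).
    Returns (phenotype, score) pairs in descending score order; the score is
    the fraction of the k nearest neighbors that carry the phenotype.
    """
    # Computed on demand for each query; nothing is precomputed or cached.
    neighbors = sorted(samples, key=lambda s: hamming(query, s[0]))[:k]
    votes = Counter(p for _, phenos in neighbors for p in phenos)
    scored = ((p, count / len(neighbors)) for p, count in votes.items())
    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(scored, key=lambda pc: (-pc[1], pc[0]))


# Hypothetical toy database: genotype attributes coded 0/1/2.
SAMPLES = [
    ((0, 1, 2, 0), {"tall", "early_flowering"}),
    ((0, 1, 2, 1), {"tall"}),
    ((2, 0, 0, 1), {"short"}),
    ((0, 1, 1, 0), {"tall", "early_flowering"}),
    ((2, 0, 1, 1), {"short", "late_flowering"}),
]

ranking = rank_phenotypes((0, 1, 2, 0), SAMPLES, k=3)
print(ranking)
```

With this toy data, the three nearest samples all carry "tall" and two of them carry "early_flowering", so "tall" ranks first. Because the ranking is computed at query time, adding or removing samples needs no pattern maintenance, which is the on-demand property the abstract emphasizes.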

Original language: English (US)
Title of host publication: Proceedings of the 7th SIAM International Conference on Data Mining
Pages: 623-628
Number of pages: 6
State: Published - Dec 1 2007
Event: 7th SIAM International Conference on Data Mining - Minneapolis, MN, United States
Duration: Apr 26 2007 to Apr 28 2007

Fingerprint

Biotechnology
Computational methods
Clustering algorithms
Throughput
Experiments

All Science Journal Classification (ASJC) codes

  • Engineering(all)

Cite this

Zhang, X., Wang, W., & Huan, J. (2007). On demand phenotype ranking through subspace clustering. In Proceedings of the 7th SIAM International Conference on Data Mining (pp. 623-628). (Proceedings of the 7th SIAM International Conference on Data Mining).
@inproceedings{5213710a555842368383bdc1bb7d5e14,
title = "On demand phenotype ranking through subspace clustering",
abstract = "High-throughput biotechnologies have enabled scientists to collect a large number of genetic and phenotypic attributes for a large collection of samples. Computational methods are needed to analyze these data, both to discover genotype-phenotype associations and to infer possible phenotypes from genotypic attributes. In this paper, we study the problem of on-demand phenotype ranking. Given a query sample for which only genetic information is available, we want to predict the phenotypes it may have, ranked in descending order of likelihood. This problem is challenging because genotype-phenotype databases are updated often, so explicitly mining and maintaining all patterns is impractical. We propose an on-demand ranking algorithm that uses a modified pattern-based subspace clustering algorithm to effectively identify the subspaces where the relevant clusters may reside. Using this algorithm, we can compute the clusters and their prediction significance for any phenotype on the fly. Our experiments demonstrate the efficiency and effectiveness of our algorithm.",
author = "Xiang Zhang and Wei Wang and Jun Huan",
year = "2007",
month = "12",
day = "1",
language = "English (US)",
isbn = "9780898716306",
series = "Proceedings of the 7th SIAM International Conference on Data Mining",
pages = "623--628",
booktitle = "Proceedings of the 7th SIAM International Conference on Data Mining",

}

TY - GEN

T1 - On demand phenotype ranking through subspace clustering

AU - Zhang, Xiang

AU - Wang, Wei

AU - Huan, Jun

PY - 2007/12/1

Y1 - 2007/12/1

AB - High-throughput biotechnologies have enabled scientists to collect a large number of genetic and phenotypic attributes for a large collection of samples. Computational methods are needed to analyze these data, both to discover genotype-phenotype associations and to infer possible phenotypes from genotypic attributes. In this paper, we study the problem of on-demand phenotype ranking. Given a query sample for which only genetic information is available, we want to predict the phenotypes it may have, ranked in descending order of likelihood. This problem is challenging because genotype-phenotype databases are updated often, so explicitly mining and maintaining all patterns is impractical. We propose an on-demand ranking algorithm that uses a modified pattern-based subspace clustering algorithm to effectively identify the subspaces where the relevant clusters may reside. Using this algorithm, we can compute the clusters and their prediction significance for any phenotype on the fly. Our experiments demonstrate the efficiency and effectiveness of our algorithm.

UR - http://www.scopus.com/inward/record.url?scp=70449134507&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=70449134507&partnerID=8YFLogxK

M3 - Conference contribution

SN - 9780898716306

T3 - Proceedings of the 7th SIAM International Conference on Data Mining

SP - 623

EP - 628

BT - Proceedings of the 7th SIAM International Conference on Data Mining

ER -
