Tissue Classification Using Landmark and Non-Landmark Gene Sets for Feature Selection

Carly L. Clayman, Alakesh Mani, Suraj Bondugula, Satish M. Srinivasan

Research output: Contribution to journal › Conference article › peer-review

Abstract

The L1000 dataset contains gene microarray data from 978 landmark genes, which have previously been shown to accurately predict expression of ~81% of the remaining 21,290 target genes. Microarray data was used to characterize groups of tissue types within the L1000 dataset, to assess whether the 978 landmark genes, compared to non-landmark genes, would better differentiate samples into clusters containing distinct tissue types. Landmark genes better differentiated k-means clusters than non-landmark genes did. These results suggest that landmark genes better characterize heterogeneous samples in their comprehensive genetic profile. Our previous studies showed that categorical separation of heterogeneous samples based on clinical or biological groups generally improves when landmark genes, rather than non-landmark genes, are used as features. However, the present work indicates that non-landmark genes may also be used to separate samples when a large sample size is available for training k-means clustering models. In contrast, when studying a small sample size of the same set of heterogeneous samples, landmark genes as features improve clustering. This study has implications for assessing various tissue types, as landmark genes may be directly measured to predict categorical sample qualities as well as expression of the remaining target genes.
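The comparison described in the abstract (cluster the same samples with k-means using two different gene sets as features, then check how well the clusters line up with tissue type) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the data are synthetic, the tissue names, the column lists `set_a_cols` and `set_b_cols` (stand-ins for landmark and non-landmark gene indices), and the helper names are hypothetical, and cluster purity is used only as one plausible agreement measure, since the abstract does not say how cluster quality was scored.

```python
import random
from collections import Counter


def nearest(point, centers):
    """Index of the center closest to point (squared Euclidean distance)."""
    dists = [sum((x - c) ** 2 for x, c in zip(point, center)) for center in centers]
    return dists.index(min(dists))


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means; returns one cluster index per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    assign = None
    for _ in range(iters):
        new_assign = [nearest(p, centers) for p in points]
        if new_assign == assign:  # assignments stopped changing
            break
        assign = new_assign
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old center if a cluster goes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def purity(assign, labels):
    """Fraction of samples carrying the majority label of their cluster."""
    hits = 0
    for c in set(assign):
        counts = Counter(l for a, l in zip(assign, labels) if a == c)
        hits += counts.most_common(1)[0][1]
    return hits / len(labels)


def select(matrix, cols):
    """Restrict an expression matrix (rows = samples) to the given gene columns."""
    return [[row[j] for j in cols] for row in matrix]


# Synthetic stand-in data (assumption): 3 tissue types, 20 samples each, 10 genes.
# Columns 0-4 vary by tissue; columns 5-9 are pure noise.
rng = random.Random(42)
tissue_means = {"tissue_1": 0.0, "tissue_2": 4.0, "tissue_3": 8.0}
samples, labels = [], []
for tissue, mu in tissue_means.items():
    for _ in range(20):
        varying = [rng.gauss(mu, 1.0) for _ in range(5)]
        noise = [rng.gauss(0.0, 1.0) for _ in range(5)]
        samples.append(varying + noise)
        labels.append(tissue)

set_a_cols = list(range(0, 5))   # hypothetical stand-in for one gene set
set_b_cols = list(range(5, 10))  # hypothetical stand-in for the other gene set

purity_a = purity(kmeans(select(samples, set_a_cols), k=3), labels)
purity_b = purity(kmeans(select(samples, set_b_cols), k=3), labels)
print(f"purity with set A: {purity_a:.2f}")
print(f"purity with set B: {purity_b:.2f}")
```

With real data, the two column lists would be replaced by the indices of the landmark and non-landmark genes in the expression matrix, and the comparison could be repeated on subsamples of different sizes to probe the sample-size effect the abstract mentions. The synthetic numbers printed here say nothing about the L1000 results.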

Original language: English (US)
Pages (from-to): 256-263
Number of pages: 8
Journal: Procedia Computer Science
Volume: 185
State: Published - 2021
Event: 2021 Complex Adaptive Systems Conference - Malvern, United States
Duration: Jun 16 2021 to Jun 18 2021

All Science Journal Classification (ASJC) codes

  • Computer Science (all)
