Identifying genes (biomarkers) and predicting clinical outcomes from censored survival times are important for cancer prognosis and for understanding pathogenesis. In this article, we propose a novel method, L1 penalized global AUC summary maximization (L1GAUCS), for simultaneous gene (feature) selection and survival prediction. The L1 penalty shrinks coefficients and sets some of them exactly to zero, thereby selecting a small subset of genes (features). Many genes in gene expression data are highly correlated, and highly correlated genes may function together. We therefore define a correlation measure to identify genes whose expression levels may be low but that are highly correlated with the downstream, highly expressed genes selected by L1GAUCS. Partial pathways associated with the correlated genes are identified with DAVID (http://david.abcc.ncifcrf.gov/). Experimental results with chemotherapy and gene expression data demonstrate that the proposed procedures can identify important genes and pathways related to time to death due to cancer and can build a parsimonious model for predicting the survival of future patients. Software is available upon request from the first author.
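The correlation step described above can be illustrated with a minimal sketch. It is not the authors' software, and the abstract does not specify their correlation measure: the use of absolute Pearson correlation, the function name `correlated_partners`, and the `threshold` parameter are all assumptions made here for illustration. Given an expression matrix and the indices of genes already selected (for example by an L1 penalized model), the sketch lists the unselected genes that are strongly correlated with each selected gene, regardless of their expression level.

```python
import numpy as np

def correlated_partners(expr, selected, threshold=0.8):
    """Hypothetical helper: find unselected genes correlated with selected ones.

    expr      : array of shape (n_samples, n_genes), expression values
    selected  : iterable of column indices of genes already selected
    threshold : assumed cutoff on absolute Pearson correlation

    Returns a dict mapping each selected gene index to a list of
    unselected gene indices with |correlation| >= threshold.
    """
    expr = np.asarray(expr, dtype=float)
    # Pearson correlation between columns (genes); an assumption,
    # since the paper's own correlation measure is not given here.
    corr = np.corrcoef(expr, rowvar=False)
    selected = list(selected)
    selected_set = set(selected)
    partners = {}
    for j in selected:
        partners[j] = [
            k for k in range(expr.shape[1])
            if k not in selected_set and abs(corr[j, k]) >= threshold
        ]
    return partners

# Toy data: gene 1 has low expression but tracks gene 0 exactly;
# gene 2 alternates in sign and is only weakly correlated with gene 0.
gene0 = np.arange(10, dtype=float)
gene1 = 0.1 * gene0
gene2 = np.array([1.0, -1.0] * 5)
toy = np.column_stack([gene0, gene1, gene2])
print(correlated_partners(toy, selected=[0]))
```

In this toy case the low-expressed gene 1 is reported as a partner of the selected gene 0, while gene 2 is not. In practice the cutoff would have to be chosen for the data at hand, and the resulting gene lists would be the input to a pathway tool such as DAVID.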
All Science Journal Classification (ASJC) codes
- Modeling and Simulation
- Molecular Biology
- Computational Mathematics
- Computational Theory and Mathematics