Classification Models and Survival Analysis for Prostate Cancer Using RNA Sequencing and Clinical Data

Md Faisal Kabir, Simone A. Ludwig

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    Abstract

    Early detection of cancer can significantly increase the chance of successful treatment. This research studies early cancer detection for prostate cancer patients whose cancer tissue was analyzed with Illumina HiSeq ribonucleic acid (RNA) sequencing (RNA-Seq). Cancer-relevant genes with the most significant correlations with the clinical outcome of the sample type (cancer/non-cancer) and with overall survival (OS) were assessed. Traditional cancer diagnosis depends primarily on physicians' experience in identifying morphological abnormalities. Gene expression level data can assist physicians in detecting cancer cases at a much earlier stage and can thus significantly improve the prospects of patient treatment. For the classification task, we applied machine learning and data mining approaches to distinguish cancer from non-cancer samples based on gene expression data, with the goal of detecting cancer at the earliest stage. For the regression task, we analyzed survival outcomes in prostate cancer patients. Regression trees were built using cancer-sensitive genes along with the clinical attribute 'Gleason score' as predictors and the clinical variable 'overall survival' as the target variable. Extracting knowledge in the form of rules is one of the vital tasks in data mining, as rules provide concise statements of easily understandable and potentially valuable information. For the classification model, we derived rules from a decision tree and interpreted these rules for cancer and non-cancer patients. For the regression or survival model, we generated rules for predicting or estimating the survival time of cancer patients. In this study, cancer-relevant genes were analyzed as predictors, although various genes may interact with genes currently known to contribute to cancer. These findings have implications for assessing gene-gene interactions and gene-environment interactions in prostate cancer as well as in other types of cancer.
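
The rule-derivation step described in the abstract can be illustrated with a minimal sketch. It assumes a decision tree has already been fitted and is stored as nested dictionaries. The gene names (`GENE_A`, `GENE_B`), thresholds, and leaf labels are hypothetical placeholders and are not taken from the paper; the authors' actual tooling and tree structure are not stated here.

```python
# Hypothetical fitted tree: gene names, thresholds, and leaf labels are
# illustrative placeholders, not values reported in the paper.
TREE = {
    "feature": "GENE_A", "threshold": 5.0,
    "left": {"label": "non-cancer"},
    "right": {
        "feature": "GENE_B", "threshold": 2.5,
        "left": {"label": "cancer"},
        "right": {"label": "non-cancer"},
    },
}


def extract_rules(node, conditions=()):
    """Return one IF-THEN rule per leaf by walking every root-to-leaf path."""
    if "label" in node:
        premise = " AND ".join(conditions) if conditions else "TRUE"
        return [f"IF {premise} THEN {node['label']}"]
    feat, thr = node["feature"], node["threshold"]
    return (
        extract_rules(node["left"], conditions + (f"{feat} <= {thr}",))
        + extract_rules(node["right"], conditions + (f"{feat} > {thr}",))
    )


def predict(node, sample):
    """Follow the splits for one sample (a dict of expression values) to a leaf."""
    while "label" not in node:
        go_left = sample[node["feature"]] <= node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node["label"]


RULES = extract_rules(TREE)
for rule in RULES:
    print(rule)
```

Each root-to-leaf path becomes one rule, so a tree with three leaves yields three rules. The same traversal would apply to a regression tree if the leaves held numeric survival estimates instead of class labels.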

    Original language: English (US)
    Title of host publication: Proceedings - 2019 IEEE International Conference on Big Data, Big Data 2019
    Editors: Chaitanya Baru, Jun Huan, Latifur Khan, Xiaohua Tony Hu, Ronay Ak, Yuanyuan Tian, Roger Barga, Carlo Zaniolo, Kisung Lee, Yanfang Fanny Ye
    Publisher: Institute of Electrical and Electronics Engineers Inc.
    Pages: 2736-2745
    Number of pages: 10
    ISBN (Electronic): 9781728108582
    State: Published - Dec 2019
    Event: 2019 IEEE International Conference on Big Data, Big Data 2019 - Los Angeles, United States
    Duration: Dec 9 2019 to Dec 12 2019

    Publication series

    Name: Proceedings - 2019 IEEE International Conference on Big Data, Big Data 2019

    Conference

    Conference: 2019 IEEE International Conference on Big Data, Big Data 2019
    Country: United States
    City: Los Angeles
    Period: 12/9/19 to 12/12/19

    All Science Journal Classification (ASJC) codes

    • Artificial Intelligence
    • Computer Networks and Communications
    • Information Systems
    • Information Systems and Management
