Comparative Study of Disease Classification Using Multiple Machine Learning Models Based on Landmark and Non-Landmark Gene Expression Data

Research output: Contribution to journal › Conference article › peer-review

Abstract

This study compares disease classification based on landmark and non-landmark gene expression data and clinical variables, using multiple machine-learning models. The influence of the number of principal components (PCs) and of the number of genes was also investigated. The results indicate that the artificial neural network (ANN) model has the best accuracy for disease type prediction among all the models, that models using 95 PCs are more accurate than those using 25 PCs, and that the more genes are used, the higher the prediction accuracy. Models using landmark genes were more accurate than models using non-landmark genes, especially with 95 PCs, across all the models except decision trees. The optimal model uses landmark genes with 95 PCs as features for an ANN classifier. The AUC values obtained on the test set were 0.98, 0.98, 1, and 0.96 for the Autoimmune, Bacteremia, Cancer, and Healthy classes, respectively, and the accuracies for the respective classes were 97.56%, 95.65%, 95.65%, and 58.82%. The ANN model distinguished well between true positives and false positives and achieved high prediction accuracy for the three disease classes (Autoimmune, Bacteremia, Cancer), but it misclassified some Healthy instances as Autoimmune or Bacteremia, likely because gene expression levels in the Healthy class span a wide range.
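The gap between the high AUC and the low accuracy reported for the Healthy class can be illustrated with a small sketch. It assumes that per-class accuracy means the fraction of a class's test instances predicted as that class, and that AUC is computed one-vs-rest from the class score; the abstract does not state either definition. The function names and the toy scores are hypothetical and are not data from the study.

```python
from collections import defaultdict


def one_vs_rest_auc(y_true, scores, positive):
    """AUC for one class against all others, by pairwise comparison.

    scores[i] is the model's score for the `positive` class on sample i.
    The result is the fraction of (positive, negative) pairs in which the
    positive sample gets the higher score; ties count as half.
    """
    pos = [s for y, s in zip(y_true, scores) if y == positive]
    neg = [s for y, s in zip(y_true, scores) if y != positive]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def per_class_accuracy(y_true, y_pred):
    """Fraction of each class's samples that were predicted as that class."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    return {c: correct[c] / total[c] for c in total}


# Toy example (made-up numbers, for illustration only).
CLASSES = ["Autoimmune", "Bacteremia", "Cancer", "Healthy"]
y_true = ["Healthy", "Healthy", "Healthy", "Autoimmune", "Autoimmune", "Cancer"]
# Each row holds hypothetical class probabilities in the order of CLASSES.
probs = [
    [0.10, 0.10, 0.05, 0.75],  # Healthy, predicted Healthy
    [0.45, 0.10, 0.05, 0.40],  # Healthy, predicted Autoimmune
    [0.10, 0.50, 0.05, 0.35],  # Healthy, predicted Bacteremia
    [0.80, 0.05, 0.05, 0.10],  # Autoimmune, predicted Autoimmune
    [0.70, 0.10, 0.05, 0.15],  # Autoimmune, predicted Autoimmune
    [0.05, 0.05, 0.85, 0.05],  # Cancer, predicted Cancer
]

# Predicted label is the class with the highest probability.
y_pred = [CLASSES[row.index(max(row))] for row in probs]

healthy_scores = [row[CLASSES.index("Healthy")] for row in probs]
healthy_auc = one_vs_rest_auc(y_true, healthy_scores, "Healthy")
accuracy = per_class_accuracy(y_true, y_pred)

print("Healthy AUC:", healthy_auc)
print("Per-class accuracy:", accuracy)
```

In this toy case every Healthy sample receives a higher Healthy score than any non-Healthy sample, so the one-vs-rest AUC is perfect, yet two of the three Healthy samples are assigned to another class because a different class has a slightly higher probability. AUC measures ranking quality, while accuracy depends on which class wins for each sample, so the two metrics can diverge in this way.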

Original language: English (US)
Pages (from-to): 264-273
Number of pages: 10
Journal: Procedia Computer Science
Volume: 185
DOIs
State: Published - 2021
Event: 2021 Complex Adaptive Systems Conference - Malvern, United States
Duration: Jun 16 2021 → Jun 18 2021

All Science Journal Classification (ASJC) codes

  • Computer Science (all)
