Post-marketing surveillance of antineoplastic agents evaluates efficacy and safety in patients, with the aim of expanding drug indications and discovering potential adverse events. Such real-world data are fraught with missing values, yet literature on strategies for handling missing data in this setting is scarce. Using machine learning (ML) algorithms to predict the therapeutic outcomes of PD-1/PD-L1 inhibitors has attracted attention. However, training a predictive model usually requires imaging or biomarker information, which is rarely available in post-marketing surveillance data. To address these challenges, we propose an ML-aided framework to predict the outcomes of anti-PD-1 therapy for gynecological malignancy on a dataset of 117 patients treated with Camrelizumab (50 patients), Sintilimab (44), or Toripalimab (23). Four therapeutic outcomes are predicted: response according to the Response Evaluation Criteria in Solid Tumours (RECIST), organ adverse effect (AE), general AE, and death. The proposed framework feeds the dataset into a learning pipeline consisting of imputation, feature engineering, model training, ensemble learning, and model selection to generate the final predictive model. We conduct experiments to justify several critical design choices, such as the specific feature engineering strategies and the SMOTE over-sampling technique. The final model for each learning task is selected from a large pool of candidate models based on a joint consideration of accuracy and F1 score. Moreover, we conduct a thorough, visualization-based model analysis to gain a deeper understanding of model behavior and feature importance. The results, analysis, and findings demonstrate the superiority of the proposed learning-aided framework.
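As an illustration of the final model-selection step, the following minimal sketch ranks candidate models by a joint score of accuracy and F1. The abstract does not state how the two metrics are combined, so the unweighted mean used here is an assumption. The function names (`accuracy`, `f1_binary`, `select_model`) and the toy labels are hypothetical and are not taken from the study.

```python
from typing import Dict, Sequence, Tuple


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1_binary(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """F1 score for the positive class; returns 0.0 when it is undefined."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def select_model(
    y_true: Sequence[int], candidates: Dict[str, Sequence[int]]
) -> Tuple[str, float]:
    """Pick the candidate with the highest joint score.

    Assumption: the joint score is the unweighted mean of accuracy and F1.
    """
    scores = {
        name: 0.5 * (accuracy(y_true, pred) + f1_binary(y_true, pred))
        for name, pred in candidates.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


# Toy validation labels and predictions (illustrative only, not study data).
y_true = [1, 0, 0, 0, 1, 0, 0, 1]
candidates = {
    "always_negative": [0, 0, 0, 0, 0, 0, 0, 0],
    "model_a": [1, 0, 0, 0, 1, 0, 1, 0],
    "model_b": [1, 1, 1, 0, 1, 0, 0, 1],
}

best_name, best_score = select_model(y_true, candidates)
print(best_name, round(best_score, 4))
```

On imbalanced outcomes, a trivial model that always predicts the majority class can reach a reasonable accuracy while its F1 is zero, which is one reason to consider the two metrics jointly rather than accuracy alone.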
All Science Journal Classification (ASJC) codes
- Computer Science(all)
- Materials Science(all)