Machine learning analysis using 77,044 genomic and transcriptomic profiles to accurately predict tumor type

Jim Abraham, Amy B. Heimberger, John Marshall, Elisabeth Heath, Joseph Drabick, Anthony Helmstetter, Joanne Xiu, Daniel Magee, Phillip Stafford, Chadi Nabhan, Sourabh Antani, Curtis Johnston, Matthew Oberley, Wolfgang Michael Korn, David Spetzler

Research output: Contribution to journal › Article › peer-review

7 Scopus citations


Cancer of Unknown Primary (CUP) occurs in 3–5% of patients when standard histological diagnostic tests are unable to determine the origin of metastatic cancer. Typically, a CUP diagnosis is treated empirically and has very poor outcomes, with median overall survival of less than one year. Gene expression profiling alone has been used to identify the tissue of origin, but it struggles with the low neoplastic percentage in metastatic sites, which is where identification is often most needed. MI GPSai, a Genomic Prevalence Score, uses DNA sequencing and whole transcriptome data coupled with machine learning to aid in the diagnosis of cancer. The algorithm was trained on genomic data from 34,352 cases and on genomic and transcriptomic data from 23,137 cases, and was validated on 19,555 cases. MI GPSai predicted the tumor type in the labeled data set with an accuracy of over 94% on 93% of cases while deliberating among 21 possible categories of cancer. When the second highest prediction is also considered, the accuracy increases to 97%. Additionally, MI GPSai rendered a prediction for 71.7% of CUP cases. Pathologist evaluation of discrepancies between the submitted diagnosis and MI GPSai predictions resulted in a change of diagnosis 41.3% of the time. MI GPSai provides clinically meaningful information in a large proportion of CUP cases, and inclusion of MI GPSai in clinical routine could improve diagnostic fidelity. Moreover, all genomic markers essential for therapy selection are assessed in this assay, maximizing the clinical utility for patients within a single test.
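
The figures in the abstract combine three quantities: the share of cases for which a prediction is rendered, the accuracy of the top prediction on those cases, and the accuracy when the second highest prediction also counts. The sketch below illustrates how such quantities are commonly computed. It is not the MI GPSai implementation: the function name `call_rate_and_top_k_accuracy`, the `min_score` cutoff, and the toy scores are assumptions made purely for illustration, and only the three cohort sizes are taken from the abstract.

```python
from typing import Sequence, Tuple

# Cohort sizes quoted in the abstract; their sum matches the title.
TRAIN_GENOMIC_ONLY = 34_352
TRAIN_GENOMIC_AND_TRANSCRIPTOMIC = 23_137
VALIDATION = 19_555
TOTAL_PROFILES = TRAIN_GENOMIC_ONLY + TRAIN_GENOMIC_AND_TRANSCRIPTOMIC + VALIDATION


def call_rate_and_top_k_accuracy(
    scores: Sequence[Sequence[float]],
    labels: Sequence[int],
    k: int = 1,
    min_score: float = 0.0,
) -> Tuple[float, float]:
    """Return (call rate, top-k accuracy among called cases).

    A case is "called" when its highest class score reaches min_score
    (a hypothetical cutoff, not one described in the abstract). A called
    case is correct when its true label is among the k highest-scoring
    classes.
    """
    called = 0
    correct = 0
    for row, label in zip(scores, labels):
        # Class indices ordered from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        if row[ranked[0]] < min_score:
            continue  # no prediction rendered for this case
        called += 1
        if label in ranked[:k]:
            correct += 1
    call_rate = called / len(labels) if labels else 0.0
    accuracy = correct / called if called else 0.0
    return call_rate, accuracy


# Toy data (invented): four cases, three classes.
toy_scores = [
    [0.90, 0.05, 0.05],  # true class 0, top prediction correct
    [0.40, 0.50, 0.10],  # true class 0, correct only as second prediction
    [0.34, 0.33, 0.33],  # true class 2, top score below the cutoff
    [0.10, 0.20, 0.70],  # true class 2, top prediction correct
]
toy_labels = [0, 0, 2, 2]

top1 = call_rate_and_top_k_accuracy(toy_scores, toy_labels, k=1, min_score=0.4)
top2 = call_rate_and_top_k_accuracy(toy_scores, toy_labels, k=2, min_score=0.4)
print(TOTAL_PROFILES, top1, top2)
```

On the toy data, three of the four cases receive a call, two of those three are correct on the top prediction, and all three are correct once the second prediction is allowed, which mirrors the pattern of the reported figures without reproducing them.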

Original language: English (US)
Article number: 101016
Journal: Translational Oncology
Issue number: 3
State: Published - Mar 2021

All Science Journal Classification (ASJC) codes

  • Oncology
  • Cancer Research

