Background: The online pharmacy market has grown significantly in recent years, from US $29.35 billion in 2014 to an expected US $128 billion in 2023 worldwide. Although legitimate online pharmacies (LOPs) offer patients convenience and potentially lower costs, illicit online pharmacies (IOPs) open the door to unfettered access to prescription drugs, controlled substances (eg, opioids), and potentially counterfeit products, posing a serious risk to the drug supply chain and to patient health. Unfortunately, we know little about IOPs, and even identifying and monitoring them is challenging because of the large number of online pharmacies (at least 30,000-35,000) and the dynamic nature of the online channel (online pharmacies open and shut down easily).

Objective: This study aims to increase our understanding of IOPs through web traffic data analysis and to propose a novel framework that uses referral links to predict and identify IOPs, the first step in fighting them.

Methods: We first collected web traffic and engagement data to study and compare how consumers access and engage with LOPs and IOPs. We then proposed a simple but novel framework for predicting the status of online pharmacies (legitimate or illicit) through the referral links between websites. Under this framework, we developed 2 prediction models: the reference rating prediction method (RRPM) and the reference-based K-nearest neighbor (R2NN).

Results: We found that direct (typing the URL), search, and referral are the 3 major traffic sources, together representing more than 95% of traffic to both LOPs and IOPs. It is alarming that direct traffic is the second-highest traffic source (34.32%) for IOPs. When tested on a data set of 763 online pharmacies, both RRPM and R2NN performed well, achieving an accuracy above 95% in predicting the status of the online pharmacies. R2NN outperformed RRPM on all performance metrics (accuracy, kappa, specificity, and sensitivity). When the 2 models were applied to Google search results for popular drugs (Xanax [alprazolam], OxyContin, and opioids), they produced error rates of only 7.96% (R2NN) and 6.20% (RRPM).

Conclusions: Our prediction models use what we know (referral links) to tackle the many unknown aspects of IOPs. They have many potential applications for patients, search engines, social media, payment companies, policy makers or government agencies, and drug manufacturers to help fight IOPs. Because work in this area is scarce, we hope to help address the current opioid crisis from this perspective and inspire future research in the critical area of drug safety.
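The 4 performance metrics named in the Results (accuracy, kappa, specificity, and sensitivity) can all be derived from a binary confusion matrix. The following is a minimal sketch of those standard definitions, not the authors' evaluation code. It assumes that "illicit" is treated as the positive class, and the function name `binary_metrics` and the counts in the usage note are hypothetical and are not taken from the study.

```python
def binary_metrics(tp, fn, fp, tn):
    """Compute standard binary-classification metrics from confusion counts.

    Assumption (not from the source): "illicit" is the positive class, so
      tp = illicit pharmacies predicted illicit
      fn = illicit pharmacies predicted legitimate
      fp = legitimate pharmacies predicted illicit
      tn = legitimate pharmacies predicted legitimate
    """
    total = tp + fn + fp + tn

    # Share of all pharmacies whose status was predicted correctly.
    accuracy = (tp + tn) / total
    # Share of truly illicit pharmacies that were flagged as illicit.
    sensitivity = tp / (tp + fn)
    # Share of truly legitimate pharmacies that were predicted legitimate.
    specificity = tn / (tn + fp)

    # Cohen's kappa: observed agreement corrected for chance agreement.
    chance_pos = ((tp + fn) / total) * ((tp + fp) / total)
    chance_neg = ((fp + tn) / total) * ((fn + tn) / total)
    expected = chance_pos + chance_neg
    kappa = (accuracy - expected) / (1 - expected)

    return {
        "accuracy": accuracy,
        "kappa": kappa,
        "sensitivity": sensitivity,
        "specificity": specificity,
    }
```

For example, calling `binary_metrics(tp=45, fn=5, fp=3, tn=47)` with these made-up counts gives an accuracy of 0.92, a sensitivity of 0.90, a specificity of 0.94, and a kappa of 0.84. The error rate reported for the Google search evaluation corresponds to 1 minus accuracy.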
All Science Journal Classification (ASJC) codes
- Health Informatics