The recent rise in retractions of published papers reflects heightened attention to, and scrutiny of, the scientific process, motivated in part by the replication crisis. These trends call for computational tools for understanding and assessing the scholarly record. Here, we sketch the landscape of retracted papers in the Retraction Watch database, a collection of 19k records of published scholarly articles that have been retracted for various reasons (e.g., plagiarism, data error). Using metadata as well as features derived from the full text of a subset of retracted papers in the social and behavioral sciences, we develop a random forest classifier that predicts retraction in new samples with 73% accuracy and an F1-score of 71%. We believe this study to be the first of its kind to demonstrate the utility of machine learning as a tool for the assessment of retracted work.
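To make the two reported metrics concrete, the following minimal sketch computes accuracy and a binary F1-score from true and predicted labels. It is an illustration only: the function name `accuracy_and_f1`, the label encoding (1 = retracted, 0 = not retracted), and the toy labels are assumptions, and the abstract does not state which F1 variant (binary, macro, or weighted) the authors report.

```python
from typing import Sequence, Tuple


def accuracy_and_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (accuracy, binary F1) for 0/1 labels.

    Assumed encoding (not from the paper): 1 = retracted, 0 = not retracted.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

    accuracy = (tp + tn) / len(pairs)
    # F1 = 2TP / (2TP + FP + FN); returned as 0.0 when the denominator is zero.
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


# Made-up labels for illustration only; these are not data from the study.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
acc, f1 = accuracy_and_f1(y_true, y_pred)
```

Accuracy counts all correct predictions, whereas F1 depends only on the positive class (here, retracted papers), so the two can diverge when classes are imbalanced.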
|Original language|English (US)|
|Journal|CEUR Workshop Proceedings|
|State|Published - 2021|
|Event|2021 Workshop on Scientific Document Understanding, SDU 2021 - Virtual, Online|
Duration: Feb 9 2021 → …
All Science Journal Classification (ASJC) codes
- Computer Science(all)