Clickbait thumbnails on video-sharing platforms (e.g., YouTube, Dailymotion) are small catchy images that are designed to entice users to click to view the linked videos. Despite their usefulness, the landing videos after click are often inconsistent with what the thumbnails have advertised, causing poor user experience and undermining the reputation of the platforms. In this work, therefore, we aim to develop a computational solution, named as CHECKER, to detect clickbait thumbnails with high accuracy. Due to the fuzziness in the definition of clickbait thumbnails and subsequent challenges in creating high-quality labeled samples, the industry has not coped with clickbait thumbnails adequately. To address this challenge, CHECKER shares a novel clickbait thumbnail dataset and codebase with the industry, and exploits: (1) the weak supervision framework to generate many noisy-but-useful labels, and (2) the co-teaching framework to learn robustly using such noisy labels. Moreover, we also investigate how to detect clickbaits on video-sharing platforms with both thumbnails and titles, and exploit recent advances in vision-language models. In the empirical validation, CHECKER outperforms five baselines by at least 6.4% in F1-score and 4.2% in AUC-ROC. The codebase and dataset from our paper are available at: https://github.com/XPandora/CHECKER.