CHECKER: Detecting Clickbait Thumbnails with Weak Supervision and Co-teaching

Tianyi Xie, Thai Le, Dongwon Lee

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Clickbait thumbnails on video-sharing platforms (e.g., YouTube, Dailymotion) are small, catchy images designed to entice users to click through to the linked videos. Despite their usefulness, the videos that users land on after clicking are often inconsistent with what the thumbnails advertised, causing a poor user experience and undermining the reputation of the platforms. In this work, therefore, we aim to develop a computational solution, named CHECKER, to detect clickbait thumbnails with high accuracy. Because the definition of a clickbait thumbnail is fuzzy and high-quality labeled samples are consequently hard to create, the industry has not coped with clickbait thumbnails adequately. To address this challenge, CHECKER shares a novel clickbait thumbnail dataset and codebase with the industry, and exploits: (1) the weak supervision framework to generate many noisy-but-useful labels, and (2) the co-teaching framework to learn robustly from such noisy labels. Moreover, we investigate how to detect clickbait on video-sharing platforms using both thumbnails and titles, exploiting recent advances in vision-language models. In the empirical validation, CHECKER outperforms five baselines by at least 6.4% in F1-score and 4.2% in AUC-ROC. The codebase and dataset from our paper are available at: https://github.com/XPandora/CHECKER.
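The two ingredients named in the abstract can be illustrated with a minimal sketch. This is not the CHECKER implementation: the abstract does not describe the labeling functions, the label model, or the network architectures, so every function name below is hypothetical. Majority voting stands in as the simplest possible way to combine weak labels, and the co-teaching step shows only the general idea of that framework, in which two networks each select their small-loss samples and hand them to the peer for training.

```python
from collections import Counter

ABSTAIN = -1  # a labeling function may decline to vote on a sample


def majority_vote(votes):
    """Combine labeling-function votes (0, 1, or ABSTAIN) into one noisy label.

    Hypothetical stand-in for a weak-supervision label model.
    Returns ABSTAIN when nobody votes or when the top two labels tie.
    """
    counted = Counter(v for v in votes if v != ABSTAIN)
    if not counted:
        return ABSTAIN
    (top, n), *rest = counted.most_common(2)
    if rest and rest[0][1] == n:
        return ABSTAIN
    return top


def small_loss_indices(losses, forget_rate):
    """Indices of the (1 - forget_rate) fraction of samples with the smallest loss."""
    keep = max(1, int(round((1.0 - forget_rate) * len(losses))))
    ranked = sorted(range(len(losses)), key=lambda i: losses[i])
    return sorted(ranked[:keep])


def co_teaching_exchange(losses_a, losses_b, forget_rate):
    """One co-teaching selection step on a mini-batch.

    Network A's small-loss samples are used to update network B, and
    vice versa, so each network filters likely-noisy labels for its peer.
    Returns (indices to train A on, indices to train B on).
    """
    train_b_on = small_loss_indices(losses_a, forget_rate)
    train_a_on = small_loss_indices(losses_b, forget_rate)
    return train_a_on, train_b_on
```

In a real training loop the per-sample losses would come from two separately initialized models evaluated on the same mini-batch, and the forget rate is typically increased over the first epochs; both details are assumptions here rather than facts taken from this record.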

Original language: English (US)
Title of host publication: Machine Learning and Knowledge Discovery in Databases
Subtitle of host publication: Applied Data Science Track - European Conference, ECML PKDD 2021, Proceedings
Editors: Yuxiao Dong, Nicolas Kourtellis, Barbara Hammer, Jose A. Lozano
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 415-430
Number of pages: 16
ISBN (Print): 9783030865160
State: Published - 2021
Event: European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2021 - Virtual, Online
Duration: Sep 13 2021 → Sep 17 2021

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 12979 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2021
City: Virtual, Online
Period: 9/13/21 → 9/17/21

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science (all)
