Deep Headline Generation for Clickbait Detection

Kai Shu, Suhang Wang, Thai Le, Dongwon Lee, Huan Liu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

6 Citations (Scopus)

Abstract

Clickbaits are catchy social posts or sensational headlines that attempt to lure readers to click. Clickbaits are pervasive on social media and can have significant negative impacts on both users and media ecosystems. For example, users may be misled into receiving inaccurate information or fall victim to click-jacking attacks. Similarly, media platforms could lose readers' trust and revenue due to the prevalence of clickbaits. A major obstacle to detecting such clickbaits on social media with a supervised learning framework is the lack of large-scale labeled training data, which stems from the high cost of labeling. To address this challenge, and building on recent advances in deep generative models, we propose to generate synthetic headlines with specific styles and explore their utility for improving clickbait detection. In particular, we propose to generate stylized headlines from original documents with style transfer. Generating stylized headlines is non-trivial because of challenges such as the discrete nature of text and the need to preserve the semantic meaning of the document while achieving style transfer. We therefore propose a novel solution, named Stylized Headline Generation (SHG), that can not only generate readable and realistic headlines to enlarge the original training data, but also help improve the classification capacity of supervised learning. Experimental results on real-world datasets demonstrate the effectiveness of SHG in generating high-quality and high-utility headlines for clickbait detection.
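
The augmentation idea described in the abstract, enlarging a small labeled set with generated headlines before training a supervised detector, can be sketched roughly as follows. This is a toy illustration only and not the SHG model: `generate_stylized_headline` is a hypothetical template-based stand-in for the paper's deep generative model, and the tiny example data and the naive Bayes classifier are assumptions made purely for the sketch.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a simplifying assumption)."""
    return text.lower().split()


def generate_stylized_headline(document, style):
    """Hypothetical stand-in for a learned stylized headline generator.

    It keeps the first three tokens of the document as the "content" and
    wraps them in a fixed template for the requested style.
    """
    topic = " ".join(tokenize(document)[:3])
    if style == "clickbait":
        return f"you won't believe {topic}"
    return f"report: {topic}"


def augment(labeled, documents):
    """Return the labeled set plus one synthetic headline per style and document."""
    synthetic = []
    for doc in documents:
        synthetic.append((generate_stylized_headline(doc, "clickbait"), 1))
        synthetic.append((generate_stylized_headline(doc, "news"), 0))
    return labeled + synthetic


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing over labels 0 and 1.

    Assumes the training set contains at least one example of each label.
    """

    def fit(self, examples):
        self.word_counts = {0: Counter(), 1: Counter()}
        self.class_counts = Counter()
        for text, label in examples:
            self.class_counts[label] += 1
            self.word_counts[label].update(tokenize(text))
        self.vocab = set(self.word_counts[0]) | set(self.word_counts[1])
        return self

    def predict(self, text):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label in (0, 1):
            score = math.log(self.class_counts[label] / total)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for word in tokenize(text):
                if word in self.vocab:  # words never seen in training are skipped
                    score += math.log((self.word_counts[label][word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy data: label 1 = clickbait, label 0 = non-clickbait.
labeled = [
    ("you won't believe what happened next", 1),
    ("city council approves new budget", 0),
]
documents = [
    "scientists discover water on a distant planet",
    "local team wins the regional championship",
]

train = augment(labeled, documents)
clf = NaiveBayes().fit(train)
print(len(train), clf.predict("you won't believe this trick"))
```

In this sketch the synthetic headlines are simply concatenated with the original labeled examples; how generated and real data are combined and weighted in the actual method is not specified in the abstract.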

Original language: English (US)
Title of host publication: 2018 IEEE International Conference on Data Mining, ICDM 2018
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 467-476
Number of pages: 10
ISBN (Electronic): 9781538691588
DOI: https://doi.org/10.1109/ICDM.2018.00062
State: Published - Dec 27 2018
Event: 18th IEEE International Conference on Data Mining, ICDM 2018 - Singapore, Singapore
Duration: Nov 17 2018 to Nov 20 2018

Publication series

Name: Proceedings - IEEE International Conference on Data Mining, ICDM
Volume: 2018-November
ISSN (Print): 1550-4786

Conference

Conference: 18th IEEE International Conference on Data Mining, ICDM 2018
Country: Singapore
City: Singapore
Period: 11/17/18 to 11/20/18

Fingerprint

Supervised learning
Ecosystems
Labeling
Semantics
Costs

All Science Journal Classification (ASJC) codes

  • Engineering(all)

Cite this

Shu, K., Wang, S., Le, T., Lee, D., & Liu, H. (2018). Deep Headline Generation for Clickbait Detection. In 2018 IEEE International Conference on Data Mining, ICDM 2018 (pp. 467-476). [8594871] (Proceedings - IEEE International Conference on Data Mining, ICDM; Vol. 2018-November). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICDM.2018.00062
Shu, Kai ; Wang, Suhang ; Le, Thai ; Lee, Dongwon ; Liu, Huan. / Deep Headline Generation for Clickbait Detection. 2018 IEEE International Conference on Data Mining, ICDM 2018. Institute of Electrical and Electronics Engineers Inc., 2018. pp. 467-476 (Proceedings - IEEE International Conference on Data Mining, ICDM).
@inproceedings{58d8eadea11448e0a6acf8e05cb94a9a,
title = "Deep Headline Generation for Clickbait Detection",
author = "Kai Shu and Suhang Wang and Thai Le and Dongwon Lee and Huan Liu",
year = "2018",
month = "12",
day = "27",
doi = "10.1109/ICDM.2018.00062",
language = "English (US)",
series = "Proceedings - IEEE International Conference on Data Mining, ICDM",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "467--476",
booktitle = "2018 IEEE International Conference on Data Mining, ICDM 2018",
address = "United States",

}

Shu, K, Wang, S, Le, T, Lee, D & Liu, H 2018, Deep Headline Generation for Clickbait Detection. in 2018 IEEE International Conference on Data Mining, ICDM 2018, 8594871, Proceedings - IEEE International Conference on Data Mining, ICDM, vol. 2018-November, Institute of Electrical and Electronics Engineers Inc., pp. 467-476, 18th IEEE International Conference on Data Mining, ICDM 2018, Singapore, Singapore, 11/17/18. https://doi.org/10.1109/ICDM.2018.00062

Deep Headline Generation for Clickbait Detection. / Shu, Kai; Wang, Suhang; Le, Thai; Lee, Dongwon; Liu, Huan.

2018 IEEE International Conference on Data Mining, ICDM 2018. Institute of Electrical and Electronics Engineers Inc., 2018. p. 467-476 8594871 (Proceedings - IEEE International Conference on Data Mining, ICDM; Vol. 2018-November).

TY - GEN

T1 - Deep Headline Generation for Clickbait Detection

AU - Shu, Kai

AU - Wang, Suhang

AU - Le, Thai

AU - Lee, Dongwon

AU - Liu, Huan

PY - 2018/12/27

Y1 - 2018/12/27

UR - http://www.scopus.com/inward/record.url?scp=85061372085&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85061372085&partnerID=8YFLogxK

U2 - 10.1109/ICDM.2018.00062

DO - 10.1109/ICDM.2018.00062

M3 - Conference contribution

AN - SCOPUS:85061372085

T3 - Proceedings - IEEE International Conference on Data Mining, ICDM

SP - 467

EP - 476

BT - 2018 IEEE International Conference on Data Mining, ICDM 2018

PB - Institute of Electrical and Electronics Engineers Inc.

ER -

Shu K, Wang S, Le T, Lee D, Liu H. Deep Headline Generation for Clickbait Detection. In 2018 IEEE International Conference on Data Mining, ICDM 2018. Institute of Electrical and Electronics Engineers Inc. 2018. p. 467-476. 8594871. (Proceedings - IEEE International Conference on Data Mining, ICDM). https://doi.org/10.1109/ICDM.2018.00062