TY - GEN
T1 - Backdoor Pre-trained Models Can Transfer to All
AU - Shen, Lujia
AU - Ji, Shouling
AU - Zhang, Xuhong
AU - Li, Jinfeng
AU - Chen, Jing
AU - Shi, Jie
AU - Fang, Chengfang
AU - Yin, Jianwei
AU - Wang, Ting
N1 - Funding Information:
This work was partly supported by the Zhejiang Provincial Natural Science Foundation for Distinguished Young Scholars under No. LR19F020003, NSFC under Nos. U1936215, 61772466, and U1836202, and the Fundamental Research Funds for the Central Universities (Zhejiang University NGICS Platform). Ting Wang is partially supported by the National Science Foundation under Grant Nos. 1953893, 1953813, and 1951729.
Publisher Copyright:
© 2021 ACM.
PY - 2021/11/12
Y1 - 2021/11/12
N2 - Pre-trained general-purpose language models have become a dominant component in enabling real-world natural language processing (NLP) applications. However, a pre-trained model with a backdoor can pose a severe threat to these applications. Most existing backdoor attacks in NLP are conducted in the fine-tuning phase by introducing malicious triggers into the targeted class, and thus rely heavily on prior knowledge of the fine-tuning task. In this paper, we propose a new approach that maps inputs containing triggers directly to a predefined output representation of a pre-trained NLP model, e.g., a predefined output representation for the classification token in BERT, instead of to a target label. It can thus introduce a backdoor into a wide range of downstream tasks without any prior knowledge of them. Additionally, in light of the unique properties of triggers in NLP, we propose two new metrics that measure backdoor attacks in terms of both effectiveness and stealthiness. Our experiments with various types of triggers show that our method is widely applicable to different fine-tuning tasks (classification and named entity recognition) and to different models (such as BERT, XLNet, and BART), and thus poses a severe threat. Furthermore, through collaboration with the popular online model repository Hugging Face, the threat posed by our method has been confirmed. Finally, we analyze the factors that may affect the attack's performance and share insights into the causes of our backdoor attack's success.
UR - http://www.scopus.com/inward/record.url?scp=85119356301&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85119356301&partnerID=8YFLogxK
U2 - 10.1145/3460120.3485370
DO - 10.1145/3460120.3485370
M3 - Conference contribution
AN - SCOPUS:85119356301
T3 - Proceedings of the ACM Conference on Computer and Communications Security
SP - 3141
EP - 3158
BT - CCS 2021 - Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security
PB - Association for Computing Machinery
T2 - 28th ACM SIGSAC Conference on Computer and Communications Security, CCS 2021
Y2 - 15 November 2021 through 19 November 2021
ER -