Entity Backdoor Attacks Against Fine-Tuned Models

Document Type

Conference Proceeding

Publication Date

1-1-2025

Abstract

Fine-tuning is a training paradigm that allows large models to achieve strong performance on downstream tasks with only a small number of samples and little training time. However, this study reveals that models fine-tuned from pre-trained models are vulnerable to a new threat called the entity backdoor attack. An entity backdoor attack is a new type of backdoor attack in which any instance of a given entity can trigger the backdoor. More importantly, the poisoned examples are visually similar to clean examples. For example, an entity backdoor attack can use a husky dog (which belongs to the dog entity) to trigger the stop sign class in a traffic recognition task, yet the poisoned example in the training dataset looks like a stop sign. The advantages of entity backdoor attacks over traditional backdoor attacks are twofold. First, entity backdoor attacks are triggered more stealthily because they do not require a specially defined trigger pattern superimposed on a normal image: the instance (e.g., a husky dog) itself is the trigger, and presenting the instance directly activates the backdoor. Second, the poisoned examples in the training datasets of entity backdoor attacks are stealthier because they are generated with very small perturbations, making them hard to distinguish from clean examples. Experiments on multiple datasets show that systems using fine-tuned models are vulnerable to the threat of entity backdoor attacks.
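
The abstract describes crafting poisoned training examples with very small perturbations so that they remain visually indistinguishable from clean target-class images while encoding an entity instance. The sketch below is one plausible illustration of such perturbation-based poisoning using a feature-collision-style objective against a frozen pre-trained backbone; the backbone choice, hyperparameters, and function names are assumptions for illustration, not the paper's exact method.

```python
import torch
import torch.nn.functional as F
from torchvision import models

# Minimal sketch (assumption-based): craft a poisoned example that looks like a
# clean target-class image (e.g., a stop sign) but whose features resemble an
# entity instance (e.g., a husky) under a frozen pre-trained feature extractor.

device = "cuda" if torch.cuda.is_available() else "cpu"

# Frozen pre-trained backbone used as the feature extractor (illustrative choice).
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()          # drop the classification head
backbone.eval().to(device)
for p in backbone.parameters():
    p.requires_grad_(False)

def craft_poison(clean_target, entity_instance, eps=8 / 255, steps=200, lr=0.01):
    """Return a poisoned example inside an L-infinity ball of `clean_target`
    whose backbone features are close to those of `entity_instance`."""
    with torch.no_grad():
        entity_feat = backbone(entity_instance)
    delta = torch.zeros_like(clean_target, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        poison = (clean_target + delta).clamp(0, 1)
        loss = F.mse_loss(backbone(poison), entity_feat)  # feature-collision loss
        opt.zero_grad()
        loss.backward()
        opt.step()
        # Keep the perturbation small so the poison stays visually clean.
        with torch.no_grad():
            delta.clamp_(-eps, eps)
    return (clean_target + delta).detach().clamp(0, 1)

# Usage with random tensors standing in for real images (1 x 3 x 224 x 224).
stop_sign = torch.rand(1, 3, 224, 224, device=device)   # clean target-class image
husky = torch.rand(1, 3, 224, 224, device=device)       # entity instance
poisoned = craft_poison(stop_sign, husky)
```

The bounded perturbation is what keeps the poisoned example hard to distinguish from the clean stop-sign image, while the feature-level objective is what lets an unmodified entity instance activate the backdoor after fine-tuning.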

Publication Title

Lecture Notes in Computer Science

ISBN

9789819500086
