Poisoning-based Backdoor Attacks for Arbitrary Target Label with Positive Triggers

arXiv 2405.05573

Published 5/10/2024 by Binxiao Huang, Jason Chun Lok, Chang Liu, Ngai Wong

Abstract

Poisoning-based backdoor attacks expose vulnerabilities in the data preparation stage of deep neural network (DNN) training. The DNNs trained on the poisoned dataset will be embedded with a backdoor, making them behave well on clean data while outputting malicious predictions whenever a trigger is applied. To exploit the abundant information contained in the input data to output label mapping, our scheme utilizes the network trained from the clean dataset as a trigger generator to produce poisons that significantly raise the success rate of backdoor attacks versus conventional approaches. Specifically, we provide a new categorization of triggers inspired by the adversarial technique and develop a multi-label and multi-payload Poisoning-based backdoor attack with Positive Triggers (PPT), which effectively moves the input closer to the target label on benign classifiers. After the classifier is trained on the poisoned dataset, we can generate an input-label-aware trigger to make the infected classifier predict any given input to any target label with a high possibility. Under both dirty- and clean-label settings, we show empirically that the proposed attack achieves a high attack success rate without sacrificing accuracy across various datasets, including SVHN, CIFAR10, GTSRB, and Tiny ImageNet. Furthermore, the PPT attack can elude a variety of classical backdoor defenses, proving its effectiveness.

Overview

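  • This paper introduces PPT (Poisoning-based backdoor attack with Positive Triggers), a multi-label, multi-payload attack that can steer any input to any target label, using "positive triggers" rather than the more common "negative triggers."
  • The positive triggers are generated with a network trained on the clean dataset, so they already move an input closer to the target label on a benign classifier; the authors report that this significantly raises the attack success rate compared with conventional approaches.
  • Experiments on SVHN, CIFAR10, GTSRB, and Tiny ImageNet, under both dirty- and clean-label settings, show a high attack success rate without sacrificing clean accuracy, and the attack evades a variety of classical backdoor defenses.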

Plain English Explanation

The researchers have developed a new way to secretly insert backdoors into machine learning models. Backdoors are vulnerabilities that allow an attacker to control a model's behavior, causing it to misclassify specific inputs in a desired way.

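Previous backdoor attacks often relied on conventional triggers, such as a fixed patch or pattern, that give a clean model no reason to predict the target label, so the poisoned model has to learn that link from scratch. The new approach uses "positive triggers" instead: small, input-specific changes, produced by a model trained on clean data in the style of adversarial examples, that already nudge an image toward the attacker's chosen label. Because the trigger works with what the model learns anyway rather than against it, the backdoor is implanted more effectively, and the authors show that the attack slips past a variety of classical backdoor defenses.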

The researchers tested their attack on several image datasets and found that it achieved high success rates in steering the trained models toward whichever labels the attacker chose, while accuracy on clean images stayed high. This is concerning, as it demonstrates how backdoors can be covertly inserted into machine learning systems, potentially causing them to behave in unintended and harmful ways.

Technical Explanation

The paper proposes a new poisoning-based backdoor attack that can target arbitrary labels, using "positive triggers" rather than the more common "negative triggers."

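The key innovation is the use of positive triggers. Inspired by adversarial examples, a positive trigger is an input-label-aware perturbation, computed with a classifier trained on the clean dataset, that moves the input closer to the target label on a benign model. Because the trigger already correlates with the target class, the poisoned network picks up the backdoor more readily, and the authors report that the resulting attack eludes a variety of classical backdoor defenses.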
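To make the idea of a positive trigger concrete, here is a minimal sketch that borrows the targeted adversarial-example technique the paper draws on. It assumes a linear softmax classifier (weights `W`, bias `b`) as a stand-in for the network trained on clean data, and a targeted projected gradient descent (PGD) loop under an L∞ budget as a stand-in for the authors' trigger generator; `positive_trigger`, `eps`, `step`, and `iters` are hypothetical names, not the paper's code.

```python
import numpy as np

def softmax(z):
    # Subtract the max logit for numerical stability
    z = z - np.max(z, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=-1, keepdims=True)

def positive_trigger(x, target, W, b, eps=0.1, step=0.02, iters=20):
    """Targeted L-infinity PGD on a linear softmax classifier (hypothetical stand-in).

    Returns delta with |delta| <= eps such that x + delta stays in [0, 1]
    and the classifier's cross-entropy for `target` is reduced.
    """
    onehot = np.zeros(W.shape[1])
    onehot[target] = 1.0
    delta = np.zeros_like(x)
    for _ in range(iters):
        p = softmax((x + delta) @ W + b)
        # For logits z = x @ W + b, d(cross-entropy)/dx = W @ (p - onehot)
        grad = W @ (p - onehot)
        # Targeted attack: step *down* the loss, then project onto the eps-ball
        delta = np.clip(delta - step * np.sign(grad), -eps, eps)
        # Keep the triggered input inside the valid pixel range [0, 1]
        delta = np.clip(x + delta, 0.0, 1.0) - x
    return delta

# Toy usage: a random "clean" classifier over 32 features and 10 classes
rng = np.random.default_rng(0)
W, b = rng.normal(size=(32, 10)), np.zeros(10)
x = rng.uniform(size=32)
target = 3
delta = positive_trigger(x, target, W, b)
p_before = softmax(x @ W + b)[target]
p_after = softmax((x + delta) @ W + b)[target]
print(f"p(target) before: {p_before:.3f}, after: {p_after:.3f}")
```

The sign-of-gradient step and the projection onto the ε-ball are standard PGD choices. Against a deep network, the input gradient would come from backpropagation rather than the closed form used for this linear toy model.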

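The attack poisons the training data with samples that carry positive triggers. In the dirty-label setting, a poisoned sample is relabeled to its target; in the clean-label setting, its original label is kept. Because the attack is multi-label and multi-payload, different poisoned samples can point to different target labels. After a victim trains on the poisoned dataset, the attacker can generate an input-label-aware trigger for any input and any target label, and the infected classifier will predict that label with high probability.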
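The poisoning step can be sketched as follows. This is a hypothetical illustration, not the authors' pipeline: `poison_dataset` and the `make_trigger` callable are invented names, the poisoning `rate` is arbitrary, and the toy trigger is a fixed per-class pattern rather than a real positive trigger. The clean-label branch assumes that each poisoned sample's trigger points toward its own, unchanged label, which is one plausible reading of how a multi-label attack can work without relabeling.

```python
import numpy as np

def poison_dataset(X, y, make_trigger, num_classes, rate=0.1,
                   clean_label=False, seed=0):
    """Return a poisoned copy of (X, y) and the indices that were poisoned.

    make_trigger(x, t) is a hypothetical callable that returns a perturbation
    moving x toward label t (e.g., a targeted adversarial step on a clean model).
    """
    rng = np.random.default_rng(seed)
    Xp, yp = X.copy(), y.copy()
    n_poison = int(rate * len(X))
    idx = rng.choice(len(X), size=n_poison, replace=False)
    for i in idx:
        if clean_label:
            # Clean-label (assumed): the label stays correct; the trigger points to it
            t = y[i]
        else:
            # Dirty-label, multi-target: pick any label other than the true one
            t = rng.choice([c for c in range(num_classes) if c != y[i]])
            yp[i] = t
        # Add the trigger and keep pixels in the valid range [0, 1]
        Xp[i] = np.clip(X[i] + make_trigger(X[i], t), 0.0, 1.0)
    return Xp, yp, idx

# Toy usage: 100 samples, 8 features, 5 classes
X = np.random.default_rng(1).uniform(size=(100, 8))
y = np.arange(100) % 5
patterns = np.random.default_rng(2).uniform(-1, 1, size=(5, 8))

def trigger(x, t):
    # Stand-in trigger: a small fixed pattern per target class (not a real positive trigger)
    return 0.05 * patterns[t]

Xd, yd, idx_d = poison_dataset(X, y, trigger, num_classes=5, rate=0.2)
Xc, yc, idx_c = poison_dataset(X, y, trigger, num_classes=5, rate=0.2, clean_label=True)
print(len(idx_d), int(np.sum(yd != y)), int(np.sum(yc != y)))
```

Swapping `trigger` for a function that computes a targeted perturbation on a clean model would turn this generic poisoning loop into a positive-trigger variant.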

The researchers evaluate the attack on image classifiers, including ResNet and VGG models, trained on SVHN, CIFAR10, GTSRB, and Tiny ImageNet. In both dirty- and clean-label settings, a high fraction of triggered inputs is classified as the chosen target label, while accuracy on unmodified samples is not sacrificed.
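The two quantities the evaluation reports, attack success rate and clean accuracy, are typically computed as below. This is a generic sketch rather than the paper's evaluation code; in particular, excluding inputs whose true label already equals the target is a common convention assumed here, and the toy predictions are made up purely for illustration.

```python
import numpy as np

def clean_accuracy(pred_clean, y):
    # Fraction of unmodified test inputs that keep their correct label
    return float(np.mean(pred_clean == y))

def attack_success_rate(pred_triggered, y, targets):
    # Fraction of triggered inputs predicted as their attacker-chosen target.
    # Common convention (assumed here): skip inputs whose true label already
    # equals the target, since they say nothing about the backdoor.
    keep = y != targets
    return float(np.mean(pred_triggered[keep] == targets[keep]))

# Toy usage with five test images; each one may get a different target label
y = np.array([0, 1, 2, 3, 4])
pred_clean = np.array([0, 1, 2, 3, 0])
targets = np.array([1, 1, 3, 0, 2])
pred_triggered = np.array([1, 1, 3, 2, 2])
print(clean_accuracy(pred_clean, y))                       # 0.8
print(attack_success_rate(pred_triggered, y, targets))     # 0.75
```

Giving each input its own entry in `targets` mirrors the multi-label setting, where the attacker may pick a different target label for every input.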

Critical Analysis

The paper makes a compelling case for the practical advantages of using positive triggers over negative triggers in backdoor attacks. The experimental results demonstrate the potency of this approach, which is concerning from a security and robustness perspective.

However, although the authors show that PPT evades a variety of classical backdoor defenses, the paper does not propose countermeasures tailored to this type of attack. More research is needed to understand the broader implications and to develop effective mitigation strategies.

Additionally, the paper focuses solely on image classification tasks, and it would be valuable to explore the applicability of this attack methodology to other domains, such as natural language processing or speech recognition models.

Conclusion

This paper introduces a novel poisoning-based backdoor attack that leverages positive triggers to achieve targeted misclassification. The researchers show that this approach can be highly effective, with the potential to undermine the reliability and security of machine learning systems.

The findings highlight the importance of ongoing research into the robustness and safety of machine learning models, as well as the development of comprehensive defense mechanisms to safeguard against such attacks. As the field of artificial intelligence continues to advance, addressing these types of vulnerabilities will be crucial to ensuring the trustworthiness and responsible deployment of these technologies.



This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

A Backdoor Approach with Inverted Labels Using Dirty Label-Flipping Attacks

Orson Mengara

Audio-based machine learning systems frequently use public or third-party data, which might be inaccurate. This exposes deep neural network (DNN) models trained on such data to potential data poisoning attacks. In this type of assault, attackers can train the DNN model using poisoned data, potentially degrading its performance. Another type of data poisoning attack that is extremely relevant to our investigation is label flipping, in which the attacker manipulates the labels for a subset of data. It has been demonstrated that these assaults may drastically reduce system performance, even for attackers with minimal abilities. In this study, we propose a backdoor attack named 'DirtyFlipping', which uses dirty label techniques, label-on-label, to input triggers (clapping) in the selected data patterns associated with the target class, thereby enabling a stealthy backdoor.

4/9/2024

SEEP: Training Dynamics Grounds Latent Representation Search for Mitigating Backdoor Poisoning Attacks

Xuanli He, Qiongkai Xu, Jun Wang, Benjamin I. P. Rubinstein, Trevor Cohn

Modern NLP models are often trained on public datasets drawn from diverse sources, rendering them vulnerable to data poisoning attacks. These attacks can manipulate the model's behavior in ways engineered by the attacker. One such tactic involves the implantation of backdoors, achieved by poisoning specific training instances with a textual trigger and a target class label. Several strategies have been proposed to mitigate the risks associated with backdoor attacks by identifying and removing suspected poisoned examples. However, we observe that these strategies fail to offer effective protection against several advanced backdoor attacks. To remedy this deficiency, we propose a novel defensive mechanism that first exploits training dynamics to identify poisoned samples with high precision, followed by a label propagation step to improve recall and thus remove the majority of poisoned instances. Compared with recent advanced defense methods, our method considerably reduces the success rates of several backdoor attacks while maintaining high classification accuracy on clean test sets.

5/21/2024

Efficient Backdoor Attacks for Deep Neural Networks in Real-world Scenarios

Ziqiang Li, Hong Sun, Pengfei Xia, Heng Li, Beihao Xia, Yi Wu, Bin Li

Recent deep neural networks (DNNs) have come to rely on vast amounts of training data, providing an opportunity for malicious attackers to exploit and contaminate the data to carry out backdoor attacks. However, existing backdoor attack methods make unrealistic assumptions, assuming that all training data comes from a single source and that attackers have full access to the training data. In this paper, we introduce a more realistic attack scenario where victims collect data from multiple sources, and attackers cannot access the complete training data. We refer to this scenario as data-constrained backdoor attacks. In such cases, previous attack methods suffer from severe efficiency degradation due to the entanglement between benign and poisoning features during the backdoor injection process. To tackle this problem, we introduce three CLIP-based technologies from two distinct streams: Clean Feature Suppression and Poisoning Feature Augmentation. … effective solution for data-constrained backdoor attacks. The results demonstrate remarkable improvements, with some settings achieving over 100% improvement compared to existing attacks in data-constrained scenarios. Code is available at https://github.com/sunh1113/Efficient-backdoor-attacks-for-deep-neural-networks-in-real-world-scenarios

4/22/2024

Mitigating Backdoor Attack by Injecting Proactive Defensive Backdoor

Shaokui Wei, Hongyuan Zha, Baoyuan Wu

Data-poisoning backdoor attacks are serious security threats to machine learning models, where an adversary can manipulate the training dataset to inject backdoors into models. In this paper, we focus on in-training backdoor defense, aiming to train a clean model even when the dataset may be potentially poisoned. Unlike most existing methods that primarily detect and remove/unlearn suspicious samples to mitigate malicious backdoor attacks, we propose a novel defense approach called PDB (Proactive Defensive Backdoor). Specifically, PDB leverages the home field advantage of defenders by proactively injecting a defensive backdoor into the model during training. Taking advantage of controlling the training process, the defensive backdoor is designed to suppress the malicious backdoor effectively while remaining secret to attackers. In addition, we introduce a reversible mapping to determine the defensive target label. During inference, PDB embeds a defensive trigger in the inputs and reverses the model's prediction, suppressing malicious backdoor and ensuring the model's utility on the original task. Experimental results across various datasets and models demonstrate that our approach achieves state-of-the-art defense performance against a wide range of backdoor attacks.

5/28/2024