Poisoning Web-Scale Training Datasets is Practical

2302.10149

Published 5/7/2024 by Nicholas Carlini, Matthew Jagielski, Christopher A. Choquette-Choo, Daniel Paleka, Will Pearce, Hyrum Anderson, Andreas Terzis, Kurt Thomas, Florian Tramèr

Abstract

Deep learning models are often trained on distributed, web-scale datasets crawled from the internet. In this paper, we introduce two new dataset poisoning attacks that intentionally introduce malicious examples to degrade a model's performance. Our attacks are immediately practical and could, today, poison 10 popular datasets. Our first attack, split-view poisoning, exploits the mutable nature of internet content to ensure a dataset annotator's initial view of the dataset differs from the view downloaded by subsequent clients. By exploiting specific invalid trust assumptions, we show how we could have poisoned 0.01% of the LAION-400M or COYO-700M datasets for just $60 USD. Our second attack, frontrunning poisoning, targets web-scale datasets that periodically snapshot crowd-sourced content, such as Wikipedia, where an attacker only needs a time-limited window to inject malicious examples. In light of both attacks, we notified the maintainers of each affected dataset and recommended several low-overhead defenses.

Overview

  • Deep learning models are often trained on large datasets crawled from the internet
  • This paper introduces two new attacks that can intentionally introduce malicious examples into these datasets
  • The attacks are practical today against 10 popular datasets and can be carried out at low cost

Plain English Explanation

The paper describes two new ways that bad actors could secretly insert malicious content into the massive online datasets used to train popular AI models.

The first attack, called "split-view poisoning", exploits the fact that internet content can change over time. The person who builds a dataset index sees one version of a web page, but anyone who downloads the dataset later may be served a different, malicious version at the same address. The authors estimate that they could have poisoned 0.01% of a huge dataset such as LAION-400M or COYO-700M this way for only $60.

The second attack, "frontrunning poisoning", targets datasets that regularly take snapshots of crowd-sourced content like Wikipedia. Here, an attacker only needs a short window of time to inject malicious examples just before the snapshot is taken, so that the malicious content is captured in the snapshot.

The researchers notified the maintainers of the affected datasets and suggested some simple defenses against these attacks.

Technical Explanation

The paper introduces two novel dataset poisoning attacks that could be used to maliciously contaminate the large web-crawled datasets commonly used to train deep learning models.

In the "split-view poisoning" attack, the researchers exploit the mutable nature of internet content. By ensuring that a dataset annotator sees a benign version of a web page, while a subsequent client downloading the dataset receives a malicious version, the researchers show how they could have poisoned 0.01% of the LAION-400M or COYO-700M datasets for just $60.
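
To put the 0.01% figure in perspective, a quick back-of-the-envelope calculation may help. The sketch below assumes the nominal sizes suggested by the dataset names (400 million and 700 million samples) and treats the $60 as the budget against each dataset separately. The function name and constants are illustrative and are not taken from the paper.

```python
def poisoned_sample_count(dataset_size: int, fraction: float) -> int:
    """Number of samples that a given poisoned fraction of a dataset represents."""
    return round(dataset_size * fraction)


# Assumption: nominal sizes implied by the dataset names, not exact counts.
NOMINAL_SIZES = {"LAION-400M": 400_000_000, "COYO-700M": 700_000_000}
POISON_FRACTION = 0.0001  # 0.01% expressed as a fraction
BUDGET_USD = 60.0         # figure quoted in the abstract

for name, size in NOMINAL_SIZES.items():
    count = poisoned_sample_count(size, POISON_FRACTION)
    per_sample = BUDGET_USD / count
    print(f"{name}: {count:,} samples, about ${per_sample:.4f} per sample")
```

Under these assumptions, 0.01% corresponds to roughly 40,000 samples for LAION-400M and 70,000 for COYO-700M, which is why a fraction that sounds negligible still represents a large absolute number of attacker-controlled examples.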

The "frontrunning poisoning" attack targets datasets that take periodic snapshots of crowd-sourced content, like Wikipedia. Here, an attacker only needs a limited time window to inject malicious examples just before the snapshot is taken, so that the malicious content is captured in the snapshot.
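
The timing condition behind frontrunning can be illustrated with a toy model. The sketch below assumes a single known snapshot time and a fixed delay before a malicious edit is reverted; both values, and the function name, are hypothetical and chosen only for illustration.

```python
from datetime import datetime, timedelta


def edit_in_snapshot(edit_time: datetime, revert_time: datetime,
                     snapshot_time: datetime) -> bool:
    """True if the malicious edit is live at the moment the snapshot is taken."""
    return edit_time <= snapshot_time < revert_time


# Hypothetical values: a snapshot at noon and a 30-minute revert delay.
snapshot = datetime(2024, 1, 1, 12, 0)
revert_delay = timedelta(minutes=30)

for minutes_before in (120, 20, 1):
    edit = snapshot - timedelta(minutes=minutes_before)
    captured = edit_in_snapshot(edit, edit + revert_delay, snapshot)
    print(f"edit {minutes_before} min before snapshot: captured={captured}")
```

In this toy model, an edit made well before the snapshot is reverted in time, while an edit made shortly before it is still live when the snapshot is taken. The attacker therefore benefits from being able to predict when the snapshot happens.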

The researchers responsibly disclosed these attacks to the maintainers of the affected datasets and provided recommendations for low-overhead defenses.
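
This summary does not enumerate the recommended defenses. One low-overhead idea that fits the split-view threat model is to record a cryptographic hash of each item when the dataset is indexed and to verify it when the item is downloaded. The following is a minimal sketch under that assumption; the in-memory `web` dictionary stands in for live URLs, and all function names and URLs are hypothetical rather than taken from the paper or any dataset's tooling.

```python
import hashlib


def sha256_hex(content: bytes) -> str:
    """Hex-encoded SHA-256 digest of the given bytes."""
    return hashlib.sha256(content).hexdigest()


def build_index(web: dict[str, bytes]) -> dict[str, str]:
    """Indexing step: map each URL to the hash of the content seen at that time."""
    return {url: sha256_hex(body) for url, body in web.items()}


def download_verified(index: dict[str, str], web: dict[str, bytes]):
    """Download step: keep only items whose current content matches the recorded hash."""
    kept, rejected = {}, []
    for url, expected in index.items():
        body = web.get(url)
        if body is not None and sha256_hex(body) == expected:
            kept[url] = body
        else:
            rejected.append(url)
    return kept, rejected


# Toy "web": content at a URL can change after the index is built.
web = {
    "http://example.test/cat.jpg": b"cat image bytes",
    "http://example.test/dog.jpg": b"dog image bytes",
}
index = build_index(web)

# Simulated split view: the content behind one URL is swapped afterwards.
web["http://example.test/dog.jpg"] = b"attacker-controlled bytes"

kept, rejected = download_verified(index, web)
print("kept:", sorted(kept))
print("rejected:", rejected)
```

A check like this also rejects benign changes, such as a re-encoded or resized image, so it trades some dataset coverage for integrity.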

Critical Analysis

The paper presents compelling evidence of the vulnerability of web-crawled datasets to poisoning attacks. The "split-view poisoning" and "frontrunning poisoning" techniques appear to be immediately practical and could be used to contaminate major datasets used in deep learning today.

One limitation is that the paper does not explore the long-term impact of these attacks on downstream model performance and robustness. While the researchers demonstrate the ability to insert malicious content, more research is needed to understand how this would translate to real-world harms.

Additionally, the proposed defenses, while sensible, may not be sufficient to fully mitigate these threats. Ongoing vigilance and stronger techniques for detecting and removing malicious content may be necessary as attackers adapt.

Overall, this research highlights the need for deep learning practitioners to carefully scrutinize the data they use and implement robust safeguards against adversarial manipulation. As the field of AI continues to advance, addressing dataset security will be crucial to ensuring the reliability and trustworthiness of these powerful technologies.

Conclusion

This paper introduces two practical attacks that bad actors could use to secretly contaminate the large web-crawled datasets commonly used to train deep learning models. By exploiting the mutable nature of internet content and the periodic snapshot approach of some datasets, the researchers demonstrate how attackers could poison 10 popular datasets at low cost.

While the researchers provided some initial defenses, this work underscores the broader challenge of maintaining the integrity of training data in an era of web-scale AI. Addressing these dataset security vulnerabilities will be crucial to ensuring the reliability and trustworthiness of deep learning models as they become increasingly ubiquitous in our lives.



Related Papers

Poisoning-based Backdoor Attacks for Arbitrary Target Label with Positive Triggers

Binxiao Huang, Jason Chun Lok, Chang Liu, Ngai Wong

Poisoning-based backdoor attacks expose vulnerabilities in the data preparation stage of deep neural network (DNN) training. The DNNs trained on the poisoned dataset will be embedded with a backdoor, making them behave well on clean data while outputting malicious predictions whenever a trigger is applied. To exploit the abundant information contained in the input data to output label mapping, our scheme utilizes the network trained from the clean dataset as a trigger generator to produce poisons that significantly raise the success rate of backdoor attacks versus conventional approaches. Specifically, we provide a new categorization of triggers inspired by the adversarial technique and develop a multi-label and multi-payload Poisoning-based backdoor attack with Positive Triggers (PPT), which effectively moves the input closer to the target label on benign classifiers. After the classifier is trained on the poisoned dataset, we can generate an input-label-aware trigger to make the infected classifier predict any given input to any target label with a high possibility. Under both dirty- and clean-label settings, we show empirically that the proposed attack achieves a high attack success rate without sacrificing accuracy across various datasets, including SVHN, CIFAR10, GTSRB, and Tiny ImageNet. Furthermore, the PPT attack can elude a variety of classical backdoor defenses, proving its effectiveness.

5/10/2024

Manipulating Recommender Systems: A Survey of Poisoning Attacks and Countermeasures

Thanh Toan Nguyen, Quoc Viet Hung Nguyen, Thanh Tam Nguyen, Thanh Trung Huynh, Thanh Thi Nguyen, Matthias Weidlich, Hongzhi Yin

Recommender systems have become an integral part of online services to help users locate specific information in a sea of data. However, existing studies show that some recommender systems are vulnerable to poisoning attacks, particularly those that involve learning schemes. A poisoning attack is where an adversary injects carefully crafted data into the process of training a model, with the goal of manipulating the system's final recommendations. Based on recent advancements in artificial intelligence, such attacks have gained importance recently. While numerous countermeasures to poisoning attacks have been developed, they have not yet been systematically linked to the properties of the attacks. Consequently, assessing the respective risks and potential success of mitigation strategies is difficult, if not impossible. This survey aims to fill this gap by primarily focusing on poisoning attacks and their countermeasures. This is in contrast to prior surveys that mainly focus on attacks and their detection methods. Through an exhaustive literature review, we provide a novel taxonomy for poisoning attacks, formalise its dimensions, and accordingly organise 30+ attacks described in the literature. Further, we review 40+ countermeasures to detect and/or prevent poisoning attacks, evaluating their effectiveness against specific types of attacks. This comprehensive survey should serve as a point of reference for protecting recommender systems against poisoning attacks. The article concludes with a discussion on open issues in the field and impactful directions for future research. A rich repository of resources associated with poisoning attacks is available at https://github.com/tamlhp/awesome-recsys-poisoning.

4/24/2024

A GAN-Based Data Poisoning Attack Against Federated Learning Systems and Its Countermeasure

Wei Sun, Bo Gao, Ke Xiong, Yuwei Wang

As a distributed machine learning paradigm, federated learning (FL) is collaboratively carried out on privately owned datasets but without direct data access. Although the original intention is to allay data privacy concerns, available but not visible data in FL potentially brings new security threats, particularly poisoning attacks that target such not visible local data. Initial attempts have been made to conduct data poisoning attacks against FL systems, but cannot be fully successful due to their high chance of causing statistical anomalies. To unleash the potential for truly invisible attacks and build a more deterrent threat model, in this paper, a new data poisoning attack model named VagueGAN is proposed, which can generate seemingly legitimate but noisy poisoned data by untraditionally taking advantage of generative adversarial network (GAN) variants. Capable of manipulating the quality of poisoned data on demand, VagueGAN enables a trade-off between attack effectiveness and stealthiness. Furthermore, a cost-effective countermeasure named Model Consistency-Based Defense (MCD) is proposed to identify GAN-poisoned data or models after finding out the consistency of GAN outputs. Extensive experiments on multiple datasets indicate that our attack method is generally much more stealthy as well as more effective in degrading FL performance with low complexity. Our defense method is also shown to be more competent in identifying GAN-poisoned data or models. The source codes are publicly available at https://github.com/SSssWEIssSS/VagueGAN-Data-Poisoning-Attack-and-Its-Countermeasure.

5/22/2024

SEEP: Training Dynamics Grounds Latent Representation Search for Mitigating Backdoor Poisoning Attacks

Xuanli He, Qiongkai Xu, Jun Wang, Benjamin I. P. Rubinstein, Trevor Cohn

Modern NLP models are often trained on public datasets drawn from diverse sources, rendering them vulnerable to data poisoning attacks. These attacks can manipulate the model's behavior in ways engineered by the attacker. One such tactic involves the implantation of backdoors, achieved by poisoning specific training instances with a textual trigger and a target class label. Several strategies have been proposed to mitigate the risks associated with backdoor attacks by identifying and removing suspected poisoned examples. However, we observe that these strategies fail to offer effective protection against several advanced backdoor attacks. To remedy this deficiency, we propose a novel defensive mechanism that first exploits training dynamics to identify poisoned samples with high precision, followed by a label propagation step to improve recall and thus remove the majority of poisoned instances. Compared with recent advanced defense methods, our method considerably reduces the success rates of several backdoor attacks while maintaining high classification accuracy on clean test sets.

5/21/2024