Deep learning models are often trained on distributed, web-scale datasets crawled from the internet. In this paper, we introduce two new dataset poisoning attacks that intentionally introduce malicious examples to degrade a model's performance. Our attacks are immediately practical and could, today, poison 10 popular datasets. Our first attack, split-view poisoning, exploits the mutable nature of internet content to ensure a dataset annotator's initial view of the dataset differs from the view downloaded by subsequent clients. By exploiting specific invalid trust assumptions, we show how we could have poisoned 0.01% of the LAION-400M or COYO-700M datasets for just $60 USD. Our second attack, frontrunning poisoning, targets web-scale datasets that periodically snapshot crowd-sourced content -- such as Wikipedia -- where an attacker only needs a time-limited window to inject malicious examples. In light of both attacks, we notify the maintainers of each affected dataset and recommend several low-overhead defenses.

## Overview

- Deep learning models are often trained on large datasets crawled from the internet
- This paper introduces two new attacks that can intentionally introduce malicious examples into these datasets
- The authors argue that the attacks are immediately practical and could poison 10 popular datasets today at low cost

## Plain English Explanation

The paper describes two new ways that bad actors could secretly insert malicious content into the massive online datasets used to train popular AI models. 

The first attack, called "split-view poisoning", exploits the fact that internet content can change over time. The dataset's annotator sees one version of a web page when the dataset is assembled, but anyone who downloads the dataset later may receive a different, malicious version of that page. The authors estimate that they could have poisoned 0.01% of a huge dataset like LAION-400M or COYO-700M this way for only $60.

The second attack, "frontrunning poisoning", targets datasets that regularly take snapshots of crowd-sourced content like Wikipedia. Here, an attacker only needs a short window of time to inject malicious examples just before the snapshot is taken. The snapshot then preserves the malicious content, even if the edit is reverted on the live site soon afterwards.

The researchers notified the maintainers of the affected datasets and suggested some simple defenses against these attacks.

## Technical Explanation

The paper introduces two novel dataset poisoning attacks that could be used to maliciously contaminate the large web-crawled datasets commonly used to train deep learning models.

In the "split-view poisoning" attack, the researchers exploit the mutable nature of internet content. By ensuring that a dataset annotator sees a benign version of a web page, while a subsequent client downloading the dataset receives a malicious version, the researchers show how they could have poisoned 0.01% of the LAION-400M or COYO-700M datasets for just $60.
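The mechanism can be illustrated with a toy model. The following is a minimal sketch, not the authors' code: it assumes the dataset is distributed as an index of URLs and captions rather than as the content itself, and every name in it (`IndexEntry`, `build_index`, `download`, `poisoned_count`, the example URLs) is hypothetical. It also works out how many samples 0.01% represents at the scale of the two datasets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexEntry:
    url: str
    caption: str


def build_index(web, captions):
    """Annotator's step: record each URL and caption, but not the content."""
    return [IndexEntry(url, captions[url]) for url in sorted(web)]


def download(index, web):
    """Client's step: fetch whatever each URL serves at download time."""
    return {entry.url: web[entry.url] for entry in index}


def poisoned_count(dataset_size, fraction):
    """Number of samples corresponding to a given fraction of the dataset."""
    return round(dataset_size * fraction)


# Hypothetical toy "web": URL -> content currently served at that URL.
web = {
    "http://a.example/cat.jpg": b"benign cat image",
    "http://b.example/dog.jpg": b"benign dog image",
}
captions = {
    "http://a.example/cat.jpg": "a cat",
    "http://b.example/dog.jpg": "a dog",
}

index = build_index(web, captions)
annotator_view = download(index, web)

# After the index is published, the content behind one URL changes.
web["http://b.example/dog.jpg"] = b"attacker-chosen image"
client_view = download(index, web)

# The index is unchanged, yet the two views of the dataset differ.
changed = [u for u in annotator_view if annotator_view[u] != client_view[u]]
print(changed)

# 0.01% expressed as a fraction is 0.0001.
print(poisoned_count(400_000_000, 0.0001))
print(poisoned_count(700_000_000, 0.0001))
```

The point of the sketch is that the published index never changes, so nothing in it signals to a later client that the content behind a URL has been replaced.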

The "frontrunning poisoning" attack targets datasets that take periodic snapshots of crowd-sourced content, like Wikipedia. Here, an attacker only needs a limited time window: a malicious edit made shortly before the snapshot is captured in it, even if the edit is reverted on the live site shortly afterwards.
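The timing argument can be made concrete with a small simulation. This is a sketch under simplifying assumptions: an article is modelled as a time-ordered list of `(timestamp, content)` revisions, the snapshot time is known in advance, and the timestamps and the helper name `content_at` are invented for illustration.

```python
from bisect import bisect_right


def content_at(revisions, t):
    """Return the content of the latest revision made at or before time t.

    `revisions` must be sorted by timestamp.
    """
    times = [ts for ts, _ in revisions]
    i = bisect_right(times, t)
    if i == 0:
        raise ValueError("no revision exists at or before time t")
    return revisions[i - 1][1]


# Hypothetical revision history of one article (arbitrary time units).
history = [
    (0, "benign article"),
    (995, "malicious edit"),    # attacker edits shortly before the snapshot
    (1010, "benign article"),   # the edit is reverted 15 time units later
]

SNAPSHOT_TIME = 1000

snapshot_content = content_at(history, SNAPSHOT_TIME)
live_content_later = content_at(history, 2000)

print(snapshot_content)     # what the dataset snapshot records
print(live_content_later)   # what a reader of the live site sees afterwards
```

In this toy history the malicious revision is live for only 15 time units, but because that interval contains the snapshot time, the snapshot records it while the live article looks clean again.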

The researchers responsibly disclosed these attacks to the maintainers of the affected datasets and provided recommendations for low-overhead defenses.
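This summary does not list the recommended defenses. One natural low-overhead candidate against content that changes after indexing is to publish a cryptographic hash of each item alongside its URL, so that clients can discard anything that no longer matches. The sketch below is an assumption about what such a check could look like, not a description of the paper's proposal; the function names and URLs are hypothetical, and only Python's standard `hashlib` is used.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(contents):
    """Maintainer side: map each URL to the hash of the content seen at indexing."""
    return {url: sha256_hex(data) for url, data in contents.items()}


def verify(manifest, downloaded):
    """Client side: keep only items whose hash matches the manifest.

    Returns (accepted, rejected): a dict of verified items and a list of
    URLs that were missing or whose content has changed.
    """
    accepted, rejected = {}, []
    for url, expected in manifest.items():
        data = downloaded.get(url)
        if data is not None and sha256_hex(data) == expected:
            accepted[url] = data
        else:
            rejected.append(url)
    return accepted, rejected


# Hypothetical content as seen when the dataset was assembled.
original = {
    "http://a.example/cat.jpg": b"benign cat image",
    "http://b.example/dog.jpg": b"benign dog image",
}
manifest = build_manifest(original)

# Content as seen by a later client, with one item replaced.
later = dict(original)
later["http://b.example/dog.jpg"] = b"attacker-chosen image"

accepted, rejected = verify(manifest, later)
print(sorted(accepted))
print(rejected)
```

A check like this trades coverage for integrity: content that changed for benign reasons, such as a re-encoded image, is rejected as well, so the usable dataset shrinks over time.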

## Critical Analysis

The paper presents compelling evidence of the vulnerability of web-crawled datasets to poisoning attacks. The "split-view poisoning" and "frontrunning poisoning" techniques appear to be immediately practical and could be used to contaminate major datasets used in deep learning today.

One limitation is that the paper does not explore the long-term impact of these attacks on downstream model performance and robustness. While the researchers demonstrate the ability to insert malicious content, more research is needed to understand how this would translate to real-world harms.

Additionally, the proposed defenses, while sensible, may not be sufficient to fully mitigate these threats. Ongoing vigilance and stronger techniques for detecting and removing malicious content may be necessary as attackers adapt.

Overall, this research highlights the need for deep learning practitioners to carefully scrutinize the data they use and implement robust safeguards against adversarial manipulation. As the field of AI continues to advance, addressing dataset security will be crucial to ensuring the reliability and trustworthiness of these powerful technologies.

## Conclusion

This paper introduces two practical attacks that bad actors could use to secretly contaminate the large web-crawled datasets commonly used to train deep learning models. By exploiting the mutable nature of internet content and the periodic snapshot approach of some datasets, the researchers show how attackers could poison 10 popular datasets at low cost.

While the researchers recommended some low-overhead defenses, this work underscores the broader challenge of maintaining the integrity of training data in an era of web-scale AI. Addressing these dataset security vulnerabilities will be crucial to ensuring the reliability and trustworthiness of deep learning models as they become increasingly ubiquitous in our lives.