Deep neural networks (DNNs) trained for image denoising are able to generate high-quality samples with score-based reverse diffusion algorithms. These impressive capabilities seem to imply an escape from the curse of dimensionality, but recent reports of memorization of the training set raise the question of whether these networks are learning the true continuous density of the data. Here, we show that two DNNs trained on non-overlapping subsets of a dataset learn nearly the same score function, and thus the same density, when the number of training images is large enough. In this regime of strong generalization, diffusion-generated images are distinct from the training set, and are of high visual quality, suggesting that the inductive biases of the DNNs are well-aligned with the data density. We analyze the learned denoising functions and show that the inductive biases give rise to a shrinkage operation in a basis adapted to the underlying image. Examination of these bases reveals oscillating harmonic structures along contours and in homogeneous regions. We demonstrate that trained denoisers are inductively biased towards these geometry-adaptive harmonic bases since they arise not only when the network is trained on photographic images, but also when it is trained on image classes supported on low-dimensional manifolds for which the harmonic basis is suboptimal. Finally, we show that when trained on regular image classes for which the optimal basis is known to be geometry-adaptive and harmonic, the denoising performance of the networks is near-optimal.

## Overview

- Deep neural networks (DNNs) trained for image denoising can generate high-quality samples using score-based reverse diffusion algorithms.
- However, recent reports of training set memorization raise questions about whether these networks are truly learning the underlying data distribution.
- This paper investigates whether DNNs trained on non-overlapping subsets of a dataset learn the same score function and data density when the training set is large enough.

## Plain English Explanation

The paper looks at deep neural networks that have been trained to remove noise from images. These networks can generate high-quality images by [reversing the diffusion process](https://aimodels.fyi/papers/arxiv/lossy-image-compression-foundation-diffusion-models), that is, by starting from pure noise and removing it step by step. This suggests the networks have captured much of the structure of the underlying image data.

However, there have been concerns that the networks might simply be memorizing the training data, rather than learning the true continuous density of the data. To investigate this, the researchers trained two separate networks on non-overlapping subsets of the same dataset. They found that when the number of training images was large enough, the two networks learned **nearly the same score function**, and therefore nearly the same underlying data density, even though they never saw the same images.

This suggests the networks' inductive biases are well-aligned with the true data distribution. In this regime, the images they generate are of high visual quality and are distinct from the training data. The researchers also analyze the learned denoising functions and find that they perform a shrinkage operation in a basis adapted to each image. These **geometry-adaptive harmonic bases** show oscillating patterns along contours and in homogeneous regions of the image.

Importantly, this bias towards harmonic bases arises even when the networks are trained on image classes supported on low-dimensional manifolds, for which a harmonic basis is suboptimal. This indicates that the bias comes from the networks rather than from the data. When trained on regular image classes where the optimal basis is known to be geometry-adaptive and harmonic, the networks achieve near-optimal denoising performance.

## Technical Explanation

The researchers trained two separate deep neural networks (DNNs) on non-overlapping subsets of a dataset and found that when the number of training images was large enough, the two networks learned **nearly the same score function**. The score function is the gradient of the log probability density of noisy images, and it is the key component of [score-based generative models](https://aimodels.fyi/papers/arxiv/gda-generalized-diffusion-robust-test-time-adaptation): the reverse diffusion algorithm uses it to turn noise into high-quality samples.
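For reference, the link between a denoiser and the score is usually stated through the classical Miyasawa (Tweedie) identity. The notation here is an assumption and does not come from the summary: $x$ is a clean image, $y$ its noisy version, $\sigma$ the noise standard deviation, $p_\sigma$ the density of noisy images, and $D$ a trained denoiser.

$$
y = x + \sigma z,\quad z \sim \mathcal{N}(0, I)
\quad\Longrightarrow\quad
\mathbb{E}[x \mid y] = y + \sigma^2 \, \nabla_y \log p_\sigma(y)
$$

A denoiser trained to minimize mean squared error approximates $\mathbb{E}[x \mid y]$, so $(D(y) - y)/\sigma^2$ approximates the score. Under this identity, two networks that compute nearly the same denoising function also represent nearly the same score.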

This suggests that in the regime of **strong generalization**, the inductive biases of the DNNs are well-aligned with the true underlying data density, and the generated samples are distinct from the training set.
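The two-network comparison can be illustrated with a deliberately simple toy. The sketch below is not the paper's experiment: it assumes a Gaussian data density and replaces the DNNs with least-squares linear denoisers, and every name and parameter in it (`fit_linear_denoiser`, `denoiser_disagreement`, `dim`, `sigma`) is hypothetical. It fits two denoisers on non-overlapping samples and measures how much their outputs differ on fresh noisy inputs.

```python
import numpy as np

def fit_linear_denoiser(clean, sigma, rng):
    """Least-squares linear map from noisy samples to clean samples (rows are samples)."""
    noisy = clean + sigma * rng.standard_normal(clean.shape)
    weights, *_ = np.linalg.lstsq(noisy, clean, rcond=None)
    return weights  # apply with: noisy @ weights

def denoiser_disagreement(n_train, dim=32, sigma=0.5, seed=0):
    """Relative squared difference between two denoisers fit on disjoint training sets."""
    rng = np.random.default_rng(seed)
    # Assumed toy density: zero-mean Gaussian with a decaying spectrum.
    scales = 1.0 / np.arange(1, dim + 1)

    def sample(n):
        return rng.standard_normal((n, dim)) * scales

    # Independent draws, so the two training sets share no samples.
    w_a = fit_linear_denoiser(sample(n_train), sigma, rng)
    w_b = fit_linear_denoiser(sample(n_train), sigma, rng)

    # Compare the two denoisers on held-out noisy samples.
    test_noisy = sample(2000) + sigma * rng.standard_normal((2000, dim))
    out_a, out_b = test_noisy @ w_a, test_noisy @ w_b
    return np.sum((out_a - out_b) ** 2) / np.sum(out_a ** 2)

for n in (16, 100, 1000, 10000):
    print(n, denoiser_disagreement(n))
```

In this toy the disagreement shrinks as the training sets grow, because both fits approach the same optimal linear denoiser. It only mirrors the logic of the comparison; it says nothing about how many images a real network needs.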

Analysis of the learned denoising functions reveals that the networks perform a shrinkage operation in a basis adapted to the underlying image. These **geometry-adaptive harmonic bases** consist of oscillating harmonic structures along contours and in homogeneous regions. Interestingly, the bias arises even when the networks are trained on image classes supported on low-dimensional manifolds, for which a harmonic basis is suboptimal, indicating that it is an inductive bias of the trained denoisers rather than a property of the data.
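One common way to expose a shrinkage structure in a denoiser is to eigendecompose its Jacobian at a given noisy input: the eigenvectors play the role of the basis and the eigenvalues are the shrinkage factors. Whether this matches the paper's exact procedure is an assumption here. The sketch below uses a hypothetical stand-in denoiser (a Wiener filter for a Gaussian signal) rather than a trained network, and the helper names are illustrative.

```python
import numpy as np

def numerical_jacobian(denoiser, y, eps=1e-5):
    """Central finite-difference Jacobian of `denoiser` at the point y."""
    dim = y.size
    jac = np.empty((dim, dim))
    for k in range(dim):
        step = np.zeros(dim)
        step[k] = eps
        jac[:, k] = (denoiser(y + step) - denoiser(y - step)) / (2 * eps)
    return jac

def shrinkage_decomposition(denoiser, y):
    """Eigenvalues (shrinkage factors) and eigenvectors (basis) of the symmetrized Jacobian."""
    jac = numerical_jacobian(denoiser, y)
    factors, basis = np.linalg.eigh(0.5 * (jac + jac.T))
    # Reverse so the least-shrunk directions (largest factors) come first.
    return factors[::-1], basis[:, ::-1]

# Hypothetical stand-in denoiser: the Wiener filter for a Gaussian signal.
rng = np.random.default_rng(0)
dim, sigma = 16, 0.5
root = rng.standard_normal((dim, dim))
cov = root @ root.T / dim
wiener = cov @ np.linalg.inv(cov + sigma**2 * np.eye(dim))

def denoiser(y):
    return wiener @ y

y = rng.standard_normal(dim)
factors, basis = shrinkage_decomposition(denoiser, y)

# Denoising as: project onto the basis, scale each coefficient, project back.
reconstructed = basis @ (factors * (basis.T @ y))
print(np.allclose(reconstructed, denoiser(y), atol=1e-6))
print(factors.round(3))
```

Because this stand-in is linear, its basis is the same for every input. For a trained network the Jacobian changes with the input, which is what allows the basis to adapt to the geometry of each image.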

When the networks are trained on regular image classes for which the optimal basis is known to be geometry-adaptive and harmonic, they achieve **near-optimal denoising performance**. For such image classes, the networks' inductive biases are well-matched to the data, which is consistent with the high quality of the samples they generate on photographic images.

## Critical Analysis

The paper provides strong evidence that deep neural networks trained for image denoising can learn the underlying data distribution, rather than simply memorizing the training set, provided the training set is large enough. The finding that two networks trained on non-overlapping subsets learn nearly the same score function is a strong indicator of generalization.

However, the **scalability** of this result remains an open question. Strong generalization appears only once the number of training images is large enough, and it is unclear how that threshold grows with image resolution and diversity, for example on larger, more complex datasets like [ImageNet](https://aimodels.fyi/papers/arxiv/can-biases-imagenet-models-explain-generalization).

Additionally, the analysis of the learned denoising functions and the networks' bias towards harmonic bases is intriguing, but the paper does not provide a **theoretical explanation** for why this bias arises. Further research is needed to understand the underlying mechanisms that give rise to this inductive bias.

Overall, the paper makes a valuable contribution to our understanding of how deep neural networks learn representations of image data, and suggests that these models may be able to **escape the curse of dimensionality** under certain conditions. However, more work is needed to fully characterize the capabilities and limitations of these approaches.

## Conclusion

This paper provides evidence that deep neural networks trained for image denoising can learn the true underlying data distribution, rather than simply memorizing the training set. By training two networks on non-overlapping subsets of a dataset, the researchers show that the networks learn nearly the same **score function** once the training set is large enough, indicating strong generalization.

Analysis of the learned denoising functions reveals that the networks are biased towards **geometry-adaptive harmonic bases**, which can efficiently capture important structures in the images. This bias arises even for image classes supported on low-dimensional manifolds, for which a harmonic basis is suboptimal, suggesting it is an inductive bias of the trained denoisers rather than a property of the data.

These findings have important implications for the [development of efficient and robust generative models](https://aimodels.fyi/papers/arxiv/addp-learning-general-representations-image-recognition-generation) that can escape the curse of dimensionality. Further research is needed to understand the scalability of these approaches and the theoretical underpinnings of the observed inductive biases.