# Generalization in diffusion models arises from geometry-adaptive harmonic representations

arXiv: 2310.02557

## Abstract

Deep neural networks (DNNs) trained for image denoising are able to generate high-quality samples with score-based reverse diffusion algorithms. These impressive capabilities seem to imply an escape from the curse of dimensionality, but recent reports of memorization of the training set raise the question of whether these networks are learning the true continuous density of the data. Here, we show that two DNNs trained on non-overlapping subsets of a dataset learn nearly the same score function, and thus the same density, when the number of training images is large enough. In this regime of strong generalization, diffusion-generated images are distinct from the training set, and are of high visual quality, suggesting that the inductive biases of the DNNs are well-aligned with the data density. We analyze the learned denoising functions and show that the inductive biases give rise to a shrinkage operation in a basis adapted to the underlying image. Examination of these bases reveals oscillating harmonic structures along contours and in homogeneous regions. We demonstrate that trained denoisers are inductively biased towards these geometry-adaptive harmonic bases since they arise not only when the network is trained on photographic images, but also when it is trained on image classes supported on low-dimensional manifolds for which the harmonic basis is suboptimal. Finally, we show that when trained on regular image classes for which the optimal basis is known to be geometry-adaptive and harmonic, the denoising performance of the networks is near-optimal.

## Overview

- Deep neural networks (DNNs) trained for image denoising can generate high-quality samples using score-based reverse diffusion algorithms.
- However, recent reports of training set memorization raise questions about whether these networks are truly learning the underlying data distribution.
- This paper investigates whether DNNs trained on non-overlapping subsets of a dataset learn the same score function and data density when the training set is large enough.
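To make "score-based reverse diffusion" concrete, here is a minimal Python sketch under loud assumptions: the data distribution is a hypothetical 1D Gaussian whose optimal denoiser is known in closed form, and the sampler is a generic deterministic one (Euler steps along a decreasing noise schedule), which is not necessarily the algorithm used in the paper. The names `denoiser`, `noise_schedule`, and `sample` are illustrative only.

```python
import math
import random

MU, S = 1.0, 2.0  # hypothetical toy "data" distribution: x ~ N(MU, S^2)


def denoiser(y, sigma):
    """MSE-optimal denoiser E[x | y] for the Gaussian toy prior (closed form)."""
    return (S * S * y + sigma * sigma * MU) / (S * S + sigma * sigma)


def noise_schedule(sigma_max=100.0, sigma_min=1e-2, steps=200):
    """Geometrically decreasing noise levels from sigma_max down to sigma_min."""
    ratio = (sigma_min / sigma_max) ** (1.0 / (steps - 1))
    return [sigma_max * ratio ** i for i in range(steps)]


def sample(rng, sigmas):
    """Start from pure noise and follow the denoiser down the noise schedule."""
    y = rng.gauss(0.0, sigmas[0])
    for sigma, sigma_next in zip(sigmas, sigmas[1:]):
        # Euler step on dy/dsigma = (y - denoiser(y, sigma)) / sigma.
        # The right-hand side equals -sigma * score(y), because the optimal
        # denoiser is y + sigma^2 * score(y).
        direction = (y - denoiser(y, sigma)) / sigma
        y += (sigma_next - sigma) * direction
    return y


rng = random.Random(0)
sigmas = noise_schedule()
samples = [sample(rng, sigmas) for _ in range(2000)]
mean = sum(samples) / len(samples)
std = math.sqrt(sum((v - mean) ** 2 for v in samples) / len(samples))
print(f"sample mean {mean:.2f}, sample std {std:.2f}")
```

If the sketch behaves as intended, the sample mean and standard deviation land close to `MU` and `S`, even though the sampler only ever calls the denoiser. In the paper's setting the closed-form `denoiser` would be replaced by a trained network.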

## Plain English Explanation

The paper looks at deep neural networks that have been trained to remove noise from images. These networks can also generate high-quality images: a reverse diffusion procedure starts from noise and applies the denoiser repeatedly until a clean image emerges. That ability suggests the networks have captured much of the structure of the underlying image distribution.

However, there have been concerns that the networks might simply be memorizing the training data, rather than learning the true continuous density of the data. To investigate this, the researchers trained two separate networks on non-overlapping subsets of the same dataset. They found that when the dataset was large enough, the two networks learned **nearly the same score function**, and therefore nearly the same underlying data density.

This suggests the networks' inductive biases are well-aligned with the true data distribution, and the high-quality images they generate are distinct from the training data. The researchers analyze the learned denoising functions and find that the networks are biased towards **geometry-adaptive harmonic bases**, which can capture important structures in the images.

Importantly, this bias towards harmonic bases arises even when the networks are trained on image classes that are not well-described by such bases, indicating it is a fundamental inductive bias of the networks. When trained on image classes where the optimal basis is known to be harmonic, the networks achieve near-optimal denoising performance.

## Technical Explanation

The researchers trained two separate deep neural networks (DNNs) on non-overlapping subsets of a dataset and found that when the dataset was large enough, the two networks learned **nearly the same score function**. The score function is the gradient of the log probability density of noisy images. A denoiser trained to minimize mean squared error approximates it directly, and score-based reverse diffusion uses it to generate samples.
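The link between denoising and the score can be checked numerically. For Gaussian noise of standard deviation σ, the MSE-optimal denoiser satisfies E[x | y] = y + σ² ∇ log p_σ(y), a relation often called the Miyasawa or Tweedie identity. The sketch below verifies it on a hypothetical 1D two-component Gaussian mixture. The mixture parameters and all function names are assumptions chosen for illustration, not anything taken from the paper.

```python
import math

# Hypothetical 1D toy prior: a two-component Gaussian mixture.
WEIGHTS = [0.3, 0.7]
MEANS = [-2.0, 1.5]
STDS = [0.5, 1.0]


def normal_pdf(y, mean, var):
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def log_noisy_density(y, sigma):
    """log p_sigma(y): the prior convolved with Gaussian noise of std sigma."""
    return math.log(sum(w * normal_pdf(y, m, s * s + sigma * sigma)
                        for w, m, s in zip(WEIGHTS, MEANS, STDS)))


def score(y, sigma, eps=1e-5):
    """Central finite-difference estimate of d/dy log p_sigma(y)."""
    return (log_noisy_density(y + eps, sigma)
            - log_noisy_density(y - eps, sigma)) / (2.0 * eps)


def denoise_via_score(y, sigma):
    """Denoiser built from the score: y + sigma^2 * score(y)."""
    return y + sigma * sigma * score(y, sigma)


def posterior_mean(y, sigma):
    """Closed-form E[x | y] for the mixture, used here as ground truth."""
    total = 0.0
    weighted = 0.0
    for w, m, s in zip(WEIGHTS, MEANS, STDS):
        var = s * s + sigma * sigma
        resp = w * normal_pdf(y, m, var)                   # unnormalized responsibility
        comp_mean = (s * s * y + sigma * sigma * m) / var  # per-component posterior mean
        total += resp
        weighted += resp * comp_mean
    return weighted / total


for y in (-3.0, 0.0, 2.5):
    print(y, denoise_via_score(y, 1.0), posterior_mean(y, 1.0))
```

The two printed columns should agree up to finite-difference error, which is why comparing two denoisers is a way of comparing two learned scores.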

This suggests that in the regime of **strong generalization**, the inductive biases of the DNNs are well-aligned with the true underlying data density, and the generated samples are distinct from the training set.
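The two-model comparison can be mimicked in miniature. In the sketch below each "network" is only a Gaussian fit to its own subset of a hypothetical 1D dataset, which is a drastic simplification of training a DNN on images. It shows the shape of the experiment (disjoint subsets, then a distance between the resulting denoisers) and does not reproduce the paper's results. `fit_denoiser` and `denoiser_gap` are illustrative names.

```python
import math
import random

SIGMA = 1.0                      # noise level at which the denoisers are compared
TRUE_MEAN, TRUE_STD = 1.0, 2.0   # hypothetical data distribution


def fit_denoiser(train):
    """'Train' a denoiser by fitting a Gaussian to the subset.

    Returns the affine posterior-mean denoiser implied by the fitted
    mean and variance.
    """
    n = len(train)
    mean = sum(train) / n
    var = sum((x - mean) ** 2 for x in train) / n
    gain = var / (var + SIGMA ** 2)
    return lambda y: mean + gain * (y - mean)


def denoiser_gap(n_per_model, rng):
    """RMS difference between two denoisers fit on disjoint subsets."""
    data = [rng.gauss(TRUE_MEAN, TRUE_STD) for _ in range(2 * n_per_model)]
    f_a = fit_denoiser(data[:n_per_model])   # first subset
    f_b = fit_denoiser(data[n_per_model:])   # second, non-overlapping subset
    grid = [TRUE_MEAN + 0.5 * k for k in range(-10, 11)]  # shared noisy inputs
    return math.sqrt(sum((f_a(y) - f_b(y)) ** 2 for y in grid) / len(grid))


rng = random.Random(0)
gaps = {n: denoiser_gap(n, rng) for n in (20, 200, 2000, 20000)}
for n, gap in gaps.items():
    print(f"N = {n:>5} per model: RMS gap between denoisers = {gap:.4f}")
```

In this toy the gap shrinks as the subsets grow because both fits approach the same underlying distribution. The paper's claim is that trained DNN denoisers show analogous convergence on images once the training set is large enough.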

Analysis of the learned denoising functions reveals that the networks are biased towards **geometry-adaptive harmonic bases**, which can efficiently capture important structures in the images, such as oscillating patterns along contours and in homogeneous regions. Interestingly, this bias arises even when the networks are trained on image classes that are not well-described by harmonic bases, indicating it is a fundamental inductive bias of the architecture.
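One way to read a denoiser as "shrinkage in a basis" is to look at its Jacobian: the eigenvectors give the basis and the eigenvalues give the per-coefficient shrinkage factors. The sketch below illustrates this on a hypothetical linear denoiser that shrinks coefficients in a fixed cosine (DCT) basis. Note the simplification: this basis is fixed, whereas the bases the paper describes adapt to the geometry of each image. Signal size, prior variances, and function names are all assumptions.

```python
import math

D = 8          # signal length (hypothetical toy size)
SIGMA = 0.5    # noise standard deviation


def dct_basis(d):
    """Orthonormal DCT-II basis: row k is a cosine of frequency k."""
    rows = []
    for k in range(d):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / d)
        rows.append([scale * math.cos(math.pi * (n + 0.5) * k / d)
                     for n in range(d)])
    return rows


BASIS = dct_basis(D)
# Assumed prior variance per frequency: power decays with frequency.
PRIOR_VAR = [1.0 / (1.0 + k) ** 2 for k in range(D)]
# Wiener shrinkage factors, each strictly between 0 and 1.
SHRINK = [v / (v + SIGMA ** 2) for v in PRIOR_VAR]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(m, v):
    return [dot(row, v) for row in m]


def denoise(y):
    """Project onto the basis, shrink each coefficient, and reconstruct."""
    coeffs = [lam * dot(e, y) for lam, e in zip(SHRINK, BASIS)]
    return [sum(c * e[n] for c, e in zip(coeffs, BASIS)) for n in range(D)]


def jacobian(f, y, eps=1e-6):
    """Finite-difference Jacobian J[i][j] = d f_i / d y_j."""
    cols = []
    for j in range(len(y)):
        plus = list(y)
        minus = list(y)
        plus[j] += eps
        minus[j] -= eps
        cols.append([(a - b) / (2.0 * eps) for a, b in zip(f(plus), f(minus))])
    return [[cols[j][i] for j in range(len(y))] for i in range(len(y))]


y0 = [math.sin(0.7 * n) for n in range(D)]   # arbitrary noisy input
J = jacobian(denoise, y0)
# Rayleigh quotients e_k^T J e_k recover the shrinkage factor of each basis vector.
recovered = [dot(e, matvec(J, e)) for e in BASIS]
for k, (lam, rec) in enumerate(zip(SHRINK, recovered)):
    print(f"frequency {k}: shrinkage {lam:.4f}, recovered from Jacobian {rec:.4f}")
```

For a trained network the same Jacobian analysis would be run at each noisy image separately, since both the basis and the shrinkage factors can change with the input.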

When the networks are trained on image classes for which the optimal basis is known to be geometry-adaptive and harmonic, they achieve **near-optimal denoising performance**. This further supports the idea that the networks' inductive biases are well-matched to the true data distribution, allowing them to learn efficient representations.
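Denoising quality is commonly reported as peak signal-to-noise ratio (PSNR), so "near-optimal" can be read as a small PSNR gap between the network and the best achievable denoiser. The following is a minimal sketch of that metric, assuming pixel values in [0, 1]. The helper names and the tiny pixel lists are hypothetical and are not results from the paper.

```python
import math


def psnr(clean, estimate, peak=1.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    if len(clean) != len(estimate):
        raise ValueError("inputs must have the same length")
    mse = sum((c - e) ** 2 for c, e in zip(clean, estimate)) / len(clean)
    if mse == 0.0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / mse)


def psnr_gap(clean, network_output, reference_output):
    """How many dB the network falls short of a reference denoiser."""
    return psnr(clean, reference_output) - psnr(clean, network_output)


# Hypothetical pixel values, for illustration only.
clean = [0.0, 0.5, 1.0, 0.5]
network_output = [0.05, 0.45, 0.95, 0.55]
reference_output = [0.04, 0.46, 0.96, 0.54]
print(f"gap to reference: {psnr_gap(clean, network_output, reference_output):.2f} dB")
```

A gap close to zero dB across noise levels is what a near-optimality claim would look like under this metric.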

## Critical Analysis

The paper provides compelling evidence that, given enough training data, deep neural networks trained for image denoising learn the underlying data distribution rather than simply memorizing the training set. The finding that two networks trained on non-overlapping subsets learn nearly the same score function is a strong indicator of generalization.

However, the paper leaves open the **scalability** of this result. The experiments focus on specific image classes, such as faces and bedroom scenes (as noted in the related survey listed below), and it's unclear whether the same level of generalization would be observed with larger, more diverse datasets like ImageNet.

Additionally, the analysis of the learned denoising functions and the networks' bias towards harmonic bases is intriguing, but the paper does not provide a **theoretical explanation** for why this bias arises. Further research is needed to understand the underlying mechanisms that give rise to this inductive bias.

Overall, the paper makes a valuable contribution to our understanding of how deep neural networks learn representations of image data, and suggests that these models may be able to **escape the curse of dimensionality** under certain conditions. However, more work is needed to fully characterize the capabilities and limitations of these approaches.

## Conclusion

This paper provides evidence that deep neural networks trained for image denoising can learn the true underlying data distribution, rather than simply memorizing the training set. By training two networks on non-overlapping subsets of a dataset, the researchers show that, when the training set is large enough, the networks converge to nearly the same **score function**, indicating strong generalization.

Analysis of the learned denoising functions reveals that the networks are biased towards **geometry-adaptive harmonic bases**, which can efficiently capture important structures in the images. This bias arises even for image classes that are not well-described by harmonic bases, suggesting it is a fundamental inductive bias of the architecture.

These findings have important implications for the development of efficient and robust generative models that can escape the curse of dimensionality. Further research is needed to understand the scalability of these approaches and the theoretical underpinnings of the observed inductive biases.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

## Related Papers

### Denoising: from classical methods to deep CNNs

Jean-Eric Campagne

This paper aims to explore the evolution of image denoising in a pedagogical way. We briefly review classical methods such as Fourier analysis and wavelet bases, highlighting the challenges they faced until the emergence of neural networks, notably the U-Net, in the 2010s. The remarkable performance of these networks has been demonstrated in studies such as Kadkhodaie et al. (2024). They exhibit adaptability to various image types, including those with fixed regularity, facial images, and bedroom scenes, achieving optimal results and exhibiting a bias towards geometry-adaptive harmonic bases. The introduction of score diffusion has played a crucial role in image generation. In this context, denoising becomes essential as it facilitates the estimation of probability density scores. We discuss the prerequisites for genuine learning of probability densities, offering insights that extend from mathematical research to the implications of universal structures.

4/30/2024

### Image Restoration by Denoising Diffusion Models with Iteratively Preconditioned Guidance

Tomer Garber, Tom Tirer

Training deep neural networks has become a common approach for addressing image restoration problems. An alternative for training a task-specific network for each observation model is to use pretrained deep denoisers for imposing only the signal's prior within iterative algorithms, without additional training. Recently, a sampling-based variant of this approach has become popular with the rise of diffusion/score-based generative models. Using denoisers for general purpose restoration requires guiding the iterations to ensure agreement of the signal with the observations. In low-noise settings, guidance that is based on back-projection (BP) has been shown to be a promising strategy (used recently also under the names pseudoinverse or range/null-space guidance). However, the presence of noise in the observations hinders the gains from this approach. In this paper, we propose a novel guidance technique, based on preconditioning that allows traversing from BP-based guidance to least squares based guidance along the restoration scheme. The proposed approach is robust to noise while still having much simpler implementation than alternative methods (e.g., it does not require SVD or a large number of iterations). We use it within both an optimization scheme and a sampling-based scheme, and demonstrate its advantages over existing methods for image deblurring and super-resolution.

4/16/2024

### Stimulating the Diffusion Model for Image Denoising via Adaptive Embedding and Ensembling

Tong Li, Hansen Feng, Lizhi Wang, Zhiwei Xiong, Hua Huang

Image denoising is a fundamental problem in computational photography, where achieving high perception with low distortion is highly demanding. Current methods either struggle with perceptual quality or suffer from significant distortion. Recently, the emerging diffusion model has achieved state-of-the-art performance in various tasks and demonstrates great potential for image denoising. However, stimulating diffusion models for image denoising is not straightforward and requires solving several critical problems. For one thing, the input inconsistency hinders the connection between diffusion models and image denoising. For another, the content inconsistency between the generated image and the desired denoised image introduces distortion. To tackle these problems, we present a novel strategy called the Diffusion Model for Image Denoising (DMID) by understanding and rethinking the diffusion model from a denoising perspective. Our DMID strategy includes an adaptive embedding method that embeds the noisy image into a pre-trained unconditional diffusion model and an adaptive ensembling method that reduces distortion in the denoised image. Our DMID strategy achieves state-of-the-art performance on both distortion-based and perception-based metrics, for both Gaussian and real-world image denoising. The code is available at https://github.com/Li-Tong-621/DMID.

4/16/2024

### Intriguing Properties of Diffusion Models: An Empirical Study of the Natural Attack Capability in Text-to-Image Generative Models

Takami Sato, Justin Yue, Nanze Chen, Ningfei Wang, Qi Alfred Chen

Denoising probabilistic diffusion models have shown breakthrough performance to generate more photo-realistic images or human-level illustrations than the prior models such as GANs. This high image-generation capability has stimulated the creation of many downstream applications in various areas. However, we find that this technology is actually a double-edged sword: We identify a new type of attack, called the Natural Denoising Diffusion (NDD) attack based on the finding that state-of-the-art deep neural network (DNN) models still hold their prediction even if we intentionally remove their robust features, which are essential to the human visual system (HVS), through text prompts. The NDD attack shows a significantly high capability to generate low-cost, model-agnostic, and transferable adversarial attacks by exploiting the natural attack capability in diffusion models. To systematically evaluate the risk of the NDD attack, we perform a large-scale empirical study with our newly created dataset, the Natural Denoising Diffusion Attack (NDDA) dataset. We evaluate the natural attack capability by answering 6 research questions. Through a user study, we find that it can achieve an 88% detection rate while being stealthy to 93% of human subjects; we also find that the non-robust features embedded by diffusion models contribute to the natural attack capability. To confirm the model-agnostic and transferable attack capability, we perform the NDD attack against the Tesla Model 3 and find that 73% of the physically printed attacks can be detected as stop signs. Our hope is that the study and dataset can help our community be aware of the risks in diffusion models and facilitate further research toward robust DNN models.

5/3/2024