Diffusion model

A diffusion model is a type of generative model in deep learning that can create realistic synthetic data such as images or audio. During training, real data is gradually corrupted with noise (the forward diffusion process) and the model learns to reverse that corruption. To generate new data, the model starts from random noise and removes it step by step until a structured sample emerges.

Key advantages of diffusion models include:

Ability to generate high-quality, diverse outputs rivaling GAN models
More stable training than GANs, since the objective is a simple denoising loss rather than an adversarial game
Flexibility to condition on text, labels, or other data
Controllable step-by-step generation, since the sample can be inspected or steered at any intermediate timestep

Diffusion models were first proposed in 2015 and gained prominence in 2020 with denoising diffusion probabilistic models (DDPMs). They have since become widely used in computer vision through models like DALL-E 2 and Stable Diffusion. The forward diffusion process adds noise to data x_0 via a Markov chain to get x_t at timestep t; after enough steps, x_t is approximately pure noise. A neural network is then trained to estimate x_{t-1} from the noisy x_t, working backwards one step at a time until x_0 is reconstructed.
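
The forward noising chain can be illustrated with a minimal NumPy sketch. It assumes Gaussian noise and a linear variance schedule in the style of DDPM, neither of which is specified above, and it uses the standard closed form that lets x_t be drawn directly from x_0 rather than by iterating the chain. The names make_schedule and q_sample, the schedule endpoints, and the 8x8 array standing in for an image are all illustrative choices.

```python
import numpy as np

def make_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Linear variance schedule (an assumed choice) and cumulative signal retention."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alpha_bars = np.cumprod(1.0 - betas)  # fraction of signal variance left at step t
    return betas, alpha_bars

def q_sample(x0, t, alpha_bars, rng):
    """Draw x_t given x_0 in one shot instead of applying t single noising steps."""
    noise = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * noise
    return x_t, noise

rng = np.random.default_rng(0)
betas, alpha_bars = make_schedule()
x0 = rng.uniform(-1.0, 1.0, size=(8, 8))        # stand-in for a tiny image
x_early, _ = q_sample(x0, 10, alpha_bars, rng)   # still close to x0
x_late, _ = q_sample(x0, 999, alpha_bars, rng)   # almost pure noise
```

Under this schedule the retained signal shrinks monotonically with t, so x_early stays close to x_0 while x_late is nearly indistinguishable from Gaussian noise.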

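Diffusion models can be thought of as denoising autoencoders that are trained to progressively denoise data x_t by removing the added noise. Early in the reverse process, when x_t is still mostly noise, the sampling steps tend to fix coarse structure and account for most of the variation between outputs. Later steps, close to x_0, mainly refine detail and realism. Important methods for improving diffusion models include classifier guidance, which steers sampling using the gradients of a separately trained classifier, and adversarial training techniques.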
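
The denoising view can be sketched as follows, assuming the DDPM-style noise-prediction formulation and a linear schedule (both are assumptions, not details given above). Because no trained network is available here, a hypothetical oracle_noise function stands in for it; it is exact only in the toy case where every data point equals a single target vector. The function names and the target values are illustrative.

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)   # assumed linear schedule
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def denoising_loss(predict_noise, x0, t, rng):
    """Mean squared error between the noise actually added and the model's estimate."""
    noise = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * noise
    return np.mean((predict_noise(x_t, t) - noise) ** 2)

def sample(predict_noise, shape, rng):
    """Ancestral sampling: start from pure noise and apply T denoising steps."""
    x = rng.standard_normal(shape)
    for t in reversed(range(T)):
        eps_hat = predict_noise(x, t)
        # Mean of the reverse step: subtract the predicted noise, then rescale.
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
        if t > 0:
            x = mean + np.sqrt(betas[t]) * rng.standard_normal(shape)
        else:
            x = mean  # no noise is added on the final step
    return x

target = np.array([0.5, -0.25, 1.0])  # toy dataset consisting of this single point

def oracle_noise(x_t, t):
    # Hypothetical stand-in for a trained network; exact only for this toy dataset.
    return (x_t - np.sqrt(alpha_bars[t]) * target) / np.sqrt(1.0 - alpha_bars[t])

rng = np.random.default_rng(0)
loss = denoising_loss(oracle_noise, target, t=50, rng=rng)
generated = sample(oracle_noise, target.shape, rng)
```

With the oracle in place of a network, the loss is zero up to rounding and sampling recovers the target, which makes the loop easy to check. In practice predict_noise would be a neural network trained by minimizing denoising_loss over random data points and timesteps.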