Diffusion and score-based generative models have achieved remarkable sample quality on difficult image synthesis tasks. Many works have proposed samplers for pretrained diffusion models, including ancestral samplers, SDE and ODE integrators, and annealed MCMC approaches. So far, the best sample quality has been achieved with samplers that use time-conditional score functions and move between several noise levels. However, estimating an accurate score function at many noise levels can be challenging and requires an architecture that is more expressive than would be needed for a single noise level. In this work, we explore MCMC sampling algorithms that operate at a single noise level, yet synthesize images with acceptable sample quality on the CIFAR-10 dataset. We show that while naïve application of Langevin dynamics and a related noise-denoise sampler produces poor samples, methods built on integrators of underdamped Langevin dynamics using splitting methods can perform well. Further, by combining MCMC methods with existing multiscale samplers, we begin to approach competitive sample quality without using scores at large noise levels.
Our sampler generates a large diversity of images in a single run. These videos can be generated on one consumer-grade GPU in around 5 minutes. Unlike past work, we use only a single noise level of the pretrained diffusion model, Stable Diffusion v2. Hover to pause, and drag the slider to change video speed.
We propose two samplers for diffusion models, the noise-denoise sampler and the infinite-friction limit of the BAOAB sampler, and then show that a special case of the noise-denoise sampler is equivalent to the BAOAB-limit. The BAOAB-limit is an MCMC sampler that is sometimes used in statistical mechanics.
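To make the BAOAB-limit concrete, here is a minimal sketch of its update rule, which coincides with the Leimkuhler–Matthews scheme: an Euler–Maruyama-style Langevin step in which consecutive Gaussian noise draws are averaged. The function names (`score`, `baoab_limit`), the toy standard-Gaussian target, and the step size are illustrative assumptions; in the diffusion setting, `score` would be the learned score at one fixed noise level.

```python
import numpy as np

def score(x):
    # Toy score of a standard Gaussian target: grad log p(x) = -x.
    # (Assumption for illustration; a diffusion model would supply a
    # learned score at a single noise level instead.)
    return -x

def baoab_limit(x0, n_steps, h, rng):
    """Infinite-friction (Leimkuhler--Matthews) limit of BAOAB:
    x_{k+1} = x_k + h * score(x_k) + sqrt(2h) * (xi_k + xi_{k+1}) / 2,
    i.e. overdamped Langevin dynamics with averaged consecutive noises."""
    x = np.asarray(x0, dtype=float)
    xi = rng.standard_normal(x.shape)
    samples = []
    for _ in range(n_steps):
        xi_next = rng.standard_normal(x.shape)
        x = x + h * score(x) + np.sqrt(2 * h) * 0.5 * (xi + xi_next)
        xi = xi_next
        samples.append(x.copy())
    return np.array(samples)

rng = np.random.default_rng(0)
samples = baoab_limit(np.zeros(8), n_steps=50_000, h=0.1, rng=rng)
samples = samples[1_000:]  # discard burn-in
print(samples.mean(), samples.var())  # both should be near 0 and 1
```

For a Gaussian target, the noise averaging makes the stationary variance of this chain exact in the step size, which is part of why the BAOAB-limit tolerates larger steps than naïve Langevin dynamics.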
Typically, diffusion models are sampled in a coarse-to-fine manner, first sampling from a smoothed score at a high noise level and then gradually moving to lower noise levels to approach the desired distribution. However, the loss function used to train diffusion models, denoising score matching, was originally proposed to estimate scores at only a single noise level, which is theoretically sufficient to estimate the distribution. Our samplers are interesting for diffusion models because they allow good mixing between modes of the score function at a single noise level. This could simplify training.
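Denoising score matching at a single noise level can be illustrated numerically. The sketch below fits a linear score model s(y) = a·y to Gaussian data perturbed at one noise level σ, where the DSM regression target is −ε/σ; the closed-form least-squares coefficient recovers the true score of the smoothed density, −y/(1+σ²). The variable names and the Gaussian toy data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 0.5
x = rng.standard_normal(100_000)   # toy "data": standard Gaussian
eps = rng.standard_normal(100_000)
y = x + sigma * eps                # samples perturbed at one noise level

# DSM minimizes E || s(y) + eps / sigma ||^2. For a linear model
# s(y) = a * y, least squares gives the coefficient in closed form.
a = -np.sum(y * eps / sigma) / np.sum(y * y)

# True score of the smoothed density N(0, 1 + sigma^2) is -y / (1 + sigma^2),
# so the fitted coefficient should be close to -1 / (1 + sigma^2) = -0.8.
print(a, -1 / (1 + sigma**2))
```

This is the sense in which a score estimated at a single noise level already characterizes the (smoothed) data distribution, without requiring a time-conditional architecture.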
Our samplers also produce great diversity within a single chain. This is useful because a user can generate a large batch of images or a video with one sampling run, rather than running many sampling chains in parallel. However, image quality is often slightly worse than with alternative samplers.
@article{jain2022journey,
author = {Jain, Ajay and Poole, Ben},
title = {Journey to the BAOAB-limit: finding effective MCMC samplers for score-based models},
journal = {Workshop on Score-Based Methods at NeurIPS},
year = {2022},
}