To improve the quality of low-resolution photos, Google has released new AI-based diffusion models. Super-Resolution via Repeated Refinement (SR3) and cascaded diffusion models (CDM) are two new diffusion models that can generate high-fidelity images. They may be used for a variety of tasks, including restoring old family photos and improving medical imaging systems, as well as boosting the performance of downstream models for image classification, segmentation, and other tasks. The SR3 model, for example, is trained to turn a low-resolution image into a detailed high-resolution one, and in human evaluations it outperforms current deep generative models such as generative adversarial networks (GANs).
Google Research’s Brain Team has detailed both the SR3 and CDM diffusion models in a post on Google’s AI blog. SR3 is a super-resolution diffusion model that takes a low-resolution image as input and generates a corresponding high-resolution image from pure noise. The model is trained on an image corruption process that progressively adds noise to a high-resolution image until only pure noise remains. SR3 then learns to reverse that process: it starts from pure noise and removes it step by step to reach a target distribution, using the input low-resolution image as a guide.
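The corruption and reversal described above can be sketched in a few lines of Python. This is a minimal illustration of a standard denoising diffusion sampler, not Google's implementation: the linear noise schedule and its values are assumptions, the image is a flat list of pixel values, and `predict_noise` is a hypothetical stand-in for the trained network that receives the low-resolution image as guidance.

```python
import math
import random


def alpha_bar_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Fraction of clean signal left after each step (assumed linear schedule)."""
    alpha_bars, running = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def corrupt(pixels, alpha_bar, rng):
    """Forward process: blend the clean pixels with Gaussian noise."""
    signal = math.sqrt(alpha_bar)
    noise = math.sqrt(1.0 - alpha_bar)
    return [signal * p + noise * rng.gauss(0.0, 1.0) for p in pixels]


def sample(low_res, predict_noise, alpha_bars, num_pixels, rng):
    """Reverse process: start from pure noise and denoise step by step.

    predict_noise(x, low_res, t) is a hypothetical callable standing in for
    the trained model; it must return one noise estimate per pixel.
    """
    x = [rng.gauss(0.0, 1.0) for _ in range(num_pixels)]
    for t in reversed(range(len(alpha_bars))):
        a_bar = alpha_bars[t]
        a_bar_prev = alpha_bars[t - 1] if t > 0 else 1.0
        alpha = a_bar / a_bar_prev
        eps = predict_noise(x, low_res, t)
        # Remove the predicted noise contribution for this step.
        coef = (1.0 - alpha) / math.sqrt(1.0 - a_bar)
        x = [(xi - coef * ei) / math.sqrt(alpha) for xi, ei in zip(x, eps)]
        if t > 0:
            # Re-inject a small amount of noise on every step except the last.
            sigma = math.sqrt(1.0 - alpha)
            x = [xi + sigma * rng.gauss(0.0, 1.0) for xi in x]
    return x
```

With a real model, `predict_noise` would be a neural network that sees both the noisy image and the low-resolution input at every step, which is how the low-resolution image steers the result.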
Google has provided several samples showing how SR3 can scale a 64×64 pixel image up to 1,024×1,024 pixels. The 1,024×1,024 results are impressive, especially for faces and natural photographs. The tech giant claims that SR3 achieves strong benchmark results on the super-resolution task for face and natural images when scaling to resolutions 4x to 8x that of the input.
To generate high-resolution natural images, the CDM diffusion model is trained on ImageNet data. Because ImageNet is a difficult, high-entropy dataset, Google built CDM as a cascade of several diffusion models. In this cascade approach, multiple generative models are chained together across several spatial resolutions. The chain begins with a low-resolution diffusion model, followed by a series of SR3 super-resolution diffusion models that progressively increase the resolution of the generated image up to the highest resolution. According to Google, Gaussian noise and Gaussian blur are applied to the low-resolution input image of each super-resolution model in the cascading pipeline. This process is referred to as conditioning augmentation, and it enables better sample quality at higher resolutions.
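The cascade and the conditioning augmentation step can be illustrated with a small pure-Python sketch. Everything here is an assumption made for illustration: the blur and noise strengths are arbitrary, images are nested lists of grayscale values, and `toy_sr_stage` uses nearest-neighbor upsampling as a stand-in for a trained super-resolution diffusion model.

```python
import math
import random


def gaussian_kernel(sigma, radius):
    """Normalized 1D Gaussian weights of length 2 * radius + 1."""
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_rows(image, kernel):
    """Convolve each row with the kernel, clamping indices at the borders."""
    radius = len(kernel) // 2
    width = len(image[0])
    return [
        [sum(k * row[min(max(x + i - radius, 0), width - 1)]
             for i, k in enumerate(kernel))
         for x in range(width)]
        for row in image
    ]


def transpose(image):
    return [list(col) for col in zip(*image)]


def gaussian_blur(image, sigma=1.0, radius=2):
    """Separable blur: filter the rows, then the columns."""
    kernel = gaussian_kernel(sigma, radius)
    return transpose(blur_rows(transpose(blur_rows(image, kernel)), kernel))


def augment_conditioning(image, rng, blur_sigma=1.0, noise_std=0.1):
    """Conditioning augmentation: blur the low-res input, then add noise."""
    blurred = gaussian_blur(image, sigma=blur_sigma)
    return [[p + noise_std * rng.gauss(0.0, 1.0) for p in row]
            for row in blurred]


def nearest_upsample(image, factor):
    """Repeat every pixel `factor` times along both axes."""
    return [[p for p in row for _ in range(factor)]
            for row in image for _ in range(factor)]


def toy_sr_stage(image, rng, factor=4):
    """Hypothetical stand-in for one super-resolution diffusion model."""
    conditioning = augment_conditioning(image, rng)
    return nearest_upsample(conditioning, factor)


def run_cascade(base_model, sr_stages, rng):
    """Chain a low-resolution generator with super-resolution stages."""
    image = base_model(rng)
    for stage in sr_stages:
        image = stage(image, rng)
    return image
```

For example, a base model that returns a 4×4 image followed by two 4x stages yields a 64×64 output, mirroring at small scale how the cascade steps up resolution stage by stage.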
Google claims that SR3 and CDM have “pushed diffusion model performance to the state-of-the-art on super-resolution and class-conditional ImageNet generation benchmarks.”