Implicit generation and generalization methods for energy-based models


We’ve made progress towards stable and scalable training of energy-based models (EBMs), resulting in better sample quality and generalization ability than existing models. Generation in EBMs spends more compute to continually refine its answers, and doing so can produce samples competitive with GANs at low temperatures, while retaining the mode-coverage guarantees of likelihood-based models. We hope these findings stimulate further research into this promising class of models.

Generative modeling is the task of observing data, such as images or text, and learning to model the underlying data distribution. Accomplishing this task leads models to understand high-level features in data and synthesize examples that look like real data. Generative models have many applications in natural language, robotics, and computer vision.

Energy-based models represent probability distributions over data by assigning an unnormalized probability scalar (or “energy”) to each input data point. This provides useful modeling flexibility: any arbitrary model that outputs a real number given an input can be used as an energy model. The difficulty, however, lies in sampling from these models.
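As a toy illustration of this idea (our own example, not from the paper), the sketch below turns a scalar energy function into a normalized density via the Boltzmann distribution p(x) ∝ exp(−E(x)), approximating the partition function by numerical integration:

```python
import numpy as np

# Hypothetical 1-D energy function: any scalar-valued model would do.
def energy(x):
    return 0.5 * x ** 2  # quadratic energy corresponds to a Gaussian

# Unnormalized probability assigned by the EBM: exp(-E(x)).
xs = np.linspace(-5.0, 5.0, 1001)
unnormalized = np.exp(-energy(xs))

# The partition function Z normalizes the density; here we approximate it
# by numerical integration, which is intractable for high-dimensional models.
Z = np.trapz(unnormalized, xs)
density = unnormalized / Z
```

For this quadratic energy, Z ≈ √(2π) and the density is the standard normal; for a deep-network energy, Z has no closed form, which is exactly why implicit sampling is needed.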

To generate samples from EBMs, we use an iterative refinement process based on Langevin dynamics. Informally, this involves performing noisy gradient descent on the energy function to arrive at low-energy configurations (see the paper for more details). Unlike GANs, VAEs, and flow-based models, this approach does not require an explicit neural network to generate samples; samples are generated implicitly. The combination of EBMs and iterative refinement offers several benefits.
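As a minimal sketch of that refinement process (ours, not the paper’s code), the Langevin update is x ← x − (λ/2)∇E(x) + √λ·ε with ε ~ N(0, I). We assume an analytic gradient for a toy quadratic energy, where a trained EBM would use autodiff:

```python
import numpy as np

rng = np.random.default_rng(0)

def energy_grad(x):
    # Gradient of the toy energy E(x) = ||x||^2 / 2; in a real EBM this
    # would come from backpropagation through the network.
    return x

def langevin_sample(x0, n_steps=200, step_size=0.1):
    # Noisy gradient descent: x <- x - (step/2) * grad E(x) + sqrt(step) * noise
    x = np.asarray(x0, dtype=float)
    for _ in range(n_steps):
        noise = rng.normal(size=x.shape)
        x = x - 0.5 * step_size * energy_grad(x) + np.sqrt(step_size) * noise
    return x

# Initialize from broad noise; refinement settles into low-energy regions.
samples = np.stack([langevin_sample(rng.normal(size=2) * 5.0) for _ in range(500)])
```

Running the chain longer (more steps) spends more compute but lands samples closer to the model’s low-energy regions, mirroring the test-time refinement trade-off described above.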

We found that energy-based models generate qualitatively and quantitatively high-quality images, especially when running the refinement process for longer at test time. By running iterative optimization on individual images, we can auto-complete images and morph images from one class (such as truck) to another (such as frog).

[Figure: image panels showing Original, Corruption, and Completions]

Image completions from a conditional ImageNet model. Our models exhibit diversity in inpainting. Note that inputs are from the test distribution and are not model samples, indicating coverage of test data.
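The completions above can be sketched as masked Langevin refinement: only the corrupted pixels are updated, and the known pixels are clamped after each step. This is our own toy example with a stand-in gradient, not the paper’s model:

```python
import numpy as np

rng = np.random.default_rng(1)

def energy_grad(x):
    # Stand-in gradient that pulls every pixel toward 0.5 (a toy "prior");
    # a real EBM would backpropagate through the trained network.
    return x - 0.5

def inpaint(observed, mask, n_steps=300, step_size=0.01):
    # mask == 1 marks corrupted pixels to refine; known pixels are
    # clamped back to their observed values after every Langevin step.
    x = observed + mask * rng.normal(size=observed.shape)
    for _ in range(n_steps):
        noise = rng.normal(size=x.shape)
        x = x - 0.5 * step_size * energy_grad(x) + np.sqrt(step_size) * noise
        x = np.where(mask == 1, x, observed)  # re-clamp known pixels
    return x

image = np.full((8, 8), 0.5)   # toy "observed" image
mask = np.zeros((8, 8))
mask[2:6, 2:6] = 1.0           # corrupt the center patch
completed = inpaint(image, mask)
```

Because the noise term keeps the chain stochastic, re-running the refinement produces different completions of the same corruption, which is the diversity noted in the caption.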

Cross-class implicit sampling on a conditional model. The model is conditioned on a particular class but is initialized with an image from a separate class.

In addition to generating images, we found that energy-based models are able to generate stable robot dynamics trajectories across a large number of timesteps. EBMs can generate a diverse set of possible futures, while feedforward models collapse to a mean prediction.

[Figure: trajectory frames at T = 0, 20, 40, 60, 80]

Top-down views of robot hand manipulation trajectories generated unconditionally from the same starting state (first frame). The FC network predicts a hand that does not move, while the EBM generates distinctly different trajectories that are feasible.

We tested energy-based models on classifying several different out-of-distribution datasets and found that they outperform other likelihood models such as flow-based and autoregressive models. We also tested classification using conditional energy-based models, and found that the resultant classification exhibited good generalization to adversarial perturbations. Our model, despite never being trained for classification, performed classification better than models explicitly trained against adversarial perturbations.
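One way to read this result: an EBM’s energy itself is a natural out-of-distribution score, since unfamiliar inputs should receive high energy. A toy sketch of that detection scheme, with a hand-written energy standing in for a trained model:

```python
import numpy as np

rng = np.random.default_rng(2)

def energy(x):
    # Toy energy that is low near the origin, where the "training data"
    # lives; a trained EBM would replace this with a network forward pass.
    return 0.5 * np.sum(x ** 2, axis=-1)

in_dist = rng.normal(0.0, 1.0, size=(1000, 2))   # resembles training data
out_dist = rng.normal(6.0, 1.0, size=(1000, 2))  # far from training data

# Flag inputs whose energy exceeds a threshold set on held-out in-dist data.
threshold = np.quantile(energy(in_dist), 0.95)
ood_flags = energy(out_dist) > threshold
```

The threshold is chosen so only about 5% of in-distribution inputs are flagged, while nearly all of the shifted inputs exceed it.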

We found evidence suggesting a number of additional observations, though we are by no means certain that these observations are correct.

More tips, observations, and failures from this research can be found in Section A.8 of the paper.

We found preliminary indications that we can compose multiple energy-based models via a product-of-experts model. We trained one model on different-size shapes at a fixed position and another model on same-size shapes at different positions. By combining the resultant energy-based models, we were able to generate different-size shapes at different locations, despite never seeing examples where both varied.
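Because p(x) ∝ exp(−E(x)), a product of experts corresponds to simply summing the experts’ energy functions: exp(−E₁)·exp(−E₂) = exp(−(E₁ + E₂)). A toy 2-D sketch with hypothetical “size” and “position” experts (our stand-ins, not the trained shape models):

```python
import numpy as np

# Hypothetical experts: one constrains only size, the other only position.
def e_size(size):
    return 0.5 * (size - 2.0) ** 2      # prefers size near 2

def e_position(pos):
    return 0.5 * (pos + 1.0) ** 2       # prefers position near -1

def e_combined(size, pos):
    # Product of experts = sum of energies.
    return e_size(size) + e_position(pos)

# Search a grid for the lowest-energy configuration of the combined model.
sizes = np.linspace(0.0, 4.0, 81)
positions = np.linspace(-3.0, 1.0, 81)
S, P = np.meshgrid(sizes, positions, indexing="ij")
i, j = np.unravel_index(np.argmin(e_combined(S, P)), S.shape)
best_size, best_pos = sizes[i], positions[j]
```

The combined minimum sits near (size, position) = (2, −1), a configuration neither expert alone pins down, mirroring how the two shape models compose.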

Energy flow visualization


A 2D example of combining energy functions through their summation and the resulting sampling trajectories.

Compositionality is one of the unsolved challenges facing AI systems today, and we are excited about what energy-based models can do here. If you are excited to work on energy-based models, please consider applying to OpenAI!

Yilun Du, Igor Mordatch

Thanks to Ilya Sutskever, Greg Brockman, Bob McGrew, Johannes Otterbach, Jacob Steinhardt, Harri Edwards, Yura Burda, Jack Clark and Ashley Pilipiszyn for feedback on this blog post and manuscript.



Originally published on OpenAI News.