## DALL-E 2 - Pytorch (wip)

Implementation of DALL-E 2, OpenAI's updated text-to-image synthesis neural network, in PyTorch.

The main novelty seems to be an extra layer of indirection with the prior network (whether it is a transformer or a diffusion network), which predicts an image embedding based on the text embedding from CLIP.

This is SOTA for text-to-image now, but probably not for long.

This repository may also explore an extension that uses latent diffusion in the decoder.

## Citations

```bibtex
@misc{ramesh2022,
    title   = {Hierarchical Text-Conditional Image Generation with CLIP Latents},
    author  = {Aditya Ramesh et al},
    year    = {2022}
}
```

```bibtex
@misc{crowson2022,
    author  = {Katherine Crowson},
    url     = {https://twitter.com/rivershavewings}
}
```

```bibtex
@misc{rombach2021highresolution,
    title   = {High-Resolution Image Synthesis with Latent Diffusion Models},
    author  = {Robin Rombach and Andreas Blattmann and Dominik Lorenz and Patrick Esser and Björn Ommer},
    year    = {2021},
    eprint  = {2112.10752},
    archivePrefix = {arXiv},
    primaryClass = {cs.CV}
}
```
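
## Appendix: toy sketch of the prior and decoder

The two-stage flow described at the top (text embedding, then a prior that predicts an image embedding, then a decoder that produces pixels) can be illustrated with a shape-level toy. This is only a sketch under loud assumptions: `ToyPrior` and `ToyDecoder` are hypothetical names that are not part of this repository, the MLP and linear layers merely stand in for the real transformer or diffusion networks, and a random tensor stands in for the CLIP text embedding.

```python
import torch
from torch import nn


class ToyPrior(nn.Module):
    """Hypothetical stand-in for the prior: text embedding -> predicted image embedding.

    The real prior would be a transformer or a diffusion network; a small MLP
    is used here only to show the input and output shapes.
    """

    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, dim * 2),
            nn.GELU(),
            nn.Linear(dim * 2, dim),
        )

    def forward(self, text_embed):
        return self.net(text_embed)


class ToyDecoder(nn.Module):
    """Hypothetical stand-in for the decoder: image embedding -> pixels.

    A single linear projection replaces the real generative decoder, purely
    for illustration.
    """

    def __init__(self, dim, image_size):
        super().__init__()
        self.image_size = image_size
        self.to_pixels = nn.Linear(dim, 3 * image_size * image_size)

    def forward(self, image_embed):
        batch = image_embed.shape[0]
        # Squash to [0, 1] so the output resembles normalized RGB values.
        pixels = torch.sigmoid(self.to_pixels(image_embed))
        return pixels.view(batch, 3, self.image_size, self.image_size)


dim, image_size = 512, 32  # assumed sizes, chosen only for the demo
prior = ToyPrior(dim)
decoder = ToyDecoder(dim, image_size)

# Random tensor standing in for CLIP text embeddings of 4 captions.
text_embed = torch.randn(4, dim)

with torch.no_grad():
    image_embed = prior(text_embed)   # the "extra layer of indirection"
    images = decoder(image_embed)     # pixels conditioned on the predicted embedding

print(image_embed.shape)  # torch.Size([4, 512])
print(images.shape)       # torch.Size([4, 3, 32, 32])
```

The point of the split is that the decoder never sees the text directly in this toy: it only consumes the image embedding that the prior predicts, so either stage could be swapped out independently.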