Files
DALLE2-pytorch/README.md
2022-04-07 09:53:56 -07:00

40 lines
1.2 KiB
Markdown

<img src="./dalle2.png" width="450px"></img>
## DALL-E 2 - Pytorch (wip)
Implementation of <a href="https://openai.com/dall-e-2/">DALL-E 2</a>, OpenAI's updated text-to-image synthesis neural network, in Pytorch
The main novelty seems to be an extra layer of indirection with the prior network (whether it is a transformer or a diffusion network), which predicts an image embedding based on the text embedding from CLIP.
This is SOTA for text-to-image now, but probably not for long.
It may also explore an extension of using latent diffusion in the decoder
## Citations
```bibtex
@misc{ramesh2022,
title = {Hierarchical Text-Conditional Image Generation with CLIP Latents},
author = {Aditya Ramesh et al},
year = {2022}
}
```
```bibtex
@misc{crowson2022,
author = {Katherine Crowson},
url = {https://twitter.com/rivershavewings}
}
```
```bibtex
@misc{rombach2021highresolution,
title = {High-Resolution Image Synthesis with Latent Diffusion Models},
author = {Robin Rombach and Andreas Blattmann and Dominik Lorenz and Patrick Esser and Björn Ommer},
year = {2021},
eprint = {2112.10752},
archivePrefix = {arXiv},
primaryClass = {cs.CV}
}
```