Commit Graph

174 Commits

Author SHA1 Message Date
Phil Wang
d1f02e8f49 always use sandwich norm for attention layer 2022-05-14 12:13:41 -07:00
Phil Wang
9faab59b23 use post-attn-branch layernorm in attempt to stabilize cross attention conditioning in decoder 2022-05-14 11:58:09 -07:00
Phil Wang
5d27029e98 make sure lowres conditioning image is properly normalized to -1 to 1 for cascading ddpm 2022-05-14 01:23:54 -07:00
Phil Wang
3115fa17b3 fix everything around normalizing images to -1 to 1 for ddpm training automatically 2022-05-14 01:17:11 -07:00
Phil Wang
124d8577c8 move the inverse normalization function called before image embeddings are derived from clip to within the diffusion prior and decoder classes 2022-05-14 00:37:52 -07:00
Phil Wang
2db0c9794c comments 2022-05-12 14:25:20 -07:00
Phil Wang
2277b47ffd make sure learned variance can work for any number of unets in the decoder, defaults to first unet, as suggested was used in the paper 2022-05-12 14:18:15 -07:00
Phil Wang
28b58e568c cleanup in preparation of option for learned variance 2022-05-12 12:04:52 -07:00
Phil Wang
924455d97d align the ema model device back after sampling from the cascading ddpm in the decoder 2022-05-11 19:56:54 -07:00
Phil Wang
6021945fc8 default to l2 loss 2022-05-11 19:24:51 -07:00
Phil Wang
3dda2570ed fix amp issue for https://github.com/lucidrains/DALLE2-pytorch/issues/82 2022-05-11 08:21:39 -07:00
Phil Wang
2f3c02dba8 numerical accuracy for noise schedule parameters 2022-05-10 15:28:46 -07:00
Phil Wang
908088cfea wrap up cross embed layer feature 2022-05-10 12:19:34 -07:00
Phil Wang
35f89556ba bring in the cross embed layer from Crossformer paper for initial convolution in unet 2022-05-10 11:50:38 -07:00
Phil Wang
fc8fce38fb make sure cascading DDPM can be trained unconditionally, to ready for CLI one command training for the public 2022-05-10 10:48:10 -07:00
Phil Wang
b1e7b5f6bb make sure resnet groups in unet are finely customizable 2022-05-10 10:12:50 -07:00
Phil Wang
9b322ea634 patch 2022-05-09 19:46:19 -07:00
Phil Wang
64f7be1926 some cleanup 2022-05-09 16:50:21 -07:00
Phil Wang
db805e73e1 fix a bug with numerical stability in attention, sorry! 🐛 2022-05-09 16:23:37 -07:00
Phil Wang
e46eaec817 deal the diffusion prior problem yet another blow 2022-05-09 11:08:52 -07:00
Kumar R
8647cb5e76 Val loss changes, with quite a few other changes. This is in place of the earlier PR (https://github.com/lucidrains/DALLE2-pytorch/pull/67) (#77)
* Val_loss changes - not rebased with lucidrains' master.

* Val Loss changes - now rebased with lucidrains' master

* train_diffusion_prior.py updates

* dalle2_pytorch.py updates

* __init__.py changes

* Update train_diffusion_prior.py

* Update dalle2_pytorch.py

* Update train_diffusion_prior.py

* Update train_diffusion_prior.py

* Update dalle2_pytorch.py

* Update train_diffusion_prior.py

* Update train_diffusion_prior.py

* Update train_diffusion_prior.py

* Update train_diffusion_prior.py

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md
2022-05-09 08:53:29 -07:00
Phil Wang
53c189e46a give more surface area for attention in diffusion prior 2022-05-09 08:08:11 -07:00
Phil Wang
dde51fd362 revert restriction for classifier free guidance for diffusion prior, given @crowsonkb advice 2022-05-07 20:55:41 -07:00
Phil Wang
4010aec033 turn off classifier free guidance if predicting x_start for diffusion prior 2022-05-07 09:38:17 -07:00
Phil Wang
830afd3c15 sinusoidal embed time embeddings for diffusion prior as well, for continuous version 2022-05-07 08:32:43 -07:00
Phil Wang
8f93729d19 when in doubt, make it a hyperparameter 2022-05-07 07:52:17 -07:00
Phil Wang
85ed77d512 fix a potentially huge bug thanks to @CiaoHe https://github.com/lucidrains/DALLE2-pytorch/issues/71 2022-05-07 05:05:54 -07:00
Phil Wang
3676ef4d49 make sure vqgan-vae trainer supports mixed precision 2022-05-06 10:44:16 -07:00
Phil Wang
28e944f328 make sure openai clip adapter outputs l2normed embeddings 2022-05-06 10:12:03 -07:00
Phil Wang
14e63a3f67 also offer l2norm clamping in diffusion prior during training, if one were using predict x0 objective 2022-05-06 10:05:14 -07:00
Phil Wang
ad20a14a4d bring in rotary embeddings for diffusion prior causal transformer (the most powerful relative positional encoding, used in PaLM) - 0.1.0 because of breaking change 2022-05-06 08:45:30 -07:00
Phil Wang
0be1e0d64c support CoCa, which seems to be better than CLIP (has an autoregressive text encoder) https://arxiv.org/abs/2205.01917 2022-05-06 08:27:12 -07:00
Phil Wang
98df1ba51e add diffusion prior trainer, which automatically takes care of the exponential moving average (training and sampling), as well as mixed precision, gradient clipping 2022-05-06 08:11:09 -07:00
Phil Wang
878b555ef7 fix training with clip 2022-05-06 07:37:57 -07:00
Phil Wang
c76a964fd6 allow for CLIP to be optional in Decoder, and allow DecoderTrainer to work off training pre-encoded image embeddings 2022-05-05 08:11:01 -07:00
Phil Wang
8518684ae9 does not make much sense, as researchers may want to try predicting noise with diffusionprior instead of predicting x0 2022-05-05 07:37:00 -07:00
Phil Wang
1d5dc08810 take @crowsonkb 's suggestion at https://github.com/lucidrains/DALLE2-pytorch/issues/60#issue-1226116132 2022-05-05 07:28:53 -07:00
Aidan Dempster
15acc03bd4 Add a dataloader for training the decoder (#57)
* Added dataloader and updated requirements

* Added option to set embedding shard width separately from webdataset shard length.
There must be a better way to do this.

* Changed embedding loader to read using fsspec

* Moved the loader into a more compatible location

* Removed unnecessary package

* Fixed typo (Embeding -> Embedding)

* Simplified example embedding finder code to remove unnecessary get_file_list function

* Added example usage of ImageEmbeddingDataset

* Changed the name of create_dataloader to be more verbose
Added a dataloaders __init__.py
2022-05-05 07:08:45 -07:00
Phil Wang
896f19786d remove convnext blocks, they are ill-suited for generative work, validated by early experimental results at https://github.com/lucidrains/video-diffusion-pytorch 2022-05-05 07:07:21 -07:00
Phil Wang
aec5575d09 take a bet on resize right, given Katherine is using it 2022-05-04 19:26:45 -07:00
Phil Wang
9773f10d6c use inference mode whenever possible, cleanup 2022-05-04 15:25:05 -07:00
Phil Wang
86e692d24f fix random crop probability 2022-05-04 11:52:24 -07:00
Phil Wang
97b751209f allow for last unet in the cascade to be trained on crops, if it is convolution-only 2022-05-04 11:48:48 -07:00
Phil Wang
5b619c2fd5 make sure some hyperparameters for unet block are configurable 2022-05-04 11:18:32 -07:00
Phil Wang
9ff228188b offer old resnet blocks, from the original DDPM paper, just in case convnexts are unsuitable for generative work 2022-05-04 10:52:58 -07:00
Ray Bell
44b319cb57 add missing import (#56) 2022-05-04 07:42:20 -07:00
Phil Wang
70282de23b add ability to turn on normformer settings, given @borisdayma reported good results and some personal anecdata 2022-05-02 11:33:15 -07:00
Phil Wang
11469dc0c6 makes more sense to keep this as True as default, for stability 2022-05-02 10:50:55 -07:00
Phil Wang
0fc6c9cdf3 provide option to l2norm the output of the diffusion prior 2022-05-02 09:41:03 -07:00
Phil Wang
1924c7cc3d fix issue with mixed precision and gradient clipping 2022-05-02 09:20:19 -07:00