Commit Graph

198 Commits

Author SHA1 Message Date
Phil Wang
4ec6d0ba81 backwards pass is not recommended under the autocast context, per pytorch docs 2022-05-14 18:26:19 -07:00
Phil Wang
aee92dba4a simplify more 2022-05-14 17:16:46 -07:00
Phil Wang
b0cd5f24b6 take care of gradient accumulation automatically for researchers, by passing in a max_batch_size on the decoder or diffusion prior trainer forward 2022-05-14 17:04:09 -07:00
Phil Wang
b494ed81d4 take care of backwards within trainer classes for diffusion prior and decoder, readying to take care of gradient accumulation as well (plus, unsure if loss should be backwards within autocast block) 2022-05-14 15:49:24 -07:00
Phil Wang
ff3474f05c normalize conditioning tokens outside of cross attention blocks 2022-05-14 14:23:52 -07:00
Phil Wang
d5293f19f1 lineup with paper 2022-05-14 13:57:00 -07:00
Phil Wang
e697183849 be able to customize adam eps 2022-05-14 13:55:04 -07:00
Phil Wang
591d37e266 lower default initial learning rate to what Jonathan Ho had in his original repo 2022-05-14 13:22:43 -07:00
Phil Wang
d1f02e8f49 always use sandwich norm for attention layer 2022-05-14 12:13:41 -07:00
Phil Wang
9faab59b23 use post-attn-branch layernorm in attempt to stabilize cross attention conditioning in decoder 2022-05-14 11:58:09 -07:00
Phil Wang
5d27029e98 make sure lowres conditioning image is properly normalized to -1 to 1 for cascading ddpm 2022-05-14 01:23:54 -07:00
Phil Wang
3115fa17b3 fix everything around normalizing images to -1 to 1 for ddpm training automatically 2022-05-14 01:17:11 -07:00
Phil Wang
124d8577c8 move the inverse normalization function called before image embeddings are derived from clip to within the diffusion prior and decoder classes 2022-05-14 00:37:52 -07:00
Phil Wang
2277b47ffd make sure learned variance can work for any number of unets in the decoder, defaults to first unet, as suggested was used in the paper 2022-05-12 14:18:15 -07:00
Phil Wang
924455d97d align the ema model device back after sampling from the cascading ddpm in the decoder 2022-05-11 19:56:54 -07:00
Phil Wang
6021945fc8 default to l2 loss 2022-05-11 19:24:51 -07:00
Phil Wang
3dda2570ed fix amp issue for https://github.com/lucidrains/DALLE2-pytorch/issues/82 2022-05-11 08:21:39 -07:00
Phil Wang
2f3c02dba8 numerical accuracy for noise schedule parameters 2022-05-10 15:28:46 -07:00
Phil Wang
908088cfea wrap up cross embed layer feature 2022-05-10 12:19:34 -07:00
Phil Wang
35f89556ba bring in the cross embed layer from Crossformer paper for initial convolution in unet 2022-05-10 11:50:38 -07:00
Phil Wang
2b55f753b9 fix new issue with github actions and auto pypi package uploading 2022-05-10 10:51:15 -07:00
Phil Wang
fc8fce38fb make sure cascading DDPM can be trained unconditionally, to ready for CLI one command training for the public 2022-05-10 10:48:10 -07:00
Phil Wang
b1e7b5f6bb make sure resnet groups in unet is finely customizable 2022-05-10 10:12:50 -07:00
Phil Wang
9b322ea634 patch 2022-05-09 19:46:19 -07:00
Phil Wang
ba64ea45cc 0.2.3 2022-05-09 16:50:31 -07:00
Phil Wang
db805e73e1 fix a bug with numerical stability in attention, sorry! 🐛 2022-05-09 16:23:37 -07:00
Phil Wang
e46eaec817 deal the diffusion prior problem yet another blow 2022-05-09 11:08:52 -07:00
Phil Wang
53c189e46a give more surface area for attention in diffusion prior 2022-05-09 08:08:11 -07:00
Phil Wang
dde51fd362 revert restriction for classifier free guidance for diffusion prior, given @crowsonkb advice 2022-05-07 20:55:41 -07:00
Phil Wang
4010aec033 turn off classifier free guidance if predicting x_start for diffusion prior 2022-05-07 09:38:17 -07:00
Phil Wang
830afd3c15 sinusoidal embed time embeddings for diffusion prior as well, for continuous version 2022-05-07 08:32:43 -07:00
Phil Wang
8f93729d19 when in doubt, make it a hyperparameter 2022-05-07 07:52:17 -07:00
Phil Wang
85ed77d512 fix a potentially huge bug thanks to @CiaoHe https://github.com/lucidrains/DALLE2-pytorch/issues/71 2022-05-07 05:05:54 -07:00
Phil Wang
3676ef4d49 make sure vqgan-vae trainer supports mixed precision 2022-05-06 10:44:16 -07:00
Phil Wang
28e944f328 make sure openai clip adapter outputs l2normed embeddings 2022-05-06 10:12:03 -07:00
Phil Wang
14e63a3f67 also offer l2norm clamping in diffusion prior during training, if one were using predict x0 objective 2022-05-06 10:05:14 -07:00
Phil Wang
ad20a14a4d bring in rotary embeddings for diffusion prior causal transformer (the most powerful relative positional encoding, used in PaLM) - 0.1.0 because of breaking change 2022-05-06 08:45:30 -07:00
Phil Wang
0be1e0d64c support CoCa, which seems to be better than CLIP (has an autoregressive text encoder) https://arxiv.org/abs/2205.01917 2022-05-06 08:27:12 -07:00
Phil Wang
98df1ba51e add diffusion prior trainer, which automatically takes care of the exponential moving average (training and sampling), as well as mixed precision, gradient clipping 2022-05-06 08:11:09 -07:00
Phil Wang
878b555ef7 fix training with clip 2022-05-06 07:37:57 -07:00
Phil Wang
c76a964fd6 allow for CLIP to be optional in Decoder, and allow DecoderTrainer to work off training pre-encoded image embeddings 2022-05-05 08:11:01 -07:00
Phil Wang
8518684ae9 does not make much sense, as researchers may want to try predicting noise with diffusionprior instead of predicting x0 2022-05-05 07:37:00 -07:00
Phil Wang
1d5dc08810 take @crowsonkb 's suggestion at https://github.com/lucidrains/DALLE2-pytorch/issues/60#issue-1226116132 2022-05-05 07:28:53 -07:00
Phil Wang
d8d8b6caf1 dataloaders for decoder training, from @Veldrovive 2022-05-05 07:09:45 -07:00
Aidan Dempster
15acc03bd4 Add a dataloader for training the decoder (#57)
* Added dataloader and updated requirements

* Added option to set embedding shard width separately from webdataset shard length.
There must be a better way to do this.

* Changed embedding loader to read using fsspec

* Moved the loader into a more compatible location

* Removed unnecessary package

* Fixed typo (Embeding -> Embedding)

* Simplified example embedding finder code to remove unnecessary get_file_list function

* Added example usage of ImageEmbeddingDataset

* Changed the name of create_dataloader to be more verbose
Added a dataloaders __init__.py
2022-05-05 07:08:45 -07:00
Phil Wang
896f19786d remove convnext blocks, they are illsuited for generative work, validated by early experimental results at https://github.com/lucidrains/video-diffusion-pytorch 2022-05-05 07:07:21 -07:00
Phil Wang
aec5575d09 take a bet on resize right, given Katherine is using it 2022-05-04 19:26:45 -07:00
Phil Wang
9773f10d6c use inference mode whenever possible, cleanup 2022-05-04 15:25:05 -07:00
Phil Wang
86e692d24f fix random crop probability 2022-05-04 11:52:24 -07:00
Phil Wang
97b751209f allow for last unet in the cascade to be trained on crops, if it is convolution-only 2022-05-04 11:48:48 -07:00