Phil Wang
|
1212f7058d
|
allow text encodings and text mask to be passed in on forward and sampling for Decoder class
|
2022-05-16 10:40:32 -07:00 |
|
Phil Wang
|
dab106d4e5
|
back to no_grad for now, also keep track of and restore unet devices in the one_unet_in_gpu contextmanager
|
2022-05-16 09:36:14 -07:00 |
|
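A minimal sketch of the device bookkeeping described in the commit above: move a single unet onto the GPU for sampling, remember where every unet lived, and restore those placements on exit. The function name follows the commit message; the body is an illustration, not the repo's exact code.

```python
from contextlib import contextmanager

@contextmanager
def one_unet_in_gpu(unets, index, device = 'cuda'):
    # remember the device each unet currently lives on
    original_devices = [next(unet.parameters()).device for unet in unets]

    unets[index].to(device)
    try:
        yield unets[index]
    finally:
        # restore every unet to where it started
        for unet, dev in zip(unets, original_devices):
            unet.to(dev)
```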
Phil Wang
|
ecf9e8027d
|
make sure classifier free guidance is used only if conditional dropout is present on the DiffusionPrior and Decoder classes. also make sure prior can have a different conditional scale than decoder
|
2022-05-15 19:09:38 -07:00 |
|
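A sketch of the guard described above: classifier-free guidance is only valid if the network was trained with some conditional dropout, and the prior and the decoder can each be sampled with their own conditioning scale. The names here (`forward_with_cond_scale`, `cond_drop_prob`) are illustrative assumptions rather than the exact repo code.

```python
import torch.nn as nn

class GuidedWrapper(nn.Module):
    # hypothetical wrapper; `net` is assumed to accept a cond_drop_prob keyword
    def __init__(self, net, cond_drop_prob = 0.):
        super().__init__()
        self.net = net
        self.cond_drop_prob = cond_drop_prob

    def forward_with_cond_scale(self, x, cond, cond_scale = 1.):
        logits = self.net(x, cond, cond_drop_prob = 0.)

        # guidance only makes sense if the net ever saw dropped conditioning during training
        if cond_scale == 1 or self.cond_drop_prob == 0.:
            return logits

        null_logits = self.net(x, cond, cond_drop_prob = 1.)  # fully unconditional pass
        return null_logits + (logits - null_logits) * cond_scale
```

Since the prior and the decoder are separate networks, sampling code can pass a different `cond_scale` to each.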
Phil Wang
|
36c5079bd7
|
LazyLinear is not mature, make users pass in text_embed_dim if text conditioning is turned on
|
2022-05-15 18:56:52 -07:00 |
|
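Illustrative sketch of the change: rather than relying on `nn.LazyLinear` (which infers its input size on the first forward pass and is still experimental), the text-conditioning projection requires an explicit `text_embed_dim`. The class name below is hypothetical.

```python
import torch.nn as nn

class TextConditionProjection(nn.Module):
    def __init__(self, text_embed_dim, dim):
        super().__init__()
        # explicit input dimension instead of nn.LazyLinear's deferred shape inference
        self.proj = nn.Linear(text_embed_dim, dim)

    def forward(self, text_embeds):
        return self.proj(text_embeds)
```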
Phil Wang
|
4a4c7ac9e6
|
cond drop prob for diffusion prior network should default to 0
|
2022-05-15 18:47:45 -07:00 |
|
Phil Wang
|
11d4e11f10
|
allow for training unconditional ddpm or cascading ddpms
|
2022-05-15 16:54:56 -07:00 |
|
Phil Wang
|
156fe5ed9f
|
final cleanup for the day
|
2022-05-15 12:38:41 -07:00 |
|
Phil Wang
|
ff3474f05c
|
normalize conditioning tokens outside of cross attention blocks
|
2022-05-14 14:23:52 -07:00 |
|
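A sketch of the idea in the commit above: normalize the conditioning tokens once, before they enter any cross-attention block, instead of re-normalizing the context inside every block. The module name and the use of `nn.MultiheadAttention` are simplifications for illustration.

```python
import torch.nn as nn

class CrossAttentionStack(nn.Module):
    def __init__(self, dim, cond_dim, depth = 2, heads = 8):
        super().__init__()
        # one norm for the conditioning tokens, shared by every block
        self.norm_cond = nn.LayerNorm(cond_dim)
        self.blocks = nn.ModuleList([
            nn.MultiheadAttention(dim, heads, kdim = cond_dim, vdim = cond_dim, batch_first = True)
            for _ in range(depth)
        ])

    def forward(self, x, cond_tokens):
        cond_tokens = self.norm_cond(cond_tokens)  # normalized outside the attention blocks
        for attn in self.blocks:
            out, _ = attn(x, cond_tokens, cond_tokens)
            x = x + out
        return x
```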
Phil Wang
|
d1f02e8f49
|
always use sandwich norm for attention layer
|
2022-05-14 12:13:41 -07:00 |
|
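"Sandwich norm" means a LayerNorm on each side of the attention operation, so the residual stream receives a normalized branch output. A minimal sketch, using `nn.MultiheadAttention` for brevity:

```python
import torch.nn as nn

class SandwichNormAttention(nn.Module):
    def __init__(self, dim, heads = 8):
        super().__init__()
        self.pre_norm  = nn.LayerNorm(dim)   # usual pre-norm
        self.attn      = nn.MultiheadAttention(dim, heads, batch_first = True)
        self.post_norm = nn.LayerNorm(dim)   # extra norm on the branch output ("sandwich")

    def forward(self, x):
        h = self.pre_norm(x)
        h, _ = self.attn(h, h, h)
        h = self.post_norm(h)
        return x + h
```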
Phil Wang
|
9faab59b23
|
use post-attn-branch layernorm in attempt to stabilize cross attention conditioning in decoder
|
2022-05-14 11:58:09 -07:00 |
|
Phil Wang
|
5d27029e98
|
make sure lowres conditioning image is properly normalized to -1 to 1 for cascading ddpm
|
2022-05-14 01:23:54 -07:00 |
|
Phil Wang
|
3115fa17b3
|
fix everything around normalizing images to -1 to 1 for ddpm training automatically
|
2022-05-14 01:17:11 -07:00 |
|
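A sketch of the automatic (un)normalization the two commits above refer to: DDPM training operates on images in [-1, 1], while raw images and CLIP expect [0, 1]. The function names are illustrative.

```python
def normalize_to_neg_one_to_one(img):
    # [0, 1] -> [-1, 1], the range the DDPM is trained in
    return img * 2 - 1

def unnormalize_to_zero_to_one(t):
    # [-1, 1] -> [0, 1], e.g. before handing images back to the caller or to CLIP
    return (t + 1) * 0.5
```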
Phil Wang
|
124d8577c8
|
move the inverse normalization function (called before image embeddings are derived from CLIP) into the diffusion prior and decoder classes
|
2022-05-14 00:37:52 -07:00 |
|
Phil Wang
|
2db0c9794c
|
comments
|
2022-05-12 14:25:20 -07:00 |
|
Phil Wang
|
2277b47ffd
|
make sure learned variance can work for any number of unets in the decoder; defaults to the first unet, as the paper suggests was used
|
2022-05-12 14:18:15 -07:00 |
|
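For context, "learned variance" follows the Improved DDPM recipe: the unet predicts twice the channels, and the second half interpolates the log-variance between beta_t and the posterior variance. A rough sketch with illustrative names; per the commit, it defaults to the first unet only.

```python
import torch

def split_learned_variance(model_output):
    # model_output: (b, 2 * c, h, w) when learned variance is enabled
    pred, var_interp = model_output.chunk(2, dim = 1)
    return pred, var_interp

def learned_log_variance(var_interp, log_beta_t, log_posterior_variance_t):
    # map raw values to [0, 1], then blend the two log-variances (Improved DDPM)
    frac = (var_interp + 1) * 0.5
    return frac * log_beta_t + (1 - frac) * log_posterior_variance_t
```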
Phil Wang
|
28b58e568c
|
cleanup in preparation of option for learned variance
|
2022-05-12 12:04:52 -07:00 |
|
Phil Wang
|
6021945fc8
|
default to l2 loss
|
2022-05-11 19:24:51 -07:00 |
|
Phil Wang
|
3dda2570ed
|
fix amp issue for https://github.com/lucidrains/DALLE2-pytorch/issues/82
|
2022-05-11 08:21:39 -07:00 |
|
Phil Wang
|
2f3c02dba8
|
numerical accuracy for noise schedule parameters
|
2022-05-10 15:28:46 -07:00 |
|
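The usual way to get numerical accuracy here is to compute the schedule constants in float64 before registering them as buffers, so the cumulative products near 0 and 1 do not degrade. A sketch using the standard cosine schedule:

```python
import math

import torch

def cosine_beta_schedule(timesteps, s = 0.008):
    # computed in double precision for accuracy of the cumulative products
    steps = timesteps + 1
    t = torch.linspace(0, timesteps, steps, dtype = torch.float64) / timesteps
    alphas_cumprod = torch.cos((t + s) / (1 + s) * math.pi * 0.5) ** 2
    alphas_cumprod = alphas_cumprod / alphas_cumprod[0]
    betas = 1 - (alphas_cumprod[1:] / alphas_cumprod[:-1])
    return torch.clip(betas, 0, 0.999)
```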
Phil Wang
|
908088cfea
|
wrap up cross embed layer feature
|
2022-05-10 12:19:34 -07:00 |
|
Phil Wang
|
35f89556ba
|
bring in the cross embed layer from Crossformer paper for initial convolution in unet
|
2022-05-10 11:50:38 -07:00 |
|
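A sketch of a CrossFormer-style cross-embedding stem, per the two commits above: several convolutions with the same stride but different kernel sizes, with the output channels split between them and concatenated. The details of the channel split are an assumption.

```python
import torch
import torch.nn as nn

class CrossEmbedLayer(nn.Module):
    def __init__(self, dim_in, dim_out, kernel_sizes = (3, 7, 15), stride = 1):
        super().__init__()
        # split the output channels across the kernel sizes, remainder to the last one
        num_scales = len(kernel_sizes)
        dim_scales = [dim_out // (2 ** i) for i in range(1, num_scales)]
        dim_scales = [*dim_scales, dim_out - sum(dim_scales)]

        self.convs = nn.ModuleList([
            nn.Conv2d(dim_in, dim_scale, k, stride = stride, padding = (k - stride) // 2)
            for dim_scale, k in zip(dim_scales, kernel_sizes)
        ])

    def forward(self, x):
        return torch.cat([conv(x) for conv in self.convs], dim = 1)
```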
Phil Wang
|
fc8fce38fb
|
make sure cascading DDPM can be trained unconditionally, to prepare for one-command CLI training for the public
|
2022-05-10 10:48:10 -07:00 |
|
Phil Wang
|
b1e7b5f6bb
|
make sure resnet groups in the unet are finely customizable
|
2022-05-10 10:12:50 -07:00 |
|
Phil Wang
|
9b322ea634
|
patch
|
2022-05-09 19:46:19 -07:00 |
|
Phil Wang
|
64f7be1926
|
some cleanup
|
2022-05-09 16:50:21 -07:00 |
|
Phil Wang
|
db805e73e1
|
fix a bug with numerical stability in attention, sorry! 🐛
|
2022-05-09 16:23:37 -07:00 |
|
Phil Wang
|
e46eaec817
|
deal the diffusion prior problem yet another blow
|
2022-05-09 11:08:52 -07:00 |
|
Kumar R
|
8647cb5e76
|
Val loss changes, with quite a few other changes. This is in place of the earlier PR (https://github.com/lucidrains/DALLE2-pytorch/pull/67) (#77)
* Val_loss changes - not rebased with lucidrains' master.
* Val Loss changes - now rebased with lucidrains' master
* train_diffusion_prior.py updates
* dalle2_pytorch.py updates
* __init__.py changes
* Update train_diffusion_prior.py
* Update dalle2_pytorch.py
* Update train_diffusion_prior.py
* Update train_diffusion_prior.py
* Update dalle2_pytorch.py
* Update train_diffusion_prior.py
* Update train_diffusion_prior.py
* Update train_diffusion_prior.py
* Update train_diffusion_prior.py
* Update README.md
* Update README.md
* Update README.md
* Update README.md
* Update README.md
* Update README.md
* Update README.md
* Update README.md
* Update README.md
|
2022-05-09 08:53:29 -07:00 |
|
Phil Wang
|
53c189e46a
|
give more surface area for attention in diffusion prior
|
2022-05-09 08:08:11 -07:00 |
|
Phil Wang
|
dde51fd362
|
revert restriction for classifier free guidance for diffusion prior, given @crowsonkb's advice
|
2022-05-07 20:55:41 -07:00 |
|
Phil Wang
|
4010aec033
|
turn off classifier free guidance if predicting x_start for diffusion prior
|
2022-05-07 09:38:17 -07:00 |
|
Phil Wang
|
830afd3c15
|
sinusoidal embed time embeddings for diffusion prior as well, for continuous version
|
2022-05-07 08:32:43 -07:00 |
|
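Standard sinusoidal embeddings for (possibly continuous) timesteps, as referenced above; a minimal sketch:

```python
import math

import torch

def sinusoidal_time_embedding(times, dim):
    # times: (batch,) tensor of timesteps -> (batch, dim) embedding
    half_dim = dim // 2
    freqs = torch.exp(torch.arange(half_dim, device = times.device) * (-math.log(10000) / (half_dim - 1)))
    args = times[:, None].float() * freqs[None, :]
    return torch.cat((args.sin(), args.cos()), dim = -1)
```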
Phil Wang
|
8f93729d19
|
when in doubt, make it a hyperparameter
|
2022-05-07 07:52:17 -07:00 |
|
Phil Wang
|
85ed77d512
|
fix a potentially huge bug thanks to @CiaoHe https://github.com/lucidrains/DALLE2-pytorch/issues/71
|
2022-05-07 05:05:54 -07:00 |
|
Phil Wang
|
28e944f328
|
make sure openai clip adapter outputs l2normed embeddings
|
2022-05-06 10:12:03 -07:00 |
|
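The l2-normalization itself is just `F.normalize` on the last dimension; a sketch of how an adapter might apply it to whatever the wrapped CLIP returns (the usage in the comments is illustrative):

```python
import torch.nn.functional as F

def l2norm(t):
    return F.normalize(t, p = 2, dim = -1)

# illustrative usage inside an adapter:
#   image_embed = l2norm(clip_model.encode_image(images))
#   text_embed  = l2norm(clip_model.encode_text(tokens))
```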
Phil Wang
|
14e63a3f67
|
also offer l2norm clamping in the diffusion prior during training, if one is using the predict-x0 objective
|
2022-05-06 10:05:14 -07:00 |
|
Phil Wang
|
ad20a14a4d
|
bring in rotary embeddings for diffusion prior causal transformer (the most powerful relative positional encoding, used in PaLM) - 0.1.0 because of breaking change
|
2022-05-06 08:45:30 -07:00 |
|
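Rotary embeddings rotate pairs of query/key channels by position-dependent angles, so attention scores depend only on relative offsets. A standard sketch of the recipe (the repo's exact wiring may differ):

```python
import torch

def rotary_freqs(seq_len, dim_head, device, base = 10000):
    inv_freq = 1. / (base ** (torch.arange(0, dim_head, 2, device = device).float() / dim_head))
    t = torch.arange(seq_len, device = device).float()
    freqs = torch.einsum('i,j->ij', t, inv_freq)   # (seq_len, dim_head / 2)
    return torch.cat((freqs, freqs), dim = -1)      # (seq_len, dim_head)

def rotate_half(x):
    x1, x2 = x.chunk(2, dim = -1)
    return torch.cat((-x2, x1), dim = -1)

def apply_rotary(freqs, t):
    # t: (..., seq_len, dim_head) queries or keys
    return t * freqs.cos() + rotate_half(t) * freqs.sin()
```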
Phil Wang
|
0be1e0d64c
|
support CoCa, which seems to be better than CLIP (has an autoregressive text encoder) https://arxiv.org/abs/2205.01917
|
2022-05-06 08:27:12 -07:00 |
|
Phil Wang
|
98df1ba51e
|
add diffusion prior trainer, which automatically takes care of the exponential moving average (training and sampling), as well as mixed precision and gradient clipping
|
2022-05-06 08:11:09 -07:00 |
|
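Of those responsibilities, the exponential moving average is the least standard; a minimal sketch of the bookkeeping (not the trainer's actual API):

```python
import copy

import torch

def make_ema(model):
    # frozen shadow copy of the online model
    ema = copy.deepcopy(model)
    ema.requires_grad_(False)
    return ema

@torch.no_grad()
def ema_update(ema_model, model, decay = 0.999):
    for ema_p, p in zip(ema_model.parameters(), model.parameters()):
        ema_p.lerp_(p, 1 - decay)   # ema <- decay * ema + (1 - decay) * online
```

Mixed precision and gradient clipping map onto the usual `torch.cuda.amp.GradScaler` and `torch.nn.utils.clip_grad_norm_` utilities.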
Phil Wang
|
878b555ef7
|
fix training with clip
|
2022-05-06 07:37:57 -07:00 |
|
Phil Wang
|
c76a964fd6
|
allow for CLIP to be optional in Decoder, and allow DecoderTrainer to work off training pre-encoded image embeddings
|
2022-05-05 08:11:01 -07:00 |
|
Phil Wang
|
8518684ae9
|
does not make much sense, as researchers may want to try predicting noise with the diffusion prior instead of predicting x0
|
2022-05-05 07:37:00 -07:00 |
|
Phil Wang
|
1d5dc08810
|
take @crowsonkb's suggestion at https://github.com/lucidrains/DALLE2-pytorch/issues/60#issue-1226116132
|
2022-05-05 07:28:53 -07:00 |
|
Phil Wang
|
896f19786d
|
remove convnext blocks; they are ill-suited for generative work, as validated by early experimental results at https://github.com/lucidrains/video-diffusion-pytorch
|
2022-05-05 07:07:21 -07:00 |
|
Phil Wang
|
aec5575d09
|
take a bet on resize right, given Katherine is using it
|
2022-05-04 19:26:45 -07:00 |
|
Phil Wang
|
9773f10d6c
|
use inference mode whenever possible, cleanup
|
2022-05-04 15:25:05 -07:00 |
|
Phil Wang
|
86e692d24f
|
fix random crop probability
|
2022-05-04 11:52:24 -07:00 |
|
Phil Wang
|
97b751209f
|
allow for last unet in the cascade to be trained on crops, if it is convolution-only
|
2022-05-04 11:48:48 -07:00 |
|
Phil Wang
|
5b619c2fd5
|
make sure some hyperparameters for the unet block are configurable
|
2022-05-04 11:18:32 -07:00 |
|
Phil Wang
|
9ff228188b
|
offer old resnet blocks, from the original DDPM paper, just in case convnexts are unsuitable for generative work
|
2022-05-04 10:52:58 -07:00 |
|
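For reference, the "old" DDPM-style resnet block is GroupNorm -> SiLU -> Conv applied twice, plus a residual connection (with a 1x1 projection when the channel count changes). A minimal sketch, with time-embedding injection omitted:

```python
import torch.nn as nn

class ResnetBlock(nn.Module):
    def __init__(self, dim_in, dim_out, groups = 8):
        super().__init__()
        self.block1 = nn.Sequential(
            nn.GroupNorm(groups, dim_in), nn.SiLU(), nn.Conv2d(dim_in, dim_out, 3, padding = 1)
        )
        self.block2 = nn.Sequential(
            nn.GroupNorm(groups, dim_out), nn.SiLU(), nn.Conv2d(dim_out, dim_out, 3, padding = 1)
        )
        self.res_conv = nn.Conv2d(dim_in, dim_out, 1) if dim_in != dim_out else nn.Identity()

    def forward(self, x):
        h = self.block2(self.block1(x))
        return h + self.res_conv(x)
```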