Phil Wang | 5d958713c0 | fix classifier free guidance for image hiddens summed to time hiddens, thanks to @xvjiarui for finding this bug | 2022-06-13 21:01:50 -07:00
Phil Wang | 0f31980362 | cleanup | 2022-06-07 17:31:38 -07:00
Kashif Rasul | 1a81670718 | fix quadratic_beta_schedule (#141) | 2022-06-06 08:45:14 -07:00
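For reference, the quadratic schedule interpolates linearly in sqrt(beta) space and then squares, so the betas grow quadratically between the two endpoints. A minimal sketch of that definition (the endpoint defaults here are illustrative, not necessarily the values used in the fix):

```python
import torch

def quadratic_beta_schedule(timesteps, beta_start = 0.0001, beta_end = 0.02):
    # interpolate linearly in sqrt(beta) space, then square, so the betas
    # grow quadratically from beta_start to beta_end over `timesteps` steps
    return torch.linspace(beta_start ** 0.5, beta_end ** 0.5, timesteps) ** 2
```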
Phil Wang | ffd342e9d0 | allow for an option to constrain the variance interpolation fraction coming out from the unet for learned variance, if it is turned on | 2022-06-03 09:34:57 -07:00
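The learned-variance interpolation follows Improved DDPM, where the unet emits a fraction v and the log variance is v * log(beta_t) + (1 - v) * log of the posterior variance; constraining that fraction amounts to squashing the raw network output into [0, 1]. A sketch under those assumptions (the function and argument names are illustrative):

```python
import torch

def interpolate_log_variance(raw_frac, log_beta_t, log_posterior_variance_t, constrain = True):
    # Improved DDPM style learned variance: the unet predicts a fraction v and the
    # log variance is v * log(beta_t) + (1 - v) * log(posterior variance)
    frac = raw_frac.sigmoid() if constrain else raw_frac  # optionally squash into [0, 1]
    return frac * log_beta_t + (1 - frac) * log_posterior_variance_t
```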
Phil Wang | 8cc278447e | just cast to right types for blur sigma and kernel size augs | 2022-06-02 11:21:58 -07:00
Phil Wang | 38cd62010c | allow for random blur sigma and kernel size augmentations on low res conditioning (need to reread paper to see if the augmentation value needs to be fed into the unet for conditioning as well) | 2022-06-02 11:11:25 -07:00
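The blur augmentation on the low-res conditioning image amounts to sampling a random gaussian blur sigma and kernel size and applying the blur before the image is fed to the super-resolution unet. A rough sketch using torchvision (the helper name and ranges are illustrative, not the repo's actual defaults):

```python
import random
import torchvision.transforms.functional as TF

def blur_augment_lowres(lowres_cond_img, sigma_range = (0.4, 0.6), kernel_sizes = (3, 5)):
    # sample a random gaussian blur sigma and (odd) kernel size, casting to the types
    # torchvision expects, then blur the low resolution conditioning image
    sigma = random.uniform(*sigma_range)
    kernel_size = int(random.choice(kernel_sizes))
    return TF.gaussian_blur(lowres_cond_img, kernel_size = kernel_size, sigma = sigma)
```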
Phil Wang | b693e0be03 | default number of resnet blocks per layer in unet to 2 (in imagen it was 3 for base 64x64) | 2022-05-30 10:06:48 -07:00
Phil Wang | a0bed30a84 | additional conditioning on image embedding by summing to time embeddings (for FiLM-like conditioning in subsequent layers), from a passage in the paper found by @mhh0318 | 2022-05-30 09:26:51 -07:00
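Summing the image embedding into the time embeddings means projecting the CLIP image embedding to the time-conditioning dimension and adding it, so every block that consumes the time hiddens is also conditioned on the image. A hypothetical sketch of that wiring (module and attribute names are made up for illustration):

```python
from torch import nn

class TimeImageConditioner(nn.Module):
    # hypothetical module: project the CLIP image embedding to the time conditioning
    # dimension and sum it into the time hiddens, so downstream blocks see both signals
    def __init__(self, image_embed_dim, time_cond_dim):
        super().__init__()
        self.image_to_time_hiddens = nn.Linear(image_embed_dim, time_cond_dim)

    def forward(self, time_hiddens, image_embed):
        return time_hiddens + self.image_to_time_hiddens(image_embed)
```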
Phil Wang | f4fe6c570d | allow for full customization of number of resnet blocks per down or upsampling layers in unet, as in imagen | 2022-05-26 08:33:31 -07:00
Phil Wang | f23fab7ef7 | switch over to scale shift conditioning, as it seems like Imagen and Glide used it and it may be important | 2022-05-24 21:46:12 -07:00
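Scale-shift conditioning is FiLM-style: the conditioning embedding is projected to a per-channel (scale, shift) pair and applied to the feature map, typically after normalization, as h * (1 + scale) + shift. A minimal sketch (names are illustrative):

```python
from torch import nn

class ScaleShiftCondition(nn.Module):
    # FiLM-style conditioning: project the conditioning embedding to per-channel
    # (scale, shift) and apply it to the feature map as h * (1 + scale) + shift
    def __init__(self, cond_dim, channels):
        super().__init__()
        self.to_scale_shift = nn.Linear(cond_dim, channels * 2)

    def forward(self, hiddens, cond_emb):
        scale, shift = self.to_scale_shift(cond_emb).chunk(2, dim = -1)
        scale = scale[..., None, None]  # (b, c) -> (b, c, 1, 1) to broadcast over h, w
        shift = shift[..., None, None]
        return hiddens * (1 + scale) + shift
```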
Phil Wang | 8864fd0aa7 | bring in the dynamic thresholding technique from the Imagen paper, which purportedly improves classifier free guidance for the cascading ddpm | 2022-05-24 18:15:14 -07:00
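Dynamic thresholding, as described in the Imagen paper, picks a per-sample threshold s at some percentile of |x0|, clamps x0 to [-s, s], and divides by s so the prediction lands back in [-1, 1]. A sketch of that procedure (the percentile default is illustrative):

```python
import torch

def dynamic_threshold(x_start, percentile = 0.9):
    # pick a per-sample threshold s at the given percentile of |x0|, clamp to [-s, s],
    # then divide by s so the clamped prediction lands back in [-1, 1]
    s = torch.quantile(x_start.flatten(start_dim = 1).abs(), percentile, dim = -1)
    s = s.clamp(min = 1.).view(-1, *((1,) * (x_start.ndim - 1)))
    return x_start.clamp(-s, s) / s
```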
Phil Wang | fa533962bd | just use an assert to make sure clip image channels is never different than the channels of the diffusion prior and decoder, if clip is given | 2022-05-22 22:43:14 -07:00
Phil Wang | 276abf337b | fix and cleanup image size determination logic in decoder | 2022-05-22 22:28:45 -07:00
Phil Wang | ae42d03006 | allow for saving of additional fields on save method in trainers, and return loaded objects from the load method | 2022-05-22 22:14:25 -07:00
Phil Wang | 5c397c9d66 | move neural network creations off the configuration file into the pydantic classes | 2022-05-22 19:18:18 -07:00
Phil Wang | 80497e9839 | accept unets as list for decoder | 2022-05-20 20:31:26 -07:00
Phil Wang | db0642c4cd | quick fix for @marunine | 2022-05-18 20:22:52 -07:00
Phil Wang | f4016f6302 | allow for overriding use of EMA during sampling in decoder trainer with use_non_ema keyword, also fix some issues with automatic normalization of images and low res conditioning image if latent diffusion is in play | 2022-05-16 11:18:30 -07:00
Phil Wang | 1212f7058d | allow text encodings and text mask to be passed in on forward and sampling for Decoder class | 2022-05-16 10:40:32 -07:00
Phil Wang | dab106d4e5 | back to no_grad for now, also keep track and restore unet devices in one_unet_in_gpu contextmanager | 2022-05-16 09:36:14 -07:00
Phil Wang | ecf9e8027d | make sure classifier free guidance is used only if conditional dropout is present on the DiffusionPrior and Decoder classes. also make sure prior can have a different conditional scale than decoder | 2022-05-15 19:09:38 -07:00
Phil Wang | 36c5079bd7 | LazyLinear is not mature, make users pass in text_embed_dim if text conditioning is turned on | 2022-05-15 18:56:52 -07:00
Phil Wang | 4a4c7ac9e6 | cond drop prob for diffusion prior network should default to 0 | 2022-05-15 18:47:45 -07:00
Phil Wang | 11d4e11f10 | allow for training unconditional ddpm or cascading ddpms | 2022-05-15 16:54:56 -07:00
Phil Wang | 156fe5ed9f | final cleanup for the day | 2022-05-15 12:38:41 -07:00
Phil Wang | ff3474f05c | normalize conditioning tokens outside of cross attention blocks | 2022-05-14 14:23:52 -07:00
Phil Wang | d1f02e8f49 | always use sandwich norm for attention layer | 2022-05-14 12:13:41 -07:00
Phil Wang | 9faab59b23 | use post-attn-branch layernorm in attempt to stabilize cross attention conditioning in decoder | 2022-05-14 11:58:09 -07:00
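Sandwich norm, as in the two commits above, means normalizing both the input to the attention branch and the branch's output before the residual add, which is what keeps the cross attention conditioning well scaled. A sketch using a stock attention module as a stand-in for the repo's own (names are illustrative):

```python
from torch import nn

class SandwichNormAttention(nn.Module):
    # layernorm both before the attention branch and on its output, before the
    # residual add ("sandwich" norm), to keep the conditioning branch well scaled
    def __init__(self, dim, heads = 8):
        super().__init__()
        self.pre_norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first = True)
        self.post_branch_norm = nn.LayerNorm(dim)

    def forward(self, x):
        normed = self.pre_norm(x)
        attn_out, _ = self.attn(normed, normed, normed)
        return x + self.post_branch_norm(attn_out)
```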
Phil Wang | 5d27029e98 | make sure lowres conditioning image is properly normalized to -1 to 1 for cascading ddpm | 2022-05-14 01:23:54 -07:00
Phil Wang | 3115fa17b3 | fix everything around normalizing images to -1 to 1 for ddpm training automatically | 2022-05-14 01:17:11 -07:00
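The normalization in question is the usual mapping of images from [0, 1] into the [-1, 1] range a ddpm is trained on, and its inverse; roughly:

```python
def normalize_neg_one_to_one(img):
    # map images from [0, 1] into the [-1, 1] range the ddpm is trained on
    return img * 2 - 1

def unnormalize_zero_to_one(normed_img):
    # inverse mapping, applied before handing images back (or to CLIP)
    return (normed_img + 1) * 0.5
```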
Phil Wang | 124d8577c8 | move the inverse normalization function called before image embeddings are derived from clip to within the diffusion prior and decoder classes | 2022-05-14 00:37:52 -07:00
Phil Wang | 2db0c9794c | comments | 2022-05-12 14:25:20 -07:00
Phil Wang | 2277b47ffd | make sure learned variance can work for any number of unets in the decoder, defaults to first unet, as suggested was used in the paper | 2022-05-12 14:18:15 -07:00
Phil Wang | 28b58e568c | cleanup in preparation of option for learned variance | 2022-05-12 12:04:52 -07:00
Phil Wang | 6021945fc8 | default to l2 loss | 2022-05-11 19:24:51 -07:00
Phil Wang | 3dda2570ed | fix amp issue for https://github.com/lucidrains/DALLE2-pytorch/issues/82 | 2022-05-11 08:21:39 -07:00
Phil Wang | 2f3c02dba8 | numerical accuracy for noise schedule parameters | 2022-05-10 15:28:46 -07:00
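Improving numerical accuracy of the noise schedule parameters typically means computing the schedule constants in float64 before casting back to float32, since the cumulative products are sensitive to rounding. A sketch with a cosine schedule as the example (the exact schedules touched by the commit may differ):

```python
import math
import torch

def cosine_beta_schedule(timesteps, s = 0.008):
    # compute the schedule constants in float64, where the cumulative products are
    # far less sensitive to rounding, then cast back to float32 at the end
    steps = timesteps + 1
    x = torch.linspace(0, timesteps, steps, dtype = torch.float64)
    alphas_cumprod = torch.cos(((x / timesteps) + s) / (1 + s) * math.pi * 0.5) ** 2
    alphas_cumprod = alphas_cumprod / alphas_cumprod[0]
    betas = 1. - (alphas_cumprod[1:] / alphas_cumprod[:-1])
    return betas.clamp(0., 0.999).float()
```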
Phil Wang | 908088cfea | wrap up cross embed layer feature | 2022-05-10 12:19:34 -07:00
Phil Wang | 35f89556ba | bring in the cross embed layer from Crossformer paper for initial convolution in unet | 2022-05-10 11:50:38 -07:00
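The cross embed layer from the CrossFormer paper replaces the single initial convolution with several parallel convolutions of different kernel sizes whose outputs are concatenated along the channel dimension. A sketch of that layer (the channel-split heuristic shown is one common choice):

```python
import torch
from torch import nn

class CrossEmbedLayer(nn.Module):
    # several parallel convolutions with different kernel sizes (CrossFormer style);
    # their outputs are concatenated along the channel dimension
    def __init__(self, dim_in, dim_out, kernel_sizes = (3, 7, 15), stride = 1):
        super().__init__()
        kernel_sizes = sorted(kernel_sizes)
        num_scales = len(kernel_sizes)
        # give the smaller kernels the larger share of the output channels
        dim_scales = [dim_out // (2 ** i) for i in range(1, num_scales)]
        dim_scales = [*dim_scales, dim_out - sum(dim_scales)]
        self.convs = nn.ModuleList([
            nn.Conv2d(dim_in, dim_scale, kernel, stride = stride, padding = (kernel - stride) // 2)
            for kernel, dim_scale in zip(kernel_sizes, dim_scales)
        ])

    def forward(self, x):
        return torch.cat([conv(x) for conv in self.convs], dim = 1)
```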
Phil Wang | fc8fce38fb | make sure cascading DDPM can be trained unconditionally, to ready for CLI one command training for the public | 2022-05-10 10:48:10 -07:00
Phil Wang | b1e7b5f6bb | make sure resnet groups in unet is finely customizable | 2022-05-10 10:12:50 -07:00
Phil Wang | 9b322ea634 | patch | 2022-05-09 19:46:19 -07:00
Phil Wang | 64f7be1926 | some cleanup | 2022-05-09 16:50:21 -07:00
Phil Wang | db805e73e1 | fix a bug with numerical stability in attention, sorry! 🐛 | 2022-05-09 16:23:37 -07:00
Phil Wang | e46eaec817 | deal the diffusion prior problem yet another blow | 2022-05-09 11:08:52 -07:00
Kumar R | 8647cb5e76 | Val loss changes, with quite a few other changes. This is in place of the earlier PR (https://github.com/lucidrains/DALLE2-pytorch/pull/67) (#77) | 2022-05-09 08:53:29 -07:00
  * Val_loss changes - not rebased with lucidrains' master.
  * Val Loss changes - now rebased with lucidrains' master
  * train_diffusion_prior.py updates
  * dalle2_pytorch.py updates
  * __init__.py changes
  * Update train_diffusion_prior.py
  * Update dalle2_pytorch.py
  * Update train_diffusion_prior.py
  * Update train_diffusion_prior.py
  * Update dalle2_pytorch.py
  * Update train_diffusion_prior.py
  * Update train_diffusion_prior.py
  * Update train_diffusion_prior.py
  * Update train_diffusion_prior.py
  * Update README.md
  * Update README.md
  * Update README.md
  * Update README.md
  * Update README.md
  * Update README.md
  * Update README.md
  * Update README.md
  * Update README.md
Phil Wang | 53c189e46a | give more surface area for attention in diffusion prior | 2022-05-09 08:08:11 -07:00
Phil Wang | dde51fd362 | revert restriction for classifier free guidance for diffusion prior, given @crowsonkb advice | 2022-05-07 20:55:41 -07:00
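For context on the classifier free guidance commits around here: guidance combines a conditional and an unconditional prediction (the latter from a forward pass with the conditioning dropped) by pushing the conditional output away from the unconditional one. A minimal sketch of the combination step:

```python
def apply_classifier_free_guidance(cond_pred, null_pred, cond_scale = 1.):
    # cond_pred comes from a forward pass with conditioning, null_pred from a pass with
    # the conditioning dropped; cond_scale = 1 reduces to the conditional prediction
    return null_pred + (cond_pred - null_pred) * cond_scale
```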
Phil Wang | 4010aec033 | turn off classifier free guidance if predicting x_start for diffusion prior | 2022-05-07 09:38:17 -07:00
Phil Wang | 830afd3c15 | sinusoidal embed time embeddings for diffusion prior as well, for continuous version | 2022-05-07 08:32:43 -07:00
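Sinusoidal time embeddings encode the (possibly continuous) timestep with sin/cos features at geometrically spaced frequencies, the same construction used for transformer position embeddings. A standard sketch:

```python
import math
import torch
from torch import nn

class SinusoidalPosEmb(nn.Module):
    # sin/cos features at geometrically spaced frequencies, so the network can be
    # conditioned on continuous (non-integer) diffusion times as well
    def __init__(self, dim):
        super().__init__()
        self.dim = dim

    def forward(self, t):
        half_dim = self.dim // 2
        freqs = torch.exp(torch.arange(half_dim, device = t.device) * -(math.log(10000) / (half_dim - 1)))
        args = t[:, None].float() * freqs[None, :]
        return torch.cat((args.sin(), args.cos()), dim = -1)
```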