Phil Wang
1212f7058d
allow text encodings and text mask to be passed in on forward and sampling for Decoder class
2022-05-16 10:40:32 -07:00
Phil Wang
dab106d4e5
back to no_grad for now; also keep track of and restore unet devices in the one_unet_in_gpu context manager
2022-05-16 09:36:14 -07:00
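For context on the commit above, a minimal sketch of the idea: bring one unet onto the GPU under no_grad, remember where every unet lived, and restore those devices on exit. The function body and signature below are assumptions for illustration, not the Decoder's actual internals.

```python
from contextlib import contextmanager
import torch
from torch import nn

@contextmanager
def one_unet_in_gpu(unets: nn.ModuleList, index: int, cuda_device = 'cuda'):
    # record the device each unet currently sits on
    devices = [next(unet.parameters()).device for unet in unets]

    # offload everything to CPU, then bring only the chosen unet onto the GPU
    for unet in unets:
        unet.cpu()
    unets[index].to(cuda_device)

    try:
        with torch.no_grad():       # sampling does not need gradients
            yield unets[index]
    finally:
        # restore every unet to the device it started on
        for unet, device in zip(unets, devices):
            unet.to(device)
```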
Phil Wang
bb151ca6b1
unet_number on the decoder trainer only needs to be passed in if there is more than 1 unet, so that unconditional training of a single ddpm is seamless (experiment in progress locally)
2022-05-16 09:17:17 -07:00
zion
4a59dea4cf
Migrate to text-conditioned prior training (#95)
...
* migrate to conditioned prior
* unify reader logic with a wrapper (#1)
* separate out reader logic
* support both training methods
* Update train prior to use embedding wrapper (#3)
* Support Both Methods
* bug fixes
* small bug fixes
* embedding only wrapper bug
* use smaller val perc
* final bug fix for embedding-only
Co-authored-by: nousr <>
2022-05-15 20:16:38 -07:00
Phil Wang
ecf9e8027d
make sure classifier free guidance is used only if conditional dropout is present on the DiffusionPrior and Decoder classes. also make sure prior can have a different conditional scale than decoder
2022-05-15 19:09:38 -07:00
Phil Wang
36c5079bd7
LazyLinear is not mature, make users pass in text_embed_dim if text conditioning is turned on
2022-05-15 18:56:52 -07:00
Phil Wang
4a4c7ac9e6
cond drop prob for diffusion prior network should default to 0
2022-05-15 18:47:45 -07:00
Phil Wang
11d4e11f10
allow for training unconditional ddpm or cascading ddpms
2022-05-15 16:54:56 -07:00
Phil Wang
99778e12de
trainer classes now take care of auto-casting numpy arrays to torch tensors, and of setting the correct device based on the model's parameter devices
2022-05-15 15:25:45 -07:00
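A rough sketch of the trainer-side casting described above; cast_inputs is a hypothetical helper for illustration, not the library's API.

```python
import numpy as np
import torch
from torch import nn

def cast_inputs(model: nn.Module, *args):
    # place everything on whatever device the model's parameters live on
    device = next(model.parameters()).device
    out = []
    for arg in args:
        if isinstance(arg, np.ndarray):
            arg = torch.from_numpy(arg)   # numpy -> torch
        if torch.is_tensor(arg):
            arg = arg.to(device)          # move to the model's device
        out.append(arg)
    return tuple(out)
```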
Phil Wang
7b7a62044a
use eval vs training mode to determine whether to call backprop on trainer forward
2022-05-15 14:20:59 -07:00
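An illustration of the eval-vs-training-mode rule above, written as a hypothetical trainer wrapper rather than the real DecoderTrainer or DiffusionPriorTrainer.

```python
import torch
from torch import nn

class Trainer(nn.Module):                    # hypothetical wrapper, for illustration only
    def __init__(self, model: nn.Module):
        super().__init__()
        self.model = model

    def forward(self, *args, **kwargs):
        loss = self.model(*args, **kwargs)   # assumes the wrapped model returns a scalar loss
        if self.training:                    # .train() -> call backward, .eval() -> just report the loss
            loss.backward()
        return loss
```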
Phil Wang
156fe5ed9f
final cleanup for the day
2022-05-15 12:38:41 -07:00
Phil Wang
e66c7b0249
incorrect naming
2022-05-15 11:23:52 -07:00
Phil Wang
68e7d2f241
make sure gradient accumulation feature works even if all arguments passed in are keyword arguments
2022-05-15 11:16:16 -07:00
Phil Wang
aa6772dcff
make sure optimizer and scaler are reloaded on resume in the diffusion prior training script; move from argparse to click
2022-05-15 10:48:10 -07:00
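A hedged sketch of reloading optimizer and GradScaler state on resume; the checkpoint keys and helper names are assumptions, not the training script's exact code.

```python
import torch

def save_checkpoint(path, model, optimizer, scaler):
    torch.save({
        'model': model.state_dict(),
        'optimizer': optimizer.state_dict(),
        'scaler': scaler.state_dict(),        # torch.cuda.amp.GradScaler state
    }, path)

def load_checkpoint(path, model, optimizer, scaler):
    ckpt = torch.load(path, map_location = 'cpu')
    model.load_state_dict(ckpt['model'])
    optimizer.load_state_dict(ckpt['optimizer'])
    scaler.load_state_dict(ckpt['scaler'])    # without this, mixed precision resumes cold
```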
Phil Wang
89de5af63e
experiment tracker agnostic
2022-05-15 09:56:40 -07:00
Phil Wang
4ec6d0ba81
backwards pass is not recommended under the autocast context, per pytorch docs
2022-05-14 18:26:19 -07:00
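The pattern the commit refers to, per the PyTorch AMP guidance: run the forward pass (and loss computation) under autocast, and call backward() outside of it.

```python
import torch

model = torch.nn.Linear(10, 1).cuda()
optimizer = torch.optim.Adam(model.parameters())
scaler = torch.cuda.amp.GradScaler()

x, y = torch.randn(8, 10, device = 'cuda'), torch.randn(8, 1, device = 'cuda')

with torch.cuda.amp.autocast():
    loss = torch.nn.functional.mse_loss(model(x), y)   # forward + loss inside autocast

scaler.scale(loss).backward()                          # backward outside the autocast block
scaler.step(optimizer)
scaler.update()
```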
Phil Wang
aee92dba4a
simplify more
2022-05-14 17:16:46 -07:00
Phil Wang
b0cd5f24b6
take care of gradient accumulation automatically for researchers, by passing in a max_batch_size on the decoder or diffusion prior trainer forward
2022-05-14 17:04:09 -07:00
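A minimal sketch of the max_batch_size idea above: split the incoming batch into chunks no larger than max_batch_size, accumulate gradients over the chunks, and scale each chunk's loss so the total matches a single full-batch step. The helper name and splitting logic are illustrative, not the trainer's exact code.

```python
import torch

def accumulate_over_chunks(model, images, max_batch_size):
    chunks = images.split(max_batch_size, dim = 0)
    total_loss = 0.
    for chunk in chunks:
        loss = model(chunk)                   # assumes the model returns a scalar loss
        weight = chunk.shape[0] / images.shape[0]
        (loss * weight).backward()            # gradients accumulate across chunks
        total_loss += loss.item() * weight
    return total_loss                         # caller steps the optimizer and zeroes grads
```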
Phil Wang
b494ed81d4
take care of the backward pass within the trainer classes for diffusion prior and decoder, readying to take care of gradient accumulation as well (plus, unsure if loss.backward should be called within the autocast block)
2022-05-14 15:49:24 -07:00
Phil Wang
ff3474f05c
normalize conditioning tokens outside of cross attention blocks
2022-05-14 14:23:52 -07:00
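What normalizing conditioning tokens outside the cross attention blocks amounts to, sketched with assumed dimensions: apply a LayerNorm to the conditioning tokens once, before they are handed to the cross attention layers, instead of normalizing inside each block.

```python
import torch
from torch import nn

dim_cond = 512
norm_cond = nn.LayerNorm(dim_cond)

cond_tokens = torch.randn(4, 77, dim_cond)   # e.g. text encodings
cond_tokens = norm_cond(cond_tokens)         # normalized once, then reused by every cross attention layer
```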
Phil Wang
d5293f19f1
line up with paper
2022-05-14 13:57:00 -07:00
Phil Wang
e697183849
be able to customize adam eps
2022-05-14 13:55:04 -07:00
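Exposing Adam's eps is just a pass-through to the optimizer; the values below are placeholders.

```python
import torch

params = torch.nn.Linear(10, 10).parameters()
optimizer = torch.optim.Adam(params, lr = 1e-4, eps = 1e-8)   # eps configurable rather than hard-coded
```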
Phil Wang
591d37e266
lower default initial learning rate to what Jonathan Ho had in his original repo
2022-05-14 13:22:43 -07:00
Phil Wang
d1f02e8f49
always use sandwich norm for attention layer
2022-05-14 12:13:41 -07:00
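A sketch of the sandwich norm arrangement mentioned above: a LayerNorm on both sides of the attention branch, with the residual added outside. The module name and the use of nn.MultiheadAttention are assumptions for illustration, not the repo's attention implementation.

```python
import torch
from torch import nn

class SandwichNormAttention(nn.Module):       # hypothetical module name
    def __init__(self, dim, heads = 8):
        super().__init__()
        self.norm_in = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first = True)
        self.norm_out = nn.LayerNorm(dim)

    def forward(self, x):
        h = self.norm_in(x)                   # pre-norm going into the branch
        h, _ = self.attn(h, h, h)
        return x + self.norm_out(h)           # post-branch norm, residual added outside
```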
Phil Wang
9faab59b23
use post-attn-branch layernorm in attempt to stabilize cross attention conditioning in decoder
2022-05-14 11:58:09 -07:00
Phil Wang
5d27029e98
make sure lowres conditioning image is properly normalized to -1 to 1 for cascading ddpm
2022-05-14 01:23:54 -07:00
Phil Wang
3115fa17b3
fix everything around automatically normalizing images to the range -1 to 1 for ddpm training
2022-05-14 01:17:11 -07:00
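The normalization these commits are about, in its simplest form (assuming images arrive as floats in [0, 1] and the ddpm works in [-1, 1]): scale on the way in and invert on the way out.

```python
import torch

def normalize_img(img):       # [0, 1] -> [-1, 1]
    return img * 2 - 1

def unnormalize_img(img):     # [-1, 1] -> [0, 1]
    return (img + 1) * 0.5

img = torch.rand(1, 3, 64, 64)
# round trip recovers the original within float32 precision
assert torch.allclose(unnormalize_img(normalize_img(img)), img, atol = 1e-6)
```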
Phil Wang
124d8577c8
move the inverse normalization function called before image embeddings are derived from clip to within the diffusion prior and decoder classes
2022-05-14 00:37:52 -07:00
Phil Wang
2db0c9794c
comments
2022-05-12 14:25:20 -07:00
Phil Wang
2277b47ffd
make sure learned variance can work for any number of unets in the decoder, defaults to first unet, as suggested was used in the paper
2022-05-12 14:18:15 -07:00
Phil Wang
28b58e568c
cleanup in preparation of option for learned variance
2022-05-12 12:04:52 -07:00
Phil Wang
924455d97d
align the ema model device back after sampling from the cascading ddpm in the decoder
2022-05-11 19:56:54 -07:00
Phil Wang
6021945fc8
default to l2 loss
2022-05-11 19:24:51 -07:00
Phil Wang
3dda2570ed
fix amp issue for https://github.com/lucidrains/DALLE2-pytorch/issues/82
2022-05-11 08:21:39 -07:00
Phil Wang
2f3c02dba8
numerical accuracy for noise schedule parameters
2022-05-10 15:28:46 -07:00
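The gist of the numerical accuracy fix above: compute the noise schedule and its derived products in float64 and only cast down afterwards. The cosine schedule below follows Nichol & Dhariwal and is illustrative; it may not match the repo's defaults.

```python
import math
import torch

def cosine_beta_schedule(timesteps, s = 0.008):
    steps = timesteps + 1
    t = torch.linspace(0, timesteps, steps, dtype = torch.float64) / timesteps
    alphas_cumprod = torch.cos((t + s) / (1 + s) * math.pi * 0.5) ** 2
    alphas_cumprod = alphas_cumprod / alphas_cumprod[0]
    betas = 1 - (alphas_cumprod[1:] / alphas_cumprod[:-1])
    return torch.clip(betas, 0, 0.999)       # still float64; cast when registering buffers

betas = cosine_beta_schedule(1000)
alphas_cumprod = torch.cumprod(1. - betas, dim = 0)
```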
Phil Wang
908088cfea
wrap up cross embed layer feature
2022-05-10 12:19:34 -07:00
Phil Wang
35f89556ba
bring in the cross embed layer from the CrossFormer paper for the initial convolution in the unet
2022-05-10 11:50:38 -07:00
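A simplified sketch of a CrossFormer-style cross embedding layer: several convolutions with different kernel sizes but a shared stride run in parallel, and their outputs are concatenated along the channel dimension. The channel splits, kernel sizes, and class name here are illustrative, not the repo's exact choices.

```python
import torch
from torch import nn

class CrossEmbedLayer(nn.Module):
    def __init__(self, dim_in, dim_out, kernel_sizes = (3, 7, 15), stride = 1):
        super().__init__()
        # give each scale an equal share of the output channels (remainder to the last)
        dims = [dim_out // len(kernel_sizes)] * (len(kernel_sizes) - 1)
        dims.append(dim_out - sum(dims))
        self.convs = nn.ModuleList([
            nn.Conv2d(dim_in, d, k, stride = stride, padding = (k - stride) // 2)
            for d, k in zip(dims, kernel_sizes)
        ])

    def forward(self, x):
        return torch.cat([conv(x) for conv in self.convs], dim = 1)

layer = CrossEmbedLayer(3, 128, kernel_sizes = (3, 7, 15), stride = 1)
out = layer(torch.randn(1, 3, 64, 64))   # -> (1, 128, 64, 64)
```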
Phil Wang
fc8fce38fb
make sure cascading DDPM can be trained unconditionally, to ready for CLI one command training for the public
2022-05-10 10:48:10 -07:00
Phil Wang
b1e7b5f6bb
make sure resnet groups in the unet are finely customizable
2022-05-10 10:12:50 -07:00
Phil Wang
9b322ea634
patch
2022-05-09 19:46:19 -07:00
Phil Wang
64f7be1926
some cleanup
2022-05-09 16:50:21 -07:00
Phil Wang
db805e73e1
fix a bug with numerical stability in attention, sorry! 🐛
2022-05-09 16:23:37 -07:00
Phil Wang
e46eaec817
deal the diffusion prior problem yet another blow
2022-05-09 11:08:52 -07:00
Kumar R
8647cb5e76
Val loss changes, along with quite a few other changes. This replaces the earlier PR (https://github.com/lucidrains/DALLE2-pytorch/pull/67) (#77)
...
* Val_loss changes - not rebased with lucidrains' master.
* Val Loss changes - now rebased with lucidrains' master
* train_diffusion_prior.py updates
* dalle2_pytorch.py updates
* __init__.py changes
* Update train_diffusion_prior.py
* Update dalle2_pytorch.py
* Update train_diffusion_prior.py
* Update train_diffusion_prior.py
* Update dalle2_pytorch.py
* Update train_diffusion_prior.py
* Update train_diffusion_prior.py
* Update train_diffusion_prior.py
* Update train_diffusion_prior.py
* Update README.md
* Update README.md
* Update README.md
* Update README.md
* Update README.md
* Update README.md
* Update README.md
* Update README.md
* Update README.md
2022-05-09 08:53:29 -07:00
Phil Wang
53c189e46a
give more surface area for attention in diffusion prior
2022-05-09 08:08:11 -07:00
Phil Wang
dde51fd362
revert restriction on classifier free guidance for the diffusion prior, given @crowsonkb's advice
2022-05-07 20:55:41 -07:00
Phil Wang
4010aec033
turn off classifier free guidance if predicting x_start for diffusion prior
2022-05-07 09:38:17 -07:00
Phil Wang
830afd3c15
use sinusoidal time embeddings for the diffusion prior as well, for the continuous version
2022-05-07 08:32:43 -07:00
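A standard sinusoidal time embedding in the spirit of the commit above; the dimension handling and class name are illustrative.

```python
import math
import torch
from torch import nn

class SinusoidalPosEmb(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.dim = dim

    def forward(self, t):                       # t: (batch,) of continuous timesteps
        half_dim = self.dim // 2
        freqs = torch.exp(
            torch.arange(half_dim, device = t.device) * -(math.log(10000) / (half_dim - 1))
        )
        args = t[:, None].float() * freqs[None, :]
        return torch.cat([args.sin(), args.cos()], dim = -1)   # (batch, dim)

emb = SinusoidalPosEmb(128)(torch.rand(4) * 1000)   # works for continuous timesteps too
```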
Phil Wang
8f93729d19
when in doubt, make it a hyperparameter
2022-05-07 07:52:17 -07:00
Phil Wang
85ed77d512
fix a potentially huge bug thanks to @CiaoHe https://github.com/lucidrains/DALLE2-pytorch/issues/71
2022-05-07 05:05:54 -07:00