Phil Wang | db0642c4cd | quick fix for @marunine | 2022-05-18 20:22:52 -07:00
Phil Wang | bb86ab2404 | update sample, and set default gradient clipping value for decoder training | 2022-05-16 17:38:30 -07:00
Phil Wang | c7ea8748db | default decoder learning rate to what was in the paper | 2022-05-16 13:33:54 -07:00
Phil Wang | 13382885d9 | final update to dalle2 repository for a while - sampling from prior in chunks automatically with max_batch_size keyword given | 2022-05-16 12:57:31 -07:00
Phil Wang | 164d9be444 | use a decorator and take care of sampling in chunks (max_batch_size keyword), in case one is sampling a huge grid of images | 2022-05-16 12:34:28 -07:00
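The "sample in chunks" idea from the two commits above can be sketched as a decorator that splits an oversized batch by an optional max_batch_size keyword, runs the wrapped sampler per chunk, and stitches the outputs back together. Names and signatures here are illustrative assumptions, not the repository's actual API, and plain lists stand in for tensors.

```python
from functools import wraps

def chunked_sampling(fn):
    # hypothetical sketch: split the batch when max_batch_size is given
    @wraps(fn)
    def inner(batch, *args, max_batch_size = None, **kwargs):
        if max_batch_size is None or len(batch) <= max_batch_size:
            return fn(batch, *args, **kwargs)
        outputs = []
        for i in range(0, len(batch), max_batch_size):
            # run the expensive sampler on one manageable chunk at a time
            outputs.extend(fn(batch[i:i + max_batch_size], *args, **kwargs))
        return outputs
    return inner

@chunked_sampling
def sample(batch):
    # stand-in for a model's sampling step
    return [x * 2 for x in batch]
```

Calling `sample(list(range(10)), max_batch_size = 4)` then processes chunks of 4, 4, and 2 transparently, which is what lets a huge grid of images be sampled without exceeding memory.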
Phil Wang | 89ff04cfe2 | final tweak to EMA class | 2022-05-16 11:54:34 -07:00
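The idea behind the EMA class referenced above is exponential moving averaging of model weights for more stable sampling. A minimal generic sketch of the update rule (the real class averages full parameter sets; this scalar version is only illustrative):

```python
class EMA:
    # minimal sketch of exponential moving averaging: blend the running
    # value with each new observation, weighted by beta
    def __init__(self, beta = 0.99):
        self.beta = beta
        self.value = None

    def update(self, new):
        if self.value is None:
            # first update just copies the new value
            self.value = new
        else:
            self.value = self.value * self.beta + new * (1.0 - self.beta)
        return self.value
```

A higher beta makes the average move more slowly, smoothing out noise from individual training steps.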
Phil Wang | f4016f6302 | allow overriding use of EMA during sampling in the decoder trainer with the use_non_ema keyword; also fix some issues with automatic normalization of images and the low-res conditioning image when latent diffusion is in play | 2022-05-16 11:18:30 -07:00
Phil Wang | 1212f7058d | allow text encodings and text mask to be passed in on forward and sampling for the Decoder class | 2022-05-16 10:40:32 -07:00
Phil Wang | dab106d4e5 | back to no_grad for now; also keep track of and restore unet devices in the one_unet_in_gpu context manager | 2022-05-16 09:36:14 -07:00
Phil Wang | bb151ca6b1 | unet_number on the decoder trainer only needs to be passed in if there is more than one unet, so that unconditional training of a single ddpm is seamless (experiment in progress locally) | 2022-05-16 09:17:17 -07:00
Phil Wang | ecf9e8027d | make sure classifier free guidance is used only if conditional dropout is present on the DiffusionPrior and Decoder classes; also make sure the prior can have a different conditioning scale than the decoder | 2022-05-15 19:09:38 -07:00
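The classifier free guidance constraint in the commit above can be seen from how guidance combines predictions at sampling time: a conditioning scale of 1 recovers the plain conditional prediction, while larger scales extrapolate away from the unconditional prediction, which only exists meaningfully if the model saw conditional dropout during training. A sketch with plain lists standing in for tensors:

```python
def classifier_free_guidance(cond_pred, uncond_pred, cond_scale = 1.0):
    # cond_scale = 1 recovers the conditional prediction unchanged;
    # scales > 1 push the output further from the unconditional prediction
    if cond_scale == 1.0:
        return cond_pred
    return [u + (c - u) * cond_scale for c, u in zip(cond_pred, uncond_pred)]
```

Since the prior and decoder are separate diffusion models, there is no reason their scales must match, hence the separate conditioning scale per model.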
Phil Wang | 36c5079bd7 | LazyLinear is not mature; make users pass in text_embed_dim if text conditioning is turned on | 2022-05-15 18:56:52 -07:00
Phil Wang | 4a4c7ac9e6 | cond drop prob for the diffusion prior network should default to 0 | 2022-05-15 18:47:45 -07:00
Phil Wang | 11d4e11f10 | allow for training an unconditional ddpm or cascading ddpms | 2022-05-15 16:54:56 -07:00
Phil Wang | 99778e12de | trainer classes now take care of auto-casting numpy arrays to torch tensors, and of setting the correct device based on the model's parameter devices | 2022-05-15 15:25:45 -07:00
Phil Wang | 7b7a62044a | use eval vs training mode to determine whether to call backprop on the trainer forward | 2022-05-15 14:20:59 -07:00
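The eval-vs-training gating in the commit above can be illustrated with a toy trainer whose forward only runs backprop while in training mode. The loss here is a stand-in callable; in the real trainer the gated step would be a torch backward call. All names are hypothetical:

```python
class TrainerSketch:
    # hypothetical: forward computes the loss always, but only backprops
    # while self.training is True (mirroring nn.Module's train/eval flag)
    def __init__(self):
        self.training = True
        self.backward_calls = 0

    def train(self):
        self.training = True
        return self

    def eval(self):
        self.training = False
        return self

    def forward(self, compute_loss):
        loss = compute_loss()
        if self.training:
            # in a real trainer: loss.backward() (plus scaler/optimizer steps)
            self.backward_calls += 1
        return loss
```

After calling `.eval()`, forward becomes a pure loss evaluation, so the same entry point serves both training and validation.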
Phil Wang | 68e7d2f241 | make sure gradient accumulation feature works even if all arguments passed in are keyword arguments | 2022-05-15 11:16:16 -07:00
Phil Wang | f7eee09d8b | 0.2.30 | 2022-05-15 09:56:59 -07:00
Phil Wang | 4ec6d0ba81 | backwards pass is not recommended under the autocast context, per pytorch docs | 2022-05-14 18:26:19 -07:00
Phil Wang | aee92dba4a | simplify more | 2022-05-14 17:16:46 -07:00
Phil Wang | b0cd5f24b6 | take care of gradient accumulation automatically for researchers, by passing in a max_batch_size on the decoder or diffusion prior trainer forward | 2022-05-14 17:04:09 -07:00
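The automatic gradient accumulation described above (including the fix for all-keyword calls) can be sketched as follows: split every positional and keyword argument into chunks of max_batch_size, then weight each chunk's loss by its share of the full batch so the accumulated result matches a single full-batch step. Lists stand in for tensors; all names are illustrative assumptions:

```python
def split_args_and_kwargs(max_batch_size, *args, **kwargs):
    # works even when everything is passed by keyword: gather all values,
    # take the batch size from the first, and slice each one identically
    all_values = list(args) + list(kwargs.values())
    batch_size = len(all_values[0])
    for i in range(0, batch_size, max_batch_size):
        chunk_args = tuple(a[i:i + max_batch_size] for a in args)
        chunk_kwargs = {k: v[i:i + max_batch_size] for k, v in kwargs.items()}
        first_chunk = next(iter(chunk_args + tuple(chunk_kwargs.values())))
        # fraction of the batch this chunk represents, used to scale its loss
        yield len(first_chunk) / batch_size, chunk_args, chunk_kwargs

def accumulate_loss(loss_fn, max_batch_size, *args, **kwargs):
    total = 0.0
    for frac, chunk_args, chunk_kwargs in split_args_and_kwargs(max_batch_size, *args, **kwargs):
        # in the real trainer, each scaled chunk loss would be backwarded here
        total += frac * loss_fn(*chunk_args, **chunk_kwargs)
    return total
```

Scaling by the chunk fraction (rather than a naive mean of chunk losses) keeps the result correct when the last chunk is smaller than the others.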
Phil Wang | b494ed81d4 | take care of backwards within the trainer classes for the diffusion prior and decoder, readying to take care of gradient accumulation as well (plus, unsure if loss should be backwarded within the autocast block) | 2022-05-14 15:49:24 -07:00
Phil Wang | ff3474f05c | normalize conditioning tokens outside of cross attention blocks | 2022-05-14 14:23:52 -07:00
Phil Wang | d5293f19f1 | line up with the paper | 2022-05-14 13:57:00 -07:00
Phil Wang | e697183849 | be able to customize Adam eps | 2022-05-14 13:55:04 -07:00
Phil Wang | 591d37e266 | lower default initial learning rate to what Jonathan Ho had in his original repo | 2022-05-14 13:22:43 -07:00
Phil Wang | d1f02e8f49 | always use sandwich norm for the attention layer | 2022-05-14 12:13:41 -07:00
Phil Wang | 9faab59b23 | use post-attention-branch layernorm in an attempt to stabilize cross attention conditioning in the decoder | 2022-05-14 11:58:09 -07:00
Phil Wang | 5d27029e98 | make sure the lowres conditioning image is properly normalized to -1 to 1 for the cascading ddpm | 2022-05-14 01:23:54 -07:00
Phil Wang | 3115fa17b3 | fix everything around automatically normalizing images to -1 to 1 for ddpm training | 2022-05-14 01:17:11 -07:00
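The -1 to 1 normalization in the two commits above follows the DDPM convention of training on images scaled from [0, 1] to [-1, 1], with samples mapped back before display. Sketched as a pair of plain functions (the helper names are assumptions, not necessarily the repo's):

```python
def normalize_neg_one_to_one(x):
    # map pixel values from [0, 1] to [-1, 1] for ddpm training
    return x * 2.0 - 1.0

def unnormalize_zero_to_one(x):
    # map sampled values from [-1, 1] back to [0, 1] for display
    return (x + 1.0) * 0.5
```

Getting this pair applied consistently matters most for the cascading setup, where the low-res conditioning image must be in the same range the unet was trained on.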
Phil Wang | 124d8577c8 | move the inverse normalization function, called before image embeddings are derived from clip, to within the diffusion prior and decoder classes | 2022-05-14 00:37:52 -07:00
Phil Wang | 2277b47ffd | make sure learned variance can work for any number of unets in the decoder; defaults to the first unet, as the paper suggests | 2022-05-12 14:18:15 -07:00
Phil Wang | 924455d97d | align the ema model device back after sampling from the cascading ddpm in the decoder | 2022-05-11 19:56:54 -07:00
Phil Wang | 6021945fc8 | default to l2 loss | 2022-05-11 19:24:51 -07:00
Phil Wang | 3dda2570ed | fix amp issue for https://github.com/lucidrains/DALLE2-pytorch/issues/82 | 2022-05-11 08:21:39 -07:00
Phil Wang | 2f3c02dba8 | numerical accuracy for noise schedule parameters | 2022-05-10 15:28:46 -07:00
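One place noise schedule numerics bite, illustrating the commit above: the cosine schedule of Nichol and Dhariwal derives betas from ratios of a cosine curve and clips them at 0.999, since the ratio approaches 1 near the end of the schedule. A sketch in plain Python floats (the usual mitigation in torch code is to compute such parameters in float64 before casting down):

```python
import math

def cosine_beta_schedule(timesteps, s = 0.008):
    # squared-cosine curve from the improved-DDPM paper; s is a small
    # offset that keeps betas from being too tiny near t = 0
    f = [
        math.cos(((t / timesteps) + s) / (1 + s) * math.pi * 0.5) ** 2
        for t in range(timesteps + 1)
    ]
    # betas are one minus the ratio of consecutive curve values,
    # clipped at 0.999 for numerical stability at the end of the schedule
    return [min(1.0 - f[t + 1] / f[t], 0.999) for t in range(timesteps)]
```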
Phil Wang | 908088cfea | wrap up cross embed layer feature | 2022-05-10 12:19:34 -07:00
Phil Wang | 35f89556ba | bring in the cross embed layer from Crossformer paper for initial convolution in unet | 2022-05-10 11:50:38 -07:00
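The cross embed layer from CrossFormer runs several parallel convolutions with different kernel sizes at the stem and concatenates their outputs along the channel dimension. One common way to split the output channels across scales, sketched here as an assumed detail rather than this repo's exact implementation, is to halve the allocation per scale and give the remainder to the last:

```python
def cross_embed_dim_split(dim_out, num_scales):
    # each successive scale gets half the channels of the previous one;
    # the last scale absorbs whatever remains so the total is exact
    dims = [dim_out // (2 ** (i + 1)) for i in range(num_scales)]
    dims[-1] = dim_out - sum(dims[:-1])
    return dims
```

This biases capacity toward the smaller kernels while still letting large kernels capture coarse structure at the input.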
Phil Wang | 2b55f753b9 | fix new issue with github actions and auto pypi package uploading | 2022-05-10 10:51:15 -07:00
Phil Wang | fc8fce38fb | make sure the cascading DDPM can be trained unconditionally, to ready it for one-command CLI training for the public | 2022-05-10 10:48:10 -07:00
Phil Wang | b1e7b5f6bb | make sure resnet groups in the unet are finely customizable | 2022-05-10 10:12:50 -07:00
Phil Wang | 9b322ea634 | patch | 2022-05-09 19:46:19 -07:00
Phil Wang | ba64ea45cc | 0.2.3 | 2022-05-09 16:50:31 -07:00
Phil Wang | db805e73e1 | fix a bug with numerical stability in attention, sorry! 🐛 | 2022-05-09 16:23:37 -07:00
Phil Wang | e46eaec817 | deal the diffusion prior problem yet another blow | 2022-05-09 11:08:52 -07:00
Phil Wang | 53c189e46a | give more surface area for attention in the diffusion prior | 2022-05-09 08:08:11 -07:00
Phil Wang | dde51fd362 | revert restriction on classifier free guidance for the diffusion prior, given @crowsonkb's advice | 2022-05-07 20:55:41 -07:00
Phil Wang | 4010aec033 | turn off classifier free guidance if predicting x_start for the diffusion prior | 2022-05-07 09:38:17 -07:00
Phil Wang | 830afd3c15 | use sinusoidal time embeddings for the diffusion prior as well, for the continuous version | 2022-05-07 08:32:43 -07:00
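Sinusoidal time embeddings, as referenced in the commit above, follow the Transformer/DDPM convention: each timestep maps to sines and cosines at geometrically spaced frequencies, so a continuous timestep gets a smooth, unique embedding. A generic sketch (the function name and max_period default are assumptions):

```python
import math

def sinusoidal_time_embedding(t, dim, max_period = 10000.0):
    # frequencies decay geometrically from 1 down to 1 / max_period
    half = dim // 2
    freqs = [math.exp(-math.log(max_period) * i / half) for i in range(half)]
    # concatenate sines and cosines so nearby timesteps embed nearby
    return [math.sin(t * f) for f in freqs] + [math.cos(t * f) for f in freqs]
```

Because the input t need not be an integer, this works for the continuous-time formulation mentioned in the commit.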
Phil Wang | 8f93729d19 | when in doubt, make it a hyperparameter | 2022-05-07 07:52:17 -07:00