Phil Wang | dab106d4e5 | back to no_grad for now, also keep track of and restore unet devices in the one_unet_in_gpu contextmanager | 2022-05-16 09:36:14 -07:00
Phil Wang | bb151ca6b1 | unet_number on the decoder trainer only needs to be passed in if there is more than 1 unet, so that unconditional training of a single ddpm is seamless (experiment in progress locally) | 2022-05-16 09:17:17 -07:00
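The defaulting behavior described above can be made concrete with a small sketch; `resolve_unet_number` is a hypothetical helper for illustration, not the actual DALLE2-pytorch API.

```python
# hypothetical helper: with a single unet, unet_number may be omitted;
# with several unets, it must be passed in explicitly
def resolve_unet_number(num_unets: int, unet_number = None) -> int:
    if unet_number is None:
        assert num_unets == 1, 'unet_number must be given when training more than one unet'
        unet_number = 1
    assert 1 <= unet_number <= num_unets, 'unet_number out of range'
    return unet_number
```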
Phil Wang | ecf9e8027d | make sure classifier free guidance is used only if conditional dropout is present on the DiffusionPrior and Decoder classes. also make sure the prior can have a different conditional scale than the decoder | 2022-05-15 19:09:38 -07:00
Phil Wang | 36c5079bd7 | LazyLinear is not mature, so make users pass in text_embed_dim if text conditioning is turned on | 2022-05-15 18:56:52 -07:00
Phil Wang | 4a4c7ac9e6 | cond drop prob for the diffusion prior network should default to 0 | 2022-05-15 18:47:45 -07:00
Phil Wang | 11d4e11f10 | allow for training an unconditional ddpm or cascading ddpms | 2022-05-15 16:54:56 -07:00
Phil Wang | 99778e12de | trainer classes now take care of auto-casting numpy arrays to torch tensors and setting the correct device based on the model parameter devices | 2022-05-15 15:25:45 -07:00
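A minimal sketch of what such auto-casting could look like; `cast_to_model_device` is a hypothetical helper, not code from the repository, shown only to illustrate casting numpy inputs and inferring the device from the model's parameters.

```python
import numpy as np
import torch
from torch import nn

def cast_to_model_device(model: nn.Module, *data):
    # infer the target device from the model's parameters
    device = next(model.parameters()).device
    out = []
    for item in data:
        if isinstance(item, np.ndarray):
            item = torch.from_numpy(item)   # auto-cast numpy arrays to torch tensors
        if torch.is_tensor(item):
            item = item.to(device)          # move tensors onto the model's device
        out.append(item)
    return out
```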
Phil Wang | 7b7a62044a | use eval vs training mode to determine whether to call backprop in the trainer forward | 2022-05-15 14:20:59 -07:00
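A sketch of that pattern, assuming a hypothetical trainer wrapper (not the repository's actual class): the standard train()/eval() flag decides whether the forward call also backpropagates.

```python
import torch
from torch import nn

class TrainerSketch(nn.Module):
    # hypothetical wrapper: forward computes the loss and backprops
    # only when the module is in training mode
    def __init__(self, model: nn.Module):
        super().__init__()
        self.model = model

    def forward(self, *args, **kwargs):
        loss = self.model(*args, **kwargs)
        if self.training:      # toggled by .train() / .eval()
            loss.backward()
        return loss
```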
Phil Wang | 68e7d2f241 | make sure the gradient accumulation feature works even if all arguments passed in are keyword arguments | 2022-05-15 11:16:16 -07:00
Phil Wang | f7eee09d8b | 0.2.30 | 2022-05-15 09:56:59 -07:00
Phil Wang | 4ec6d0ba81 | the backward pass is not recommended under the autocast context, per the pytorch docs | 2022-05-14 18:26:19 -07:00
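The PyTorch AMP docs recommend running only the forward pass and loss computation under autocast and keeping the backward pass outside it; below is a minimal sketch of that pattern (it assumes a CUDA device and is not code from the repository).

```python
import torch

model = torch.nn.Linear(512, 512).cuda()
optimizer = torch.optim.Adam(model.parameters(), lr = 3e-4)
scaler = torch.cuda.amp.GradScaler()

x = torch.randn(8, 512, device = 'cuda')

with torch.cuda.amp.autocast():
    # forward pass and loss under autocast
    loss = model(x).pow(2).mean()

# backward pass and optimizer step run outside the autocast context
scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()
optimizer.zero_grad()
```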
Phil Wang | aee92dba4a | simplify more | 2022-05-14 17:16:46 -07:00
Phil Wang | b0cd5f24b6 | take care of gradient accumulation automatically for researchers, by passing in a max_batch_size on the decoder or diffusion prior trainer forward | 2022-05-14 17:04:09 -07:00
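A rough sketch of how such automatic gradient accumulation could work; `split_into_chunks`, the `images`/`text` names, and the usage comment are hypothetical, meant only to show a batch being split by max_batch_size (including keyword arguments) with each chunk's loss scaled by its fraction of the batch.

```python
import torch

def split_into_chunks(max_batch_size, *args, **kwargs):
    # find the batch size from any tensor argument (positional or keyword)
    tensors = [t for t in (*args, *kwargs.values()) if torch.is_tensor(t)]
    batch_size = tensors[0].shape[0]
    num_chunks = -(-batch_size // max_batch_size)   # ceiling division

    def split(x):
        # tensors are split along the batch dimension, everything else is repeated per chunk
        return x.split(max_batch_size, dim = 0) if torch.is_tensor(x) else (x,) * num_chunks

    split_args = [split(a) for a in args]
    split_kwargs = {k: split(v) for k, v in kwargs.items()}

    for i in range(num_chunks):
        chunk_args = tuple(s[i] for s in split_args)
        chunk_kwargs = {k: s[i] for k, s in split_kwargs.items()}
        chunk_size = next(t.shape[0] for t in (*chunk_args, *chunk_kwargs.values()) if torch.is_tensor(t))
        yield chunk_size / batch_size, chunk_args, chunk_kwargs

# hypothetical use inside a trainer forward, accumulating gradients chunk by chunk:
# for frac, chunk_args, chunk_kwargs in split_into_chunks(max_batch_size, images, text = text):
#     (model(*chunk_args, **chunk_kwargs) * frac).backward()
```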
Phil Wang | b494ed81d4 | take care of the backward pass within the trainer classes for the diffusion prior and decoder, readying them for gradient accumulation as well (plus, unsure whether the loss should be backpropagated within the autocast block) | 2022-05-14 15:49:24 -07:00
Phil Wang | ff3474f05c | normalize conditioning tokens outside of the cross attention blocks | 2022-05-14 14:23:52 -07:00
Phil Wang | d5293f19f1 | line up with the paper | 2022-05-14 13:57:00 -07:00
Phil Wang | e697183849 | be able to customize adam eps | 2022-05-14 13:55:04 -07:00
Phil Wang | 591d37e266 | lower the default initial learning rate to what Jonathan Ho had in his original repo | 2022-05-14 13:22:43 -07:00
Phil Wang | d1f02e8f49 | always use sandwich norm for the attention layer | 2022-05-14 12:13:41 -07:00
Phil Wang | 9faab59b23 | use post-attn-branch layernorm in an attempt to stabilize cross attention conditioning in the decoder | 2022-05-14 11:58:09 -07:00
Phil Wang | 5d27029e98 | make sure the lowres conditioning image is properly normalized to -1 to 1 for the cascading ddpm | 2022-05-14 01:23:54 -07:00
Phil Wang | 3115fa17b3 | fix everything around automatically normalizing images to -1 to 1 for ddpm training | 2022-05-14 01:17:11 -07:00
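For context, DDPMs are conventionally trained on images scaled to [-1, 1]; the two conversions amount to the following sketch (the helper names are illustrative, not necessarily the repository's).

```python
import torch

def normalize_neg_one_to_one(img: torch.Tensor) -> torch.Tensor:
    # [0, 1] float images -> [-1, 1], as expected by the unet during ddpm training
    return img * 2 - 1

def unnormalize_zero_to_one(normed_img: torch.Tensor) -> torch.Tensor:
    # [-1, 1] samples -> [0, 1], e.g. before deriving CLIP image embeddings or saving
    return (normed_img + 1) * 0.5
```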
Phil Wang | 124d8577c8 | move the inverse normalization function, called before image embeddings are derived from clip, to within the diffusion prior and decoder classes | 2022-05-14 00:37:52 -07:00
Phil Wang | 2277b47ffd | make sure learned variance can work for any number of unets in the decoder; it defaults to the first unet, as was suggested to be used in the paper | 2022-05-12 14:18:15 -07:00
Phil Wang | 924455d97d | align the ema model device back after sampling from the cascading ddpm in the decoder | 2022-05-11 19:56:54 -07:00
Phil Wang | 6021945fc8 | default to l2 loss | 2022-05-11 19:24:51 -07:00
Phil Wang | 3dda2570ed | fix amp issue for https://github.com/lucidrains/DALLE2-pytorch/issues/82 | 2022-05-11 08:21:39 -07:00
Phil Wang | 2f3c02dba8 | numerical accuracy for noise schedule parameters | 2022-05-10 15:28:46 -07:00
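One common way to improve the numerical accuracy of noise schedule parameters is to compute them in float64 and cast down afterwards; the cosine schedule below (from Nichol & Dhariwal's improved DDPM) is a generic sketch of that idea, not necessarily what this commit changed.

```python
import math
import torch

def cosine_beta_schedule(timesteps: int, s: float = 0.008) -> torch.Tensor:
    # compute the schedule in float64 to limit rounding error, cast to float32 at the end
    steps = timesteps + 1
    t = torch.linspace(0, timesteps, steps, dtype = torch.float64) / timesteps
    alphas_cumprod = torch.cos((t + s) / (1 + s) * math.pi * 0.5) ** 2
    alphas_cumprod = alphas_cumprod / alphas_cumprod[0]
    betas = 1. - (alphas_cumprod[1:] / alphas_cumprod[:-1])
    return torch.clip(betas, 0., 0.999).float()
```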
Phil Wang | 908088cfea | wrap up the cross embed layer feature | 2022-05-10 12:19:34 -07:00
Phil Wang | 35f89556ba | bring in the cross embed layer from the Crossformer paper for the initial convolution in the unet | 2022-05-10 11:50:38 -07:00
Phil Wang | 2b55f753b9 | fix a new issue with github actions and auto pypi package uploading | 2022-05-10 10:51:15 -07:00
Phil Wang | fc8fce38fb | make sure the cascading DDPM can be trained unconditionally, to ready it for one-command CLI training for the public | 2022-05-10 10:48:10 -07:00
Phil Wang | b1e7b5f6bb | make sure resnet groups in the unet are finely customizable | 2022-05-10 10:12:50 -07:00
Phil Wang | 9b322ea634 | patch | 2022-05-09 19:46:19 -07:00
Phil Wang | ba64ea45cc | 0.2.3 | 2022-05-09 16:50:31 -07:00
Phil Wang | db805e73e1 | fix a bug with numerical stability in attention, sorry! 🐛 | 2022-05-09 16:23:37 -07:00
Phil Wang | e46eaec817 | deal the diffusion prior problem yet another blow | 2022-05-09 11:08:52 -07:00
Phil Wang | 53c189e46a | give more surface area for attention in the diffusion prior | 2022-05-09 08:08:11 -07:00
Phil Wang | dde51fd362 | revert the restriction on classifier free guidance for the diffusion prior, given @crowsonkb's advice | 2022-05-07 20:55:41 -07:00
Phil Wang | 4010aec033 | turn off classifier free guidance if predicting x_start for the diffusion prior | 2022-05-07 09:38:17 -07:00
Phil Wang | 830afd3c15 | use sinusoidal time embeddings for the diffusion prior as well, for the continuous version | 2022-05-07 08:32:43 -07:00
Phil Wang | 8f93729d19 | when in doubt, make it a hyperparameter | 2022-05-07 07:52:17 -07:00
Phil Wang | 85ed77d512 | fix a potentially huge bug, thanks to @CiaoHe https://github.com/lucidrains/DALLE2-pytorch/issues/71 | 2022-05-07 05:05:54 -07:00
Phil Wang | 3676ef4d49 | make sure the vqgan-vae trainer supports mixed precision | 2022-05-06 10:44:16 -07:00
Phil Wang | 28e944f328 | make sure the openai clip adapter outputs l2normed embeddings | 2022-05-06 10:12:03 -07:00
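A small sketch of what l2-normalizing the adapter outputs amounts to; the `l2norm` helper and the commented usage are illustrative (the `encode_image` / `encode_text` calls refer to the OpenAI CLIP model's methods).

```python
import torch
import torch.nn.functional as F

def l2norm(t: torch.Tensor) -> torch.Tensor:
    # scale embeddings to unit length along the feature dimension
    return F.normalize(t, p = 2, dim = -1)

# illustrative usage with an OpenAI CLIP model:
# image_embed = l2norm(clip_model.encode_image(images))
# text_embed  = l2norm(clip_model.encode_text(text_tokens))
```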
Phil Wang | 14e63a3f67 | also offer l2norm clamping in the diffusion prior during training, if one is using the predict-x0 objective | 2022-05-06 10:05:14 -07:00
Phil Wang | ad20a14a4d | bring in rotary embeddings for the diffusion prior's causal transformer (the most powerful relative positional encoding, used in PaLM) - 0.1.0 because of the breaking change | 2022-05-06 08:45:30 -07:00
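For reference, a minimal sketch of rotary position embeddings in the common "rotate half" form; it illustrates the technique generically (assuming an even head dimension) rather than reproducing the repository's implementation.

```python
import torch

def rotary_freqs(seq_len: int, dim: int, theta: float = 10000.) -> torch.Tensor:
    # per-position rotation angles for each pair of feature dimensions
    inv_freq = 1. / (theta ** (torch.arange(0, dim, 2).float() / dim))
    positions = torch.arange(seq_len).float()
    freqs = torch.einsum('i,j->ij', positions, inv_freq)
    return torch.cat((freqs, freqs), dim = -1)          # (seq_len, dim)

def rotate_half(x: torch.Tensor) -> torch.Tensor:
    x1, x2 = x.chunk(2, dim = -1)
    return torch.cat((-x2, x1), dim = -1)

def apply_rotary(x: torch.Tensor, freqs: torch.Tensor) -> torch.Tensor:
    # rotate queries / keys by position-dependent angles before computing attention scores
    return x * freqs.cos() + rotate_half(x) * freqs.sin()

# q = apply_rotary(q, rotary_freqs(q.shape[-2], q.shape[-1]))
# k = apply_rotary(k, rotary_freqs(k.shape[-2], k.shape[-1]))
```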
Phil Wang | 0be1e0d64c | support CoCa, which seems to be better than CLIP (has an autoregressive text encoder) https://arxiv.org/abs/2205.01917 | 2022-05-06 08:27:12 -07:00
Phil Wang | 98df1ba51e | add a diffusion prior trainer, which automatically takes care of the exponential moving average (training and sampling), as well as mixed precision and gradient clipping | 2022-05-06 08:11:09 -07:00
Phil Wang | 878b555ef7 | fix training with clip | 2022-05-06 07:37:57 -07:00