Phil Wang
35f89556ba
bring in the cross embed layer from the CrossFormer paper for the initial convolution in the unet
2022-05-10 11:50:38 -07:00
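For reference, a minimal sketch of the cross embedding idea from CrossFormer (parallel convolutions with different kernel sizes, concatenated along the channel dimension); class and argument names here are illustrative, not the repository's exact implementation:

    import torch
    from torch import nn

    class CrossEmbedLayer(nn.Module):
        # illustrative: several convolutions at different kernel sizes run in parallel,
        # and their outputs are concatenated along channels
        def __init__(self, dim_in, dim_out, kernel_sizes=(3, 7, 15), stride=1):
            super().__init__()
            kernel_sizes = sorted(kernel_sizes)
            num_scales = len(kernel_sizes)
            # split output channels across scales, remainder going to the largest kernel
            dim_scales = [dim_out // (2 ** i) for i in range(1, num_scales)]
            dim_scales.append(dim_out - sum(dim_scales))
            self.convs = nn.ModuleList([
                nn.Conv2d(dim_in, dim_scale, kernel, stride=stride, padding=(kernel - stride) // 2)
                for kernel, dim_scale in zip(kernel_sizes, dim_scales)
            ])

        def forward(self, x):
            return torch.cat([conv(x) for conv in self.convs], dim=1)

    x = torch.randn(1, 3, 64, 64)
    out = CrossEmbedLayer(3, 128)(x)   # -> (1, 128, 64, 64)
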
Phil Wang
2b55f753b9
fix new issue with github actions and auto pypi package uploading
2022-05-10 10:51:15 -07:00
Phil Wang
fc8fce38fb
make sure cascading DDPM can be trained unconditionally, to get it ready for one-command CLI training for the public
2022-05-10 10:48:10 -07:00
Phil Wang
b1e7b5f6bb
make sure resnet groups in the unet are finely customizable
2022-05-10 10:12:50 -07:00
Phil Wang
9b322ea634
patch
2022-05-09 19:46:19 -07:00
Phil Wang
ba64ea45cc
0.2.3
2022-05-09 16:50:31 -07:00
Phil Wang
db805e73e1
fix a bug with numerical stability in attention, sorry! 🐛
2022-05-09 16:23:37 -07:00
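For context, a standard way to keep attention numerically stable is to subtract each row's max from the logits before the softmax, which leaves the softmax output unchanged but avoids overflow; a minimal sketch, not necessarily the exact fix in this commit:

    import torch

    def stable_attention_weights(sim):
        # sim: raw attention logits of shape (batch, heads, query_len, key_len)
        # subtracting the per-row max does not change the softmax result,
        # but prevents overflow when logits are large (especially in fp16)
        sim = sim - sim.amax(dim=-1, keepdim=True).detach()
        return sim.softmax(dim=-1)

    attn = stable_attention_weights(torch.randn(1, 8, 16, 16) * 100)
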
Phil Wang
e46eaec817
deal the diffusion prior problem yet another blow
2022-05-09 11:08:52 -07:00
Phil Wang
53c189e46a
give more surface area for attention in diffusion prior
2022-05-09 08:08:11 -07:00
Phil Wang
dde51fd362
revert restriction for classifier free guidance for diffusion prior, given @crowsonkb's advice
2022-05-07 20:55:41 -07:00
Phil Wang
4010aec033
turn off classifier free guidance if predicting x_start for diffusion prior
2022-05-07 09:38:17 -07:00
Phil Wang
830afd3c15
use sinusoidal time embeddings for the diffusion prior as well, for the continuous version
2022-05-07 08:32:43 -07:00
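A minimal sketch of sinusoidal timestep embeddings, assuming PyTorch; names are illustrative:

    import math
    import torch
    from torch import nn

    class SinusoidalPosEmb(nn.Module):
        # illustrative: map a (possibly continuous) timestep to a fixed-size embedding
        def __init__(self, dim):
            super().__init__()
            self.dim = dim

        def forward(self, t):
            half_dim = self.dim // 2
            freqs = torch.exp(torch.arange(half_dim, device=t.device) * -(math.log(10000) / (half_dim - 1)))
            args = t[:, None].float() * freqs[None, :]
            return torch.cat([args.sin(), args.cos()], dim=-1)

    emb = SinusoidalPosEmb(16)(torch.tensor([0.0, 0.5, 1.0]))  # -> (3, 16)
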
Phil Wang
8f93729d19
when in doubt, make it a hyperparameter
2022-05-07 07:52:17 -07:00
Phil Wang
85ed77d512
fix a potentially huge bug thanks to @CiaoHe https://github.com/lucidrains/DALLE2-pytorch/issues/71
2022-05-07 05:05:54 -07:00
Phil Wang
3676ef4d49
make sure vqgan-vae trainer supports mixed precision
2022-05-06 10:44:16 -07:00
Phil Wang
28e944f328
make sure openai clip adapter outputs l2normed embeddings
2022-05-06 10:12:03 -07:00
Phil Wang
14e63a3f67
also offer l2norm clamping in the diffusion prior during training, if one is using the predict-x0 objective
2022-05-06 10:05:14 -07:00
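A minimal sketch of the l2norm clamping idea, assuming the predicted x0 should lie on the unit sphere like CLIP image embeddings; not the repository's exact code:

    import torch
    import torch.nn.functional as F

    def l2norm(t):
        # scale each embedding to unit length, so a predicted x0 is clamped
        # onto the same sphere that l2-normed CLIP embeddings live on
        return F.normalize(t, dim=-1)

    pred_x0 = torch.randn(4, 512)
    clamped = l2norm(pred_x0)   # rows now have unit L2 norm
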
Phil Wang
ad20a14a4d
bring in rotary embeddings for diffusion prior causal transformer (the most powerful relative positional encoding, used in PaLM) - 0.1.0 because of breaking change
2022-05-06 08:45:30 -07:00
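A minimal sketch of rotary embeddings applied to queries and keys, assuming PyTorch; function names are illustrative:

    import torch

    def rotary_freqs(seq_len, dim_head, device=None, base=10000):
        # per-position rotation angles
        inv_freq = 1.0 / (base ** (torch.arange(0, dim_head, 2, device=device).float() / dim_head))
        t = torch.arange(seq_len, device=device).float()
        freqs = torch.einsum('i,j->ij', t, inv_freq)
        return torch.cat([freqs, freqs], dim=-1)   # (seq_len, dim_head)

    def rotate_half(x):
        x1, x2 = x.chunk(2, dim=-1)
        return torch.cat([-x2, x1], dim=-1)

    def apply_rotary(x, freqs):
        # x: (batch, heads, seq_len, dim_head); rotating q and k this way makes
        # attention scores depend only on relative position
        return x * freqs.cos() + rotate_half(x) * freqs.sin()

    q = torch.randn(1, 8, 32, 64)
    q_rotated = apply_rotary(q, rotary_freqs(32, 64))
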
Phil Wang
0be1e0d64c
support CoCa, which seems to be better than CLIP (has an autoregressive text encoder) https://arxiv.org/abs/2205.01917
2022-05-06 08:27:12 -07:00
Phil Wang
98df1ba51e
add diffusion prior trainer, which automatically takes care of the exponential moving average (for training and sampling), as well as mixed precision and gradient clipping
2022-05-06 08:11:09 -07:00
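A minimal sketch of the EMA bookkeeping such a trainer typically wraps; class and method names are illustrative, not the trainer's actual API:

    import copy
    import torch

    class EMA:
        # keep an exponential moving average copy of the model's weights;
        # the online model trains, the EMA copy is used for sampling
        def __init__(self, model, beta=0.99):
            self.beta = beta
            self.ema_model = copy.deepcopy(model)
            self.ema_model.requires_grad_(False)

        @torch.no_grad()
        def update(self, model):
            for ema_p, p in zip(self.ema_model.parameters(), model.parameters()):
                ema_p.lerp_(p, 1.0 - self.beta)   # ema = beta * ema + (1 - beta) * online
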
Phil Wang
878b555ef7
fix training with clip
2022-05-06 07:37:57 -07:00
Phil Wang
c76a964fd6
allow for CLIP to be optional in Decoder, and allow DecoderTrainer to work off training pre-encoded image embeddings
2022-05-05 08:11:01 -07:00
Phil Wang
8518684ae9
does not make much sense, as researchers may want to try predicting noise with DiffusionPrior instead of predicting x0
2022-05-05 07:37:00 -07:00
Phil Wang
1d5dc08810
take @crowsonkb's suggestion at https://github.com/lucidrains/DALLE2-pytorch/issues/60#issue-1226116132
2022-05-05 07:28:53 -07:00
Phil Wang
d8d8b6caf1
dataloaders for decoder training, from @Veldrovive
2022-05-05 07:09:45 -07:00
Aidan Dempster
15acc03bd4
Add a dataloader for training the decoder (#57)
* Added dataloader and updated requirements
* Added option to set embedding shard width separately from webdataset shard length.
There must be a better way to do this.
* Changed embedding loader to read using fsspec
* Moved the loader into a more compatible location
* Removed unnecessary package
* Fixed typo (Embeding -> Embedding)
* Simplified example embedding finder code to remove unnecessary get_file_list function
* Added example usage of ImageEmbeddingDataset
* Changed the name of create_dataloader to be more verbose
* Added a dataloaders __init__.py
2022-05-05 07:08:45 -07:00
Phil Wang
896f19786d
remove convnext blocks; they are ill-suited for generative work, validated by early experimental results at https://github.com/lucidrains/video-diffusion-pytorch
2022-05-05 07:07:21 -07:00
Phil Wang
aec5575d09
take a bet on resize right, given Katherine is using it
2022-05-04 19:26:45 -07:00
Phil Wang
9773f10d6c
use inference mode whenever possible, cleanup
2022-05-04 15:25:05 -07:00
Phil Wang
86e692d24f
fix random crop probability
2022-05-04 11:52:24 -07:00
Phil Wang
97b751209f
allow for last unet in the cascade to be trained on crops, if it is convolution-only
2022-05-04 11:48:48 -07:00
Phil Wang
5b619c2fd5
make sure some hyperparameters for the unet blocks are configurable
2022-05-04 11:18:32 -07:00
Phil Wang
9359ad2e91
0.0.95
2022-05-04 10:53:05 -07:00
Phil Wang
58d9b422f3
0.0.94
2022-05-04 07:42:33 -07:00
Phil Wang
70282de23b
add ability to turn on normformer settings, given @borisdayma reported good results and some personal anecdata
2022-05-02 11:33:15 -07:00
Phil Wang
11469dc0c6
makes more sense to keep this True by default, for stability
2022-05-02 10:50:55 -07:00
Phil Wang
0fc6c9cdf3
provide option to l2norm the output of the diffusion prior
2022-05-02 09:41:03 -07:00
Phil Wang
1924c7cc3d
fix issue with mixed precision and gradient clipping
2022-05-02 09:20:19 -07:00
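For context, the usual PyTorch pattern is to unscale gradients before clipping whenever a GradScaler is in play; a minimal sketch (illustrative, not necessarily the exact fix applied here; assumes a CUDA device):

    import torch
    from torch import nn

    model = nn.Linear(10, 10).cuda()
    optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)
    scaler = torch.cuda.amp.GradScaler()

    def train_step(batch):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = model(batch).pow(2).mean()
        scaler.scale(loss).backward()
        scaler.unscale_(optimizer)                                   # bring grads back to true scale
        nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)   # clip only after unscaling
        scaler.step(optimizer)
        scaler.update()

    train_step(torch.randn(8, 10).cuda())
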
Phil Wang
fc954ee788
fix calculation of adaptive weight for vit-vqgan, thanks to @CiaoHe
2022-05-02 07:58:14 -07:00
Phil Wang
ad87bfe28f
switch to using linear attention for the sparse attention layers within unet, given success in GAN projects
2022-05-01 17:59:03 -07:00
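A minimal sketch of the linear attention formulation (softmax over the feature dimension for queries and over the sequence dimension for keys, so cost grows linearly in sequence length); illustrative only:

    import torch
    from torch import einsum

    def linear_attention(q, k, v):
        # q, k, v: (batch, heads, seq_len, dim_head)
        q = q.softmax(dim=-1)
        k = k.softmax(dim=-2)
        # aggregate a global context from keys and values, then query it
        context = einsum('b h n d, b h n e -> b h d e', k, v)
        return einsum('b h n d, b h d e -> b h n e', q, context)

    q = k = v = torch.randn(1, 4, 256, 32)
    out = linear_attention(q, k, v)   # (1, 4, 256, 32)
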
Phil Wang
76c767b1ce
update deps, commit to using webdatasets, per @rom1504 consultation
2022-05-01 12:22:15 -07:00
Kumar R
53ce6dfdf6
All changes implemented, current run happening. Link to wandb run in comments. (#43)
* Train DiffusionPrior with pre-computed embeddings
This is in response to https://github.com/lucidrains/DALLE2-pytorch/issues/29 - more metrics will get added.
2022-05-01 11:46:59 -07:00
Phil Wang
b8cf1e5c20
more attention
2022-05-01 11:00:33 -07:00
Phil Wang
1bb9fc9829
add convnext backbone for vqgan-vae, still need to fix groupnorms in resnet encdec
2022-05-01 09:32:24 -07:00
Phil Wang
5e421bd5bb
let researchers do the hyperparameter search
2022-05-01 08:46:21 -07:00
Phil Wang
67fcab1122
add MLP-based time conditioning to all convnexts, in addition to cross attention. also add an initial convolution, given convnext starts with a depthwise conv
2022-05-01 08:41:02 -07:00
Phil Wang
d1a697ac23
allow one to shortcut sampling at a specific unet number, if one is training in stages
2022-04-30 16:05:13 -07:00
Phil Wang
ebe01749ed
DecoderTrainer sample method uses the exponentially moving averaged unets
2022-04-30 14:55:34 -07:00
Phil Wang
63195cc2cb
allow for division of loss prior to scaling, for gradient accumulation purposes
2022-04-30 12:56:47 -07:00
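A minimal sketch of why the loss is divided before backward during gradient accumulation (so the summed gradients average over the accumulation window); names are illustrative:

    import torch
    from torch import nn

    model = nn.Linear(10, 1)
    optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)
    accum_steps = 4
    batches = [torch.randn(8, 10) for _ in range(8)]

    optimizer.zero_grad()
    for i, batch in enumerate(batches):
        loss = model(batch).pow(2).mean()
        (loss / accum_steps).backward()     # divide before backward so accumulated grads average out
        if (i + 1) % accum_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
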
Phil Wang
a2ef69af66
take care of mixed precision, and make gradient accumulation doable externally
2022-04-30 12:27:24 -07:00