Phil Wang
1d5dc08810
take @crowsonkb's suggestion at https://github.com/lucidrains/DALLE2-pytorch/issues/60#issue-1226116132
2022-05-05 07:28:53 -07:00
Phil Wang
d8d8b6caf1
dataloaders for decoder training, from @Veldrovive
2022-05-05 07:09:45 -07:00
Aidan Dempster
15acc03bd4
Add a dataloader for training the decoder (#57)
* Added dataloader and updated requirements
* Added option to set embedding shard width separately from webdataset shard length.
There must be a better way to do this.
* Changed embedding loader to read using fsspec
* Moved the loader into a more compatible location
* Removed unnecessary package
* Fixed typo (Embeding -> Embedding)
* Simplified example embedding finder code to remove unnecessary get_file_list function
* Added example usage of ImageEmbeddingDataset
* Changed the name of create_dataloader to be more verbose
Added a dataloaders __init__.py
2022-05-05 07:08:45 -07:00
Phil Wang
896f19786d
remove convnext blocks, they are ill-suited for generative work, validated by early experimental results at https://github.com/lucidrains/video-diffusion-pytorch
2022-05-05 07:07:21 -07:00
Phil Wang
aec5575d09
take a bet on resize right, given Katherine is using it
2022-05-04 19:26:45 -07:00
Phil Wang
9773f10d6c
use inference mode whenever possible, cleanup
2022-05-04 15:25:05 -07:00
Phil Wang
86e692d24f
fix random crop probability
2022-05-04 11:52:24 -07:00
Phil Wang
97b751209f
allow for last unet in the cascade to be trained on crops, if it is convolution-only
2022-05-04 11:48:48 -07:00
Phil Wang
5b619c2fd5
make sure some hyperparameters for the unet block are configurable
2022-05-04 11:18:32 -07:00
Phil Wang
9359ad2e91
0.0.95
2022-05-04 10:53:05 -07:00
Phil Wang
58d9b422f3
0.0.94
2022-05-04 07:42:33 -07:00
Phil Wang
70282de23b
add ability to turn on normformer settings, given @borisdayma reported good results and some personal anecdata
2022-05-02 11:33:15 -07:00
Phil Wang
11469dc0c6
makes more sense to keep this as True as default, for stability
2022-05-02 10:50:55 -07:00
Phil Wang
0fc6c9cdf3
provide option to l2norm the output of the diffusion prior
2022-05-02 09:41:03 -07:00
Phil Wang
1924c7cc3d
fix issue with mixed precision and gradient clipping
2022-05-02 09:20:19 -07:00
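The interaction this commit fixes is a common one: with a mixed-precision loss scaler, gradients must be unscaled before clipping, otherwise the clip threshold is applied to the scaled gradients. A minimal numeric sketch of the ordering (illustrative names, not the repository's trainer code):

```python
# Sketch of why gradient clipping must happen after unscaling when
# training with a mixed-precision loss scaler. All names here are
# illustrative, not taken from the repository.

def clip_by_global_norm(grads, max_norm):
    # Standard global-norm clipping: rescale all grads if their
    # combined l2 norm exceeds max_norm.
    norm = sum(g * g for g in grads) ** 0.5
    if norm > max_norm:
        grads = [g * (max_norm / norm) for g in grads]
    return grads

loss_scale = 1024.0
true_grads = [0.3, -0.4]                     # global norm = 0.5

# Gradients as produced by the scaled backward pass.
scaled_grads = [g * loss_scale for g in true_grads]

# Wrong order: clipping the scaled grads, then unscaling, crushes them.
wrong = [g / loss_scale for g in clip_by_global_norm(scaled_grads, 1.0)]

# Right order: unscale first, then clip against the real threshold.
unscaled = [g / loss_scale for g in scaled_grads]
right = clip_by_global_norm(unscaled, 1.0)   # norm 0.5 <= 1.0, untouched
```

In PyTorch this corresponds to calling the scaler's unscale step before `clip_grad_norm_`, then stepping through the scaler.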
Phil Wang
fc954ee788
fix calculation of adaptive weight for vit-vqgan, thanks to @CiaoHe
2022-05-02 07:58:14 -07:00
Phil Wang
ad87bfe28f
switch to using linear attention for the sparse attention layers within unet, given success in GAN projects
2022-05-01 17:59:03 -07:00
Phil Wang
76c767b1ce
update deps, commit to using webdatasets, per @rom1504's consultation
2022-05-01 12:22:15 -07:00
Kumar R
53ce6dfdf6
All changes implemented, current run happening. Link to wandb run in comments. (#43)
* Train DiffusionPrior with pre-computed embeddings
This is in response to https://github.com/lucidrains/DALLE2-pytorch/issues/29 - more metrics will get added.
2022-05-01 11:46:59 -07:00
Phil Wang
b8cf1e5c20
more attention
2022-05-01 11:00:33 -07:00
Phil Wang
1bb9fc9829
add convnext backbone for vqgan-vae, still need to fix groupnorms in resnet encdec
2022-05-01 09:32:24 -07:00
Phil Wang
5e421bd5bb
let researchers do the hyperparameter search
2022-05-01 08:46:21 -07:00
Phil Wang
67fcab1122
add MLP-based time conditioning to all convnexts, in addition to cross attention. also add an initial convolution, given convnext starts with a depthwise conv
2022-05-01 08:41:02 -07:00
Phil Wang
d1a697ac23
allows one to shortcut sampling at a specific unet number, if training in stages
2022-04-30 16:05:13 -07:00
Phil Wang
ebe01749ed
DecoderTrainer sample method uses the exponentially moving averaged unets
2022-04-30 14:55:34 -07:00
Phil Wang
63195cc2cb
allow for division of loss prior to scaling, for gradient accumulation purposes
2022-04-30 12:56:47 -07:00
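The point of dividing the loss before scaling (a hedged sketch, not the repository's trainer code): with gradient accumulation, dividing each micro-batch loss by the number of accumulation steps before backpropagating makes the accumulated gradients match a single pass over the full batch.

```python
# Sketch of loss division for gradient accumulation. The values and
# names are illustrative, not taken from the repository.

micro_batch_losses = [2.0, 4.0, 6.0, 8.0]   # hypothetical per-micro-batch losses
accum_steps = len(micro_batch_losses)

# Accumulate loss / accum_steps, the same way backward() accumulates
# gradients across micro-batches before a single optimizer step.
accumulated = sum(loss / accum_steps for loss in micro_batch_losses)

# Equivalent mean loss over the full batch in one pass.
full_batch = sum(micro_batch_losses) / len(micro_batch_losses)

assert accumulated == full_batch            # both 5.0
```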
Phil Wang
a2ef69af66
take care of mixed precision, and make gradient accumulation do-able externally
2022-04-30 12:27:24 -07:00
Phil Wang
5fff22834e
be able to finely customize learning parameters for each unet, take care of gradient clipping
2022-04-30 11:56:05 -07:00
Phil Wang
a9421f49ec
simplify Decoder training for the public
2022-04-30 11:45:18 -07:00
Phil Wang
77fa34eae9
fix all clipping / clamping issues
2022-04-30 10:08:24 -07:00
Phil Wang
1c1e508369
fix all issues with text encodings conditioning in the decoder, using null padding tokens technique from dalle v1
2022-04-30 09:13:34 -07:00
Phil Wang
f19c99ecb0
fix decoder needing separate conditional dropping probabilities for image embeddings and text encodings, thanks to @xiankgx !
2022-04-30 08:48:05 -07:00
Phil Wang
e2f9615afa
use @clip-anytorch, thanks to @rom1504
2022-04-30 06:40:54 -07:00
Phil Wang
0d1c07c803
fix a bug with classifier free guidance, thanks to @xiankgx again!
2022-04-30 06:34:57 -07:00
Phil Wang
5063d192b6
now completely OpenAI CLIP compatible for training
just take care of the logic for AdamW and transformers
used namedtuples for clip adapter embedding outputs
2022-04-29 13:05:01 -07:00
Phil Wang
fb662a62f3
fix another bug thanks to @xiankgx
2022-04-29 07:38:32 -07:00
Phil Wang
aa900213e7
force first unet in the cascade to be conditioned on image embeds
2022-04-28 20:53:15 -07:00
Phil Wang
cb26187450
vqgan-vae codebook dims should be 256 or smaller
2022-04-28 08:59:03 -07:00
Phil Wang
625ce23f6b
🐛
2022-04-28 07:21:18 -07:00
Phil Wang
dbf4a281f1
make sure another CLIP can actually be passed in, as long as it is wrapped in an adapter extended from BaseClipAdapter
2022-04-27 20:45:27 -07:00
Phil Wang
4ab527e779
some extra asserts for text encoding of diffusion prior and decoder
2022-04-27 20:11:43 -07:00
Phil Wang
d0cdeb3247
add ability for DALL-E2 to return PIL images with return_pil_images = True on forward, for those who have no clue about deep learning
2022-04-27 19:58:06 -07:00
Phil Wang
8c610aad9a
only pass text encodings conditioning in diffusion prior if specified on initialization
2022-04-27 19:48:16 -07:00
Phil Wang
6700381a37
prepare for ability to integrate clips other than x-clip
2022-04-27 19:35:05 -07:00
Phil Wang
6edb1c5dd0
fix issue with ema class
2022-04-27 16:40:02 -07:00
Phil Wang
fa3bb6ba5c
make sure cpu-only still works
2022-04-27 08:02:10 -07:00
Phil Wang
77141882c8
complete vit-vqgan from https://arxiv.org/abs/2110.04627
2022-04-26 17:20:47 -07:00
Phil Wang
de0296106b
be able to turn off warning for use of LazyLinear by passing in text embedding dimension for unet
2022-04-26 11:42:46 -07:00
Phil Wang
eafb136214
suppress a warning
2022-04-26 11:40:45 -07:00
Phil Wang
c30544b73a
no CLIP altogether for training DiffusionPrior
2022-04-26 10:23:41 -07:00