Author | Commit | Message | Date
Phil Wang | 58d9b422f3 | 0.0.94 | 2022-05-04 07:42:33 -07:00
Phil Wang | 70282de23b | add ability to turn on normformer settings, given @borisdayma reported good results and some personal anecdata | 2022-05-02 11:33:15 -07:00
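For context, a minimal sketch of the kind of block NormFormer-style settings enable (arXiv:2110.09456): extra normalization inside a pre-LN transformer, e.g. a LayerNorm after the feedforward activation. The class name and `post_activation_norm` flag are illustrative, not necessarily the exact DALLE2-pytorch API.

```python
from torch import nn

# Sketch of a NormFormer-style feedforward block: a standard pre-LN
# feedforward, plus an extra LayerNorm after the GELU activation.
class FeedForward(nn.Module):
    def __init__(self, dim, mult=4, post_activation_norm=True):
        super().__init__()
        inner_dim = dim * mult
        self.net = nn.Sequential(
            nn.LayerNorm(dim),                      # standard pre-LN
            nn.Linear(dim, inner_dim),
            nn.GELU(),
            nn.LayerNorm(inner_dim) if post_activation_norm else nn.Identity(),  # the NormFormer addition
            nn.Linear(inner_dim, dim),
        )

    def forward(self, x):
        return self.net(x)
```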
Phil Wang | 11469dc0c6 | makes more sense to keep this as True as default, for stability | 2022-05-02 10:50:55 -07:00
Phil Wang | 0fc6c9cdf3 | provide option to l2norm the output of the diffusion prior | 2022-05-02 09:41:03 -07:00
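A hedged sketch of what that option amounts to (the flag and function names are illustrative): CLIP embeddings are compared by cosine similarity, so l2-normalizing the prior's predicted image embedding keeps it on the same unit hypersphere as real CLIP image embeddings.

```python
import torch.nn.functional as F

# Sketch: optionally project the prior's prediction back onto the
# unit hypersphere that CLIP image embeddings live on.
def finalize_prediction(pred_image_embed, l2norm_output=False):
    if l2norm_output:
        pred_image_embed = F.normalize(pred_image_embed, dim=-1)
    return pred_image_embed
```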
Phil Wang | 1924c7cc3d | fix issue with mixed precision and gradient clipping | 2022-05-02 09:20:19 -07:00
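The usual pitfall here is clipping gradients while they are still loss-scaled. A sketch of the standard `torch.cuda.amp` pattern, assuming `model`, `optimizer`, and `loader` are already defined:

```python
import torch
from torch.nn.utils import clip_grad_norm_

scaler = torch.cuda.amp.GradScaler()

for batch in loader:
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = model(batch)
    scaler.scale(loss).backward()
    scaler.unscale_(optimizer)                         # bring grads back to true scale
    clip_grad_norm_(model.parameters(), max_norm=1.0)  # clip only after unscaling
    scaler.step(optimizer)
    scaler.update()
```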
Phil Wang | fc954ee788 | fix calculation of adaptive weight for vit-vqgan, thanks to @CiaoHe | 2022-05-02 07:58:14 -07:00
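The adaptive weight follows the original VQGAN recipe (arXiv:2012.09841): balance the adversarial loss against the reconstruction loss by the ratio of their gradient norms at the decoder's last layer. A sketch, with argument names assumed:

```python
import torch

# Sketch of the VQGAN-style adaptive GAN-loss weight.
def calculate_adaptive_weight(rec_loss, gan_loss, last_layer_weight, eps=1e-8):
    # gradient of each loss w.r.t. the decoder's final layer weights
    rec_grads = torch.autograd.grad(rec_loss, last_layer_weight, retain_graph=True)[0]
    gan_grads = torch.autograd.grad(gan_loss, last_layer_weight, retain_graph=True)[0]
    weight = rec_grads.norm() / (gan_grads.norm() + eps)
    return weight.clamp(max=1e4).detach()
```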
Phil Wang | ad87bfe28f | switch to using linear attention for the sparse attention layers within unet, given success in GAN projects | 2022-05-01 17:59:03 -07:00
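A self-contained sketch of non-causal linear attention over feature maps, in the spirit of Katharopoulos et al. (arXiv:2006.16236); this is an illustrative module, not necessarily the repo's exact implementation:

```python
import torch
from torch import nn
from einops import rearrange

class LinearAttention(nn.Module):
    def __init__(self, dim, heads=8, dim_head=32):
        super().__init__()
        inner_dim = heads * dim_head
        self.heads = heads
        self.to_qkv = nn.Conv2d(dim, inner_dim * 3, 1, bias=False)
        self.to_out = nn.Conv2d(inner_dim, dim, 1)

    def forward(self, fmap):
        b, c, h, w = fmap.shape
        q, k, v = self.to_qkv(fmap).chunk(3, dim=1)
        q, k, v = (rearrange(t, 'b (heads d) x y -> b heads d (x y)', heads=self.heads) for t in (q, k, v))

        q = q.softmax(dim=-2)   # normalize queries over the feature dimension
        k = k.softmax(dim=-1)   # normalize keys over the sequence (pixel) dimension

        # aggregate a global context first, then distribute it per query:
        # cost is linear in the number of pixels instead of quadratic
        context = torch.einsum('b h d n, b h e n -> b h d e', k, v)
        out = torch.einsum('b h d e, b h d n -> b h e n', context, q)
        out = rearrange(out, 'b heads d (x y) -> b (heads d) x y', x=h, y=w)
        return self.to_out(out)
```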
Phil Wang | 76c767b1ce | update deps, commit to using webdatasets, per @rom1504 consultation | 2022-05-01 12:22:15 -07:00
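For reference, a sketch of what a WebDataset image-text pipeline looks like; the shard path pattern and the `jpg`/`txt` keys are illustrative:

```python
import webdataset as wds
from torchvision import transforms

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(256),
    transforms.ToTensor(),
])

# Stream (image, caption) pairs out of sharded tar files
# without unpacking them to disk first.
dataset = (
    wds.WebDataset("data/shard-{000000..000999}.tar")
    .decode("pil")
    .to_tuple("jpg", "txt")
    .map_tuple(preprocess, lambda caption: caption)
)
```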
Kumar R | 53ce6dfdf6 | Train DiffusionPrior with pre-computed embeddings (#43): all changes implemented, current run happening; link to wandb run in comments. In response to https://github.com/lucidrains/DALLE2-pytorch/issues/29; more metrics will get added. | 2022-05-01 11:46:59 -07:00
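A hypothetical sketch of the setup this enables: training the prior from precomputed CLIP embeddings avoids running CLIP's image and text towers every step. The file layout and dict keys below are assumptions, not the PR's actual format.

```python
import torch
from torch.utils.data import Dataset

class PrecomputedEmbeddings(Dataset):
    """Aligned pairs of precomputed CLIP text and image embeddings."""
    def __init__(self, path):
        data = torch.load(path)  # assumed: {'text_embeds': (n, d), 'image_embeds': (n, d)}
        self.text_embeds = data['text_embeds']
        self.image_embeds = data['image_embeds']

    def __len__(self):
        return len(self.text_embeds)

    def __getitem__(self, idx):
        return self.text_embeds[idx], self.image_embeds[idx]
```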
Phil Wang | b8cf1e5c20 | more attention | 2022-05-01 11:00:33 -07:00
Phil Wang | 1bb9fc9829 | add convnext backbone for vqgan-vae, still need to fix groupnorms in resnet encdec | 2022-05-01 09:32:24 -07:00
Phil Wang | 5e421bd5bb | let researchers do the hyperparameter search | 2022-05-01 08:46:21 -07:00
Phil Wang | 67fcab1122 | add MLP based time conditioning to all convnexts, in addition to cross attention. also add an initial convolution, given convnext's first conv is depthwise | 2022-05-01 08:41:02 -07:00
Phil Wang | d1a697ac23 | allows one to shortcut sampling at a specific unet number, if one were to be training in stages | 2022-04-30 16:05:13 -07:00
Phil Wang | ebe01749ed | DecoderTrainer sample method uses the exponentially moving averaged unets | 2022-04-30 14:55:34 -07:00
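A sketch of the exponential moving average machinery implied here, in plain PyTorch; sampling from EMA weights is the standard trick for stabler diffusion samples. Function and variable names are illustrative.

```python
import copy
import torch

@torch.no_grad()
def ema_update(ema_model, online_model, decay=0.995):
    for ema_p, p in zip(ema_model.parameters(), online_model.parameters()):
        ema_p.lerp_(p, 1.0 - decay)   # ema = decay * ema + (1 - decay) * online
    for ema_b, b in zip(ema_model.buffers(), online_model.buffers()):
        ema_b.copy_(b)                # buffers are copied outright

# usage sketch: ema_unet = copy.deepcopy(unet); call ema_update(ema_unet, unet)
# after each optimizer step; train on unet, sample from ema_unet.
```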
Phil Wang | 63195cc2cb | allow for division of loss prior to scaling, for gradient accumulation purposes | 2022-04-30 12:56:47 -07:00
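Dividing each micro-batch loss before `backward` is what makes the accumulated gradient match one large batch. A sketch, assuming `model`, `optimizer`, and `loader` exist:

```python
accum_steps = 4

optimizer.zero_grad()
for step, batch in enumerate(loader):
    loss = model(batch) / accum_steps   # divide before backward, not after
    loss.backward()                     # gradients sum across micro-batches
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```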
Phil Wang | a2ef69af66 | take care of mixed precision, and make gradient accumulation doable externally | 2022-04-30 12:27:24 -07:00
Phil Wang | 5fff22834e | be able to finely customize learning parameters for each unet, take care of gradient clipping | 2022-04-30 11:56:05 -07:00
Phil Wang | a9421f49ec | simplify Decoder training for the public | 2022-04-30 11:45:18 -07:00
Phil Wang | 77fa34eae9 | fix all clipping / clamping issues | 2022-04-30 10:08:24 -07:00
Phil Wang | 1c1e508369 | fix all issues with text encodings conditioning in the decoder, using null padding tokens technique from dalle v1 | 2022-04-30 09:13:34 -07:00
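A sketch of the null-padding-token idea, with shapes and names illustrative: positions masked out as padding get learned null encodings instead of whatever the text encoder emitted there, so the decoder never attends to garbage values.

```python
import torch
from torch import nn

class NullPadTextEncodings(nn.Module):
    def __init__(self, max_text_len, dim):
        super().__init__()
        # learned embeddings substituted wherever the text mask is False
        self.null_text_encodings = nn.Parameter(torch.randn(1, max_text_len, dim))

    def forward(self, text_encodings, text_mask):
        # text_encodings: (batch, seq, dim); text_mask: (batch, seq) bool
        null = self.null_text_encodings[:, :text_encodings.shape[1]]
        return torch.where(text_mask.unsqueeze(-1), text_encodings, null)
```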
Phil Wang | f19c99ecb0 | fix decoder needing separate conditional dropping probabilities for image embeddings and text encodings, thanks to @xiankgx! | 2022-04-30 08:48:05 -07:00
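Classifier-free guidance training drops each conditioning signal independently, hence two probabilities. A sketch of the masking; the function names, default probabilities, and `null_image_embed` are assumptions:

```python
import torch

def prob_mask_like(shape, prob, device):
    # True with probability `prob`, independently per batch element
    return torch.rand(shape, device=device) < prob

def drop_conditioning(image_embed, null_image_embed,
                      image_cond_drop_prob=0.1, text_cond_drop_prob=0.5):
    batch, device = image_embed.shape[0], image_embed.device
    keep_image = ~prob_mask_like((batch,), image_cond_drop_prob, device)
    keep_text  = ~prob_mask_like((batch,), text_cond_drop_prob, device)
    image_embed = torch.where(keep_image[:, None], image_embed, null_image_embed)
    return image_embed, keep_text   # text mask applied analogously to the encodings
```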
Phil Wang | e2f9615afa | use @clip-anytorch, thanks to @rom1504 | 2022-04-30 06:40:54 -07:00
Phil Wang | 0d1c07c803 | fix a bug with classifier free guidance, thanks to @xiankgx again! | 2022-04-30 06:34:57 -07:00
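For reference, classifier-free guidance at sampling time (Ho & Salimans, 2022) extrapolates between conditional and unconditional predictions; a sketch with an assumed unet signature:

```python
def guided_noise_pred(unet, x, t, image_embed, cond_scale=3.0):
    cond = unet(x, t, image_embed=image_embed)
    null = unet(x, t, image_embed=None)        # conditioning fully dropped
    return null + (cond - null) * cond_scale   # cond_scale = 1 recovers the plain conditional
```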
Phil Wang | 5063d192b6 | now completely OpenAI CLIP compatible for training; just take care of the logic for AdamW and transformers; used namedtuples for clip adapter embedding outputs | 2022-04-29 13:05:01 -07:00
Phil Wang | fb662a62f3 | fix another bug, thanks to @xiankgx | 2022-04-29 07:38:32 -07:00
Phil Wang | aa900213e7 | force first unet in the cascade to be conditioned on image embeds | 2022-04-28 20:53:15 -07:00
Phil Wang | cb26187450 | vqgan-vae codebook dims should be 256 or smaller | 2022-04-28 08:59:03 -07:00
Phil Wang | 625ce23f6b | 🐛 | 2022-04-28 07:21:18 -07:00
Phil Wang | dbf4a281f1 | make sure another CLIP can actually be passed in, as long as it is wrapped in an adapter extended from BaseClipAdapter | 2022-04-27 20:45:27 -07:00
Phil Wang | 4ab527e779 | some extra asserts for text encoding of diffusion prior and decoder | 2022-04-27 20:11:43 -07:00
Phil Wang | d0cdeb3247 | add ability for DALL-E2 to return PIL images with return_pil_images = True on forward, for those who have no clue about deep learning | 2022-04-27 19:58:06 -07:00
Phil Wang | 8c610aad9a | only pass text encodings conditioning in diffusion prior if specified on initialization | 2022-04-27 19:48:16 -07:00
Phil Wang | 6700381a37 | prepare for ability to integrate CLIPs other than x-clip | 2022-04-27 19:35:05 -07:00
Phil Wang | 6edb1c5dd0 | fix issue with ema class | 2022-04-27 16:40:02 -07:00
Phil Wang | fa3bb6ba5c | make sure cpu-only still works | 2022-04-27 08:02:10 -07:00
Phil Wang | 77141882c8 | complete vit-vqgan from https://arxiv.org/abs/2110.04627 | 2022-04-26 17:20:47 -07:00
Phil Wang | de0296106b | be able to turn off warning for use of LazyLinear by passing in text embedding dimension for unet | 2022-04-26 11:42:46 -07:00
Phil Wang | eafb136214 | suppress a warning | 2022-04-26 11:40:45 -07:00
Phil Wang | c30544b73a | allow training DiffusionPrior with no CLIP altogether | 2022-04-26 10:23:41 -07:00
Phil Wang | 9878be760b | have researcher explicitly state upfront whether to condition with text encodings in cascading ddpm decoder, have DALLE-2 class take care of passing in text if feature turned on | 2022-04-26 09:47:09 -07:00
Phil Wang | 7ba6357c05 | allow for training the Prior network with precomputed CLIP embeddings (or text encodings) | 2022-04-26 09:29:51 -07:00
Phil Wang | 76e063e8b7 | refactor so that the causal transformer in the diffusion prior network can be conditioned without text encodings (for LAION's parallel efforts, although it seems from the paper it is needed) | 2022-04-26 09:00:11 -07:00
Phil Wang | 4d25976f33 | make sure non-latent diffusion still works | 2022-04-26 08:36:00 -07:00
Phil Wang | f75d49c781 | start a file for all attention-related modules, use attention-based upsampling in the unets in dalle-2 | 2022-04-25 18:59:10 -07:00
Phil Wang | 3b520dfa85 | bring in attention-based upsampling to strengthen vqgan-vae, seems to work as advertised in initial GAN experiments | 2022-04-25 17:27:45 -07:00
Phil Wang | 8f2a0c7e00 | better naming | 2022-04-25 07:44:33 -07:00
Phil Wang | 863f4ef243 | just take care of the logic for setting all latent diffusion to predict x0, if needed | 2022-04-24 10:06:42 -07:00
Phil Wang | fb8a66a2de | just in case latent diffusion performs better with prediction of x0 instead of epsilon, open up the research avenue | 2022-04-24 10:04:22 -07:00
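The two parameterizations are interchangeable through the DDPM forward-process identity x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε, so a network may predict either quantity and recover the other in closed form. A sketch of the conversions:

```python
import torch

def predict_x0_from_eps(x_t, eps, alphas_cumprod, t):
    abar = alphas_cumprod[t].view(-1, 1, 1, 1)           # ᾱ_t, broadcast to (batch, 1, 1, 1)
    return (x_t - (1 - abar).sqrt() * eps) / abar.sqrt()

def predict_eps_from_x0(x_t, x0, alphas_cumprod, t):
    abar = alphas_cumprod[t].view(-1, 1, 1, 1)
    return (x_t - abar.sqrt() * x0) / (1 - abar).sqrt()
```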
Phil Wang | 579d4b42dd | does not seem right to clip for the prior diffusion part | 2022-04-24 09:51:18 -07:00