Phil Wang | 0be1e0d64c | 2022-05-06 08:27:12 -07:00
support CoCa, which seems to be better than CLIP (has an autoregressive text encoder) https://arxiv.org/abs/2205.01917

Phil Wang | 98df1ba51e | 2022-05-06 08:11:09 -07:00
add diffusion prior trainer, which automatically takes care of the exponential moving average (training and sampling), as well as mixed precision and gradient clipping
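
A minimal sketch of how this trainer wrapper might be driven, fed precomputed CLIP embeddings. The constructor keywords shown for the trainer (ema_beta, max_grad_norm) and the forward/update call pattern are assumptions about how the EMA, gradient clipping and mixed precision described in this commit are exposed, not a confirmed signature.

```python
import torch
from dalle2_pytorch import DiffusionPrior, DiffusionPriorNetwork, DiffusionPriorTrainer

# prior network + diffusion prior, set up to train off precomputed CLIP embeddings (no CLIP attached)
prior_network = DiffusionPriorNetwork(dim = 512, depth = 6, dim_head = 64, heads = 8)

diffusion_prior = DiffusionPrior(
    net = prior_network,
    image_embed_dim = 512,
    timesteps = 100,
    condition_on_text_encodings = False   # assumed flag name: skip per-token text conditioning
)

trainer = DiffusionPriorTrainer(
    diffusion_prior,
    lr = 3e-4,
    wd = 1e-2,
    ema_beta = 0.99,      # exponential moving average decay (assumed keyword)
    max_grad_norm = 0.5   # gradient clipping threshold (assumed keyword)
)

# mock precomputed CLIP text / image embeddings
text_embed  = torch.randn(4, 512)
image_embed = torch.randn(4, 512)

loss = trainer(text_embed = text_embed, image_embed = image_embed)
trainer.update()   # optimizer step plus EMA update, handled inside the trainer
# sampling would then go through the EMA copy of the prior maintained by the trainer
```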

Phil Wang | 878b555ef7 | 2022-05-06 07:37:57 -07:00
fix training with clip

Phil Wang | c76a964fd6 | 2022-05-05 08:11:01 -07:00
allow for CLIP to be optional in Decoder, and allow DecoderTrainer to train off pre-encoded image embeddings
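
A sketch of what training a CLIP-less decoder off pre-encoded image embeddings might look like. The Unet/Decoder hyperparameters, the image_embed keyword and the trainer's forward/update call pattern are assumptions, not the confirmed interface introduced by this commit.

```python
import torch
from dalle2_pytorch import Unet, Decoder, DecoderTrainer

# a single-unet decoder with no CLIP attached; it is conditioned directly on image embeddings
unet = Unet(dim = 128, image_embed_dim = 512, cond_dim = 128, channels = 3, dim_mults = (1, 2, 4, 8))

decoder = Decoder(
    unet = unet,
    clip = None,            # CLIP is now optional
    image_embed_dim = 512,  # must be given explicitly when no CLIP is passed (assumed keyword)
    image_sizes = (128,),
    timesteps = 100
)

trainer = DecoderTrainer(decoder, lr = 3e-4, wd = 1e-2)

images = torch.randn(4, 3, 128, 128)
image_embeds = torch.randn(4, 512)   # pre-encoded CLIP image embeddings

# hypothetical call pattern: pass the pre-encoded embeddings straight through to the decoder
loss = trainer(images, image_embed = image_embeds, unet_number = 1)
trainer.update(unet_number = 1)      # optimizer step + EMA update for the chosen unet
```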

Phil Wang | 8518684ae9 | 2022-05-05 07:37:00 -07:00
does not make much sense, as researchers may want to try predicting noise with DiffusionPrior instead of predicting x0

Phil Wang | 1d5dc08810 | 2022-05-05 07:28:53 -07:00
take @crowsonkb's suggestion at https://github.com/lucidrains/DALLE2-pytorch/issues/60#issue-1226116132

Phil Wang | 896f19786d | 2022-05-05 07:07:21 -07:00
remove convnext blocks, they are ill-suited for generative work, as validated by early experimental results at https://github.com/lucidrains/video-diffusion-pytorch

Phil Wang | aec5575d09 | 2022-05-04 19:26:45 -07:00
take a bet on resize right, given Katherine is using it

Phil Wang | 9773f10d6c | 2022-05-04 15:25:05 -07:00
use inference mode whenever possible, cleanup

Phil Wang | 86e692d24f | 2022-05-04 11:52:24 -07:00
fix random crop probability

Phil Wang | 97b751209f | 2022-05-04 11:48:48 -07:00
allow for the last unet in the cascade to be trained on crops, if it is convolution-only

Phil Wang | 5b619c2fd5 | 2022-05-04 11:18:32 -07:00
make sure some hyperparameters for the unet blocks are configurable

Phil Wang | 9ff228188b | 2022-05-04 10:52:58 -07:00
offer the old resnet blocks from the original DDPM paper, just in case convnexts are unsuitable for generative work

Phil Wang | 70282de23b | 2022-05-02 11:33:15 -07:00
add ability to turn on normformer settings, given @borisdayma reported good results and some personal anecdata

Phil Wang | 11469dc0c6 | 2022-05-02 10:50:55 -07:00
makes more sense to keep this as True by default, for stability

Phil Wang | 0fc6c9cdf3 | 2022-05-02 09:41:03 -07:00
provide option to l2norm the output of the diffusion prior

Phil Wang | ad87bfe28f | 2022-05-01 17:59:03 -07:00
switch to using linear attention for the sparse attention layers within the unet, given success in GAN projects

Phil Wang | b8cf1e5c20 | 2022-05-01 11:00:33 -07:00
more attention

Phil Wang | 5e421bd5bb | 2022-05-01 08:46:21 -07:00
let researchers do the hyperparameter search

Phil Wang | 67fcab1122 | 2022-05-01 08:41:02 -07:00
add MLP-based time conditioning to all convnexts, in addition to cross attention. also add an initial convolution, given convnext's first depthwise conv

Phil Wang | d1a697ac23 | 2022-04-30 16:05:13 -07:00
allow one to shortcut sampling at a specific unet number, if one were to be training in stages
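
A small sketch of what staged sampling might look like, assuming a cascading decoder with two or more unets built along the lines of the single-unet sketch further up; the stop_at_unet_number keyword is an assumption about how this shortcut is exposed.

```python
import torch

image_embeds = torch.randn(4, 512)

# hypothetical: cut the cascade short after the base unet, useful while the
# super-resolution unets later in the cascade are still being trained
lowres_images = decoder.sample(image_embed = image_embeds, stop_at_unet_number = 1)
```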

Phil Wang | a9421f49ec | 2022-04-30 11:45:18 -07:00
simplify Decoder training for the public

Phil Wang | 77fa34eae9 | 2022-04-30 10:08:24 -07:00
fix all clipping / clamping issues

Phil Wang | 1c1e508369 | 2022-04-30 09:13:34 -07:00
fix all issues with text encodings conditioning in the decoder, using the null padding tokens technique from DALL-E v1

Phil Wang | f19c99ecb0 | 2022-04-30 08:48:05 -07:00
fix the decoder needing separate conditional dropping probabilities for image embeddings and text encodings, thanks to @xiankgx!

Phil Wang | 20e7eb5a9b | 2022-04-30 07:22:57 -07:00
cleanup

Phil Wang | e2f9615afa | 2022-04-30 06:40:54 -07:00
use @clip-anytorch, thanks to @rom1504

Phil Wang | 0d1c07c803 | 2022-04-30 06:34:57 -07:00
fix a bug with classifier-free guidance, thanks to @xiankgx again!

Phil Wang | 5063d192b6 | 2022-04-29 13:05:01 -07:00
now completely OpenAI CLIP compatible for training
just take care of the logic for AdamW and transformers
use namedtuples for clip adapter embedding outputs

Phil Wang | fb662a62f3 | 2022-04-29 07:38:32 -07:00
fix another bug, thanks to @xiankgx

Phil Wang | 587c8c9b44 | 2022-04-28 21:59:13 -07:00
optimize for clarity

Phil Wang | aa900213e7 | 2022-04-28 20:53:15 -07:00
force the first unet in the cascade to be conditioned on image embeds

Phil Wang | 625ce23f6b | 2022-04-28 07:21:18 -07:00
🐛

Phil Wang | dbf4a281f1 | 2022-04-27 20:45:27 -07:00
make sure another CLIP can actually be passed in, as long as it is wrapped in an adapter extended from BaseClipAdapter
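
A rough sketch of what such an adapter might look like. The import path, the embed_text / embed_image methods, the (embedding, encodings) return convention, the dim_latent / image_size properties, and the assumption that the base class stores the wrapped model as self.clip are all inferred from the surrounding commits rather than confirmed.

```python
import torch.nn.functional as F
from dalle2_pytorch.dalle2_pytorch import BaseClipAdapter  # import path is an assumption

class MyClipAdapter(BaseClipAdapter):
    # wraps some externally trained CLIP so the DiffusionPrior / Decoder can consume it;
    # the wrapped-model methods called below are hypothetical placeholders

    @property
    def dim_latent(self):
        return 512          # dimensionality of the shared text/image embedding space

    @property
    def image_size(self):
        return 224          # resolution the wrapped CLIP expects

    def embed_text(self, text):
        text_encodings = self.clip.encode_text_tokens(text)   # hypothetical per-token encodings
        text_embed = text_encodings[:, 0]                      # pooled text embedding
        return F.normalize(text_embed, dim = -1), text_encodings

    def embed_image(self, image):
        image_embed = self.clip.encode_image(image)            # hypothetical pooled image embedding
        return F.normalize(image_embed, dim = -1), None
```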

Phil Wang | 4ab527e779 | 2022-04-27 20:11:43 -07:00
some extra asserts for text encoding of the diffusion prior and decoder

Phil Wang | d0cdeb3247 | 2022-04-27 19:58:06 -07:00
add ability for DALL-E2 to return PIL images with return_pil_images = True on forward, for those who have no clue about deep learning
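
The return_pil_images = True flag is named in the commit itself; the DALLE2 constructor and the cond_scale keyword in the sketch below follow the repository's usual top-level usage, and the prompt and values are placeholders.

```python
from dalle2_pytorch import DALLE2

# assumes a trained diffusion_prior and decoder, e.g. from the trainer sketches above
dalle2 = DALLE2(prior = diffusion_prior, decoder = decoder)

# with return_pil_images = True the forward pass hands back PIL.Image objects
# instead of a raw image tensor, so they can be saved or displayed directly
images = dalle2(
    ['cute puppy chasing after a squirrel'],
    cond_scale = 2.,            # classifier-free guidance strength
    return_pil_images = True
)

images[0].save('./sample.png')
```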

Phil Wang | 8c610aad9a | 2022-04-27 19:48:16 -07:00
only pass text encodings conditioning in the diffusion prior if specified on initialization

Phil Wang | 6700381a37 | 2022-04-27 19:35:05 -07:00
prepare for the ability to integrate CLIPs other than x-clip

Phil Wang | fa3bb6ba5c | 2022-04-27 08:02:10 -07:00
make sure cpu-only still works

Phil Wang | 2705e7c9b0 | 2022-04-27 07:51:04 -07:00
attention-based upsampling claims are unsupported by local experiments, removing

Phil Wang | de0296106b | 2022-04-26 11:42:46 -07:00
be able to turn off the warning for use of LazyLinear by passing in the text embedding dimension for the unet

Phil Wang | eafb136214 | 2022-04-26 11:40:45 -07:00
suppress a warning

Phil Wang | bfbcc283a3 | 2022-04-26 11:39:12 -07:00
DRY a tiny bit for gaussian diffusion related logic

Phil Wang | c30544b73a | 2022-04-26 10:23:41 -07:00
no CLIP needed at all for training DiffusionPrior

Phil Wang | 9878be760b | 2022-04-26 09:47:09 -07:00
have the researcher explicitly state upfront whether to condition on text encodings in the cascading ddpm decoder, and have the DALLE-2 class take care of passing in text if the feature is turned on
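
A short sketch of what the explicit opt-in might look like on the unet side; cond_on_text_encodings is the keyword used in the repository's later documentation, but treating it as the flag this commit introduces is an assumption.

```python
from dalle2_pytorch import Unet

# hypothetical: each unet of the cascading decoder states upfront whether it
# conditions on the full text encodings (in addition to the image embedding)
unet1 = Unet(
    dim = 128,
    image_embed_dim = 512,
    cond_dim = 128,
    dim_mults = (1, 2, 4),
    cond_on_text_encodings = True   # assumed flag name for the explicit opt-in
)
```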

Phil Wang | 7ba6357c05 | 2022-04-26 09:29:51 -07:00
allow for training the Prior network with precomputed CLIP embeddings (or text encodings)
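
A minimal sketch of this training path, calling the prior directly (reusing the diffusion_prior from the trainer sketch near the top of this log); the text_embed / image_embed keywords are assumptions about how the precomputed inputs are passed.

```python
import torch

# mock precomputed CLIP embeddings (batch of 4, latent dimension 512)
text_embed  = torch.randn(4, 512)
image_embed = torch.randn(4, 512)

# train the prior directly off the embeddings, no CLIP forward pass required
loss = diffusion_prior(text_embed = text_embed, image_embed = image_embed)
loss.backward()
```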

Phil Wang | 76e063e8b7 | 2022-04-26 09:00:11 -07:00
refactor so that the causal transformer in the diffusion prior network can be conditioned without text encodings (for LAION's parallel efforts, although it seems from the paper that it is needed)

Phil Wang | 4d25976f33 | 2022-04-26 08:36:00 -07:00
make sure non-latent diffusion still works

Phil Wang | 0b28ee0d01 | 2022-04-26 07:39:04 -07:00
revert to the old upsampling, as the paper's method does not work

Phil Wang | f75d49c781 | 2022-04-25 18:59:10 -07:00
start a file for all attention-related modules, use attention-based upsampling in the unets in dalle-2