Phil Wang | 5b619c2fd5 | make sure some hyperparameters for the unet block are configurable | 2022-05-04 11:18:32 -07:00
Phil Wang | 9ff228188b | offer old resnet blocks, from the original DDPM paper, just in case convnexts are unsuitable for generative work | 2022-05-04 10:52:58 -07:00
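For reference, a DDPM-style resnet block of the kind mentioned above looks roughly like the following. This is a minimal sketch with assumed names and layer choices, not the repository's actual implementation:

```python
import torch
from torch import nn

class ResnetBlock(nn.Module):
    # hypothetical DDPM-style resnet block: groupnorm -> SiLU -> conv, twice,
    # with an optional additive time-embedding conditioning and a residual path
    def __init__(self, dim_in, dim_out, *, time_emb_dim=None, groups=8):
        super().__init__()
        self.time_mlp = nn.Sequential(nn.SiLU(), nn.Linear(time_emb_dim, dim_out)) if time_emb_dim else None
        self.block1 = nn.Sequential(nn.GroupNorm(groups, dim_in), nn.SiLU(), nn.Conv2d(dim_in, dim_out, 3, padding=1))
        self.block2 = nn.Sequential(nn.GroupNorm(groups, dim_out), nn.SiLU(), nn.Conv2d(dim_out, dim_out, 3, padding=1))
        self.res_conv = nn.Conv2d(dim_in, dim_out, 1) if dim_in != dim_out else nn.Identity()

    def forward(self, x, time_emb=None):
        h = self.block1(x)
        if self.time_mlp is not None and time_emb is not None:
            h = h + self.time_mlp(time_emb)[:, :, None, None]  # broadcast over height and width
        h = self.block2(h)
        return h + self.res_conv(x)  # residual connection

# example usage
block = ResnetBlock(64, 128, time_emb_dim=256)
out = block(torch.randn(1, 64, 32, 32), torch.randn(1, 256))
```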
Phil Wang | 70282de23b | add ability to turn on normformer settings, given @borisdayma reported good results and some personal anecdata | 2022-05-02 11:33:15 -07:00
Phil Wang | 11469dc0c6 | makes more sense to keep this as True by default, for stability | 2022-05-02 10:50:55 -07:00
Phil Wang | 0fc6c9cdf3 | provide option to l2norm the output of the diffusion prior | 2022-05-02 09:41:03 -07:00
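The l2norm option amounts to scaling the prior's predicted image embedding to unit length; a one-line sketch of the idea (the option's actual name in the repository is not shown here):

```python
import torch.nn.functional as F

def l2norm(t):
    # normalize the embedding to unit L2 norm along the feature dimension
    return F.normalize(t, dim=-1)

# e.g. predicted_image_embed = l2norm(predicted_image_embed)
```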
Phil Wang | ad87bfe28f | switch to using linear attention for the sparse attention layers within unet, given success in GAN projects | 2022-05-01 17:59:03 -07:00
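A minimal sketch of linear attention over feature maps, for context. Details such as scaling, normalization, and head counts are assumptions and may differ from the repository's module:

```python
import torch
from torch import nn

class LinearAttention(nn.Module):
    # linear attention: softmax q over features, k over positions,
    # then aggregate a (feature x feature) context instead of an (n x n) map
    def __init__(self, dim, heads=4, dim_head=32):
        super().__init__()
        self.heads = heads
        inner_dim = heads * dim_head
        self.to_qkv = nn.Conv2d(dim, inner_dim * 3, 1, bias=False)
        self.to_out = nn.Conv2d(inner_dim, dim, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        qkv = self.to_qkv(x).chunk(3, dim=1)
        # reshape to (batch, heads, dim_head, num_positions); 'h' in the einsums below is the heads axis
        q, k, v = map(lambda t: t.reshape(b, self.heads, -1, h * w), qkv)
        q = q.softmax(dim=-2)  # softmax over the feature dimension
        k = k.softmax(dim=-1)  # softmax over the spatial dimension
        context = torch.einsum('b h d n, b h e n -> b h d e', k, v)
        out = torch.einsum('b h d e, b h d n -> b h e n', context, q)
        return self.to_out(out.reshape(b, -1, h, w))
```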
Phil Wang | b8cf1e5c20 | more attention | 2022-05-01 11:00:33 -07:00
Phil Wang | 5e421bd5bb | let researchers do the hyperparameter search | 2022-05-01 08:46:21 -07:00
Phil Wang | 67fcab1122 | add MLP based time conditioning to all convnexts, in addition to cross attention. also add an initial convolution, given that convnext starts with a depthwise conv | 2022-05-01 08:41:02 -07:00
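MLP-based time conditioning boils down to projecting a sinusoidal timestep embedding into a per-channel shift that each block adds to its feature map; a sketch under assumed names and dimensions (not the repository's exact code):

```python
import math
import torch
from torch import nn

class SinusoidalPosEmb(nn.Module):
    # standard sinusoidal embedding of the diffusion timestep
    def __init__(self, dim):
        super().__init__()
        self.dim = dim

    def forward(self, t):
        half = self.dim // 2
        freqs = torch.exp(-math.log(10000) * torch.arange(half, device=t.device) / (half - 1))
        args = t[:, None].float() * freqs[None]
        return torch.cat((args.sin(), args.cos()), dim=-1)

# hypothetical time MLP feeding each block a per-channel conditioning vector
time_mlp = nn.Sequential(SinusoidalPosEmb(64), nn.Linear(64, 256), nn.GELU(), nn.Linear(256, 256))

t = torch.randint(0, 1000, (8,))      # batch of timesteps
cond = time_mlp(t)[:, :, None, None]  # (batch, channels, 1, 1), added to the block's feature map
```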
Phil Wang | d1a697ac23 | allow one to shortcut sampling at a specific unet number, for those training in stages | 2022-04-30 16:05:13 -07:00
Phil Wang | a9421f49ec | simplify Decoder training for the public | 2022-04-30 11:45:18 -07:00
Phil Wang | 77fa34eae9 | fix all clipping / clamping issues | 2022-04-30 10:08:24 -07:00
Phil Wang | 1c1e508369 | fix all issues with text encoding conditioning in the decoder, using the null padding tokens technique from dalle v1 | 2022-04-30 09:13:34 -07:00
Phil Wang | f19c99ecb0 | fix decoder needing separate conditional dropping probabilities for image embeddings and text encodings, thanks to @xiankgx! | 2022-04-30 08:48:05 -07:00
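Separate conditional dropout for the two conditioning signals roughly means drawing two independent keep-masks during training, one per signal, for classifier-free guidance. A sketch with illustrative names and probabilities, not necessarily the repository's:

```python
import torch

def prob_mask_like(shape, prob, device=None):
    # boolean mask that keeps conditioning with probability `prob`
    return torch.rand(shape, device=device) < prob

# illustrative: independently drop image embeddings and text encodings,
# each with its own probability
image_cond_drop_prob, text_cond_drop_prob = 0.1, 0.5
batch = 8

keep_image = prob_mask_like((batch,), 1 - image_cond_drop_prob)
keep_text = prob_mask_like((batch,), 1 - text_cond_drop_prob)

image_embed = torch.randn(batch, 512)
text_encodings = torch.randn(batch, 77, 512)

image_embed = torch.where(keep_image[:, None], image_embed, torch.zeros_like(image_embed))
text_encodings = torch.where(keep_text[:, None, None], text_encodings, torch.zeros_like(text_encodings))
```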
Phil Wang | 20e7eb5a9b | cleanup | 2022-04-30 07:22:57 -07:00
Phil Wang | e2f9615afa | use @clip-anytorch, thanks to @rom1504 | 2022-04-30 06:40:54 -07:00
Phil Wang | 0d1c07c803 | fix a bug with classifier free guidance, thanks to @xiankgx again! | 2022-04-30 06:34:57 -07:00
Phil Wang | 5063d192b6 | now completely OpenAI CLIP compatible for training; just take care of the logic for AdamW and transformers; used namedtuples for clip adapter embedding outputs | 2022-04-29 13:05:01 -07:00
Phil Wang | fb662a62f3 | fix another bug thanks to @xiankgx | 2022-04-29 07:38:32 -07:00
Phil Wang | 587c8c9b44 | optimize for clarity | 2022-04-28 21:59:13 -07:00
Phil Wang | aa900213e7 | force first unet in the cascade to be conditioned on image embeds | 2022-04-28 20:53:15 -07:00
Phil Wang | 625ce23f6b | 🐛 | 2022-04-28 07:21:18 -07:00
Phil Wang | dbf4a281f1 | make sure another CLIP can actually be passed in, as long as it is wrapped in an adapter extended from BaseClipAdapter | 2022-04-27 20:45:27 -07:00
Phil Wang | 4ab527e779 | some extra asserts for text encoding of diffusion prior and decoder | 2022-04-27 20:11:43 -07:00
Phil Wang | d0cdeb3247 | add ability for DALL-E2 to return PIL images with return_pil_images = True on forward, for those who have no clue about deep learning | 2022-04-27 19:58:06 -07:00
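A rough usage sketch of the PIL-returning flag mentioned above. The flag name comes from the commit message; the surrounding call pattern is an assumption, and construction of the prior and decoder is omitted:

```python
# hypothetical usage; `dalle2` is an already-constructed DALLE2 instance
# wrapping a trained diffusion prior and decoder
pil_images = dalle2(
    ['a butterfly made of stained glass'],
    return_pil_images = True  # return PIL.Image objects instead of tensors
)
pil_images[0].save('sample.png')
```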
Phil Wang | 8c610aad9a | only pass text encodings conditioning in the diffusion prior if specified on initialization | 2022-04-27 19:48:16 -07:00
Phil Wang | 6700381a37 | prepare for ability to integrate clips other than x-clip | 2022-04-27 19:35:05 -07:00
Phil Wang | fa3bb6ba5c | make sure cpu-only still works | 2022-04-27 08:02:10 -07:00
Phil Wang | 2705e7c9b0 | attention-based upsampling claims unsupported by local experiments, removing | 2022-04-27 07:51:04 -07:00
Phil Wang | de0296106b | be able to turn off the warning for use of LazyLinear by passing in the text embedding dimension for the unet | 2022-04-26 11:42:46 -07:00
Phil Wang | eafb136214 | suppress a warning | 2022-04-26 11:40:45 -07:00
Phil Wang | bfbcc283a3 | DRY a tiny bit for gaussian diffusion related logic | 2022-04-26 11:39:12 -07:00
Phil Wang | c30544b73a | allow training DiffusionPrior with no CLIP altogether | 2022-04-26 10:23:41 -07:00
Phil Wang | 9878be760b | have the researcher explicitly state upfront whether to condition on text encodings in the cascading ddpm decoder; have the DALLE-2 class take care of passing in text if the feature is turned on | 2022-04-26 09:47:09 -07:00
Phil Wang | 7ba6357c05 | allow for training the Prior network with precomputed CLIP embeddings (or text encodings) | 2022-04-26 09:29:51 -07:00
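Training the prior on precomputed embeddings means the forward pass can receive embeddings directly rather than raw text and images; a hedged sketch in which the keyword names are assumptions and `diffusion_prior` is an already-constructed DiffusionPrior:

```python
import torch

# hypothetical training step on precomputed CLIP embeddings
text_embed = torch.randn(4, 512)   # precomputed CLIP text embeddings
image_embed = torch.randn(4, 512)  # precomputed CLIP image embeddings

loss = diffusion_prior(text_embed = text_embed, image_embed = image_embed)
loss.backward()
```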
Phil Wang | 76e063e8b7 | refactor so that the causal transformer in the diffusion prior network can be conditioned without text encodings (for LAION's parallel efforts, although it seems from the paper it is needed) | 2022-04-26 09:00:11 -07:00
Phil Wang | 4d25976f33 | make sure non-latent diffusion still works | 2022-04-26 08:36:00 -07:00
Phil Wang | 0b28ee0d01 | revert to old upsampling, the paper's method does not work | 2022-04-26 07:39:04 -07:00
Phil Wang | f75d49c781 | start a file for all attention-related modules, use attention-based upsampling in the unets in dalle-2 | 2022-04-25 18:59:10 -07:00
Phil Wang | 8f2a0c7e00 | better naming | 2022-04-25 07:44:33 -07:00
Phil Wang | 863f4ef243 | just take care of the logic for setting all latent diffusion to predict x0, if needed | 2022-04-24 10:06:42 -07:00
Phil Wang | fb8a66a2de | just in case latent diffusion performs better with prediction of x0 instead of epsilon, open up the research avenue | 2022-04-24 10:04:22 -07:00
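For context, the x0-versus-epsilon choice only changes what the network regresses; the two are related through the closed-form forward process of gaussian diffusion:

```python
import torch

# x_t = sqrt(alpha_cumprod_t) * x0 + sqrt(1 - alpha_cumprod_t) * eps
def predict_x0_from_eps(x_t, eps, alpha_cumprod_t):
    return (x_t - (1 - alpha_cumprod_t).sqrt() * eps) / alpha_cumprod_t.sqrt()

def predict_eps_from_x0(x_t, x0, alpha_cumprod_t):
    return (x_t - alpha_cumprod_t.sqrt() * x0) / (1 - alpha_cumprod_t).sqrt()
```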
Phil Wang | 579d4b42dd | does not seem right to clip for the prior diffusion part | 2022-04-24 09:51:18 -07:00
Phil Wang | 473808850a | some outlines to the eventual CLI endpoint | 2022-04-24 09:27:15 -07:00
Phil Wang | 05b74be69a | use null container pattern to clean up some conditionals, save more cleanup for next week | 2022-04-22 15:23:18 -07:00
Phil Wang | 76b32f18b3 | first pass at complete DALL-E2 + Latent Diffusion integration, latent diffusion on any layer(s) of the cascading ddpm in the decoder | 2022-04-22 13:53:13 -07:00
Phil Wang | 46cef31c86 | optional projection out for prior network causal transformer | 2022-04-22 11:16:30 -07:00
Phil Wang | 59b1a77d4d | be a bit more conservative and stick with layernorm (without bias) for now, given @borisdayma's results https://twitter.com/borisdayma/status/1517227191477571585 | 2022-04-22 11:14:54 -07:00
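A layernorm without a learned bias is a small module; a sketch of the idea (the repository's own implementation may differ):

```python
import torch
from torch import nn

class LayerNorm(nn.Module):
    # layernorm with a learned gain but no learned bias
    def __init__(self, dim, eps=1e-5):
        super().__init__()
        self.eps = eps
        self.gamma = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        mean = x.mean(dim=-1, keepdim=True)
        var = x.var(dim=-1, unbiased=False, keepdim=True)
        return (x - mean) / (var + self.eps).sqrt() * self.gamma
```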
Phil Wang | 7f338319fd | makes more sense for blur augmentation to happen before the upsampling | 2022-04-22 11:10:47 -07:00
Phil Wang | 2c6c91829d | refactor blurring training augmentation to be taken care of by the decoder, with option to downsample to previous resolution before upsampling (cascading ddpm). this opens up the possibility of cascading latent ddpm | 2022-04-22 11:09:17 -07:00
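For context, the blur-before-upsample augmentation described above roughly amounts to blurring the low-resolution conditioning image first and only then resizing it to the next unet's input size. A sketch in which the kernel size and sigma are placeholder values, not the repository's settings:

```python
import torch
import torch.nn.functional as F
from torchvision.transforms import GaussianBlur

def blur_then_upsample(lowres, target_size, blur_sigma=0.6):
    # illustrative conditioning augmentation for a cascading ddpm:
    # blur the low-res image, then upsample it to the target resolution
    blurred = GaussianBlur(kernel_size=3, sigma=blur_sigma)(lowres)
    return F.interpolate(blurred, size=target_size, mode='bilinear', align_corners=False)

lowres = torch.rand(2, 3, 64, 64)
cond = blur_then_upsample(lowres, (256, 256))
```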