Commit Graph

181 Commits

Author SHA1 Message Date
Phil Wang
5063d192b6 now completely OpenAI CLIP compatible for training
just take care of the logic for AdamW and transformers

used namedtuples for clip adapter embedding outputs
2022-04-29 13:05:01 -07:00
Phil Wang
6700381a37 prepare for ability to integrate other clips other than x-clip 2022-04-27 19:35:05 -07:00
Phil Wang
20377f889a todo 2022-04-27 17:22:14 -07:00
Phil Wang
b093f92182 inform what is possible 2022-04-27 08:25:16 -07:00
Phil Wang
2705e7c9b0 attention-based upsampling claims unsupported by local experiments, removing 2022-04-27 07:51:04 -07:00
Phil Wang
77141882c8 complete vit-vqgan from https://arxiv.org/abs/2110.04627 2022-04-26 17:20:47 -07:00
Phil Wang
4075d02139 nevermind, it could be working, but only when i stabilize it with the feedforward layer + tanh as proposed in vit-vqgan paper (which will be built into the repository later for the latent diffusion) 2022-04-26 12:43:31 -07:00
Phil Wang
bfbcc283a3 DRY a tiny bit for gaussian diffusion related logic 2022-04-26 11:39:12 -07:00
Phil Wang
c30544b73a no CLIP altogether for training DiffusionPrior 2022-04-26 10:23:41 -07:00
Phil Wang
bdf5e9c009 todo 2022-04-26 09:56:54 -07:00
Phil Wang
9878be760b have researcher explicitly state upfront whether to condition with text encodings in cascading ddpm decoder, have DALLE-2 class take care of passing in text if feature turned on 2022-04-26 09:47:09 -07:00
Phil Wang
7ba6357c05 allow for training the Prior network with precomputed CLIP embeddings (or text encodings) 2022-04-26 09:29:51 -07:00
Phil Wang
0b28ee0d01 revert back to old upsampling, paper does not work 2022-04-26 07:39:04 -07:00
Phil Wang
13a58a78c4 scratch off todo 2022-04-25 19:01:30 -07:00
Phil Wang
3b520dfa85 bring in attention-based upsampling to strengthen vqgan-vae, seems to work as advertised in initial experiments in GAN 2022-04-25 17:27:45 -07:00
Phil Wang
79198c6ae4 keep readme simple for reader 2022-04-25 17:21:45 -07:00
Phil Wang
77a246b1b9 todo 2022-04-25 08:48:28 -07:00
Phil Wang
f93a3f6ed8 reprioritize 2022-04-25 08:44:27 -07:00
Phil Wang
fb8a66a2de just in case latent diffusion performs better with prediction of x0 instead of epsilon, open up the research avenue 2022-04-24 10:04:22 -07:00
Phil Wang
d5318aef4f todo 2022-04-23 08:23:08 -07:00
Phil Wang
a8b5d5d753 last tweak of readme 2022-04-22 14:16:43 -07:00
Phil Wang
976ef7f87c project management 2022-04-22 14:15:42 -07:00
Phil Wang
fd175bcc0e readme 2022-04-22 14:13:33 -07:00
Phil Wang
76b32f18b3 first pass at complete DALL-E2 + Latent Diffusion integration, latent diffusion on any layer(s) of the cascading ddpm in the decoder. 2022-04-22 13:53:13 -07:00
Phil Wang
f2d5b87677 todo 2022-04-22 11:39:58 -07:00
Phil Wang
ad17c69ab6 prepare for latent diffusion in the first DDPM of the cascade in the Decoder 2022-04-21 17:54:31 -07:00
Phil Wang
0b4ec34efb todo 2022-04-20 12:24:23 -07:00
Phil Wang
f027b82e38 remove wip as main networks (prior and decoder) are completed 2022-04-20 12:12:16 -07:00
Kashif Rasul
1d8f37befe added diffusion-gan thoughts
https://github.com/NVlabs/denoising-diffusion-gan
2022-04-20 21:01:11 +02:00
Phil Wang
b8e8d3c164 thoughts 2022-04-20 11:34:51 -07:00
Phil Wang
8e2416b49b commit to generalizing latent diffusion to one model 2022-04-20 11:27:42 -07:00
Phil Wang
27a33e1b20 complete contextmanager method for keeping only one unet in GPU during training or inference 2022-04-20 10:46:13 -07:00
Phil Wang
6f941a219a give time tokens a surface area of 2 tokens as default, make it so researcher can customize which unet actually is conditioned on image embeddings and/or text encodings 2022-04-20 10:04:47 -07:00
Phil Wang
c26b77ad20 todo 2022-04-19 13:07:32 -07:00
Phil Wang
c5b4aab8e5 intent 2022-04-19 11:00:05 -07:00
Phil Wang
a35c309b5f add sparse attention layers in between convnext blocks in unet (grid like attention, used in mobilevit, maxvit [bytedance ai], as well as a growing number of attention-based GANs) 2022-04-19 09:49:03 -07:00
Phil Wang
a54e309269 prioritize todos, play project management 2022-04-18 13:28:01 -07:00
Phil Wang
c6bfd7fdc8 readme 2022-04-18 12:43:10 -07:00
Phil Wang
960a79857b use some magic just this once to remove the need for researchers to think 2022-04-18 12:40:43 -07:00
Phil Wang
7214df472d todo 2022-04-18 12:18:19 -07:00
Phil Wang
00ae50999b make kernel size and sigma for gaussian blur for cascading DDPM overridable at forward. also make sure unets are wrapped in a modulelist so that at sample time, blurring does not happen 2022-04-18 12:04:31 -07:00
Phil Wang
6cddefad26 readme 2022-04-18 11:52:25 -07:00
Phil Wang
0332eaa6ff complete first pass at full cascading DDPM setup in Decoder, flexible enough to support one unet for testing 2022-04-18 11:44:56 -07:00
Phil Wang
1e939153fb link to AssemblyAI explanation 2022-04-15 12:58:57 -07:00
Phil Wang
1abeb8918e personal project management for next week 2022-04-15 08:04:01 -07:00
Phil Wang
b423855483 commit to jax version 2022-04-15 07:16:25 -07:00
Phil Wang
5b4ee09625 ideation 2022-04-14 13:48:01 -07:00
Phil Wang
9f55c24db6 allow for decoder conditioning with the text encodings from CLIP, if it is passed in. use lazy linear to avoid researchers having to worry about text encoding dimensions, but remove later if it does not work well 2022-04-14 11:46:45 -07:00
Phil Wang
69e822b7f8 "project management" 2022-04-14 10:20:37 -07:00
Phil Wang
68e9883f59 use cross attention for conditioning unet based on image embedding tokens (which opens up the door on conditioning on text encodings as well) 2022-04-14 10:10:04 -07:00