Commit Graph

328 Commits

Author SHA1 Message Date
Phil Wang
f93a3f6ed8 reprioritize 2022-04-25 08:44:27 -07:00
Phil Wang
8f2a0c7e00 better naming 0.0.43 2022-04-25 07:44:33 -07:00
Phil Wang
863f4ef243 just take care of the logic for setting all latent diffusion to predict x0, if needed 0.0.42 2022-04-24 10:06:42 -07:00
Phil Wang
fb8a66a2de just in case latent diffusion performs better with prediction of x0 instead of epsilon, open up the research avenue 0.0.41 2022-04-24 10:04:22 -07:00
Phil Wang
579d4b42dd does not seem right to clip for the prior diffusion part 0.0.40 2022-04-24 09:51:18 -07:00
Phil Wang
473808850a some outlines to the eventual CLI endpoint 2022-04-24 09:27:15 -07:00
Phil Wang
d5318aef4f todo 2022-04-23 08:23:08 -07:00
Phil Wang
f82917e1fd prepare for turning off gradient penalty; as shown in the GAN literature, GP only needs to be applied on 1 out of every 4 iterations 0.0.39 2022-04-23 07:52:10 -07:00
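Applying the gradient penalty on only 1 of every 4 iterations is the lazy-regularization trick popularized by StyleGAN2; the gating logic can be sketched as below (a minimal illustration, assuming this interval from the commit message; the function name and the commented-out `gradient_penalty` helper are hypothetical, not the repo's actual API):

```python
def should_apply_gradient_penalty(step, interval=4):
    # lazy regularization: pay the gradient-penalty cost
    # only once every `interval` training steps
    return step % interval == 0

# inside a training loop one might then write:
# if should_apply_gradient_penalty(step):
#     loss = loss + gp_weight * gradient_penalty(real_images, disc_output)
```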
Phil Wang
05b74be69a use null container pattern to cleanup some conditionals, save more cleanup for next week 0.0.38 2022-04-22 15:23:18 -07:00
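The "null container" (null object) pattern referenced in this commit replaces scattered `if module is not None` conditionals with an object that simply passes its input through; a minimal sketch, with the class name and usage assumed for illustration rather than taken from the repo:

```python
class NullContainer:
    # stands in for an optional module so callers never branch on None
    def __call__(self, x, *args, **kwargs):
        return x  # identity: return the input unchanged

    def __getattr__(self, name):
        # any attribute lookup yields a no-op that echoes its first argument
        return lambda x=None, *args, **kwargs: x

# wherever a layer is optional, substitute the null object once at init:
maybe_norm = NullContainer()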
Phil Wang
a8b5d5d753 last tweak of readme 2022-04-22 14:16:43 -07:00
Phil Wang
976ef7f87c project management 2022-04-22 14:15:42 -07:00
Phil Wang
fd175bcc0e readme 2022-04-22 14:13:33 -07:00
Phil Wang
76b32f18b3 first pass at complete DALL-E2 + Latent Diffusion integration, latent diffusion on any layer(s) of the cascading ddpm in the decoder. 2022-04-22 13:53:13 -07:00
Phil Wang
f2d5b87677 todo 2022-04-22 11:39:58 -07:00
Phil Wang
461347c171 fix vqgan-vae for latent diffusion 0.0.36 2022-04-22 11:38:57 -07:00
Phil Wang
46cef31c86 optional projection out for prior network causal transformer 0.0.35 2022-04-22 11:16:30 -07:00
Phil Wang
59b1a77d4d be a bit more conservative and stick with layernorm (without bias) for now, given @borisdayma results https://twitter.com/borisdayma/status/1517227191477571585 0.0.34 2022-04-22 11:14:54 -07:00
Phil Wang
7f338319fd makes more sense for blur augmentation to happen before the upsampling 0.0.33 2022-04-22 11:10:47 -07:00
Phil Wang
2c6c91829d refactor blurring training augmentation to be taken care of by the decoder, with option to downsample to previous resolution before upsampling (cascading ddpm). this opens up the possibility of cascading latent ddpm 0.0.32 2022-04-22 11:09:17 -07:00
Phil Wang
ad17c69ab6 prepare for latent diffusion in the first DDPM of the cascade in the Decoder 2022-04-21 17:54:31 -07:00
Phil Wang
0b4ec34efb todo 2022-04-20 12:24:23 -07:00
Phil Wang
f027b82e38 remove wip as main networks (prior and decoder) are completed 2022-04-20 12:12:16 -07:00
Phil Wang
8cc9016cb0 Merge pull request #17 from kashif/patch-2
added diffusion-gan thoughts
2022-04-20 12:10:26 -07:00
Kashif Rasul
1d8f37befe added diffusion-gan thoughts
https://github.com/NVlabs/denoising-diffusion-gan
2022-04-20 21:01:11 +02:00
Phil Wang
faebf4c8b8 from my vision transformer experience, dimension of attention head of 32 is sufficient for image feature maps 0.0.31 2022-04-20 11:40:32 -07:00
Phil Wang
b8e8d3c164 thoughts 2022-04-20 11:34:51 -07:00
Phil Wang
8e2416b49b commit to generalizing latent diffusion to one model 2022-04-20 11:27:42 -07:00
Phil Wang
f37c26e856 cleanup and DRY a little 2022-04-20 10:56:32 -07:00
Phil Wang
27a33e1b20 complete contextmanager method for keeping only one unet in GPU during training or inference 0.0.28 2022-04-20 10:46:13 -07:00
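The idea behind this commit is to hold only the unet currently being trained or sampled on the GPU, returning it to the CPU afterward. A sketch with `contextlib.contextmanager` follows; the `Unet` stand-in class, function name, and device strings are assumptions for illustration (a real implementation would move a `torch.nn.Module` with `.to(device)`):

```python
from contextlib import contextmanager

class Unet:
    # stand-in for a torch module exposing .to(device)
    def __init__(self):
        self.device = 'cpu'

    def to(self, device):
        self.device = device
        return self

@contextmanager
def one_unet_in_gpu(unets, index, device='cuda'):
    # move just the selected unet to the GPU, restore it on exit
    unet = unets[index].to(device)
    try:
        yield unet
    finally:
        unet.to('cpu')
```

Usage: `with one_unet_in_gpu(unets, 0) as unet: ...` keeps the other unets off the GPU for the duration of the block.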
Phil Wang
6f941a219a give time tokens a surface area of 2 tokens as default, make it so researcher can customize which unet actually is conditioned on image embeddings and/or text encodings 0.0.27 2022-04-20 10:04:47 -07:00
Phil Wang
ddde8ca1bf fix cosine beta schedule, thanks to @Zhengxinyang 0.0.26 2022-04-19 20:54:28 -07:00
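The cosine beta schedule being fixed here is the one from Nichol & Dhariwal's improved DDPM work: the cumulative alpha follows a squared-cosine curve, and each beta is derived from the ratio of consecutive cumulative alphas. A minimal pure-Python sketch (the function name, `s` offset, and 0.999 clip follow the paper's convention and are not necessarily the repo's exact code):

```python
import math

def cosine_beta_schedule(timesteps, s=0.008):
    # alpha_bar(t) decays from ~1 to ~0 along a squared-cosine curve
    def alpha_bar(t):
        return math.cos((t / timesteps + s) / (1 + s) * math.pi / 2) ** 2

    betas = []
    for t in range(timesteps):
        # beta_t = 1 - alpha_bar(t+1) / alpha_bar(t), clipped for stability
        betas.append(min(1 - alpha_bar(t + 1) / alpha_bar(t), 0.999))
    return betas
```

Compared with a linear schedule, this adds noise more gradually at the start and end of the diffusion process.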
Phil Wang
c26b77ad20 todo 2022-04-19 13:07:32 -07:00
Phil Wang
c5b4aab8e5 intent 2022-04-19 11:00:05 -07:00
Phil Wang
a35c309b5f add sparse attention layers in between convnext blocks in unet (grid like attention, used in mobilevit, maxvit [bytedance ai], as well as a growing number of attention-based GANs) 0.0.25 2022-04-19 09:49:03 -07:00
Phil Wang
55bdcb98b9 scaffold for latent diffusion 2022-04-19 09:26:58 -07:00
Phil Wang
82328f16cd same for text encodings for decoder ddpm training 0.0.24 2022-04-18 14:41:02 -07:00
Phil Wang
6fee4fce6e also allow for the image embedding to be passed into the diffusion model, in case one wants to generate the image embedding once and then train multiple unets in one iteration 0.0.23 2022-04-18 14:00:38 -07:00
Phil Wang
a54e309269 prioritize todos, play project management 2022-04-18 13:28:01 -07:00
Phil Wang
c6bfd7fdc8 readme 2022-04-18 12:43:10 -07:00
Phil Wang
960a79857b use some magic just this once to remove the need for researchers to think 0.0.22 2022-04-18 12:40:43 -07:00
Phil Wang
7214df472d todo 2022-04-18 12:18:19 -07:00
Phil Wang
00ae50999b make kernel size and sigma for gaussian blur for cascading DDPM overridable at forward. also make sure unets are wrapped in a modulelist so that at sample time, blurring does not happen 2022-04-18 12:04:31 -07:00
Phil Wang
6cddefad26 readme 2022-04-18 11:52:25 -07:00
Phil Wang
0332eaa6ff complete first pass at full cascading DDPM setup in Decoder, flexible enough to support one unet for testing 0.0.20 2022-04-18 11:44:56 -07:00
Phil Wang
1cce4225eb 0.0.18 0.0.18 2022-04-17 07:29:34 -07:00
Phil Wang
5ab0700bab Merge pull request #14 from kashif/loss-schedule
added huber loss and other schedulers
2022-04-17 07:29:10 -07:00
Kashif Rasul
b0f2fbaa95 schedule to Prior 2022-04-17 15:21:47 +02:00
Kashif Rasul
51361c2d15 added beta_schedule argument 2022-04-17 15:19:33 +02:00
Kashif Rasul
42d6e47387 added huber loss and other schedulers 2022-04-17 15:14:05 +02:00
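The Huber loss added in this PR (also known as smooth L1) is quadratic for small errors and linear beyond a threshold delta, making it less sensitive to outliers than plain L2. A scalar sketch for illustration (the default `delta=1.0` is an assumption; the repo would apply this elementwise to tensors):

```python
def huber_loss(pred, target, delta=1.0):
    # quadratic near zero, linear for errors larger than delta
    err = abs(pred - target)
    if err <= delta:
        return 0.5 * err ** 2
    return delta * (err - 0.5 * delta)
```

The two branches meet with matching value and slope at `err == delta`, which is what keeps the gradient bounded for large residuals.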
Phil Wang
1e939153fb link to AssemblyAI explanation 2022-04-15 12:58:57 -07:00