Commit Graph

106 Commits

Author SHA1 Message Date
Phil Wang
8cc9016cb0 Merge pull request #17 from kashif/patch-2
added diffusion-gan thoughts
2022-04-20 12:10:26 -07:00
Kashif Rasul
1d8f37befe added diffusion-gan thoughts
https://github.com/NVlabs/denoising-diffusion-gan
2022-04-20 21:01:11 +02:00
Phil Wang
faebf4c8b8 from my vision transformer experience, dimension of attention head of 32 is sufficient for image feature maps 0.0.31 2022-04-20 11:40:32 -07:00
Phil Wang
b8e8d3c164 thoughts 2022-04-20 11:34:51 -07:00
Phil Wang
8e2416b49b commit to generalizing latent diffusion to one model 2022-04-20 11:27:42 -07:00
Phil Wang
f37c26e856 cleanup and DRY a little 2022-04-20 10:56:32 -07:00
Phil Wang
27a33e1b20 complete contextmanager method for keeping only one unet in GPU during training or inference 0.0.28 2022-04-20 10:46:13 -07:00
Phil Wang
6f941a219a give time tokens a surface area of 2 tokens as default, make it so researcher can customize which unet actually is conditioned on image embeddings and/or text encodings 0.0.27 2022-04-20 10:04:47 -07:00
Phil Wang
ddde8ca1bf fix cosine bbeta schedule, thanks to @Zhengxinyang 0.0.26 2022-04-19 20:54:28 -07:00
Phil Wang
c26b77ad20 todo 2022-04-19 13:07:32 -07:00
Phil Wang
c5b4aab8e5 intent 2022-04-19 11:00:05 -07:00
Phil Wang
a35c309b5f add sparse attention layers in between convnext blocks in unet (grid like attention, used in mobilevit, maxvit [bytedance ai], as well as a growing number of attention-based GANs) 0.0.25 2022-04-19 09:49:03 -07:00
Phil Wang
55bdcb98b9 scaffold for latent diffusion 2022-04-19 09:26:58 -07:00
Phil Wang
82328f16cd same for text encodings for decoder ddpm training 0.0.24 2022-04-18 14:41:02 -07:00
Phil Wang
6fee4fce6e also allow for image embedding to be passed into the diffusion model, in the case one wants to generate image embedding once and then train multiple unets in one iteration 0.0.23 2022-04-18 14:00:38 -07:00
Phil Wang
a54e309269 prioritize todos, play project management 2022-04-18 13:28:01 -07:00
Phil Wang
c6bfd7fdc8 readme 2022-04-18 12:43:10 -07:00
Phil Wang
960a79857b use some magic just this once to remove the need for researchers to think 0.0.22 2022-04-18 12:40:43 -07:00
Phil Wang
7214df472d todo 2022-04-18 12:18:19 -07:00
Phil Wang
00ae50999b make kernel size and sigma for gaussian blur for cascading DDPM overridable at forward. also make sure unets are wrapped in a modulelist so that at sample time, blurring does not happen 2022-04-18 12:04:31 -07:00
Phil Wang
6cddefad26 readme 2022-04-18 11:52:25 -07:00
Phil Wang
0332eaa6ff complete first pass at full cascading DDPM setup in Decoder, flexible enough to support one unet for testing 0.0.20 2022-04-18 11:44:56 -07:00
Phil Wang
1cce4225eb 0.0.18 0.0.18 2022-04-17 07:29:34 -07:00
Phil Wang
5ab0700bab Merge pull request #14 from kashif/loss-schedule
added huber loss and other schedulers
2022-04-17 07:29:10 -07:00
Kashif Rasul
b0f2fbaa95 schedule to Prior 2022-04-17 15:21:47 +02:00
Kashif Rasul
51361c2d15 added beta_schedule argument 2022-04-17 15:19:33 +02:00
Kashif Rasul
42d6e47387 added huber loss and other schedulers 2022-04-17 15:14:05 +02:00
Phil Wang
1e939153fb link to AssemblyAI explanation 2022-04-15 12:58:57 -07:00
Phil Wang
1abeb8918e personal project management for next week 2022-04-15 08:04:01 -07:00
Phil Wang
b423855483 commit to jax version 2022-04-15 07:16:25 -07:00
Phil Wang
c400d8758c prepare for cascading diffusion in unet, save the full progressive upsampling architecture to be built next week 0.0.17 2022-04-15 07:03:28 -07:00
Phil Wang
bece206699 fix bug thanks to @jihoonerd 2022-04-15 06:44:40 -07:00
Phil Wang
5b4ee09625 ideation 2022-04-14 13:48:01 -07:00
Phil Wang
6e27f617f1 use t5 relative positional bias in prior network causal transformer, since it makes more sense than rotary embeddings 0.0.15 2022-04-14 12:01:09 -07:00
Phil Wang
9f55c24db6 allow for decoder conditioning with the text encodings from CLIP, if it is passed in. use lazy linear to avoid researchers having to worry about text encoding dimensions, but remove later if it does not work well 0.0.14 2022-04-14 11:46:45 -07:00
Phil Wang
69e822b7f8 "project management" 2022-04-14 10:20:37 -07:00
Phil Wang
23c401a5d5 use the eval decorator 0.0.12 2022-04-14 10:13:43 -07:00
Phil Wang
68e9883f59 use cross attention for conditioning unet based on image embedding tokens (which opens up the door on conditioning on text encodings as well 0.0.11 2022-04-14 10:10:04 -07:00
Phil Wang
95b018374a start using swish glu everywhere, given success of PaLM 0.0.10 2022-04-14 09:34:32 -07:00
Phil Wang
8b5c2385b0 better naming 2022-04-14 09:24:31 -07:00
Phil Wang
f2c52d8239 fix bug with classifier free guidance for prior network, even though it seems it may not be used 0.0.9 2022-04-14 09:21:51 -07:00
Phil Wang
97e951221b bring in blur, as it will be used somewhere in the cascading DDPM in the decoder eventually, once i figure it out 2022-04-14 09:16:09 -07:00
Phil Wang
e1b0c140f1 cleanup readme 2022-04-14 08:51:22 -07:00
Phil Wang
5989569a44 link to OpenCLIP effort 2022-04-14 08:31:15 -07:00
Phil Wang
82464d7bd3 per-fect 2022-04-14 08:30:07 -07:00
Phil Wang
7fb3f695d5 offer continuously parameterized time embedding for diffusion prior network, remove a hyperparameter that may trip up people, if not set correctly 0.0.8 2022-04-14 08:28:11 -07:00
Phil Wang
7e93b9d3c8 make sure classifier free guidance condition scaling is exposed on DALLE2 forward function 0.0.7 2022-04-13 20:14:28 -07:00
Phil Wang
4c827ba94f typo 2022-04-13 19:01:03 -07:00
Phil Wang
cb3923a90f readme tweak 2022-04-13 18:43:34 -07:00
Phil Wang
cc30676a3f lengthen todo 2022-04-13 18:34:09 -07:00