Commit Graph

166 Commits

Author SHA1 Message Date
Phil Wang
473808850a some outlines for the eventual CLI endpoint 2022-04-24 09:27:15 -07:00
Phil Wang
05b74be69a use null container pattern to clean up some conditionals, save more cleanup for next week 2022-04-22 15:23:18 -07:00
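
The null container pattern replaces `if module is not None` guards with a do-nothing stand-in that call sites can invoke unconditionally. A minimal sketch, with hypothetical names rather than the repo's actual ones:

```python
from torch import nn

class NullContainer(nn.Module):
    # a do-nothing stand-in: callers can invoke it unconditionally
    # instead of guarding every call with `if module is not None`
    def forward(self, x, *args, **kwargs):
        return x

def make_cond_proj(cond_dim, dim):
    # optional conditioning projection: real when configured, null otherwise
    return nn.Linear(cond_dim, dim) if cond_dim is not None else NullContainer()
```
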
Phil Wang
76b32f18b3 first pass at complete DALL-E2 + Latent Diffusion integration, latent diffusion on any layer(s) of the cascading ddpm in the decoder. 2022-04-22 13:53:13 -07:00
Phil Wang
46cef31c86 optional projection out for prior network causal transformer 2022-04-22 11:16:30 -07:00
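
A sketch of what such an optional output projection typically looks like (names here are illustrative):

```python
from torch import nn

def output_projection(dim, image_embed_dim, project_out=True):
    # optionally project the causal transformer's output back to the
    # image embedding dimension; otherwise pass it through untouched
    return nn.Linear(dim, image_embed_dim) if project_out else nn.Identity()
```
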
Phil Wang
59b1a77d4d be a bit more conservative and stick with layernorm (without bias) for now, given @borisdayma results https://twitter.com/borisdayma/status/1517227191477571585 2022-04-22 11:14:54 -07:00
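
A bias-free layernorm keeps only the learned gain. A minimal sketch:

```python
import torch
from torch import nn
import torch.nn.functional as F

class LayerNorm(nn.Module):
    # layernorm with a learned gain but no learned bias
    def __init__(self, dim):
        super().__init__()
        self.g = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        return F.layer_norm(x, x.shape[-1:], weight=self.g)
```
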
Phil Wang
7f338319fd makes more sense for blur augmentation to happen before the upsampling 2022-04-22 11:10:47 -07:00
Phil Wang
2c6c91829d refactor blurring training augmentation to be taken care of by the decoder, with option to downsample to previous resolution before upsampling (cascading ddpm). this opens up the possibility of cascading latent ddpm 2022-04-22 11:09:17 -07:00
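
The two commits above amount to the following corruption of the low-resolution conditioning image: blur first, then optionally bounce down to the previous stage's resolution and back up. A sketch assuming torchvision's `gaussian_blur`; how the decoder actually wires this in may differ:

```python
import torch.nn.functional as F
from torchvision.transforms.functional import gaussian_blur

def corrupt_low_res(images, prev_res=None, kernel_size=3, sigma=0.6):
    images = gaussian_blur(images, kernel_size=kernel_size, sigma=sigma)
    if prev_res is not None:
        # round-trip through the previous unet's resolution so the
        # conditioning image carries realistic low-res artifacts
        res = images.shape[-1]
        images = F.interpolate(images, size=prev_res, mode='bilinear')
        images = F.interpolate(images, size=res, mode='bilinear')
    return images
```
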
Phil Wang
ad17c69ab6 prepare for latent diffusion in the first DDPM of the cascade in the Decoder 2022-04-21 17:54:31 -07:00
Phil Wang
faebf4c8b8 from my vision transformer experience, an attention head dimension of 32 is sufficient for image feature maps 2022-04-20 11:40:32 -07:00
Phil Wang
f37c26e856 cleanup and DRY a little 2022-04-20 10:56:32 -07:00
Phil Wang
27a33e1b20 complete contextmanager method for keeping only one unet in GPU during training or inference 2022-04-20 10:46:13 -07:00
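
Such a context manager likely follows the usual shape below (name and device handling are assumptions):

```python
from contextlib import contextmanager

@contextmanager
def one_unet_on_gpu(unets, index, device='cuda'):
    # keep only the unet currently being trained or sampled on the GPU;
    # the rest of the cascade waits on the CPU
    for unet in unets:
        unet.cpu()
    unet = unets[index].to(device)
    try:
        yield unet
    finally:
        unet.cpu()
```
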
Phil Wang
6f941a219a give time tokens a surface area of 2 tokens by default, and make it so researchers can customize which unets are actually conditioned on image embeddings and/or text encodings 2022-04-20 10:04:47 -07:00
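
A "surface area of 2 tokens" means the pooled time embedding is broadcast into a couple of tokens the attention layers can attend to, roughly:

```python
from einops import repeat

def to_time_tokens(time_emb, num_time_tokens=2):
    # (batch, dim) -> (batch, num_time_tokens, dim)
    return repeat(time_emb, 'b d -> b n d', n=num_time_tokens)
```
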
Phil Wang
ddde8ca1bf fix cosine beta schedule, thanks to @Zhengxinyang 2022-04-19 20:54:28 -07:00
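
For reference, the cosine beta schedule from Nichol & Dhariwal (2021) that the fix targets:

```python
import math
import torch

def cosine_beta_schedule(timesteps, s=0.008):
    # alphas_cumprod follows a squared-cosine curve; betas are recovered
    # from the ratio of successive cumulative products
    steps = timesteps + 1
    t = torch.linspace(0, timesteps, steps) / timesteps
    alphas_cumprod = torch.cos((t + s) / (1 + s) * math.pi * 0.5) ** 2
    alphas_cumprod = alphas_cumprod / alphas_cumprod[0]
    betas = 1 - (alphas_cumprod[1:] / alphas_cumprod[:-1])
    return torch.clip(betas, 0, 0.999)
```
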
Phil Wang
a35c309b5f add sparse attention layers in between convnext blocks in unet (grid-like attention, used in mobilevit, maxvit [bytedance ai], as well as a growing number of attention-based GANs) 2022-04-19 09:49:03 -07:00
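
Grid-like attention lets tokens strided a fixed distance apart attend to one another, giving global reach at sub-quadratic cost. A sketch using einops (`attn` is any (batch, seq, dim) attention module):

```python
from einops import rearrange

def grid_attention(attn, fmap, grid=8):
    # tokens spaced h // grid apart form one attention group
    b, c, h, w = fmap.shape
    x = rearrange(fmap, 'b c (g1 p1) (g2 p2) -> (b p1 p2) (g1 g2) c', g1=grid, g2=grid)
    x = attn(x)
    return rearrange(x, '(b p1 p2) (g1 g2) c -> b c (g1 p1) (g2 p2)',
                     b=b, g1=grid, g2=grid, p1=h // grid, p2=w // grid)
```
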
Phil Wang
82328f16cd same for text encodings for decoder ddpm training 2022-04-18 14:41:02 -07:00
Phil Wang
6fee4fce6e also allow for the image embedding to be passed into the diffusion model, in case one wants to generate the image embedding once and then train multiple unets in one iteration 2022-04-18 14:00:38 -07:00
Phil Wang
960a79857b use some magic just this once to remove the need for researchers to think 2022-04-18 12:40:43 -07:00
Phil Wang
00ae50999b make kernel size and sigma for the gaussian blur in the cascading DDPM overridable at forward. also make sure unets are wrapped in a modulelist so that blurring does not happen at sample time 2022-04-18 12:04:31 -07:00
Phil Wang
0332eaa6ff complete first pass at full cascading DDPM setup in Decoder, flexible enough to support one unet for testing 2022-04-18 11:44:56 -07:00
Kashif Rasul
b0f2fbaa95 pass the beta schedule to the Prior 2022-04-17 15:21:47 +02:00
Kashif Rasul
51361c2d15 added beta_schedule argument 2022-04-17 15:19:33 +02:00
Kashif Rasul
42d6e47387 added huber loss and other schedulers 2022-04-17 15:14:05 +02:00
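
The three commits above wire a configurable beta schedule and loss into the prior. The loss switch presumably looks something like this (names assumed):

```python
import torch.nn.functional as F

def diffusion_loss(loss_type, pred, target):
    # huber (smooth l1) is less sensitive to outlier noise than plain l2
    if loss_type == 'l1':
        return F.l1_loss(pred, target)
    if loss_type == 'l2':
        return F.mse_loss(pred, target)
    if loss_type == 'huber':
        return F.smooth_l1_loss(pred, target)
    raise ValueError(f'unknown loss type {loss_type}')
```
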
Phil Wang
c400d8758c prepare for cascading diffusion in unet, save the full progressive upsampling architecture to be built next week 2022-04-15 07:03:28 -07:00
Phil Wang
bece206699 fix bug thanks to @jihoonerd 2022-04-15 06:44:40 -07:00
Phil Wang
6e27f617f1 use t5 relative positional bias in prior network causal transformer, since it makes more sense than rotary embeddings 2022-04-14 12:01:09 -07:00
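
A simplified sketch of a T5-style relative position bias: one learned scalar per (relative distance, head), added to the attention logits. T5 proper buckets distances logarithmically; that is skipped here:

```python
import torch
from torch import nn

class RelPosBias(nn.Module):
    def __init__(self, heads, max_seq_len):
        super().__init__()
        self.max_seq_len = max_seq_len
        self.bias = nn.Embedding(2 * max_seq_len - 1, heads)

    def forward(self, seq_len, device):
        pos = torch.arange(seq_len, device=device)
        rel = pos[None, :] - pos[:, None] + self.max_seq_len - 1  # shift to >= 0
        return self.bias(rel).permute(2, 0, 1)  # (heads, i, j), added to q·k logits
```
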
Phil Wang
9f55c24db6 allow for decoder conditioning with the text encodings from CLIP, if they are passed in. use lazy linear to avoid researchers having to worry about text encoding dimensions, but remove later if it does not work well 2022-04-14 11:46:45 -07:00
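
`nn.LazyLinear` infers `in_features` on the first forward pass, which is what spares researchers from specifying the text encoding dimension up front:

```python
from torch import nn

# the conditioning width (512 here) is an arbitrary example value
to_text_cond = nn.LazyLinear(out_features=512)
```
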
Phil Wang
23c401a5d5 use the eval decorator 2022-04-14 10:13:43 -07:00
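
The eval decorator is the usual pattern of flipping a module into eval mode for sampling and restoring its previous training state afterwards:

```python
from functools import wraps

def eval_decorator(fn):
    @wraps(fn)
    def inner(model, *args, **kwargs):
        was_training = model.training
        model.eval()
        out = fn(model, *args, **kwargs)
        model.train(was_training)
        return out
    return inner
```
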
Phil Wang
68e9883f59 use cross attention for conditioning the unet on image embedding tokens (which opens the door to conditioning on text encodings as well) 2022-04-14 10:10:04 -07:00
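
A minimal cross-attention sketch of the conditioning described here: unet feature-map tokens act as queries over the image-embedding (and eventually text-encoding) tokens:

```python
import torch
from torch import nn
from einops import rearrange

class CrossAttention(nn.Module):
    def __init__(self, dim, cond_dim, heads=8, dim_head=64):
        super().__init__()
        inner = heads * dim_head
        self.heads, self.scale = heads, dim_head ** -0.5
        self.to_q = nn.Linear(dim, inner, bias=False)
        self.to_kv = nn.Linear(cond_dim, inner * 2, bias=False)
        self.to_out = nn.Linear(inner, dim, bias=False)

    def forward(self, x, cond):
        q = self.to_q(x)
        k, v = self.to_kv(cond).chunk(2, dim=-1)
        q, k, v = (rearrange(t, 'b n (h d) -> b h n d', h=self.heads) for t in (q, k, v))
        attn = (q @ k.transpose(-2, -1) * self.scale).softmax(dim=-1)
        return self.to_out(rearrange(attn @ v, 'b h n d -> b n (h d)'))
```
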
Phil Wang
95b018374a start using swish glu everywhere, given success of PaLM 2022-04-14 09:34:32 -07:00
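
SwiGLU ("swish glu") gates half of the feedforward up-projection with SiLU. The standard construction:

```python
import torch
from torch import nn
import torch.nn.functional as F

class SwiGLU(nn.Module):
    def forward(self, x):
        x, gate = x.chunk(2, dim=-1)
        return x * F.silu(gate)

def FeedForward(dim, mult=4):
    inner = int(dim * mult * 2 / 3)  # keep parameters comparable to a GELU FF
    return nn.Sequential(
        nn.Linear(dim, inner * 2, bias=False),
        SwiGLU(),
        nn.Linear(inner, dim, bias=False),
    )
```
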
Phil Wang
8b5c2385b0 better naming 2022-04-14 09:24:31 -07:00
Phil Wang
f2c52d8239 fix bug with classifier free guidance for prior network, even though it seems it may not be used 2022-04-14 09:21:51 -07:00
Phil Wang
97e951221b bring in blur, as it will be used somewhere in the cascading DDPM in the decoder eventually, once i figure it out 2022-04-14 09:16:09 -07:00
Phil Wang
82464d7bd3 per-fect 2022-04-14 08:30:07 -07:00
Phil Wang
7fb3f695d5 offer a continuously parameterized time embedding for the diffusion prior network, removing a hyperparameter that may trip people up if not set correctly 2022-04-14 08:28:11 -07:00
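
Continuously parameterized here means a sinusoidal embedding of the raw timestep, so no fixed `num_timesteps` table has to be configured. The standard construction:

```python
import math
import torch
from torch import nn

class SinusoidalPosEmb(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.dim = dim

    def forward(self, t):
        half = self.dim // 2
        freqs = torch.exp(torch.arange(half, device=t.device) * -(math.log(10000) / (half - 1)))
        args = t[:, None].float() * freqs[None]
        return torch.cat((args.sin(), args.cos()), dim=-1)
```
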
Phil Wang
7e93b9d3c8 make sure classifier free guidance condition scaling is exposed on DALLE2 forward function 2022-04-13 20:14:28 -07:00
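
Classifier-free guidance at sampling time runs one conditioned and one unconditioned pass and extrapolates between them; `cond_scale = 1` recovers the plain conditional model. A sketch with a hypothetical model signature:

```python
def forward_with_cond_scale(model, x, cond, cond_scale=1.0):
    logits = model(x, cond, cond_drop_prob=0.0)
    if cond_scale == 1.0:
        return logits
    null_logits = model(x, cond, cond_drop_prob=1.0)  # fully dropped conditioning
    return null_logits + (logits - null_logits) * cond_scale
```
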
Phil Wang
14ddbc159c cleanup 2022-04-13 18:24:32 -07:00
Phil Wang
5e06cde4cb always work in the l2normed space for image and text embeddings 2022-04-13 18:08:42 -07:00
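
Working in the l2-normed space just means keeping embeddings on the unit hypersphere, the space in which CLIP compares image and text:

```python
import torch.nn.functional as F

def l2norm(t):
    return F.normalize(t, dim=-1)
```
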
Phil Wang
a1a8a78f21 fix everything and make sure it runs end to end, document everything in readme for public 2022-04-13 18:05:25 -07:00
Phil Wang
e5e415297c prepare non-causal attention, for use in the unet in the decoder 2022-04-13 12:04:09 -07:00
Phil Wang
c9377efc93 go for the multi-headed queries, one-headed key/values, proven out in AlphaCode as well as PaLM by now 2022-04-13 12:01:43 -07:00
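
Multi-query attention shares a single key/value head across all query heads, shrinking the k/v projections and cache. A sketch:

```python
import torch
from torch import nn
from einops import rearrange

class MultiQueryAttention(nn.Module):
    def __init__(self, dim, heads=8, dim_head=64):
        super().__init__()
        self.heads, self.scale = heads, dim_head ** -0.5
        self.to_q = nn.Linear(dim, dim_head * heads, bias=False)
        self.to_kv = nn.Linear(dim, dim_head * 2, bias=False)  # one head for k and v
        self.to_out = nn.Linear(dim_head * heads, dim, bias=False)

    def forward(self, x):
        q = rearrange(self.to_q(x), 'b n (h d) -> b h n d', h=self.heads)
        k, v = self.to_kv(x).chunk(2, dim=-1)
        sim = torch.einsum('b h i d, b j d -> b h i j', q, k) * self.scale
        out = torch.einsum('b h i j, b j d -> b h i d', sim.softmax(dim=-1), v)
        return self.to_out(rearrange(out, 'b h n d -> b n (h d)'))
```
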
Phil Wang
d3cded3c6c complete logic in diffusion prior for sampling more than 1 image embed, taking the top similarity 2022-04-13 10:52:31 -07:00
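
The resampling trick from the paper: draw several candidate image embeddings per caption and keep the one most similar to the text embedding. A sketch assuming a hypothetical `prior.sample` API and unit-normed embeddings:

```python
import torch

def sample_top_similarity(prior, text_embed, num_samples=2):
    candidates = torch.stack([prior.sample(text_embed) for _ in range(num_samples)], dim=1)
    sims = torch.einsum('b d, b n d -> b n', text_embed, candidates)  # cosine sims
    best = sims.argmax(dim=1)
    return candidates[torch.arange(candidates.shape[0]), best]
```
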
Phil Wang
d573c82f8c add one full attention at the middle of the unet, prepare to do efficient attention employing every trick i know from vision transformer literature 2022-04-13 10:39:06 -07:00
Phil Wang
3aa6f91e7a be transparent 2022-04-13 10:32:11 -07:00
Phil Wang
1bf071af78 allow for predicting the image embedding directly during diffusion training. still need to fix sampling 2022-04-13 10:29:29 -07:00
Phil Wang
791d27326a add diffusion code for the image embedding. nearly all the code is there except for the cascading ddpm in the decoder (with upscaling etc) 2022-04-13 10:06:52 -07:00
Phil Wang
33d69d3859 take care of DDPM decoder (the DDPM producing the image embedding will have a separate objective, predicting the embedding directly rather than the noise [epsilon in the paper]) 2022-04-12 17:48:41 -07:00
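
The distinction in objectives, sketched with hypothetical names: the prior regresses the clean embedding (x0) itself, where a pixel-space DDPM usually regresses the added noise (epsilon):

```python
import torch
import torch.nn.functional as F

def prior_loss(model, x0, t, alphas_cumprod):
    noise = torch.randn_like(x0)
    a = alphas_cumprod[t][:, None]                # cumulative alpha at step t
    x_t = a.sqrt() * x0 + (1 - a).sqrt() * noise  # forward diffusion
    pred_x0 = model(x_t, t)                       # predict the embedding itself
    return F.mse_loss(pred_x0, x0)
```
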
Phil Wang
862e5ba50e more sketches to base dalle2 class 2022-04-12 17:31:01 -07:00
Phil Wang
25d980ebbf complete naive conditioning of unet with image embedding, with the ability to drop it out for classifier free guidance 2022-04-12 17:27:39 -07:00
Phil Wang
d546a615c0 complete helper methods for doing condition scaling (classifier free guidance), for decoder unet and prior network 2022-04-12 16:11:16 -07:00
Phil Wang
d4c8373635 complete conditional dropout mask creation for both prior network as well as image decoder unet for classifier free guidance 2022-04-12 14:04:08 -07:00
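
The conditional dropout in question: per sample, replace the conditioning with a learned null embedding with some probability, so the same network learns the unconditional branch needed for classifier-free guidance. A sketch:

```python
import torch

def prob_mask_like(shape, prob, device):
    # boolean mask that is True with probability `prob`
    return torch.zeros(shape, device=device).float().uniform_(0, 1) < prob

def drop_conditioning(cond, null_cond, cond_drop_prob):
    keep = prob_mask_like((cond.shape[0],), 1. - cond_drop_prob, cond.device)
    return torch.where(keep[:, None], cond, null_cond)
```
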