Commit Graph

69 Commits

Author SHA1 Message Date
Phil Wang
68e9883f59 use cross attention for conditioning unet based on image embedding tokens (which opens up the door on conditioning on text encodings as well) 0.0.11 2022-04-14 10:10:04 -07:00
Phil Wang
95b018374a start using swish glu everywhere, given success of PaLM 0.0.10 2022-04-14 09:34:32 -07:00
Phil Wang
8b5c2385b0 better naming 2022-04-14 09:24:31 -07:00
Phil Wang
f2c52d8239 fix bug with classifier free guidance for prior network, even though it seems it may not be used 0.0.9 2022-04-14 09:21:51 -07:00
Phil Wang
97e951221b bring in blur, as it will be used somewhere in the cascading DDPM in the decoder eventually, once i figure it out 2022-04-14 09:16:09 -07:00
Phil Wang
e1b0c140f1 cleanup readme 2022-04-14 08:51:22 -07:00
Phil Wang
5989569a44 link to OpenCLIP effort 2022-04-14 08:31:15 -07:00
Phil Wang
82464d7bd3 per-fect 2022-04-14 08:30:07 -07:00
Phil Wang
7fb3f695d5 offer continuously parameterized time embedding for diffusion prior network, remove a hyperparameter that may trip up people, if not set correctly 0.0.8 2022-04-14 08:28:11 -07:00
Phil Wang
7e93b9d3c8 make sure classifier free guidance condition scaling is exposed on DALLE2 forward function 0.0.7 2022-04-13 20:14:28 -07:00
Phil Wang
4c827ba94f typo 2022-04-13 19:01:03 -07:00
Phil Wang
cb3923a90f readme tweak 2022-04-13 18:43:34 -07:00
Phil Wang
cc30676a3f lengthen todo 2022-04-13 18:34:09 -07:00
Phil Wang
c7fb327618 link to x-clip 2022-04-13 18:26:30 -07:00
Phil Wang
14ddbc159c cleanup 0.0.6a 2022-04-13 18:24:32 -07:00
Phil Wang
0692f1699f favorite quote 0.0.6 2022-04-13 18:17:59 -07:00
Phil Wang
26c4534bc3 readme 2022-04-13 18:11:55 -07:00
Phil Wang
5e06cde4cb always work in the l2normed space for image and text embeddings 0.0.5 2022-04-13 18:08:42 -07:00
Phil Wang
a1a8a78f21 fix everything and make sure it runs end to end, document everything in readme for public 2022-04-13 18:05:25 -07:00
Phil Wang
e5e415297c prepare non-causal attention, for use in the unet in the decoder 2022-04-13 12:04:09 -07:00
Phil Wang
c9377efc93 go for the multi-headed queries, one-headed key/values, proven out in AlphaCode as well as PaLM by now 2022-04-13 12:01:43 -07:00
Phil Wang
2a424b6a28 readme 2022-04-13 10:58:06 -07:00
Phil Wang
d3cded3c6c complete logic in diffusion prior for sampling more than 1 image embeds, taking top similarity 2022-04-13 10:52:31 -07:00
Phil Wang
d573c82f8c add one full attention at the middle of the unet, prepare to do efficient attention employing every trick i know from vision transformer literature 2022-04-13 10:39:06 -07:00
Phil Wang
3aa6f91e7a be transparent 2022-04-13 10:32:11 -07:00
Phil Wang
1bf071af78 allow for predicting image embedding directly during diffusion training. need to fix sampling still 2022-04-13 10:29:29 -07:00
Phil Wang
9f1fe6c7ae update todo 2022-04-13 10:09:08 -07:00
Phil Wang
791d27326a add diffusion code for the image embedding. nearly all the code is there except for the cascading ddpm in the decoder (with upscaling etc) 2022-04-13 10:06:52 -07:00
Phil Wang
6d4e9c97bf todo 2022-04-12 20:50:29 -07:00
Phil Wang
40140b54d6 put on project manager hat 2022-04-12 17:51:23 -07:00
Phil Wang
33d69d3859 take care of DDPM decoder (DDPM for producing image embedding will have a separate objective, predicting directly the embedding rather than the noise [epsilon in paper]) 2022-04-12 17:48:41 -07:00
Phil Wang
862e5ba50e more sketches to base dalle2 class 2022-04-12 17:31:01 -07:00
Phil Wang
25d980ebbf complete naive conditioning of unet with image embedding, with ability to dropout for classifier free guidance 2022-04-12 17:27:39 -07:00
Phil Wang
d546a615c0 complete helper methods for doing condition scaling (classifier free guidance), for decoder unet and prior network 2022-04-12 16:11:16 -07:00
Phil Wang
d4c8373635 complete conditional dropout mask creation for both prior network as well as image decoder unet for classifier free guidance 2022-04-12 14:04:08 -07:00
Phil Wang
c814b2b278 sponsor project button 2022-04-12 13:34:02 -07:00
Phil Wang
74aec9d8ca further prepare attention for classifier free guidance 2022-04-12 13:01:18 -07:00
Phil Wang
7647be2569 prep for classifier free guidance for the image embedding diffusion step, even though not mentioned in paper 2022-04-12 12:57:09 -07:00
Phil Wang
59b8abe09e prepare unet to be conditioned on image embedding, optionally text encodings, and reminder for self to build conditional dropout for classifier free guidance 2022-04-12 12:38:56 -07:00
Phil Wang
46dde54948 for integration of X-CLIP automagically in the gaussian diffusion classes 2022-04-12 12:17:34 -07:00
Phil Wang
40aa304b7e rename to DiffusionPriorNetwork in case ARPriorNetwork is ever built 2022-04-12 11:45:57 -07:00
Phil Wang
fd38eb83c4 complete the main contribution of the paper, the diffusion prior network, minus the diffusion training setup 2022-04-12 11:43:59 -07:00
Phil Wang
83aabd42ca move epsilon inside of square root for further stability in rmsnorm; improvise and use rmsnorm in convnext blocks too 2022-04-12 11:18:36 -07:00
Phil Wang
cf22affcbb bring in modified unet using convnext blocks https://arxiv.org/abs/2201.03545 2022-04-12 10:58:44 -07:00
Phil Wang
522f42f582 start using RMSNorm, used in Gopher and AlphaCode, and as a way to go complete bias-less (purportedly more stable according to PaLM) 2022-04-12 10:45:03 -07:00
Phil Wang
0a60818965 dropouts in transformer, also prep for classifier free guidance in decoder 2022-04-12 10:42:57 -07:00
Phil Wang
604765b563 readme 2022-04-12 10:35:56 -07:00
Phil Wang
7bbc62f3d5 bring in pillow, for image encoding to and from 2022-04-12 10:29:55 -07:00
Phil Wang
771fe0d0d2 also consider accepting tokenizer, so dalle2 forward pass can just be invoked as DALLE2(<prompt string>) 2022-04-12 10:29:29 -07:00
Phil Wang
de75a8af76 link to yannic, since he is the best 2022-04-12 10:27:01 -07:00