Phil Wang
|
e1b0c140f1
|
cleanup readme
|
2022-04-14 08:51:22 -07:00 |
|
Phil Wang
|
5989569a44
|
link to OpenCLIP effort
|
2022-04-14 08:31:15 -07:00 |
|
Phil Wang
|
82464d7bd3
|
per-fect
|
2022-04-14 08:30:07 -07:00 |
|
Phil Wang
|
7fb3f695d5
|
offer continuously parameterized time embedding for diffusion prior network, remove a hyperparameter that may trip people up if not set correctly
0.0.8
|
2022-04-14 08:28:11 -07:00 |
|
Phil Wang
|
7e93b9d3c8
|
make sure classifier free guidance condition scaling is exposed on DALLE2 forward function
0.0.7
|
2022-04-13 20:14:28 -07:00 |
|
Phil Wang
|
4c827ba94f
|
typo
|
2022-04-13 19:01:03 -07:00 |
|
Phil Wang
|
cb3923a90f
|
readme tweak
|
2022-04-13 18:43:34 -07:00 |
|
Phil Wang
|
cc30676a3f
|
lengthen todo
|
2022-04-13 18:34:09 -07:00 |
|
Phil Wang
|
c7fb327618
|
link to x-clip
|
2022-04-13 18:26:30 -07:00 |
|
Phil Wang
|
14ddbc159c
|
cleanup
0.0.6a
|
2022-04-13 18:24:32 -07:00 |
|
Phil Wang
|
0692f1699f
|
favorite quote
0.0.6
|
2022-04-13 18:17:59 -07:00 |
|
Phil Wang
|
26c4534bc3
|
readme
|
2022-04-13 18:11:55 -07:00 |
|
Phil Wang
|
5e06cde4cb
|
always work in the l2normed space for image and text embeddings
0.0.5
|
2022-04-13 18:08:42 -07:00 |
|
Phil Wang
|
a1a8a78f21
|
fix everything and make sure it runs end to end, document everything in readme for public
|
2022-04-13 18:05:25 -07:00 |
|
Phil Wang
|
e5e415297c
|
prepare non-causal attention, for use in the unet in the decoder
|
2022-04-13 12:04:09 -07:00 |
|
Phil Wang
|
c9377efc93
|
go for the multi-headed queries, one-headed key/values, proven out in AlphaCode as well as PaLM by now
|
2022-04-13 12:01:43 -07:00 |
|
Phil Wang
|
2a424b6a28
|
readme
|
2022-04-13 10:58:06 -07:00 |
|
Phil Wang
|
d3cded3c6c
|
complete logic in diffusion prior for sampling more than 1 image embeds, taking top similarity
|
2022-04-13 10:52:31 -07:00 |
|
Phil Wang
|
d573c82f8c
|
add one full attention at the middle of the unet, prepare to do efficient attention employing every trick i know from vision transformer literature
|
2022-04-13 10:39:06 -07:00 |
|
Phil Wang
|
3aa6f91e7a
|
be transparent
|
2022-04-13 10:32:11 -07:00 |
|
Phil Wang
|
1bf071af78
|
allow for predicting image embedding directly during diffusion training. need to fix sampling still
|
2022-04-13 10:29:29 -07:00 |
|
Phil Wang
|
9f1fe6c7ae
|
update todo
|
2022-04-13 10:09:08 -07:00 |
|
Phil Wang
|
791d27326a
|
add diffusion code for the image embedding. nearly all the code is there except for the cascading ddpm in the decoder (with upscaling etc)
|
2022-04-13 10:06:52 -07:00 |
|
Phil Wang
|
6d4e9c97bf
|
todo
|
2022-04-12 20:50:29 -07:00 |
|
Phil Wang
|
40140b54d6
|
put on project manager hat
|
2022-04-12 17:51:23 -07:00 |
|
Phil Wang
|
33d69d3859
|
take care of DDPM decoder (DDPM for producing image embedding will have a separate objective, predicting directly the embedding rather than the noise [epsilon in paper])
|
2022-04-12 17:48:41 -07:00 |
|
Phil Wang
|
862e5ba50e
|
more sketches to base dalle2 class
|
2022-04-12 17:31:01 -07:00 |
|
Phil Wang
|
25d980ebbf
|
complete naive conditioning of unet with image embedding, with ability to dropout for classifier free guidance
|
2022-04-12 17:27:39 -07:00 |
|
Phil Wang
|
d546a615c0
|
complete helper methods for doing condition scaling (classifier free guidance), for decoder unet and prior network
|
2022-04-12 16:11:16 -07:00 |
|
Phil Wang
|
d4c8373635
|
complete conditional dropout mask creation for both prior network as well as image decoder unet for classifier free guidance
|
2022-04-12 14:04:08 -07:00 |
|
Phil Wang
|
c814b2b278
|
sponsor project button
|
2022-04-12 13:34:02 -07:00 |
|
Phil Wang
|
74aec9d8ca
|
further prepare attention for classifier free guidance
|
2022-04-12 13:01:18 -07:00 |
|
Phil Wang
|
7647be2569
|
prep for classifier free guidance for the image embedding diffusion step, even though not mentioned in paper
|
2022-04-12 12:57:09 -07:00 |
|
Phil Wang
|
59b8abe09e
|
prepare unet to be conditioned on image embedding, optionally text encodings, and reminder for self to build conditional dropout for classifier free guidance
|
2022-04-12 12:38:56 -07:00 |
|
Phil Wang
|
46dde54948
|
for integration of X-CLIP automagically in the gaussian diffusion classes
|
2022-04-12 12:17:34 -07:00 |
|
Phil Wang
|
40aa304b7e
|
rename to DiffusionPriorNetwork in case ARPriorNetwork is ever built
|
2022-04-12 11:45:57 -07:00 |
|
Phil Wang
|
fd38eb83c4
|
complete the main contribution of the paper, the diffusion prior network, minus the diffusion training setup
|
2022-04-12 11:43:59 -07:00 |
|
Phil Wang
|
83aabd42ca
|
move epsilon inside of square root for further stability in rmsnorm
improvise and use rmsnorm in convnext blocks too
|
2022-04-12 11:18:36 -07:00 |
|
Phil Wang
|
cf22affcbb
|
bring in modified unet using convnext blocks https://arxiv.org/abs/2201.03545
|
2022-04-12 10:58:44 -07:00 |
|
Phil Wang
|
522f42f582
|
start using RMSNorm, used in Gopher and AlphaCode, and as a way to go completely bias-less (purportedly more stable according to PaLM)
|
2022-04-12 10:45:03 -07:00 |
|
Phil Wang
|
0a60818965
|
dropouts in transformer, also prep for classifier free guidance in decoder
|
2022-04-12 10:42:57 -07:00 |
|
Phil Wang
|
604765b563
|
readme
|
2022-04-12 10:35:56 -07:00 |
|
Phil Wang
|
7bbc62f3d5
|
bring in pillow, for image encoding to and from
|
2022-04-12 10:29:55 -07:00 |
|
Phil Wang
|
771fe0d0d2
|
also consider accepting tokenizer, so dalle2 forward pass can just be invoked as DALLE2(<prompt string>)
|
2022-04-12 10:29:29 -07:00 |
|
Phil Wang
|
de75a8af76
|
link to yannic, since he is the best
|
2022-04-12 10:27:01 -07:00 |
|
Phil Wang
|
df4dac4f5a
|
bring in attention - it is all we need
|
2022-04-12 10:23:07 -07:00 |
|
Phil Wang
|
24b428bdfc
|
readme
|
2022-04-12 10:12:42 -07:00 |
|
Phil Wang
|
2ab042b862
|
create the eventual dream cli, like bigsleep library
|
2022-04-12 10:04:17 -07:00 |
|
Phil Wang
|
b93ad8b7a2
|
add cli file, use click
|
2022-04-12 09:58:53 -07:00 |
|
Phil Wang
|
f5e0aea140
|
get ready for CLI tool, just like stylegan2_pytorch
0.0.2
|
2022-04-12 09:57:54 -07:00 |
|