Author | Commit | Message | Date
Phil Wang | 1cce4225eb | 0.0.18 | 2022-04-17 07:29:34 -07:00
Phil Wang | c400d8758c | prepare for cascading diffusion in unet, save the full progressive upsampling architecture to be built next week | 2022-04-15 07:03:28 -07:00
Phil Wang | bece206699 | fix bug thanks to @jihoonerd | 2022-04-15 06:44:40 -07:00
Phil Wang | 6e27f617f1 | use t5 relative positional bias in prior network causal transformer, since it makes more sense than rotary embeddings | 2022-04-14 12:01:09 -07:00
Phil Wang | 9f55c24db6 | allow for decoder conditioning with the text encodings from CLIP, if it is passed in. use lazy linear to avoid researchers having to worry about text encoding dimensions, but remove later if it does not work well | 2022-04-14 11:46:45 -07:00
Phil Wang | 23c401a5d5 | use the eval decorator | 2022-04-14 10:13:43 -07:00
Phil Wang | 68e9883f59 | use cross attention for conditioning unet based on image embedding tokens (which opens up the door on conditioning on text encodings as well) | 2022-04-14 10:10:04 -07:00
Phil Wang | 95b018374a | start using swish glu everywhere, given success of PaLM | 2022-04-14 09:34:32 -07:00
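The "swish glu" (SwiGLU) activation adopted in the commit above, following PaLM, replaces a plain feedforward activation with a gated variant: the projected hidden vector is split in half, and one half gates the other through SiLU. A minimal pure-Python sketch of the elementwise math (the repository implements this as a PyTorch module; the function names here are illustrative):

```python
import math

def silu(g: float) -> float:
    """SiLU / swish activation: g * sigmoid(g)."""
    return g * (1.0 / (1.0 + math.exp(-g)))

def swiglu(hidden: list[float]) -> list[float]:
    """SwiGLU: split the projected hidden vector in half,
    then gate the first half with SiLU of the second half,
    elementwise: out[i] = value[i] * silu(gate[i])."""
    half = len(hidden) // 2
    value, gate = hidden[:half], hidden[half:]
    return [v * silu(g) for v, g in zip(value, gate)]
```

In a transformer feedforward block this typically means one linear layer projects to twice the hidden width, SwiGLU halves it back, and a second linear layer projects down.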
Phil Wang | f2c52d8239 | fix bug with classifier free guidance for prior network, even though it seems it may not be used | 2022-04-14 09:21:51 -07:00
Phil Wang | 97e951221b | bring in blur, as it will be used somewhere in the cascading DDPM in the decoder eventually, once i figure it out | 2022-04-14 09:16:09 -07:00
Phil Wang | 7fb3f695d5 | offer continuously parameterized time embedding for diffusion prior network, remove a hyperparameter that may trip up people, if not set correctly | 2022-04-14 08:28:11 -07:00
Phil Wang | 7e93b9d3c8 | make sure classifier free guidance condition scaling is exposed on DALLE2 forward function | 2022-04-13 20:14:28 -07:00
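The condition scaling mentioned in the commit above refers to classifier-free guidance: at sampling time the model's conditioned and unconditioned (null-condition) predictions are combined, and the scale controls how far the result is pushed toward the condition. A minimal sketch of the combination rule (function and parameter names here are hypothetical, not the repository's API):

```python
def cfg_combine(cond_pred: list[float],
                null_pred: list[float],
                scale: float) -> list[float]:
    """Classifier-free guidance: extrapolate from the unconditioned
    prediction toward the conditioned one.
    pred = null + scale * (cond - null); scale = 1 recovers cond_pred,
    scale > 1 strengthens the conditioning signal."""
    return [n + scale * (c - n) for c, n in zip(cond_pred, null_pred)]
```

Exposing the scale on the forward/sampling call lets users trade sample diversity for condition fidelity without retraining.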
Phil Wang | 14ddbc159c | cleanup | 2022-04-13 18:24:32 -07:00
Phil Wang | 5e06cde4cb | always work in the l2normed space for image and text embeddings | 2022-04-13 18:08:42 -07:00
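Working in the "l2normed space", as in the commit above, means every image and text embedding is rescaled to unit L2 norm, so dot products between embeddings reduce to cosine similarity, matching CLIP's convention. A minimal sketch on plain Python lists (the repository operates on PyTorch tensors):

```python
import math

def l2norm(vec: list[float], eps: float = 1e-12) -> list[float]:
    """Project an embedding onto the unit hypersphere by dividing
    by its L2 norm (eps guards against division by zero)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / max(norm, eps) for v in vec]
```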
Phil Wang | a1a8a78f21 | fix everything and make sure it runs end to end, document everything in readme for public | 2022-04-13 18:05:25 -07:00
Phil Wang | 791d27326a | add diffusion code for the image embedding. nearly all the code is there except for the cascading ddpm in the decoder (with upscaling etc) | 2022-04-13 10:06:52 -07:00
Phil Wang | 33d69d3859 | take care of DDPM decoder (DDPM for producing image embedding will have a separate objective, predicting directly the embedding rather than the noise [epsilon in paper]) | 2022-04-12 17:48:41 -07:00
Phil Wang | 46dde54948 | for integration of X-CLIP automagically in the gaussian diffusion classes | 2022-04-12 12:17:34 -07:00
Phil Wang | fd38eb83c4 | complete the main contribution of the paper, the diffusion prior network, minus the diffusion training setup | 2022-04-12 11:43:59 -07:00
Phil Wang | 7bbc62f3d5 | bring in pillow, for image encoding to and from | 2022-04-12 10:29:55 -07:00
Phil Wang | 2ab042b862 | create the eventual dream cli, like bigsleep library | 2022-04-12 10:04:17 -07:00
Phil Wang | f5e0aea140 | get ready for CLI tool, just like stylegan2_pytorch | 2022-04-12 09:57:54 -07:00
Phil Wang | 7cf1637d24 | bring in the simple tokenizer released by openai, but also plan on leaving room for custom tokenizer with yttm | 2022-04-12 09:23:17 -07:00
Phil Wang | 4ff6d021c9 | pin to newer version of CLIP that returns encoded text and images, get some helper functions ready for XCLIP | 2022-04-12 08:54:47 -07:00
Phil Wang | 850271e2d9 | bring in x-clip | 2022-04-08 12:19:31 -07:00
Phil Wang | f283bf25be | scaffold | 2022-04-07 07:29:34 -07:00