Phil Wang
|
13a58a78c4
|
scratch off todo
|
2022-04-25 19:01:30 -07:00 |
|
Phil Wang
|
3b520dfa85
|
bring in attention-based upsampling to strengthen vqgan-vae, seems to work as advertised in initial GAN experiments
|
2022-04-25 17:27:45 -07:00 |
|
Phil Wang
|
79198c6ae4
|
keep readme simple for reader
|
2022-04-25 17:21:45 -07:00 |
|
Phil Wang
|
77a246b1b9
|
todo
|
2022-04-25 08:48:28 -07:00 |
|
Phil Wang
|
f93a3f6ed8
|
reprioritize
|
2022-04-25 08:44:27 -07:00 |
|
Phil Wang
|
fb8a66a2de
|
just in case latent diffusion performs better with prediction of x0 instead of epsilon, open up the research avenue
|
2022-04-24 10:04:22 -07:00 |
|
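The commit above lets latent diffusion predict x0 instead of epsilon. Under the standard DDPM forward process (an assumption here; the commit does not spell it out), the two targets are interchangeable given x_t and the cumulative noise level alpha_bar, so either prediction can be converted into the other. The sketch below works on scalars. `training_target` and its `predict_x0` flag are hypothetical names, not the repo's API.

```python
import math

# Assumes the standard DDPM forward process (not spelled out in the commit):
#   x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps


def q_sample(x0, eps, alpha_bar):
    """Noise a clean value x0 to level alpha_bar (cumulative product of alphas)."""
    return math.sqrt(alpha_bar) * x0 + math.sqrt(1.0 - alpha_bar) * eps


def x0_from_eps(x_t, eps, alpha_bar):
    """Recover x0 when the network predicts the noise (epsilon objective)."""
    return (x_t - math.sqrt(1.0 - alpha_bar) * eps) / math.sqrt(alpha_bar)


def eps_from_x0(x_t, x0, alpha_bar):
    """Recover the noise when the network predicts the clean sample (x0 objective)."""
    return (x_t - math.sqrt(alpha_bar) * x0) / math.sqrt(1.0 - alpha_bar)


def training_target(x0, eps, predict_x0):
    """Hypothetical switch: regress the network output onto x0 or onto eps."""
    return x0 if predict_x0 else eps


x0, eps, alpha_bar = 0.7, -1.2, 0.35
x_t = q_sample(x0, eps, alpha_bar)
print(round(x0_from_eps(x_t, eps, alpha_bar), 6))
print(round(eps_from_x0(x_t, x0, alpha_bar), 6))
```

Both round trips recover the original values. Whichever target the network regresses onto, the sampler can still derive the other quantity it needs.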
Phil Wang
|
d5318aef4f
|
todo
|
2022-04-23 08:23:08 -07:00 |
|
Phil Wang
|
a8b5d5d753
|
last tweak of readme
|
2022-04-22 14:16:43 -07:00 |
|
Phil Wang
|
976ef7f87c
|
project management
|
2022-04-22 14:15:42 -07:00 |
|
Phil Wang
|
fd175bcc0e
|
readme
|
2022-04-22 14:13:33 -07:00 |
|
Phil Wang
|
76b32f18b3
|
first pass at complete DALL-E2 + Latent Diffusion integration, latent diffusion on any layer(s) of the cascading ddpm in the decoder.
|
2022-04-22 13:53:13 -07:00 |
|
Phil Wang
|
f2d5b87677
|
todo
|
2022-04-22 11:39:58 -07:00 |
|
Phil Wang
|
ad17c69ab6
|
prepare for latent diffusion in the first DDPM of the cascade in the Decoder
|
2022-04-21 17:54:31 -07:00 |
|
Phil Wang
|
0b4ec34efb
|
todo
|
2022-04-20 12:24:23 -07:00 |
|
Phil Wang
|
f027b82e38
|
remove wip as main networks (prior and decoder) are completed
|
2022-04-20 12:12:16 -07:00 |
|
Kashif Rasul
|
1d8f37befe
|
added diffusion-gan thoughts
https://github.com/NVlabs/denoising-diffusion-gan
|
2022-04-20 21:01:11 +02:00 |
|
Phil Wang
|
b8e8d3c164
|
thoughts
|
2022-04-20 11:34:51 -07:00 |
|
Phil Wang
|
8e2416b49b
|
commit to generalizing latent diffusion to one model
|
2022-04-20 11:27:42 -07:00 |
|
Phil Wang
|
27a33e1b20
|
complete contextmanager method for keeping only one unet in GPU during training or inference
|
2022-04-20 10:46:13 -07:00 |
|
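The commit above adds a context manager that keeps only one unet of the cascade on the GPU at a time. The sketch below shows the pattern with a stand-in class instead of PyTorch modules. `FakeUnet`, `CascadeSketch` and `only_unet_on_gpu` are hypothetical names. Device strings are just recorded, not real device moves.

```python
from contextlib import contextmanager


class FakeUnet:
    """Stand-in for an nn.Module; it only tracks which device it lives on."""

    def __init__(self, name):
        self.name = name
        self.device = "cpu"

    def to(self, device):
        self.device = device
        return self


class CascadeSketch:
    def __init__(self, unets):
        self.unets = unets

    @contextmanager
    def only_unet_on_gpu(self, index, device="cuda"):
        # remember where every unet was so it can be restored on exit
        previous = [u.device for u in self.unets]
        for u in self.unets:
            u.to("cpu")
        unet = self.unets[index].to(device)
        try:
            yield unet
        finally:
            # restore the original placement even if training/inference raised
            for u, d in zip(self.unets, previous):
                u.to(d)


cascade = CascadeSketch([FakeUnet("64px"), FakeUnet("256px"), FakeUnet("1024px")])
with cascade.only_unet_on_gpu(1) as unet:
    print(unet.name, [u.device for u in cascade.unets])
print([u.device for u in cascade.unets])
```

Only the unet being trained or sampled occupies GPU memory inside the `with` block. The `finally` clause puts every unet back where it was.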
Phil Wang
|
6f941a219a
|
give time tokens a surface area of 2 tokens as default, make it so researcher can customize which unet actually is conditioned on image embeddings and/or text encodings
|
2022-04-20 10:04:47 -07:00 |
|
Phil Wang
|
c26b77ad20
|
todo
|
2022-04-19 13:07:32 -07:00 |
|
Phil Wang
|
c5b4aab8e5
|
intent
|
2022-04-19 11:00:05 -07:00 |
|
Phil Wang
|
a35c309b5f
|
add sparse attention layers in between ConvNeXt blocks in unet (grid-like attention, used in MobileViT, MaxViT [bytedance ai], as well as a growing number of attention-based GANs)
|
2022-04-19 09:49:03 -07:00 |
|
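The grid-like attention named in the commit above can be illustrated by which positions attend to each other. In MaxViT-style grid attention, each group holds positions spaced evenly across the whole feature map. This makes attention sparse but still global, in contrast to block attention over local windows. The sketch below only computes the groupings on a height x width map. `grid_groups` and `block_groups` are hypothetical helper names, and no attention is computed.

```python
def grid_groups(height, width, grid):
    """Dilated groups: each holds grid*grid positions spaced
    height//grid rows and width//grid columns apart."""
    assert height % grid == 0 and width % grid == 0
    step_h, step_w = height // grid, width // grid
    groups = []
    for x in range(step_h):
        for y in range(step_w):
            groups.append([(x + i * step_h, y + j * step_w)
                           for i in range(grid) for j in range(grid)])
    return groups


def block_groups(height, width, window):
    """Local groups: non-overlapping window x window patches."""
    assert height % window == 0 and width % window == 0
    groups = []
    for bx in range(0, height, window):
        for by in range(0, width, window):
            groups.append([(bx + i, by + j)
                           for i in range(window) for j in range(window)])
    return groups


for group in grid_groups(4, 4, 2):
    print(group)
```

On a 4x4 map with grid 2, the first grid group is (0, 0), (0, 2), (2, 0), (2, 2). The first block group is the local 2x2 patch. Both schemes cover every position exactly once, so full attention within each group costs far less than attention over the whole map.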
Phil Wang
|
a54e309269
|
prioritize todos, play project management
|
2022-04-18 13:28:01 -07:00 |
|
Phil Wang
|
c6bfd7fdc8
|
readme
|
2022-04-18 12:43:10 -07:00 |
|
Phil Wang
|
960a79857b
|
use some magic just this once to remove the need for researchers to think
|
2022-04-18 12:40:43 -07:00 |
|
Phil Wang
|
7214df472d
|
todo
|
2022-04-18 12:18:19 -07:00 |
|
Phil Wang
|
00ae50999b
|
make kernel size and sigma for Gaussian blur for cascading DDPM overridable at forward. Also make sure unets are wrapped in a ModuleList so that at sample time, blurring does not happen
|
2022-04-18 12:04:31 -07:00 |
|
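The commit above blurs the low-resolution conditioning image for later stages of the cascade during training. The kernel size and sigma can be overridden per call, and no blur is applied at sample time. The 1D sketch below shows the Gaussian kernel and that control flow. A 2D blur would apply the same kernel along rows and then columns. Replicate padding, the default values, and the name `lowres_conditioning` are assumptions, not taken from the repo.

```python
import math


def gaussian_kernel_1d(kernel_size, sigma):
    """Normalized 1D Gaussian weights centered on the middle tap."""
    assert kernel_size % 2 == 1, "an odd size keeps the kernel centered"
    half = kernel_size // 2
    weights = [math.exp(-(x * x) / (2.0 * sigma * sigma)) for x in range(-half, half + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_1d(signal, kernel_size, sigma):
    """Convolve a 1D signal with the Gaussian, replicating edge values as padding."""
    kernel = gaussian_kernel_1d(kernel_size, sigma)
    half = kernel_size // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - half, 0), n - 1)  # clamped index = replicate padding
            acc += w * signal[j]
        out.append(acc)
    return out


# Hypothetical defaults; the commit does not state the repo's values.
DEFAULT_KERNEL_SIZE, DEFAULT_SIGMA = 3, 0.6


def lowres_conditioning(signal, training, kernel_size=None, sigma=None):
    """Blur only during training; per-call arguments override the defaults."""
    if not training:
        return list(signal)  # sample time: pass the conditioning image through
    kernel_size = DEFAULT_KERNEL_SIZE if kernel_size is None else kernel_size
    sigma = DEFAULT_SIGMA if sigma is None else sigma
    return blur_1d(signal, kernel_size, sigma)


print(lowres_conditioning([0.0, 1.0, 0.0], training=True, kernel_size=3, sigma=1.0))
print(lowres_conditioning([0.0, 1.0, 0.0], training=False))
```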
Phil Wang
|
6cddefad26
|
readme
|
2022-04-18 11:52:25 -07:00 |
|
Phil Wang
|
0332eaa6ff
|
complete first pass at full cascading DDPM setup in Decoder, flexible enough to support one unet for testing
|
2022-04-18 11:44:56 -07:00 |
|
Phil Wang
|
1e939153fb
|
link to AssemblyAI explanation
|
2022-04-15 12:58:57 -07:00 |
|
Phil Wang
|
1abeb8918e
|
personal project management for next week
|
2022-04-15 08:04:01 -07:00 |
|
Phil Wang
|
b423855483
|
commit to jax version
|
2022-04-15 07:16:25 -07:00 |
|
Phil Wang
|
5b4ee09625
|
ideation
|
2022-04-14 13:48:01 -07:00 |
|
Phil Wang
|
9f55c24db6
|
allow for decoder conditioning with the text encodings from CLIP, if it is passed in. use lazy linear to avoid researchers having to worry about text encoding dimensions, but remove later if it does not work well
|
2022-04-14 11:46:45 -07:00 |
|
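The commit above uses a lazy linear layer, which infers its input width from the first batch it sees. This means the researcher never has to pass the CLIP text encoding dimension. PyTorch provides this as `torch.nn.LazyLinear(out_features)`. The stdlib-only sketch below mimics the idea. `LazyLinearSketch` is a hypothetical class, uses no bias, and its uniform initialization bound is an assumption.

```python
import math
import random


class LazyLinearSketch:
    """Linear map whose input width is read from the first input
    instead of being passed to the constructor."""

    def __init__(self, out_features, seed=0):
        self.out_features = out_features
        self.weight = None  # materialized on the first call
        self.rng = random.Random(seed)

    def __call__(self, x):
        if self.weight is None:
            in_features = len(x)
            bound = 1.0 / math.sqrt(in_features)
            self.weight = [
                [self.rng.uniform(-bound, bound) for _ in range(in_features)]
                for _ in range(self.out_features)
            ]
        assert len(x) == len(self.weight[0]), "input width is fixed after the first call"
        return [sum(w * xi for w, xi in zip(row, x)) for row in self.weight]


# e.g. project a 512-dim text encoding to a hypothetical 4-dim conditioning space
layer = LazyLinearSketch(out_features=4)
out = layer([0.01] * 512)
print(len(out), len(layer.weight[0]))  # 4 512
```

The trade-off the commit hints at is that the shape is fixed only after the first forward pass. Anything that inspects parameters before then, such as building an optimizer, sees an uninitialized layer.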
Phil Wang
|
69e822b7f8
|
"project management"
|
2022-04-14 10:20:37 -07:00 |
|
Phil Wang
|
68e9883f59
|
use cross attention for conditioning unet based on image embedding tokens (which opens up the door on conditioning on text encodings as well)
|
2022-04-14 10:10:04 -07:00 |
|
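The commit above conditions the unet through cross attention. Unet feature vectors act as queries over a sequence of context tokens. This is why text encodings can be added later simply by appending them to the image embedding tokens. The sketch below is deliberately simplified: it uses a single head, and the context tokens serve directly as keys and values with no learned projections. `cross_attention` is a hypothetical helper, not the repo's module.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross_attention(queries, context):
    """Each query attends over all context tokens; output is the
    attention-weighted average of the context tokens."""
    d = len(queries[0])
    scale = 1.0 / math.sqrt(d)
    outputs = []
    for q in queries:
        weights = softmax([dot(q, k) * scale for k in context])
        outputs.append([sum(w * v[i] for w, v in zip(weights, context)) for i in range(d)])
    return outputs


image_tokens = [[1.0, 0.0], [0.0, 1.0]]
text_tokens = [[0.5, 0.5]]
features = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
# conditioning on text as well only makes the context sequence longer
out = cross_attention(features, image_tokens + text_tokens)
print(len(out), len(out[0]))  # 3 2
```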
Phil Wang
|
e1b0c140f1
|
cleanup readme
|
2022-04-14 08:51:22 -07:00 |
|
Phil Wang
|
5989569a44
|
link to OpenCLIP effort
|
2022-04-14 08:31:15 -07:00 |
|
Phil Wang
|
7fb3f695d5
|
offer continuously parameterized time embedding for diffusion prior network, remove a hyperparameter that may trip people up if not set correctly
|
2022-04-14 08:28:11 -07:00 |
|
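The commit above replaces a discrete time embedding with a continuously parameterized one. A learned lookup table has to be sized by the number of timesteps, which is likely the hyperparameter the commit calls easy to set incorrectly. A function of t avoids that table. The commit does not say which function the prior uses. The sketch below swaps in the common sinusoidal embedding as one example. `sinusoidal_time_embedding` and `max_period` are illustrative names.

```python
import math


def sinusoidal_time_embedding(t, dim, max_period=10000.0):
    """Map a real-valued timestep t to a dim-sized vector of sines and
    cosines at geometrically spaced frequencies; no lookup table needed."""
    assert dim % 2 == 0
    half = dim // 2
    freqs = [math.exp(-math.log(max_period) * i / half) for i in range(half)]
    args = [t * f for f in freqs]
    return [math.sin(a) for a in args] + [math.cos(a) for a in args]


print(sinusoidal_time_embedding(0.0, 4))
print(sinusoidal_time_embedding(12.5, 4))  # fractional timesteps work too
```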
Phil Wang
|
7e93b9d3c8
|
make sure classifier free guidance condition scaling is exposed on DALLE2 forward function
|
2022-04-13 20:14:28 -07:00 |
|
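The commit above exposes the classifier-free guidance scale on the DALLE2 forward function. The scale blends two predictions. One comes from the network given the condition, and the other comes from the same network given the null (dropped) condition. A scale of 1 returns the conditional prediction, and larger values push past it. The sketch below shows that arithmetic. The `model(x, cond)` signature and `toy_model` are hypothetical stand-ins.

```python
def guided_prediction(cond_pred, null_pred, cond_scale):
    """Move from the unconditional prediction toward the conditional one,
    extrapolating beyond it when cond_scale > 1."""
    return [n + (c - n) * cond_scale for c, n in zip(cond_pred, null_pred)]


def sample_step(model, x, cond, cond_scale=1.0):
    """cond=None stands for the null condition seen during training dropout."""
    cond_pred = model(x, cond)
    if cond_scale == 1.0:
        return cond_pred  # no extra forward pass needed at scale 1
    null_pred = model(x, None)
    return guided_prediction(cond_pred, null_pred, cond_scale)


def toy_model(x, cond):
    # stand-in network: shifts its input by the condition when present
    return [xi + (cond if cond is not None else 0.0) for xi in x]


print(sample_step(toy_model, [1.0], 0.5, cond_scale=3.0))  # [2.5]
```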
Phil Wang
|
4c827ba94f
|
typo
|
2022-04-13 19:01:03 -07:00 |
|
Phil Wang
|
cb3923a90f
|
readme tweak
|
2022-04-13 18:43:34 -07:00 |
|
Phil Wang
|
cc30676a3f
|
lengthen todo
|
2022-04-13 18:34:09 -07:00 |
|
Phil Wang
|
c7fb327618
|
link to x-clip
|
2022-04-13 18:26:30 -07:00 |
|
Phil Wang
|
14ddbc159c
|
cleanup
|
2022-04-13 18:24:32 -07:00 |
|
Phil Wang
|
0692f1699f
|
favorite quote
|
2022-04-13 18:17:59 -07:00 |
|
Phil Wang
|
26c4534bc3
|
readme
|
2022-04-13 18:11:55 -07:00 |
|
Phil Wang
|
a1a8a78f21
|
fix everything and make sure it runs end to end, document everything in readme for public
|
2022-04-13 18:05:25 -07:00 |
|
Phil Wang
|
2a424b6a28
|
readme
|
2022-04-13 10:58:06 -07:00 |
|