ProGamerGov | 63450b466d | Fix spelling and grammatical errors | 2022-04-30 09:18:13 -06:00
Phil Wang | e2f9615afa | use @clip-anytorch, thanks to @rom1504 | 2022-04-30 06:40:54 -07:00
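For context on the dependency switch in e2f9615afa: clip-anytorch is @rom1504's pip-installable packaging of OpenAI CLIP, which the repo's `OpenAIClipAdapter` wraps. A minimal sketch of the intended usage, patterned on the project README of the time (constructor arguments are illustrative and may have drifted):

```python
import torch
from dalle2_pytorch import OpenAIClipAdapter, DiffusionPriorNetwork, DiffusionPrior

# the adapter now pulls in OpenAI CLIP via the pip-installable clip-anytorch package
clip = OpenAIClipAdapter('ViT-B/32')

prior_network = DiffusionPriorNetwork(dim=512, depth=6, dim_head=64, heads=8)
prior = DiffusionPrior(net=prior_network, clip=clip, timesteps=100, cond_drop_prob=0.2)

text = torch.randint(0, 49408, (4, 256))  # mock token ids
images = torch.randn(4, 3, 256, 256)      # mock images

loss = prior(text, images)
loss.backward()
```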
Phil Wang | a389f81138 | todo | 2022-04-29 15:40:51 -07:00
Phil Wang | 0283556608 | fix example in readme, since the API changed | 2022-04-29 13:40:55 -07:00
Phil Wang | 5063d192b6 | now completely OpenAI CLIP compatible for training; just take care of the logic for AdamW and transformers; use namedtuples for the CLIP adapter embedding outputs | 2022-04-29 13:05:01 -07:00
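On the namedtuple change in 5063d192b6: returning named fields lets callers unpack adapter outputs by name instead of position. A minimal sketch of the idea; the type and field names below are assumptions, not necessarily the identifiers used in the repository:

```python
from collections import namedtuple

# illustrative names; the repo's actual fields at this commit may differ
EmbedTextReturn  = namedtuple('EmbedTextReturn',  ['text_embed', 'text_encodings'])
EmbedImageReturn = namedtuple('EmbedImageReturn', ['image_embed', 'image_encodings'])

# callers can then unpack by name rather than by position, e.g.
#   text_embed, text_encodings = clip.embed_text(tokens)
#   image_embed, _ = clip.embed_image(images)
```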
Phil Wang | 6700381a37 | prepare for the ability to integrate CLIP implementations other than x-clip | 2022-04-27 19:35:05 -07:00
Phil Wang | 20377f889a | todo | 2022-04-27 17:22:14 -07:00
Phil Wang | b093f92182 | inform what is possible | 2022-04-27 08:25:16 -07:00
Phil Wang | 2705e7c9b0 | attention-based upsampling claims unsupported by local experiments; removing | 2022-04-27 07:51:04 -07:00
Phil Wang | 77141882c8 | complete ViT-VQGAN from https://arxiv.org/abs/2110.04627 | 2022-04-26 17:20:47 -07:00
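The paper referenced in 77141882c8 (Yu et al., "Vector-quantized Image Modeling with Improved VQGAN") swaps the convolutional VQGAN encoder/decoder for vision transformers. A rough usage sketch for the resulting autoencoder; the constructor arguments here are assumptions for illustration, so check the repo for the real ones:

```python
import torch
from dalle2_pytorch import VQGanVAE

vae = VQGanVAE(
    dim=32,
    image_size=256,
    layers=3,             # number of downsampling stages
    vq_codebook_size=512  # size of the discrete latent codebook
)

images = torch.randn(1, 3, 256, 256)
loss = vae(images, return_loss=True)  # reconstruction + codebook losses
loss.backward()
```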
Phil Wang | 4075d02139 | never mind, it could be working, but only when stabilized with the feedforward layer + tanh proposed in the ViT-VQGAN paper (to be built into the repository later for the latent diffusion) | 2022-04-26 12:43:31 -07:00
Phil Wang | bfbcc283a3 | DRY up the gaussian-diffusion-related logic a tiny bit | 2022-04-26 11:39:12 -07:00
Phil Wang | c30544b73a | remove the need for CLIP altogether when training the DiffusionPrior | 2022-04-26 10:23:41 -07:00
Phil Wang | bdf5e9c009 | todo | 2022-04-26 09:56:54 -07:00
Phil Wang | 9878be760b | have the researcher explicitly state upfront whether to condition on text encodings in the cascading DDPM decoder; have the DALLE-2 class take care of passing in the text when the feature is turned on | 2022-04-26 09:47:09 -07:00
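A sketch of the explicit opt-in described in 9878be760b; the flag name `cond_on_text_encodings` matches the repo's later public API, though it may have differed at this exact commit:

```python
from dalle2_pytorch import Unet

unet = Unet(
    dim=128,
    image_embed_dim=512,
    cond_on_text_encodings=True,  # stated upfront; DALLE2 then knows to pass text through
    dim_mults=(1, 2, 4, 8)
)
```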
Phil Wang | 7ba6357c05 | allow training the Prior network with precomputed CLIP embeddings (or text encodings) | 2022-04-26 09:29:51 -07:00
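Training the prior from precomputed embeddings (7ba6357c05, together with c30544b73a above) means no CLIP forward pass, and no CLIP instance at all, is needed in the training loop. A sketch patterned on the repo's README; keyword names may have drifted since this commit:

```python
import torch
from dalle2_pytorch import DiffusionPriorNetwork, DiffusionPrior

prior_network = DiffusionPriorNetwork(dim=512, depth=6, dim_head=64, heads=8)

prior = DiffusionPrior(
    net=prior_network,
    image_embed_dim=512,               # stated explicitly, since there is no CLIP to infer it from
    timesteps=100,
    cond_drop_prob=0.2,
    condition_on_text_encodings=False  # embeddings only, no token-level encodings
)

text_embed  = torch.randn(4, 512)  # precomputed CLIP text embeddings
image_embed = torch.randn(4, 512)  # precomputed CLIP image embeddings

loss = prior(text_embed=text_embed, image_embed=image_embed)
loss.backward()
```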
Phil Wang | 0b28ee0d01 | revert to the old upsampling; the paper's method does not work | 2022-04-26 07:39:04 -07:00
Phil Wang | 13a58a78c4 | scratch off todo | 2022-04-25 19:01:30 -07:00
Phil Wang | 3b520dfa85 | bring in attention-based upsampling to strengthen the VQGAN-VAE; it seems to work as advertised in initial GAN experiments | 2022-04-25 17:27:45 -07:00
Phil Wang | 79198c6ae4 | keep the readme simple for the reader | 2022-04-25 17:21:45 -07:00
Phil Wang | 77a246b1b9 | todo | 2022-04-25 08:48:28 -07:00
Phil Wang | f93a3f6ed8 | reprioritize | 2022-04-25 08:44:27 -07:00
Phil Wang | fb8a66a2de | open up the research avenue of predicting x0 instead of epsilon, just in case latent diffusion performs better that way | 2022-04-24 10:04:22 -07:00
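The two parameterizations in fb8a66a2de are interchangeable in principle: with the epsilon objective the network predicts the noise and the clean image x0 is recovered from it; with the x0 objective the network outputs x0 directly. A plain-torch sketch of the conversion (the `predict_x_start` flag name follows the repo's later API):

```python
import torch

def x0_from_noise(x_t, t, pred_noise, alphas_cumprod):
    # epsilon parameterization: recover the clean-image estimate via
    #   x0 = (x_t - sqrt(1 - alpha_bar_t) * eps) / sqrt(alpha_bar_t)
    alpha_bar = alphas_cumprod[t]
    return (x_t - (1 - alpha_bar).sqrt() * pred_noise) / alpha_bar.sqrt()

# with predict_x_start=True the network outputs x0 directly and the
# conversion above is skipped entirely
```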
Phil Wang | d5318aef4f | todo | 2022-04-23 08:23:08 -07:00
Phil Wang | a8b5d5d753 | last tweak of the readme | 2022-04-22 14:16:43 -07:00
Phil Wang | 976ef7f87c | project management | 2022-04-22 14:15:42 -07:00
Phil Wang | fd175bcc0e | readme | 2022-04-22 14:13:33 -07:00
Phil Wang | 76b32f18b3 | first pass at a complete DALL-E2 + latent diffusion integration: latent diffusion on any layer(s) of the cascading DDPM in the decoder | 2022-04-22 13:53:13 -07:00
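On 76b32f18b3: any stage of the decoder cascade can be handed a VQGAN-VAE, making that unet diffuse in the VAE's latent space while the remaining stages stay in pixel space. A sketch of the wiring, patterned on the repo's README; the exact argument names and per-stage pairing rules are assumptions:

```python
from dalle2_pytorch import Unet, Decoder, VQGanVAE, OpenAIClipAdapter

clip = OpenAIClipAdapter()

vae1 = VQGanVAE(dim=32, image_size=256, layers=3)  # latent space for the first stage

unet1 = Unet(dim=32, image_embed_dim=512, cond_dim=128, dim_mults=(1, 2, 4, 8))
unet2 = Unet(dim=32, image_embed_dim=512, cond_dim=128, dim_mults=(1, 2, 4, 8))

decoder = Decoder(
    clip=clip,
    vae=(vae1,),             # fewer vaes than unets: later stages run in pixel space
    unet=(unet1, unet2),
    image_sizes=(256, 512),
    timesteps=100
)
```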
Phil Wang | f2d5b87677 | todo | 2022-04-22 11:39:58 -07:00
Phil Wang | ad17c69ab6 | prepare for latent diffusion in the first DDPM of the cascade in the Decoder | 2022-04-21 17:54:31 -07:00
Phil Wang | 0b4ec34efb | todo | 2022-04-20 12:24:23 -07:00
Phil Wang | f027b82e38 | remove WIP status, as the main networks (prior and decoder) are complete | 2022-04-20 12:12:16 -07:00
Kashif Rasul | 1d8f37befe | added diffusion-GAN thoughts: https://github.com/NVlabs/denoising-diffusion-gan | 2022-04-20 21:01:11 +02:00
Phil Wang | b8e8d3c164 | thoughts | 2022-04-20 11:34:51 -07:00
Phil Wang | 8e2416b49b | commit to generalizing latent diffusion to one model | 2022-04-20 11:27:42 -07:00
Phil Wang | 27a33e1b20 | complete the contextmanager method for keeping only one unet in GPU memory during training or inference | 2022-04-20 10:46:13 -07:00
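A minimal sketch of the 27a33e1b20 context manager: move a single unet of the cascade onto the GPU, yield it, and move it back afterward, so only one stage resides in GPU memory at a time. The name and signature below are an approximation of the repo's method:

```python
from contextlib import contextmanager

@contextmanager
def one_unet_in_gpu(unets, index):
    unet = unets[index].cuda()  # only the unet in use goes to GPU
    try:
        yield unet
    finally:
        unet.cpu()              # evict it once the block exits

# hypothetical usage while sampling the cascade:
#   with one_unet_in_gpu(decoder.unets, index=0) as unet:
#       img = unet_sample_step(unet, img)
```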
Phil Wang | 6f941a219a | give time tokens a surface area of 2 tokens by default; let the researcher customize which unets are conditioned on image embeddings and/or text encodings | 2022-04-20 10:04:47 -07:00
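A sketch of the per-unet switches in 6f941a219a; the flag names (`num_time_tokens`, `cond_on_image_embeds`, `cond_on_text_encodings`) follow the repo's later public API and are assumptions for this exact commit:

```python
from dalle2_pytorch import Unet

unet1 = Unet(
    dim=128,
    image_embed_dim=512,
    num_time_tokens=2,             # time conditioning gets 2 tokens of surface area
    cond_on_image_embeds=True,     # this stage attends to the CLIP image embedding
    cond_on_text_encodings=False,  # ... but not to the text encodings
    dim_mults=(1, 2, 4, 8)
)
```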
Phil Wang | c26b77ad20 | todo | 2022-04-19 13:07:32 -07:00
Phil Wang | c5b4aab8e5 | intent | 2022-04-19 11:00:05 -07:00
Phil Wang | a35c309b5f | add sparse attention layers between the convnext blocks in the unet (grid-like attention, as used in MobileViT, MaxViT, and a growing number of attention-based GANs) | 2022-04-19 09:49:03 -07:00
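In the grid-like attention of a35c309b5f, each attention group gathers one token from every g x g cell, so attention mixes strided positions spanning the whole feature map rather than a local window. A shape-level sketch with einops; the `attn` argument is a stand-in for any (batch, seq, dim) -> (batch, seq, dim) module:

```python
import torch
from einops import rearrange

def grid_attention(fmap, attn, grid_size=8):
    # fmap: (b, c, H, W) with H and W divisible by grid_size
    b, c, H, W = fmap.shape
    g = grid_size
    # group strided positions together: tokens in one group span the whole map
    x = rearrange(fmap, 'b c (g1 h) (g2 w) -> (b h w) (g1 g2) c', g1=g, g2=g)
    x = attn(x)
    return rearrange(x, '(b h w) (g1 g2) c -> b c (g1 h) (g2 w)',
                     b=b, h=H // g, w=W // g, g1=g, g2=g)

# smoke test with an identity "attention" layer
out = grid_attention(torch.randn(1, 64, 32, 32), torch.nn.Identity(), grid_size=8)
```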
Phil Wang | a54e309269 | prioritize todos, play project manager | 2022-04-18 13:28:01 -07:00
Phil Wang | c6bfd7fdc8 | readme | 2022-04-18 12:43:10 -07:00
Phil Wang | 960a79857b | use some magic just this once to remove the need for researchers to think | 2022-04-18 12:40:43 -07:00
Phil Wang | 7214df472d | todo | 2022-04-18 12:18:19 -07:00
Phil Wang | 00ae50999b | make the kernel size and sigma of the gaussian blur for the cascading DDPM overridable at forward; also wrap the unets in a ModuleList so that no blurring happens at sample time | 2022-04-18 12:04:31 -07:00
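On 00ae50999b: cascading DDPMs blur the low-resolution conditioning image while training the upsampler stages, and this commit makes the blur parameters a per-call decision rather than a construction-time constant. A small self-contained sketch of that pattern (function and argument names are illustrative):

```python
import torch
from torchvision.transforms.functional import gaussian_blur

def blur_lowres_cond(images, kernel_size=3, sigma=0.6):
    # defaults live here, but every forward call may override them
    return gaussian_blur(images, kernel_size=kernel_size, sigma=sigma)

images = torch.randn(4, 3, 64, 64)
blurred = blur_lowres_cond(images, kernel_size=5, sigma=0.8)  # per-call override
```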
Phil Wang | 6cddefad26 | readme | 2022-04-18 11:52:25 -07:00
Phil Wang | 0332eaa6ff | complete the first pass at the full cascading DDPM setup in the Decoder, flexible enough to support a single unet for testing | 2022-04-18 11:44:56 -07:00
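A sketch of the cascading setup completed in 0332eaa6ff, patterned on the repo's README; per the commit, passing a single unet also works for quick testing:

```python
import torch
from dalle2_pytorch import Unet, Decoder, OpenAIClipAdapter

clip = OpenAIClipAdapter()

unet1 = Unet(dim=32, image_embed_dim=512, cond_dim=128, dim_mults=(1, 2, 4, 8))
unet2 = Unet(dim=16, image_embed_dim=512, cond_dim=128, dim_mults=(1, 2, 4, 8, 16))

decoder = Decoder(
    unet=(unet1, unet2),     # each successive unet upsamples the previous output
    image_sizes=(128, 256),  # target resolution of each stage
    clip=clip,
    timesteps=100
)

images = torch.randn(4, 3, 256, 256)
loss = decoder(images, unet_number=1)  # train one stage of the cascade at a time
loss.backward()
```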
Phil Wang | 1e939153fb | link to AssemblyAI explanation | 2022-04-15 12:58:57 -07:00
Phil Wang | 1abeb8918e | personal project management for next week | 2022-04-15 08:04:01 -07:00
Phil Wang | b423855483 | commit to jax version | 2022-04-15 07:16:25 -07:00