Phil Wang | 5063d192b6 | now completely OpenAI CLIP compatible for training; just take care of the logic for AdamW and transformers; used namedtuples for clip adapter embedding outputs | 2022-04-29 13:05:01 -07:00
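The namedtuple change above can be sketched minimally; the field names below are hypothetical, chosen only to show why named fields beat bare positional tuples for adapter embedding outputs.

```python
from collections import namedtuple

import torch

# hypothetical field names, for illustration only
EmbedTextReturn = namedtuple('EmbedTextReturn', ['text_embed', 'text_encodings'])

def embed_text(text_encodings: torch.Tensor) -> EmbedTextReturn:
    # pool per-token encodings (batch, seq, dim) into one embedding (batch, dim)
    text_embed = text_encodings.mean(dim = 1)
    return EmbedTextReturn(text_embed = text_embed, text_encodings = text_encodings)

out = embed_text(torch.randn(2, 77, 512))
print(out.text_embed.shape)  # torch.Size([2, 512]); fields are self-documenting
```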
Phil Wang | 6700381a37 | prepare for the ability to integrate CLIPs other than x-clip | 2022-04-27 19:35:05 -07:00
Phil Wang | 20377f889a | todo | 2022-04-27 17:22:14 -07:00
Phil Wang | b093f92182 | inform what is possible | 2022-04-27 08:25:16 -07:00
Phil Wang | 2705e7c9b0 | attention-based upsampling claims are unsupported by local experiments; removing | 2022-04-27 07:51:04 -07:00
Phil Wang | 77141882c8 | complete vit-vqgan from https://arxiv.org/abs/2110.04627 | 2022-04-26 17:20:47 -07:00
Phil Wang | 4075d02139 | never mind, it could be working, but only when stabilized with the feedforward layer + tanh proposed in the vit-vqgan paper (which will be built into the repository later for the latent diffusion) | 2022-04-26 12:43:31 -07:00
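A minimal sketch of the stabilization mentioned above, assuming it means passing encoder features through a small feedforward whose output is bounded by tanh before quantization; the class name and exact layer layout are guesses, not the paper's or the repo's exact module.

```python
import torch
from torch import nn

class StabilizingFeedforward(nn.Module):
    def __init__(self, dim, mult = 4):
        super().__init__()
        self.net = nn.Sequential(
            nn.LayerNorm(dim),
            nn.Linear(dim, dim * mult),
            nn.GELU(),
            nn.Linear(dim * mult, dim),
            nn.Tanh()  # bounds features entering the codebook to (-1, 1)
        )

    def forward(self, x):
        return self.net(x)

tokens = StabilizingFeedforward(dim = 256)(torch.randn(2, 64, 256))
```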
Phil Wang | bfbcc283a3 | DRY up the gaussian diffusion related logic a tiny bit | 2022-04-26 11:39:12 -07:00
Phil Wang | c30544b73a | allow for no CLIP altogether when training DiffusionPrior | 2022-04-26 10:23:41 -07:00
Phil Wang | bdf5e9c009 | todo | 2022-04-26 09:56:54 -07:00
Phil Wang | 9878be760b | have the researcher explicitly state upfront whether to condition on text encodings in the cascading ddpm decoder; have the DALLE-2 class take care of passing in text if the feature is turned on | 2022-04-26 09:47:09 -07:00
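That interface change can be sketched as an explicit switch set at construction time; the class and argument names below are illustrative, not the repo's exact API.

```python
from torch import nn

class Decoder(nn.Module):
    def __init__(self, condition_on_text_encodings = False):
        super().__init__()
        # the researcher opts in explicitly, upfront
        self.condition_on_text_encodings = condition_on_text_encodings

    def forward(self, image, text_encodings = None):
        if self.condition_on_text_encodings:
            assert text_encodings is not None, 'text encodings must be passed in'
        else:
            text_encodings = None  # feature off: ignore any text passed in
        return image  # stand-in for the real denoising forward pass
```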
Phil Wang | 7ba6357c05 | allow for training the Prior network with precomputed CLIP embeddings (or text encodings) | 2022-04-26 09:29:51 -07:00
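Training on precomputed embeddings might look like the following; the tensors are random stand-ins and the `diffusion_prior` call is hypothetical, shown only to make the point that no CLIP model lives inside the loop.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# embeddings precomputed offline with CLIP (random stand-ins here)
text_embeds  = torch.randn(10_000, 512)
image_embeds = torch.randn(10_000, 512)

loader = DataLoader(TensorDataset(text_embeds, image_embeds), batch_size = 64)

for text_embed, image_embed in loader:
    # hypothetical call: the prior learns to predict image_embed from text_embed
    # loss = diffusion_prior(text_embed = text_embed, image_embed = image_embed)
    # loss.backward()
    break
```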
Phil Wang | 0b28ee0d01 | revert to the old upsampling; the paper's method does not work | 2022-04-26 07:39:04 -07:00
Phil Wang | 13a58a78c4 | scratch off todo | 2022-04-25 19:01:30 -07:00
Phil Wang | 3b520dfa85 | bring in attention-based upsampling to strengthen the vqgan-vae; it seems to work as advertised in initial GAN experiments | 2022-04-25 17:27:45 -07:00
Phil Wang | 79198c6ae4 | keep the readme simple for the reader | 2022-04-25 17:21:45 -07:00
Phil Wang | 77a246b1b9 | todo | 2022-04-25 08:48:28 -07:00
Phil Wang | f93a3f6ed8 | reprioritize | 2022-04-25 08:44:27 -07:00
Phil Wang | fb8a66a2de | just in case latent diffusion performs better with prediction of x0 instead of epsilon, open up that research avenue | 2022-04-24 10:04:22 -07:00
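The two objectives in question differ only in the regression target, as in this minimal sketch of a generic DDPM loss (not the repo's exact code):

```python
import torch
import torch.nn.functional as F

def diffusion_loss(model_out, x_start, noise, predict_x_start = False):
    # the network regresses either the clean input x0 or the added noise epsilon
    target = x_start if predict_x_start else noise
    return F.mse_loss(model_out, target)
```

Flipping `predict_x_start` leaves the rest of the training loop untouched, which is what makes the avenue cheap to open.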
Phil Wang | d5318aef4f | todo | 2022-04-23 08:23:08 -07:00
Phil Wang | a8b5d5d753 | last tweak of the readme | 2022-04-22 14:16:43 -07:00
Phil Wang | 976ef7f87c | project management | 2022-04-22 14:15:42 -07:00
Phil Wang | fd175bcc0e | readme | 2022-04-22 14:13:33 -07:00
Phil Wang | 76b32f18b3 | first pass at complete DALL-E2 + Latent Diffusion integration: latent diffusion on any layer(s) of the cascading ddpm in the decoder | 2022-04-22 13:53:13 -07:00
Phil Wang | f2d5b87677 | todo | 2022-04-22 11:39:58 -07:00
Phil Wang | ad17c69ab6 | prepare for latent diffusion in the first DDPM of the cascade in the Decoder | 2022-04-21 17:54:31 -07:00
Phil Wang | 0b4ec34efb | todo | 2022-04-20 12:24:23 -07:00
Phil Wang | f027b82e38 | remove wip, as the main networks (prior and decoder) are completed | 2022-04-20 12:12:16 -07:00
Kashif Rasul | 1d8f37befe | added diffusion-gan thoughts (https://github.com/NVlabs/denoising-diffusion-gan) | 2022-04-20 21:01:11 +02:00
Phil Wang | b8e8d3c164 | thoughts | 2022-04-20 11:34:51 -07:00
Phil Wang | 8e2416b49b | commit to generalizing latent diffusion to one model | 2022-04-20 11:27:42 -07:00
Phil Wang | 27a33e1b20 | complete contextmanager method for keeping only one unet on the GPU during training or inference | 2022-04-20 10:46:13 -07:00
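A minimal sketch of that contextmanager, assuming the unets live in a ModuleList and only the selected one should occupy GPU memory at a time (names are illustrative):

```python
from contextlib import contextmanager

import torch
from torch import nn

@contextmanager
def one_unet_on_gpu(unets: nn.ModuleList, index: int, device = 'cuda'):
    unets.cpu()                     # evict all unets from the GPU
    unet = unets[index].to(device)  # move only the one we need
    try:
        yield unet
    finally:
        unet.cpu()                  # hand the memory back on exit

# usage: with one_unet_on_gpu(unets, index = 1) as unet: out = unet(x)
```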
Phil Wang | 6f941a219a | give time tokens a surface area of 2 tokens by default; make it so the researcher can customize which unet is actually conditioned on image embeddings and/or text encodings | 2022-04-20 10:04:47 -07:00
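A sketch of what "a surface area of 2 tokens" might mean concretely: project the time conditioning into `num_time_tokens` tokens that join the other conditioning tokens for attention. The module and argument names are hypothetical.

```python
import torch
from torch import nn
from einops import rearrange

class TimeTokens(nn.Module):
    def __init__(self, time_dim, token_dim, num_time_tokens = 2):
        super().__init__()
        self.num_time_tokens = num_time_tokens
        self.to_tokens = nn.Linear(time_dim, token_dim * num_time_tokens)

    def forward(self, time_emb):           # time_emb: (batch, time_dim)
        tokens = self.to_tokens(time_emb)  # (batch, token_dim * num_time_tokens)
        return rearrange(tokens, 'b (n d) -> b n d', n = self.num_time_tokens)

time_tokens = TimeTokens(time_dim = 128, token_dim = 512)(torch.randn(4, 128))
print(time_tokens.shape)  # torch.Size([4, 2, 512])
```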
Phil Wang | c26b77ad20 | todo | 2022-04-19 13:07:32 -07:00
Phil Wang | c5b4aab8e5 | intent | 2022-04-19 11:00:05 -07:00
Phil Wang | a35c309b5f | add sparse attention layers in between the convnext blocks in the unet (grid-like attention, used in mobilevit and maxvit, as well as a growing number of attention-based GANs) | 2022-04-19 09:49:03 -07:00
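A minimal sketch of grid-like attention as popularized by maxvit: rather than attending within contiguous windows, each token attends to tokens at the same relative position in every window, i.e. across a dilated grid. This is a simplified stand-in, not the repo's implementation.

```python
import torch
from torch import nn
from einops import rearrange

class GridAttention(nn.Module):
    def __init__(self, dim, heads = 4, grid_size = 8):
        super().__init__()
        self.grid_size = grid_size
        self.attn = nn.MultiheadAttention(dim, heads, batch_first = True)

    def forward(self, x):                  # x: (batch, dim, height, width)
        g = self.grid_size
        h, w = x.shape[-2] // g, x.shape[-1] // g
        # gather a g x g dilated grid of tokens into each attention group
        x = rearrange(x, 'b d (g1 h) (g2 w) -> (b h w) (g1 g2) d', g1 = g, g2 = g)
        out, _ = self.attn(x, x, x)
        return rearrange(out, '(b h w) (g1 g2) d -> b d (g1 h) (g2 w)',
                         h = h, w = w, g1 = g, g2 = g)

out = GridAttention(dim = 64)(torch.randn(2, 64, 16, 16))
```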
Phil Wang | a54e309269 | prioritize todos, play project management | 2022-04-18 13:28:01 -07:00
Phil Wang | c6bfd7fdc8 | readme | 2022-04-18 12:43:10 -07:00
Phil Wang | 960a79857b | use some magic just this once to remove the need for researchers to think | 2022-04-18 12:40:43 -07:00
Phil Wang | 7214df472d | todo | 2022-04-18 12:18:19 -07:00
Phil Wang | 00ae50999b | make the kernel size and sigma for the cascading DDPM's gaussian blur overridable at forward; also make sure the unets are wrapped in a ModuleList so that at sample time, blurring does not happen | 2022-04-18 12:04:31 -07:00
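The overridable-blur interface might look like the sketch below (function and argument names are illustrative, not the repo's exact signature); the `training` guard reflects the second half of the commit, that no blurring happens at sample time.

```python
import torch
from torchvision.transforms.functional import gaussian_blur

def condition_on_lowres(lowres_image, training, kernel_size = 3, sigma = 0.6):
    # blur corrupts the low-res conditioning image during training only
    if not training:
        return lowres_image
    return gaussian_blur(lowres_image, kernel_size = kernel_size, sigma = sigma)

img = torch.randn(1, 3, 64, 64)
blurred = condition_on_lowres(img, training = True, kernel_size = 5, sigma = 1.0)
```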
Phil Wang | 6cddefad26 | readme | 2022-04-18 11:52:25 -07:00
Phil Wang | 0332eaa6ff | complete first pass at the full cascading DDPM setup in the Decoder, flexible enough to support one unet for testing | 2022-04-18 11:44:56 -07:00
Phil Wang | 1e939153fb | link to AssemblyAI explanation | 2022-04-15 12:58:57 -07:00
Phil Wang | 1abeb8918e | personal project management for next week | 2022-04-15 08:04:01 -07:00
Phil Wang | b423855483 | commit to a jax version | 2022-04-15 07:16:25 -07:00
Phil Wang | 5b4ee09625 | ideation | 2022-04-14 13:48:01 -07:00
Phil Wang | 9f55c24db6 | allow for decoder conditioning with the text encodings from CLIP, if passed in; use lazy linear to avoid researchers having to worry about text encoding dimensions, but remove later if it does not work well | 2022-04-14 11:46:45 -07:00
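PyTorch's `nn.LazyLinear` materializes its weight on the first forward pass, which is exactly the convenience referred to above: the researcher never specifies the text encoding dimension of whichever CLIP gets plugged in.

```python
import torch
from torch import nn

to_text_cond = nn.LazyLinear(out_features = 512)  # input dim inferred lazily

text_encodings = torch.randn(2, 77, 768)  # e.g. a CLIP with 768-dim encodings
cond = to_text_cond(text_encodings)       # weight materialized as (512, 768)
print(cond.shape)                         # torch.Size([2, 77, 512])
```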
Phil Wang | 69e822b7f8 | "project management" | 2022-04-14 10:20:37 -07:00
Phil Wang | 68e9883f59 | use cross attention for conditioning the unet based on image embedding tokens (which opens up the door to conditioning on text encodings as well) | 2022-04-14 10:10:04 -07:00
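A minimal sketch of cross-attention conditioning using PyTorch's built-in attention (the repo's own attention modules differ): flattened unet feature-map tokens act as queries, and the image embedding supplies keys and values. Text encodings can simply be concatenated into the same context, which is the door being opened.

```python
import torch
from torch import nn

attn = nn.MultiheadAttention(embed_dim = 512, num_heads = 8, batch_first = True)

unet_tokens    = torch.randn(2, 32 * 32, 512)  # flattened feature map as queries
image_embed    = torch.randn(2, 1, 512)        # image embedding as one token
text_encodings = torch.randn(2, 77, 512)       # optional extra context tokens

context = torch.cat((image_embed, text_encodings), dim = 1)
conditioned, _ = attn(unet_tokens, context, context)  # keys/values from context
```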