DALLE2-pytorch

mirror of https://github.com/lucidrains/DALLE2-pytorch.git synced 2025-12-19 17:54:20 +01:00

Author	SHA1	Message	Date
Phil Wang	9878be760b	have researcher explicitly state upfront whether to condition with text encodings in cascading ddpm decoder, have DALLE-2 class take care of passing in text if feature turned on	2022-04-26 09:47:09 -07:00
Phil Wang	7ba6357c05	allow for training the Prior network with precomputed CLIP embeddings (or text encodings)	2022-04-26 09:29:51 -07:00
Phil Wang	76e063e8b7	refactor so that the causal transformer in the diffusion prior network can be conditioned without text encodings (for LAION's parallel efforts, although it seems from the paper it is needed)	2022-04-26 09:00:11 -07:00
Phil Wang	4d25976f33	make sure non-latent diffusion still works	2022-04-26 08:36:00 -07:00
Phil Wang	0b28ee0d01	revert back to old upsampling, paper does not work	2022-04-26 07:39:04 -07:00
Phil Wang	f75d49c781	start a file for all attention-related modules, use attention-based upsampling in the unets in dalle-2	2022-04-25 18:59:10 -07:00
Phil Wang	8f2a0c7e00	better naming	2022-04-25 07:44:33 -07:00
Phil Wang	863f4ef243	just take care of the logic for setting all latent diffusion to predict x0, if needed	2022-04-24 10:06:42 -07:00
Phil Wang	fb8a66a2de	just in case latent diffusion performs better with prediction of x0 instead of epsilon, open up the research avenue	2022-04-24 10:04:22 -07:00
Phil Wang	579d4b42dd	does not seem right to clip for the prior diffusion part	2022-04-24 09:51:18 -07:00
Phil Wang	473808850a	some outlines to the eventual CLI endpoint	2022-04-24 09:27:15 -07:00
Phil Wang	05b74be69a	use null container pattern to cleanup some conditionals, save more cleanup for next week	2022-04-22 15:23:18 -07:00
Phil Wang	76b32f18b3	first pass at complete DALL-E2 + Latent Diffusion integration, latent diffusion on any layer(s) of the cascading ddpm in the decoder.	2022-04-22 13:53:13 -07:00
Phil Wang	46cef31c86	optional projection out for prior network causal transformer	2022-04-22 11:16:30 -07:00
Phil Wang	59b1a77d4d	be a bit more conservative and stick with layernorm (without bias) for now, given @borisdayma results https://twitter.com/borisdayma/status/1517227191477571585	2022-04-22 11:14:54 -07:00
Phil Wang	7f338319fd	makes more sense for blur augmentation to happen before the upsampling	2022-04-22 11:10:47 -07:00
Phil Wang	2c6c91829d	refactor blurring training augmentation to be taken care of by the decoder, with option to downsample to previous resolution before upsampling (cascading ddpm). this opens up the possibility of cascading latent ddpm	2022-04-22 11:09:17 -07:00
Phil Wang	ad17c69ab6	prepare for latent diffusion in the first DDPM of the cascade in the Decoder	2022-04-21 17:54:31 -07:00
Phil Wang	faebf4c8b8	from my vision transformer experience, dimension of attention head of 32 is sufficient for image feature maps	2022-04-20 11:40:32 -07:00
Phil Wang	f37c26e856	cleanup and DRY a little	2022-04-20 10:56:32 -07:00
Phil Wang	27a33e1b20	complete contextmanager method for keeping only one unet in GPU during training or inference	2022-04-20 10:46:13 -07:00
Phil Wang	6f941a219a	give time tokens a surface area of 2 tokens as default, make it so researcher can customize which unet actually is conditioned on image embeddings and/or text encodings	2022-04-20 10:04:47 -07:00
Phil Wang	ddde8ca1bf	fix cosine beta schedule, thanks to @Zhengxinyang	2022-04-19 20:54:28 -07:00
Phil Wang	a35c309b5f	add sparse attention layers in between convnext blocks in unet (grid like attention, used in mobilevit, maxvit [bytedance ai], as well as a growing number of attention-based GANs)	2022-04-19 09:49:03 -07:00
Phil Wang	82328f16cd	same for text encodings for decoder ddpm training	2022-04-18 14:41:02 -07:00
Phil Wang	6fee4fce6e	also allow for image embedding to be passed into the diffusion model, in the case one wants to generate image embedding once and then train multiple unets in one iteration	2022-04-18 14:00:38 -07:00
Phil Wang	960a79857b	use some magic just this once to remove the need for researchers to think	2022-04-18 12:40:43 -07:00
Phil Wang	00ae50999b	make kernel size and sigma for gaussian blur for cascading DDPM overridable at forward. also make sure unets are wrapped in a modulelist so that at sample time, blurring does not happen	2022-04-18 12:04:31 -07:00
Phil Wang	0332eaa6ff	complete first pass at full cascading DDPM setup in Decoder, flexible enough to support one unet for testing	2022-04-18 11:44:56 -07:00
Kashif Rasul	b0f2fbaa95	schedule to Prior	2022-04-17 15:21:47 +02:00
Kashif Rasul	51361c2d15	added beta_schedule argument	2022-04-17 15:19:33 +02:00
Kashif Rasul	42d6e47387	added huber loss and other schedulers	2022-04-17 15:14:05 +02:00
Phil Wang	c400d8758c	prepare for cascading diffusion in unet, save the full progressive upsampling architecture to be built next week	2022-04-15 07:03:28 -07:00
Phil Wang	bece206699	fix bug thanks to @jihoonerd	2022-04-15 06:44:40 -07:00
Phil Wang	6e27f617f1	use t5 relative positional bias in prior network causal transformer, since it makes more sense than rotary embeddings	2022-04-14 12:01:09 -07:00
Phil Wang	9f55c24db6	allow for decoder conditioning with the text encodings from CLIP, if it is passed in. use lazy linear to avoid researchers having to worry about text encoding dimensions, but remove later if it does not work well	2022-04-14 11:46:45 -07:00
Phil Wang	23c401a5d5	use the eval decorator	2022-04-14 10:13:43 -07:00
Phil Wang	68e9883f59	use cross attention for conditioning unet based on image embedding tokens (which opens up the door on conditioning on text encodings as well)	2022-04-14 10:10:04 -07:00
Phil Wang	95b018374a	start using swish glu everywhere, given success of PaLM	2022-04-14 09:34:32 -07:00
Phil Wang	8b5c2385b0	better naming	2022-04-14 09:24:31 -07:00
Phil Wang	f2c52d8239	fix bug with classifier free guidance for prior network, even though it seems it may not be used	2022-04-14 09:21:51 -07:00
Phil Wang	97e951221b	bring in blur, as it will be used somewhere in the cascading DDPM in the decoder eventually, once i figure it out	2022-04-14 09:16:09 -07:00
Phil Wang	82464d7bd3	per-fect	2022-04-14 08:30:07 -07:00
Phil Wang	7fb3f695d5	offer continuously parameterized time embedding for diffusion prior network, remove a hyperparameter that may trip up people, if not set correctly	2022-04-14 08:28:11 -07:00
Phil Wang	7e93b9d3c8	make sure classifier free guidance condition scaling is exposed on DALLE2 forward function	2022-04-13 20:14:28 -07:00
Phil Wang	14ddbc159c	cleanup	2022-04-13 18:24:32 -07:00
Phil Wang	5e06cde4cb	always work in the l2normed space for image and text embeddings	2022-04-13 18:08:42 -07:00
Phil Wang	a1a8a78f21	fix everything and make sure it runs end to end, document everything in readme for public	2022-04-13 18:05:25 -07:00
Phil Wang	e5e415297c	prepare non-causal attention, for use in the unet in the decoder	2022-04-13 12:04:09 -07:00
Phil Wang	c9377efc93	go for the multi-headed queries, one-headed key/values, proven out in AlphaCode as well as PaLM by now	2022-04-13 12:01:43 -07:00

76 Commits