DALLE2-pytorch

mirror of https://github.com/lucidrains/DALLE2-pytorch.git synced 2025-12-19 17:54:20 +01:00

Author	SHA1	Message	Date
Phil Wang	1e939153fb	link to AssemblyAI explanation	2022-04-15 12:58:57 -07:00
Phil Wang	1abeb8918e	personal project management for next week	2022-04-15 08:04:01 -07:00
Phil Wang	b423855483	commit to jax version	2022-04-15 07:16:25 -07:00
Phil Wang	c400d8758c	prepare for cascading diffusion in unet, save the full progressive upsampling architecture to be built next week 0.0.17	2022-04-15 07:03:28 -07:00
Phil Wang	bece206699	fix bug thanks to @jihoonerd	2022-04-15 06:44:40 -07:00
Phil Wang	5b4ee09625	ideation	2022-04-14 13:48:01 -07:00
Phil Wang	6e27f617f1	use t5 relative positional bias in prior network causal transformer, since it makes more sense than rotary embeddings 0.0.15	2022-04-14 12:01:09 -07:00
Phil Wang	9f55c24db6	allow for decoder conditioning with the text encodings from CLIP, if it is passed in. use lazy linear to avoid researchers having to worry about text encoding dimensions, but remove later if it does not work well 0.0.14	2022-04-14 11:46:45 -07:00
Phil Wang	69e822b7f8	"project management"	2022-04-14 10:20:37 -07:00
Phil Wang	23c401a5d5	use the eval decorator 0.0.12	2022-04-14 10:13:43 -07:00
Phil Wang	68e9883f59	use cross attention for conditioning unet based on image embedding tokens (which opens up the door on conditioning on text encodings as well 0.0.11	2022-04-14 10:10:04 -07:00
Phil Wang	95b018374a	start using swish glu everywhere, given success of PaLM 0.0.10	2022-04-14 09:34:32 -07:00
Phil Wang	8b5c2385b0	better naming	2022-04-14 09:24:31 -07:00
Phil Wang	f2c52d8239	fix bug with classifier free guidance for prior network, even though it seems it may not be used 0.0.9	2022-04-14 09:21:51 -07:00
Phil Wang	97e951221b	bring in blur, as it will be used somewhere in the cascading DDPM in the decoder eventually, once i figure it out	2022-04-14 09:16:09 -07:00
Phil Wang	e1b0c140f1	cleanup readme	2022-04-14 08:51:22 -07:00
Phil Wang	5989569a44	link to OpenCLIP effort	2022-04-14 08:31:15 -07:00
Phil Wang	82464d7bd3	per-fect	2022-04-14 08:30:07 -07:00
Phil Wang	7fb3f695d5	offer continuously parameterized time embedding for diffusion prior network, remove a hyperparameter that may trip up people, if not set correctly 0.0.8	2022-04-14 08:28:11 -07:00
Phil Wang	7e93b9d3c8	make sure classifier free guidance condition scaling is exposed on DALLE2 forward function 0.0.7	2022-04-13 20:14:28 -07:00
Phil Wang	4c827ba94f	typo	2022-04-13 19:01:03 -07:00
Phil Wang	cb3923a90f	readme tweak	2022-04-13 18:43:34 -07:00
Phil Wang	cc30676a3f	lengthen todo	2022-04-13 18:34:09 -07:00
Phil Wang	c7fb327618	link to x-clip	2022-04-13 18:26:30 -07:00
Phil Wang	14ddbc159c	cleanup 0.0.6a	2022-04-13 18:24:32 -07:00
Phil Wang	0692f1699f	favorite quote 0.0.6	2022-04-13 18:17:59 -07:00
Phil Wang	26c4534bc3	readme	2022-04-13 18:11:55 -07:00
Phil Wang	5e06cde4cb	always work in the l2normed space for image and text embeddings 0.0.5	2022-04-13 18:08:42 -07:00
Phil Wang	a1a8a78f21	fix everything and make sure it runs end to end, document everything in readme for public	2022-04-13 18:05:25 -07:00
Phil Wang	e5e415297c	prepare non-causal attention, for use in the unet in the decoder	2022-04-13 12:04:09 -07:00
Phil Wang	c9377efc93	go for the multi-headed queries, one-headed key/values, proven out in AlphaCode as well as PaLM by now	2022-04-13 12:01:43 -07:00
Phil Wang	2a424b6a28	readme	2022-04-13 10:58:06 -07:00
Phil Wang	d3cded3c6c	complete logic in diffusion prior for sampling more than 1 image embeds, taking top similarity	2022-04-13 10:52:31 -07:00
Phil Wang	d573c82f8c	add one full attention at the middle of the unet, prepare to do efficient attention employing every trick i know from vision transformer literature	2022-04-13 10:39:06 -07:00
Phil Wang	3aa6f91e7a	be transparent	2022-04-13 10:32:11 -07:00
Phil Wang	1bf071af78	allow for predicting image embedding directly during diffusion training. need to fix sampling still	2022-04-13 10:29:29 -07:00
Phil Wang	9f1fe6c7ae	update todo	2022-04-13 10:09:08 -07:00
Phil Wang	791d27326a	add diffusion code for the image embedding. nearly all the code is there except for the cascading ddpm in the decoder (with upscaling etc)	2022-04-13 10:06:52 -07:00
Phil Wang	6d4e9c97bf	todo	2022-04-12 20:50:29 -07:00
Phil Wang	40140b54d6	put on project manager hat	2022-04-12 17:51:23 -07:00
Phil Wang	33d69d3859	take care of DDPM decoder (DDPM for producing image embedding will have a separate objective, predicting directly the embedding rather than the noise [epsilon in paper])	2022-04-12 17:48:41 -07:00
Phil Wang	862e5ba50e	more sketches to base dalle2 class	2022-04-12 17:31:01 -07:00
Phil Wang	25d980ebbf	complete naive conditioning of unet with image embedding, with ability to dropout for classifier free guidance	2022-04-12 17:27:39 -07:00
Phil Wang	d546a615c0	complete helper methods for doing condition scaling (classifier free guidance), for decoder unet and prior network	2022-04-12 16:11:16 -07:00
Phil Wang	d4c8373635	complete conditional dropout mask creation for both prior network as well as image decoder unet for classifier free guidance	2022-04-12 14:04:08 -07:00
Phil Wang	c814b2b278	sponsor project button	2022-04-12 13:34:02 -07:00
Phil Wang	74aec9d8ca	further prepare attention for classifier free guidance	2022-04-12 13:01:18 -07:00
Phil Wang	7647be2569	prep for classifier free guidance for the image embedding diffusion step, even though not mentioned in paper	2022-04-12 12:57:09 -07:00
Phil Wang	59b8abe09e	prepare unet to be conditioned on image embedding, optionally text encodings, and reminder for self to build conditional dropout for classifier free guidance	2022-04-12 12:38:56 -07:00
Phil Wang	46dde54948	for integration of X-CLIP automagically in the gaussian diffusion classes	2022-04-12 12:17:34 -07:00

479 Commits