Phil Wang | 960a79857b | use some magic just this once to remove the need for researchers to think | 0.0.22 | 2022-04-18 12:40:43 -07:00
Phil Wang | 7214df472d | todo | 2022-04-18 12:18:19 -07:00
Phil Wang | 00ae50999b | make kernel size and sigma of the gaussian blur for cascading DDPM overridable at forward. also make sure unets are wrapped in a ModuleList so that blurring does not happen at sample time | 2022-04-18 12:04:31 -07:00
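A minimal sketch of the override pattern the commit above describes, with hypothetical class and argument names (the repo's actual API may differ): blur hyperparameters default at init but can be overridden per forward call, and the unets live in an `nn.ModuleList`.

```python
import torch
from torch import nn
import torchvision.transforms as T

class CascadingDecoderSketch(nn.Module):
    def __init__(self, unets, blur_kernel_size = 3, blur_sigma = 0.6):
        super().__init__()
        # an nn.ModuleList (rather than a plain python list) makes the unets
        # visible to .parameters(), .to(device), state_dict saving, etc.
        self.unets = nn.ModuleList(unets)
        self.blur_kernel_size = blur_kernel_size
        self.blur_sigma = blur_sigma

    def forward(self, images, blur_kernel_size = None, blur_sigma = None):
        # per-call override, falling back to the values set at init
        kernel_size = blur_kernel_size if blur_kernel_size is not None else self.blur_kernel_size
        sigma = blur_sigma if blur_sigma is not None else self.blur_sigma
        # blur the conditioning image only during training;
        # at sample time (self.training == False) no blurring happens
        if self.training:
            images = T.GaussianBlur(kernel_size, sigma)(images)
        return images
```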
Phil Wang | 6cddefad26 | readme | 2022-04-18 11:52:25 -07:00
Phil Wang | 0332eaa6ff | complete first pass at full cascading DDPM setup in Decoder, flexible enough to support one unet for testing | 0.0.20 | 2022-04-18 11:44:56 -07:00
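One plausible way to get that flexibility, sketched with hypothetical names: accept either a single unet or a sequence, and normalize to a tuple internally so the single-unet test case and the full cascade share one code path.

```python
from torch import nn

def cast_tuple(val):
    # accept a bare unet or a sequence of unets
    return val if isinstance(val, (tuple, list)) else (val,)

class DecoderSketch(nn.Module):
    def __init__(self, unet, image_sizes = (64,)):
        super().__init__()
        unets = cast_tuple(unet)
        assert len(unets) == len(image_sizes), 'one image size per unet in the cascade'
        self.unets = nn.ModuleList(unets)
        self.image_sizes = image_sizes
```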
Phil Wang | 1cce4225eb | 0.0.18 | 0.0.18 | 2022-04-17 07:29:34 -07:00
Phil Wang | 5ab0700bab | Merge pull request #14 from kashif/loss-schedule: added huber loss and other schedulers | 2022-04-17 07:29:10 -07:00
Kashif Rasul | b0f2fbaa95 | schedule to Prior | 2022-04-17 15:21:47 +02:00
Kashif Rasul | 51361c2d15 | added beta_schedule argument | 2022-04-17 15:19:33 +02:00
Kashif Rasul | 42d6e47387 | added huber loss and other schedulers | 2022-04-17 15:14:05 +02:00
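A sketch of what a `beta_schedule` argument typically selects between, plus the huber loss option: the linear schedule follows Ho et al. 2020 and the cosine schedule follows Nichol & Dhariwal 2021. Function names here are illustrative, not the PR's actual identifiers.

```python
import math
import torch
import torch.nn.functional as F

def make_beta_schedule(beta_schedule, timesteps):
    if beta_schedule == 'linear':
        return torch.linspace(1e-4, 0.02, timesteps)
    elif beta_schedule == 'cosine':
        s = 0.008
        steps = torch.arange(timesteps + 1, dtype = torch.float64)
        alphas_cumprod = torch.cos(((steps / timesteps) + s) / (1 + s) * math.pi * 0.5) ** 2
        alphas_cumprod = alphas_cumprod / alphas_cumprod[0]
        betas = 1 - (alphas_cumprod[1:] / alphas_cumprod[:-1])
        return betas.clamp(max = 0.999).float()
    raise ValueError(f'unknown beta schedule {beta_schedule}')

def loss_fn(pred, target, loss_type = 'huber'):
    # huber (smooth l1) loss is less sensitive to outliers than plain l2
    if loss_type == 'huber':
        return F.smooth_l1_loss(pred, target)
    return F.mse_loss(pred, target)
```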
Phil Wang | 1e939153fb | link to AssemblyAI explanation | 2022-04-15 12:58:57 -07:00
Phil Wang | 1abeb8918e | personal project management for next week | 2022-04-15 08:04:01 -07:00
Phil Wang | b423855483 | commit to jax version | 2022-04-15 07:16:25 -07:00
Phil Wang | c400d8758c | prepare for cascading diffusion in unet, save the full progressive upsampling architecture to be built next week | 0.0.17 | 2022-04-15 07:03:28 -07:00
Phil Wang | bece206699 | fix bug thanks to @jihoonerd | 2022-04-15 06:44:40 -07:00
Phil Wang | 5b4ee09625 | ideation | 2022-04-14 13:48:01 -07:00
Phil Wang | 6e27f617f1 | use t5 relative positional bias in prior network causal transformer, since it makes more sense than rotary embeddings | 0.0.15 | 2022-04-14 12:01:09 -07:00
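For reference, a sketch of T5-style relative position bias (Raffel et al. 2020), the technique named in the commit above: a learned, bucketed bias added directly to attention logits, with logarithmically coarser buckets for more distant positions. Details here are a generic rendition, not necessarily the repo's exact module.

```python
import math
import torch
from torch import nn

class RelPosBias(nn.Module):
    def __init__(self, heads, num_buckets = 32, max_distance = 128):
        super().__init__()
        self.num_buckets = num_buckets
        self.max_distance = max_distance
        self.relative_attention_bias = nn.Embedding(num_buckets, heads)

    def _bucket(self, rel_pos):
        # causal variant: only the past is attended to, so bucket distances i - j
        n = (-rel_pos).clamp(min = 0)
        max_exact = self.num_buckets // 2
        is_small = n < max_exact
        # logarithmically larger buckets for more distant positions
        val_large = max_exact + (
            torch.log(n.float() / max_exact) / math.log(self.max_distance / max_exact)
            * (self.num_buckets - max_exact)
        ).long()
        val_large = val_large.clamp(max = self.num_buckets - 1)
        return torch.where(is_small, n, val_large)

    def forward(self, seq_len, device = None):
        pos = torch.arange(seq_len, device = device)
        rel_pos = pos[None, :] - pos[:, None]          # entry (i, j) is j - i
        bias = self.relative_attention_bias(self._bucket(rel_pos))
        return bias.permute(2, 0, 1)                   # (heads, i, j), added to attention logits
```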
Phil Wang | 9f55c24db6 | allow for decoder conditioning with the text encodings from CLIP, if they are passed in. use lazy linear to avoid researchers having to worry about text encoding dimensions, but remove later if it does not work well | 0.0.14 | 2022-04-14 11:46:45 -07:00
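The lazy-linear trick mentioned above, in miniature: `nn.LazyLinear` infers its input dimension from the first batch it sees, so the caller never has to specify the dimensionality of CLIP's text encodings up front. The dimensions below are illustrative.

```python
import torch
from torch import nn

# out_features chosen arbitrarily for the example
text_encoding_proj = nn.LazyLinear(out_features = 512)

text_encodings = torch.randn(4, 256, 768)   # (batch, seq, clip_text_dim), dims hypothetical
conditioned = text_encoding_proj(text_encodings)
print(conditioned.shape)                     # torch.Size([4, 256, 512])
```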
Phil Wang | 69e822b7f8 | "project management" | 2022-04-14 10:20:37 -07:00
Phil Wang | 23c401a5d5 | use the eval decorator | 0.0.12 | 2022-04-14 10:13:43 -07:00
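A sketch of an eval decorator in the spirit of the commit: temporarily put the module in eval mode for sampling, then restore whatever train/eval state it was in before. The `no_grad` wrapping is an assumption added here for completeness, not necessarily part of the repo's version.

```python
from functools import wraps
import torch
from torch import nn

def eval_decorator(fn):
    @wraps(fn)
    def inner(model, *args, **kwargs):
        was_training = model.training
        model.eval()
        with torch.no_grad():   # assumption: also disable grads while sampling
            out = fn(model, *args, **kwargs)
        model.train(was_training)
        return out
    return inner

class Sampler(nn.Module):
    @eval_decorator
    def sample(self, shape):
        # dropout, batchnorm, etc. all behave in inference mode here
        return torch.randn(shape)
```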
Phil Wang | 68e9883f59 | use cross attention for conditioning unet based on image embedding tokens (which opens the door to conditioning on text encodings as well) | 0.0.11 | 2022-04-14 10:10:04 -07:00
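The conditioning mechanism in outline, as a generic sketch (dims and names illustrative): unet feature tokens form the queries, and the image embedding tokens, or equally text encoding tokens, form the keys and values.

```python
import torch
from torch import nn

class CrossAttention(nn.Module):
    def __init__(self, dim, context_dim, heads = 8, dim_head = 64):
        super().__init__()
        inner = heads * dim_head
        self.heads = heads
        self.scale = dim_head ** -0.5
        self.to_q = nn.Linear(dim, inner, bias = False)
        self.to_kv = nn.Linear(context_dim, inner * 2, bias = False)
        self.to_out = nn.Linear(inner, dim, bias = False)

    def forward(self, x, context):
        # x: (batch, n, dim) unet tokens; context: (batch, m, context_dim) condition tokens
        b, n, _ = x.shape
        h = self.heads
        q = self.to_q(x)
        k, v = self.to_kv(context).chunk(2, dim = -1)
        # split heads: (b, tokens, h * d) -> (b, h, tokens, d)
        q, k, v = (t.reshape(b, -1, h, t.shape[-1] // h).transpose(1, 2) for t in (q, k, v))
        attn = (q @ k.transpose(-2, -1) * self.scale).softmax(dim = -1)
        out = (attn @ v).transpose(1, 2).reshape(b, n, -1)
        return self.to_out(out)
```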
Phil Wang | 95b018374a | start using swish glu everywhere, given the success of PaLM | 0.0.10 | 2022-04-14 09:34:32 -07:00
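"Swish glu" is the SwiGLU gated feedforward of Shazeer 2020, adopted by PaLM. A minimal sketch: the hidden projection is doubled in width and split, one half gating the other through SiLU (swish).

```python
import torch
from torch import nn
import torch.nn.functional as F

class SwiGLU(nn.Module):
    def forward(self, x):
        x, gate = x.chunk(2, dim = -1)
        return x * F.silu(gate)

def FeedForward(dim, mult = 4):
    inner = int(dim * mult)
    return nn.Sequential(
        nn.Linear(dim, inner * 2, bias = False),  # double width for the two halves
        SwiGLU(),
        nn.Linear(inner, dim, bias = False),
    )
```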
Phil Wang | 8b5c2385b0 | better naming | 2022-04-14 09:24:31 -07:00
Phil Wang | f2c52d8239 | fix bug with classifier free guidance for prior network, even though it seems it may not be used | 0.0.9 | 2022-04-14 09:21:51 -07:00
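For context, the training half of classifier-free guidance looks roughly like this sketch: with some probability, the conditioning for a sample is dropped so the network also learns an unconditional prediction. The zeroing below stands in for a null condition; real implementations often substitute a learned null embedding instead.

```python
import torch

def mask_conditioning(text_embed, cond_drop_prob = 0.2):
    # text_embed: (batch, dim); probability value is illustrative
    batch = text_embed.shape[0]
    keep = torch.rand(batch, device = text_embed.device) > cond_drop_prob
    # zero out (null-condition) the dropped rows
    return text_embed * keep[:, None]
```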
Phil Wang | 97e951221b | bring in blur, as it will be used somewhere in the cascading DDPM in the decoder eventually, once i figure it out | 2022-04-14 09:16:09 -07:00
Phil Wang | e1b0c140f1 | cleanup readme | 2022-04-14 08:51:22 -07:00
Phil Wang | 5989569a44 | link to OpenCLIP effort | 2022-04-14 08:31:15 -07:00
Phil Wang | 82464d7bd3 | per-fect | 2022-04-14 08:30:07 -07:00
Phil Wang | 7fb3f695d5 | offer continuously parameterized time embedding for diffusion prior network, removing a hyperparameter that may trip people up if not set correctly | 0.0.8 | 2022-04-14 08:28:11 -07:00
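A sketch of one common continuously parameterized time embedding: sinusoidal features of a continuous timestep fed through a small MLP, instead of a learned `nn.Embedding` table that requires fixing the number of timesteps up front (the hyperparameter the commit removes). Architecture details here are assumptions.

```python
import math
import torch
from torch import nn

class TimeEmbedding(nn.Module):
    def __init__(self, dim):
        super().__init__()
        assert dim % 2 == 0, 'dim must be even to split into sin/cos halves'
        self.dim = dim
        self.mlp = nn.Sequential(nn.Linear(dim, dim * 4), nn.SiLU(), nn.Linear(dim * 4, dim))

    def forward(self, t):                     # t: (batch,) float timesteps, any scale
        half = self.dim // 2
        freqs = torch.exp(-math.log(10000) * torch.arange(half, device = t.device) / half)
        angles = t[:, None] * freqs[None, :]
        fourier = torch.cat((angles.sin(), angles.cos()), dim = -1)  # (batch, dim)
        return self.mlp(fourier)
```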
Phil Wang | 7e93b9d3c8 | make sure classifier free guidance condition scaling is exposed on DALLE2 forward function | 0.0.7 | 2022-04-13 20:14:28 -07:00
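Condition scaling is the sampling half of classifier-free guidance (Ho & Salimans 2021). A generic sketch of what a `cond_scale` knob does, with hypothetical function names: run the network conditioned and unconditioned, then extrapolate between the two.

```python
def guided_prediction(model_fn, x, cond, null_cond, cond_scale = 1.0):
    # cond_scale = 1 recovers the purely conditional prediction;
    # larger values push further away from the unconditional one
    pred_cond = model_fn(x, cond)
    if cond_scale == 1.0:
        return pred_cond
    pred_null = model_fn(x, null_cond)
    return pred_null + (pred_cond - pred_null) * cond_scale
```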
Phil Wang | 4c827ba94f | typo | 2022-04-13 19:01:03 -07:00
Phil Wang | cb3923a90f | readme tweak | 2022-04-13 18:43:34 -07:00
Phil Wang | cc30676a3f | lengthen todo | 2022-04-13 18:34:09 -07:00
Phil Wang | c7fb327618 | link to x-clip | 2022-04-13 18:26:30 -07:00
Phil Wang | 14ddbc159c | cleanup | 0.0.6a | 2022-04-13 18:24:32 -07:00
Phil Wang | 0692f1699f | favorite quote | 0.0.6 | 2022-04-13 18:17:59 -07:00
Phil Wang | 26c4534bc3 | readme | 2022-04-13 18:11:55 -07:00
Phil Wang | 5e06cde4cb | always work in the l2normed space for image and text embeddings | 0.0.5 | 2022-04-13 18:08:42 -07:00
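Working in l2-normalized space just means projecting embeddings onto the unit hypersphere, where dot product and cosine similarity coincide, matching how CLIP itself compares image and text embeddings. A one-liner sketch:

```python
import torch.nn.functional as F

def l2norm(t):
    # normalize along the embedding dimension to unit length
    return F.normalize(t, dim = -1)
```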
Phil Wang | a1a8a78f21 | fix everything and make sure it runs end to end, document everything in the readme for the public | 2022-04-13 18:05:25 -07:00
Phil Wang | e5e415297c | prepare non-causal attention, for use in the unet in the decoder | 2022-04-13 12:04:09 -07:00
Phil Wang | c9377efc93 | go for multi-headed queries with one-headed key/values, proven out in AlphaCode as well as PaLM by now | 2022-04-13 12:01:43 -07:00
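Multi-headed queries with a single shared key/value head is multi-query attention (Shazeer 2019), which shrinks the kv projections and memory traffic at little quality cost. A generic sketch, not the repo's exact module:

```python
import torch
from torch import nn

class MultiQueryAttention(nn.Module):
    def __init__(self, dim, heads = 8, dim_head = 64):
        super().__init__()
        self.heads = heads
        self.scale = dim_head ** -0.5
        self.to_q = nn.Linear(dim, heads * dim_head, bias = False)
        self.to_kv = nn.Linear(dim, dim_head * 2, bias = False)   # a single head for k and v
        self.to_out = nn.Linear(heads * dim_head, dim, bias = False)

    def forward(self, x):
        b, n, _ = x.shape
        h = self.heads
        q = self.to_q(x).reshape(b, n, h, -1).transpose(1, 2)     # (b, h, n, d)
        k, v = self.to_kv(x).chunk(2, dim = -1)                   # each (b, n, d)
        sim = torch.einsum('bhid,bjd->bhij', q, k) * self.scale   # kv broadcast over all heads
        attn = sim.softmax(dim = -1)
        out = torch.einsum('bhij,bjd->bhid', attn, v)
        return self.to_out(out.transpose(1, 2).reshape(b, n, -1))
```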
Phil Wang | 2a424b6a28 | readme | 2022-04-13 10:58:06 -07:00
Phil Wang | d3cded3c6c | complete logic in diffusion prior for sampling more than one image embed, taking the top similarity | 2022-04-13 10:52:31 -07:00
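This mirrors the reranking step in the DALL-E 2 paper: sample several candidate image embeddings from the prior per caption, then keep the one most similar to the text embedding. A sketch with hypothetical names (`sample_fn` stands in for the prior's sampling routine):

```python
import torch
import torch.nn.functional as F

def sample_with_rerank(sample_fn, text_embed, num_samples_per_batch = 2):
    batch, dim = text_embed.shape
    # repeat each text embedding, sample one image embed per copy
    repeated = text_embed.repeat_interleave(num_samples_per_batch, dim = 0)
    image_embeds = sample_fn(repeated)                         # (batch * n, dim)
    sims = F.cosine_similarity(repeated, image_embeds, dim = -1)
    best = sims.reshape(batch, num_samples_per_batch).argmax(dim = -1)
    image_embeds = image_embeds.reshape(batch, num_samples_per_batch, dim)
    return image_embeds[torch.arange(batch), best]             # top-similarity embed per caption
```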
Phil Wang | d573c82f8c | add one full attention at the middle of the unet, prepare to do efficient attention employing every trick i know from the vision transformer literature | 2022-04-13 10:39:06 -07:00
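The usual reasoning, sketched abstractly below: the unet bottleneck is small enough that one block of full self-attention is affordable there, while larger resolutions get cheaper efficient-attention variants. The `block`/`attention` constructor arguments are placeholders for whatever resnet block and attention module the unet uses.

```python
from torch import nn

class MidBlock(nn.Module):
    def __init__(self, dim, block, attention):
        super().__init__()
        self.block1 = block(dim)       # e.g. a resnet block
        self.attn = attention(dim)     # full attention, affordable only at the bottleneck
        self.block2 = block(dim)

    def forward(self, x):
        x = self.block1(x)
        x = self.attn(x) + x           # residual full attention
        return self.block2(x)
```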
Phil Wang | 3aa6f91e7a | be transparent | 2022-04-13 10:32:11 -07:00
Phil Wang | 1bf071af78 | allow for predicting the image embedding directly during diffusion training. still need to fix sampling | 2022-04-13 10:29:29 -07:00
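Predicting the image embedding directly is the x0-prediction objective, as opposed to the usual noise-prediction one. A sketch of the switch under simplified assumptions (embedding-shaped `x0`, precomputed `alphas_cumprod`):

```python
import torch
import torch.nn.functional as F

def p_losses(model, x0, t, alphas_cumprod, predict_x_start = True):
    # x0: (batch, dim) clean image embeddings; t: (batch,) integer timesteps
    noise = torch.randn_like(x0)
    a = alphas_cumprod[t][:, None]                     # (batch, 1)
    x_noisy = a.sqrt() * x0 + (1 - a).sqrt() * noise   # forward diffusion q(x_t | x_0)
    pred = model(x_noisy, t)
    # target is the clean embedding itself, rather than the added noise
    target = x0 if predict_x_start else noise
    return F.mse_loss(pred, target)
```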
Phil Wang | 9f1fe6c7ae | update todo | 2022-04-13 10:09:08 -07:00
Phil Wang | 791d27326a | add diffusion code for the image embedding. nearly all the code is there except for the cascading ddpm in the decoder (with upscaling etc) | 2022-04-13 10:06:52 -07:00
Phil Wang | 6d4e9c97bf | todo | 2022-04-12 20:50:29 -07:00
Phil Wang | 40140b54d6 | put on project manager hat | 2022-04-12 17:51:23 -07:00