DALLE2-pytorch

mirror of https://github.com/lucidrains/DALLE2-pytorch.git synced 2026-02-21 11:24:51 +01:00

Author	SHA1	Message	Date
Phil Wang	68e9883f59	use cross attention for conditioning unet based on image embedding tokens (which opens up the door on conditioning on text encodings as well 0.0.11	2022-04-14 10:10:04 -07:00
Phil Wang	95b018374a	start using swish glu everywhere, given success of PaLM 0.0.10	2022-04-14 09:34:32 -07:00
Phil Wang	8b5c2385b0	better naming	2022-04-14 09:24:31 -07:00
Phil Wang	f2c52d8239	fix bug with classifier free guidance for prior network, even though it seems it may not be used 0.0.9	2022-04-14 09:21:51 -07:00
Phil Wang	97e951221b	bring in blur, as it will be used somewhere in the cascading DDPM in the decoder eventually, once i figure it out	2022-04-14 09:16:09 -07:00
Phil Wang	e1b0c140f1	cleanup readme	2022-04-14 08:51:22 -07:00
Phil Wang	5989569a44	link to OpenCLIP effort	2022-04-14 08:31:15 -07:00
Phil Wang	82464d7bd3	per-fect	2022-04-14 08:30:07 -07:00
Phil Wang	7fb3f695d5	offer continuously parameterized time embedding for diffusion prior network, remove a hyperparameter that may trip up people, if not set correctly 0.0.8	2022-04-14 08:28:11 -07:00
Phil Wang	7e93b9d3c8	make sure classifier free guidance condition scaling is exposed on DALLE2 forward function 0.0.7	2022-04-13 20:14:28 -07:00
Phil Wang	4c827ba94f	typo	2022-04-13 19:01:03 -07:00
Phil Wang	cb3923a90f	readme tweak	2022-04-13 18:43:34 -07:00
Phil Wang	cc30676a3f	lengthen todo	2022-04-13 18:34:09 -07:00
Phil Wang	c7fb327618	link to x-clip	2022-04-13 18:26:30 -07:00
Phil Wang	14ddbc159c	cleanup 0.0.6a	2022-04-13 18:24:32 -07:00
Phil Wang	0692f1699f	favorite quote 0.0.6	2022-04-13 18:17:59 -07:00
Phil Wang	26c4534bc3	readme	2022-04-13 18:11:55 -07:00
Phil Wang	5e06cde4cb	always work in the l2normed space for image and text embeddings 0.0.5	2022-04-13 18:08:42 -07:00
Phil Wang	a1a8a78f21	fix everything and make sure it runs end to end, document everything in readme for public	2022-04-13 18:05:25 -07:00
Phil Wang	e5e415297c	prepare non-causal attention, for use in the unet in the decoder	2022-04-13 12:04:09 -07:00
Phil Wang	c9377efc93	go for the multi-headed queries, one-headed key/values, proven out in AlphaCode as well as PaLM by now	2022-04-13 12:01:43 -07:00
Phil Wang	2a424b6a28	readme	2022-04-13 10:58:06 -07:00
Phil Wang	d3cded3c6c	complete logic in diffusion prior for sampling more than 1 image embeds, taking top similarity	2022-04-13 10:52:31 -07:00
Phil Wang	d573c82f8c	add one full attention at the middle of the unet, prepare to do efficient attention employing every trick i know from vision transformer literature	2022-04-13 10:39:06 -07:00
Phil Wang	3aa6f91e7a	be transparent	2022-04-13 10:32:11 -07:00
Phil Wang	1bf071af78	allow for predicting image embedding directly during diffusion training. need to fix sampling still	2022-04-13 10:29:29 -07:00
Phil Wang	9f1fe6c7ae	update todo	2022-04-13 10:09:08 -07:00
Phil Wang	791d27326a	add diffusion code for the image embedding. nearly all the code is there except for the cascading ddpm in the decoder (with upscaling etc)	2022-04-13 10:06:52 -07:00
Phil Wang	6d4e9c97bf	todo	2022-04-12 20:50:29 -07:00
Phil Wang	40140b54d6	put on project manager hat	2022-04-12 17:51:23 -07:00
Phil Wang	33d69d3859	take care of DDPM decoder (DDPM for producing image embedding will have a separate objective, predicting directly the embedding rather than the noise [epsilon in paper])	2022-04-12 17:48:41 -07:00
Phil Wang	862e5ba50e	more sketches to base dalle2 class	2022-04-12 17:31:01 -07:00
Phil Wang	25d980ebbf	complete naive conditioning of unet with image embedding, with ability to dropout for classifier free guidance	2022-04-12 17:27:39 -07:00
Phil Wang	d546a615c0	complete helper methods for doing condition scaling (classifier free guidance), for decoder unet and prior network	2022-04-12 16:11:16 -07:00
Phil Wang	d4c8373635	complete conditional dropout mask creation for both prior network as well as image decoder unet for classifier free guidance	2022-04-12 14:04:08 -07:00
Phil Wang	c814b2b278	sponsor project button	2022-04-12 13:34:02 -07:00
Phil Wang	74aec9d8ca	further prepare attention for classifier free guidance	2022-04-12 13:01:18 -07:00
Phil Wang	7647be2569	prep for classifier free guidance for the image embedding diffusion step, even though not mentioned in paper	2022-04-12 12:57:09 -07:00
Phil Wang	59b8abe09e	prepare unet to be conditioned on image embedding, optionally text encodings, and reminder for self to build conditional dropout for classifier free guidance	2022-04-12 12:38:56 -07:00
Phil Wang	46dde54948	for integration of X-CLIP automagically in the gaussian diffusion classes	2022-04-12 12:17:34 -07:00
Phil Wang	40aa304b7e	rename to DiffusionPriorNetwork in case ARPriorNetwork is ever built	2022-04-12 11:45:57 -07:00
Phil Wang	fd38eb83c4	complete the main contribution of the paper, the diffusion prior network, minus the diffusion training setup	2022-04-12 11:43:59 -07:00
Phil Wang	83aabd42ca	move epsilon inside of square root for further stability in rmsnorm improvise and use rmsnorm in convnext blocks too	2022-04-12 11:18:36 -07:00
Phil Wang	cf22affcbb	bring in modified unet using convnext blocks https://arxiv.org/abs/2201.03545	2022-04-12 10:58:44 -07:00
Phil Wang	522f42f582	start using RMSNorm, used in Gopher and AlphaCode, and as a way to go complete bias-less (purportedly more stable according to PaLM)	2022-04-12 10:45:03 -07:00
Phil Wang	0a60818965	dropouts in transformer, also prep for classifier free guidance in decoder	2022-04-12 10:42:57 -07:00
Phil Wang	604765b563	readme	2022-04-12 10:35:56 -07:00
Phil Wang	7bbc62f3d5	bring in pillow, for image encoding to and from	2022-04-12 10:29:55 -07:00
Phil Wang	771fe0d0d2	also consider accepting tokenizer, so dalle2 forward pass can just be invoked as DALLE2(<prompt string>)	2022-04-12 10:29:29 -07:00
Phil Wang	de75a8af76	link to yannic, since he is the best	2022-04-12 10:27:01 -07:00

69 Commits