Phil Wang | 82464d7bd3 | per-fect | 2022-04-14 08:30:07 -07:00
Phil Wang | 7fb3f695d5 | offer a continuously parameterized time embedding for the diffusion prior network; remove a hyperparameter that may trip people up if not set correctly | 2022-04-14 08:28:11 -07:00
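A continuously parameterized time embedding is typically a sinusoidal embedding evaluated at arbitrary float timesteps, rather than a learned lookup table over a fixed number of steps — which is how the total step count stops being a hyperparameter that can be set incorrectly. A minimal NumPy sketch of the idea (function name and dimensions are illustrative, not the repository's actual API):

```python
import numpy as np

def continuous_time_embedding(t, dim):
    """Sinusoidal embedding evaluated at arbitrary float timesteps, so the
    total number of diffusion steps is not a fixed hyperparameter."""
    half = dim // 2
    freqs = np.exp(-np.log(10000.0) * np.arange(half) / half)
    args = np.asarray(t, dtype=np.float64)[:, None] * freqs[None, :]
    return np.concatenate([np.sin(args), np.cos(args)], axis=-1)

# any float timestep works, not just integers on a fixed grid
emb = continuous_time_embedding([0.0, 0.5, 1.0], dim=16)  # shape (3, 16)
```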
Phil Wang | 7e93b9d3c8 | make sure classifier-free guidance condition scaling is exposed on the DALLE2 forward function | 2022-04-13 20:14:28 -07:00
Phil Wang | 14ddbc159c | cleanup | 2022-04-13 18:24:32 -07:00
Phil Wang | 5e06cde4cb | always work in the l2-normed space for image and text embeddings | 2022-04-13 18:08:42 -07:00
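Working in l2-normed space means unit-normalizing the CLIP image and text embeddings before the prior operates on them, so that a plain dot product equals cosine similarity. A small NumPy sketch (function name is illustrative):

```python
import numpy as np

def l2norm(x, eps=1e-12):
    """Scale each row to unit length; dot products between rows of
    l2-normed matrices then equal cosine similarities."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

image_embed = l2norm(np.random.randn(4, 512))
text_embed = l2norm(np.random.randn(4, 512))
cosine_sim = (image_embed * text_embed).sum(axis=-1)  # values in [-1, 1]
```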
Phil Wang | a1a8a78f21 | fix everything and make sure it runs end to end; document everything in the readme for the public | 2022-04-13 18:05:25 -07:00
Phil Wang | e5e415297c | prepare non-causal attention for use in the unet in the decoder | 2022-04-13 12:04:09 -07:00
Phil Wang | c9377efc93 | go for multi-headed queries with one-headed key/values, proven out in AlphaCode as well as PaLM by now | 2022-04-13 12:01:43 -07:00
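Multi-headed queries with a single shared key/value head is the multi-query attention scheme used in AlphaCode and PaLM: it shrinks the key/value projections and the k/v cache at inference time. A rough NumPy sketch of the core computation, omitting masking and output projection (all shapes and names are illustrative, not the repository's implementation):

```python
import numpy as np

def multi_query_attention(x, wq, wk, wv, heads):
    """Multi-query attention: `heads` query heads share one key/value head."""
    n, d = x.shape
    dh = wk.shape[1]                       # dimension of the single shared k/v head
    q = (x @ wq).reshape(n, heads, dh)     # (n, heads, dh) per-head queries
    k = x @ wk                             # (n, dh) one shared key head
    v = x @ wv                             # (n, dh) one shared value head
    scores = np.einsum('nhd,md->hnm', q, k) / np.sqrt(dh)
    scores -= scores.max(axis=-1, keepdims=True)   # numerically stable softmax
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)
    out = np.einsum('hnm,md->nhd', attn, v)  # every head attends over shared values
    return out.reshape(n, heads * dh)

rng = np.random.default_rng(0)
x = rng.standard_normal((10, 64))
out = multi_query_attention(
    x,
    rng.standard_normal((64, 128)),  # query projection: heads * dh = 8 * 16
    rng.standard_normal((64, 16)),   # single key head
    rng.standard_normal((64, 16)),   # single value head
    heads=8)
```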
Phil Wang | d3cded3c6c | complete logic in the diffusion prior for sampling more than one image embed and taking the top by similarity | 2022-04-13 10:52:31 -07:00
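Sampling several candidate image embeddings from the prior and keeping the one most similar to the text embedding mirrors the reranking step described in the DALL-E 2 paper. A sketch, assuming unit-normed embeddings so that a dot product is cosine similarity (names are illustrative):

```python
import numpy as np

def pick_top_image_embed(candidates, text_embed):
    """candidates: (num_samples, dim) unit-normed image embeds sampled from
    the prior; returns the one most similar to the unit-normed text embed."""
    sims = candidates @ text_embed  # dot product == cosine similarity on unit vectors
    return candidates[np.argmax(sims)]
```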
Phil Wang | d573c82f8c | add one full attention at the middle of the unet; prepare to do efficient attention employing every trick i know from the vision transformer literature | 2022-04-13 10:39:06 -07:00
Phil Wang | 3aa6f91e7a | be transparent | 2022-04-13 10:32:11 -07:00
Phil Wang | 1bf071af78 | allow for predicting the image embedding directly during diffusion training; still need to fix sampling | 2022-04-13 10:29:29 -07:00
Phil Wang | 791d27326a | add diffusion code for the image embedding; nearly all the code is there except for the cascading ddpm in the decoder (with upscaling etc.) | 2022-04-13 10:06:52 -07:00
Phil Wang | 33d69d3859 | take care of the DDPM decoder (the DDPM producing the image embedding will have a separate objective, predicting the embedding directly rather than the noise [epsilon in the paper]) | 2022-04-12 17:48:41 -07:00
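Predicting the embedding (x0) directly rather than the noise (epsilon) changes only the parameterization of the objective: the forward process gives x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps, so either prediction can be converted into the other. A NumPy sketch of that relationship (names are illustrative):

```python
import numpy as np

def q_sample(x0, alpha_bar_t, noise):
    """Forward diffusion: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * noise."""
    return np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * noise

def x0_from_eps(xt, alpha_bar_t, eps):
    """Invert the forward process given a noise prediction - showing the
    x0 and epsilon objectives are equivalent reparameterizations."""
    return (xt - np.sqrt(1.0 - alpha_bar_t) * eps) / np.sqrt(alpha_bar_t)
```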
Phil Wang | 862e5ba50e | more sketches for the base dalle2 class | 2022-04-12 17:31:01 -07:00
Phil Wang | 25d980ebbf | complete naive conditioning of the unet with the image embedding, with the ability to drop it out for classifier-free guidance | 2022-04-12 17:27:39 -07:00
Phil Wang | d546a615c0 | complete helper methods for condition scaling (classifier-free guidance) for the decoder unet and prior network | 2022-04-12 16:11:16 -07:00
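Condition scaling for classifier-free guidance combines the conditional and unconditional model outputs as `uncond + scale * (cond - uncond)`, so a scale of 1 recovers the plain conditional prediction and larger scales push harder toward the condition. A minimal sketch (names are illustrative):

```python
import numpy as np

def guided_prediction(pred_cond, pred_uncond, cond_scale=1.0):
    """Classifier-free guidance: extrapolate from the unconditional
    prediction toward the conditional one. cond_scale == 1.0 gives
    back the plain conditional output."""
    return pred_uncond + cond_scale * (pred_cond - pred_uncond)
```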
Phil Wang | d4c8373635 | complete conditional dropout mask creation for both the prior network and the image decoder unet, for classifier-free guidance | 2022-04-12 14:04:08 -07:00
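A conditional dropout mask zeroes out (or swaps for a null embedding) the conditioning for a random fraction of each training batch, so one network learns both the conditional and unconditional predictions that guidance needs. A sketch using simple zeroing (names and masking strategy are illustrative, not necessarily what the repository does):

```python
import numpy as np

def apply_cond_dropout(cond_embed, drop_prob, rng):
    """Zero the conditioning for ~drop_prob of the batch rows, so the model
    also learns the unconditional distribution needed for guidance."""
    batch = cond_embed.shape[0]
    keep = rng.random(batch) >= drop_prob  # (batch,) boolean keep-mask
    return cond_embed * keep[:, None]

rng = np.random.default_rng(0)
cond = np.ones((8, 4))
dropped = apply_cond_dropout(cond, drop_prob=0.2, rng=rng)
```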
Phil Wang | 74aec9d8ca | further prepare attention for classifier-free guidance | 2022-04-12 13:01:18 -07:00
Phil Wang | 7647be2569 | prep for classifier-free guidance for the image embedding diffusion step, even though it is not mentioned in the paper | 2022-04-12 12:57:09 -07:00
Phil Wang | 59b8abe09e | prepare the unet to be conditioned on the image embedding and optionally text encodings; reminder to self to build conditional dropout for classifier-free guidance | 2022-04-12 12:38:56 -07:00
Phil Wang | 40aa304b7e | rename to DiffusionPriorNetwork in case an ARPriorNetwork is ever built | 2022-04-12 11:45:57 -07:00
Phil Wang | fd38eb83c4 | complete the main contribution of the paper, the diffusion prior network, minus the diffusion training setup | 2022-04-12 11:43:59 -07:00
Phil Wang | 83aabd42ca | move epsilon inside the square root for further stability in rmsnorm; improvise and use rmsnorm in the convnext blocks too | 2022-04-12 11:18:36 -07:00
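Moving epsilon inside the square root means normalizing by sqrt(mean(x^2) + eps) rather than sqrt(mean(x^2)) + eps, which keeps the denominator bounded away from zero (and gradients finite) even when activations are near zero. A NumPy sketch of bias-less RMSNorm with a learned gain (names are illustrative):

```python
import numpy as np

def rmsnorm(x, gain, eps=1e-8):
    """RMSNorm: x / sqrt(mean(x^2) + eps) * gain. Epsilon sits inside the
    square root, so the denominator never vanishes for x near 0."""
    ms = np.mean(np.square(x), axis=-1, keepdims=True)
    return x / np.sqrt(ms + eps) * gain
```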
Phil Wang | cf22affcbb | bring in a modified unet using convnext blocks https://arxiv.org/abs/2201.03545 | 2022-04-12 10:58:44 -07:00
Phil Wang | 522f42f582 | start using RMSNorm, used in Gopher and AlphaCode, as a way to go completely bias-less (purportedly more stable, according to PaLM) | 2022-04-12 10:45:03 -07:00
Phil Wang | 0a60818965 | dropouts in the transformer; also prep for classifier-free guidance in the decoder | 2022-04-12 10:42:57 -07:00
Phil Wang | 771fe0d0d2 | also consider accepting a tokenizer, so the dalle2 forward pass can be invoked as DALLE2(<prompt string>) | 2022-04-12 10:29:29 -07:00
Phil Wang | df4dac4f5a | bring in attention - it is all we need | 2022-04-12 10:23:07 -07:00
Phil Wang | 24b428bdfc | readme | 2022-04-12 10:12:42 -07:00
Phil Wang | 62c0d321a6 | sketch | 2022-04-12 09:39:42 -07:00
Phil Wang | 7cf1637d24 | bring in the simple tokenizer released by openai, while leaving room for a custom tokenizer with yttm | 2022-04-12 09:23:17 -07:00
Phil Wang | 4ff6d021c9 | pin to a newer version of CLIP that returns encoded text and images; get some helper functions ready for XCLIP | 2022-04-12 08:54:47 -07:00
Phil Wang | f283bf25be | scaffold | 2022-04-07 07:29:34 -07:00