Commit Graph

25 Commits

Author SHA1 Message Date
Phil Wang
791d27326a add diffusion code for the image embedding. nearly all the code is there except for the cascading ddpm in the decoder (with upscaling etc) 2022-04-13 10:06:52 -07:00
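A minimal sketch of Gaussian diffusion applied to an image-embedding vector, as this commit describes: the closed-form forward (noising) step over a batch of embeddings. The linear beta schedule, names, and shapes are illustrative assumptions, not the repository's actual code.

```python
import torch

# hypothetical linear beta schedule and forward (noising) process over
# a batch of image embeddings of shape (batch, embed_dim)
def linear_beta_schedule(timesteps, beta_start=1e-4, beta_end=0.02):
    return torch.linspace(beta_start, beta_end, timesteps)

betas = linear_beta_schedule(1000)
alphas_cumprod = torch.cumprod(1. - betas, dim=0)

def q_sample(image_embed, t, noise=None):
    # sample x_t ~ q(x_t | x_0) in closed form for timestep indices t of shape (batch,)
    noise = torch.randn_like(image_embed) if noise is None else noise
    sqrt_ac = alphas_cumprod[t].sqrt().unsqueeze(-1)
    sqrt_one_minus_ac = (1. - alphas_cumprod[t]).sqrt().unsqueeze(-1)
    return sqrt_ac * image_embed + sqrt_one_minus_ac * noise
```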
Phil Wang
33d69d3859 take care of DDPM decoder (DDPM for producing image embedding will have a separate objective, predicting directly the embedding rather than the noise [epsilon in paper]) 2022-04-12 17:48:41 -07:00
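The commit distinguishes two regression targets: the decoder's DDPM keeps the standard noise (epsilon) objective, while the embedding diffusion regresses the clean embedding directly. A hypothetical illustration of that split; the model signature is assumed.

```python
import torch
import torch.nn.functional as F

def p_losses(model, x_start, t, alphas_cumprod, predict_x_start):
    # shared forward-noising step, then one of two regression targets
    noise = torch.randn_like(x_start)
    ac = alphas_cumprod[t].unsqueeze(-1)
    x_noisy = ac.sqrt() * x_start + (1. - ac).sqrt() * noise
    pred = model(x_noisy, t)
    # decoder unet: regress the added noise (epsilon), as in standard DDPM
    # embedding diffusion: regress the clean image embedding x_0 directly
    target = x_start if predict_x_start else noise
    return F.mse_loss(pred, target)
```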
Phil Wang
862e5ba50e more sketches to base dalle2 class 2022-04-12 17:31:01 -07:00
Phil Wang
25d980ebbf complete naive conditioning of unet with image embedding, with ability to dropout for classifier free guidance 2022-04-12 17:27:39 -07:00
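One plausible reading of "naive conditioning": project the image embedding and add it to the U-Net's timestep conditioning, swapping in a learned null embedding for samples whose conditioning is dropped. Everything below (class name, null-embedding approach, mask convention) is an assumption, not the repository's code.

```python
import torch
from torch import nn

class ConditionedBlockSketch(nn.Module):
    # hypothetical illustration: fold the (possibly dropped-out) image
    # embedding into the timestep conditioning of a U-Net
    def __init__(self, image_embed_dim, time_dim):
        super().__init__()
        self.to_cond = nn.Linear(image_embed_dim, time_dim)
        self.null_image_embed = nn.Parameter(torch.randn(image_embed_dim))

    def condition(self, time_emb, image_embed, keep_mask):
        # keep_mask: (batch,) bool - False means conditioning is dropped for
        # that sample, which is what enables classifier-free guidance later
        image_embed = torch.where(
            keep_mask.unsqueeze(-1),
            image_embed,
            self.null_image_embed.unsqueeze(0),
        )
        return time_emb + self.to_cond(image_embed)
```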
Phil Wang
d546a615c0 complete helper methods for doing condition scaling (classifier free guidance), for decoder unet and prior network 2022-04-12 16:11:16 -07:00
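Classifier-free guidance condition scaling boils down to running the network twice, with and without conditioning, and extrapolating away from the unconditional prediction. A hedged sketch; the `cond_drop_prob` keyword is an assumed interface, not the repository's.

```python
def guided_prediction(model, x, t, cond, cond_scale=1.):
    # classifier-free guidance: run the network with and without
    # conditioning, then push the output away from the unconditional one
    cond_out = model(x, t, cond=cond, cond_drop_prob=0.)   # assumed signature
    if cond_scale == 1.:
        return cond_out
    uncond_out = model(x, t, cond=cond, cond_drop_prob=1.)
    return uncond_out + (cond_out - uncond_out) * cond_scale
```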
Phil Wang
d4c8373635 complete conditional dropout mask creation for both prior network as well as image decoder unet for classifier free guidance 2022-04-12 14:04:08 -07:00
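A conditional dropout mask for classifier-free guidance is typically just a per-sample Bernoulli draw. A small sketch of what such a helper might look like (name and keep/drop convention assumed):

```python
import torch

def prob_mask_like(shape, keep_prob, device=None):
    # per-sample boolean mask: True with probability keep_prob
    # (True = keep the conditioning, False = drop it for that sample)
    if keep_prob == 1.:
        return torch.ones(shape, device=device, dtype=torch.bool)
    if keep_prob == 0.:
        return torch.zeros(shape, device=device, dtype=torch.bool)
    return torch.zeros(shape, device=device).float().uniform_(0, 1) < keep_prob
```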
Phil Wang
74aec9d8ca further prepare attention for classifier free guidance 2022-04-12 13:01:18 -07:00
Phil Wang
7647be2569 prep for classifier free guidance for the image embedding diffusion step, even though not mentioned in paper 2022-04-12 12:57:09 -07:00
Phil Wang
59b8abe09e prepare unet to be conditioned on image embedding, optionally text encodings, and reminder for self to build conditional dropout for classifier free guidance 2022-04-12 12:38:56 -07:00
Phil Wang
40aa304b7e rename to DiffusionPriorNetwork in case ARPriorNetwork is ever built 2022-04-12 11:45:57 -07:00
Phil Wang
fd38eb83c4 complete the main contribution of the paper, the diffusion prior network, minus the diffusion training setup 2022-04-12 11:43:59 -07:00
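A rough outline of a diffusion prior network in the spirit of the DALL-E 2 paper: a transformer consumes the text encodings, the text embedding, a timestep embedding, the noised image embedding, and a learned query, and the output at the learned query position is read off as the predicted image embedding. All names and layer choices below are assumptions, and the paper's causal masking is omitted for brevity; this is not the repository's implementation.

```python
import torch
from torch import nn

class DiffusionPriorNetworkSketch(nn.Module):
    # hypothetical outline of the prior network described in the DALL-E 2 paper
    def __init__(self, dim, depth=6, heads=8, timesteps=1000):
        super().__init__()
        self.time_emb = nn.Embedding(timesteps, dim)
        self.learned_query = nn.Parameter(torch.randn(dim))
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, depth)

    def forward(self, noised_image_embed, t, text_encodings, text_embed):
        b = noised_image_embed.shape[0]
        tokens = torch.cat((
            text_encodings,                       # (b, n, dim) per-token text encodings
            text_embed.unsqueeze(1),              # (b, 1, dim) pooled text embedding
            self.time_emb(t).unsqueeze(1),        # (b, 1, dim) timestep embedding
            noised_image_embed.unsqueeze(1),      # (b, 1, dim) noised image embedding
            self.learned_query.expand(b, 1, -1),  # (b, 1, dim) learned query
        ), dim=1)
        out = self.transformer(tokens)
        return out[:, -1]  # prediction read off the learned query position
```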
Phil Wang
83aabd42ca move epsilon inside of square root for further stability in rmsnorm; improvise and use rmsnorm in convnext blocks too 2022-04-12 11:18:36 -07:00
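For reference, a generic RMSNorm with the epsilon kept inside the square root, so the denominator can never be exactly zero. A sketch only; the gain/scale conventions in the repo may differ.

```python
import torch
from torch import nn

class RMSNormSketch(nn.Module):
    # RMSNorm with only a gain parameter (no bias); keeping eps inside the
    # square root guards against dividing by an exactly-zero norm
    def __init__(self, dim, eps=1e-8):
        super().__init__()
        self.eps = eps
        self.g = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        norm = torch.sqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return x / norm * self.g
```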
Phil Wang
cf22affcbb bring in modified unet using convnext blocks https://arxiv.org/abs/2201.03545 2022-04-12 10:58:44 -07:00
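A ConvNeXt-style block per the cited paper: a large depthwise convolution, a normalization, and an inverted bottleneck of pointwise convolutions, wrapped in a residual. The GroupNorm below is a stand-in (the later commit above swaps RMSNorm into these blocks); treat this as a sketch, not the repository's block.

```python
import torch
from torch import nn

class ConvNextBlockSketch(nn.Module):
    # ConvNeXt-style block (https://arxiv.org/abs/2201.03545)
    def __init__(self, dim, mult=4):
        super().__init__()
        self.ds_conv = nn.Conv2d(dim, dim, 7, padding=3, groups=dim)  # depthwise 7x7
        self.net = nn.Sequential(
            nn.GroupNorm(1, dim),              # stand-in for the block's norm layer
            nn.Conv2d(dim, dim * mult, 1),     # pointwise expansion
            nn.GELU(),
            nn.Conv2d(dim * mult, dim, 1),     # pointwise projection back down
        )

    def forward(self, x):
        return self.net(self.ds_conv(x)) + x   # residual connection
```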
Phil Wang
522f42f582 start using RMSNorm, used in Gopher and AlphaCode, and as a way to go complete bias-less (purportedly more stable according to PaLM) 2022-04-12 10:45:03 -07:00
Phil Wang
0a60818965 dropouts in transformer, also prep for classifier free guidance in decoder 2022-04-12 10:42:57 -07:00
Phil Wang
771fe0d0d2 also consider accepting tokenizer, so dalle2 forward pass can just be invoked as DALLE2(<prompt string>) 2022-04-12 10:29:29 -07:00
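Accepting raw prompt strings usually means tokenizing on the fly inside the forward pass. A hypothetical helper; `tokenizer.tokenize` is an assumed interface, not the repository's.

```python
import torch

def maybe_tokenize(texts, tokenizer, device):
    # hypothetical helper: allow the top-level model to be called directly
    # with prompt strings, e.g. dalle2('a shiba inu wearing a beret'),
    # by tokenizing on the fly whenever raw strings are passed in
    if isinstance(texts, str):
        texts = [texts]
    if isinstance(texts, (list, tuple)) and isinstance(texts[0], str):
        texts = tokenizer.tokenize(texts).to(device)  # assumed tokenizer interface
    return texts
```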
Phil Wang
df4dac4f5a bring in attention - it is all we need 2022-04-12 10:23:07 -07:00
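For context, a plain multi-head self-attention block of the kind such a commit brings in, written with einsum and no biases. Purely illustrative; it is not the repository's attention module.

```python
import torch
from torch import nn, einsum

class AttentionSketch(nn.Module):
    # plain multi-head self-attention
    def __init__(self, dim, heads=8, dim_head=64):
        super().__init__()
        self.heads = heads
        self.scale = dim_head ** -0.5
        inner = heads * dim_head
        self.to_qkv = nn.Linear(dim, inner * 3, bias=False)
        self.to_out = nn.Linear(inner, dim, bias=False)

    def forward(self, x):
        b, n, _ = x.shape
        h = self.heads
        q, k, v = self.to_qkv(x).chunk(3, dim=-1)
        q, k, v = (t.reshape(b, n, h, -1).transpose(1, 2) for t in (q, k, v))
        sim = einsum('b h i d, b h j d -> b h i j', q, k) * self.scale
        attn = sim.softmax(dim=-1)
        out = einsum('b h i j, b h j d -> b h i d', attn, v)
        out = out.transpose(1, 2).reshape(b, n, -1)
        return self.to_out(out)
```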
Phil Wang
24b428bdfc readme 2022-04-12 10:12:42 -07:00
Phil Wang
2ab042b862 create the eventual dream cli, like bigsleep library 2022-04-12 10:04:17 -07:00
Phil Wang
b93ad8b7a2 add cli file, use click 2022-04-12 09:58:53 -07:00
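A minimal click-based entrypoint of the sort this commit sets up; the command name, options, and behavior are placeholders, not the package's actual CLI.

```python
import click

# hypothetical CLI entrypoint using click; option names are illustrative only
@click.command()
@click.option('--model-path', default='./dalle2.pt', help='path to a trained checkpoint')
@click.argument('prompt')
def dream(model_path, prompt):
    click.echo(f'generating an image for: {prompt}')
    # ... load the model from model_path and run text -> image generation here

if __name__ == '__main__':
    dream()
```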
Phil Wang
5e03b7f932 get ready for all the training related classes and functions 2022-04-12 09:54:50 -07:00
Phil Wang
62c0d321a6 sketch 2022-04-12 09:39:42 -07:00
Phil Wang
7cf1637d24 bring in the simple tokenizer released by openai, but also plan on leaving room for custom tokenizer with yttm 2022-04-12 09:23:17 -07:00
Phil Wang
4ff6d021c9 pin to newer version of CLIP that returns encoded text and images, get some helper functions ready for XCLIP 2022-04-12 08:54:47 -07:00
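For orientation, here is how pooled text and image embeddings come out of the stock OpenAI `clip` package; the per-token encodings this commit refers to are assumed to come from the pinned CLIP version (or x-clip) and are not shown here.

```python
import torch
import clip

# illustrative only: pull the pooled embeddings the prior and decoder condition on
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model, preprocess = clip.load('ViT-B/32', device=device)

tokens = clip.tokenize(['a corgi playing a flame throwing trumpet']).to(device)

with torch.no_grad():
    text_embed = model.encode_text(tokens)        # (batch, 512) pooled text embedding
    # image_embed = model.encode_image(images)    # (batch, 512) pooled image embedding
```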
Phil Wang
f283bf25be scaffold 2022-04-07 07:29:34 -07:00