Commit Graph

164 Commits

Author SHA1 Message Date
Phil Wang
0283556608 fix example in readme, since api changed 2022-04-29 13:40:55 -07:00
Phil Wang
5063d192b6 now completely OpenAI CLIP compatible for training
just take care of the logic for AdamW and transformers

used namedtuples for clip adapter embedding outputs
2022-04-29 13:05:01 -07:00
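For context on the last change above: returning namedtuples instead of bare tuples makes adapter embedding outputs self-documenting. A minimal sketch of the idea (the field names here are illustrative assumptions, not necessarily the repository's definitions):

```python
# Illustrative sketch only: namedtuple fields are accessible by name,
# yet the result still unpacks like an ordinary tuple.
from collections import namedtuple

EmbeddedText = namedtuple('EmbeddedText', ['text_embed', 'text_encodings'])

out = EmbeddedText(text_embed = [0.1, 0.2], text_encodings = None)
print(out.text_embed)        # access by field name
embed, encodings = out       # or unpack positionally, as with a plain tuple
```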
Phil Wang
f4a54e475e add some training fns 2022-04-29 09:44:55 -07:00
Phil Wang
fb662a62f3 fix another bug thanks to @xiankgx 0.0.65 2022-04-29 07:38:32 -07:00
Phil Wang
587c8c9b44 optimize for clarity 2022-04-28 21:59:13 -07:00
Phil Wang
aa900213e7 force first unet in the cascade to be conditioned on image embeds 0.0.64 2022-04-28 20:53:15 -07:00
Phil Wang
cb26187450 vqgan-vae codebook dims should be 256 or smaller 0.0.63 2022-04-28 08:59:03 -07:00
Phil Wang
625ce23f6b 🐛 0.0.62 2022-04-28 07:21:18 -07:00
Phil Wang
dbf4a281f1 make sure another CLIP can actually be passed in, as long as it is wrapped in an adapter extended from BaseClipAdapter 0.0.61 2022-04-27 20:45:27 -07:00
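A rough sketch of the adapter idea in this commit, assuming a hypothetical wrapped model exposing encode_text / encode_image methods (only the name BaseClipAdapter comes from the commit message):

```python
# Hypothetical adapter; the wrapped model's method names are assumptions.
class BaseClipAdapter:
    def embed_text(self, text_tokens):
        raise NotImplementedError

    def embed_image(self, images):
        raise NotImplementedError

class MyClipAdapter(BaseClipAdapter):
    def __init__(self, clip_model):
        self.clip = clip_model

    def embed_text(self, text_tokens):
        # delegate to the wrapped CLIP, conforming to the expected interface
        return self.clip.encode_text(text_tokens)

    def embed_image(self, images):
        return self.clip.encode_image(images)
```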
Phil Wang
4ab527e779 some extra asserts for text encoding of diffusion prior and decoder 0.0.60 2022-04-27 20:11:43 -07:00
Phil Wang
d0cdeb3247 add ability for DALL-E2 to return PIL images with return_pil_images = True on forward, for those who have no clue about deep learning 2022-04-27 19:58:06 -07:00
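A hedged usage sketch of the new flag: only `return_pil_images = True` is confirmed by the commit message; the prompt and the already-constructed `dalle2` model are assumed for illustration.

```python
# Assumed usage; `dalle2` is a constructed DALLE2 model (construction not shown).
images = dalle2(
    ['a cute puppy chasing after a squirrel'],
    return_pil_images = True   # return PIL.Image objects rather than raw tensors
)
images[0].save('./puppy.png')
```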
Phil Wang
8c610aad9a only pass text encodings conditioning in diffusion prior if specified on initialization 0.0.58 2022-04-27 19:48:16 -07:00
Phil Wang
6700381a37 prepare for ability to integrate other clips other than x-clip 0.0.57 2022-04-27 19:35:05 -07:00
Phil Wang
20377f889a todo 2022-04-27 17:22:14 -07:00
Phil Wang
6edb1c5dd0 fix issue with ema class 0.0.56 2022-04-27 16:40:02 -07:00
Phil Wang
b093f92182 inform what is possible 2022-04-27 08:25:16 -07:00
Phil Wang
fa3bb6ba5c make sure cpu-only still works 0.0.55 2022-04-27 08:02:10 -07:00
Phil Wang
2705e7c9b0 attention-based upsampling claims are unsupported by local experiments; removing it 2022-04-27 07:51:04 -07:00
Phil Wang
77141882c8 complete vit-vqgan from https://arxiv.org/abs/2110.04627 0.0.54 2022-04-26 17:20:47 -07:00
Phil Wang
4075d02139 never mind, it could be working, but only when I stabilize it with the feedforward layer + tanh as proposed in the vit-vqgan paper (which will be built into the repository later for the latent diffusion) 2022-04-26 12:43:31 -07:00
Phil Wang
de0296106b be able to turn off warning for use of LazyLinear by passing in text embedding dimension for unet 0.0.52 2022-04-26 11:42:46 -07:00
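For background on the LazyLinear change: torch's nn.LazyLinear infers its input dimension on the first forward pass and emits a warning that lazy modules are experimental, so knowing the text embedding dimension upfront permits a concrete nn.Linear instead. A generic illustration (the dimension values are assumptions):

```python
import torch.nn as nn

text_embed_dim = 512   # assumed value, passed in by the user

# Dimension known upfront: build a concrete Linear, no LazyLinear warning.
# Dimension unknown: fall back to LazyLinear, which infers it at first use.
to_text_cond = (
    nn.Linear(text_embed_dim, 128)
    if text_embed_dim is not None
    else nn.LazyLinear(128)
)
```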
Phil Wang
eafb136214 suppress a warning 0.0.51 2022-04-26 11:40:45 -07:00
Phil Wang
bfbcc283a3 DRY a tiny bit for gaussian diffusion related logic 2022-04-26 11:39:12 -07:00
Phil Wang
c30544b73a no CLIP altogether for training DiffusionPrior 0.0.50 2022-04-26 10:23:41 -07:00
Phil Wang
bdf5e9c009 todo 2022-04-26 09:56:54 -07:00
Phil Wang
9878be760b have the researcher explicitly state upfront whether to condition with text encodings in the cascading ddpm decoder; have the DALLE-2 class take care of passing in text if the feature is turned on 0.0.49 2022-04-26 09:47:09 -07:00
Phil Wang
7ba6357c05 allow for training the Prior network with precomputed CLIP embeddings (or text encodings) 0.0.48 2022-04-26 09:29:51 -07:00
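A hedged sketch of what training on precomputed embeddings generally looks like (the argument names and the already-constructed `diffusion_prior` are assumptions, not the exact signature):

```python
import torch

# Embeddings computed offline with CLIP are fed straight in, so no CLIP
# forward pass is needed inside the training loop.
text_embeds  = torch.randn(4, 512)   # (batch, embed_dim), precomputed
image_embeds = torch.randn(4, 512)

loss = diffusion_prior(text_embed = text_embeds, image_embed = image_embeds)
loss.backward()
```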
Phil Wang
76e063e8b7 refactor so that the causal transformer in the diffusion prior network can be conditioned without text encodings (for LAION's parallel efforts, although it seems from the paper that they are needed) 0.0.47 2022-04-26 09:00:11 -07:00
Phil Wang
4d25976f33 make sure non-latent diffusion still works 0.0.46 2022-04-26 08:36:00 -07:00
Phil Wang
0b28ee0d01 revert to old upsampling; the paper's method does not work 2022-04-26 07:39:04 -07:00
Phil Wang
45262a4bb7 bring in the exponential moving average wrapper, to get ready for training 2022-04-25 19:24:13 -07:00
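For context, a minimal generic sketch of exponential-moving-average weight tracking, the technique behind such a wrapper (an illustration, not the repository's EMA class):

```python
import copy
import torch

class EMA:
    # Keep a frozen shadow copy whose weights are an exponential moving
    # average of the online model's weights.
    def __init__(self, model, beta = 0.995):
        self.beta = beta
        self.ema_model = copy.deepcopy(model)
        self.ema_model.requires_grad_(False)

    @torch.no_grad()
    def update(self, model):
        for ema_p, p in zip(self.ema_model.parameters(), model.parameters()):
            ema_p.lerp_(p, 1. - self.beta)   # ema = beta * ema + (1 - beta) * new

# usage: call ema.update(model) after each optimizer step,
# then sample from ema.ema_model for more stable outputs
```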
Phil Wang
13a58a78c4 scratch off todo 2022-04-25 19:01:30 -07:00
Phil Wang
f75d49c781 start a file for all attention-related modules, use attention-based upsampling in the unets in dalle-2 0.0.45 2022-04-25 18:59:10 -07:00
Phil Wang
3b520dfa85 bring in attention-based upsampling to strengthen vqgan-vae; seems to work as advertised in initial GAN experiments 0.0.44 2022-04-25 17:27:45 -07:00
Phil Wang
79198c6ae4 keep readme simple for reader 2022-04-25 17:21:45 -07:00
Phil Wang
77a246b1b9 todo 2022-04-25 08:48:28 -07:00
Phil Wang
f93a3f6ed8 reprioritize 2022-04-25 08:44:27 -07:00
Phil Wang
8f2a0c7e00 better naming 0.0.43 2022-04-25 07:44:33 -07:00
Phil Wang
863f4ef243 just take care of the logic for setting all latent diffusion to predict x0, if needed 0.0.42 2022-04-24 10:06:42 -07:00
Phil Wang
fb8a66a2de just in case latent diffusion performs better with prediction of x0 instead of epsilon, open up the research avenue 0.0.41 2022-04-24 10:04:22 -07:00
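For background on the x0-vs-epsilon choice (standard DDPM algebra, not code from this commit): since x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps, the two parameterizations are interchangeable, and either prediction can be recovered from the other:

```python
import torch

# Standard DDPM identities relating the two prediction targets.
def noise_from_x0(x_t, x0, alpha_bar):
    return (x_t - alpha_bar.sqrt() * x0) / (1. - alpha_bar).sqrt()

def x0_from_noise(x_t, noise, alpha_bar):
    return (x_t - (1. - alpha_bar).sqrt() * noise) / alpha_bar.sqrt()
```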
Phil Wang
579d4b42dd does not seem right to clip for the prior diffusion part 0.0.40 2022-04-24 09:51:18 -07:00
Phil Wang
473808850a some outlines to the eventual CLI endpoint 2022-04-24 09:27:15 -07:00
Phil Wang
d5318aef4f todo 2022-04-23 08:23:08 -07:00
Phil Wang
f82917e1fd prepare for turning off gradient penalty; as shown in the GAN literature, GP only needs to be applied 1 out of every 4 iterations 0.0.39 2022-04-23 07:52:10 -07:00
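A generic sketch of applying the gradient penalty on only every 4th discriminator step (an illustration of the schedule; the R1-style penalty and all names here are assumptions):

```python
import torch
import torch.nn as nn

def gradient_penalty(images, disc_logits):
    # squared norm of the critic's gradient w.r.t. its input (R1-style)
    grads, = torch.autograd.grad(
        outputs = disc_logits.sum(), inputs = images, create_graph = True)
    return grads.flatten(1).norm(2, dim = 1).pow(2).mean()

discriminator = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 1))
opt = torch.optim.Adam(discriminator.parameters(), lr = 1e-4)

apply_gp_every = 4   # penalty applied 1 out of every 4 iterations

for step in range(8):                      # stand-in loop with random data
    images = torch.randn(2, 3, 32, 32, requires_grad = True)
    logits = discriminator(images)
    loss = logits.mean()                   # placeholder discriminator loss
    if step % apply_gp_every == 0:
        loss = loss + 10. * gradient_penalty(images, logits)
    opt.zero_grad()
    loss.backward()
    opt.step()
```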
Phil Wang
05b74be69a use the null container pattern to clean up some conditionals; save more cleanup for next week 0.0.38 2022-04-22 15:23:18 -07:00
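A small generic illustration of the null container pattern (not the repository's code): substituting an empty container for None lets downstream code iterate unconditionally instead of branching.

```python
# With a null (empty) container, the loop body simply runs zero times,
# so no `if x is not None` conditional is needed downstream.
def total(extra_terms = None):
    extra_terms = extra_terms if extra_terms is not None else ()
    out = 0
    for term in extra_terms:
        out += term
    return out

print(total())            # 0, without special-casing None
print(total([1, 2, 3]))   # 6
```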
Phil Wang
a8b5d5d753 last tweak of readme 2022-04-22 14:16:43 -07:00
Phil Wang
976ef7f87c project management 2022-04-22 14:15:42 -07:00
Phil Wang
fd175bcc0e readme 2022-04-22 14:13:33 -07:00
Phil Wang
76b32f18b3 first pass at complete DALL-E2 + Latent Diffusion integration, with latent diffusion on any layer(s) of the cascading ddpm in the decoder 2022-04-22 13:53:13 -07:00
Phil Wang
f2d5b87677 todo 2022-04-22 11:39:58 -07:00