Commit Graph

277 Commits

Author SHA1 Message Date
Phil Wang
4010aec033 turn off classifier free guidance if predicting x_start for diffusion prior 2022-05-07 09:38:17 -07:00
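A minimal sketch of what classifier-free guidance does at sampling time, assuming a noise-predicting model; the names `guided_pred` and `cond_scale` are illustrative, not the repository's exact API. Per the commit above, this extrapolation is skipped when the prior predicts x_start directly.

```python
import torch

def guided_pred(model, x, t, cond, cond_scale = 1.0):
    # classifier-free guidance: extrapolate away from the unconditional
    # prediction toward the conditional one by cond_scale
    pred_cond = model(x, t, cond = cond)
    if cond_scale == 1.0:
        return pred_cond
    pred_uncond = model(x, t, cond = None)  # conditioning dropped
    return pred_uncond + (pred_cond - pred_uncond) * cond_scale
```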
Phil Wang
830afd3c15 use sinusoidal embeddings for the diffusion prior time conditioning as well, for the continuous version 2022-05-07 08:32:43 -07:00
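For reference, the standard sinusoidal construction from the transformer and DDPM literature, which handles continuous (non-integer) timesteps as well; the function name and sizes are illustrative.

```python
import math
import torch

def sinusoidal_time_embedding(times, dim):
    # times: (batch,) tensor of timesteps, possibly continuous
    half_dim = dim // 2
    freqs = torch.exp(-math.log(10000) * torch.arange(half_dim).float() / half_dim)
    args = times.float()[:, None] * freqs[None, :]
    return torch.cat((args.sin(), args.cos()), dim = -1)  # (batch, dim), dim even
```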
Phil Wang
8f93729d19 when in doubt, make it a hyperparameter 2022-05-07 07:52:17 -07:00
Phil Wang
85ed77d512 fix a potentially huge bug thanks to @CiaoHe https://github.com/lucidrains/DALLE2-pytorch/issues/71 2022-05-07 05:05:54 -07:00
Phil Wang
28e944f328 make sure openai clip adapter outputs l2normed embeddings 2022-05-06 10:12:03 -07:00
Phil Wang
14e63a3f67 also offer l2norm clamping in the diffusion prior during training, if one is using the predict-x0 objective 2022-05-06 10:05:14 -07:00
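This commit and the one above both keep embeddings on the unit hypersphere, matching CLIP's L2-normalized outputs. A minimal sketch:

```python
import torch.nn.functional as F

def l2norm(t):
    # project embeddings onto the unit sphere along the feature dimension;
    # applied to CLIP adapter outputs, and optionally to the prior's
    # predicted x0 during training
    return F.normalize(t, dim = -1)
```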
Phil Wang
ad20a14a4d bring in rotary embeddings for diffusion prior causal transformer (the most powerful relative positional encoding, used in PaLM) - 0.1.0 because of breaking change 2022-05-06 08:45:30 -07:00
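A compact sketch of rotary embeddings as commonly implemented (for example in lucidrains' rotary-embedding-torch): queries and keys are rotated pairwise by position-dependent angles, which makes attention logits a function of relative position. Helper names are illustrative.

```python
import torch

def rotary_freqs(seq_len, dim_head, base = 10000):
    inv_freq = 1.0 / (base ** (torch.arange(0, dim_head, 2).float() / dim_head))
    angles = torch.outer(torch.arange(seq_len).float(), inv_freq)  # (seq, dim_head / 2)
    return torch.cat((angles, angles), dim = -1)                   # (seq, dim_head)

def rotate_half(x):
    x1, x2 = x.chunk(2, dim = -1)
    return torch.cat((-x2, x1), dim = -1)

def apply_rotary(x, freqs):
    # x: (..., seq, dim_head) queries or keys
    return x * freqs.cos() + rotate_half(x) * freqs.sin()
```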
Phil Wang
0be1e0d64c support CoCa, which seems to be better than CLIP (has an autoregressive text encoder) https://arxiv.org/abs/2205.01917 2022-05-06 08:27:12 -07:00
Phil Wang
98df1ba51e add diffusion prior trainer, which automatically takes care of the exponential moving average (training and sampling), as well as mixed precision, gradient clipping 2022-05-06 08:11:09 -07:00
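The EMA bookkeeping such a trainer automates is small at its core; a minimal sketch, with an illustrative decay value:

```python
import copy
import torch

def make_ema(model):
    # the shadow copy starts as an exact clone of the online model
    return copy.deepcopy(model)

@torch.no_grad()
def ema_update(online_model, ema_model, beta = 0.995):
    # after each optimizer step, nudge the EMA weights toward the online
    # weights; sampling then uses the smoother EMA copy
    for p, p_ema in zip(online_model.parameters(), ema_model.parameters()):
        p_ema.lerp_(p, 1. - beta)
```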
Phil Wang
878b555ef7 fix training with clip 2022-05-06 07:37:57 -07:00
Phil Wang
c76a964fd6 allow for CLIP to be optional in Decoder, and allow DecoderTrainer to work off training pre-encoded image embeddings 2022-05-05 08:11:01 -07:00
Phil Wang
8518684ae9 this does not make much sense, as researchers may want to try predicting noise with DiffusionPrior instead of predicting x0 2022-05-05 07:37:00 -07:00
Phil Wang
1d5dc08810 take @crowsonkb 's suggestion at https://github.com/lucidrains/DALLE2-pytorch/issues/60#issue-1226116132 2022-05-05 07:28:53 -07:00
Phil Wang
896f19786d remove convnext blocks, as they are ill-suited for generative work, validated by early experimental results at https://github.com/lucidrains/video-diffusion-pytorch 2022-05-05 07:07:21 -07:00
Phil Wang
aec5575d09 take a bet on resize right, given Katherine is using it 2022-05-04 19:26:45 -07:00
Phil Wang
9773f10d6c use inference mode whenever possible, cleanup 2022-05-04 15:25:05 -07:00
Phil Wang
86e692d24f fix random crop probability 2022-05-04 11:52:24 -07:00
Phil Wang
97b751209f allow for last unet in the cascade to be trained on crops, if it is convolution-only 2022-05-04 11:48:48 -07:00
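A sketch of the probabilistic crop training these two commits refer to; it is only sound for a convolution-only unet, since convolutions are translation-equivariant while full-resolution attention is not. The helper name and default probability are illustrative.

```python
import torch
import torchvision.transforms as T

def maybe_random_crop(images, crop_size, prob = 0.5):
    # with probability `prob`, train this unet stage on a random crop
    if torch.rand(()) >= prob:
        return images
    return T.RandomCrop(crop_size)(images)
```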
Phil Wang
5b619c2fd5 make sure some hyperparameters for the unet blocks are configurable 2022-05-04 11:18:32 -07:00
Phil Wang
9ff228188b offer old resnet blocks, from the original DDPM paper, just in case convnexts are unsuitable for generative work 2022-05-04 10:52:58 -07:00
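For reference, a block in the spirit of the original DDPM resnet blocks: two groupnorm, SiLU, conv stages plus a residual connection. Hyperparameters are illustrative.

```python
import torch
from torch import nn

class ResnetBlock(nn.Module):
    def __init__(self, dim, dim_out, groups = 8):
        super().__init__()
        self.block1 = nn.Sequential(
            nn.GroupNorm(groups, dim), nn.SiLU(),
            nn.Conv2d(dim, dim_out, 3, padding = 1))
        self.block2 = nn.Sequential(
            nn.GroupNorm(groups, dim_out), nn.SiLU(),
            nn.Conv2d(dim_out, dim_out, 3, padding = 1))
        # 1x1 conv on the skip path only when channel counts differ
        self.res_conv = nn.Conv2d(dim, dim_out, 1) if dim != dim_out else nn.Identity()

    def forward(self, x):
        return self.block2(self.block1(x)) + self.res_conv(x)
```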
Phil Wang
70282de23b add ability to turn on normformer settings, given @borisdayma reported good results and some personal anecdata 2022-05-02 11:33:15 -07:00
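Normformer's main change is extra normalization inside each transformer sub-block; one example is a LayerNorm after the feedforward activation, sketched below with illustrative dimensions.

```python
from torch import nn

class NormformerFeedForward(nn.Module):
    def __init__(self, dim, mult = 4):
        super().__init__()
        self.net = nn.Sequential(
            nn.LayerNorm(dim),
            nn.Linear(dim, dim * mult),
            nn.GELU(),
            nn.LayerNorm(dim * mult),   # the extra normformer norm
            nn.Linear(dim * mult, dim))

    def forward(self, x):
        return self.net(x)
```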
Phil Wang
11469dc0c6 makes more sense to keep this as True as default, for stability 2022-05-02 10:50:55 -07:00
Phil Wang
0fc6c9cdf3 provide option to l2norm the output of the diffusion prior 2022-05-02 09:41:03 -07:00
Phil Wang
ad87bfe28f switch to using linear attention for the sparse attention layers within unet, given success in GAN projects 2022-05-01 17:59:03 -07:00
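A sketch of linear attention in the style of Shen et al.'s efficient attention, which lucidrains uses across several projects: softmaxing queries and keys along different axes lets the value aggregation be reassociated, so cost scales linearly rather than quadratically with sequence length.

```python
import torch

def linear_attention(q, k, v):
    # q, k, v: (batch, heads, seq, dim_head)
    q = q.softmax(dim = -1)
    k = k.softmax(dim = -2)
    context = torch.einsum('bhnd,bhne->bhde', k, v)    # seq dimension summed out
    return torch.einsum('bhnd,bhde->bhne', q, context)
```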
Phil Wang
b8cf1e5c20 more attention 2022-05-01 11:00:33 -07:00
Phil Wang
5e421bd5bb let researchers do the hyperparameter search 2022-05-01 08:46:21 -07:00
Phil Wang
67fcab1122 add MLP-based time conditioning to all convnexts, in addition to cross attention. also add an initial convolution, given convnext starts with a depthwise conv 2022-05-01 08:41:02 -07:00
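A sketch of MLP-based time conditioning, assuming the block simply adds a projected time embedding to its feature maps (additive conditioning is one common choice; FiLM-style scale-and-shift is another). All sizes are illustrative.

```python
import torch
from torch import nn

time_dim, channels = 512, 256          # illustrative sizes

time_mlp = nn.Sequential(
    nn.SiLU(),
    nn.Linear(time_dim, channels))

def condition_on_time(feature_map, time_emb):
    # feature_map: (b, c, h, w), time_emb: (b, time_dim)
    return feature_map + time_mlp(time_emb)[:, :, None, None]
```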
Phil Wang
d1a697ac23 allow one to shortcut sampling at a specific unet number, if one is training in stages 2022-04-30 16:05:13 -07:00
Phil Wang
a9421f49ec simplify Decoder training for the public 2022-04-30 11:45:18 -07:00
Phil Wang
77fa34eae9 fix all clipping / clamping issues 2022-04-30 10:08:24 -07:00
Phil Wang
1c1e508369 fix all issues with text encodings conditioning in the decoder, using null padding tokens technique from dalle v1 2022-04-30 09:13:34 -07:00
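The null padding token idea, sketched: instead of letting the decoder cross-attend to meaningless padding positions, padded slots are replaced with a dedicated null token (zeros here for simplicity; it can also be learned).

```python
import torch

def mask_text_encodings(text_encodings, mask):
    # text_encodings: (b, n, d); mask: (b, n) bool, True where a real token is
    null_token = torch.zeros_like(text_encodings)
    return torch.where(mask[..., None], text_encodings, null_token)
```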
Phil Wang
f19c99ecb0 fix decoder needing separate conditional dropping probabilities for image embeddings and text encodings, thanks to @xiankgx ! 2022-04-30 08:48:05 -07:00
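Separate drop probabilities per modality, sketched: each batch element independently keeps or drops its image embed and its text encodings, so guidance strength can later be tuned per modality. Names are illustrative.

```python
import torch

def prob_mask(batch, prob, device):
    # True where the conditioning is kept
    return torch.rand(batch, device = device) >= prob

def drop_conditioning(image_embed, text_encodings, image_drop_prob, text_drop_prob):
    # image_embed: (b, d); text_encodings: (b, n, d)
    b = image_embed.shape[0]
    keep_image = prob_mask(b, image_drop_prob, image_embed.device)
    keep_text = prob_mask(b, text_drop_prob, text_encodings.device)
    image_embed = image_embed * keep_image[:, None].float()
    text_encodings = text_encodings * keep_text[:, None, None].float()
    return image_embed, text_encodings
```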
Phil Wang
20e7eb5a9b cleanup 2022-04-30 07:22:57 -07:00
Phil Wang
e2f9615afa use @clip-anytorch, thanks to @rom1504 2022-04-30 06:40:54 -07:00
Phil Wang
0d1c07c803 fix a bug with classifier free guidance, thanks to @xiankgx again! 2022-04-30 06:34:57 -07:00
Phil Wang
5063d192b6 now completely OpenAI CLIP compatible for training
just take care of the logic for AdamW and transformers

used namedtuples for clip adapter embedding outputs
2022-04-29 13:05:01 -07:00
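Named tuples make adapter outputs self-describing; the field names below are illustrative, not necessarily the repository's exact ones.

```python
from collections import namedtuple

EmbeddedText = namedtuple('EmbeddedText', ['text_embed', 'text_encodings'])
EmbeddedImage = namedtuple('EmbeddedImage', ['image_embed'])

# an adapter's embed_text can then return
# EmbeddedText(text_embed = ..., text_encodings = ...)
# and callers unpack by name instead of by position
```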
Phil Wang
fb662a62f3 fix another bug thanks to @xiankgx 2022-04-29 07:38:32 -07:00
Phil Wang
587c8c9b44 optimize for clarity 2022-04-28 21:59:13 -07:00
Phil Wang
aa900213e7 force first unet in the cascade to be conditioned on image embeds 2022-04-28 20:53:15 -07:00
Phil Wang
625ce23f6b 🐛 2022-04-28 07:21:18 -07:00
Phil Wang
dbf4a281f1 make sure another CLIP can actually be passed in, as long as it is wrapped in an adapter extended from BaseClipAdapter 2022-04-27 20:45:27 -07:00
Phil Wang
4ab527e779 some extra asserts for text encoding of diffusion prior and decoder 2022-04-27 20:11:43 -07:00
Phil Wang
d0cdeb3247 add ability for DALL-E2 to return PIL images with return_pil_images = True on forward, for those who have no clue about deep learning 2022-04-27 19:58:06 -07:00
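The tensor-to-PIL conversion is a one-liner with torchvision; a sketch assuming sampled images are (b, c, h, w) floats in [0, 1]:

```python
import torch
import torchvision.transforms as T

images = torch.rand(4, 3, 256, 256)    # stand-in for sampled output, values in [0, 1]
to_pil = T.ToPILImage()
pil_images = [to_pil(image) for image in images.unbind(dim = 0)]
```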
Phil Wang
8c610aad9a only pass text encodings conditioning in diffusion prior if specified on initialization 2022-04-27 19:48:16 -07:00
Phil Wang
6700381a37 prepare for the ability to integrate CLIPs other than x-clip 2022-04-27 19:35:05 -07:00
Phil Wang
fa3bb6ba5c make sure cpu-only still works 2022-04-27 08:02:10 -07:00
Phil Wang
2705e7c9b0 attention-based upsampling's claims were not borne out by local experiments; removing 2022-04-27 07:51:04 -07:00
Phil Wang
de0296106b allow turning off the LazyLinear warning by passing in the text embedding dimension for the unet 2022-04-26 11:42:46 -07:00
Phil Wang
eafb136214 suppress a warning 2022-04-26 11:40:45 -07:00
Phil Wang
bfbcc283a3 DRY a tiny bit for gaussian diffusion related logic 2022-04-26 11:39:12 -07:00