From bdf5e9c0096ca2fa670a1ecb64568e7cbd69361a Mon Sep 17 00:00:00 2001
From: Phil Wang
Date: Tue, 26 Apr 2022 09:56:54 -0700
Subject: [PATCH] todo

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 4500708..5a90f41 100644
--- a/README.md
+++ b/README.md
@@ -594,7 +594,7 @@ Once built, images will be saved to the same directory the command is invoked
 - [x] build out latent diffusion architecture, with the vq-reg variant (vqgan-vae), make it completely optional and compatible with cascading ddpms
 - [x] for decoder, allow ability to customize objective (predict epsilon vs x0), in case latent diffusion does better with prediction of x0
 - [x] use attention-based upsampling https://arxiv.org/abs/2112.11435
-- [ ] spend one day cleaning up tech debt in decoder
+- [ ] abstract interface for CLIP adapter class, so other CLIPs can be brought in - use inheritance just this once for sharing logic between decoder and prior network ddpms
 - [ ] become an expert with unets, cleanup unet code, make it fully configurable, port all learnings over to https://github.com/lucidrains/x-unet
 - [ ] copy the cascading ddpm code to a separate repo (perhaps https://github.com/lucidrains/denoising-diffusion-pytorch) as the main contribution of dalle2 really is just the prior network
 - [ ] transcribe code to Jax, which lowers the activation energy for distributed training, given access to TPUs
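
The added todo item proposes an abstract CLIP adapter interface so other CLIP models can be plugged in. A minimal sketch of what such an interface might look like, assuming a PyTorch codebase; all names here (BaseClipAdapter, OpenAIClipAdapter, dim_latent, embed_text, embed_image) are hypothetical illustrations, not the repository's actual API:

# hypothetical sketch of an abstract CLIP adapter; class and method names
# are illustrative assumptions, not the actual DALLE2-pytorch interface

from abc import ABC, abstractmethod

import torch
from torch import nn

class BaseClipAdapter(nn.Module, ABC):
    """shared interface so the prior and decoder stay agnostic to which CLIP is used"""

    @property
    @abstractmethod
    def dim_latent(self) -> int:
        """dimensionality of the shared text/image embedding space"""

    @abstractmethod
    def embed_text(self, text: torch.Tensor) -> torch.Tensor:
        """token ids -> unit-norm text embeddings"""

    @abstractmethod
    def embed_image(self, image: torch.Tensor) -> torch.Tensor:
        """preprocessed images -> unit-norm image embeddings"""

class OpenAIClipAdapter(BaseClipAdapter):
    """example concrete adapter wrapping OpenAI's `clip` package"""

    def __init__(self, name: str = 'ViT-B/32'):
        super().__init__()
        import clip  # pip install git+https://github.com/openai/CLIP.git
        self.clip, _ = clip.load(name, device = 'cpu')

    @property
    def dim_latent(self) -> int:
        # CLIP's text projection maps into the joint embedding space
        return self.clip.text_projection.shape[-1]

    def embed_text(self, text: torch.Tensor) -> torch.Tensor:
        emb = self.clip.encode_text(text)
        return emb / emb.norm(dim = -1, keepdim = True)

    def embed_image(self, image: torch.Tensor) -> torch.Tensor:
        emb = self.clip.encode_image(image)
        return emb / emb.norm(dim = -1, keepdim = True)

With an adapter like this, any model that embeds text and images into a shared latent space can be dropped into both the prior and decoder ddpms, which only ever call the abstract methods.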