Commit Graph

353 Commits

Author SHA1 Message Date
Phil Wang
06c65b60d2 1.0.0 2022-07-19 19:08:17 -07:00
Aidan Dempster
4145474bab Improved upsampler training (#181)
Sampling is now possible without the first decoder unet

Non-training unets are deleted in the decoder trainer since they are never used, and it is harder to merge the models if they have keys in the state dict

Fixed a mistake where clip was not re-added after saving
2022-07-19 19:07:50 -07:00
Phil Wang
4b912a38c6 0.26.2 2022-07-19 17:50:36 -07:00
Aidan Dempster
f97e55ec6b Quality of life improvements for tracker savers (#210)
The default save location is now None, so if keys are not specified the
corresponding checkpoint type is not saved.

Models and checkpoints are now both saved with version number and the
config used to create them in order to simplify loading.

Documentation was fixed to be in line with current usage.
2022-07-19 17:50:18 -07:00
Phil Wang
291377bb9c @jacobwjs reports dynamic thresholding works very well and 0.95 is a better value 2022-07-19 11:31:56 -07:00
Phil Wang
723bf0abba complete inpainting ability using inpaint_image and inpaint_mask passed into sample function for decoder 2022-07-19 09:26:55 -07:00
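A minimal sketch of the new sampling path, assuming a `decoder` and `image_embed` built as elsewhere in the repository; only the `inpaint_image` and `inpaint_mask` argument names come from this commit, and the exact mask semantics (keep vs. regenerate) are an assumption:

```python
import torch

def inpaint(decoder, image_embed, known_image, mask):
    # known_image: (batch, channels, height, width) image containing the region to preserve
    # mask: (batch, height, width) boolean mask selecting that region (assumed semantics)
    return decoder.sample(
        image_embed = image_embed,
        inpaint_image = known_image,
        inpaint_mask = mask
    )
```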
Phil Wang
d88c7ba56c fix a bug with ddim and predict x0 objective 2022-07-18 19:04:26 -07:00
Phil Wang
3676a8ce78 comments 2022-07-18 15:02:04 -07:00
Phil Wang
da8e99ada0 fix sample bug 2022-07-18 13:50:22 -07:00
Phil Wang
6afb886cf4 complete imagen-like noise level conditioning 2022-07-18 13:43:57 -07:00
Phil Wang
a2ee3fa3cc offer way to turn off initial cross embed convolutional module, for debugging upsampler artifacts 2022-07-15 17:29:10 -07:00
Phil Wang
a58a370d75 takes care of a grad strides error at https://github.com/lucidrains/DALLE2-pytorch/issues/196 thanks to @YUHANG-Ma 2022-07-14 15:28:34 -07:00
Phil Wang
1662bbf226 protect against random cropping for base unet 2022-07-14 12:49:43 -07:00
Phil Wang
a34f60962a let the neural network peek at the low resolution conditioning one last time before making prediction, for upsamplers 2022-07-14 10:27:04 -07:00
Phil Wang
0b40cbaa54 just always use nearest neighbor interpolation when resizing for low resolution conditioning, for https://github.com/lucidrains/DALLE2-pytorch/pull/181 2022-07-13 20:59:43 -07:00
Phil Wang
f141144a6d allow for using classifier free guidance for some unets but not others, by passing in a tuple of cond_scale during sampling for decoder, just in case it is causing issues for upsamplers 2022-07-13 13:12:30 -07:00
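A sketch of per-unet guidance, assuming a two-unet decoder (base plus upsampler); the tuple form of `cond_scale` comes from this commit, everything else is assumed:

```python
def sample_with_per_unet_guidance(decoder, image_embed, text):
    # guide the base unet but leave the upsampler unguided
    # (a cond_scale of 1.0 is the conventional "no classifier free guidance" value)
    return decoder.sample(
        image_embed = image_embed,
        text = text,
        cond_scale = (3.5, 1.0)
    )
```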
Phil Wang
f988207718 hack around some inplace error, also make sure for openai clip text encoding, only tokens after eos_id are masked out 2022-07-13 12:56:02 -07:00
Phil Wang
b2073219f0 foolproof sampling for decoder to always use eval mode (and restore training state afterwards) 2022-07-13 10:21:00 -07:00
Phil Wang
cc0f7a935c fix non pixel shuffle upsample 2022-07-13 10:16:02 -07:00
Phil Wang
95a512cb65 fix a potential bug with conditioning with blurred low resolution image, blur should be applied only 50% of the time 2022-07-13 10:11:49 -07:00
Phil Wang
972ee973bc fix issue with ddim and normalization of lowres conditioning image 2022-07-13 09:48:40 -07:00
Phil Wang
79e2a3bc77 only use the stable layernorm for final output norm in transformer 2022-07-13 07:56:30 -07:00
Phil Wang
349aaca56f add yet another transformer stability measure 2022-07-12 17:49:16 -07:00
Phil Wang
3ee3c56d2a add learned padding tokens, same strategy as dalle1, for diffusion prior, and get rid of masking in causal transformer 2022-07-12 17:33:14 -07:00
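The learned-padding-token idea in a standalone sketch (names and shapes are illustrative, not the repository's internals): padded positions receive learned embeddings instead of being masked out, mirroring the DALL-E 1 strategy.

```python
import torch
from torch import nn

max_text_len, dim = 256, 512

# one learned embedding per padded position
null_text_encodings = nn.Parameter(torch.randn(1, max_text_len, dim))

def fill_padding(text_encodings, text_mask):
    # text_encodings: (batch, seq, dim), text_mask: (batch, seq) boolean
    # padded positions are replaced by the learned tokens rather than masked out
    b, n, _ = text_encodings.shape
    return torch.where(text_mask[..., None], text_encodings, null_text_encodings[:, :n])
```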
Phil Wang
cd26c6b17d 0.22.3 2022-07-12 17:08:31 -07:00
Phil Wang
775abc4df6 add setting to attend to all text encodings regardless of padding, for diffusion prior 2022-07-12 17:08:12 -07:00
Phil Wang
11b1d533a0 make sure text encodings being passed in have the correct batch dimension 2022-07-12 16:00:19 -07:00
Phil Wang
e76e89f9eb remove text masking altogether in favor of deriving from text encodings (padded text encodings must be pad value of 0.) 2022-07-12 15:40:31 -07:00
Phil Wang
bb3ff0ac67 protect against bad text mask being passed into decoder 2022-07-12 15:33:13 -07:00
Phil Wang
1ec4dbe64f one more fix for text mask, if the length of the text encoding exceeds max_text_len, add an assert for better error msg 2022-07-12 15:01:46 -07:00
Phil Wang
e0835acca9 generate text mask within the unet and diffusion prior itself from the text encodings, if not given 2022-07-12 12:54:59 -07:00
Phil Wang
1d9ef99288 add PixelShuffleUpsample thanks to @MalumaDev and @marunine for running the experiment and verifying absence of checkerboard artifacts 2022-07-11 16:07:23 -07:00
Phil Wang
bdd62c24b3 zero init final projection in unet, since openai and @crowsonkb are both doing it 2022-07-11 13:22:06 -07:00
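The zero-init trick itself is simple; a generic sketch with illustrative layer shapes (not the repository's actual unet code):

```python
from torch import nn

final_proj = nn.Conv2d(128, 3, 1)   # unet's final projection back to image channels
nn.init.zeros_(final_proj.weight)   # start the output branch at zero
nn.init.zeros_(final_proj.bias)
```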
Phil Wang
1f1557c614 make it so even if text mask is omitted, it will be derived based on whether text encodings are all 0s or not, simplify dataloading 2022-07-11 10:56:19 -07:00
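A sketch of the derivation described above: a position counts as real text if its encoding vector is not all zeros (shapes are illustrative):

```python
import torch

text_encodings = torch.randn(2, 256, 512)   # (batch, sequence, dim)
text_encodings[:, 200:] = 0.                # simulate zero-padded positions

text_mask = (text_encodings != 0).any(dim = -1)   # (batch, sequence) boolean mask
```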
Phil Wang
7ea314e2f0 allow for final l2norm clamping of the sampled image embed 2022-07-10 09:44:38 -07:00
Phil Wang
3dae43fa0e fix misnamed variable, thanks to @nousr 2022-07-09 19:01:37 -07:00
Phil Wang
a598820012 do not noise for the last step in ddim 2022-07-09 18:38:40 -07:00
Phil Wang
4878762627 fix for small validation bug for sampling steps 2022-07-09 17:31:54 -07:00
Phil Wang
47ae17b36e more informative error for something that tripped me up 2022-07-09 17:28:14 -07:00
Phil Wang
b7e22f7da0 complete ddim integration of diffusion prior as well as decoder for each unet, feature complete for https://github.com/lucidrains/DALLE2-pytorch/issues/157 2022-07-09 17:25:34 -07:00
Phil Wang
097afda606 0.18.0 2022-07-08 18:18:38 -07:00
Aidan Dempster
5c520db825 Added deepspeed support (#195) 2022-07-08 18:18:08 -07:00
Phil Wang
3070610231 just force it so the researcher can never pass in an image smaller than the size required for CLIP or CoCa 2022-07-08 18:17:29 -07:00
Phil Wang
081d8d3484 0.17.0 2022-07-08 13:36:26 -07:00
Aidan Dempster
a71f693a26 Add the ability to auto restart the last run when started after a crash (#191)
* Added autoresume after crash functionality to the trackers

* Updated documentation

* Clarified what goes in the autorestart object

* Fixed style issues

Unraveled conditional block

Changed to using helper function to get step count
2022-07-08 13:35:40 -07:00
Phil Wang
d7bc5fbedd expose num_steps_taken helper method on trainer to retrieve number of training steps of each unet 2022-07-08 13:00:56 -07:00
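A sketch of querying it, assuming a `DecoderTrainer` instance; the `unet_number` keyword is an assumption based on the "each unet" wording of the commit:

```python
def report_progress(decoder_trainer, num_unets):
    # retrieve how many training steps each unet has taken so far
    for unet_number in range(1, num_unets + 1):
        steps = decoder_trainer.num_steps_taken(unet_number = unet_number)
        print(f'unet {unet_number}: {steps} steps')
```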
Phil Wang
8c823affff allow for control over use of nearest interp method of downsampling low res conditioning, in addition to being able to turn it off 2022-07-08 11:44:43 -07:00
Phil Wang
ec7cab01d9 extra insurance that diffusion prior is on the correct device when using the trainer with an accelerator or when a device was given 2022-07-07 10:08:33 -07:00
Phil Wang
46be8c32d3 fix a potential issue in the low resolution conditioner, when downsampling and then upsampling using resize right, thanks to @marunine 2022-07-07 09:41:49 -07:00
Phil Wang
900f086a6d fix condition_on_text_encodings in dalle2 orchestrator class, fix readme 2022-07-07 07:43:41 -07:00