Phil Wang
d167378401
add cosine sim for self attention as well, as a setting
2022-07-29 12:48:20 -07:00
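The cosine-sim attention referenced in the commit above can be sketched roughly as follows (a minimal illustration, not the repo's exact code): queries and keys are l2-normalized so attention logits become cosine similarities, then multiplied by a scale (a learned temperature in practice; the fixed `scale = 10.` here is an assumption for illustration).

```python
import torch
import torch.nn.functional as F

# hedged sketch of cosine-sim attention: l2-normalize q and k so the
# logits are cosine similarities, then apply a temperature-like scale
q, k, v = (torch.randn(1, 4, 8, 16) for _ in range(3))

q, k = map(lambda t: F.normalize(t, dim = -1), (q, k))
scale = 10.  # stand-in for a learned temperature

attn = (q @ k.transpose(-2, -1) * scale).softmax(dim = -1)
out = attn @ v
```

Bounding the logits this way is one of the measures used to keep attention stable in fp16.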
Phil Wang
2d67d5821e
change up epsilon in layernorm in the case of using fp16, thanks to @Veldrovive for figuring out that this stabilizes training
2022-07-29 12:41:02 -07:00
Phil Wang
748c7fe7af
allow for cosine sim cross attention, modify linear attention in attempt to resolve issue on fp16
2022-07-29 11:12:18 -07:00
Phil Wang
80046334ad
make sure entire readme runs without errors
2022-07-28 10:17:43 -07:00
Phil Wang
36fb46a95e
fix readme and a small bug in DALLE2 class
2022-07-28 08:33:51 -07:00
Phil Wang
07abfcf45b
rescale values in linear attention to mitigate overflows in fp16 setting
2022-07-27 12:27:38 -07:00
Phil Wang
406e75043f
add upsample combiner feature for the unets
2022-07-26 10:46:04 -07:00
Phil Wang
62043acb2f
fix repaint
2022-07-24 15:29:06 -07:00
Aidan Dempster
4145474bab
Improved upsampler training ( #181 )
...
Sampling is now possible without the first decoder unet
Non-training unets are deleted in the decoder trainer since they are never used and it is harder to merge the models if they have keys in this state dict
Fixed a mistake where clip was not re-added after saving
2022-07-19 19:07:50 -07:00
Phil Wang
291377bb9c
@jacobwjs reports dynamic thresholding works very well and 0.95 is a better value
2022-07-19 11:31:56 -07:00
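The dynamic thresholding mentioned above (from the Imagen paper) can be sketched as follows, with the reported `0.95` as the quantile; the helper name is hypothetical, not the repo's:

```python
import torch

def dynamic_threshold(x0, percentile = 0.95):
    # clamp the predicted x0 to the per-sample `percentile` quantile of its
    # absolute values, then rescale back into roughly [-1, 1]
    s = torch.quantile(x0.abs().flatten(1), percentile, dim = 1)
    s.clamp_(min = 1.)                        # never shrink values already in range
    s = s.view(-1, *((1,) * (x0.dim() - 1)))  # broadcast over non-batch dims
    return x0.clamp(-s, s) / s

x0 = torch.randn(2, 3, 8, 8) * 3.
out = dynamic_threshold(x0)
```

This keeps classifier-free-guided samples from saturating, since large guidance scales push pixel predictions outside the training range.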
Phil Wang
723bf0abba
complete inpainting ability using inpaint_image and inpaint_mask passed into sample function for decoder
2022-07-19 09:26:55 -07:00
Phil Wang
d88c7ba56c
fix a bug with ddim and predict x0 objective
2022-07-18 19:04:26 -07:00
Phil Wang
3676a8ce78
comments
2022-07-18 15:02:04 -07:00
Phil Wang
da8e99ada0
fix sample bug
2022-07-18 13:50:22 -07:00
Phil Wang
6afb886cf4
complete imagen-like noise level conditioning
2022-07-18 13:43:57 -07:00
Phil Wang
a2ee3fa3cc
offer way to turn off initial cross embed convolutional module, for debugging upsampler artifacts
2022-07-15 17:29:10 -07:00
Phil Wang
a58a370d75
takes care of a grad strides error at https://github.com/lucidrains/DALLE2-pytorch/issues/196 thanks to @YUHANG-Ma
2022-07-14 15:28:34 -07:00
Phil Wang
1662bbf226
protect against random cropping for base unet
2022-07-14 12:49:43 -07:00
Phil Wang
a34f60962a
let the neural network peek at the low resolution conditioning one last time before making prediction, for upsamplers
2022-07-14 10:27:04 -07:00
Phil Wang
0b40cbaa54
just always use nearest neighbor interpolation when resizing for low resolution conditioning, for https://github.com/lucidrains/DALLE2-pytorch/pull/181
2022-07-13 20:59:43 -07:00
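The nearest-neighbor resize for low resolution conditioning amounts to something like the following minimal sketch (shapes are illustrative):

```python
import torch
import torch.nn.functional as F

# resize the low resolution conditioning image with nearest neighbor
# interpolation, avoiding ringing that smoother resampling can introduce
lowres = torch.rand(1, 3, 32, 32)
upsampled = F.interpolate(lowres, size = (64, 64), mode = 'nearest')
```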
Phil Wang
f141144a6d
allow for using classifier free guidance for some unets but not others, by passing in a tuple of cond_scale during sampling for decoder, just in case it is causing issues for upsamplers
2022-07-13 13:12:30 -07:00
Phil Wang
f988207718
hack around some inplace error, also make sure for openai clip text encoding, only tokens after eos_id are masked out
2022-07-13 12:56:02 -07:00
Phil Wang
cc0f7a935c
fix non pixel shuffle upsample
2022-07-13 10:16:02 -07:00
Phil Wang
95a512cb65
fix a potential bug with conditioning with blurred low resolution image, blur should be applied only 50% of the time
2022-07-13 10:11:49 -07:00
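Applying the blur only half the time can be sketched as a probabilistic gate like the one below; the function name is hypothetical, and a crude box blur stands in for the proper gaussian blur used for noising the low resolution conditioning:

```python
import torch
import torch.nn.functional as F

def maybe_blur(img, prob = 0.5):
    # apply the blur augmentation only `prob` of the time;
    # a 3x3 box blur stands in here for a gaussian blur
    if torch.rand(()).item() >= prob:
        return img
    return F.avg_pool2d(img, 3, stride = 1, padding = 1)
```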
Phil Wang
972ee973bc
fix issue with ddim and normalization of lowres conditioning image
2022-07-13 09:48:40 -07:00
Phil Wang
79e2a3bc77
only use the stable layernorm for final output norm in transformer
2022-07-13 07:56:30 -07:00
Phil Wang
349aaca56f
add yet another transformer stability measure
2022-07-12 17:49:16 -07:00
Phil Wang
3ee3c56d2a
add learned padding tokens, same strategy as dalle1, for diffusion prior, and get rid of masking in causal transformer
2022-07-12 17:33:14 -07:00
Phil Wang
775abc4df6
add setting to attend to all text encodings regardless of padding, for diffusion prior
2022-07-12 17:08:12 -07:00
Phil Wang
11b1d533a0
make sure text encodings being passed in have the correct batch dimension
2022-07-12 16:00:19 -07:00
Phil Wang
e76e89f9eb
remove text masking altogether in favor of deriving from text encodings (padded text encodings must have a pad value of 0.)
2022-07-12 15:40:31 -07:00
Phil Wang
bb3ff0ac67
protect against bad text mask being passed into decoder
2022-07-12 15:33:13 -07:00
Phil Wang
1ec4dbe64f
one more fix for text mask: if the length of the text encoding exceeds max_text_len, add an assert for a better error msg
2022-07-12 15:01:46 -07:00
Phil Wang
e0835acca9
generate text mask within the unet and diffusion prior itself from the text encodings, if not given
2022-07-12 12:54:59 -07:00
Phil Wang
1d9ef99288
add PixelShuffleUpsample thanks to @MalumaDev and @marunine for running the experiment and verifying absence of checkerboard artifacts
2022-07-11 16:07:23 -07:00
Phil Wang
bdd62c24b3
zero init final projection in unet, since openai and @crowsonkb are both doing it
2022-07-11 13:22:06 -07:00
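The zero-init trick above is simple in isolation; a minimal sketch (the layer name is illustrative): zero the weights and bias of the unet's final projection so the network's initial output is exactly zero, a common stabilizer in diffusion models.

```python
import torch
from torch import nn

# zero-init the final 1x1 projection so the untrained unet outputs zeros
final_conv = nn.Conv2d(64, 3, 1)
nn.init.zeros_(final_conv.weight)
nn.init.zeros_(final_conv.bias)

out = final_conv(torch.randn(1, 64, 16, 16))
```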
Phil Wang
1f1557c614
make it so even if text mask is omitted, it will be derived based on whether text encodings are all 0s or not, simplify dataloading
2022-07-11 10:56:19 -07:00
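Deriving the text mask from the encodings themselves, as the commit above describes, can be sketched as: treat any position whose encoding vector is all zeros as padding (shapes below are illustrative).

```python
import torch

# derive the text mask from padded text encodings: a position counts as
# real text if any element of its encoding vector is nonzero
text_encodings = torch.ones(2, 4, 8)
text_encodings[:, 2:] = 0.   # simulate right-padded positions

text_mask = (text_encodings != 0.).any(dim = -1)
```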
Phil Wang
7ea314e2f0
allow for final l2norm clamping of the sampled image embed
2022-07-10 09:44:38 -07:00
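The l2norm clamping mentioned above can be sketched as follows: since CLIP image embeddings are unit-norm, the sampled image embed can be projected back onto the unit hypersphere.

```python
import torch
import torch.nn.functional as F

# clamp a sampled image embedding back onto the unit hypersphere
image_embed = torch.randn(2, 512) * 5.
clamped = F.normalize(image_embed, dim = -1)
```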
Phil Wang
3dae43fa0e
fix misnamed variable, thanks to @nousr
2022-07-09 19:01:37 -07:00
Phil Wang
a598820012
do not noise for the last step in ddim
2022-07-09 18:38:40 -07:00
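Skipping the noise on the final DDIM step can be sketched with an update like the one below (variable names are illustrative, not the repo's): dropping the stochastic `sigma * noise` term on the last step makes it deterministic.

```python
import torch

def ddim_step(x0, eps, alpha_prev, sigma, is_last):
    # one DDIM update; on the final step the sigma * noise term is dropped
    noise = torch.zeros_like(x0) if is_last else torch.randn_like(x0)
    return (
        alpha_prev.sqrt() * x0
        + (1. - alpha_prev - sigma ** 2).clamp(min = 0.).sqrt() * eps
        + sigma * noise
    )
```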
Phil Wang
4878762627
fix for small validation bug for sampling steps
2022-07-09 17:31:54 -07:00
Phil Wang
47ae17b36e
more informative error for something that tripped me up
2022-07-09 17:28:14 -07:00
Phil Wang
b7e22f7da0
complete ddim integration of diffusion prior as well as decoder for each unet, feature complete for https://github.com/lucidrains/DALLE2-pytorch/issues/157
2022-07-09 17:25:34 -07:00
Phil Wang
3070610231
just force it so a researcher can never pass in an image smaller than the size required for CLIP or CoCa
2022-07-08 18:17:29 -07:00
Phil Wang
8c823affff
allow for control over use of nearest interp method of downsampling low res conditioning, in addition to being able to turn it off
2022-07-08 11:44:43 -07:00
Phil Wang
46be8c32d3
fix a potential issue in the low resolution conditioner, when downsampling and then upsampling using resize right, thanks to @marunine
2022-07-07 09:41:49 -07:00
Phil Wang
900f086a6d
fix condition_on_text_encodings in dalle2 orchestrator class, fix readme
2022-07-07 07:43:41 -07:00
Phil Wang
6a59c7093d
more shots in the dark regarding fp16 with learned variance for deepspeed issue
2022-07-06 19:05:50 -07:00
Phil Wang
1bd8a7835a
attempting to fix issue with deepspeed fp16 seeing overflowing gradient
2022-07-06 08:27:34 -07:00
Phil Wang
f33453df9f
debugging with Aidan
2022-07-05 18:22:43 -07:00