Phil Wang | a2ee3fa3cc | offer way to turn off initial cross embed convolutional module, for debugging upsampler artifacts | 2022-07-15 17:29:10 -07:00
Phil Wang | a58a370d75 | takes care of a grad strides error at https://github.com/lucidrains/DALLE2-pytorch/issues/196 thanks to @YUHANG-Ma | 2022-07-14 15:28:34 -07:00
Phil Wang | 1662bbf226 | protect against random cropping for base unet | 2022-07-14 12:49:43 -07:00
Phil Wang | a34f60962a | let the neural network peek at the low resolution conditioning one last time before making prediction, for upsamplers | 2022-07-14 10:27:04 -07:00
Phil Wang | 0b40cbaa54 | just always use nearest neighbor interpolation when resizing for low resolution conditioning, for https://github.com/lucidrains/DALLE2-pytorch/pull/181 | 2022-07-13 20:59:43 -07:00
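A minimal sketch of the idea, not the repository's actual helper; the function name `resize_image_to` and its signature are assumptions here:

```python
import torch.nn.functional as F

def resize_image_to(image, target_size):
    # nearest-neighbor resizing avoids the smoothing that bilinear / bicubic
    # interpolation introduces into the low resolution conditioning image
    return F.interpolate(image, size=target_size, mode='nearest')
```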
Phil Wang | f141144a6d | allow for using classifier free guidance for some unets but not others, by passing in a tuple of cond_scale during sampling for decoder, just in case it is causing issues for upsamplers | 2022-07-13 13:12:30 -07:00
Phil Wang | f988207718 | hack around an inplace error; also make sure that for openai clip text encoding, only tokens after eos_id are masked out | 2022-07-13 12:56:02 -07:00
Phil Wang | b2073219f0 | foolproof sampling for decoder to always use eval mode (and restore training state afterwards) | 2022-07-13 10:21:00 -07:00
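One common way to make this foolproof is a small context manager; the sketch below illustrates the pattern and is not the trainer's actual code:

```python
from contextlib import contextmanager

@contextmanager
def eval_mode(model):
    # remember whether the model was in training mode, switch to eval for
    # sampling, then restore the original state no matter what happens
    was_training = model.training
    model.eval()
    try:
        yield model
    finally:
        model.train(was_training)
```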
Phil Wang | cc0f7a935c | fix non-pixel-shuffle upsample | 2022-07-13 10:16:02 -07:00
Phil Wang | 95a512cb65 | fix a potential bug with conditioning with blurred low resolution image, blur should be applied only 50% of the time | 2022-07-13 10:11:49 -07:00
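A hedged sketch of the intended behavior, using torchvision's `gaussian_blur`; the kernel size and sigma below are placeholders, not the repository's settings:

```python
import random
from torchvision.transforms.functional import gaussian_blur

def maybe_blur(lowres_image, prob=0.5, kernel_size=3, sigma=0.6):
    # apply the blur augmentation to the low resolution conditioning image
    # only half of the time, leaving the other half untouched
    if random.random() < prob:
        return gaussian_blur(lowres_image, kernel_size=kernel_size, sigma=sigma)
    return lowres_image
```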
Phil Wang | 972ee973bc | fix issue with ddim and normalization of lowres conditioning image | 2022-07-13 09:48:40 -07:00
Phil Wang | 79e2a3bc77 | only use the stable layernorm for final output norm in transformer | 2022-07-13 07:56:30 -07:00
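One common formulation of a "stable" layernorm divides by the per-sample max value before normalizing; the sketch below is an assumption about the technique, not necessarily the repository's exact module:

```python
import torch
from torch import nn

class StableLayerNorm(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):
        # dividing by the (detached) per-row max keeps activations in a range
        # where fp16 layernorm statistics are far less likely to overflow
        x = x / x.amax(dim=-1, keepdim=True).detach()
        return self.norm(x)
```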
Phil Wang | 349aaca56f | add yet another transformer stability measure | 2022-07-12 17:49:16 -07:00
Phil Wang | 3ee3c56d2a | add learned padding tokens, same strategy as dalle1, for diffusion prior, and get rid of masking in causal transformer | 2022-07-12 17:33:14 -07:00
Phil Wang | cd26c6b17d | 0.22.3 | 2022-07-12 17:08:31 -07:00
Phil Wang | 775abc4df6 | add setting to attend to all text encodings regardless of padding, for diffusion prior | 2022-07-12 17:08:12 -07:00
Phil Wang | 11b1d533a0 | make sure text encodings being passed in have the correct batch dimension | 2022-07-12 16:00:19 -07:00
Phil Wang | e76e89f9eb | remove text masking altogether in favor of deriving from text encodings (padded text encodings must use a pad value of 0.) | 2022-07-12 15:40:31 -07:00
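A minimal sketch of how such a mask can be derived, assuming the stated convention that padded positions are all zeros; the helper name is hypothetical:

```python
import torch

def derive_text_mask(text_encodings):
    # a position counts as real text if any feature of its encoding is
    # non-zero; all-zero positions are treated as padding
    return (text_encodings != 0).any(dim=-1)
```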
Phil Wang | bb3ff0ac67 | protect against bad text mask being passed into decoder | 2022-07-12 15:33:13 -07:00
Phil Wang | 1ec4dbe64f | one more fix for the text mask; if the length of the text encoding exceeds max_text_len, add an assert for a better error message | 2022-07-12 15:01:46 -07:00
Phil Wang | e0835acca9 | generate text mask within the unet and diffusion prior itself from the text encodings, if not given | 2022-07-12 12:54:59 -07:00
Phil Wang | 1d9ef99288 | add PixelShuffleUpsample, thanks to @MalumaDev and @marunine for running the experiment and verifying absence of checkerboard artifacts | 2022-07-11 16:07:23 -07:00
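A sketch of the pixel-shuffle upsample pattern; the repository's version may differ (for example with a special weight init), so treat the module below as illustrative only:

```python
from torch import nn

def PixelShuffleUpsample(dim, dim_out=None):
    # expand channels 4x with a 1x1 conv, then rearrange those channels into
    # a 2x2 spatial block; unlike transposed convolutions this upsampling has
    # no built-in tendency toward checkerboard artifacts
    dim_out = dim_out if dim_out is not None else dim
    return nn.Sequential(
        nn.Conv2d(dim, dim_out * 4, 1),
        nn.SiLU(),
        nn.PixelShuffle(2)
    )
```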
Phil Wang | bdd62c24b3 | zero init final projection in unet, since openai and @crowsonkb are both doing it | 2022-07-11 13:22:06 -07:00
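The trick itself is a one-liner; the projection below is a placeholder with illustrative dimensions, not the unet's actual layer:

```python
from torch import nn

# placeholder final projection of the unet (dims are illustrative)
final_conv = nn.Conv2d(128, 3, 1)

# zero-initializing the last projection means the unet initially predicts
# zeros, which tends to make the start of diffusion training better behaved
nn.init.zeros_(final_conv.weight)
nn.init.zeros_(final_conv.bias)
```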
Phil Wang | 1f1557c614 | make it so even if text mask is omitted, it will be derived based on whether text encodings are all 0s or not, simplify dataloading | 2022-07-11 10:56:19 -07:00
Phil Wang | 7ea314e2f0 | allow for final l2norm clamping of the sampled image embed | 2022-07-10 09:44:38 -07:00
Phil Wang | 3dae43fa0e | fix misnamed variable, thanks to @nousr | 2022-07-09 19:01:37 -07:00
Phil Wang | a598820012 | do not noise for the last step in ddim | 2022-07-09 18:38:40 -07:00
Phil Wang | 4878762627 | fix for small validation bug for sampling steps | 2022-07-09 17:31:54 -07:00
Phil Wang | 47ae17b36e | more informative error for something that tripped me up | 2022-07-09 17:28:14 -07:00
Phil Wang | b7e22f7da0 | complete ddim integration of diffusion prior as well as decoder for each unet, feature complete for https://github.com/lucidrains/DALLE2-pytorch/issues/157 | 2022-07-09 17:25:34 -07:00
Phil Wang | 097afda606 | 0.18.0 | 2022-07-08 18:18:38 -07:00
Phil Wang | 3070610231 | just force it so the researcher can never pass in an image smaller than the size required for CLIP or CoCa | 2022-07-08 18:17:29 -07:00
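A sketch of the kind of guard this implies; the helper name and arguments are assumptions, not the repository's API:

```python
def assert_image_size(images, clip_image_size):
    # refuse images smaller than what CLIP / CoCa expects rather than
    # silently upsampling them and training on interpolation artifacts
    height, width = images.shape[-2:]
    assert height >= clip_image_size and width >= clip_image_size, \
        f'images must be at least {clip_image_size} pixels on each side for CLIP / CoCa'
```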
Phil Wang | 081d8d3484 | 0.17.0 | 2022-07-08 13:36:26 -07:00
Phil Wang | d7bc5fbedd | expose num_steps_taken helper method on trainer to retrieve number of training steps of each unet | 2022-07-08 13:00:56 -07:00
Phil Wang | 8c823affff | allow for control over the use of the nearest interp method for downsampling the low res conditioning, in addition to being able to turn it off | 2022-07-08 11:44:43 -07:00
Phil Wang | ec7cab01d9 | extra insurance that the diffusion prior is on the correct device when using the trainer with an accelerator, or when a device was given | 2022-07-07 10:08:33 -07:00
Phil Wang | 46be8c32d3 | fix a potential issue in the low resolution conditioner, when downsampling and then upsampling using resize right, thanks to @marunine | 2022-07-07 09:41:49 -07:00
Phil Wang | 900f086a6d | fix condition_on_text_encodings in dalle2 orchestrator class, fix readme | 2022-07-07 07:43:41 -07:00
Phil Wang | 6a59c7093d | more shots in the dark regarding fp16 with learned variance for deepspeed issue | 2022-07-06 19:05:50 -07:00
Phil Wang | a6cdbe0b9c | relax learning rate constraint, as @rom1504 wants to try a higher one | 2022-07-06 18:09:11 -07:00
Phil Wang | e928ae5c34 | default the device to the device that the diffusion prior parameters are on, if the trainer was given neither an accelerator nor a device | 2022-07-06 12:47:48 -07:00
Phil Wang | 1bd8a7835a | attempting to fix issue with deepspeed fp16 seeing overflowing gradient | 2022-07-06 08:27:34 -07:00
Phil Wang | f33453df9f | debugging with Aidan | 2022-07-05 18:22:43 -07:00
Phil Wang | 1e4bb2bafb | cast long as float before deriving sinusoidal pos emb | 2022-07-05 18:01:22 -07:00
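Timesteps typically arrive as integer (long) tensors; the sketch below shows the cast inside a textbook sinusoidal embedding, which is not necessarily the repository's exact formulation:

```python
import math
import torch

def sinusoidal_pos_emb(timesteps, dim):
    # cast the long timestep tensor to float first, otherwise the scaling
    # and sin/cos below would operate on an integer dtype
    timesteps = timesteps.float()
    half_dim = dim // 2
    freqs = torch.exp(
        torch.arange(half_dim, device=timesteps.device).float()
        * -(math.log(10000.0) / (half_dim - 1))
    )
    angles = timesteps[:, None] * freqs[None, :]
    return torch.cat((angles.sin(), angles.cos()), dim=-1)
```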
Phil Wang | ee75515c7d | remove forcing of softmax in f32, in case it is interfering with deepspeed | 2022-07-05 16:53:58 -07:00
Phil Wang | ec68243479 | set ability to do warmup steps for each unet during training | 2022-07-05 16:24:16 -07:00
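A minimal sketch of a per-unet linear warmup using a standard scheduler; the function name and parameters are placeholders, not the decoder trainer's interface:

```python
from torch.optim.lr_scheduler import LambdaLR

def make_warmup_scheduler(optimizer, warmup_steps):
    # linearly ramp the learning rate from ~0 to its base value over the
    # first warmup_steps updates, then hold it constant; the decoder trainer
    # would keep one such scheduler per unet
    return LambdaLR(optimizer, lambda step: min(1.0, (step + 1) / warmup_steps))
```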
Phil Wang | 3afdcdfe86 | need to keep track of training steps separately for each unet in decoder trainer | 2022-07-05 15:17:59 -07:00
Phil Wang | b9a908ff75 | bring in two tricks from the cogview paper for reducing the chances of overflow, for attention and layernorm | 2022-07-05 14:27:04 -07:00
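The attention-side stabilization can be sketched as subtracting the per-row max from the attention logits before softmax (the layernorm counterpart is the "stable" layernorm mentioned above); this fragment assumes pre-computed q, k tensors of shape (batch, heads, seq, dim) and is illustrative only:

```python
import torch

def stable_attention_weights(q, k, scale):
    # subtracting the (detached) row-wise max leaves the softmax output
    # mathematically unchanged but keeps fp16 logits from overflowing
    sim = torch.einsum('b h i d, b h j d -> b h i j', q, k) * scale
    sim = sim - sim.amax(dim=-1, keepdim=True).detach()
    return sim.softmax(dim=-1)
```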
Phil Wang | e1fe3089df | do bias-less layernorm manually | 2022-07-05 13:09:58 -07:00
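A sketch of a manually written layernorm with a learned gain and no bias, the kind of module this commit message describes (written out by hand since older torch versions do not expose a bias-free nn.LayerNorm); not necessarily the repository's exact code:

```python
import torch
from torch import nn

class LayerNorm(nn.Module):
    def __init__(self, dim, eps=1e-5):
        super().__init__()
        self.eps = eps
        self.g = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        # normalize over the feature dimension with a learned gain `g`
        # and no learned bias term
        var = torch.var(x, dim=-1, unbiased=False, keepdim=True)
        mean = torch.mean(x, dim=-1, keepdim=True)
        return (x - mean) * (var + self.eps).rsqrt() * self.g
```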
Phil Wang | ec5a77fc55 | 0.15.4 | 2022-07-02 08:56:34 -07:00