Phil Wang | 723bf0abba | complete inpainting ability using inpaint_image and inpaint_mask passed into sample function for decoder | 2022-07-19 09:26:55 -07:00
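The inpainting this commit adds can be sketched as a RePaint-style blend at each sampling step: keep the model's prediction only inside the region to be filled in, and paste the (appropriately noised) known image back everywhere else. A minimal sketch, assuming the usual mask convention; the helper name and semantics here are illustrative, not the repo's exact API:

```python
import torch

def inpaint_blend(x_denoised, inpaint_image_noised, inpaint_mask):
    # keep the model's prediction where the mask is True (region to fill in),
    # paste back the noised known image everywhere else
    mask = inpaint_mask.to(torch.bool)
    return torch.where(mask, x_denoised, inpaint_image_noised)

x_pred = torch.zeros(1, 3, 4, 4)         # model's denoised sample this step
known  = torch.ones(1, 3, 4, 4)          # known pixels, noised to timestep t
mask   = torch.zeros(1, 1, 4, 4).bool()  # nothing to inpaint -> keep known image
out = inpaint_blend(x_pred, known, mask)
```

With an all-False mask the known image passes through untouched; with an all-True mask the model's prediction is kept wholesale.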
Phil Wang | d88c7ba56c | fix a bug with ddim and predict x0 objective | 2022-07-18 19:04:26 -07:00
Phil Wang | 3676a8ce78 | comments | 2022-07-18 15:02:04 -07:00
Phil Wang | da8e99ada0 | fix sample bug | 2022-07-18 13:50:22 -07:00
Phil Wang | 6afb886cf4 | complete imagen-like noise level conditioning | 2022-07-18 13:43:57 -07:00
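Imagen-style noise level conditioning, as referenced here, corrupts the low-resolution conditioning image with Gaussian noise at a random level during training and tells the unet which level was used. A sketch under that assumption; the function and argument names are hypothetical:

```python
import torch

def noise_lowres_cond(lowres_cond_img, alphas_cumprod, noise_level=None):
    # corrupt the low-res conditioning image at a random (or given) noise
    # level, returning the level so the unet can be conditioned on it
    b = lowres_cond_img.shape[0]
    if noise_level is None:
        noise_level = torch.randint(0, len(alphas_cumprod), (b,))
    alpha_bar = alphas_cumprod[noise_level].view(b, 1, 1, 1)
    noised = alpha_bar.sqrt() * lowres_cond_img \
        + (1 - alpha_bar).sqrt() * torch.randn_like(lowres_cond_img)
    return noised, noise_level

alphas_cumprod = torch.linspace(1.0, 0.01, 10)  # toy noise schedule
img = torch.randn(2, 3, 8, 8)
noised, level = noise_lowres_cond(img, alphas_cumprod, torch.zeros(2).long())
```

At level 0 of this toy schedule, alpha-bar is 1 and the image passes through unchanged; at sampling time a fixed level would be used instead of a random one.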
Phil Wang | a2ee3fa3cc | offer way to turn off initial cross embed convolutional module, for debugging upsampler artifacts | 2022-07-15 17:29:10 -07:00
Phil Wang | a58a370d75 | takes care of a grad strides error at https://github.com/lucidrains/DALLE2-pytorch/issues/196 thanks to @YUHANG-Ma | 2022-07-14 15:28:34 -07:00
Phil Wang | 1662bbf226 | protect against random cropping for base unet | 2022-07-14 12:49:43 -07:00
Phil Wang | a34f60962a | let the neural network peek at the low resolution conditioning one last time before making prediction, for upsamplers | 2022-07-14 10:27:04 -07:00
Phil Wang | 0b40cbaa54 | just always use nearest neighbor interpolation when resizing for low resolution conditioning, for https://github.com/lucidrains/DALLE2-pytorch/pull/181 | 2022-07-13 20:59:43 -07:00
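Always resizing with nearest-neighbor interpolation, as this commit settles on, can be sketched with `F.interpolate`; the helper name is illustrative:

```python
import torch
import torch.nn.functional as F

def resize_image_to(image, target_size):
    # nearest-neighbor resize for building the low-res conditioning image
    if image.shape[-1] == target_size:
        return image
    return F.interpolate(image, size=target_size, mode='nearest')

hires  = torch.randn(1, 3, 8, 8)
lowres = resize_image_to(hires, 4)   # downsample 8 -> 4
cond   = resize_image_to(lowres, 8)  # back up to 8 for conditioning
```

Nearest-neighbor upsampling simply duplicates pixels, which avoids the smoothing and ringing other interpolation modes can introduce into the conditioning signal.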
Phil Wang | f141144a6d | allow for using classifier free guidance for some unets but not others, by passing in a tuple of cond_scale during sampling for decoder, just in case it is causing issues for upsamplers | 2022-07-13 13:12:30 -07:00
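Per-unet guidance scales can be sketched as follows: a scale of 1 reduces classifier-free guidance to a no-op, so passing e.g. `(2.0, 1.0)` guides the base unet but leaves the upsampler unguided. A minimal sketch; the helper names are illustrative, not the repo's exact API:

```python
import torch

def cast_tuple(val, length):
    # broadcast a single value to a tuple, one entry per unet
    return val if isinstance(val, tuple) else (val,) * length

def guided_pred(cond_out, null_out, cond_scale):
    # classifier-free guidance: extrapolate away from the unconditional
    # prediction; cond_scale = 1 means guidance is effectively off
    return null_out + (cond_out - null_out) * cond_scale

cond_scales = cast_tuple((2.0, 1.0), 2)  # guide the base unet, not the upsampler
cond = torch.full((1,), 3.0)
null = torch.full((1,), 1.0)
```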
Phil Wang | f988207718 | hack around some inplace error; also make sure that for openai clip text encoding, only tokens after eos_id are masked out | 2022-07-13 12:56:02 -07:00
Phil Wang | cc0f7a935c | fix non pixel shuffle upsample | 2022-07-13 10:16:02 -07:00
Phil Wang | 95a512cb65 | fix a potential bug with conditioning on the blurred low resolution image; blur should be applied only 50% of the time | 2022-07-13 10:11:49 -07:00
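Applying the conditioning blur only half the time, per that fix, amounts to a stochastic augmentation so the upsampler also trains on unblurred inputs. A sketch with a hypothetical helper (the blur function itself is passed in):

```python
import torch

def maybe_blur(lowres_img, blur_fn, prob=0.5):
    # apply the conditioning blur augmentation only `prob` of the time
    if torch.rand(()) < prob:
        return blur_fn(lowres_img)
    return lowres_img

x = torch.ones(1, 3, 4, 4)
zero_blur = lambda t: torch.zeros_like(t)  # stand-in blur for demonstration
always = maybe_blur(x, zero_blur, prob=1.0)  # rand() < 1.0 is always true
never  = maybe_blur(x, zero_blur, prob=0.0)  # rand() < 0.0 is never true
```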
Phil Wang | 972ee973bc | fix issue with ddim and normalization of lowres conditioning image | 2022-07-13 09:48:40 -07:00
Phil Wang | 79e2a3bc77 | only use the stable layernorm for final output norm in transformer | 2022-07-13 07:56:30 -07:00
Phil Wang | 349aaca56f | add yet another transformer stability measure | 2022-07-12 17:49:16 -07:00
Phil Wang | 3ee3c56d2a | add learned padding tokens, same strategy as dalle1, for the diffusion prior, and get rid of masking in the causal transformer | 2022-07-12 17:33:14 -07:00
Phil Wang | 775abc4df6 | add setting to attend to all text encodings regardless of padding, for the diffusion prior | 2022-07-12 17:08:12 -07:00
Phil Wang | 11b1d533a0 | make sure text encodings being passed in have the correct batch dimension | 2022-07-12 16:00:19 -07:00
Phil Wang | e76e89f9eb | remove text masking altogether in favor of deriving it from the text encodings (padded text encodings must have a pad value of 0.) | 2022-07-12 15:40:31 -07:00
Phil Wang | bb3ff0ac67 | protect against a bad text mask being passed into the decoder | 2022-07-12 15:33:13 -07:00
Phil Wang | 1ec4dbe64f | one more fix for the text mask; if the length of the text encoding exceeds max_text_len, add an assert for a better error msg | 2022-07-12 15:01:46 -07:00
Phil Wang | e0835acca9 | generate text mask within the unet and diffusion prior itself from the text encodings, if not given | 2022-07-12 12:54:59 -07:00
Phil Wang | 1d9ef99288 | add PixelShuffleUpsample; thanks to @MalumaDev and @marunine for running the experiment and verifying the absence of checkerboard artifacts | 2022-07-11 16:07:23 -07:00
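A pixel-shuffle upsampler, as introduced by that commit, produces 4x the channels with a 1x1 conv and then rearranges them into a 2x spatial upsample via `nn.PixelShuffle`, sidestepping the checkerboard artifacts of transposed convolutions. A simplified sketch (the repo's version also includes an ICNR-style init, omitted here):

```python
import torch
from torch import nn

class PixelShuffleUpsample(nn.Module):
    # conv to 4x channels, nonlinearity, then PixelShuffle(2) for 2x upsampling
    def __init__(self, dim, dim_out=None):
        super().__init__()
        dim_out = dim_out or dim
        self.net = nn.Sequential(
            nn.Conv2d(dim, dim_out * 4, 1),
            nn.SiLU(),
            nn.PixelShuffle(2),
        )

    def forward(self, x):
        return self.net(x)

up = PixelShuffleUpsample(8)
out = up(torch.randn(2, 8, 16, 16))  # spatial dims double, channels preserved
```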
Phil Wang | bdd62c24b3 | zero init final projection in unet, since openai and @crowsonkb are both doing it | 2022-07-11 13:22:06 -07:00
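Zero-initializing the unet's final projection makes the network output exactly zero at the start of training, a common stabilizer in diffusion codebases. A minimal sketch with a hypothetical helper name:

```python
import torch
from torch import nn

def zero_init_(conv):
    # zero the final projection's weights and bias so the model's
    # initial prediction is exactly zero
    nn.init.zeros_(conv.weight)
    if conv.bias is not None:
        nn.init.zeros_(conv.bias)
    return conv

final_conv = zero_init_(nn.Conv2d(64, 3, 1))
out = final_conv(torch.randn(1, 64, 8, 8))
```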
Phil Wang | 1f1557c614 | make it so even if the text mask is omitted, it will be derived based on whether text encodings are all 0s or not; simplify dataloading | 2022-07-11 10:56:19 -07:00
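Deriving the mask from the encodings themselves can be sketched in one line: a token position counts as padding iff its encoding vector is all zeros (which is why the pad value must be 0). The helper name is illustrative:

```python
import torch

def derive_text_mask(text_encodings):
    # a position is real text iff any feature in its encoding is nonzero
    return (text_encodings != 0).any(dim=-1)

enc = torch.zeros(1, 4, 8)
enc[0, :2] = 1.  # first two tokens are real, the rest are padding
mask = derive_text_mask(enc)
```

This lets dataloaders skip constructing an explicit mask entirely.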
Phil Wang | 7ea314e2f0 | allow for final l2norm clamping of the sampled image embed | 2022-07-10 09:44:38 -07:00
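L2-norm clamping projects the sampled image embedding back onto the unit hypersphere that CLIP embeddings live on. A sketch; the function name and the optional rescale are illustrative:

```python
import torch
import torch.nn.functional as F

def l2norm_clamp_embed(image_embed, image_embed_scale=1.0):
    # renormalize to unit l2 norm, optionally rescaled
    return F.normalize(image_embed, dim=-1) * image_embed_scale

embed = l2norm_clamp_embed(torch.randn(2, 512))
```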
Phil Wang | 3dae43fa0e | fix misnamed variable, thanks to @nousr | 2022-07-09 19:01:37 -07:00
Phil Wang | a598820012 | do not noise for the last step in ddim | 2022-07-09 18:38:40 -07:00
Phil Wang | 4878762627 | fix for small validation bug for sampling steps | 2022-07-09 17:31:54 -07:00
Phil Wang | 47ae17b36e | more informative error for something that tripped me up | 2022-07-09 17:28:14 -07:00
Phil Wang | b7e22f7da0 | complete ddim integration of diffusion prior as well as decoder for each unet, feature complete for https://github.com/lucidrains/DALLE2-pytorch/issues/157 | 2022-07-09 17:25:34 -07:00
Phil Wang | 3070610231 | just force it so the researcher can never pass in an image smaller than the size required for CLIP or CoCa | 2022-07-08 18:17:29 -07:00
Phil Wang | 8c823affff | allow for control over use of nearest interp method of downsampling low res conditioning, in addition to being able to turn it off | 2022-07-08 11:44:43 -07:00
Phil Wang | 46be8c32d3 | fix a potential issue in the low resolution conditioner when downsampling and then upsampling using resize-right; thanks to @marunine | 2022-07-07 09:41:49 -07:00
Phil Wang | 900f086a6d | fix condition_on_text_encodings in dalle2 orchestrator class, fix readme | 2022-07-07 07:43:41 -07:00
Phil Wang | 6a59c7093d | more shots in the dark regarding fp16 with learned variance for deepspeed issue | 2022-07-06 19:05:50 -07:00
Phil Wang | 1bd8a7835a | attempting to fix issue with deepspeed fp16 seeing overflowing gradient | 2022-07-06 08:27:34 -07:00
Phil Wang | f33453df9f | debugging with Aidan | 2022-07-05 18:22:43 -07:00
Phil Wang | 1e4bb2bafb | cast long as float before deriving sinusoidal pos emb | 2022-07-05 18:01:22 -07:00
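The float cast mentioned in that fix matters because timesteps arrive as integer (long) tensors, while the sinusoidal embedding needs floating-point math. A sketch of the standard embedding with the cast in place:

```python
import math
import torch

def sinusoidal_pos_emb(timesteps, dim):
    # standard sinusoidal embedding; note the .float() cast on the
    # integer timesteps before the sin/cos math
    half = dim // 2
    freqs = torch.exp(torch.arange(half) * (-math.log(10000) / (half - 1)))
    args = timesteps.float()[:, None] * freqs[None, :]
    return torch.cat((args.sin(), args.cos()), dim=-1)

emb = sinusoidal_pos_emb(torch.tensor([0, 5, 100]), 16)
```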
Phil Wang | ee75515c7d | remove forcing of softmax in f32, in case it is interfering with deepspeed | 2022-07-05 16:53:58 -07:00
Phil Wang | b9a908ff75 | bring in two tricks from the cogview paper for reducing the chances of overflow, for attention and layernorm | 2022-07-05 14:27:04 -07:00
Phil Wang | e1fe3089df | do bias-less layernorm manually | 2022-07-05 13:09:58 -07:00
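A hand-written bias-less layernorm, as that commit describes, normalizes with the mean and variance but learns only a gain, no shift. A sketch consistent with that description:

```python
import torch
from torch import nn

class LayerNorm(nn.Module):
    # manual layernorm with a learned gain and no learned bias
    def __init__(self, dim, eps=1e-5):
        super().__init__()
        self.eps = eps
        self.g = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        var = torch.var(x, dim=-1, unbiased=False, keepdim=True)
        mean = torch.mean(x, dim=-1, keepdim=True)
        return (x - mean) * (var + self.eps).rsqrt() * self.g

out = LayerNorm(8)(torch.randn(2, 4, 8))
```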
Phil Wang | 3d23ba4aa5 | add ability to specify full self attention on specific stages in the unet | 2022-07-01 10:22:07 -07:00
Phil Wang | 7b0edf9e42 | allow for returning low resolution conditioning image on forward through decoder with return_lowres_cond_image flag | 2022-07-01 09:35:39 -07:00
Phil Wang | a922a539de | bring back convtranspose2d upsampling, allow for nearest upsample with hyperparam, change kernel size of last conv to 1, make configurable, cleanup | 2022-07-01 09:21:47 -07:00
Phil Wang | 8f2466f1cd | blur sigma for upsampling training was 0.6 in the paper; make that the default value | 2022-06-30 17:03:16 -07:00
Phil Wang | 908ab83799 | add skip connections for all intermediate resnet blocks, also add an extra resnet block for memory efficient version of unet, time condition for both initial resnet block and last one before output | 2022-06-29 08:16:58 -07:00
Phil Wang | 6a11b9678b | bring in the skip connection scaling factor, used by imagen in their unets, cite original paper using it | 2022-06-26 21:59:55 -07:00
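The skip connection scaling factor from Imagen's unets is 1/sqrt(2): scaling the skip branch before combining keeps the variance of the merged features roughly constant. A sketch under that assumption, with an illustrative helper name:

```python
import torch

SKIP_SCALE = 2 ** -0.5  # 1/sqrt(2), the scaling used in Imagen's unets

def scaled_skip_connect(x, skip):
    # scale the unet skip connection by 1/sqrt(2) before concatenating,
    # to keep the variance of the combined features roughly constant
    return torch.cat((x, skip * SKIP_SCALE), dim=1)

out = scaled_skip_connect(torch.randn(1, 4, 8, 8), torch.randn(1, 4, 8, 8))
```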