Phil Wang
0b40cbaa54
just always use nearest neighbor interpolation when resizing for low resolution conditioning, for https://github.com/lucidrains/DALLE2-pytorch/pull/181
2022-07-13 20:59:43 -07:00
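A minimal sketch of the idea, assuming a hypothetical `resize_image_to` helper:

```python
import torch.nn.functional as F

def resize_image_to(image, target_size):
    # always use nearest neighbor when resizing the low resolution
    # conditioning image, so no interpolation kernel mismatch can
    # creep in between training and sampling
    if image.shape[-1] == target_size:
        return image
    return F.interpolate(image, size = target_size, mode = 'nearest')
```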
Phil Wang
f141144a6d
allow for using classifier-free guidance for some unets but not others, by passing a tuple of cond_scale values to the decoder during sampling, just in case guidance is causing issues for the upsamplers
2022-07-13 13:12:30 -07:00
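Roughly, the sampling entrypoint can broadcast or validate the scales like this (a sketch; `cast_cond_scales` is a hypothetical helper):

```python
def cast_cond_scales(cond_scale, num_unets):
    # a single float is broadcast to every unet, while a tuple assigns
    # one guidance scale per unet, e.g. (1.7, 1., 1.) to guide only the
    # base unet and leave the upsamplers unguided (a cond_scale of 1
    # means no classifier-free guidance)
    if not isinstance(cond_scale, tuple):
        cond_scale = (cond_scale,) * num_unets
    assert len(cond_scale) == num_unets
    return cond_scale
```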
Phil Wang
f988207718
hack around an inplace error; also make sure that for openai clip text encoding, only tokens after eos_id are masked out
2022-07-13 12:56:02 -07:00
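A sketch of the intended masking, assuming CLIP-style token sequences with a known `eos_id`:

```python
import torch

def mask_after_eos(text_tokens, eos_id):
    # find the first eos per sequence and keep all positions up to and
    # including it, masking out only what comes after
    # (True = attend, False = masked)
    seq_len = text_tokens.shape[-1]
    eos_pos = (text_tokens == eos_id).int().argmax(dim = -1)
    positions = torch.arange(seq_len, device = text_tokens.device)
    return positions[None, :] <= eos_pos[:, None]
```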
Phil Wang
cc0f7a935c
fix non-pixel-shuffle upsample
2022-07-13 10:16:02 -07:00
Phil Wang
95a512cb65
fix a potential bug when conditioning on the blurred low resolution image; blur should be applied only 50% of the time
2022-07-13 10:11:49 -07:00
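A sketch of the corrected augmentation, with the sigma and kernel size as assumptions:

```python
import random
from torchvision.transforms.functional import gaussian_blur

def maybe_blur(lowres_cond_image, sigma = 0.6, prob = 0.5, kernel_size = 3):
    # training-time augmentation: blur the low resolution conditioning
    # image only half the time, so the upsampler also learns from
    # clean conditioning
    if random.random() < prob:
        return gaussian_blur(lowres_cond_image, kernel_size, [sigma, sigma])
    return lowres_cond_image
```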
Phil Wang
972ee973bc
fix issue with ddim and normalization of lowres conditioning image
2022-07-13 09:48:40 -07:00
Phil Wang
79e2a3bc77
only use the stable layernorm for final output norm in transformer
2022-07-13 07:56:30 -07:00
Phil Wang
349aaca56f
add yet another transformer stability measure
2022-07-12 17:49:16 -07:00
Phil Wang
3ee3c56d2a
add learned padding tokens, same strategy as dalle1, for diffusion prior, and get rid of masking in causal transformer
2022-07-12 17:33:14 -07:00
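A sketch of the dalle1-style substitution (names hypothetical):

```python
import torch
from torch import nn

class LearnedPaddingTokens(nn.Module):
    # rather than masking padded positions in the causal transformer,
    # substitute a learned embedding per padded slot
    def __init__(self, dim, max_text_len):
        super().__init__()
        self.pad_tokens = nn.Parameter(torch.randn(max_text_len, dim))

    def forward(self, text_encodings, mask):
        # mask is True where a real token exists
        b, n, _ = text_encodings.shape
        pad = self.pad_tokens[:n].unsqueeze(0).expand(b, -1, -1)
        return torch.where(mask.unsqueeze(-1), text_encodings, pad)
```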
Phil Wang
775abc4df6
add setting to attend to all text encodings regardless of padding, for diffusion prior
2022-07-12 17:08:12 -07:00
Phil Wang
11b1d533a0
make sure text encodings being passed in have the correct batch dimension
2022-07-12 16:00:19 -07:00
Phil Wang
e76e89f9eb
remove text masking altogether in favor of deriving it from the text encodings (padded text encodings must use a pad value of 0)
2022-07-12 15:40:31 -07:00
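With a pad value of 0, the mask can be recovered directly from the encodings, e.g.:

```python
import torch

def derive_text_mask(text_encodings):
    # a position holds real text if any of its features are nonzero;
    # this is why padded text encodings must use a pad value of 0
    return (text_encodings != 0).any(dim = -1)
```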
Phil Wang
bb3ff0ac67
protect against bad text mask being passed into decoder
2022-07-12 15:33:13 -07:00
Phil Wang
1ec4dbe64f
one more fix for the text mask; if the length of the text encodings exceeds max_text_len, assert with a better error message
2022-07-12 15:01:46 -07:00
Phil Wang
e0835acca9
generate text mask within the unet and diffusion prior itself from the text encodings, if not given
2022-07-12 12:54:59 -07:00
Phil Wang
1d9ef99288
add PixelShuffleUpsample, thanks to @MalumaDev and @marunine for running the experiment and verifying the absence of checkerboard artifacts
2022-07-11 16:07:23 -07:00
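The shape of such an upsample block (the real version also wants a careful, ICNR-style init; this is only a sketch):

```python
from torch import nn

def PixelShuffleUpsample(dim, dim_out = None):
    # sub-pixel convolution upsample: a 1x1 conv expands channels 4x,
    # then PixelShuffle(2) folds them into a 2x larger feature map,
    # sidestepping the checkerboard artifacts of ConvTranspose2d
    dim_out = dim_out if dim_out is not None else dim
    return nn.Sequential(
        nn.Conv2d(dim, dim_out * 4, 1),
        nn.SiLU(),
        nn.PixelShuffle(2)
    )
```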
Phil Wang
bdd62c24b3
zero init final projection in unet, since openai and @crowsonkb are both doing it
2022-07-11 13:22:06 -07:00
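A sketch of the zero init:

```python
from torch import nn

dim, channels = 128, 3  # hypothetical feature and output channel counts

final_conv = nn.Conv2d(dim, channels, kernel_size = 1)
nn.init.zeros_(final_conv.weight)
nn.init.zeros_(final_conv.bias)
# the unet now predicts exactly zero at initialization, which tends to
# stabilize the start of diffusion training
```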
Phil Wang
1f1557c614
make it so even if text mask is omitted, it will be derived based on whether text encodings are all 0s or not, simplify dataloading
2022-07-11 10:56:19 -07:00
Phil Wang
7ea314e2f0
allow for final l2norm clamping of the sampled image embed
2022-07-10 09:44:38 -07:00
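A sketch of the clamping, with the norm scale left as an assumption:

```python
import torch.nn.functional as F

def clamp_image_embed(image_embed, scale):
    # optionally project the sampled image embedding back onto the
    # hypersphere that real CLIP embeddings occupy; `scale` restores
    # the expected norm
    return F.normalize(image_embed, dim = -1) * scale
```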
Phil Wang
3dae43fa0e
fix misnamed variable, thanks to @nousr
2022-07-09 19:01:37 -07:00
Phil Wang
a598820012
do not noise for the last step in ddim
2022-07-09 18:38:40 -07:00
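A sketch of a single DDIM update showing why the last step must skip the noise:

```python
import torch

def ddim_step(pred_x0, eps, alpha_next, sigma, is_last_step):
    # one DDIM update (schedule values passed as tensors); on the
    # final step no fresh noise may be added, otherwise the returned
    # sample is needlessly corrupted
    noise = torch.zeros_like(pred_x0) if is_last_step else torch.randn_like(pred_x0)
    dir_xt = (1. - alpha_next - sigma ** 2).clamp(min = 0.).sqrt() * eps
    return pred_x0 * alpha_next.sqrt() + dir_xt + sigma * noise
```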
Phil Wang
4878762627
fix a small validation bug for sampling steps
2022-07-09 17:31:54 -07:00
Phil Wang
47ae17b36e
more informative error for something that tripped me up
2022-07-09 17:28:14 -07:00
Phil Wang
b7e22f7da0
complete ddim integration of diffusion prior as well as decoder for each unet, feature complete for https://github.com/lucidrains/DALLE2-pytorch/issues/157
2022-07-09 17:25:34 -07:00
Phil Wang
3070610231
just force it so the researcher can never pass in an image smaller than the size required for CLIP or CoCa
2022-07-08 18:17:29 -07:00
Phil Wang
8c823affff
allow for control over the use of the nearest interpolation method for downsampling the low res conditioning, in addition to being able to turn it off
2022-07-08 11:44:43 -07:00
Phil Wang
46be8c32d3
fix a potential issue in the low resolution conditioner, when downsampling and then upsampling using resize right, thanks to @marunine
2022-07-07 09:41:49 -07:00
Phil Wang
900f086a6d
fix condition_on_text_encodings in dalle2 orchestrator class, fix readme
2022-07-07 07:43:41 -07:00
Phil Wang
6a59c7093d
more shots in the dark regarding fp16 with learned variance, for the deepspeed issue
2022-07-06 19:05:50 -07:00
Phil Wang
1bd8a7835a
attempting to fix an issue with deepspeed fp16 seeing overflowing gradients
2022-07-06 08:27:34 -07:00
Phil Wang
f33453df9f
debugging with Aidan
2022-07-05 18:22:43 -07:00
Phil Wang
1e4bb2bafb
cast long as float before deriving sinusoidal pos emb
2022-07-05 18:01:22 -07:00
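A sketch of the embedding with the cast in place:

```python
import math
import torch

def sinusoidal_pos_emb(timesteps, dim):
    # timesteps arrive as a LongTensor of timestep indices; cast to
    # float first, since the frequency multiply below requires a
    # floating point dtype
    timesteps = timesteps.float()
    half_dim = dim // 2
    freqs = torch.exp(
        torch.arange(half_dim, device = timesteps.device)
        * -(math.log(10000.) / (half_dim - 1))
    )
    args = timesteps[:, None] * freqs[None, :]
    return torch.cat((args.sin(), args.cos()), dim = -1)
```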
Phil Wang
ee75515c7d
remove forcing of softmax in f32, in case it is interfering with deepspeed
2022-07-05 16:53:58 -07:00
Phil Wang
b9a908ff75
bring in two tricks from the cogview paper for reducing the chances of overflow, for attention and layernorm
2022-07-05 14:27:04 -07:00
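Sketches of both tricks (CogView's "PB-relax"):

```python
import torch
from torch import nn

class StableLayerNorm(nn.Module):
    # layernorm-side trick: divide by the (detached) max activation
    # first, so the fp16 variance computation cannot overflow
    def __init__(self, dim):
        super().__init__()
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):
        x = x / x.amax(dim = -1, keepdim = True).detach()
        return self.norm(x)

def stable_softmax(sim):
    # attention-side trick: subtract the (detached) row max from the
    # attention logits before softmax, which leaves the result
    # unchanged but keeps the exponentials within fp16 range
    sim = sim - sim.amax(dim = -1, keepdim = True).detach()
    return sim.softmax(dim = -1)
```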
Phil Wang
e1fe3089df
do bias-less layernorm manually
2022-07-05 13:09:58 -07:00
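A sketch of such a manual, bias-free layernorm:

```python
import torch
from torch import nn

class LayerNorm(nn.Module):
    # bias-less layernorm written out by hand: a learned gain but no
    # learned shift, normalizing over the feature dimension
    def __init__(self, dim, eps = 1e-5):
        super().__init__()
        self.eps = eps
        self.g = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        var = torch.var(x, dim = -1, unbiased = False, keepdim = True)
        mean = torch.mean(x, dim = -1, keepdim = True)
        return (x - mean) * (var + self.eps).rsqrt() * self.g
```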
Phil Wang
3d23ba4aa5
add ability to specify full self attention on specific stages in the unet
2022-07-01 10:22:07 -07:00
Phil Wang
7b0edf9e42
allow for returning low resolution conditioning image on forward through decoder with return_lowres_cond_image flag
2022-07-01 09:35:39 -07:00
Phil Wang
a922a539de
bring back convtranspose2d upsampling, allow for nearest upsample with hyperparam, change kernel size of last conv to 1, make configurable, cleanup
2022-07-01 09:21:47 -07:00
Phil Wang
8f2466f1cd
blur sigma for upsampling training was 0.6 in the paper, make that the default value
2022-06-30 17:03:16 -07:00
Phil Wang
908ab83799
add skip connections for all intermediate resnet blocks; also add an extra resnet block for the memory efficient version of the unet, and time-condition both the initial resnet block and the last one before the output
2022-06-29 08:16:58 -07:00
Phil Wang
6a11b9678b
bring in the skip connection scaling factor used by imagen in their unets, and cite the original paper that introduced it
2022-06-26 21:59:55 -07:00
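A sketch of the scaling at the point where skips rejoin the decoder path:

```python
import torch

skip_connect_scale = 2 ** -0.5  # 1/sqrt(2), the factor imagen uses

def connect_skip(x, skip):
    # scale the unet skip connection before concatenating it back in,
    # keeping the variance of the merged features roughly constant
    return torch.cat((x, skip * skip_connect_scale), dim = 1)
```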
Phil Wang
b90364695d
fix remaining issues with deriving cond_on_text_encodings from child unet settings
2022-06-26 21:07:42 -07:00
zion
868c001199
bug fixes for text conditioning update ( #175 )
2022-06-26 16:12:32 -07:00
Phil Wang
032e83b0e0
nevermind, do not enforce text encodings on first unet
2022-06-26 12:45:05 -07:00
Phil Wang
2e85e736f3
remove unnecessary decoder setting, and if not unconditional, always make sure the first unet is condition-able on text
2022-06-26 12:32:17 -07:00
zion
c453f468b1
autoswitch tqdm for notebooks ( #171 )
...
avoids printing the `tqdm` progress bar to a newline in notebooks when detected
2022-06-25 16:37:06 -07:00
Phil Wang
f545ce18f4
be able to turn off p2 loss reweighting for upsamplers
2022-06-20 09:43:31 -07:00
Phil Wang
fc7abf624d
in the paper, blur sigma was 0.6
2022-06-20 09:05:08 -07:00
Phil Wang
138079ca83
allow for setting the beta schedules of the unets differently in the decoder, as the paper used cosine, cosine, linear
2022-06-20 08:56:37 -07:00
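For reference, a sketch of the standard cosine beta schedule (Nichol & Dhariwal):

```python
import math
import torch

def cosine_beta_schedule(timesteps, s = 0.008):
    # with per-unet schedule settings, the decoder can now mirror the
    # paper's choice of cosine, cosine, linear across its three unets
    steps = timesteps + 1
    t = torch.linspace(0, timesteps, steps) / timesteps
    alphas_cumprod = torch.cos((t + s) / (1 + s) * math.pi * 0.5) ** 2
    alphas_cumprod = alphas_cumprod / alphas_cumprod[0]
    betas = 1. - (alphas_cumprod[1:] / alphas_cumprod[:-1])
    return betas.clamp(0., 0.999)
```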
Aidan Dempster
58892135d9
Distributed Training of the Decoder ( #121 )
...
* Converted decoder trainer to use accelerate
* Fixed issue where metric evaluation would hang on distributed mode
* Implemented functional saving
Loading still fails due to some issue with the optimizer
* Fixed issue with loading decoders
* Fixed issue with tracker config
* Fixed issue with amp
Updated logging to be more logical
* Saving checkpoint now saves position in training as well
Fixed an issue with running out of GPU memory due to loading weights onto the GPU twice
* Fixed ema for distributed training
* Fixed issue where get_pkg_version was reintroduced
* Changed decoder trainer to upload config as a file
Fixed issue where loading best would error
2022-06-19 09:25:54 -07:00