Phil Wang | e76e89f9eb | remove text masking altogether in favor of deriving it from the text encodings (padded text encodings must use a pad value of 0) | 2022-07-12 15:40:31 -07:00
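A minimal sketch of how such a mask can be derived when padded positions are all zeros (tensor names and shapes are illustrative, not the repository's actual API):

```python
import torch

# text_encodings: (batch, seq_len, dim), with padded positions filled with 0s
text_encodings = torch.randn(2, 256, 512)
text_encodings[:, 128:] = 0.  # simulate padding on the tail of the sequence

# a position counts as real text if any feature of its encoding is nonzero
text_mask = (text_encodings != 0.).any(dim=-1)  # (batch, seq_len), bool
```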
Phil Wang | bb3ff0ac67 | protect against a bad text mask being passed into the decoder | 2022-07-12 15:33:13 -07:00
Phil Wang | 1ec4dbe64f | one more fix for the text mask: if the length of the text encoding exceeds max_text_len, add an assert for a better error message | 2022-07-12 15:01:46 -07:00
Phil Wang | e0835acca9 | generate the text mask within the unet and diffusion prior themselves from the text encodings, if not given | 2022-07-12 12:54:59 -07:00
Phil Wang | 1d9ef99288 | add PixelShuffleUpsample; thanks to @MalumaDev and @marunine for running the experiment and verifying the absence of checkerboard artifacts | 2022-07-11 16:07:23 -07:00
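A minimal sketch of a pixel-shuffle upsampler, assuming the standard conv-then-nn.PixelShuffle pattern (the repository's actual module may differ, e.g. in initialization):

```python
import torch
from torch import nn

class PixelShuffleUpsample(nn.Module):
    # a 1x1 conv expands channels 4x, then PixelShuffle rearranges them into
    # a 2x larger spatial grid, sidestepping the checkerboard artifacts that
    # strided ConvTranspose2d is prone to
    def __init__(self, dim, dim_out=None):
        super().__init__()
        dim_out = dim_out if dim_out is not None else dim
        self.net = nn.Sequential(
            nn.Conv2d(dim, dim_out * 4, 1),
            nn.SiLU(),
            nn.PixelShuffle(2)
        )

    def forward(self, x):
        return self.net(x)

x = torch.randn(1, 64, 16, 16)
assert PixelShuffleUpsample(64)(x).shape == (1, 64, 32, 32)
```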
Phil Wang | bdd62c24b3 | zero-init the final projection in the unet, since OpenAI and @crowsonkb are both doing it | 2022-07-11 13:22:06 -07:00
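The trick itself is a couple of lines of init; a sketch with illustrative channel counts:

```python
import torch
from torch import nn

final_proj = nn.Conv2d(128, 3, 1)  # illustrative channel counts

# zero-init weight and bias so the unet's output starts at exactly zero,
# which tends to stabilize the beginning of diffusion training
nn.init.zeros_(final_proj.weight)
nn.init.zeros_(final_proj.bias)
```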
Phil Wang | 1f1557c614 | make it so that even if the text mask is omitted, it is derived from whether the text encodings are all 0s or not; simplify dataloading | 2022-07-11 10:56:19 -07:00
Phil Wang | 7ea314e2f0 | allow for final l2norm clamping of the sampled image embed | 2022-07-10 09:44:38 -07:00
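A sketch of what such a clamp can look like, assuming unit-norm CLIP-style embeddings (not necessarily the repository's exact scaling):

```python
import torch
import torch.nn.functional as F

image_embed = torch.randn(4, 512)  # illustrative sampled image embeddings

# project each sampled embedding back onto the unit l2 sphere, matching the
# norm that CLIP image embeddings naturally have
image_embed = F.normalize(image_embed, dim=-1)
```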
Phil Wang | 3dae43fa0e | fix a misnamed variable, thanks to @nousr | 2022-07-09 19:01:37 -07:00
Phil Wang | a598820012 | do not add noise on the last step in ddim | 2022-07-09 18:38:40 -07:00
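A sketch of a generic DDIM update that guards the final step; variable names and the exact update form are illustrative, not the repository's code:

```python
import torch

def ddim_step(x_t, pred_x0, alpha, alpha_next, sigma, time_next):
    # deterministic part of the DDIM update
    eps = (x_t - alpha.sqrt() * pred_x0) / (1. - alpha).sqrt()
    x_next = alpha_next.sqrt() * pred_x0 \
        + (1. - alpha_next - sigma ** 2).clamp(min=0.).sqrt() * eps
    # stochastic noise is injected only while steps remain; on the final
    # step (time_next == 0) the sample is returned as-is
    if time_next > 0:
        x_next = x_next + sigma * torch.randn_like(x_t)
    return x_next
```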
Phil Wang | 4878762627 | fix a small validation bug for sampling steps | 2022-07-09 17:31:54 -07:00
Phil Wang | 47ae17b36e | more informative error for something that tripped me up | 2022-07-09 17:28:14 -07:00
Phil Wang | b7e22f7da0 | complete the ddim integration for the diffusion prior as well as the decoder for each unet; feature complete for https://github.com/lucidrains/DALLE2-pytorch/issues/157 | 2022-07-09 17:25:34 -07:00
Phil Wang | 097afda606 | 0.18.0 | 2022-07-08 18:18:38 -07:00
Phil Wang | 3070610231 | force it so a researcher can never pass in an image smaller than the size required by CLIP or CoCa | 2022-07-08 18:17:29 -07:00
Phil Wang | 081d8d3484 | 0.17.0 | 2022-07-08 13:36:26 -07:00
Phil Wang | d7bc5fbedd | expose a num_steps_taken helper method on the trainer to retrieve the number of training steps for each unet | 2022-07-08 13:00:56 -07:00
Phil Wang | 8c823affff | allow control over using the nearest interpolation method for downsampling the low-res conditioning, in addition to being able to turn it off | 2022-07-08 11:44:43 -07:00
Phil Wang | ec7cab01d9 | extra insurance that the diffusion prior is on the correct device when using the trainer, whether an accelerator or an explicit device was given | 2022-07-07 10:08:33 -07:00
Phil Wang | 46be8c32d3 | fix a potential issue in the low resolution conditioner when downsampling and then upsampling using resize-right, thanks to @marunine | 2022-07-07 09:41:49 -07:00
Phil Wang | 900f086a6d | fix condition_on_text_encodings in the dalle2 orchestrator class; fix readme | 2022-07-07 07:43:41 -07:00
Phil Wang | 6a59c7093d | more shots in the dark regarding fp16 with learned variance, for the deepspeed issue | 2022-07-06 19:05:50 -07:00
Phil Wang | a6cdbe0b9c | relax the learning rate constraint, as @rom1504 wants to try a higher one | 2022-07-06 18:09:11 -07:00
Phil Wang | e928ae5c34 | default to the device that the diffusion prior's parameters are on, if the trainer was given neither an accelerator nor a device | 2022-07-06 12:47:48 -07:00
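The usual way to infer that device is from the module's own parameters; a minimal sketch (the helper name is hypothetical):

```python
import torch
from torch import nn

def module_device(module: nn.Module) -> torch.device:
    # a module's device can be read off its first parameter
    return next(module.parameters()).device

prior = nn.Linear(8, 8)        # stand-in for the diffusion prior
device = module_device(prior)  # cpu here; cuda if the module was moved
```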
Phil Wang | 1bd8a7835a | attempt to fix an issue with deepspeed fp16 seeing overflowing gradients | 2022-07-06 08:27:34 -07:00
Phil Wang | f33453df9f | debugging with Aidan | 2022-07-05 18:22:43 -07:00
Phil Wang | 1e4bb2bafb | cast long to float before deriving the sinusoidal pos emb | 2022-07-05 18:01:22 -07:00
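A sketch of a standard sinusoidal position embedding with the cast in place (the exact formulation in the repository may differ):

```python
import math
import torch

def sinusoidal_pos_emb(t: torch.Tensor, dim: int) -> torch.Tensor:
    # timesteps usually arrive as integer (long) tensors; cast to float
    # before the floating point math below
    t = t.float()
    half_dim = dim // 2
    freqs = torch.exp(torch.arange(half_dim) * -(math.log(10000) / (half_dim - 1)))
    args = t[:, None] * freqs[None, :]
    return torch.cat((args.sin(), args.cos()), dim=-1)

emb = sinusoidal_pos_emb(torch.arange(16), 64)  # (16, 64)
```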
Phil Wang | ee75515c7d | remove the forcing of softmax in f32, in case it is interfering with deepspeed | 2022-07-05 16:53:58 -07:00
Phil Wang | ec68243479 | add the ability to do warmup steps for each unet during training | 2022-07-05 16:24:16 -07:00
Phil Wang | 3afdcdfe86 | keep track of training steps separately for each unet in the decoder trainer | 2022-07-05 15:17:59 -07:00
Phil Wang | b9a908ff75 | bring in two tricks from the CogView paper for reducing the chances of overflow, for attention and layernorm | 2022-07-05 14:27:04 -07:00
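CogView's attention-side trick subtracts the per-query maximum from the logits before softmax so fp16 exponentials cannot overflow; its layernorm-side trick similarly rescales activations. A sketch of the attention half:

```python
import torch

# illustrative attention logits large enough to overflow fp16 after exp()
sim = torch.randn(2, 8, 128, 128) * 100.

# softmax is invariant to subtracting a per-row constant, so removing the
# (detached) row max is free numerically but keeps exp() within fp16 range
sim = sim - sim.amax(dim=-1, keepdim=True).detach()
attn = sim.softmax(dim=-1)
```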
Phil Wang | e1fe3089df | do bias-less layernorm manually | 2022-07-05 13:09:58 -07:00
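A sketch of the pattern, assuming a learned gain and no bias (written by hand because nn.LayerNorm did not expose a bias-free option in older torch versions):

```python
import torch
from torch import nn

class LayerNorm(nn.Module):
    # layernorm with a learned gain but no learned bias
    def __init__(self, dim, eps=1e-5):
        super().__init__()
        self.eps = eps
        self.g = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        var = torch.var(x, dim=-1, unbiased=False, keepdim=True)
        mean = torch.mean(x, dim=-1, keepdim=True)
        return (x - mean) * (var + self.eps).rsqrt() * self.g
```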
Phil Wang | ec5a77fc55 | 0.15.4 | 2022-07-02 08:56:34 -07:00
Phil Wang | 3d23ba4aa5 | add the ability to specify full self-attention at specific stages in the unet | 2022-07-01 10:22:07 -07:00
Phil Wang | 282c35930f | 0.15.2 | 2022-07-01 09:40:11 -07:00
Phil Wang | 7b0edf9e42 | allow returning the low resolution conditioning image on forward through the decoder with the return_lowres_cond_image flag | 2022-07-01 09:35:39 -07:00
Phil Wang | a922a539de | bring back convtranspose2d upsampling; allow nearest upsampling via a hyperparameter; change the kernel size of the last conv to 1; make it configurable; cleanup | 2022-07-01 09:21:47 -07:00
Phil Wang | 8f2466f1cd | the blur sigma for upsampler training was 0.6 in the paper; make that the default value | 2022-06-30 17:03:16 -07:00
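The blur in question is the gaussian corruption applied to the low-res conditioning image during upsampler training; a sketch using torchvision (the kernel size here is illustrative):

```python
import torch
import torchvision.transforms.functional as TF

lowres_cond_img = torch.rand(1, 3, 64, 64)  # illustrative low-res conditioning image

# corrupt the conditioning image during upsampler training; sigma 0.6 per the paper
blurred = TF.gaussian_blur(lowres_cond_img, kernel_size=3, sigma=0.6)
```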
Phil Wang | 908ab83799 | add skip connections for all intermediate resnet blocks; also add an extra resnet block for the memory-efficient version of the unet, and time conditioning for both the initial resnet block and the last one before the output | 2022-06-29 08:16:58 -07:00
Phil Wang | 46a2558d53 | fix a bug in the pydantic decoder config class | 2022-06-29 07:17:35 -07:00
Phil Wang | 6a11b9678b | bring in the skip connection scaling factor used by Imagen in their unets; cite the original paper that used it | 2022-06-26 21:59:55 -07:00
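The factor is 1/sqrt(2); a sketch of where it is applied (concatenation here is illustrative, residual addition works the same way):

```python
import torch

skip_connection_scale = 2 ** -0.5  # 1/sqrt(2)

x = torch.randn(1, 64, 32, 32)     # current unet activations
skip = torch.randn(1, 64, 32, 32)  # saved activations from the down path

# scaling the skip keeps the variance of the merged signal roughly constant
# instead of growing at every stage
x = torch.cat((x, skip * skip_connection_scale), dim=1)
```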
Phil Wang | b90364695d | fix the remaining issues with deriving cond_on_text_encodings from the child unet settings | 2022-06-26 21:07:42 -07:00
Phil Wang | 032e83b0e0 | never mind, do not enforce text encodings on the first unet | 2022-06-26 12:45:05 -07:00
Phil Wang | 2e85e736f3 | remove an unnecessary decoder setting; if not unconditional, always make sure the first unet can be conditioned on text | 2022-06-26 12:32:17 -07:00
Phil Wang | 4b994601ae | just make sure the decoder learning rate is reasonable, to help out budding researchers | 2022-06-23 11:29:28 -07:00
Phil Wang | c8422ffd5d | fix EMA updating buffers with non-float tensors | 2022-06-22 07:16:39 -07:00
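The failure mode is lerping integer buffers (e.g. step counters), which is undefined; a sketch of an EMA update that copies those instead (function name and beta are illustrative):

```python
import torch
from torch import nn

@torch.no_grad()
def ema_update(ma_model: nn.Module, online_model: nn.Module, beta: float = 0.995):
    # parameters: standard exponential moving average
    for ma_p, p in zip(ma_model.parameters(), online_model.parameters()):
        ma_p.lerp_(p, 1. - beta)
    # buffers: lerp only floats; integer buffers (counters, etc.) are copied
    for ma_b, b in zip(ma_model.buffers(), online_model.buffers()):
        if ma_b.dtype.is_floating_point:
            ma_b.lerp_(b, 1. - beta)
        else:
            ma_b.copy_(b)
```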
Phil Wang | 0021535c26 | move EMA to an external repo | 2022-06-20 11:48:32 -07:00
Phil Wang | f545ce18f4 | allow turning off p2 loss reweighting for upsamplers | 2022-06-20 09:43:31 -07:00
Phil Wang | fc7abf624d | in the paper, the blur sigma was 0.6 | 2022-06-20 09:05:08 -07:00
Phil Wang | 138079ca83 | allow setting the beta schedule of each unet in the decoder separately, as the paper used cosine, cosine, linear | 2022-06-20 08:56:37 -07:00
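The two schedules being selected between, in their standard formulations (exact endpoints in the repository may differ):

```python
import math
import torch

def cosine_beta_schedule(timesteps: int, s: float = 0.008) -> torch.Tensor:
    # cosine schedule from Nichol & Dhariwal's improved DDPM paper
    t = torch.linspace(0, timesteps, timesteps + 1) / timesteps
    alphas_cumprod = torch.cos((t + s) / (1 + s) * math.pi * 0.5) ** 2
    alphas_cumprod = alphas_cumprod / alphas_cumprod[0]
    betas = 1. - alphas_cumprod[1:] / alphas_cumprod[:-1]
    return betas.clamp(0., 0.999)

def linear_beta_schedule(timesteps: int) -> torch.Tensor:
    # linear schedule from the original DDPM paper (endpoints for 1000 steps)
    return torch.linspace(1e-4, 0.02, timesteps)

# per the commit: cosine for the first two unets, linear for the last
schedules = (cosine_beta_schedule(1000), cosine_beta_schedule(1000), linear_beta_schedule(1000))
```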