Commit Graph

  • c7ea8748db default decoder learning rate to what was in the paper 0.3.1 Phil Wang 2022-05-16 13:33:54 -07:00
  • 13382885d9 final update to dalle2 repository for a while - sampling from prior in chunks automatically with max_batch_size keyword given 0.3.0 Phil Wang 2022-05-16 12:57:31 -07:00
  • c3d4a7ffe4 update working unconditional decoder example Phil Wang 2022-05-16 12:50:07 -07:00
  • 164d9be444 use a decorator and take care of sampling in chunks (max_batch_size keyword), in case one is sampling a huge grid of images 0.2.46 Phil Wang 2022-05-16 12:34:28 -07:00
  • 5562ec6be2 status updates Phil Wang 2022-05-16 12:01:54 -07:00
  • 89ff04cfe2 final tweak to EMA class 0.2.44 Phil Wang 2022-05-16 11:54:34 -07:00
  • f4016f6302 allow for overriding use of EMA during sampling in decoder trainer with use_non_ema keyword, also fix some issues with automatic normalization of images and low res conditioning image if latent diffusion is in play 0.2.43 Phil Wang 2022-05-16 11:18:30 -07:00
  • 1212f7058d allow text encodings and text mask to be passed in on forward and sampling for Decoder class 0.2.42 Phil Wang 2022-05-16 10:40:32 -07:00
  • 9232b01ff6 allow text encodings and text mask to be passed in on forward and sampling for Decoder class 0.2.41 Phil Wang 2022-05-16 10:25:06 -07:00
  • dab106d4e5 back to no_grad for now, also keep track and restore unet devices in one_unet_in_gpu contextmanager 0.2.40 Phil Wang 2022-05-16 09:36:14 -07:00
  • bb151ca6b1 unet_number on decoder trainer only needs to be passed in if there is greater than 1 unet, so that unconditional training of a single ddpm is seamless (experiment in progress locally) 0.2.39 Phil Wang 2022-05-16 09:17:17 -07:00
  • 4a59dea4cf Migrate to text-conditioned prior training (#95) zion 2022-05-15 20:16:38 -07:00
  • ecf9e8027d make sure classifier free guidance is used only if conditional dropout is present on the DiffusionPrior and Decoder classes. also make sure prior can have a different conditional scale than decoder 0.2.38 Phil Wang 2022-05-15 19:09:38 -07:00
  • 36c5079bd7 LazyLinear is not mature, make users pass in text_embed_dim if text conditioning is turned on 0.2.37 Phil Wang 2022-05-15 18:56:52 -07:00
  • 4a4c7ac9e6 cond drop prob for diffusion prior network should default to 0 0.2.36 Phil Wang 2022-05-15 18:47:37 -07:00
  • fad7481479 todo Phil Wang 2022-05-15 17:00:25 -07:00
  • 123658d082 cite Ho et al, since cascading ddpm is now trainable Phil Wang 2022-05-15 16:56:53 -07:00
  • 11d4e11f10 allow for training unconditional ddpm or cascading ddpms 0.2.35 Phil Wang 2022-05-15 16:54:56 -07:00
  • 99778e12de trainer classes now takes care of auto-casting numpy to torch tensors, and setting correct device based on model parameter devices 0.2.34 Phil Wang 2022-05-15 15:25:45 -07:00
  • b22ccd9dd0 trainer classes now takes care of auto-casting numpy to torch tensors, and setting correct device based on model parameter devices 0.2.33 Phil Wang 2022-05-15 15:21:43 -07:00
  • 0f0011caf0 todo Phil Wang 2022-05-15 14:28:35 -07:00
  • 7b7a62044a use eval vs training mode to determine whether to call backprop on trainer forward 0.2.32 Phil Wang 2022-05-15 14:20:59 -07:00
  • 156fe5ed9f final cleanup for the day Phil Wang 2022-05-15 12:38:33 -07:00
  • 5ec34bebe1 cleanup readme Phil Wang 2022-05-15 12:29:26 -07:00
  • 8eaacf1ac1 remove indirection Phil Wang 2022-05-15 12:05:45 -07:00
  • e66c7b0249 incorrect naming Phil Wang 2022-05-15 11:23:52 -07:00
  • f7cd4a0992 product management Phil Wang 2022-05-15 11:21:12 -07:00
  • 68e7d2f241 make sure gradient accumulation feature works even if all arguments passed in are keyword arguments 0.2.31 Phil Wang 2022-05-15 11:16:16 -07:00
  • 74f222596a remove todo Phil Wang 2022-05-15 11:01:35 -07:00
  • aa6772dcff make sure optimizer and scaler is reloaded on resume for training diffusion prior script, move argparse to click Phil Wang 2022-05-15 10:48:10 -07:00
  • 71d0c4edae cleanup to use diffusion prior trainer Phil Wang 2022-05-15 10:16:05 -07:00
  • f7eee09d8b 0.2.30 0.2.30 Phil Wang 2022-05-15 09:56:59 -07:00
  • 89de5af63e experiment tracker agnostic Phil Wang 2022-05-15 09:56:40 -07:00
  • 4ec6d0ba81 backwards pass is not recommended under the autocast context, per pytorch docs 0.2.29 Phil Wang 2022-05-14 18:26:19 -07:00
  • 9549bd43b7 backwards pass is not recommended under the autocast context, per pytorch docs 0.2.28 Phil Wang 2022-05-14 18:20:48 -07:00
  • aee92dba4a simplify more Phil Wang 2022-05-14 17:16:46 -07:00
  • f1739267e4 simplify more 0.2.27 Phil Wang 2022-05-14 17:13:13 -07:00
  • b0cd5f24b6 take care of gradient accumulation automatically for researchers, by passing in a max_batch_size on the decoder or diffusion prior trainer forward 0.2.26 Phil Wang 2022-05-14 17:04:09 -07:00
  • 708638d3d9 take care of gradient accumulation automatically for researchers, by passing in a max_batch_size on the decoder or diffusion prior trainer forward 0.2.25 Phil Wang 2022-05-14 16:50:44 -07:00
  • b494ed81d4 take care of backwards within trainer classes for diffusion prior and decoder, readying to take care of gradient accumulation as well (plus, unsure if loss should be backwards within autocast block) 0.2.24 Phil Wang 2022-05-14 15:49:24 -07:00
  • ff3474f05c normalize conditioning tokens outside of cross attention blocks 0.2.23 Phil Wang 2022-05-14 14:23:52 -07:00
  • d5293f19f1 lineup with paper 0.2.22 Phil Wang 2022-05-14 13:57:00 -07:00
  • e697183849 be able to customize adam eps 0.2.21 Phil Wang 2022-05-14 13:55:04 -07:00
  • 591d37e266 lower default initial learning rate to what Jonathan Ho had in his original repo 0.2.20 Phil Wang 2022-05-14 13:22:43 -07:00
  • d1f02e8f49 always use sandwich norm for attention layer 0.2.19 Phil Wang 2022-05-14 12:13:41 -07:00
  • 9faab59b23 use post-attn-branch layernorm in attempt to stabilize cross attention conditioning in decoder 0.2.18 Phil Wang 2022-05-14 11:58:09 -07:00
  • 5d27029e98 make sure lowres conditioning image is properly normalized to -1 to 1 for cascading ddpm 0.2.17 Phil Wang 2022-05-14 01:23:54 -07:00
  • 3115fa17b3 fix everything around normalizing images to -1 to 1 for ddpm training automatically 0.2.16 Phil Wang 2022-05-14 01:17:11 -07:00
  • 124d8577c8 move the inverse normalization function called before image embeddings are derived from clip to within the diffusion prior and decoder classes 0.2.15 Phil Wang 2022-05-14 00:37:10 -07:00
  • 2db0c9794c comments Phil Wang 2022-05-12 14:25:20 -07:00
  • 2277b47ffd make sure learned variance can work for any number of unets in the decoder, defaults to first unet, as suggested was used in the paper 0.2.14 Phil Wang 2022-05-12 14:18:15 -07:00
  • 28b58e568c cleanup in preparation of option for learned variance Phil Wang 2022-05-12 12:04:52 -07:00
  • 924455d97d align the ema model device back after sampling from the cascading ddpm in the decoder 0.2.12 Phil Wang 2022-05-11 19:56:54 -07:00
  • 6021945fc8 default to l2 loss 0.2.11 Phil Wang 2022-05-11 19:24:41 -07:00
  • 6f76652d11 fix typo in README.md (#85) Light-V 2022-05-12 04:38:16 +08:00
  • 3dda2570ed fix amp issue for https://github.com/lucidrains/DALLE2-pytorch/issues/82 0.2.10 Phil Wang 2022-05-11 08:21:39 -07:00
  • 2f3c02dba8 numerical accuracy for noise schedule parameters 0.2.9 Phil Wang 2022-05-10 15:28:46 -07:00
  • 908088cfea wrap up cross embed layer feature 0.2.8 Phil Wang 2022-05-10 12:19:34 -07:00
  • 8dc8a3de0d product management Phil Wang 2022-05-10 11:51:38 -07:00
  • 35f89556ba bring in the cross embed layer from Crossformer paper for initial convolution in unet 0.2.7 Phil Wang 2022-05-10 11:50:38 -07:00
  • 2b55f753b9 fix new issue with github actions and auto pypi package uploading 0.2.6a Phil Wang 2022-05-10 10:51:15 -07:00
  • fc8fce38fb make sure cascading DDPM can be trained unconditionally, to ready for CLI one command training for the public 0.2.6 Phil Wang 2022-05-10 10:48:10 -07:00
  • a1bfb03ba4 project management Phil Wang 2022-05-10 10:13:51 -07:00
  • b1e7b5f6bb make sure resnet groups in unet is finely customizable 0.2.5 Phil Wang 2022-05-10 10:12:37 -07:00
  • 10b905b445 smol typo (#81) z 2022-05-10 09:52:50 -07:00
  • 9b322ea634 patch 0.2.4 Phil Wang 2022-05-09 19:46:19 -07:00
  • ba64ea45cc 0.2.3 0.2.3 Phil Wang 2022-05-09 16:50:31 -07:00
  • 64f7be1926 some cleanup Phil Wang 2022-05-09 16:50:21 -07:00
  • db805e73e1 fix a bug with numerical stability in attention, sorry! 🐛 0.2.2a Phil Wang 2022-05-09 16:23:12 -07:00
  • cb07b37970 Ensure Eval Mode In Metric Functions (#79) 0.2.2 z 2022-05-09 16:05:40 -07:00
  • a774bfefe2 add attention and feedforward dropouts to train_diffusion_prior script Phil Wang 2022-05-09 13:57:15 -07:00
  • 2ae57f0cf5 cleanup Phil Wang 2022-05-09 13:51:26 -07:00
  • e46eaec817 deal the diffusion prior problem yet another blow 0.2.1 Phil Wang 2022-05-09 11:08:46 -07:00
  • 8647cb5e76 Val loss changes, with quite a few other changes. This is in place of the earlier PR (https://github.com/lucidrains/DALLE2-pytorch/pull/67) (#77) Kumar R 2022-05-09 21:23:29 +05:30
  • 53c189e46a give more surface area for attention in diffusion prior 0.2.0 Phil Wang 2022-05-09 08:08:11 -07:00
  • dde51fd362 revert restriction for classifier free guidance for diffusion prior, given @crowsonkb advice 0.1.10 Phil Wang 2022-05-07 20:55:31 -07:00
  • 2eac7996fa Additional image_embed metric (#75) Nasir Khalid 2022-05-07 17:32:33 -04:00
  • 4010aec033 turn off classifier free guidance if predicting x_start for diffusion prior 0.1.9 Phil Wang 2022-05-07 09:38:17 -07:00
  • c87b84a259 todo Phil Wang 2022-05-07 09:21:08 -07:00
  • 8b05468653 todo Phil Wang 2022-05-07 08:33:45 -07:00
  • 830afd3c15 sinusoidal embed time embeddings for diffusion prior as well, for continuous version 0.1.8 Phil Wang 2022-05-07 08:32:26 -07:00
  • 8f93729d19 when in doubt, make it a hyperparameter 0.1.7a Phil Wang 2022-05-07 07:52:02 -07:00
  • cd5f2c1de4 simulate unrelated captions as a training metric (#66) 0.1.7 z 2022-05-07 05:34:59 -07:00
  • 85ed77d512 fix a potentially huge bug thanks to @CiaoHe https://github.com/lucidrains/DALLE2-pytorch/issues/71 0.1.6 Phil Wang 2022-05-07 05:05:46 -07:00
  • fd53fa17db Fix a typo in README (#70) Piero Rolando 2022-05-06 18:53:36 -05:00
  • 3676ef4d49 make sure vqgan-vae trainer supports mixed precision 0.1.5 Phil Wang 2022-05-06 10:44:16 -07:00
  • 28e944f328 make sure openai clip adapter outputs l2normed embeddings 0.1.4 Phil Wang 2022-05-06 10:12:03 -07:00
  • 14e63a3f67 also offer l2norm clamping in diffusion prior during training, if one were using predict x0 objective 0.1.2 Phil Wang 2022-05-06 10:05:14 -07:00
  • 09e9eaa5a6 project management Phil Wang 2022-05-06 09:00:22 -07:00
  • e6d752cf4a reprioritize Phil Wang 2022-05-06 08:55:26 -07:00
  • ad20a14a4d bring in rotary embeddings for diffusion prior causal transformer (the most powerful relative positional encoding, used in PaLM) - 0.1.0 because of breaking change 0.1.1 Phil Wang 2022-05-06 08:45:30 -07:00
  • 0a65a86d03 bring in rotary embeddings for diffusion prior causal transformer (the most powerful relative positional encoding, used in PaLM) - 0.1.0 because of breaking change 0.1.0 Phil Wang 2022-05-06 08:44:28 -07:00
  • 0be1e0d64c support CoCa, which seems to be better than CLIP (has an autoregressive text encoder) https://arxiv.org/abs/2205.01917 0.0.109 Phil Wang 2022-05-06 08:27:12 -07:00
  • 98df1ba51e add diffusion prior trainer, which automatically takes care of the exponential moving average (training and sampling), as well as mixed precision, gradient clipping Phil Wang 2022-05-06 08:11:09 -07:00
  • 740d644050 add diffusion prior trainer, which automatically takes care of the exponential moving average (training and sampling), as well as mixed precision, gradient clipping 0.0.108 Phil Wang 2022-05-06 08:06:28 -07:00
  • 878b555ef7 fix training with clip 0.0.107 Phil Wang 2022-05-06 07:37:57 -07:00
  • 63029f7388 remove l2norm output from train_diffusion_prior.py Phil Wang 2022-05-05 19:07:58 -07:00
  • c76a964fd6 allow for CLIP to be optional in Decoder, and allow DecoderTrainer to work off training pre-encoded image embeddings 0.0.106 Phil Wang 2022-05-05 08:11:01 -07:00
  • 79fabc4341 reorg readme Phil Wang 2022-05-05 07:54:12 -07:00
  • f7ef4bde38 Added some documentation for the diffusion prior in README.md (#62) Kumar R 2022-05-05 20:21:31 +05:30