Commit Graph

  • 93ba019069 product management Phil Wang 2022-05-05 07:39:51 -07:00
  • 8518684ae9 does not make much sense, as researchers may want to try predicting noise with the diffusion prior instead of predicting x0 0.0.105 Phil Wang 2022-05-05 07:37:00 -07:00
  • 1d5dc08810 take @crowsonkb 's suggestion at https://github.com/lucidrains/DALLE2-pytorch/issues/60#issue-1226116132 0.0.104 Phil Wang 2022-05-05 07:28:53 -07:00
  • d8d8b6caf1 dataloaders for decoder training, from @Veldrovive 0.0.102 Phil Wang 2022-05-05 07:09:45 -07:00
  • 15acc03bd4 Add a dataloader for training the decoder (#57) Aidan Dempster 2022-05-05 10:08:45 -04:00
  • 896f19786d remove convnext blocks, they are ill-suited for generative work, validated by early experimental results at https://github.com/lucidrains/video-diffusion-pytorch 0.0.101 Phil Wang 2022-05-05 07:07:21 -07:00
  • aec5575d09 take a bet on resize right, given Katherine is using it 0.0.100 Phil Wang 2022-05-04 19:26:45 -07:00
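The commit above swaps plain interpolation for Katherine Crowson's resize-right library, which resamples with proper anti-aliasing. A minimal sketch of how it might be used to downsample images for a cascade, assuming `pip install resize-right` (the 0.5 scale factor is illustrative, not the repository's setting):

```python
# minimal sketch of the resize-right API; the scale factor is illustrative
import torch
from resize_right import resize

images = torch.randn(4, 3, 256, 256)         # stand-in batch of images
low_res = resize(images, scale_factors=0.5)  # anti-aliased 128x128 downsample
```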
  • 9773f10d6c use inference mode whenever possible, cleanup 0.0.99 Phil Wang 2022-05-04 15:24:57 -07:00
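`torch.inference_mode` goes further than `torch.no_grad`: it disables autograd tracking and the view/version-counter bookkeeping entirely, making sampling cheaper. A minimal sketch of the pattern referenced above (the model is a stand-in):

```python
import torch

model = torch.nn.Linear(8, 8)  # stand-in for a real unet / sampler

@torch.inference_mode()  # no autograd graph is built inside
def sample(x):
    return model(x)

out = sample(torch.randn(1, 8))

# equivalently, as a context manager around any sampling loop
with torch.inference_mode():
    out = model(torch.randn(1, 8))
```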
  • a6bf8ddef6 advertise laion Phil Wang 2022-05-04 15:04:05 -07:00
  • 86e692d24f fix random crop probability 0.0.98 Phil Wang 2022-05-04 11:52:24 -07:00
  • 97b751209f allow for last unet in the cascade to be trained on crops, if it is convolution-only 0.0.97 Phil Wang 2022-05-04 11:48:41 -07:00
  • 74103fd8d6 product management Phil Wang 2022-05-04 11:20:50 -07:00
  • 1992d25cad project management 0.0.96 Phil Wang 2022-05-04 11:18:54 -07:00
  • 5b619c2fd5 make sure some hyperparameters for the unet blocks are configurable Phil Wang 2022-05-04 11:18:32 -07:00
  • 9359ad2e91 0.0.95 0.0.95 Phil Wang 2022-05-04 10:53:05 -07:00
  • 9ff228188b offer old resnet blocks, from the original DDPM paper, just in case convnexts are unsuitable for generative work Phil Wang 2022-05-04 10:52:47 -07:00
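For reference, the "old resnet blocks" in question follow the DDPM recipe of groupnorm, SiLU, then a 3x3 convolution, with a residual shortcut. A minimal sketch, with time conditioning omitted and illustrative dimensions:

```python
import torch
from torch import nn

class ResnetBlock(nn.Module):
    # minimal sketch of a DDPM-style resnet block (groupnorm -> SiLU -> conv)
    def __init__(self, dim_in, dim_out, groups=8):
        super().__init__()
        self.block1 = nn.Sequential(nn.GroupNorm(groups, dim_in), nn.SiLU(), nn.Conv2d(dim_in, dim_out, 3, padding=1))
        self.block2 = nn.Sequential(nn.GroupNorm(groups, dim_out), nn.SiLU(), nn.Conv2d(dim_out, dim_out, 3, padding=1))
        self.res_conv = nn.Conv2d(dim_in, dim_out, 1) if dim_in != dim_out else nn.Identity()

    def forward(self, x):
        return self.block2(self.block1(x)) + self.res_conv(x)

out = ResnetBlock(32, 64)(torch.randn(1, 32, 16, 16))
```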
  • 2d9963d30e Reporting metrics - Cosine similarity. (#55) Kumar R 2022-05-04 20:34:36 +05:30
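Cosine similarity between the prior's predicted image embedding and the ground-truth CLIP image embedding is a natural metric here, since CLIP embeddings are compared on the unit hypersphere. A hedged sketch of such a metric (the tensors are stand-ins for real embeddings):

```python
import torch
import torch.nn.functional as F

pred_image_embed = torch.randn(4, 512)    # stand-in for the prior's prediction
target_image_embed = torch.randn(4, 512)  # stand-in for CLIP's image embedding

sim = F.cosine_similarity(pred_image_embed, target_image_embed, dim=-1)
print(f'average cosine similarity: {sim.mean().item():.4f}')
```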
  • 58d9b422f3 0.0.94 0.0.94 Phil Wang 2022-05-04 07:42:33 -07:00
  • 44b319cb57 add missing import (#56) Ray Bell 2022-05-04 10:42:20 -04:00
  • c30f380689 final reminder Phil Wang 2022-05-03 08:18:53 -07:00
  • e4e884bb8b keep all doors open Phil Wang 2022-05-03 08:17:02 -07:00
  • 803ad9c17d product management again Phil Wang 2022-05-03 08:15:25 -07:00
  • a88dd6a9c0 todo Phil Wang 2022-05-03 08:09:02 -07:00
  • 72c16b496e Update train_diffusion_prior.py (#53) Kumar R 2022-05-03 11:14:57 +05:30
  • 81d83dd7f2 defaults align with paper (#52) z 2022-05-02 13:52:11 -07:00
  • fa66f7e1e9 todo Phil Wang 2022-05-02 12:57:15 -07:00
  • aa8d135245 allow laion to experiment with normformer in diffusion prior Phil Wang 2022-05-02 11:35:00 -07:00
  • 70282de23b add ability to turn on normformer settings, given @borisdayma reported good results and some personal anecdata 0.0.93 Phil Wang 2022-05-02 11:33:15 -07:00
  • 83f761847e todo Phil Wang 2022-05-02 10:52:39 -07:00
  • 11469dc0c6 makes more sense to keep this as True as default, for stability 0.0.92 Phil Wang 2022-05-02 10:50:55 -07:00
  • 2d25c89f35 Fix passing of l2norm_output to DiffusionPriorNetwork (#51) Romain Beaumont 2022-05-02 19:48:16 +02:00
  • 3fe96c208a add ability to train diffusion prior with l2norm on output image embed Phil Wang 2022-05-02 09:53:20 -07:00
  • 0fc6c9cdf3 provide option to l2norm the output of the diffusion prior 0.0.91 Phil Wang 2022-05-02 09:41:03 -07:00
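L2-normalizing the prior's output keeps the predicted image embedding on the same unit hypersphere that CLIP itself produces. A minimal sketch of the idea:

```python
import torch
import torch.nn.functional as F

def l2norm(t):
    # project embeddings onto the unit hypersphere, as CLIP does
    return F.normalize(t, dim=-1)

pred = l2norm(torch.randn(4, 512))
assert torch.allclose(pred.norm(dim=-1), torch.ones(4), atol=1e-5)
```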
  • 7ee0ecc388 mixed precision for training diffusion prior + save optimizer and scaler states Phil Wang 2022-05-02 09:31:04 -07:00
  • 1924c7cc3d fix issue with mixed precision and gradient clipping 0.0.90 Phil Wang 2022-05-02 09:20:19 -07:00
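This class of bug usually comes down to ordering: with `torch.cuda.amp`, gradients must be unscaled before `clip_grad_norm_`, otherwise the clip threshold is applied to scaled gradients. A sketch of the correct ordering, assuming a CUDA device (the model and hyperparameters are stand-ins):

```python
import torch

model = torch.nn.Linear(8, 8).cuda()
optimizer = torch.optim.AdamW(model.parameters())
scaler = torch.cuda.amp.GradScaler()

x = torch.randn(4, 8, device='cuda')
with torch.cuda.amp.autocast():
    loss = model(x).mean()

scaler.scale(loss).backward()
scaler.unscale_(optimizer)  # unscale BEFORE clipping
torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
scaler.step(optimizer)
scaler.update()
optimizer.zero_grad()
```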
  • f7df3caaf3 address not calculating average eval / test loss when training diffusion prior https://github.com/lucidrains/DALLE2-pytorch/issues/49 Phil Wang 2022-05-02 08:51:41 -07:00
  • fc954ee788 fix calculation of adaptive weight for vit-vqgan, thanks to @CiaoHe 0.0.89 Phil Wang 2022-05-02 07:57:28 -07:00
  • c1db2753f5 todo Phil Wang 2022-05-01 18:02:30 -07:00
  • ad87bfe28f switch to using linear attention for the sparse attention layers within unet, given success in GAN projects 0.0.88 Phil Wang 2022-05-01 17:59:03 -07:00
  • 76c767b1ce update deps, commit to using webdatasets, per @rom1504 consultation Phil Wang 2022-05-01 12:22:15 -07:00
  • d991b8c39c just clip the diffusion prior network parameters Phil Wang 2022-05-01 12:01:01 -07:00
  • 902693e271 todo Phil Wang 2022-05-01 11:57:08 -07:00
  • 35cd63982d add gradient clipping, make sure weight decay is configurable, make sure learning rate is actually passed into get_optimizer, make sure model is set to training mode at beginning of each epoch Phil Wang 2022-05-01 11:55:38 -07:00
  • 53ce6dfdf6 All changes implemented, current run happening. Link to wandb run in comments. (#43) Kumar R 2022-05-02 00:16:59 +05:30
  • ad8d7a368b product management Phil Wang 2022-05-01 11:26:21 -07:00
  • b8cf1e5c20 more attention 0.0.87 Phil Wang 2022-05-01 11:00:26 -07:00
  • 94aaa08d97 product management Phil Wang 2022-05-01 09:43:10 -07:00
  • 8b9bbec7d1 project management 0.0.86 Phil Wang 2022-05-01 09:32:57 -07:00
  • 1bb9fc9829 add convnext backbone for vqgan-vae, still need to fix groupnorms in the resnet encoder/decoder Phil Wang 2022-05-01 09:32:24 -07:00
  • 5e421bd5bb let researchers do the hyperparameter search 0.0.85 Phil Wang 2022-05-01 08:46:21 -07:00
  • 67fcab1122 add MLP based time conditioning to all convnexts, in addition to cross attention. also add an initial convolution, given convnext's first conv is depthwise 0.0.84 Phil Wang 2022-05-01 08:41:02 -07:00
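MLP-based time conditioning typically maps a sinusoidal embedding of the timestep through a small MLP to produce a per-block conditioning vector. A hedged sketch with illustrative dimensions:

```python
import math
import torch
from torch import nn

class SinusoidalPosEmb(nn.Module):
    # standard sinusoidal embedding of an integer timestep
    def __init__(self, dim):
        super().__init__()
        self.dim = dim

    def forward(self, t):
        half = self.dim // 2
        freqs = torch.exp(-math.log(10000) * torch.arange(half, device=t.device) / (half - 1))
        args = t[:, None].float() * freqs[None]
        return torch.cat((args.sin(), args.cos()), dim=-1)

dim = 64  # illustrative
time_mlp = nn.Sequential(SinusoidalPosEmb(dim), nn.Linear(dim, dim * 4), nn.GELU(), nn.Linear(dim * 4, dim))
t_emb = time_mlp(torch.tensor([0, 250, 999]))  # one conditioning vector per timestep
```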
  • 5bfbccda22 port over vqgan vae trainer Phil Wang 2022-05-01 08:09:15 -07:00
  • 989275ff59 product management Phil Wang 2022-04-30 16:57:56 -07:00
  • 56408f4a40 project management Phil Wang 2022-04-30 16:57:02 -07:00
  • d1a697ac23 allows one to shortcut sampling at a specific unet number, if one were to be training in stages Phil Wang 2022-04-30 16:05:13 -07:00
  • 8260fc933a allows one to shortcut sampling at a specific unet number, if one were to be training in stages 0.0.82 Phil Wang 2022-04-30 15:10:25 -07:00
  • ebe01749ed DecoderTrainer sample method uses the exponentially moving averaged unets 0.0.81 Phil Wang 2022-04-30 14:55:34 -07:00
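Sampling from exponentially moving averaged weights rather than the online weights is standard practice for diffusion models. A minimal sketch of the EMA update behind such a sample method (the decay value is illustrative):

```python
import copy
import torch

def ema_update(online, ema, decay=0.995):
    # blend the EMA copy toward the online weights: ema += (1 - decay) * (online - ema)
    with torch.no_grad():
        for p_online, p_ema in zip(online.parameters(), ema.parameters()):
            p_ema.lerp_(p_online, 1 - decay)

online_unet = torch.nn.Linear(8, 8)       # stand-in for a unet
ema_unet = copy.deepcopy(online_unet)     # sample from this copy, not the online model
ema_update(online_unet, ema_unet)
```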
  • 63195cc2cb allow for division of loss prior to scaling, for gradient accumulation purposes 0.0.80 Phil Wang 2022-04-30 12:56:47 -07:00
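Dividing the loss by the number of accumulation steps before backward makes the accumulated gradient equal the average over the effective batch; with amp, this division would happen before `scaler.scale(...)`. A minimal sketch without amp:

```python
import torch

model = torch.nn.Linear(8, 1)
optimizer = torch.optim.AdamW(model.parameters())
grad_accum_steps = 4  # illustrative

for step, batch in enumerate(torch.randn(16, 4, 8)):  # 16 stand-in micro-batches
    loss = model(batch).mean()
    (loss / grad_accum_steps).backward()  # divide before backward / scaling
    if (step + 1) % grad_accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```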
  • a2ef69af66 take care of mixed precision, and make gradient accumulation do-able externally 0.0.79 Phil Wang 2022-04-30 12:27:24 -07:00
  • 5fff22834e be able to finely customize learning parameters for each unet, take care of gradient clipping 0.0.78 Phil Wang 2022-04-30 11:56:05 -07:00
  • a9421f49ec simplify Decoder training for the public 0.0.77 Phil Wang 2022-04-30 11:45:18 -07:00
  • 77fa34eae9 fix all clipping / clamping issues 0.0.76 Phil Wang 2022-04-30 10:08:24 -07:00
  • 1c1e508369 fix all issues with text encodings conditioning in the decoder, using null padding tokens technique from dalle v1 0.0.75 Phil Wang 2022-04-30 09:13:34 -07:00
  • f19c99ecb0 fix decoder needing separate conditional dropping probabilities for image embeddings and text encodings, thanks to @xiankgx ! 0.0.74 Phil Wang 2022-04-30 08:47:56 -07:00
  • 721a444686 Merge pull request #37 from ProGamerGov/patch-1 Phil Wang 2022-04-30 08:19:07 -07:00
  • 63450b466d Fix spelling and grammatical errors ProGamerGov 2022-04-30 09:18:13 -06:00
  • 20e7eb5a9b cleanup Phil Wang 2022-04-30 07:22:57 -07:00
  • e2f9615afa use @clip-anytorch , thanks to @rom1504 0.0.73 Phil Wang 2022-04-30 06:40:54 -07:00
  • 0d1c07c803 fix a bug with classifier free guidance, thanks to @xiankgx again! 0.0.72 Phil Wang 2022-04-30 06:34:18 -07:00
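For context, classifier-free guidance runs the network both with and without conditioning and extrapolates from the unconditional prediction toward the conditional one. A sketch of the core formula (the scale of 3.0 is illustrative):

```python
import torch

def guided_pred(pred_cond, pred_uncond, cond_scale=3.0):
    # classifier-free guidance: extrapolate from unconditional toward conditional
    return pred_uncond + (pred_cond - pred_uncond) * cond_scale

noise = guided_pred(torch.randn(1, 3, 64, 64), torch.randn(1, 3, 64, 64))
```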
  • a389f81138 todo 0.0.71 Phil Wang 2022-04-29 15:40:51 -07:00
  • 0283556608 fix example in readme, since api changed Phil Wang 2022-04-29 13:40:55 -07:00
  • 5063d192b6 now completely OpenAI CLIP compatible for training Phil Wang 2022-04-29 13:05:01 -07:00
  • 846162ef3e just take care of the logic for AdamW and transformers 0.0.70 Phil Wang 2022-04-29 11:43:26 -07:00
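The usual "AdamW and transformers" logic is to exclude biases and normalization parameters from weight decay. A hedged sketch of such a helper (this `get_optimizer` is illustrative, not the repository's exact implementation):

```python
import torch

def get_optimizer(model, lr=3e-4, wd=1e-2):
    # illustrative helper: no weight decay for biases / norm parameters (ndim < 2)
    decay, no_decay = [], []
    for param in model.parameters():
        if not param.requires_grad:
            continue
        (no_decay if param.ndim < 2 else decay).append(param)
    return torch.optim.AdamW([
        {'params': decay, 'weight_decay': wd},
        {'params': no_decay, 'weight_decay': 0.0},
    ], lr=lr)

opt = get_optimizer(torch.nn.Sequential(torch.nn.Linear(8, 8), torch.nn.LayerNorm(8)))
```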
  • 39d3659ad9 now completely OpenAI CLIP compatible for training 0.0.67 Phil Wang 2022-04-29 11:26:24 -07:00
  • f4a54e475e add some training fns Phil Wang 2022-04-29 09:44:55 -07:00
  • fb662a62f3 fix another bug thanks to @xiankgx 0.0.65 Phil Wang 2022-04-29 07:38:32 -07:00
  • 587c8c9b44 optimize for clarity Phil Wang 2022-04-28 21:59:13 -07:00
  • aa900213e7 force first unet in the cascade to be conditioned on image embeds 0.0.64 Phil Wang 2022-04-28 20:53:15 -07:00
  • cb26187450 vqgan-vae codebook dims should be 256 or smaller 0.0.63 Phil Wang 2022-04-28 08:59:03 -07:00
  • 625ce23f6b 🐛 0.0.62 Phil Wang 2022-04-28 07:21:18 -07:00
  • dbf4a281f1 make sure another CLIP can actually be passed in, as long as it is wrapped in an adapter extended from BaseClipAdapter 0.0.61 Phil Wang 2022-04-27 20:45:27 -07:00
  • 4ab527e779 some extra asserts for text encoding of diffusion prior and decoder 0.0.60 Phil Wang 2022-04-27 20:11:43 -07:00
  • d0cdeb3247 add ability for DALL-E2 to return PIL images with return_pil_images = True on forward, for those who have no clue about deep learning Phil Wang 2022-04-27 19:58:06 -07:00
  • 8c2015fd39 add ability for DALL-E2 to return PIL images with return_pil_images = True on forward, for those who have no clue about deep learning 0.0.59 Phil Wang 2022-04-27 19:57:27 -07:00
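A usage sketch of the flag, assuming `dalle2` is an already-constructed, trained DALLE2 instance (prior, decoder, and weights omitted here):

```python
# hypothetical usage; `dalle2` must be a constructed, trained DALLE2 instance
images = dalle2(['cute puppy chasing after a squirrel'], return_pil_images = True)
images[0].save('./puppy.png')  # PIL.Image objects can be saved directly
```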
  • 8c610aad9a only pass text encodings conditioning in diffusion prior if specified on initialization 0.0.58 Phil Wang 2022-04-27 19:48:16 -07:00
  • 6700381a37 prepare for ability to integrate other clips other than x-clip 0.0.57 Phil Wang 2022-04-27 19:34:56 -07:00
  • 20377f889a todo Phil Wang 2022-04-27 17:22:14 -07:00
  • 6edb1c5dd0 fix issue with ema class 0.0.56 Phil Wang 2022-04-27 16:40:02 -07:00
  • b093f92182 inform what is possible Phil Wang 2022-04-27 08:25:16 -07:00
  • fa3bb6ba5c make sure cpu-only still works 0.0.55 Phil Wang 2022-04-27 08:02:10 -07:00
  • 2705e7c9b0 claims for attention-based upsampling are unsupported by local experiments, removing Phil Wang 2022-04-27 07:51:04 -07:00
  • 77141882c8 complete vit-vqgan from https://arxiv.org/abs/2110.04627 0.0.54 Phil Wang 2022-04-26 17:20:47 -07:00
  • e024971dc3 complete vit-vqgan from https://arxiv.org/abs/2110.04627 0.0.53 Phil Wang 2022-04-26 17:04:18 -07:00
  • 4075d02139 never mind, it could be working, but only when i stabilize it with the feedforward layer + tanh as proposed in the vit-vqgan paper (which will be built into the repository later for the latent diffusion) Phil Wang 2022-04-26 12:43:31 -07:00
  • de0296106b be able to turn off warning for use of LazyLinear by passing in text embedding dimension for unet 0.0.52 Phil Wang 2022-04-26 11:42:46 -07:00
  • eafb136214 suppress a warning 0.0.51 Phil Wang 2022-04-26 11:40:45 -07:00
  • bfbcc283a3 DRY a tiny bit for gaussian diffusion related logic Phil Wang 2022-04-26 11:39:12 -07:00
  • c30544b73a no CLIP altogether for training DiffusionPrior 0.0.50 Phil Wang 2022-04-26 10:23:34 -07:00
  • bdf5e9c009 todo Phil Wang 2022-04-26 09:56:54 -07:00
  • 9878be760b have the researcher explicitly state upfront whether to condition with text encodings in the cascading ddpm decoder, have the DALLE-2 class take care of passing in text if the feature is turned on 0.0.49 Phil Wang 2022-04-26 09:47:09 -07:00