0.26.2

Quality of life improvements for tracker savers (#210)
The default save location is now `None`, so if a save key is not specified, the corresponding checkpoint type is not saved. Models and checkpoints are now both saved with the version number and the config used to create them, which simplifies loading. The documentation was updated to match current usage.
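A minimal sketch of the new default behaviour, mirroring the flag logic of `BaseSaver.__init__` in the `trackers.py` diff of this commit. The helper name `enabled_save_targets` and its return shape are hypothetical and exist only for illustration:

```python
from typing import Dict, Optional, Union


def enabled_save_targets(
    save_latest_to: Optional[Union[str, bool]] = None,
    save_best_to: Optional[Union[str, bool]] = None,
    save_meta_to: Optional[str] = None,
) -> Dict[str, bool]:
    """Hypothetical helper: report which save targets one saver would use."""
    # A target is active only when a path is given; None or False disables it.
    saving_latest = save_latest_to is not None and save_latest_to is not False
    saving_best = save_best_to is not None and save_best_to is not False
    saving_meta = save_meta_to is not None
    # Same guard as in the diff: a saver with nothing to save is a config error.
    assert saving_latest or saving_best or saving_meta, 'At least one saving option must be specified'
    return {'latest': saving_latest, 'best': saving_best, 'meta': saving_meta}


# Only `save_latest_to` is given, so no `best` checkpoint and no metadata are written.
print(enabled_save_targets(save_latest_to='latest.pth'))
# {'latest': True, 'best': False, 'meta': False}
```

Under this reading, a saver entry that previously relied on the implicit `latest.pth` and `best.pth` defaults now has to name those paths explicitly.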
2026-02-23 09:24:23 +01:00 · 2022-07-19 17:50:36 -07:00 · 2022-07-19 17:50:18 -07:00 · 2022-07-19 11:31:56 -07:00 · 2022-07-19 09:47:44 -07:00 · 2022-07-19 09:36:45 -07:00
26 changed files with 1074 additions and 248 deletions
--- a/.github/FUNDING.yml
+++ b/.github/FUNDING.yml
@@ -1 +1 @@
-github: [lucidrains]
+github: [nousr, Veldrovive, lucidrains]
--- a/.github/workflows/ci.yml
+++ b/.github/workflows/ci.yml
@@ -0,0 +1,33 @@
 name: Continuous integration
 on:
  push:
    branches:
    - main
  pull_request:
    branches:
    - main
 jobs:
  tests:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: [3.8]
    steps:
    - uses: actions/checkout@v2
    - name: Set up Python ${{ matrix.python-version }}
      uses: actions/setup-python@v2
      with:
        python-version: ${{ matrix.python-version }}
    - name: Install
      run: |
        python3 -m venv .env
        source .env/bin/activate
        make install
    - name: Tests
      run: |
        source .env/bin/activate
        make test
--- a/.gitignore
+++ b/.gitignore
@@ -136,3 +136,5 @@ dmypy.json
 # Pyre type checker
 .pyre/
 .tracker_data
 *.pth
--- a/Makefile
+++ b/Makefile
@@ -0,0 +1,6 @@
 install:
 	pip install -U pip
 	pip install -e .
 test:
 	CUDA_VISIBLE_DEVICES= python train_decoder.py --config_file configs/train_decoder_config.test.json
--- a/README.md
+++ b/README.md
@@ -45,6 +45,7 @@ This library would not have gotten to this working state without the help of
 - <a href="https://github.com/rom1504">Romain</a> for the pull request reviews and project management
 - <a href="https://github.com/Ciaohe">He Cao</a> and <a href="https://github.com/xiankgx">xiankgx</a> for the Q&A and for identifying critical bugs
 - <a href="https://github.com/marunine">Marunine</a> for identifying issues with resizing of the low resolution conditioner, when training the upsampler, in addition to various other bug fixes
 - <a href="https://github.com/malumadev">MalumaDev</a> for proposing the use of pixel shuffle upsampler for fixing checkerboard artifacts
 - <a href="https://github.com/crowsonkb">Katherine</a> for her advice
 - <a href="https://stability.ai/">Stability AI</a> for the generous sponsorship
 - <a href="https://huggingface.co">🤗 Huggingface</a> and in particular <a href="https://github.com/sgugger">Sylvain</a> for the <a href="https://github.com/huggingface/accelerate">Accelerate</a> library
@@ -355,7 +356,8 @@ prior_network = DiffusionPriorNetwork(
 diffusion_prior = DiffusionPrior(
    net = prior_network,
    clip = clip,
-    timesteps = 100,
+    timesteps = 1000,
    sample_timesteps = 64,
    cond_drop_prob = 0.2
 ).cuda()
@@ -419,7 +421,7 @@ For the layperson, no worries, training will all be automated into a CLI tool, a
 ## Training on Preprocessed CLIP Embeddings
-It is likely, when scaling up, that you would first preprocess your images and text into corresponding embeddings before training the prior network. You can do so easily by simply passing in `image_embed`, `text_embed`, and optionally `text_encodings` and `text_mask`
+It is likely, when scaling up, that you would first preprocess your images and text into corresponding embeddings before training the prior network. You can do so easily by simply passing in `image_embed`, `text_embed`, and optionally `text_encodings`
 Working example below
@@ -583,6 +585,7 @@ unet1 = Unet(
    cond_dim = 128,
    channels = 3,
    dim_mults=(1, 2, 4, 8),
    text_embed_dim = 512,
    cond_on_text_encodings = True  # set to True for any unets that need to be conditioned on text encodings (ex. first unet in cascade)
 ).cuda()
@@ -598,7 +601,8 @@ decoder = Decoder(
    unet = (unet1, unet2),
    image_sizes = (128, 256),
    clip = clip,
-    timesteps = 100,
+    timesteps = 1000,
    sample_timesteps = (250, 27),
    image_cond_drop_prob = 0.1,
    text_cond_drop_prob = 0.5
 ).cuda()
@@ -624,6 +628,82 @@ images = dalle2(
 Now you'll just have to worry about training the Prior and the Decoder!
 ## Inpainting
 Inpainting is also built into the `Decoder`. You simply have to pass in the `inpaint_image` and `inpaint_mask` (boolean tensor where `True` indicates which regions of the inpaint image to keep)
 This repository uses the formulation put forth by <a href="https://arxiv.org/abs/2201.09865">Lugmayr et al. in Repaint</a>
 ```python
 import torch
 from dalle2_pytorch import Unet, Decoder, CLIP
 # trained clip from step 1
 clip = CLIP(
    dim_text = 512,
    dim_image = 512,
    dim_latent = 512,
    num_text_tokens = 49408,
    text_enc_depth = 6,
    text_seq_len = 256,
    text_heads = 8,
    visual_enc_depth = 6,
    visual_image_size = 256,
    visual_patch_size = 32,
    visual_heads = 8
 ).cuda()
 # a single unet for the decoder in this example
 unet = Unet(
    dim = 16,
    image_embed_dim = 512,
    cond_dim = 128,
    channels = 3,
    dim_mults = (1, 1, 1, 1)
 ).cuda()
 # decoder, which contains the unet(s) and clip
 decoder = Decoder(
    clip = clip,
    unet = (unet,),               # unets in order of lowest to highest resolution (a single stage here; you can have as many stages as you want)
    image_sizes = (256,),         # one resolution per unet (256 for the single unet here); these must be unique and in ascending order (matching the unets passed in)
    timesteps = 1000,
    image_cond_drop_prob = 0.1,
    text_cond_drop_prob = 0.5
 ).cuda()
 # mock images (get a lot of this)
 images = torch.randn(4, 3, 256, 256).cuda()
 # feed images into decoder, specifying which unet you want to train
 # each unet can be trained separately, which is one of the benefits of the cascading DDPM scheme
 loss = decoder(images, unet_number = 1)
 loss.backward()
 # do the above for many steps
 mock_image_embed = torch.randn(1, 512).cuda()
 # then to do inpainting
 inpaint_image = torch.randn(1, 3, 256, 256).cuda()      # (batch, channels, height, width)
 inpaint_mask = torch.ones(1, 256, 256).bool().cuda()    # (batch, height, width)
 inpainted_images = decoder.sample(
    image_embed = mock_image_embed,
    inpaint_image = inpaint_image,    # just pass in the inpaint image
    inpaint_mask = inpaint_mask       # and the mask
 )
 inpainted_images.shape # (1, 3, 256, 256)
 ```
 ## Experimental
 ### DALL-E2 with Latent Diffusion
@@ -987,26 +1067,12 @@ dataset = ImageEmbeddingDataset(
 )
 ```
-### Scripts (wip)
+### Scripts
 #### `train_diffusion_prior.py`
 For detailed information on training the diffusion prior, please refer to the [dedicated readme](prior.md)
 ## CLI (wip)
 ```bash
 $ dream 'sharing a sunset at the summit of mount everest with my dog'
 ```
 Once built, images will be saved to the same directory the command is invoked
 <a href="https://github.com/lucidrains/big-sleep">template</a>
 ## Training CLI (wip)
 <a href="https://github.com/lucidrains/stylegan2-pytorch">template</a>
 ## Todo
 - [x] finish off gaussian diffusion class for latent embedding - allow for prediction of epsilon
@@ -1044,11 +1110,10 @@ Once built, images will be saved to the same directory the command is invoked
 - [x] bring in skip-layer excitations (from lightweight gan paper) to see if it helps for either decoder of unet or vqgan-vae training (doesnt work well)
 - [x] test out grid attention in cascading ddpm locally, decide whether to keep or remove https://arxiv.org/abs/2204.01697 (keeping, seems to be fine)
 - [x] allow for unet to be able to condition non-cross attention style as well
- [ ] become an expert with unets, cleanup unet code, make it fully configurable, port all learnings over to https://github.com/lucidrains/x-unet (test out unet² in ddpm repo) - consider https://github.com/lucidrains/uformer-pytorch attention-based unet
+- [x] speed up inference, read up on papers (ddim)
- [ ] speed up inference, read up on papers (ddim or diffusion-gan, etc)
+- [x] add inpainting ability using resampler from repaint paper https://arxiv.org/abs/2201.09865
- [ ] figure out if possible to augment with external memory, as described in https://arxiv.org/abs/2204.11824
+- [ ] try out the nested unet from https://arxiv.org/abs/2005.09007 after hearing several positive testimonies from researchers, for segmentation anyhow
 - [ ] interface out the vqgan-vae so a pretrained one can be pulled off the shelf to validate latent diffusion + DALL-E2
 - [ ] add inpainting ability using resampler from repaint paper https://arxiv.org/abs/2201.09865
 ## Citations
@@ -1166,4 +1231,14 @@ Once built, images will be saved to the same directory the command is invoked
 }
 ```
 ```bibtex
@article{Lugmayr2022RePaintIU,
    title   = {RePaint: Inpainting using Denoising Diffusion Probabilistic Models},
    author  = {Andreas Lugmayr and Martin Danelljan and Andr{\'e}s Romero and Fisher Yu and Radu Timofte and Luc Van Gool},
    journal = {ArXiv},
    year    = {2022},
    volume  = {abs/2201.09865}
 }
 ```
 *Creating noise from data is easy; creating data from noise is generative modeling.* - <a href="https://arxiv.org/abs/2011.13456">Yang Song's paper</a>
--- a/configs/README.md
+++ b/configs/README.md
@@ -30,6 +30,7 @@ Defines the configuration options for the decoder model. The unets defined above
 | `loss_type` | No | `l2` | The loss function. Options are `l1`, `huber`, or `l2`. |
 | `beta_schedule` | No | `cosine` | The noising schedule. Options are `cosine`, `linear`, `quadratic`, `jsd`, or `sigmoid`. |
 | `learned_variance` | No | `True` | Whether to learn the variance. |
 | `clip` | No | `None` | The clip model to use if embeddings are being generated on the fly. Takes keys `make` and `model` with defaults `openai` and `ViT-L/14`. |
 Any parameter from the `Decoder` constructor can also be given here.
@@ -39,7 +40,8 @@ Settings for creation of the dataloaders.
 | Option | Required | Default | Description |
 | ------ | -------- | ------- | ----------- |
 | `webdataset_base_url` | Yes | N/A | The url of a shard in the webdataset with the shard replaced with `{}`[^1]. |
-| `embeddings_url` | No | N/A | The url of the folder containing embeddings shards. Not required if embeddings are in webdataset. |
+| `img_embeddings_url` | No | `None` | The url of the folder containing image embeddings shards. Not required if embeddings are in webdataset or clip is being used. |
 | `text_embeddings_url` | No | `None` | The url of the folder containing text embeddings shards. Not required if embeddings are in webdataset or clip is being used. |
 | `num_workers` | No | `4` | The number of workers used in the dataloader. |
 | `batch_size` | No | `64` | The batch size. |
 | `start_shard` | No | `0` | Defines the start of the shard range the dataset will recall. |
@@ -72,9 +74,6 @@ Settings for controlling the training hyperparameters.
 | `validation_samples` | No | `None` | The number of samples to use for validation. `None` means the entire validation set. |
 | `use_ema` | No | `True` | Whether to use exponential moving average models for sampling. |
 | `ema_beta` | No | `0.99` | The ema coefficient. |
 | `save_all` | No | `False` | If True, preserves a checkpoint for every epoch. |
 | `save_latest` | No | `True` | If True, overwrites the `latest.pth` every time the model is saved. |
 | `save_best` | No | `True` | If True, overwrites the `best.pth` every time the model has a lower validation loss than all previous models. |
 | `unet_training_mask` | No | `None` | A boolean array of the same length as the number of unets. If false, the unet is frozen. A value of `None` trains all unets. |
 **<ins>Evaluate</ins>:**
@@ -106,6 +105,13 @@ Tracking is split up into three sections:
 **Logging:**
 All loggers have the following keys:
 | Option | Required | Default | Description |
 | ------ | -------- | ------- | ----------- |
 | `log_type` | Yes | N/A | The type of logger class to use. |
 | `resume` | No | `False` | For loggers that have the option to resume an old run, resume it using manually input parameters. |
 | `auto_resume` | No | `False` | If true, the logger will attempt to resume an old run using parameters from that previous run. |
 If using `console` there is no further configuration than setting `log_type` to `console`.
 | Option | Required | Default | Description |
 | ------ | -------- | ------- | ----------- |
@@ -119,10 +125,15 @@ If using `wandb`
 | `wandb_project` | Yes | N/A | The wandb project to save the run to. |
 | `wandb_run_name` | No | `None` | The wandb run name. |
 | `wandb_run_id` | No | `None` | The wandb run id. Used if resuming an old run. |
 | `wandb_resume` | No | `False` | Whether to resume an old run. |
 **Loading:**
 All loaders have the following keys:
 | Option | Required | Default | Description |
 | ------ | -------- | ------- | ----------- |
 | `load_from` | Yes | N/A | The type of loader class to use. |
 | `only_auto_resume` | No | `False` | If true, the loader will only load the model if the run is being auto resumed. |
 If using `local`
 | Option | Required | Default | Description |
 | ------ | -------- | ------- | ----------- |
@@ -149,9 +160,10 @@ All save locations have these configuration options
 | Option | Required | Default | Description |
 | ------ | -------- | ------- | ----------- |
 | `save_to` | Yes | N/A | Must be `local`, `huggingface`, or `wandb`. |
-| `save_latest_to` | No | `latest.pth` | Sets the relative path to save the latest model to. |
+| `save_latest_to` | No | `None` | Sets the relative path to save the latest model to. |
-| `save_best_to` | No | `best.pth` | Sets the relative path to save the best model to every time the model has a lower validation loss than all previous models. |
+| `save_best_to` | No | `None` | Sets the relative path to save the best model to every time the model has a lower validation loss than all previous models. |
-| `save_type` | No | `'checkpoint'` | The type of save. `'checkpoint'` saves a checkpoint, `'model'` saves a model without any fluff (Saves with ema if ema is enabled). |
+| `save_meta_to` | No | `None` | The path to save metadata files in. This includes the config files used to start the training. |
 | `save_type` | No | `checkpoint` | The type of save. `checkpoint` saves a checkpoint, `model` saves a model without any fluff (Saves with ema if ema is enabled). |
 If using `local`
 | Option | Required | Default | Description |
@@ -163,7 +175,6 @@ If using `huggingface`
 | ------ | -------- | ------- | ----------- |
 | `save_to` | Yes | N/A | Must be `huggingface`. |
 | `huggingface_repo` | Yes | N/A | The huggingface repository to save to. |
 | `huggingface_base_path` | Yes | N/A | The base path that checkpoints will be saved under. |
 | `token_path` | No | `None` | If logging in with the huggingface cli is not possible, point to a token file instead. |
 If using `wandb`
--- a/configs/train_decoder_config.example.json
+++ b/configs/train_decoder_config.example.json
@@ -20,7 +20,7 @@
    },
    "data": {
        "webdataset_base_url": "pipe:s3cmd get s3://bucket/path/{}.tar -",
-        "embeddings_url": "s3://bucket/embeddings/path/",
+        "img_embeddings_url": "s3://bucket/img_embeddings/path/",
        "num_workers": 4,
        "batch_size": 64,
        "start_shard": 0,
@@ -56,9 +56,6 @@
        "use_ema": true,
        "ema_beta": 0.99,
        "amp": false,
        "save_all": false,
        "save_latest": true,
        "save_best": true,
        "unet_training_mask": [true]
    },
    "evaluate": {
@@ -96,14 +93,15 @@
        },
        "save": [{
-            "save_to": "wandb"
+            "save_to": "wandb",
            "save_latest_to": "latest.pth"
        }, {
            "save_to": "huggingface",
            "huggingface_repo": "Veldrovive/test_model",
-            "save_all": true,
+            "save_latest_to": "path/to/model_dir/latest.pth",
-            "save_latest": true,
+            "save_best_to": "path/to/model_dir/best.pth",
-            "save_best": true,
+            "save_meta_to": "path/to/directory/for/assorted/files",
            "save_type": "model"
        }]
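When updating an existing config to this version, a small check for renamed and dropped keys may help. The sketch below is hypothetical: `find_stale_keys` is not part of `dalle2_pytorch`, and the key lists are assumptions read off this diff (`embeddings_url` renamed to `img_embeddings_url`, and the boolean `save_all` / `save_latest` / `save_best` flags giving way to the `save_*_to` paths).

```python
from typing import Any, Dict, List

# Assumed from this diff: keys that were renamed or dropped in 0.26.2.
RENAMED_DATA_KEYS = {'embeddings_url': 'img_embeddings_url'}
DROPPED_FLAG_KEYS = ('save_all', 'save_latest', 'save_best')


def find_stale_keys(config: Dict[str, Any]) -> List[str]:
    """Hypothetical helper: list keys an older config may still carry."""
    stale = []
    for old, new in RENAMED_DATA_KEYS.items():
        if old in config.get('data', {}):
            stale.append(f'data.{old} (use {new})')
    for key in DROPPED_FLAG_KEYS:
        if key in config.get('train', {}):
            stale.append(f'train.{key}')
    # Each saver entry under tracker.save is checked separately.
    for i, saver in enumerate(config.get('tracker', {}).get('save', [])):
        for key in DROPPED_FLAG_KEYS:
            if key in saver:
                stale.append(f'tracker.save[{i}].{key}')
    return stale


old_config = {
    'data': {'embeddings_url': 's3://bucket/embeddings/path/'},
    'train': {'save_latest': True},
    'tracker': {'save': [{'save_to': 'huggingface', 'save_all': True}]},
}
for entry in find_stale_keys(old_config):
    print(entry)
# data.embeddings_url (use img_embeddings_url)
# train.save_latest
# tracker.save[0].save_all
```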
--- a/configs/train_decoder_config.test.json
+++ b/configs/train_decoder_config.test.json
@@ -0,0 +1,100 @@
 {
    "decoder": {
        "unets": [
            {
                "dim": 16,
                "image_embed_dim": 768,
                "cond_dim": 16,
                "channels": 3,
                "dim_mults": [1, 2, 4, 8],
                "attn_dim_head": 16,
                "attn_heads": 4,
 		"self_attn": [false, true, true, true]
            }
        ],
        "clip": {
            "make": "openai",
            "model": "ViT-L/14"
        },
 	"timesteps": 10,
        "image_sizes": [64],
        "channels": 3,
        "loss_type": "l2",
        "beta_schedule": ["cosine"],
        "learned_variance": true
    },
    "data": {
        "webdataset_base_url": "test_data/{}.tar",
        "num_workers": 4,
        "batch_size": 4,
        "start_shard": 0,
        "end_shard": 9,
        "shard_width": 1,
        "index_width": 1,
        "splits": {
            "train": 0.75,
            "val": 0.15,
            "test": 0.1
        },
        "shuffle_train": false,
        "resample_train": true,
        "preprocessing": {
            "RandomResizedCrop": {
                "size": [224, 224],
                "scale": [0.75, 1.0],
                "ratio": [1.0, 1.0]
            },
            "ToTensor": true
        }
    },
    "train": {
        "epochs": 1,
        "lr": 1e-16,
        "wd": 0.01,
        "max_grad_norm": 0.5,
        "save_every_n_samples": 100,
        "n_sample_images": 1,
        "device": "cpu",
        "epoch_samples": 50,
        "validation_samples": 5,
        "use_ema": true,
        "ema_beta": 0.99,
        "amp": false,
        "unet_training_mask": [true]
    },
    "evaluate": {
        "n_evaluation_samples": 2,
        "FID": {
            "feature": 64
        },
        "IS": {
            "feature": 64,
            "splits": 10
        },
        "KID": {
            "feature": 64,
            "subset_size": 2
        },
        "LPIPS": {
            "net_type": "vgg",
            "reduction": "mean"
        }
    },
    "tracker": {
        "overwrite_data_path": true,
 	"log": {
            "log_type": "console"
 	},
        "load": {
            "load_from": null
        },
       "save": [{
            "save_to": "local",
            "save_latest_to": "latest.pth"
        }]
    }
 }
--- a/dalle2_pytorch/dalle2_pytorch.py
+++ b/dalle2_pytorch/dalle2_pytorch.py
--- a/dalle2_pytorch/dataloaders/decoder_loader.py
+++ b/dalle2_pytorch/dataloaders/decoder_loader.py
@@ -1,6 +1,7 @@
 import os
 import webdataset as wds
 import torch
 from torch.utils.data import DataLoader
 import numpy as np
 import fsspec
 import shutil
@@ -255,7 +256,7 @@ def create_image_embedding_dataloader(
    )
    if shuffle_num is not None and shuffle_num > 0:
        ds.shuffle(1000)
-    return wds.WebLoader(
+    return DataLoader(
        ds,
        num_workers=num_workers,
        batch_size=batch_size,
--- a/dalle2_pytorch/trackers.py
+++ b/dalle2_pytorch/trackers.py
@@ -1,15 +1,18 @@
 import urllib.request
 import os
 import json
 from pathlib import Path
 import shutil
 from itertools import zip_longest
-from typing import Optional, List, Union
+from typing import Any, Optional, List, Union
 from pydantic import BaseModel
 import torch
-
+from dalle2_pytorch.dalle2_pytorch import Decoder, DiffusionPrior
 from dalle2_pytorch.utils import import_or_print_error
 from dalle2_pytorch.trainer import DecoderTrainer, DiffusionPriorTrainer
 from dalle2_pytorch.version import __version__
 from packaging import version
 # constants
@@ -20,16 +23,6 @@ DEFAULT_DATA_PATH = './.tracker-data'
 def exists(val):
    return val is not None
 # load file functions
 def load_wandb_file(run_path, file_path, **kwargs):
    wandb = import_or_print_error('wandb', '`pip install wandb` to use the wandb recall function')
    file_reference = wandb.restore(file_path, run_path=run_path)
    return file_reference.name
 def load_local_file(file_path, **kwargs):
    return file_path
 class BaseLogger:
    """
    An abstract class representing an object that can log data.
@@ -37,14 +30,17 @@ class BaseLogger:
        data_path (str): A file path for storing temporary data.
        verbose (bool): Whether or not to always print logs to the console.
    """
-    def __init__(self, data_path: str, verbose: bool = False, **kwargs):
+    def __init__(self, data_path: str, resume: bool = False, auto_resume: bool = False, verbose: bool = False, **kwargs):
        self.data_path = Path(data_path)
        self.resume = resume
        self.auto_resume = auto_resume
        self.verbose = verbose
    def init(self, full_config: BaseModel, extra_config: dict, **kwargs) -> None:
        """
        Initializes the logger.
        Errors if the logger is invalid.
        full_config is the config file dict while extra_config is anything else from the script that is not defined in the config file.
        """
        raise NotImplementedError
@@ -60,6 +56,14 @@ class BaseLogger:
    def log_error(self, error_string, **kwargs) -> None:
        raise NotImplementedError
    def get_resume_data(self, **kwargs) -> dict:
        """
        Sets tracker attributes that along with { "resume": True } will be used to resume training.
        It is assumed that after init is called this data will be complete.
        If the logger does not have any resume functionality, it should return an empty dict.
        """
        raise NotImplementedError
 class ConsoleLogger(BaseLogger):
    def init(self, full_config: BaseModel, extra_config: dict, **kwargs) -> None:
        print("Logging to console")
@@ -76,6 +80,9 @@ class ConsoleLogger(BaseLogger):
    def log_error(self, error_string, **kwargs) -> None:
        print(error_string)
    def get_resume_data(self, **kwargs) -> dict:
        return {}
 class WandbLogger(BaseLogger):
    """
    Logs to a wandb run.
@@ -85,7 +92,6 @@ class WandbLogger(BaseLogger):
        wandb_project (str): The wandb project to log to.
        wandb_run_id (str): The wandb run id to resume.
        wandb_run_name (str): The wandb run name to use.
        wandb_resume (bool): Whether to resume a wandb run.
    """
    def __init__(self,
        data_path: str,
@@ -93,7 +99,6 @@ class WandbLogger(BaseLogger):
        wandb_project: str,
        wandb_run_id: Optional[str] = None,
        wandb_run_name: Optional[str] = None,
        wandb_resume: bool = False,
        **kwargs
    ):
        super().__init__(data_path, **kwargs)
@@ -101,7 +106,6 @@ class WandbLogger(BaseLogger):
        self.project = wandb_project
        self.run_id = wandb_run_id
        self.run_name = wandb_run_name
        self.resume = wandb_resume
    def init(self, full_config: BaseModel, extra_config: dict, **kwargs) -> None:
        assert self.entity is not None, "wandb_entity must be specified for wandb logger"
@@ -149,6 +153,14 @@ class WandbLogger(BaseLogger):
            print(error_string)
        self.wandb.log({"error": error_string, **kwargs}, step=step)
    def get_resume_data(self, **kwargs) -> dict:
        # In order to resume, we need wandb_entity, wandb_project, and wandb_run_id
        return {
            "entity": self.entity,
            "project": self.project,
            "run_id": self.wandb.run.id
        }
 logger_type_map = {
    'console': ConsoleLogger,
    'wandb': WandbLogger,
@@ -168,8 +180,9 @@ class BaseLoader:
    Parameters:
        data_path (str): A file path for storing temporary data.
    """
-    def __init__(self, data_path: str, **kwargs):
+    def __init__(self, data_path: str, only_auto_resume: bool = False, **kwargs):
        self.data_path = Path(data_path)
        self.only_auto_resume = only_auto_resume
    def init(self, logger: BaseLogger, **kwargs) -> None:
        raise NotImplementedError
@@ -213,7 +226,7 @@ class LocalLoader(BaseLoader):
    def init(self, logger: BaseLogger, **kwargs) -> None:
        # Makes sure the file exists to be loaded
-        if not self.file_path.exists():
+        if not self.file_path.exists() and not self.only_auto_resume:
            raise FileNotFoundError(f'Model not found at {self.file_path}')
    def recall(self) -> dict:
@@ -262,9 +275,9 @@ def create_loader(loader_type: str, data_path: str, **kwargs) -> BaseLoader:
 class BaseSaver:
    def __init__(self,
        data_path: str,
-        save_latest_to: Optional[Union[str, bool]] = 'latest.pth',
+        save_latest_to: Optional[Union[str, bool]] = None,
-        save_best_to: Optional[Union[str, bool]] = 'best.pth',
+        save_best_to: Optional[Union[str, bool]] = None,
-        save_meta_to: str = './',
+        save_meta_to: Optional[str] = None,
        save_type: str = 'checkpoint',
        **kwargs
    ):
@@ -274,10 +287,10 @@ class BaseSaver:
        self.save_best_to = save_best_to
        self.saving_best = save_best_to is not None and save_best_to is not False
        self.save_meta_to = save_meta_to
        self.saving_meta = save_meta_to is not None
        self.save_type = save_type
        assert save_type in ['checkpoint', 'model'], '`save_type` must be one of `checkpoint` or `model`'
-        assert self.save_meta_to is not None, '`save_meta_to` must be provided'
+        assert self.saving_latest or self.saving_best or self.saving_meta, 'At least one saving option must be specified'
        assert self.saving_latest or self.saving_best, '`save_latest_to` or `save_best_to` must be provided'
    def init(self, logger: BaseLogger, **kwargs) -> None:
        raise NotImplementedError
@@ -304,6 +317,10 @@ class LocalSaver(BaseSaver):
    def save_file(self, local_path: str, save_path: str, **kwargs) -> None:
        # Copy the file to save_path
        save_path_file_name = Path(save_path).name
        # Make sure parent directory exists
        save_path_parent = Path(save_path).parent
        if not save_path_parent.exists():
            save_path_parent.mkdir(parents=True)
        print(f"Saving {save_path_file_name} {self.save_type} to local path {save_path}")
        shutil.copy(local_path, save_path)
@@ -385,11 +402,7 @@ class Tracker:
    def __init__(self, data_path: Optional[str] = DEFAULT_DATA_PATH, overwrite_data_path: bool = False, dummy_mode: bool = False):
        self.data_path = Path(data_path)
        if not dummy_mode:
-            if overwrite_data_path:
+            if not overwrite_data_path:
                if self.data_path.exists():
                    shutil.rmtree(self.data_path)
                self.data_path.mkdir(parents=True)
            else:
                assert not self.data_path.exists(), f'Data path {self.data_path} already exists. Set overwrite_data_path to True to overwrite.'
                if not self.data_path.exists():
                    self.data_path.mkdir(parents=True)
@@ -398,7 +411,51 @@ class Tracker:
        self.savers: List[BaseSaver]= []
        self.dummy_mode = dummy_mode
    def _load_auto_resume(self) -> bool:
        # If the file does not exist, we return False. If autoresume is enabled we print a warning so that the user can know that this is the first run.
        if not self.auto_resume_path.exists():
            if self.logger.auto_resume:
                print("Auto_resume is enabled but no auto_resume.json file exists. Assuming this is the first run.")
            return False
        # Now we know that the autoresume file exists, but if we are not auto resuming we should remove it so that we don't accidentally load it next time
        if not self.logger.auto_resume:
            print(f'Removing auto_resume.json because auto_resume is not enabled in the config')
            self.auto_resume_path.unlink()
            return False
        # Otherwise we read the json into a dictionary that will override parts of logger.__dict__
        with open(self.auto_resume_path, 'r') as f:
            auto_resume_dict = json.load(f)
        # Check if the logger is of the same type as the autoresume save
        if auto_resume_dict["logger_type"] != self.logger.__class__.__name__:
            raise Exception(f'The logger type in the auto_resume file is {auto_resume_dict["logger_type"]} but the current logger is {self.logger.__class__.__name__}. Either use the original logger type, set `auto_resume` to `False`, or delete your existing tracker-data folder.')
        # Then we are ready to override the logger with the autoresume save
        self.logger.__dict__["resume"] = True
        print(f"Updating {self.logger.__dict__} with {auto_resume_dict}")
        self.logger.__dict__.update(auto_resume_dict)
        return True
    def _save_auto_resume(self):
        # Gets the autoresume dict from the logger and adds "logger_type" to it then saves it to the auto_resume file
        auto_resume_dict = self.logger.get_resume_data()
        auto_resume_dict['logger_type'] = self.logger.__class__.__name__
        with open(self.auto_resume_path, 'w') as f:
            json.dump(auto_resume_dict, f)
    def init(self, full_config: BaseModel, extra_config: dict):
        self.auto_resume_path = self.data_path / 'auto_resume.json'
        # Check for resuming the run
        self.did_auto_resume = self._load_auto_resume()
        if self.did_auto_resume:
            print(f'\n\nWARNING: RUN HAS BEEN AUTO-RESUMED WITH THE LOGGER TYPE {self.logger.__class__.__name__}.\nIf this was not your intention, stop this run and set `auto_resume` to `False` in the config.\n\n')
            print(f"New logger config: {self.logger.__dict__}")
        self.save_metadata = dict(
            version = version.parse(__version__)
        )  # Data that will be saved alongside the checkpoint or model
        self.blacklisted_checkpoint_metadata_keys = ['scaler', 'optimizer', 'model', 'version', 'step', 'steps']  # These keys would cause us to error if we try to save them as metadata
        assert self.logger is not None, '`logger` must be set before `init` is called'
        if self.dummy_mode:
            # The only thing we need is a loader
@@ -406,12 +463,17 @@ class Tracker:
                self.loader.init(self.logger)
            return
        assert len(self.savers) > 0, '`savers` must be set before `init` is called'
        self.logger.init(full_config, extra_config)
        if self.loader is not None:
            self.loader.init(self.logger)
        for saver in self.savers:
            saver.init(self.logger)
        if self.logger.auto_resume:
            # Then we need to save the autoresume file. It is assumed after logger.init is called that the logger is ready to be saved.
            self._save_auto_resume()
    def add_logger(self, logger: BaseLogger):
        self.logger = logger
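The auto-resume flow added in `_save_auto_resume` and `_load_auto_resume` can be sketched in isolation as a JSON round trip. This is a simplified illustration, not the tracker's code: `DummyLogger`, the free functions, and the `ValueError` are stand-ins chosen for the example.

```python
import json
import tempfile
from pathlib import Path


class DummyLogger:
    """Stand-in logger with only the attributes this sketch needs."""

    def __init__(self, auto_resume: bool = False):
        self.resume = False
        self.auto_resume = auto_resume
        self.run_id = None

    def get_resume_data(self) -> dict:
        return {'run_id': self.run_id}


def save_auto_resume(logger, path: Path) -> None:
    # Store the resume data together with the logger type that produced it.
    data = logger.get_resume_data()
    data['logger_type'] = logger.__class__.__name__
    path.write_text(json.dumps(data))


def load_auto_resume(logger, path: Path) -> bool:
    if not path.exists():
        return False  # first run, nothing to resume
    if not logger.auto_resume:
        path.unlink()  # drop the stale file so a later run does not pick it up
        return False
    data = json.loads(path.read_text())
    if data['logger_type'] != logger.__class__.__name__:
        raise ValueError('logger type changed between runs')
    logger.resume = True
    logger.__dict__.update(data)  # saved values override the configured ones
    return True


with tempfile.TemporaryDirectory() as tmp:
    resume_file = Path(tmp) / 'auto_resume.json'

    first = DummyLogger(auto_resume=True)
    first.run_id = 'abc123'
    assert load_auto_resume(first, resume_file) is False  # no file yet
    save_auto_resume(first, resume_file)

    second = DummyLogger(auto_resume=True)
    resumed = load_auto_resume(second, resume_file)
    print(resumed, second.resume, second.run_id)
    # True True abc123
```

Keeping the logger type in the file lets a mismatched restart fail loudly instead of resuming into the wrong backend.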
@@ -442,8 +504,15 @@ class Tracker:
        # Save the config under config_name in the root folder of data_path
        shutil.copy(current_config_path, self.data_path / config_name)
        for saver in self.savers:
-            remote_path = Path(saver.save_meta_to) / config_name
+            if saver.saving_meta:
-            saver.save_file(current_config_path, str(remote_path))
+                remote_path = Path(saver.save_meta_to) / config_name
                saver.save_file(current_config_path, str(remote_path))
    def add_save_metadata(self, state_dict_key: str, metadata: Any):
        """
        Adds a new piece of metadata that will be saved along with the model or decoder.
        """
        self.save_metadata[state_dict_key] = metadata
    def _save_state_dict(self, trainer: Union[DiffusionPriorTrainer, DecoderTrainer], save_type: str, file_path: str, **kwargs) -> Path:
        """
@@ -453,24 +522,34 @@ class Tracker:
        """
        assert save_type in ['checkpoint', 'model']
        if save_type == 'checkpoint':
-            trainer.save(file_path, overwrite=True, **kwargs)
+            # Create a metadata dict without the blacklisted keys so we do not error when we create the state dict
            metadata = {k: v for k, v in self.save_metadata.items() if k not in self.blacklisted_checkpoint_metadata_keys}
            trainer.save(file_path, overwrite=True, **kwargs, **metadata)
        elif save_type == 'model':
            if isinstance(trainer, DiffusionPriorTrainer):
                prior = trainer.ema_diffusion_prior.ema_model if trainer.use_ema else trainer.diffusion_prior
-                state_dict = trainer.unwrap_model(prior).state_dict()
-                torch.save(state_dict, file_path)
+                prior: DiffusionPrior = trainer.unwrap_model(prior)
+                # Remove CLIP if it is part of the model
+                prior.clip = None
+                model_state_dict = prior.state_dict()
            elif isinstance(trainer, DecoderTrainer):
-                decoder = trainer.accelerator.unwrap_model(trainer.decoder)
+                decoder: Decoder = trainer.accelerator.unwrap_model(trainer.decoder)
                # Remove CLIP if it is part of the model
                decoder.clip = None
                if trainer.use_ema:
                    trainable_unets = decoder.unets
                    decoder.unets = trainer.unets  # Swap EMA unets in
-                    state_dict = decoder.state_dict()
+                    model_state_dict = decoder.state_dict()
                    decoder.unets = trainable_unets  # Swap back
                else:
-                    state_dict = decoder.state_dict()
+                    model_state_dict = decoder.state_dict()
-                torch.save(state_dict, file_path)
            else:
                raise NotImplementedError('Saving this type of model with EMA mode enabled is not yet implemented. Actually, how did you get here?')
            state_dict = {
                **self.save_metadata,
                'model': model_state_dict
            }
            torch.save(state_dict, file_path)
        return Path(file_path)
    def save(self, trainer, is_best: bool, is_latest: bool, **kwargs):
@@ -503,11 +582,16 @@ class Tracker:
                    self.logger.log_error(f'Error saving checkpoint: {e}', **kwargs)
                    print(f'Error saving checkpoint: {e}')
    @property
    def can_recall(self):
        # Defines whether a recall can be performed.
        return self.loader is not None and (not self.loader.only_auto_resume or self.did_auto_resume)
    def recall(self):
-        if self.loader is not None:
+        if self.can_recall:
            return self.loader.recall()
        else:
-            raise ValueError('No loader specified')
+            raise ValueError('Tried to recall, but no loader was set or auto-resume was not performed.')
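The metadata handling that `_save_state_dict` gains in this diff can be sketched on its own. This is a minimal illustration rather than the library's code: the free functions `checkpoint_metadata` and `model_payload` are hypothetical names, and plain dicts stand in for real state dicts. Only the blacklist contents and the `{**metadata, 'model': ...}` layout are taken from the diff.

```python
from typing import Any, Dict

# Blacklist copied from the diff: per the source comment, saving these
# keys as metadata would cause an error.
BLACKLISTED_KEYS = ['scaler', 'optimizer', 'model', 'version', 'step', 'steps']


def checkpoint_metadata(save_metadata: Dict[str, Any]) -> Dict[str, Any]:
    """Metadata that can be forwarded as extra keyword arguments to trainer.save."""
    return {k: v for k, v in save_metadata.items() if k not in BLACKLISTED_KEYS}


def model_payload(save_metadata: Dict[str, Any], model_state_dict: Dict[str, Any]) -> Dict[str, Any]:
    """Dict written for save_type == 'model': all metadata plus the weights under 'model'."""
    return {**save_metadata, 'model': model_state_dict}


# Hypothetical metadata, as might be registered through add_save_metadata
metadata = {'config': {'decoder': {'timesteps': 1000}}, 'version': '0.26.2'}
print(checkpoint_metadata(metadata))
print(sorted(model_payload(metadata, {'weight': [0.0]})))
```

Note the asymmetry: the checkpoint path filters the metadata before passing it on, while the model path writes all of it and then sets `'model'` last, so a metadata entry named `'model'` would be overwritten by the weights.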
--- a/dalle2_pytorch/train_configs.py
+++ b/dalle2_pytorch/train_configs.py
@@ -47,6 +47,8 @@ class TrainSplitConfig(BaseModel):
 class TrackerLogConfig(BaseModel):
    log_type: str = 'console'
    resume: bool = False  # For logs that are saved to unique locations, resume a previous run
    auto_resume: bool = False  # If the process crashes and restarts, resume from the run that crashed
    verbose: bool = False
    class Config:
@@ -59,6 +61,7 @@ class TrackerLogConfig(BaseModel):
 class TrackerLoadConfig(BaseModel):
    load_from: Optional[str] = None
    only_auto_resume: bool = False  # Only attempt to load if the logger is auto-resuming
    class Config:
        extra = "allow"
@@ -126,6 +129,7 @@ class AdapterConfig(BaseModel):
 class DiffusionPriorNetworkConfig(BaseModel):
    dim: int
    depth: int
    max_text_len: int = None
    num_timesteps: int = None
    num_time_embeds: int = 1
    num_image_embeds: int = 1
@@ -133,6 +137,7 @@ class DiffusionPriorNetworkConfig(BaseModel):
    dim_head: int = 64
    heads: int = 8
    ff_mult: int = 4
    norm_in: bool = False
    norm_out: bool = True
    attn_dropout: float = 0.
    ff_dropout: float = 0.
@@ -151,6 +156,7 @@ class DiffusionPriorConfig(BaseModel):
    image_size: int
    image_channels: int = 3
    timesteps: int = 1000
    sample_timesteps: Optional[int] = None
    cond_drop_prob: float = 0.
    loss_type: str = 'l2'
    predict_x_start: bool = True
@@ -219,6 +225,7 @@ class UnetConfig(BaseModel):
    self_attn: ListOrTuple(int)
    attn_dim_head: int = 32
    attn_heads: int = 16
    init_cross_embed: bool = True
    class Config:
        extra = "allow"
@@ -230,6 +237,7 @@ class DecoderConfig(BaseModel):
    clip: Optional[AdapterConfig]   # The clip model to use if embeddings are not provided
    channels: int = 3
    timesteps: int = 1000
    sample_timesteps: Optional[SingularOrIterable(int)] = None
    loss_type: str = 'l2'
    beta_schedule: ListOrTuple(str) = 'cosine'
    learned_variance: bool = True
--- a/dalle2_pytorch/trainer.py
+++ b/dalle2_pytorch/trainer.py
@@ -21,7 +21,7 @@ import pytorch_warmup as warmup
 from ema_pytorch import EMA
-from accelerate import Accelerator
+from accelerate import Accelerator, DistributedType
 import numpy as np
@@ -76,6 +76,7 @@ def cast_torch_tensor(fn):
    def inner(model, *args, **kwargs):
        device = kwargs.pop('_device', next(model.parameters()).device)
        cast_device = kwargs.pop('_cast_device', True)
        cast_deepspeed_precision = kwargs.pop('_cast_deepspeed_precision', True)
        kwargs_keys = kwargs.keys()
        all_args = (*args, *kwargs.values())
@@ -85,6 +86,21 @@ def cast_torch_tensor(fn):
        if cast_device:
            all_args = tuple(map(lambda t: t.to(device) if exists(t) and isinstance(t, torch.Tensor) else t, all_args))
        if cast_deepspeed_precision:
            try:
                accelerator = model.accelerator
                if accelerator is not None and accelerator.distributed_type == DistributedType.DEEPSPEED:
                    cast_type_map = {
                        "fp16": torch.half,
                        "bf16": torch.bfloat16,
                        "no": torch.float
                    }
                    precision_type = cast_type_map[accelerator.mixed_precision]
                    all_args = tuple(map(lambda t: t.to(precision_type) if exists(t) and isinstance(t, torch.Tensor) else t, all_args))
            except AttributeError:
                # Then this model doesn't have an accelerator
                pass
        args, kwargs_values = all_args[:split_kwargs_index], all_args[split_kwargs_index:]
        kwargs = dict(tuple(zip(kwargs_keys, kwargs_values)))
@@ -446,6 +462,7 @@ class DecoderTrainer(nn.Module):
        self,
        decoder,
        accelerator = None,
        dataloaders = None,
        use_ema = True,
        lr = 1e-4,
        wd = 1e-2,
@@ -508,11 +525,31 @@ class DecoderTrainer(nn.Module):
        self.register_buffer('steps', torch.tensor([0] * self.num_unets))
        if self.accelerator.distributed_type == DistributedType.DEEPSPEED and decoder.clip is not None:
            # Then we need to make sure clip is using the correct precision or else deepspeed will error
            cast_type_map = {
                "fp16": torch.half,
                "bf16": torch.bfloat16,
                "no": torch.float
            }
            precision_type = cast_type_map[accelerator.mixed_precision]
            assert precision_type == torch.float, "DeepSpeed currently only supports float32 precision when using on the fly embedding generation from clip"
            clip = decoder.clip
            clip.to(precision_type)
        decoder, *optimizers = list(self.accelerator.prepare(decoder, *optimizers))
        schedulers = list(self.accelerator.prepare(*schedulers))
        self.decoder = decoder
        # prepare dataloaders
        train_loader = val_loader = None
        if exists(dataloaders):
            train_loader, val_loader = self.accelerator.prepare(dataloaders["train"], dataloaders["val"])
        self.train_loader = train_loader
        self.val_loader = val_loader
        # store optimizers
        for opt_ind, optimizer in zip(range(len(optimizers)), optimizers):
@@ -527,6 +564,17 @@ class DecoderTrainer(nn.Module):
        self.warmup_schedulers = warmup_schedulers
    def validate_and_return_unet_number(self, unet_number = None):
        if self.num_unets == 1:
            unet_number = default(unet_number, 1)
        assert exists(unet_number) and 1 <= unet_number <= self.num_unets
        return unet_number
    def num_steps_taken(self, unet_number = None):
        unet_number = self.validate_and_return_unet_number(unet_number)
        return self.steps[unet_number - 1].item()
    def save(self, path, overwrite = True, **kwargs):
        path = Path(path)
        assert not (path.exists() and not overwrite)
@@ -595,10 +643,7 @@ class DecoderTrainer(nn.Module):
        self.steps += F.one_hot(unet_index_tensor, num_classes = len(self.steps))
    def update(self, unet_number = None):
-        if self.num_unets == 1:
-            unet_number = default(unet_number, 1)
-        assert exists(unet_number) and 1 <= unet_number <= self.num_unets
+        unet_number = self.validate_and_return_unet_number(unet_number)
        index = unet_number - 1
        optimizer = getattr(self, f'optim{index}')
@@ -628,8 +673,14 @@ class DecoderTrainer(nn.Module):
    def sample(self, *args, **kwargs):
        distributed = self.accelerator.num_processes > 1
        base_decoder = self.accelerator.unwrap_model(self.decoder)
        was_training = base_decoder.training
        base_decoder.eval()
        if kwargs.pop('use_non_ema', False) or not self.use_ema:
-            return base_decoder.sample(*args, **kwargs, distributed = distributed)
+            out = base_decoder.sample(*args, **kwargs, distributed = distributed)
            base_decoder.train(was_training)
            return out
        trainable_unets = self.accelerator.unwrap_model(self.decoder).unets
        base_decoder.unets = self.unets                  # swap in exponential moving averaged unets for sampling
@@ -642,6 +693,7 @@ class DecoderTrainer(nn.Module):
        for ema in self.ema_unets:
            ema.restore_ema_model_device()
        base_decoder.train(was_training)
        return output
    @torch.no_grad()
@@ -664,11 +716,13 @@ class DecoderTrainer(nn.Module):
        max_batch_size = None,
        **kwargs
    ):
-        if self.num_unets == 1:
-            unet_number = default(unet_number, 1)
+        unet_number = self.validate_and_return_unet_number(unet_number)
        total_loss = 0.
        using_amp = self.accelerator.mixed_precision != 'no'
        for chunk_size_frac, (chunked_args, chunked_kwargs) in split_args_and_kwargs(*args, split_size = max_batch_size, **kwargs):
            with self.accelerator.autocast():
                loss = self.decoder(*chunked_args, unet_number = unet_number, **chunked_kwargs)
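The `validate_and_return_unet_number` helper that `update`, `forward` and `num_steps_taken` now share can be sketched as a standalone function. This is an illustration under assumptions: in the diff it is a method on `DecoderTrainer` that reads `self.num_unets` and uses the library's `default` and `exists` helpers, whereas here `num_unets` is an explicit argument and the helpers are inlined.

```python
from typing import Optional


def validate_and_return_unet_number(num_unets: int, unet_number: Optional[int] = None) -> int:
    # With a single unet, an omitted number falls back to 1.
    if num_unets == 1 and unet_number is None:
        unet_number = 1
    # Unet numbers are 1-based; callers subtract 1 to index per-unet state.
    assert unet_number is not None and 1 <= unet_number <= num_unets, \
        f'unet_number must be between 1 and {num_unets}'
    return unet_number


print(validate_and_return_unet_number(1))     # single unet, number omitted
print(validate_and_return_unet_number(3, 2))  # explicit number with several unets
```

With more than one unet the number has to be given explicitly; omitting it trips the assertion instead of silently picking a unet.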
--- a/dalle2_pytorch/version.py
+++ b/dalle2_pytorch/version.py
@@ -1 +1 @@
-__version__ = '0.16.18'
+__version__ = '0.26.2'
--- a/test_data/0.tar
+++ b/test_data/0.tar
--- a/test_data/1.tar
+++ b/test_data/1.tar
--- a/test_data/2.tar
+++ b/test_data/2.tar
--- a/test_data/3.tar
+++ b/test_data/3.tar
--- a/test_data/4.tar
+++ b/test_data/4.tar
--- a/test_data/5.tar
+++ b/test_data/5.tar
--- a/test_data/6.tar
+++ b/test_data/6.tar
--- a/test_data/7.tar
+++ b/test_data/7.tar
--- a/test_data/8.tar
+++ b/test_data/8.tar
--- a/test_data/9.tar
+++ b/test_data/9.tar
--- a/train_decoder.py
+++ b/train_decoder.py
@@ -132,7 +132,7 @@ def get_example_data(dataloader, device, n=5):
            break
    return list(zip(images[:n], img_embeddings[:n], text_embeddings[:n], captions[:n]))
-def generate_samples(trainer, example_data, condition_on_text_encodings=False, text_prepend=""):
+def generate_samples(trainer, example_data, condition_on_text_encodings=False, text_prepend="", match_image_size=True):
    """
    Takes example data and generates images from the embeddings
    Returns three lists: real images, generated images, and captions
@@ -160,6 +160,9 @@ def generate_samples(trainer, example_data, condition_on_text_encodings=False, t
    samples = trainer.sample(**sample_params)
    generated_images = list(samples)
    captions = [text_prepend + txt for txt in txts]
    if match_image_size:
        generated_image_size = generated_images[0].shape[-1]
        real_images = [resize_image_to(image, generated_image_size, clamp_range=(0, 1)) for image in real_images]
    return real_images, generated_images, captions
 def generate_grid_samples(trainer, examples, condition_on_text_encodings=False, text_prepend=""):
@@ -167,14 +170,6 @@ def generate_grid_samples(trainer, examples, condition_on_text_encodings=False,
    Generates samples and uses torchvision to put them in a side by side grid for easy viewing
    """
    real_images, generated_images, captions = generate_samples(trainer, examples, condition_on_text_encodings, text_prepend)
-    real_image_size = real_images[0].shape[-1]
-    generated_image_size = generated_images[0].shape[-1]
-    # training images may be larger than the generated one
-    if real_image_size > generated_image_size:
-        real_images = [resize_image_to(image, generated_image_size) for image in real_images]
    grid_images = [torchvision.utils.make_grid([original_image, generated_image]) for original_image, generated_image in zip(real_images, generated_images)]
    return grid_images, captions
@@ -279,6 +274,7 @@ def train(
    trainer = DecoderTrainer(
        decoder=decoder,
        accelerator=accelerator,
        dataloaders=dataloaders,
        **kwargs
    )
@@ -289,9 +285,8 @@ def train(
    sample = 0
    samples_seen = 0
    val_sample = 0
-    step = lambda: int(trainer.step.item())
-    if tracker.loader is not None:
+    if tracker.can_recall:
        start_epoch, validation_losses, next_task, recalled_sample, samples_seen = recall_trainer(tracker, trainer)
        if next_task == 'train':
            sample = recalled_sample
@@ -304,6 +299,8 @@ def train(
    if not exists(unet_training_mask):
        # Then the unet mask should be true for all unets in the decoder
        unet_training_mask = [True] * trainer.num_unets
    first_training_unet = min(index for index, mask in enumerate(unet_training_mask) if mask)
    step = lambda: int(trainer.num_steps_taken(unet_number=first_training_unet+1))
    assert len(unet_training_mask) == trainer.num_unets, f"The unet training mask should be the same length as the number of unets in the decoder. Got {len(unet_training_mask)} and {trainer.num_unets}"
    accelerator.print(print_ribbon("Generating Example Data", repeat=40))
@@ -361,6 +358,7 @@ def train(
                        else:
                            # Then we need to pass the text instead
                            tokenized_texts = tokenize(txt, truncate=True)
                            assert tokenized_texts.shape[0] == len(img), f"The number of texts ({tokenized_texts.shape[0]}) should be the same as the number of images ({len(img)})"
                            forward_params['text'] = tokenized_texts
                    loss = trainer.forward(img, **forward_params, unet_number=unet)
                    trainer.update(unet_number=unet)
@@ -419,7 +417,7 @@ def train(
            timer = Timer()
            accelerator.wait_for_everyone()
            i = 0
-            for i, (img, emb, txt) in enumerate(dataloaders["val"]):
+            for i, (img, emb, txt) in enumerate(dataloaders['val']):  # Use the accelerate prepared loader
                val_sample_length_tensor[0] = len(img)
                all_samples = accelerator.gather(val_sample_length_tensor)
                total_samples = all_samples.sum().item()
@@ -515,6 +513,7 @@ def create_tracker(accelerator: Accelerator, config: TrainDecoderConfig, config_
    }
    tracker: Tracker = tracker_config.create(config, accelerator_config, dummy_mode=dummy)
    tracker.save_config(config_path, config_name='decoder_config.json')
    tracker.add_save_metadata(state_dict_key='config', metadata=config.dict())
    return tracker
 def initialize_training(config: TrainDecoderConfig, config_path):
@@ -524,6 +523,20 @@ def initialize_training(config: TrainDecoderConfig, config_path):
    # Set up accelerator for configurable distributed training
    ddp_kwargs = DistributedDataParallelKwargs(find_unused_parameters=config.train.find_unused_parameters)
    accelerator = Accelerator(kwargs_handlers=[ddp_kwargs])
    if accelerator.num_processes > 1:
        # We are using distributed training and want to immediately ensure all can connect
        accelerator.print("Waiting for all processes to connect...")
        accelerator.wait_for_everyone()
        accelerator.print("All processes online and connected")
    # If we are in deepspeed fp16 mode, we must ensure learned variance is off
    if accelerator.mixed_precision == "fp16" and accelerator.distributed_type == accelerate_dataclasses.DistributedType.DEEPSPEED and config.decoder.learned_variance:
        raise ValueError("DeepSpeed fp16 mode does not support learned variance")
    if accelerator.process_index != accelerator.local_process_index and accelerator.distributed_type == accelerate_dataclasses.DistributedType.DEEPSPEED:
        # This is an invalid configuration until we figure out how to handle this
        raise ValueError("DeepSpeed does not support multi-node distributed training")
    # Set up data
    all_shards = list(range(config.data.start_shard, config.data.end_shard + 1))
@@ -546,7 +559,7 @@ def initialize_training(config: TrainDecoderConfig, config_path):
    # Create the decoder model and print basic info
    decoder = config.decoder.create()
-    num_parameters = sum(p.numel() for p in decoder.parameters())
+    get_num_parameters = lambda model, only_training=False: sum(p.numel() for p in model.parameters() if (p.requires_grad or not only_training))
    # Create and initialize the tracker if we are the master
    tracker = create_tracker(accelerator, config, config_path, dummy = rank!=0)
@@ -575,7 +588,10 @@ def initialize_training(config: TrainDecoderConfig, config_path):
    accelerator.print(print_ribbon("Loaded Config", repeat=40))
    accelerator.print(f"Running training with {accelerator.num_processes} processes and {accelerator.distributed_type} distributed training")
    accelerator.print(f"Training using {data_source_string}. {'conditioned on text' if conditioning_on_text else 'not conditioned on text'}")
-    accelerator.print(f"Number of parameters: {num_parameters}")
+    accelerator.print(f"Number of parameters: {get_num_parameters(decoder)} total; {get_num_parameters(decoder, only_training=True)} training")
    for i, unet in enumerate(decoder.unets):
        accelerator.print(f"Unet {i} has {get_num_parameters(unet)} total; {get_num_parameters(unet, only_training=True)} training")
    train(dataloaders, decoder, accelerator,
        tracker=tracker,
        inference_device=accelerator.device,
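The step counter in `train` is now read from the first unet that is actually being trained. The mask handling behind that choice can be sketched as follows; the function name `first_training_unet_number` is hypothetical, and only the default-mask, length-check and `min(...)` logic come from the diff.

```python
from typing import List, Optional


def first_training_unet_number(unet_training_mask: Optional[List[bool]], num_unets: int) -> int:
    """Return the 1-based number of the first unet whose mask entry is True."""
    if unet_training_mask is None:
        # No mask given: every unet in the decoder is trained.
        unet_training_mask = [True] * num_unets
    assert len(unet_training_mask) == num_unets, \
        f'Got a mask of length {len(unet_training_mask)} for {num_unets} unets'
    first_index = min(index for index, mask in enumerate(unet_training_mask) if mask)
    return first_index + 1


print(first_training_unet_number(None, 2))
print(first_training_unet_number([False, True, True], 3))
```

A mask with no `True` entry makes `min` raise `ValueError` on the empty sequence, so at least one unet must be marked for training.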
--- a/train_diffusion_prior.py
+++ b/train_diffusion_prior.py
@@ -126,9 +126,9 @@ def report_cosine_sims(
        # we are text conditioned, we produce an embedding from the tokenized text
        if text_conditioned:
-            text_embedding, text_encodings, text_mask = trainer.embed_text(text_data)
+            text_embedding, text_encodings = trainer.embed_text(text_data)
            text_cond = dict(
-                text_embed=text_embedding, text_encodings=text_encodings, mask=text_mask
+                text_embed=text_embedding, text_encodings=text_encodings
            )
        else:
            text_embedding = text_data
@@ -146,15 +146,12 @@ def report_cosine_sims(
        if text_conditioned:
            text_encodings_shuffled = text_encodings[rolled_idx]
-            text_mask_shuffled = text_mask[rolled_idx]
        else:
            text_encodings_shuffled = None
-            text_mask_shuffled = None
        text_cond_shuffled = dict(
            text_embed=text_embed_shuffled,
-            text_encodings=text_encodings_shuffled,
-            mask=text_mask_shuffled,
+            text_encodings=text_encodings_shuffled
        )
        # prepare the text embedding
Author	SHA1	Message	Date
Phil Wang	4b912a38c6	0.26.2	2022-07-19 17:50:36 -07:00
Aidan Dempster	f97e55ec6b	Quality of life improvements for tracker savers (#210 ) The default save location is now none so if keys are not specified the corresponding checkpoint type is not saved. Models and checkpoints are now both saved with version number and the config used to create them in order to simplify loading. Documentation was fixed to be in line with current usage.	2022-07-19 17:50:18 -07:00
Phil Wang	291377bb9c	@jacobwjs reports dynamic thresholding works very well and 0.95 is a better value	2022-07-19 11:31:56 -07:00
Phil Wang	7f120a8b56	cleanup, CLI no longer necessary since Zion + Aidan have https://github.com/LAION-AI/dalle2-laion and colab notebook going	2022-07-19 09:47:44 -07:00
Phil Wang	8c003ab1e1	readme and citation	2022-07-19 09:36:45 -07:00
Phil Wang	723bf0abba	complete inpainting ability using inpaint_image and inpaint_mask passed into sample function for decoder	2022-07-19 09:26:55 -07:00
Phil Wang	d88c7ba56c	fix a bug with ddim and predict x0 objective	2022-07-18 19:04:26 -07:00
Phil Wang	3676a8ce78	comments	2022-07-18 15:02:04 -07:00
Phil Wang	da8e99ada0	fix sample bug	2022-07-18 13:50:22 -07:00
Phil Wang	6afb886cf4	complete imagen-like noise level conditioning	2022-07-18 13:43:57 -07:00
Phil Wang	c7fe4f2f44	project management	2022-07-17 17:27:44 -07:00
Phil Wang	a2ee3fa3cc	offer way to turn off initial cross embed convolutional module, for debugging upsampler artifacts	2022-07-15 17:29:10 -07:00
Phil Wang	a58a370d75	takes care of a grad strides error at https://github.com/lucidrains/DALLE2-pytorch/issues/196 thanks to @YUHANG-Ma	2022-07-14 15:28:34 -07:00
Phil Wang	1662bbf226	protect against random cropping for base unet	2022-07-14 12:49:43 -07:00
Phil Wang	5be1f57448	update	2022-07-14 12:03:42 -07:00
Phil Wang	c52ce58e10	update	2022-07-14 10:54:51 -07:00
Phil Wang	a34f60962a	let the neural network peek at the low resolution conditioning one last time before making prediction, for upsamplers	2022-07-14 10:27:04 -07:00
Phil Wang	0b40cbaa54	just always use nearest neighbor interpolation when resizing for low resolution conditioning, for https://github.com/lucidrains/DALLE2-pytorch/pull/181	2022-07-13 20:59:43 -07:00
Phil Wang	f141144a6d	allow for using classifier free guidance for some unets but not others, by passing in a tuple of cond_scale during sampling for decoder, just in case it is causing issues for upsamplers	2022-07-13 13:12:30 -07:00
Phil Wang	f988207718	hack around some inplace error, also make sure for openai clip text encoding, only tokens after eos_id is masked out	2022-07-13 12:56:02 -07:00
Phil Wang	b2073219f0	foolproof sampling for decoder to always use eval mode (and restore training state afterwards)	2022-07-13 10:21:00 -07:00
Phil Wang	cc0f7a935c	fix non pixel shuffle upsample	2022-07-13 10:16:02 -07:00
Phil Wang	95a512cb65	fix a potential bug with conditioning with blurred low resolution image, blur should be applied only 50% of the time	2022-07-13 10:11:49 -07:00
Phil Wang	972ee973bc	fix issue with ddim and normalization of lowres conditioning image	2022-07-13 09:48:40 -07:00
Phil Wang	79e2a3bc77	only use the stable layernorm for final output norm in transformer	2022-07-13 07:56:30 -07:00
Aidan Dempster	544cdd0b29	Reverted to using basic dataloaders (#205 ) Accelerate removes the ability to collate strings. Likely since it cannot gather strings.	2022-07-12 18:22:27 -07:00
Phil Wang	349aaca56f	add yet another transformer stability measure	2022-07-12 17:49:16 -07:00
Phil Wang	3ee3c56d2a	add learned padding tokens, same strategy as dalle1, for diffusion prior, and get rid of masking in causal transformer	2022-07-12 17:33:14 -07:00
Phil Wang	cd26c6b17d	0.22.3	2022-07-12 17:08:31 -07:00
Phil Wang	775abc4df6	add setting to attend to all text encodings regardless of padding, for diffusion prior	2022-07-12 17:08:12 -07:00
Phil Wang	11b1d533a0	make sure text encodings being passed in has the correct batch dimension	2022-07-12 16:00:19 -07:00
Phil Wang	e76e89f9eb	remove text masking altogether in favor of deriving from text encodings (padded text encodings must be pad value of 0.)	2022-07-12 15:40:31 -07:00
Phil Wang	bb3ff0ac67	protect against bad text mask being passed into decoder	2022-07-12 15:33:13 -07:00
Phil Wang	1ec4dbe64f	one more fix for text mask, if the length of the text encoding exceeds max_text_len, add an assert for better error msg	2022-07-12 15:01:46 -07:00
Phil Wang	e0835acca9	generate text mask within the unet and diffusion prior itself from the text encodings, if not given	2022-07-12 12:54:59 -07:00
Phil Wang	e055793e5d	shoutout for @MalumaDev	2022-07-11 16:12:35 -07:00
Phil Wang	1d9ef99288	add PixelShuffleUpsample thanks to @MalumaDev and @marunine for running the experiment and verifying absence of checkerboard artifacts	2022-07-11 16:07:23 -07:00
Phil Wang	bdd62c24b3	zero init final projection in unet, since openai and @crowsonkb are both doing it	2022-07-11 13:22:06 -07:00
Phil Wang	1f1557c614	make it so even if text mask is omitted, it will be derived based on whether text encodings are all 0s or not, simplify dataloading	2022-07-11 10:56:19 -07:00
Aidan Dempster	1a217e99e3	Unet parameter count is now shown (#202 )	2022-07-10 16:45:59 -07:00
Phil Wang	7ea314e2f0	allow for final l2norm clamping of the sampled image embed	2022-07-10 09:44:38 -07:00
Phil Wang	4173e88121	more accurate readme	2022-07-09 20:57:26 -07:00
Phil Wang	3dae43fa0e	fix misnamed variable, thanks to @nousr	2022-07-09 19:01:37 -07:00
Phil Wang	a598820012	do not noise for the last step in ddim	2022-07-09 18:38:40 -07:00
Phil Wang	4878762627	fix for small validation bug for sampling steps	2022-07-09 17:31:54 -07:00
Phil Wang	47ae17b36e	more informative error for something that tripped me up	2022-07-09 17:28:14 -07:00
Phil Wang	b7e22f7da0	complete ddim integration of diffusion prior as well as decoder for each unet, feature complete for https://github.com/lucidrains/DALLE2-pytorch/issues/157	2022-07-09 17:25:34 -07:00
Romain Beaumont	68de937aac	Fix decoder test by fixing the resizing output size (#197 )	2022-07-09 07:48:07 -07:00
Phil Wang	097afda606	0.18.0	2022-07-08 18:18:38 -07:00
Aidan Dempster	5c520db825	Added deepspeed support (#195 )	2022-07-08 18:18:08 -07:00
Phil Wang	3070610231	just force it so researcher can never pass in an image that is less than the size that is required for CLIP or CoCa	2022-07-08 18:17:29 -07:00
Aidan Dempster	870aeeca62	Fixed issue where evaluation would error when large image was loaded (#194 )	2022-07-08 17:11:34 -07:00
Romain Beaumont	f28dc6dc01	setup simple ci (#193 )	2022-07-08 16:51:56 -07:00
Phil Wang	081d8d3484	0.17.0	2022-07-08 13:36:26 -07:00
Aidan Dempster	a71f693a26	Add the ability to auto restart the last run when started after a crash (#191 ) * Added autoresume after crash functionality to the trackers * Updated documentation * Clarified what goes in the autorestart object * Fixed style issues * Unraveled conditional block * Changed to using helper function to get step count	2022-07-08 13:35:40 -07:00
Phil Wang	d7bc5fbedd	expose num_steps_taken helper method on trainer to retrieve number of training steps of each unet	2022-07-08 13:00:56 -07:00