Add a dataloader for training the decoder (#57)

* Added dataloader and updated requirements
* Added option to set embedding shard width separately from webdataset shard length. There must be a better way to do this.
* Changed embedding loader to read using fsspec
* Moved the loader into a more compatible location
* Removed unnecessary package
* Fixed typo (Embeding -> Embedding)
* Simplified example embedding finder code to remove unnecessary get_file_list function
* Added example usage of ImageEmbeddingDataset
* Changed the name of create_dataloader to be more verbose
* Added a dataloaders __init__.py
2025-12-19 09:44:19 +01:00 · 2022-05-05 10:08:45 -04:00
parent 896f19786d
commit 15acc03bd4
4 changed files with 216 additions and 3 deletions
--- a/setup.py
+++ b/setup.py
@@ -34,9 +34,10 @@ setup(
    'torchvision',
    'tqdm',
    'vector-quantize-pytorch',
-    'webdataset',
-    'x-clip>=0.5.1',
-    'youtokentome'
+    'x-clip>=0.4.4',
+    'youtokentome',
+    'webdataset>=0.2.5',
+    'fsspec>=2022.1.0'
  ],
  classifiers=[
    'Development Status :: 4 - Beta',