Add a dataloader for training the decoder (#57)

* Added dataloader and updated requirements

* Added option to set embedding shard width separately from webdataset shard length.
There must be a better way to do this.

* Changed embedding loader to read using fsspec

* Moved the loader into a more compatible location

* Removed unnecessary package

* Fixed typo (Embeding -> Embedding)

* Simplified example embedding finder code to remove unnecessary get_file_list function

* Added example usage of ImageEmbeddingDataset

* Changed the name of create_dataloader to be more verbose
Added an __init__.py for the dataloaders package
Aidan Dempster
2022-05-05 10:08:45 -04:00
committed by GitHub
parent 896f19786d
commit 15acc03bd4
4 changed files with 216 additions and 3 deletions


@@ -34,9 +34,10 @@ setup(
 'torchvision',
 'tqdm',
 'vector-quantize-pytorch',
-'webdataset',
-'x-clip>=0.5.1',
-'youtokentome'
+'x-clip>=0.4.4',
+'youtokentome',
+'webdataset>=0.2.5',
+'fsspec>=2022.1.0'
 ],
 classifiers=[
 'Development Status :: 4 - Beta',
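One bullet in the commit message lets the embedding shard width differ from the webdataset shard length. That means a sample's position has to be translated between the two shard layouts. The sketch below is minimal and makes two assumptions: samples are numbered contiguously across webdataset shards, and the embeddings are stored in the same global order, split into files of a fixed width. The helper `locate_embedding` and its parameter names are hypothetical and are not the code added in this commit.

```python
from typing import Tuple


def locate_embedding(shard_index: int, sample_offset: int,
                     webdataset_shard_length: int,
                     embedding_shard_width: int) -> Tuple[int, int]:
    """Map (webdataset shard, offset in shard) to (embedding file, row in file).

    Hypothetical helper. Assumes every webdataset shard holds exactly
    `webdataset_shard_length` samples and every embedding file holds
    `embedding_shard_width` rows, both in the same global sample order.
    """
    if webdataset_shard_length <= 0 or embedding_shard_width <= 0:
        raise ValueError("shard sizes must be positive")
    if not 0 <= sample_offset < webdataset_shard_length:
        raise ValueError("sample_offset must lie inside the shard")
    # Global position of the sample across all webdataset shards
    global_index = shard_index * webdataset_shard_length + sample_offset
    # divmod yields (embedding file number, row within that file)
    return divmod(global_index, embedding_shard_width)
```

Take webdataset shards of 10,000 samples and embedding files of 100,000 rows. Sample 345 of shard 12 has global index 120,345, so `locate_embedding(12, 345, 10_000, 100_000)` returns `(1, 20345)`. Keeping the two sizes as separate parameters means the image shards and the embedding files can be re-chunked independently.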