Distributed Training of the Decoder (#121)

* Converted the decoder trainer to use HuggingFace Accelerate
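
  A minimal sketch of what that conversion looks like (toy model, optimizer, and data are stand-ins, not the repo's actual code):

  ```python
  import torch
  from torch import nn
  from torch.utils.data import DataLoader, TensorDataset
  from accelerate import Accelerator

  # Toy stand-ins for the decoder, its optimizer, and the dataloader
  model = nn.Linear(16, 1)
  opt = torch.optim.Adam(model.parameters(), lr=1e-4)
  data = DataLoader(TensorDataset(torch.randn(64, 16), torch.randn(64, 1)), batch_size=8)

  accelerator = Accelerator()
  # prepare() handles device placement and DDP wrapping across processes
  model, opt, data = accelerator.prepare(model, opt, data)

  for x, y in data:
      loss = nn.functional.mse_loss(model(x), y)
      accelerator.backward(loss)  # replaces loss.backward(); integrates grad scaling
      opt.step()
      opt.zero_grad()
  ```

  Launched with `accelerate launch`, the same script runs unchanged on one GPU or many.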

* Fixed issue where metric evaluation would hang in distributed mode
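
  The usual cause of such hangs is ranks disagreeing on collective ops during evaluation. A hedged sketch of the distributed-safe pattern, continuing the names from the sketch above:

  ```python
  @torch.no_grad()
  def evaluate(model, data):
      model.eval()
      losses = []
      for x, y in data:
          loss = nn.functional.mse_loss(model(x), y)
          # every rank must reach this collective, otherwise the others block forever
          losses.append(accelerator.gather(loss.reshape(1)))
      accelerator.wait_for_everyone()
      return torch.cat(losses).mean().item()
  ```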

* Implemented functional saving
Loading still fails due to an issue with restoring the optimizer state
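
  A sketch of the saving side under Accelerate (the path and dict keys are illustrative): unwrap the DDP wrapper and write from the main process only.

  ```python
  state = {
      "model": accelerator.unwrap_model(model).state_dict(),
      "optimizer": opt.state_dict(),
  }
  accelerator.wait_for_everyone()  # make sure all ranks are in sync first
  if accelerator.is_main_process:
      accelerator.save(state, "decoder_checkpoint.pt")
  ```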

* Fixed issue with loading decoders
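
  The loading counterpart, again illustrative rather than the repo's exact code:

  ```python
  state = torch.load("decoder_checkpoint.pt")
  accelerator.unwrap_model(model).load_state_dict(state["model"])
  opt.load_state_dict(state["optimizer"])  # restoring this was the earlier failure
  ```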

* Fixed issue with tracker config

* Fixed issue with AMP
Updated logging to be more logical
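
  Under Accelerate, AMP is requested at construction time rather than via a hand-rolled GradScaler. A sketch, assuming a recent Accelerate version where mixed precision is a constructor argument:

  ```python
  # AMP handled by Accelerate: autocast wraps the prepared model's forward,
  # and loss scaling happens inside accelerator.backward()
  accelerator = Accelerator(mixed_precision="fp16")
  model, opt, data = accelerator.prepare(model, opt, data)
  ```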

* Saving a checkpoint now saves the position in training as well
Fixed an issue where loading weights into the GPU twice caused it to run out of memory
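
  Illustratively, the checkpoint grows resume-point fields, and the load path stages weights on the CPU so they are never resident on the GPU twice (the live model plus the loaded copy). Key names and values are assumptions:

  ```python
  # Saving: record where training was, alongside the weights
  state = {
      "model": accelerator.unwrap_model(model).state_dict(),
      "optimizer": opt.state_dict(),
      "epoch": 3,    # placeholder resume point
      "step": 1200,
  }
  if accelerator.is_main_process:
      accelerator.save(state, "decoder_checkpoint.pt")

  # Loading: map to CPU first instead of deserializing straight onto the GPU
  state = torch.load("decoder_checkpoint.pt", map_location="cpu")
  accelerator.unwrap_model(model).load_state_dict(state["model"])
  start_epoch, start_step = state["epoch"], state["step"]
  ```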

* Fixed EMA for distributed training
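
  A hedged sketch of EMA that behaves under DDP: keep a shadow copy on the main process only and update it from the unwrapped model, so it tracks the real parameters rather than the wrapper. Names and the decay value are illustrative:

  ```python
  import copy

  ema_decay = 0.999
  ema_model = copy.deepcopy(accelerator.unwrap_model(model)) if accelerator.is_main_process else None

  @torch.no_grad()
  def ema_update():
      if ema_model is None:
          return  # only the main process holds the EMA copy
      src = accelerator.unwrap_model(model)
      for p_ema, p in zip(ema_model.parameters(), src.parameters()):
          p_ema.lerp_(p, 1.0 - ema_decay)  # p_ema <- decay*p_ema + (1-decay)*p
  ```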

* Fixed issue where get_pkg_version was reintroduced
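
  One plausible implementation of a `get_pkg_version` helper, consistent with the `import importlib` added in the diff below; the repo's actual version may differ:

  ```python
  import importlib.metadata

  def get_pkg_version(pkg: str = "dalle2-pytorch") -> str:
      # installed distribution version, without importing the package itself
      return importlib.metadata.version(pkg)
  ```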

* Changed decoder trainer to upload config as a file

Fixed issue where loading the best checkpoint would error
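
  Purely illustrative: with a Weights & Biases tracker, uploading the config as a file rather than as flat key/value pairs could look like this (the project name and config contents are made up):

  ```python
  import json
  import wandb

  run = wandb.init(project="decoder-training")
  with open("decoder_config.json", "w") as f:
      json.dump({"lr": 1e-4, "batch_size": 8}, f)  # placeholder config
  wandb.save("decoder_config.json")                # attach the file to the run
  ```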

Commit 58892135d9 (parent e37072a48c) by Aidan Dempster, committed via GitHub on 2022-06-19 12:25:54 -04:00.
7 changed files with 331 additions and 207 deletions.

@@ -1,4 +1,5 @@
 import time
+import importlib
 # time helpers