Add the ability to auto restart the last run when started after a crash (#191)

* Added autoresume after crash functionality to the trackers

* Updated documentation

* Clarified what goes in the autorestart object

* Fixed style issues

Unraveled conditional block

Changed to using helper function to get step count
Aidan Dempster
2022-07-08 16:35:40 -04:00
committed by GitHub
parent d7bc5fbedd
commit a71f693a26
6 changed files with 104 additions and 18 deletions
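The commit message describes auto-restarting the last run after a crash, driven by an "autorestart object" stored by the trackers. The commit's actual tracker code is not shown here, so the following is only a minimal sketch of the general pattern, under stated assumptions. `ResumeState`, `save_resume_state`, `load_resume_state`, and the JSON file layout are all hypothetical. The idea is to persist a small record of the run, and on startup to treat an existing record that was never marked finished as a crashed run to resume.

```python
import json
import os
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class ResumeState:
    """Hypothetical record of what is needed to restart the last run."""
    run_id: str
    epoch: int
    samples_seen: int
    finished: bool = False


def save_resume_state(path: str, state: ResumeState) -> None:
    # Write to a temporary file, then atomically replace the old one, so a
    # crash in the middle of the write never leaves a half-written file.
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump(asdict(state), f)
    os.replace(tmp_path, path)


def load_resume_state(path: str) -> Optional[ResumeState]:
    # Return the previous run's state only if it exists and was not marked
    # finished, i.e. the previous process most likely crashed.
    if not os.path.exists(path):
        return None
    with open(path) as f:
        state = ResumeState(**json.load(f))
    return None if state.finished else state
```

Marking the state `finished=True` at the end of a clean run is what separates "crashed, so resume" from "completed, so start fresh" in this sketch.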


@@ -289,9 +289,9 @@ def train(
sample = 0
samples_seen = 0
val_sample = 0
step = lambda: int(trainer.step.item())
step = lambda: int(trainer.num_steps_taken(unet_number=1))
if tracker.loader is not None:
if tracker.can_recall:
start_epoch, validation_losses, next_task, recalled_sample, samples_seen = recall_trainer(tracker, trainer)
if next_task == 'train':
sample = recalled_sample
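The hunk above restores the loop position from a recalled run when the tracker can recall one. It also reads the step count through `trainer.num_steps_taken(unet_number=1)` instead of `trainer.step.item()`. Below is a self-contained sketch of that control flow. The names `can_recall`, `recall_trainer`, and `num_steps_taken` come from the diff, but `StubTrainer`, `StubTracker`, the body of `recall_trainer`, the saved-state dictionary, and the `'val'` branch are hypothetical stand-ins, not the project's implementation.

```python
from typing import Dict, List, Optional, Tuple


class StubTrainer:
    """Hypothetical stand-in that only mimics num_steps_taken from the diff."""

    def __init__(self) -> None:
        self._steps: Dict[int, int] = {1: 0}

    def num_steps_taken(self, unet_number: int = 1) -> int:
        return self._steps[unet_number]

    def train_step(self, unet_number: int = 1) -> None:
        self._steps[unet_number] += 1


class StubTracker:
    """Hypothetical tracker: can recall only if a saved state was found."""

    def __init__(self, saved: Optional[dict] = None) -> None:
        self.saved = saved

    @property
    def can_recall(self) -> bool:
        return self.saved is not None


def recall_trainer(tracker, trainer) -> Tuple[int, List[float], str, int, int]:
    # Hypothetical: restore the step counter, then return the same 5-tuple
    # that the diff unpacks.
    s = tracker.saved
    trainer._steps[1] = s["step"]
    return s["epoch"], s["validation_losses"], s["next_task"], s["sample"], s["samples_seen"]


def resume_position(tracker, trainer):
    start_epoch, next_task = 0, "train"
    sample = 0
    samples_seen = 0
    val_sample = 0
    # The lambda re-reads the counter on each call, so it always reports
    # the trainer's current step, including after a recall.
    step = lambda: int(trainer.num_steps_taken(unet_number=1))
    if tracker.can_recall:
        start_epoch, validation_losses, next_task, recalled_sample, samples_seen = recall_trainer(tracker, trainer)
        if next_task == "train":
            sample = recalled_sample
        elif next_task == "val":  # assumption: not shown in the hunk
            val_sample = recalled_sample
    return start_epoch, next_task, sample, val_sample, samples_seen, step
```

For example, a fresh tracker leaves every counter at zero, while a tracker holding saved state resumes mid-epoch and `step()` keeps tracking new steps afterwards.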