[tune] Cross-Node Recovery #3725
Test FAILed.
Test PASSed.
Test FAILed.
    def update_location(self, worker_ip):
        """Sends the current log directory to the remote node."""
        if worker_ip != self._log_syncer.worker_ip:
Will this crash on non-autoscaler clusters?
Also, log a warning if sync is happening?
Done; and no, it won't crash (`sync_to_worker_if_possible` catches it).
            self._log_syncer.sync_now(force=True)
            self._log_syncer.wait()
    def update_location(self, worker_ip):
`update_location()` sounds more harmless than it is; maybe `sync_results_to_new_location()`?
Test PASSed.
Test PASSed.
Test PASSed.
This is ready for another review.
Test PASSed.
Test PASSed.
Augments trial restore to also check whether the runner is still at the same
location. If not, the checkpoint files are pushed to the new location.
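
The check described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `FakeLogSyncer`, `SketchTrial`, and `set_worker_ip` are hypothetical names, and the syncer only records calls instead of copying files. The `sync_now(force=True)` / `wait()` sequence and the `worker_ip` comparison mirror the diff under review; the method uses the name suggested in review and logs a warning when a sync is triggered.

```python
import logging

logger = logging.getLogger(__name__)


class FakeLogSyncer:
    """Hypothetical stand-in for the log syncer; records calls, copies nothing."""

    def __init__(self, worker_ip):
        self.worker_ip = worker_ip
        self.synced_to = []  # (worker_ip, force) for every sync request

    def set_worker_ip(self, worker_ip):
        # Assumed helper: point the syncer at a different node.
        self.worker_ip = worker_ip

    def sync_now(self, force=False):
        # A real syncer would push the local log directory to self.worker_ip.
        self.synced_to.append((self.worker_ip, force))

    def wait(self):
        # A real syncer would block here until the transfer has finished.
        pass


class SketchTrial:
    """Hypothetical trial object holding a log syncer."""

    def __init__(self, log_syncer):
        self._log_syncer = log_syncer

    def sync_results_to_new_location(self, worker_ip):
        """Pushes results to worker_ip if the runner moved; returns True if synced."""
        if worker_ip == self._log_syncer.worker_ip:
            return False  # Runner is where the files already are; nothing to do.
        logger.warning("Runner moved to %s; syncing results to new location.",
                       worker_ip)
        self._log_syncer.set_worker_ip(worker_ip)
        self._log_syncer.sync_now(force=True)
        self._log_syncer.wait()
        return True
```

A fake syncer of this kind could also serve as a starting point for the missing tests, since it lets a test assert that a sync is requested only when the worker IP changes.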
TODO:
We don't have any tests for this right now...