* Load Checkpoints from Cloud Storage
Added support to load checkpoints stored in object storage (rather than just on the local disk). Closes #192.
- Refactored the run directory uploader to separate out object-store-related utilities into composer.utils.object_store (and added test coverage).
- Updated the checkpointer hparams to optionally take `composer.utils.object_store.ObjectStoreProviderHparams`, which is used to download the checkpoint from storage (see the sketch below).
- Updated the trainer init to propagate this change.
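A minimal sketch of how a checkpoint might be pulled from object storage with the new utilities. The module and class names come from the commits above, but the field names, the `initialize_object()` call, and the `download_object()` method are assumptions for illustration, not the confirmed API:

```python
# Hedged sketch: field and method names below are assumptions.
from composer.utils.object_store import ObjectStoreProviderHparams

# Describe the object store; credentials are read from environment variables
# rather than being hard-coded in the config.
store_hparams = ObjectStoreProviderHparams(
    provider="s3",                      # any libcloud storage provider name
    container="my-checkpoint-bucket",   # bucket/container holding checkpoints
    key_environ="AWS_ACCESS_KEY_ID",
    secret_environ="AWS_SECRET_ACCESS_KEY",
)

# The checkpointer hparams would accept these provider hparams and use them to
# fetch the checkpoint before restoring state; shown here directly for clarity.
store = store_hparams.initialize_object()                      # assumed yahp-style helper
store.download_object("checkpoints/ep10.pt", "/tmp/ep10.pt")   # assumed method name
```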
* Libcloud intersphinx
* rebasing off of dev
* starting an LR sweep
* adding proper dataset and batch size
* 2.0e-3 LR causes NaNs, lowering LR
* changing Adam
* adding SST-2
* adding validation tracking
* adding SST-2 -- training but not at the right accuracy
* cleaning up code & debugging why training loss is so large
* finalized YAML for SST-2, starting hparam sweeps
* updating hparams to sweep:
* finalized current setup for SST-2
* starting hparam sweeps on RTE
* adding support for warmup_ratio
* adding non-standard metrics
* adding support for duration as a time abstraction
* adding compatibility with DataloaderSpec changes
* adding a linear learning rate decay
* adding linear LR warmup
* finalizing GLUE
* refactoring implementation to add regression tasks
* fixing checkpoint bug
* finalizing fine-tuning a checkpointed model
* fixing checkpoint bug
* adding validation
* adding mid-training
* starting LR sweep
* adding checkpointing feedback part 1
* fix validation interval
* address PR feedback
* address PR feedback
* adding save_checkpoint and load_checkpoint hparams interface
* adding save_checkpoint and load_checkpoint hparams interface
* yapf & pyright
* fixed error with logging pre-training validation loss
* cleaning up model forward pass
* cleaning up custom metrics
* renaming Checkpointer -> CheckpointSaver
* addressing pyright
* adding tests
* moving commits to BERT branch
* changing folder to be relative to run dir
* formatting
* adding tests
* adding initial YAML changes
* removing a copy of outdated files
* adding GLUE default params
* addressing pyright
* finalizing task-specific YAMLs
* code cleanup
* yapf
* adding license
* addressing tests
* formatting
* adding tests for the duration abstraction
* can I sue pyright for emotional damages?
* final formatting
* adding in finalized pre-training hyperparameters
* Update composer/models/bert/bert_hparams.py
Co-authored-by: Abhi Venigalla <[email protected]>
* Load Checkpoints from Cloud Storage
Added support to load checkpoints stored in object storage (rather than just on the local disk). Closes #192.
- Refactored the run directory uploader to separate out object-store-related utilities into composer.utils.object_store (and added test coverage).
- Updated the checkpointer hparams to optionally take `composer.utils.object_store.ObjectStoreProviderHparams`, which is used to download the checkpoint from storage.
- Updated the trainer init to propagate this change.
* Libcloud intersphinx
* addressing PR feedback
* changing checkpoints into a cloud URL
* addressing Landan's feedback
* filepath -> checkpoint in the YAMLs
* Fixed merge
* Removed auto-parsing of S3 and GS URLs, as libcloud requires authentication. Fixed tests.
* Flattened run directory uploader hparams
* Fixed object store provider hparams
* updating sampler to be composer.dist
* Added tqdm progress bars and chunk sizing parameterization
Refactored checkpoint storage
* Fix pyright
* Fixed timeout
* Fix checkpointing
* Fixed deepspeed checkpoints
* Cleaned up PR
* finalized checkpoint loading
* refactored metric to avoid lists
* addressing pyright
* updating YAMLs with checkpoints
* final change
* adding unit tests
* adding LICENSE
* addressing conflicts & tests
* isort
* removing finished TODOs
* adding new GPT-2 YAMLs
Co-authored-by: Ravi Rahman <[email protected]>
Co-authored-by: Moin Nadeem <[email protected]>
Co-authored-by: Abhi Venigalla <[email protected]>