Release v0.31.0 · mosaicml/composer

What's New

1. PyTorch 2.7.0 Compatibility (#3850)

We've added support for PyTorch 2.7.0 and created a Dockerfile to support PyTorch 2.7.0 + CUDA 12.8. The current Composer image supports PyTorch 2.7.0 + CUDA 12.6.3.
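A small guard like the one below can make an environment with an older PyTorch fail fast. It is a minimal sketch and not part of Composer's API. The helper names are hypothetical, and it assumes version strings of the form `MAJOR.MINOR.PATCH`, optionally followed by a local build suffix such as `+cu126`.

```python
import re


# Hypothetical helpers for illustration; not part of Composer.
def torch_version_tuple(version: str) -> tuple:
    """Parse a version string such as "2.7.0+cu126" into (2, 7, 0)."""
    # Keep only the leading MAJOR.MINOR.PATCH; ignore "+cu126", "a0", ".dev...".
    match = re.match(r"^(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def supports_torch_2_7(version: str) -> bool:
    # Compare numerically, so "2.10.0" correctly sorts after "2.7.0".
    return torch_version_tuple(version) >= (2, 7, 0)
```

In practice you would call `supports_torch_2_7(torch.__version__)`. The regex ignores everything after the patch number, so a pre-release such as `2.7.0a0` counts as 2.7.0. If that distinction matters, `packaging.version.Version` is the more robust choice.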

2. Experimental FSDP2 support has been added to `Trainer` (#3852)

Experimental FSDP2 support was added to Trainer with:

`auto_wrap` based on `fsdp_wrap_fn` and/or `_fsdp_wrap` attributes within the model (#3826)
Activation checkpointing and CPU offloading (#3832)
Meta initialization (#3852)
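The auto-wrap item above says that FSDP2 wrapping is driven by per-module `_fsdp_wrap` attributes and/or a wrap function (`fsdp_wrap_fn`, per #3826). The selection logic can be pictured with plain-Python stand-ins. This is a hypothetical sketch, not Composer's implementation. `FakeModule` replaces `torch.nn.Module`, and the rule that the wrap function takes precedence over the attribute is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class FakeModule:
    """Hypothetical stand-in for torch.nn.Module, used only for illustration."""
    name: str
    children: List["FakeModule"] = field(default_factory=list)
    _fsdp_wrap: Optional[bool] = None  # per-module opt-in flag


WrapFn = Callable[[FakeModule], bool]


def should_wrap(module: FakeModule, fsdp_wrap_fn: Optional[WrapFn] = None) -> bool:
    # Assumption: a model-level wrap function, when given, takes precedence.
    if fsdp_wrap_fn is not None:
        return bool(fsdp_wrap_fn(module))
    # Otherwise fall back to the module's own _fsdp_wrap attribute.
    return bool(module._fsdp_wrap)


def modules_to_wrap(root: FakeModule, fsdp_wrap_fn: Optional[WrapFn] = None) -> List[str]:
    """Return names of modules to wrap, children before parents (post-order)."""
    selected: List[str] = []

    def visit(module: FakeModule) -> None:
        for child in module.children:
            visit(child)
        if should_wrap(module, fsdp_wrap_fn):
            selected.append(module.name)

    visit(root)
    return selected


blocks = [FakeModule(f"block{i}", _fsdp_wrap=True) for i in range(2)]
model = FakeModule("model", children=[FakeModule("embed"), *blocks])

print(modules_to_wrap(model))  # ['block0', 'block1']
print(modules_to_wrap(model, lambda m: m.name != "model"))  # ['embed', 'block0', 'block1']
```

The post-order walk follows the common FSDP2 practice of calling `fully_shard` on submodules before the root module. In Composer itself, the policy operates on real `torch.nn.Module` instances.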

Note: Not all features are supported yet (e.g., automicrobatching and monolithic checkpointing).

Usage:

Set the environment variable `FSDP_VERSION=2` and configure FSDP2 through `parallelism_config` as desired. The full set of available attributes can be found here.
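The two usage steps (set `FSDP_VERSION=2`, then supply a `parallelism_config`) can be sketched as follows. Treat this as a hedged illustration. The helper `enable_fsdp2` is hypothetical. Nesting the options under an `"fsdp"` key follows how Composer's `parallelism_config` is structured for FSDP1, which is an assumption here. The option names are placeholders, so check the real FSDP2 attribute list before using them.

```python
import os
from typing import Any, Dict


def enable_fsdp2(fsdp_options: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: opt into FSDP2 and build a parallelism_config dict."""
    # Per the release notes, FSDP2 is selected through this environment variable.
    os.environ["FSDP_VERSION"] = "2"
    # Assumption: FSDP settings live under the "fsdp" key, as with FSDP1.
    return {"fsdp": dict(fsdp_options)}


parallelism_config = enable_fsdp2({
    "activation_checkpointing": True,  # placeholder key name
    "activation_cpu_offload": False,   # placeholder key name
})

# With Composer installed, the dict would then be passed to the Trainer, e.g.:
# from composer import Trainer
# trainer = Trainer(model=model, train_dataloader=train_dl,
#                   parallelism_config=parallelism_config, max_duration="1ep")
```

Setting the variable from Python only works if it happens before Composer reads it. Exporting it in the shell that launches training (for example `FSDP_VERSION=2 composer -n 8 train.py`) avoids that ordering question.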

Bug Fixes

Resolve a memory-related hang in the MLflow monitor process by starting it with `spawn` instead of `fork` (#3830)
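#3830 changed how the MLflow monitor process is started, from fork to spawn. The general technique looks like the sketch below, built on the standard library's `multiprocessing` start-method contexts. This is not Composer's actual monitor code, and `monitor_loop` and `make_monitor_process` are hypothetical names.

```python
import multiprocessing as mp


def monitor_loop(stop_event) -> None:
    """Hypothetical monitor body; a real one would poll and report metrics."""
    # Block until the parent asks the monitor to stop.
    stop_event.wait()


def make_monitor_process():
    # "spawn" starts a fresh interpreter instead of fork-copying the parent,
    # so the child does not inherit the parent's (possibly large) address space.
    ctx = mp.get_context("spawn")
    stop_event = ctx.Event()
    # The target must be a module-level (picklable) function under spawn.
    proc = ctx.Process(target=monitor_loop, args=(stop_event,), daemon=True)
    return ctx, proc, stop_event
```

Call `proc.start()` from code guarded by `if __name__ == "__main__":`, because spawned children re-import the main module. Shut the monitor down with `stop_event.set()` followed by `proc.join()`. The trade-off is a slower start, since the child must re-import its modules rather than share the parent's memory pages.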

What's Changed

Bump Composer 0.31.0.dev0 by @KuuCi in #3808
Update Checkpoint Back-Compatibility Test by @KuuCi in #3810
Extend docker build matrix to add an entry for pytorch2.6+cu126 by @sirejdua-db in #3805
Bump databricks-sdk from 0.47.0 to 0.49.0 by @dependabot in #3814
Bump pypandoc from 1.14 to 1.15 by @dependabot in #3813
Update google-cloud-storage requirement from <3.0,>=2.0.0 to >=2.0.0,<4.0 by @dependabot in #3812
Update setuptools version by @irenedea in #3816
Kickstart FSDP2 by @bowenyang008 in #3806
Remove network calls to HF in CI by @dakinggg in #3817
Update psutil requirement from <7,>=5.8.0 to >=5.8.0,<8 by @dependabot in #3818
[FSDP2] Init FSDP2 based checkpointing by @bowenyang008 in #3824
Update torchmetrics requirement from <1.6.1,>=1.0 to >=1.0,<1.7.2 by @dependabot in #3829
Bump coverage[toml] from 7.6.8 to 7.8.0 by @dependabot in #3827
Bump yamllint from 1.35.1 to 1.37.0 by @dependabot in #3820
Update numpy requirement from <2.2.0,>=1.21.5 to >=1.21.5,<2.3.0 by @dependabot in #3828
Update optimizer params for fsdp2 by @rithwik-db in #3822
Change Mlflow monitor process from fork to spawn to reduce memory usage by @dakinggg in #3830
Ignore mlflow warning in test by @dakinggg in #3831
Bump HF hub version by @dakinggg in #3839
Bump databricks-sdk from 0.49.0 to 0.50.0 by @dependabot in #3834
Update transformers requirement from !=4.34.0,<4.51,>=4.11 to >=4.11,!=4.34.0,<4.52 by @dependabot in #3838
Eliminate dead code before torch version 2.4 by @bowenyang008 in #3833
Support submodule wrapping for FSDP2 according to model definition (with _fsdp_wrap and fsdp_wrap_fn) by @rithwik-db in #3826
Activation Checkpointing and Offloading for FSDP2 by @rithwik-db in #3832
Pin EFA installer version by @dakinggg in #3842
Add two legacy torch images to the container build matrix by @asfandyarq in #3841
Bump yamllint from 1.37.0 to 1.37.1 by @dependabot in #3845
Update packaging requirement from <24.3,>=21.3.0 to >=21.3.0,<25.1 by @dependabot in #3846
Bump cryptography from 44.0.0 to 44.0.3 by @dependabot in #3848
Upgrade yapf version by @dakinggg in #3840
Bump ipython from 8.11.0 to 8.36.0 by @dependabot in #3847
Update huggingface-hub requirement from <0.31,>=0.21.2 to >=0.21.2,<0.32 by @dependabot in #3851
Update EFA installer version by @dakinggg in #3844
Fix typos by @omahs in #3853
Integrate FSDP2 wrapper into Trainer by @bowenyang008 in #3852
Deprecate code eval utils by @dakinggg in #3854
FSDP2 time and verbose logging by @bowenyang008 in #3856
Fix RDMA installation by @dakinggg in #3857
Update ci-testing version to latest by @dakinggg in #3859
Updating composer to support Torch 2.7 by @rithwik-db in #3850
Cleanup version gating pre-2.6.0 by @rithwik-db in #3863

New Contributors

@sirejdua-db made their first contribution in #3805
@asfandyarq made their first contribution in #3841
@omahs made their first contribution in #3853

Full Changelog: v0.30.0...v0.31.0
