
v0.31.0

@rithwik-db rithwik-db released this 28 May 17:30

What's New

1. PyTorch 2.7.0 Compatibility (#3850)

We've added support for PyTorch 2.7.0 and created a Dockerfile to support PyTorch 2.7.0 + CUDA 12.8. The current Composer image supports PyTorch 2.7.0 + CUDA 12.6.3.

2. Experimental FSDP2 support has been added to Trainer (#3852)

The experimental FSDP2 integration in Trainer currently includes:

  • auto_wrap based on _fsdp_wrap_fn and/or _fsdp_wrap attributes within the model (#3826)
  • Activation checkpointing and CPU offloading (#3832)
  • Meta initialization (#3852)

Note: Not all features are supported yet (e.g. automicrobatching and monolithic checkpointing).

Usage:

Set the environment variable FSDP_VERSION=2 and configure FSDP2 through parallelism_config as desired. The full set of available attributes can be found here.
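As a rough illustration of the usage described above, the sketch below sets the environment variable from Python and builds a config dictionary. The nesting under an "fsdp" key and the option names activation_checkpointing and activation_cpu_offload are assumptions made for illustration and are not confirmed by these notes. Check the attribute list referenced above for the real names. The Trainer call is left commented out because it needs a model and a distributed launch.

```python
import os

# Opt in to the experimental FSDP2 code path.
# Assumption: the variable should be set before the Trainer is constructed;
# exporting it in the launch environment (FSDP_VERSION=2) achieves the same.
os.environ["FSDP_VERSION"] = "2"

# Hypothetical FSDP2 settings. The key names below are illustrative only
# and may not match the attributes Composer actually accepts.
parallelism_config = {
    "fsdp": {
        "activation_checkpointing": True,   # assumed option name
        "activation_cpu_offload": False,    # assumed option name
    },
}

# With a real model and dataloader this would be passed to the Trainer:
# from composer import Trainer
# trainer = Trainer(model=model, parallelism_config=parallelism_config, ...)
# trainer.fit()
```

Since automicrobatching is listed as unsupported, a fixed microbatch size is probably the safer choice when trying this path.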

Bug Fixes

  • Resolve a memory hang issue in the MLflow monitor process (#3830)


Full Changelog: v0.30.0...v0.31.0