
bowenyang008 (Contributor) commented Apr 17, 2025

What does this PR do?

Kick-start support for FSDP2 checkpointing in Trainer and State:

  1. Expose the same interface through State.fsdp_config for both FSDP2Config and FSDPConfig, so either config can be passed to Trainer/State. For now, FSDP2Config implements only the minimal set of default interface methods/properties needed to make State functional.
  2. Support FSDP2-based checkpointing in State.
  3. Support loading checkpoints saved from FSDP1 artifacts into FSDP2.
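Item 1 relies on both config types presenting a shared surface to State. The sketch below is a minimal, hypothetical illustration of that idea using structural typing. The class names (`FSDP1StyleConfig`, `FSDP2StyleConfig`, `FSDPConfigLike`), the property names, and the defaults are illustrative assumptions, not the actual Composer `FSDPConfig`/`FSDP2Config` API.

```python
from dataclasses import dataclass
from typing import Protocol, runtime_checkable


@runtime_checkable
class FSDPConfigLike(Protocol):
    """Hypothetical minimal interface that State might read from its fsdp_config."""

    @property
    def state_dict_type(self) -> str: ...

    @property
    def load_monolith_rank0_only(self) -> bool: ...


@dataclass
class FSDP1StyleConfig:
    """Hypothetical FSDP1-style config: the interface values are plain fields."""

    state_dict_type: str = "full"
    load_monolith_rank0_only: bool = False
    sharding_strategy: str = "FULL_SHARD"  # FSDP1-only knob, ignored by State here


@dataclass
class FSDP2StyleConfig:
    """Hypothetical FSDP2-style config with its own knobs plus default properties."""

    reshard_after_forward: bool = True  # FSDP2-only knob (assumed for illustration)

    # Default properties so State can treat this object like an FSDP1 config.
    # The returned values are assumptions, not documented Composer defaults.
    @property
    def state_dict_type(self) -> str:
        return "sharded"

    @property
    def load_monolith_rank0_only(self) -> bool:
        return False


def describe_checkpointing(cfg: FSDPConfigLike) -> str:
    """Stand-in for State code that only depends on the shared interface."""
    if not isinstance(cfg, FSDPConfigLike):
        raise TypeError(f"{type(cfg).__name__} does not expose the FSDP config interface")
    return f"{type(cfg).__name__}: state_dict_type={cfg.state_dict_type}"


if __name__ == "__main__":
    for cfg in (FSDP1StyleConfig(), FSDP2StyleConfig()):
        print(describe_checkpointing(cfg))
```

Because the check is structural, the consuming code never branches on the concrete config class; an FSDP2-style config only has to add default properties until it grows real implementations of them.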

Tests

  1. Two unit tests covering FSDP2 checkpoint saving/loading and FSDP1 saving -> FSDP2 loading
  2. One unit test covering GradScaler

bowenyang008 changed the title from "Boweny/fsdp2/state checkpoint" to "[FSDP2] Init FSDP2 based checkpointing" on Apr 17, 2025
bowenyang008 marked this pull request as ready for review on April 17, 2025 at 06:32
dakinggg (Contributor) left a comment


LGTM!

bowenyang008 merged commit 03ac462 into main on Apr 17, 2025 (14 checks passed)
bowenyang008 deleted the boweny/fsdp2/state-checkpoint branch on April 17, 2025 at 17:39