Skip to content

Conversation

rithwik-db
Copy link
Contributor

@rithwik-db rithwik-db commented Apr 16, 2025

What does this PR do?

PR to update optimizer params to point to dtensors instead of regular model tensors. This should work fine for checkpointing as well since we fully shard before updating the optimizer's state (which is referenced in update_optimizer_modules)

@rithwik-db rithwik-db requested a review from bowenyang008 April 17, 2025 19:48
@rithwik-db rithwik-db changed the title update optimizer code for fsdp2 Update optimizer code for fsdp2 Apr 21, 2025
Copy link
Contributor

@bowenyang008 bowenyang008 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @rithwik-db!

@rithwik-db rithwik-db requested a review from bowenyang008 April 22, 2025 01:23
Copy link
Contributor

@dakinggg dakinggg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@sashaDoubov
Copy link
Contributor

LGTM as well.

@dakinggg
Copy link
Contributor

Just FYI, I'm pretty sure the new code does not have this issue, but one quirk of the FSDP1 impl: #2320

@rithwik-db rithwik-db changed the title Update optimizer code for fsdp2 Update optimizer modules for fsdp2 Apr 23, 2025
@rithwik-db rithwik-db changed the title Update optimizer modules for fsdp2 Update optimizer params for fsdp2 Apr 23, 2025
@rithwik-db rithwik-db merged commit 29e7593 into main Apr 23, 2025
14 checks passed
@rithwik-db rithwik-db deleted the optimizer-inplace branch April 23, 2025 19:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants