@rithwik-db rithwik-db commented Apr 25, 2025

What does this PR do?

Adds activation checkpointing and activation CPU offloading, using the same method TorchTitan uses in its examples (this aligns with our previous approach using checkpoint_wrapper).

Also adds tests that check that checkpointing and CPU offloading work as expected.
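
For readers unfamiliar with the wrapper-based approach, here is a minimal sketch of what activation checkpointing and CPU offloading can look like with PyTorch's checkpoint_wrapper utilities. It is not the code from this PR: the `Block` module, the `wrap_blocks` helper, and its `mode` argument are hypothetical names chosen for illustration, and the sketch applies one wrapper type at a time to a plain (non-FSDP) model.

```python
import torch
import torch.nn as nn
from torch.distributed.algorithms._checkpoint.checkpoint_wrapper import (
    CheckpointImpl,
    CheckpointWrapper,
    OffloadWrapper,
    checkpoint_wrapper,
    offload_wrapper,
)


class Block(nn.Module):
    """Hypothetical stand-in for a transformer block."""

    def __init__(self, dim: int = 16):
        super().__init__()
        self.ff = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.ff(x)


def wrap_blocks(model: nn.Module, mode: str = "checkpoint") -> nn.Module:
    """Replace each direct child Block with a wrapped version (illustrative helper).

    mode="checkpoint": drop activations in forward and recompute them in backward.
    mode="offload":    keep activations, but store them on CPU between forward
                       and backward.
    """
    # Copy the child list first so we can re-register modules while looping.
    for name, child in list(model.named_children()):
        if not isinstance(child, Block):
            continue
        if mode == "checkpoint":
            wrapped = checkpoint_wrapper(
                child, checkpoint_impl=CheckpointImpl.NO_REENTRANT
            )
        elif mode == "offload":
            wrapped = offload_wrapper(child)
        else:
            raise ValueError(f"unknown mode: {mode!r}")
        # Overwrite the existing submodule entry under the same name.
        model.register_module(name, wrapped)
    return model
```

Either wrapper should leave the numerics unchanged, so one way to sanity-check the wrapping is to compare gradients from a wrapped copy of the model against the unwrapped original on the same input.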

@rithwik-db rithwik-db requested a review from bowenyang008 April 25, 2025 19:49
@rithwik-db rithwik-db force-pushed the activation_checkpointing branch from 92e948a to 470b3d7 Compare April 29, 2025 20:18
@rithwik-db rithwik-db force-pushed the activation_checkpointing branch from f047d18 to beee38f Compare April 29, 2025 23:09
@rithwik-db rithwik-db changed the title from "[WIP] Activation Checkpointing FSDP2" to "Activation Checkpointing and Offloading for FSDP2" Apr 29, 2025
@rithwik-db rithwik-db requested a review from bowenyang008 May 1, 2025 20:42
@dakinggg dakinggg left a comment

Left a comment about whether the validation is really necessary; otherwise LGTM.

@rithwik-db rithwik-db requested a review from bowenyang008 May 2, 2025 00:35
@rithwik-db rithwik-db merged commit 570fd2e into main May 2, 2025
14 checks passed
@rithwik-db rithwik-db deleted the activation_checkpointing branch May 2, 2025 02:11
3 participants