littlebullGit
Contributor

@littlebullGit littlebullGit commented Sep 22, 2025

This PR enhances the ModelCheckpoint callback to properly handle manual optimization, ensuring checkpoints reflect the intended model state and giving users clear guidance.

Fixes #20947

Key Changes:

  • Manual Optimization Support: Ensures checkpoints capture the model state before optimization when using manual optimization with every_n_train_steps.
  • User Warning: Adds a clear warning when pre-optimization state isn't saved, helping users understand the checkpoint behavior.
  • Documentation: Updates docstrings and examples to clarify the behavior with manual optimization.
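
The timing distinction in the first bullet can be shown with a framework-free toy. This is only a sketch under stated assumptions: `train`, `snapshot_before_step`, and the single `weight` value are hypothetical names made up for this example, and none of it is Lightning's actual implementation or hook order. It shows how a checkpoint taken on the same step differs depending on whether it is captured before or after the parameter update.

```python
import copy


def train(num_steps, every_n_steps, snapshot_before_step):
    """Toy training loop (hypothetical); returns {step: saved state}."""
    state = {"weight": 0.0}
    checkpoints = {}
    for step in range(1, num_steps + 1):
        due = step % every_n_steps == 0
        if due and snapshot_before_step:
            # Snapshot taken before this step's parameter update.
            checkpoints[step] = copy.deepcopy(state)
        state["weight"] += 1.0  # stand-in for a manual optimizer step
        if due and not snapshot_before_step:
            # Snapshot taken after this step's parameter update.
            checkpoints[step] = copy.deepcopy(state)
    return checkpoints


before = train(num_steps=4, every_n_steps=2, snapshot_before_step=True)
after = train(num_steps=4, every_n_steps=2, snapshot_before_step=False)
print(before)
print(after)
```

In the toy, the two runs save on the same steps but store weights that differ by exactly one update. That is the kind of mismatch a user could see if they expect one state and the checkpoint holds the other.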

Testing:

  • Added test cases to verify checkpoint behavior with manual optimization
  • Ensured backward compatibility with automatic optimization
  • Verified warning messages are shown in appropriate scenarios

📚 Documentation preview 📚: https://pytorch-lightning--21239.org.readthedocs.build/en/21239/

@github-actions github-actions bot added the pl Generic label for PyTorch Lightning package label Sep 22, 2025
@littlebullGit littlebullGit force-pushed the fix/20947-checkpoint-manual-opt branch from 4f06495 to 16552e5 Compare September 22, 2025 03:53
@Borda Borda changed the title Fix ModelCheckpoint with manual optimization and every_n_train_steps Fix ModelCheckpoint with manual optimization and every_n_train_steps Sep 22, 2025
@littlebullGit littlebullGit reopened this Sep 22, 2025
@littlebullGit
Contributor Author

littlebullGit commented Sep 23, 2025

The link error is (generated/CONTRIBUTING: line 6) broken https://medium.com/pytorch-lightning/quick-contribution-guide-86d977171b3a - 429 Client Error: Too Many Requests for url: https://medium.com/pytorch-lightning/quick-contribution-guide-86d977171b3a. It is not related to my code. The other one just timed out.
@Borda @SkafteNicki let me know how to proceed.

@SkafteNicki
Collaborator

> The link error is (generated/CONTRIBUTING: line 6) broken https://medium.com/pytorch-lightning/quick-contribution-guide-86d977171b3a - 429 Client Error: Too Many Requests for url: https://medium.com/pytorch-lightning/quick-contribution-guide-86d977171b3a. It is not related to my code. The other one just timed out.
> @Borda @SkafteNicki let me know how to proceed.

Our CI is broken at the moment, so there is nothing you can do. Please stand by while it is being fixed.

@littlebullGit littlebullGit force-pushed the fix/20947-checkpoint-manual-opt branch from 9754a79 to 059625b Compare September 28, 2025 14:41
- Ensure checkpoints reflect the model state before optimization when using manual optimization
- Add warning when pre-optimization state isn't saved
- Update documentation to clarify the behavior with manual optimization

Fixes Lightning-AI#20947
@littlebullGit littlebullGit force-pushed the fix/20947-checkpoint-manual-opt branch from 059625b to 7672de3 Compare October 1, 2025 12:48