Conversation

Contributor

@Flakes342 Flakes342 commented Sep 25, 2025

What does this PR do?

Fixes #41164

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

@ArthurZucker @Rocketknight1

Description

This PR addresses a bug in StaticCache that can lead to torch.compile / TorchDynamo crashes when users pass:

  • Unexpected keyword arguments (**kwargs)
  • Misordered positional arguments
  • Non-boolean values for offloading or offload_only_non_sliding

Root causes / API issues:

  1. Positional flexibility vs. keyword expectations

    • StaticCache accepts config, max_cache_len as positional arguments, but offloading and offload_only_non_sliding were also sometimes passed positionally.
    • This can silently assign non-boolean types (e.g., a device object) to offloading, leading to CUDA/TorchDynamo crashes in update().
  2. Inconsistent argument handling across cache classes

    • Base Cache is strict: offloading and offload_only_non_sliding must be booleans.
    • StaticCache had a flexible signature with **kwargs, which could allow invalid arguments without immediate error.
    • Deprecated children sometimes passed extra arguments through **kwargs, increasing risk of silent misassignment.
  3. Potential user misuse

    • Users passing device/dtype arguments positionally (common in examples or from auto-generated code) could crash torch.compile unexpectedly.
    • Without clear error messages, this was non-obvious and difficult to debug.

Reproduction

[screenshot: error traceback] Using Qwen3-omni or similar models:
  • Here, device and compute_dtype were passed as positional arguments.
  • offloading is expected to be a bool, but instead receives the device string "cuda:0".
  • This triggers the crash: InternalTorchDynamoError: AttributeError: 'int' object has no attribute 'device'
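The misbinding itself is ordinary Python argument semantics. A minimal sketch with a hypothetical stand-in class (not the real StaticCache, whose signature is simplified here) shows how a device string lands in offloading:

```python
class StaticCacheSketch:
    """Hypothetical stand-in mirroring StaticCache's flexible signature."""

    def __init__(self, config=None, max_cache_len=None,
                 offloading=False, offload_only_non_sliding=False, **kwargs):
        # No type checks: whatever binds to `offloading` is stored as-is.
        self.offloading = offloading
        self.offload_only_non_sliding = offload_only_non_sliding

# The caller intends (config, max_cache_len, device, compute_dtype)
# positionally, but the third positional slot is `offloading`:
cache = StaticCacheSketch(None, 1024, "cuda:0", "bfloat16")

print(cache.offloading)  # "cuda:0" — a str, not a bool
```

No error is raised at construction time; the bad value only surfaces later inside update() under torch.compile, which is why the failure is hard to trace back to the call site.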

Solution

  1. Validate types in StaticCache.__init__:
    Ensure offloading and offload_only_non_sliding are bool.
    Raise a clear TypeError if invalid.

  2. Catch unknown kwargs:
    Raise TypeError if users pass unrecognized keyword arguments (**kwargs) to prevent silent misassignment.

  3. Forward **kwargs safely in children:
    Deprecated children like OffloadedHybridCache are untouched for now, but any kwargs they pass are safely handled in StaticCache.

Additional Notes

  • We could optionally enforce keyword-only arguments via * to fully prevent positional misassignment, but this may break backward compatibility.
  • Deprecated cache classes (OffloadedHybridCache, etc.) are left untouched since they will be removed in v4.59.
  • A more advanced fix could include runtime coercion of devices to offloading=False to fully prevent user mistakes, but raising errors is safer and explicit.
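For reference, the keyword-only variant mentioned in the first note would look like the sketch below. This is not what the PR does (it would break existing positional callers); it only illustrates the trade-off:

```python
class KeywordOnlyCacheSketch:
    """Hypothetical variant: flags after `*` cannot be passed positionally."""

    def __init__(self, config=None, max_cache_len=None, *,
                 offloading=False, offload_only_non_sliding=False):
        self.offloading = offloading
        self.offload_only_non_sliding = offload_only_non_sliding

# Positional misuse is rejected by the interpreter itself:
try:
    KeywordOnlyCacheSketch(None, 1024, "cuda:0")
except TypeError as e:
    print(e)  # positional overflow rejected by Python before __init__ runs
```

The upside is that no runtime type check is needed for positional misuse; the downside is the backward-compatibility break noted above.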

Development

Successfully merging this pull request may close these issues.

cache offloading check is incorrect