Config rope_scaling and text_config.rope_scaling might be the same or different dict objects #41020

@albertvillanova

Description

@albertvillanova

System Info

  • transformers version: 4.57.0.dev0
  • Platform: Linux-5.15.0-153-generic-x86_64-with-glibc2.31
  • Python version: 3.12.9
  • Huggingface_hub version: 0.34.4
  • Safetensors version: 0.6.2
  • Accelerate version: 1.4.0
  • Accelerate config: not found
  • DeepSpeed version: not installed
  • PyTorch version (accelerator?): 2.8.0+cu128 (NA)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using distributed or parallel set-up in script?: no

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

While investigating an issue in trl, I found weird behavior in the transformers config rope_scaling: config.rope_scaling (at the root config level) and config.text_config.rope_scaling (under text_config) may be the same dict object or different dict objects, depending on whether we pass the text_config parameter to AutoConfig.from_pretrained:

  • if we don't pass the text_config parameter, the two rope_scaling attributes point to the same dict object
  • if we pass the text_config parameter, the two rope_scaling attributes are different dict objects
In [1]: from transformers import AutoConfig

In [2]: model_id = "Qwen/Qwen2.5-VL-3B-Instruct"

In [3]: config1 = AutoConfig.from_pretrained(model_id)
In [4]: config1.text_config.rope_scaling
Out[4]: {'type': 'default', 'mrope_section': [16, 24, 24], 'rope_type': 'default'}
In [5]: config1.rope_scaling
Out[5]: {'type': 'default', 'mrope_section': [16, 24, 24], 'rope_type': 'default'}
In [6]: id(config1.text_config.rope_scaling)
Out[6]: 140211029392000
In [7]: id(config1.rope_scaling)
Out[7]: 140211029392000
# Both are the same dict object

In [8]: config2 = AutoConfig.from_pretrained(model_id, text_config={})
In [9]: config2.text_config.rope_scaling
Out[9]: {'type': 'default', 'mrope_section': [16, 24, 24], 'rope_type': 'default'}
In [10]: config2.rope_scaling
Out[10]: {'type': 'default', 'mrope_section': [16, 24, 24], 'rope_type': 'default'}
In [11]: id(config2.text_config.rope_scaling)
Out[11]: 140210801100608
In [12]: id(config2.rope_scaling)
Out[12]: 140211029786688
# Both are different dict objects
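The divergence can be reproduced without transformers. A minimal sketch (not the actual transformers implementation) of the two aliasing behaviors observed above:

```python
import copy

# Case 1 (no text_config override): the root config reuses the nested dict
# by reference, so both attributes alias the same object and a mutation
# through one is visible through the other.
text_config = {"rope_scaling": {"type": "default", "mrope_section": [16, 24, 24]}}
root_rope_scaling = text_config["rope_scaling"]  # shared reference
assert root_rope_scaling is text_config["rope_scaling"]

root_rope_scaling["rope_type"] = "linear"
assert text_config["rope_scaling"]["rope_type"] == "linear"  # change propagates

# Case 2 (text_config override passed): the nested config is rebuilt, so the
# two dicts are independent copies and mutations no longer propagate.
rebuilt_rope_scaling = copy.deepcopy(text_config["rope_scaling"])
assert rebuilt_rope_scaling is not text_config["rope_scaling"]

rebuilt_rope_scaling["rope_type"] = "dynamic"
assert text_config["rope_scaling"]["rope_type"] == "linear"  # unchanged
```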

Is this expected?

We discovered this while investigating why changing config.text_config.rope_scaling (after initialization) sometimes also changes config.rope_scaling and sometimes does not. See the related comment in the trl PR.

Expected behavior

  • Either they should be the same dict object in any case
  • Or they should be different dict objects in any case
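Until the behavior is made consistent, callers that mutate rope_scaling can defend against both cases by writing the same dict to both locations. A sketch using a hypothetical helper (set_rope_scaling is not a transformers API):

```python
from types import SimpleNamespace


def set_rope_scaling(config, new_scaling):
    """Assign the same dict object to both config.rope_scaling and
    config.text_config.rope_scaling, so they stay aliased regardless of
    how the config was constructed."""
    config.rope_scaling = new_scaling
    text_config = getattr(config, "text_config", None)
    if text_config is not None:
        text_config.rope_scaling = new_scaling


# Demo with a stand-in config object (a real PretrainedConfig would work the same way).
cfg = SimpleNamespace(rope_scaling={}, text_config=SimpleNamespace(rope_scaling={}))
set_rope_scaling(cfg, {"rope_type": "default", "mrope_section": [16, 24, 24]})
assert cfg.rope_scaling is cfg.text_config.rope_scaling
```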
