System Info
- transformers version: 4.57.0.dev0
- Platform: Linux-5.15.0-153-generic-x86_64-with-glibc2.31
- Python version: 3.12.9
- Huggingface_hub version: 0.34.4
- Safetensors version: 0.6.2
- Accelerate version: 1.4.0
- Accelerate config: not found
- DeepSpeed version: not installed
- PyTorch version (accelerator?): 2.8.0+cu128 (NA)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using distributed or parallel set-up in script?: no
Who can help?
No response
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
After investigating an issue in trl, I found a weird behavior of the transformers config rope_scaling: config.rope_scaling (at the root config level) and config.text_config.rope_scaling (under text_config) may be the same dict object or different dict objects, depending on whether we pass the text_config param to AutoConfig.from_pretrained:
- if we don't pass the text_config param, the two rope_scaling attributes point to the same dict object
- if we pass the text_config param, the two rope_scaling attributes are different dict objects
In [1]: from transformers import AutoConfig
In [2]: model_id = "Qwen/Qwen2.5-VL-3B-Instruct"
In [3]: config1 = AutoConfig.from_pretrained(model_id)
In [4]: config1.text_config.rope_scaling
Out[4]: {'type': 'default', 'mrope_section': [16, 24, 24], 'rope_type': 'default'}
In [5]: config1.rope_scaling
Out[5]: {'type': 'default', 'mrope_section': [16, 24, 24], 'rope_type': 'default'}
In [6]: id(config1.text_config.rope_scaling)
Out[6]: 140211029392000
In [7]: id(config1.rope_scaling)
Out[7]: 140211029392000
# Both are the same dict object
In [8]: config2 = AutoConfig.from_pretrained(model_id, text_config={})
In [9]: config2.text_config.rope_scaling
Out[9]: {'type': 'default', 'mrope_section': [16, 24, 24], 'rope_type': 'default'}
In [10]: config2.rope_scaling
Out[10]: {'type': 'default', 'mrope_section': [16, 24, 24], 'rope_type': 'default'}
In [11]: id(config2.text_config.rope_scaling)
Out[11]: 140210801100608
In [12]: id(config2.rope_scaling)
Out[12]: 140211029786688
# Both are different dict objects
Is this expected?
We discovered this while investigating why changing config.text_config.rope_scaling (after initialization) will or will not also change config.rope_scaling. See related comment in trl PR:
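To illustrate why the aliasing matters, here is a minimal pure-Python sketch (plain objects standing in for the transformers configs, no library calls): when the two attributes reference the same dict, an in-place update through one is visible through the other.

```python
from types import SimpleNamespace

# Case 1: both attributes alias the same dict (like config1 above).
shared = {"rope_type": "default", "mrope_section": [16, 24, 24]}
config1 = SimpleNamespace(
    rope_scaling=shared,
    text_config=SimpleNamespace(rope_scaling=shared),
)
config1.text_config.rope_scaling["factor"] = 2.0
print(config1.rope_scaling["factor"])  # 2.0: the root dict changed too

# Case 2: the attributes hold independent dicts (like config2 above).
config2 = SimpleNamespace(
    rope_scaling={"rope_type": "default"},
    text_config=SimpleNamespace(rope_scaling={"rope_type": "default"}),
)
config2.text_config.rope_scaling["factor"] = 2.0
print("factor" in config2.rope_scaling)  # False: the root dict is untouched
```

So the same post-init mutation silently behaves differently depending on how the config was constructed.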
Expected behavior
- Either they should always be the same dict object
- Or they should always be different dict objects
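If the aliasing turns out to be intended, callers that need to mutate one of the dicts safely can break the link themselves. A possible caller-side workaround sketch (detach_rope_scaling is a hypothetical helper, not a transformers API; demoed with plain namespaces instead of a real config):

```python
import copy
from types import SimpleNamespace

def detach_rope_scaling(config):
    """Hypothetical helper: give config.text_config.rope_scaling its own
    dict when it aliases the root config.rope_scaling."""
    text_cfg = getattr(config, "text_config", None)
    if text_cfg is None:
        return config
    if getattr(text_cfg, "rope_scaling", None) is getattr(config, "rope_scaling", None):
        text_cfg.rope_scaling = copy.deepcopy(text_cfg.rope_scaling)
    return config

# Demo: start from the aliased situation, then detach and mutate.
shared = {"rope_type": "default", "mrope_section": [16, 24, 24]}
config = SimpleNamespace(
    rope_scaling=shared,
    text_config=SimpleNamespace(rope_scaling=shared),
)
detach_rope_scaling(config)
config.text_config.rope_scaling["factor"] = 2.0
print("factor" in config.rope_scaling)  # False: root dict no longer affected
```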