-
Notifications
You must be signed in to change notification settings - Fork 30.6k
fix qwen text config #41158
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix qwen text config #41158
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
Test failures are not related |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Arf, very sad we did not use subconfigs when the model was added 🥲
super().__init__(**kwargs) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this should stay after setting the subconfigs no? Otherwise the __setattr__
won't work as it won't see the subcinfigs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I added it for extra kwargs that users can add in a flat dict. Otherwise those are getting serialized in text config because we pass all kwargs to the text config
For usage it has no effects, only if someone has non-common kwarg which is not supposed to be in text config. I think supporting correct attention is more important than a rare edge case, so I'll revert it
return setattr(text_config, key, value) | ||
|
||
return super().__setattr__(key, value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no return
for setattr
!
[For maintainers] Suggested jobs to run (before merge) run-slow: glm4v, glm4v_moe, qwen2_5_vl, qwen2_vl |
* fix qwen text config * fix tests * fix one more test * address comments
What does this PR do?
Fixes #41020 and ensures constency in Qwen-VL text config. Side note: prev we had flat dict structure in Qwen and for BC passed kwargs to
super()
and totext_config
. This caused confusion in TRL which apparently resets some text attributes manually when trainingIn this PR, Qwen will set/get text related attributes only through text config. The attributes are obtainable from nested config as
config.text_config.vocab_size
and from root asconfig.vocab_size
(BC)