🚨 [v5] Remove relative position embeddings (for bert-like models) #41170
base: main
Conversation
run-slow: flava, instructblib, mra
This is mostly due to me forgetting to update them in my bert refactor PR --> big diff because the whole refactor is included (same for the roberta example)
Updated: Only includes the changes here now
This comment contains run-slow, running the specified jobs: models: ['models/flava', 'models/mra']
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
run-slow: instructblip
This comment contains run-slow, running the specified jobs: models: ['models/instructblip']
Force-pushed from 6046d27 to 0dbd18b
run-slow: bert, roberta, albert, mra, instructblip, blip_2, flava
This comment contains run-slow, running the specified jobs: models: ['models/albert', 'models/bert', 'models/blip_2', 'models/flava', 'models/instructblip', 'models/mra', 'models/roberta']
Failing slow tests are the same as in main 👀
[For maintainers] Suggested jobs to run (before merge) run-slow: albert, align, altclip, bert, bert_generation, big_bird, blip, blip_2, bridgetower, bros, camembert, canine, chinese_clip, clap, data2vec, dpr
Thanks, super nice clean-up! 🧼
```python
        return embeddings

# ...

def eager_attention_forward(
```
I think now we can copy bert from llama or another big model group? 👀 Keeping fewer sources of truth makes it easier to submit PRs
Let me make a follow-up PR for that. I'd like to sync bert and bart instead though, since llama would imply causal masks, which is not the case here, plus an unnecessary GQA dependency from llama
Opened #41248 for the sync
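For reference, a minimal sketch of what such a modular file could look like. This is hypothetical (#41248 may structure things differently) and assumes bart's `BartAttention` as the source class:

```python
# modular_bert.py, hypothetical sketch of the modular approach discussed above.
# With the modular system, `modeling_bert.py` would be auto-generated from a
# file like this, so bert's attention stops being its own source of truth.
from transformers.models.bart.modeling_bart import BartAttention


class BertAttention(BartAttention):
    # Inherit bart's bidirectional (non-causal) attention unchanged:
    # no causal mask and no GQA, matching the concern raised above.
    pass
```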
```python
if self.position_embedding_type == "relative_key" or self.position_embedding_type == "relative_key_query":
    seq_length = hidden_states.size()[1]
    position_ids_l = torch.arange(seq_length, dtype=torch.long, device=hidden_states.device).view(-1, 1)
    position_ids_r = torch.arange(seq_length, dtype=torch.long, device=hidden_states.device).view(1, -1)
    distance = position_ids_l - position_ids_r
    positional_embedding = self.distance_embedding(distance + self.max_position_embeddings - 1)
    positional_embedding = positional_embedding.to(dtype=query_layer.dtype)  # fp16 compatibility

    if self.position_embedding_type == "relative_key":
        relative_position_scores = torch.einsum("bhld,lrd->bhlr", query_layer, positional_embedding)
        attention_scores = attention_scores + relative_position_scores
    elif self.position_embedding_type == "relative_key_query":
        relative_position_scores_query = torch.einsum("bhld,lrd->bhlr", query_layer, positional_embedding)
        relative_position_scores_key = torch.einsum("bhrd,lrd->bhlr", key_layer, positional_embedding)
        attention_scores = attention_scores + relative_position_scores_query + relative_position_scores_key
```
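As an aside for readers of the removed branch: `distance` is negative below the diagonal, so it is shifted by `max_position_embeddings - 1` to produce valid rows into a `(2 * max_position_embeddings - 1)`-entry embedding table. A toy illustration (not part of the diff):

```python
import torch

seq_length, max_position_embeddings = 4, 8
position_ids_l = torch.arange(seq_length).view(-1, 1)
position_ids_r = torch.arange(seq_length).view(1, -1)
distance = position_ids_l - position_ids_r  # values in [-(seq_length - 1), seq_length - 1]
# Shift so every entry indexes nn.Embedding(2 * max_position_embeddings - 1, head_dim)
index = distance + max_position_embeddings - 1
print(index)
# tensor([[ 7,  6,  5,  4],
#         [ 8,  7,  6,  5],
#         [ 9,  8,  7,  6],
#         [10,  9,  8,  7]])
```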
Happy to see it; I assumed that BLIP models used relative positions haha. Now the qformer can support the attention implementation API 🙌🏻
yea, it's honestly a bit baffling how many models have this while not using it at all 👀
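If the qformer now routes through the shared attention interface, picking an implementation at load time should work the usual way. A hedged usage sketch, untested against this branch:

```python
from transformers import Blip2Model

# Assumption: the qformer honors `attn_implementation` once the
# relative-position branch is gone (it blocked the shared path before).
model = Blip2Model.from_pretrained(
    "Salesforce/blip2-opt-2.7b",
    attn_implementation="sdpa",
)
```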
These embedding types are barely used and make the modeling files more complex without justifying their existence. Position embedding types still exist in a few models; this PR only addresses the `relative_key(_query)` ones. Some stats:
cc @hmellor this should remove any clashes with the kwargs you encountered in vLLM :D
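Concretely, the removed options were selected through the config. A before/after sketch, assuming the existing `position_embedding_type` field on `BertConfig`:

```python
from transformers import BertConfig

# Before this PR, these values enabled the einsum branches shown in the diff:
#   BertConfig(position_embedding_type="relative_key")
#   BertConfig(position_embedding_type="relative_key_query")
# After it, bert-like models keep only the default absolute embeddings:
config = BertConfig(position_embedding_type="absolute")
```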