Conversation

Contributor

@tiger100256-hu commented Sep 3, 2025

Details:

  • Support 3D RoPE fusion in Qwen2.5-vl-3b and Qwen2-vl-7b

Tickets:

@github-actions bot added labels on Sep 3, 2025: category: IE Tests (OpenVINO Test: plugins and common), category: GPU (OpenVINO GPU plugin), category: CPU (OpenVINO CPU plugin), category: transformations (OpenVINO Runtime library - Transformations)
Signed-off-by: HU Yuan2 <[email protected]>
@tiger100256-hu marked this pull request as ready for review September 5, 2025 03:08
@tiger100256-hu requested review from a team as code owners September 5, 2025 03:08
@tiger100256-hu requested review from mryzhov and zaixing-wang and removed request for a team September 5, 2025 03:08
Signed-off-by: HU Yuan2 <[email protected]>

Copilot AI left a comment

Pull Request Overview

This PR adds support for the 3D RoPE (Rotary Positional Embedding) operation in both the CPU and GPU implementations, specifically targeting the Qwen2.5-vl-3b and Qwen2-vl-7b models. The enhancement extends the existing RoPE functionality to handle 3-dimensional tensor operations.

  • Introduces new RoPEFusionVIT3D transformation pass for recognizing and fusing 3D RoPE patterns
  • Adds support for 3D tensor handling in both CPU and GPU RoPE implementations
  • Implements comprehensive test coverage for the new 3D RoPE functionality
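
Concretely, the rotate-half arithmetic that such a RoPE op evaluates on a 3D [seq_len, head_cnt, head_dim] tensor, with [seq_len, 1, head_dim] cos/sin caches broadcast across heads (matching the Shape{80, 1, 80} caches in the tests quoted later in this conversation), can be sketched in plain C++. This is an editor's illustrative sketch of the math only; the function and variable names are not OpenVINO APIs:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Rotate-half RoPE over a 3D tensor x of shape [seq_len, head_cnt, head_dim],
// with cos/sin caches of shape [seq_len, 1, head_dim] broadcast across heads:
//   out[..., i]      = x[..., i]      * cos[..., i]      - x[..., half+i] * sin[..., i]
//   out[..., half+i] = x[..., half+i] * cos[..., half+i] + x[..., i]      * sin[..., half+i]
std::vector<float> rope3d(const std::vector<float>& x,
                          const std::vector<float>& cos_cache,
                          const std::vector<float>& sin_cache,
                          std::size_t seq_len, std::size_t head_cnt, std::size_t head_dim) {
    std::vector<float> out(x.size());
    const std::size_t half = head_dim / 2;
    for (std::size_t s = 0; s < seq_len; ++s) {
        for (std::size_t h = 0; h < head_cnt; ++h) {
            const std::size_t base = (s * head_cnt + h) * head_dim;
            const std::size_t cs = s * head_dim;  // cos/sin row index, broadcast over h
            for (std::size_t i = 0; i < half; ++i) {
                const float x1 = x[base + i];
                const float x2 = x[base + half + i];
                out[base + i]        = x1 * cos_cache[cs + i] - x2 * sin_cache[cs + i];
                out[base + half + i] = x2 * cos_cache[cs + half + i] + x1 * sin_cache[cs + half + i];
            }
        }
    }
    return out;
}
```

With a cos cache of all ones and a sin cache of all zeros the op reduces to the identity, which is a convenient sanity check against both the CPU and GPU kernels.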

Reviewed Changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 3 comments.

Summary per file:

  • src/tests/functional/plugin/shared/src/subgraph/rotary_pos_emb.cpp: Adds the RoPETestQwenVL class and test implementation for 3D RoPE patterns
  • src/tests/functional/plugin/shared/include/subgraph_tests/rotary_pos_emb.hpp: Declares the new test class and types for QwenVL 3D RoPE testing
  • src/tests/functional/plugin/shared/include/shared_test_classes/subgraph/rotary_pos_emb.hpp: Defines the RoPETestQwenVL class structure and parameter types
  • src/plugins/intel_gpu/tests/functional/shared_tests_instances/subgraph_tests/rotary_pos_emb.cpp: Instantiates GPU-specific test cases for QwenVL 3D RoPE
  • src/plugins/intel_gpu/src/graph/impls/ocl_v2/rope_opt.cpp: Updates GPU kernel work-group sizing for 3D RoPE support
  • src/plugins/intel_gpu/src/graph/impls/ocl_v2/rope_opt.cl: Adds OpenCL kernel support for 3D cos/sin cache indexing
  • src/plugins/intel_cpu/tests/functional/shared_tests_instances/subgraph_tests/rotary_pos_emb.cpp: Instantiates CPU-specific test cases for QwenVL 3D RoPE
  • src/plugins/intel_cpu/src/nodes/rope.cpp: Implements CPU 3D tensor reshaping and handling logic
  • src/common/transformations/tests/common_optimizations/fuse_rotary_positional_embeddings.cpp: Adds tests for the VIT 3D RoPE transformation
  • src/common/transformations/src/transformations/common_optimizations/fuse_rotary_positional_embeddings.cpp: Implements the RoPEFusionVIT3D transformation pass
  • src/common/transformations/include/transformations/common_optimizations/fuse_rotary_positional_embeddings.hpp: Declares the RoPEFusionVIT3D class
  • src/common/transformations/include/ov_ops/rotary_positional_embeddings.hpp: Adds the support_3d_rope configuration flag

}

void RoPETestQwenVL::SetUp() {
const auto& [element_type, _targetDevice, splip_op_type] = this->GetParam();

Copilot AI Sep 8, 2025

Variable name contains a typo. 'splip_op_type' should be 'split_op_type'.

Contributor Author

fixed

InputShape cos_shape_value = {cos_shape, {Shape{80, 1, 80}}};
InputShape sin_shape_value = {sin_shape, {Shape{80, 1, 80}}};
init_input_shapes({input_shape_value, cos_shape_value, sin_shape_value});
function = buildROPE_QwenVL(element_type, input_shape, cos_shape, sin_shape, splip_op_type);

Copilot AI Sep 8, 2025

Variable name contains a typo. 'splip_op_type' should be 'split_op_type' to match the intended parameter name.

Contributor Author

fixed

uint sin_idx = cos_sin_idx;
#else
uint cos_idx = INPUT1_GET_INDEX(cos_sin_b, cos_sin_h, 0, 0);
uint sin_idx = INPUT2_GET_INDEX(cos_sin_b, cos_sin_h, 0, 0);

Copilot AI Sep 8, 2025

Line has trailing whitespace that should be removed.

Suggested change (remove the trailing whitespace):
uint sin_idx = INPUT2_GET_INDEX(cos_sin_b, cos_sin_h, 0, 0);

Contributor Author

fixed

auto x1 = NewGenSlice(x, 0, "half_ndims", 1, 2);
auto x_rotate_half = pattern::wrap_type<v0::Concat>({x2neg, x1 | varsplit->output(0)}, {{"axis", -1}});

auto mul_cos = pattern::wrap_type<v1::Multiply>({x_or_cos1, x_or_cos2}, {{"auto_broadcast", "numpy"}});
Contributor

For mul_cos, if you want to match the left-hand and right-hand operands of the Multiply in either order, an Or pattern may be better:

auto mul_cos1 = pattern::wrap_type<v1::Multiply>({x, cos}, {{"auto_broadcast", "numpy"}});
auto mul_cos2 = pattern::wrap_type<v1::Multiply>({cos, x}, {{"auto_broadcast", "numpy"}});
auto mul_cos = mul_cos1 | mul_cos2;

Contributor Author

Fixed. Please also help take a look at the comment #31949 (comment)


// check mul(x, cos) exists
Output<Node> v_cos;
if (pattern_map.at(x_or_cos1) == pattern_map.at(x)) {
Contributor

With the | on mul_cos suggested above, this check could be removed.

Contributor Author

fixed

this->register_matcher(m, callback);
}

ov::pass::RoPEFusionVIT3D::RoPEFusionVIT3D() {
Contributor

This pattern matching is almost the same as RoPEFusionGPTNEOX; the only difference is that the input rank changes from 4 to 3. Do you think we could merge them together? Just as RoPEFusionQwen takes an int split_output_id, we could change the signature to RoPEFusionGPTNEOX(int rank).

Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it is not easy to do now, we can leave this as an improvement.

Contributor Author

@zhangYiIntel Yes, the transformation code is copied from RoPEFusionGPTNEOX with just the rank changed from 4 to 3. But I don't know the background of RoPEFusionGPTNEOX, so I didn't merge them together; I think we need a review from the transformation team.

Contributor

RoPEFusionGPTNEOX was copied from the CPU plugin's previous internal RoPE transformation; it has no special requirements and only describes the vanilla RoPE with rotate-half.

Contributor Author

OK, I will merge them together.

Contributor Author

Already merged them.

Contributor

maxnick commented Sep 17, 2025

@mitruska , could you please review this PR, especially the transformations part?

// each head. change input order to [batch, head_cnt, 4608] to support 2d rope
bool is_qwen = false; // Qwen is special which overrides other setting
bool use_rope_cache = false; // use precomputed RoPE cache for trigonometric values (cosine and sine)
bool support_3d_rope = false;
Contributor

As the new attribute "support_3d_rope" has been added, it should be included in the visit_attributes method as well:

bool RoPE::visit_attributes(ov::AttributeVisitor& visitor) {
INTERNAL_OP_SCOPE(internal_RoPE_visit_attributes);
visitor.start_structure("config");
visitor.on_attribute("slice_start", m_config.slice_start);
visitor.on_attribute("slice_stop", m_config.slice_stop);
visitor.on_attribute("input_trans0213", m_config.input_trans0213);
visitor.on_attribute("output_trans0213", m_config.output_trans0213);
visitor.on_attribute("is_interleaved", m_config.is_interleaved);
visitor.on_attribute("rotary_ndims", m_config.rotary_ndims);
visitor.on_attribute("is_chatglm", m_config.is_chatglm);
visitor.on_attribute("support_2d_rope", m_config.support_2d_rope);
visitor.on_attribute("is_qwen", m_config.is_qwen);
visitor.on_attribute("use_rope_cache", m_config.use_rope_cache);
visitor.on_attribute("head_cnt", m_config.head_cnt);
visitor.on_attribute("head_size", m_config.head_size);
visitor.on_attribute("gather_position_arg_id", m_config.gather_position_arg_id);
visitor.finish_structure();
return true;
}
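
To illustrate why the flag must be visited: attributes that are not wired through the visitor are silently dropped on serialization round trips. A minimal stand-alone sketch of the visitor mechanism (MapWriter and RoPEConfig here are hypothetical stand-ins for illustration, not the OpenVINO AttributeVisitor classes):

```cpp
#include <cassert>
#include <map>
#include <string>

// Minimal stand-in for an attribute visitor that serializes a config to a map.
struct MapWriter {
    std::map<std::string, bool> store;
    void on_attribute(const std::string& name, bool& value) { store[name] = value; }
};

struct RoPEConfig {
    bool is_chatglm = false;
    bool support_2d_rope = false;
    bool support_3d_rope = false;  // newly added flag

    // Any attribute omitted here never reaches serialization or comparison.
    template <class Visitor>
    void visit(Visitor& v) {
        v.on_attribute("is_chatglm", is_chatglm);
        v.on_attribute("support_2d_rope", support_2d_rope);
        v.on_attribute("support_3d_rope", support_3d_rope);  // the addition requested above
    }
};
```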

Contributor Author

@zaixing-wang, the new config flag support_3d_rope is currently used by the GPU in this PR; could you check whether we must add this new attribute? Thanks.

Contributor Author

fixed

Comment on lines 195 to 197
auto mul_cos1 = pattern::wrap_type<v1::Multiply>({x, cos}, {{"auto_broadcast", "numpy"}});
auto mul_cos2 = pattern::wrap_type<v1::Multiply>({cos, x}, {{"auto_broadcast", "numpy"}});
auto mul_cos = mul_cos1 | mul_cos2;
Contributor

There is a mechanism in transformations for commutative operations like Multiply, which matches permutations of the pattern inputs, so this explicit "Or" may be redundant here. Please verify.

bool ov::op::util::is_commutative(const ov::Node* node) {
return ov::as_type<const ov::op::v1::Add>(node) != nullptr ||
ov::as_type<const ov::op::v1::Maximum>(node) != nullptr ||
ov::as_type<const ov::op::v1::Equal>(node) != nullptr ||
ov::as_type<const ov::op::v1::NotEqual>(node) != nullptr ||
ov::as_type<const ov::op::v1::LogicalAnd>(node) != nullptr ||
ov::as_type<const ov::op::v0::Xor>(node) != nullptr ||
ov::as_type<const ov::op::v1::LogicalXor>(node) != nullptr ||
ov::as_type<const ov::op::v1::Minimum>(node) != nullptr ||
ov::as_type<const ov::op::v1::Multiply>(node) != nullptr ||

if (ov::op::util::is_commutative(graph_node)) {

Suggested change:
-auto mul_cos1 = pattern::wrap_type<v1::Multiply>({x, cos}, {{"auto_broadcast", "numpy"}});
-auto mul_cos2 = pattern::wrap_type<v1::Multiply>({cos, x}, {{"auto_broadcast", "numpy"}});
-auto mul_cos = mul_cos1 | mul_cos2;
+auto mul_cos = pattern::wrap_type<v1::Multiply>({x, cos}, {{"auto_broadcast", "numpy"}});

Contributor Author

Thanks. I modified this part per the comment #31949 (comment); @mitruska, could you also help take a look at it? Thanks.

Contributor

This method is more universal; I think you could adopt this one.

Contributor Author

Reverted back to the original code.

Comment on lines 181 to 183
auto x = pattern::any_input(pattern::rank_equals(rank));
auto cos = pattern::any_input(pattern::rank_equals(rank));
auto t_sin = pattern::any_input(pattern::rank_equals(rank));
Contributor

Different ranks could be handled within a single pattern match, so the ranks to be fused could be passed as a list of ranks instead of a single value. But this is rather a suggestion to be considered, not a blocker.

Contributor Author

OK, I will take a look at how to add a pattern for different ranks.

Contributor Author

I only found rank_equals and rank_more_than, and I am not sure whether a new one needs to be added. The rank also needs to be used later, and I don't know how to get it from x.

auto varsplit = pattern::wrap_type<v1::VariadicSplit>({x, rank - 1, {"half_ndims", "?"}});

op::Predicate rank_equals(const Dimension& expected_rank) {

Comment on lines 61 to 63
symbolic_ctx_manager->register_pass<ov::pass::RoPEFusionGPTNEOX>(4);
symbolic_ctx_manager->register_pass<ov::pass::RoPEFusionGPTJ>();
symbolic_ctx_manager->register_pass<ov::pass::RoPEFusionGPTNEOX>(3);
Contributor

Is the order of the pass registration here intentional and crucial for correct functionality?
I wonder why RoPEFusionGPTNEOX(3) is not right after RoPEFusionGPTNEOX(4).

Contributor Author

Because when this was a draft PR the code looked like this, with the new registration at the end:

    symbolic_ctx_manager->register_pass<ov::pass::RoPEFusionGPTNEOX>();
    symbolic_ctx_manager->register_pass<ov::pass::RoPEFusionGPTJ>();
    symbolic_ctx_manager->register_pass<ov::pass::RoPEFusionVIT3D>();

After fixing this per #31949 (comment), it became:

    symbolic_ctx_manager->register_pass<ov::pass::RoPEFusionGPTNEOX>(4);
    symbolic_ctx_manager->register_pass<ov::pass::RoPEFusionGPTJ>();
    symbolic_ctx_manager->register_pass<ov::pass::RoPEFusionGPTNEOX>(3);

I will add it right after RoPEFusionGPTNEOX(4).

Contributor Author

fixed

INSTANTIATE_TEST_SUITE_P(smoke_RoPEQwenVL,
RoPETestQwenVL,
::testing::Combine(
::testing::Values(ov::element::f32),
Contributor

Please add an f16 test too.

Contributor Author

added

Contributor

yeonbok commented Sep 24, 2025

LGTM for GPU plugin change
