[transformation][CPU][GPU] support 3D rope #31949
base: master
Conversation
Signed-off-by: HU Yuan2 <[email protected]>
Pull Request Overview
This PR adds support for the 3D RoPE (Rotary Positional Embedding) operation in both the CPU and GPU plugins, specifically targeting the Qwen2.5-vl-3b and Qwen2-vl-7b models. It extends the existing RoPE functionality to handle 3-dimensional tensor operations.
- Introduces new RoPEFusionVIT3D transformation pass for recognizing and fusing 3D RoPE patterns
- Adds support for 3D tensor handling in both CPU and GPU RoPE implementations
- Implements comprehensive test coverage for the new 3D RoPE functionality
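The rotate-half math that this RoPE variant applies can be sketched in plain C++. This is a self-contained illustration of the formula only, not the plugin implementation; the flat [seq, head, dim] layout and all helper names here are assumptions:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Rotate-half RoPE on a flat [seq, head, dim] tensor:
//   out[..., i]       = x[..., i]       * cos - x[..., i + d/2] * sin
//   out[..., i + d/2] = x[..., i + d/2] * cos + x[..., i]       * sin
// cos_cache / sin_cache are [seq, dim/2], shared across heads.
std::vector<float> rope_rotate_half(const std::vector<float>& x,
                                    std::size_t seq, std::size_t heads, std::size_t dim,
                                    const std::vector<float>& cos_cache,
                                    const std::vector<float>& sin_cache) {
    const std::size_t half = dim / 2;
    std::vector<float> out(x.size());
    for (std::size_t s = 0; s < seq; ++s) {
        for (std::size_t h = 0; h < heads; ++h) {
            const std::size_t base = (s * heads + h) * dim;
            for (std::size_t i = 0; i < half; ++i) {
                const float c = cos_cache[s * half + i];
                const float sn = sin_cache[s * half + i];
                const float x1 = x[base + i];         // first half
                const float x2 = x[base + i + half];  // second half (rotated)
                out[base + i] = x1 * c - x2 * sn;
                out[base + i + half] = x2 * c + x1 * sn;
            }
        }
    }
    return out;
}
```

The rank-3 layout drops the batch dimension that the existing rank-4 GPTNEOX pattern assumes, which is what the new transformation has to recognize.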
Reviewed Changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/tests/functional/plugin/shared/src/subgraph/rotary_pos_emb.cpp | Adds RoPETestQwenVL class and test implementation for 3D RoPE patterns |
| src/tests/functional/plugin/shared/include/subgraph_tests/rotary_pos_emb.hpp | Declares new test class and types for QwenVL 3D RoPE testing |
| src/tests/functional/plugin/shared/include/shared_test_classes/subgraph/rotary_pos_emb.hpp | Defines RoPETestQwenVL class structure and parameter types |
| src/plugins/intel_gpu/tests/functional/shared_tests_instances/subgraph_tests/rotary_pos_emb.cpp | Instantiates GPU-specific test cases for QwenVL 3D RoPE |
| src/plugins/intel_gpu/src/graph/impls/ocl_v2/rope_opt.cpp | Updates GPU kernel work group sizing for 3D RoPE support |
| src/plugins/intel_gpu/src/graph/impls/ocl_v2/rope_opt.cl | Adds OpenCL kernel support for 3D cos/sin cache indexing |
| src/plugins/intel_cpu/tests/functional/shared_tests_instances/subgraph_tests/rotary_pos_emb.cpp | Instantiates CPU-specific test cases for QwenVL 3D RoPE |
| src/plugins/intel_cpu/src/nodes/rope.cpp | Implements CPU 3D tensor reshaping and handling logic |
| src/common/transformations/tests/common_optimizations/fuse_rotary_positional_embeddings.cpp | Adds comprehensive tests for VIT 3D RoPE transformation |
| src/common/transformations/src/transformations/common_optimizations/fuse_rotary_positional_embeddings.cpp | Implements RoPEFusionVIT3D transformation pass |
| src/common/transformations/include/transformations/common_optimizations/fuse_rotary_positional_embeddings.hpp | Declares RoPEFusionVIT3D class |
| src/common/transformations/include/ov_ops/rotary_positional_embeddings.hpp | Adds support_3d_rope configuration flag |
}

void RoPETestQwenVL::SetUp() {
    const auto& [element_type, _targetDevice, splip_op_type] = this->GetParam();
Variable name contains a typo. 'splip_op_type' should be 'split_op_type'.
fixed
    InputShape cos_shape_value = {cos_shape, {Shape{80, 1, 80}}};
    InputShape sin_shape_value = {sin_shape, {Shape{80, 1, 80}}};
    init_input_shapes({input_shape_value, cos_shape_value, sin_shape_value});
    function = buildROPE_QwenVL(element_type, input_shape, cos_shape, sin_shape, splip_op_type);
Variable name contains a typo. 'splip_op_type' should be 'split_op_type' to match the intended parameter name.
fixed
    uint sin_idx = cos_sin_idx;
#else
    uint cos_idx = INPUT1_GET_INDEX(cos_sin_b, cos_sin_h, 0, 0);
    uint sin_idx = INPUT2_GET_INDEX(cos_sin_b, cos_sin_h, 0, 0);
Line has trailing whitespace that should be removed.
fixed
auto x1 = NewGenSlice(x, 0, "half_ndims", 1, 2);
auto x_rotate_half = pattern::wrap_type<v0::Concat>({x2neg, x1 | varsplit->output(0)}, {{"axis", -1}});

auto mul_cos = pattern::wrap_type<v1::Multiply>({x_or_cos1, x_or_cos2}, {{"auto_broadcast", "numpy"}});
For mul_cos, if you want to match both the left-hand and right-hand order of the Multiply, maybe an "or" is better:

auto mul_cos1 = pattern::wrap_type<v1::Multiply>({x, cos}, {{"auto_broadcast", "numpy"}});
auto mul_cos2 = pattern::wrap_type<v1::Multiply>({cos, x}, {{"auto_broadcast", "numpy"}});
auto mul_cos = mul_cos1 | mul_cos2;
fixed, please also take a look at the comment #31949 (comment)
// check mul(x, cos) exists
Output<Node> v_cos;
if (pattern_map.at(x_or_cos1) == pattern_map.at(x)) {
With the "|" form of mul_cos suggested above, this check here could be removed.
fixed
    this->register_matcher(m, callback);
}

ov::pass::RoPEFusionVIT3D::RoPEFusionVIT3D() {
This pattern matching is almost the same as RoPEFusionGPTNEOX; the only difference is that the rank of the input is changed from 4 to 3. Do you think we could merge them together? Just like RoPEFusionQwen(int split_output_id), we could change it to RoPEFusionGPTNEOX(int rank).
If not easy to do now, we can leave this as an improvement.
@zhangYiIntel Yes, the transformation code is copied from RoPEFusionGPTNEOX with just the rank changed from 4 to 3, but I don't know the background of RoPEFusionGPTNEOX, so I didn't merge them together. I think we need a review from the transformation team.
The RoPEFusionGPTNEOX pass is copied from the CPU plugin's previous internal RoPE transformation; it has no special requirements and only describes the vanilla RoPE with RotateHalf.
ok, I will merge them together
already merged them
@mitruska, could you please review this PR, especially the transformations part?
// each head. change input order to [batch, head_cnt, 4608] to support 2d rope
bool is_qwen = false;         // Qwen is special which overrides other setting
bool use_rope_cache = false;  // use precomputed RoPE cache for trigonometric values (cosine and sine)
bool support_3d_rope = false;
As the new attribute "support_3d_rope" has been added, the attribute should be included in the visit_attributes method as well:

openvino/src/common/transformations/src/ov_ops/rotary_positional_embeddings.cpp, lines 87 to 105 in a03e4c1:

bool RoPE::visit_attributes(ov::AttributeVisitor& visitor) {
    INTERNAL_OP_SCOPE(internal_RoPE_visit_attributes);
    visitor.start_structure("config");
    visitor.on_attribute("slice_start", m_config.slice_start);
    visitor.on_attribute("slice_stop", m_config.slice_stop);
    visitor.on_attribute("input_trans0213", m_config.input_trans0213);
    visitor.on_attribute("output_trans0213", m_config.output_trans0213);
    visitor.on_attribute("is_interleaved", m_config.is_interleaved);
    visitor.on_attribute("rotary_ndims", m_config.rotary_ndims);
    visitor.on_attribute("is_chatglm", m_config.is_chatglm);
    visitor.on_attribute("support_2d_rope", m_config.support_2d_rope);
    visitor.on_attribute("is_qwen", m_config.is_qwen);
    visitor.on_attribute("use_rope_cache", m_config.use_rope_cache);
    visitor.on_attribute("head_cnt", m_config.head_cnt);
    visitor.on_attribute("head_size", m_config.head_size);
    visitor.on_attribute("gather_position_arg_id", m_config.gather_position_arg_id);
    visitor.finish_structure();
    return true;
}
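Why a forgotten field matters can be shown with a minimal stand-in for an attribute visitor in plain C++ (a toy illustration only; the real ov::AttributeVisitor API differs, and all names here are invented):

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy stand-in for an attribute visitor: it only records what it is shown.
struct ToyVisitor {
    std::map<std::string, bool> seen;
    void on_attribute(const std::string& name, bool value) { seen[name] = value; }
};

// Toy stand-in for the RoPE config: any field that visit() forgets to
// report is invisible to serialization, comparison, and IR dumps.
struct ToyConfig {
    bool support_2d_rope = false;
    bool support_3d_rope = true;
    void visit(ToyVisitor& v) {
        v.on_attribute("support_2d_rope", support_2d_rope);
        v.on_attribute("support_3d_rope", support_3d_rope);  // the newly added flag
    }
};
```

If the new flag were left out of visit(), a model saved and reloaded through the visitor would silently lose it.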
@zaixing-wang the new config support_3d_rope is currently used by the GPU in this PR; could you check whether we must add this new attribute? Thanks.
fixed
auto mul_cos1 = pattern::wrap_type<v1::Multiply>({x, cos}, {{"auto_broadcast", "numpy"}});
auto mul_cos2 = pattern::wrap_type<v1::Multiply>({cos, x}, {{"auto_broadcast", "numpy"}});
auto mul_cos = mul_cos1 | mul_cos2;
There is a mechanism in transformations for commutative operations like Multiply, which match permutation of the pattern inputs, so this explicit "Or" may be redundant here. Please verify.
openvino/src/core/src/op/util/op_types.cpp, lines 72 to 81 in a03e4c1:

bool ov::op::util::is_commutative(const ov::Node* node) {
    return ov::as_type<const ov::op::v1::Add>(node) != nullptr ||
           ov::as_type<const ov::op::v1::Maximum>(node) != nullptr ||
           ov::as_type<const ov::op::v1::Equal>(node) != nullptr ||
           ov::as_type<const ov::op::v1::NotEqual>(node) != nullptr ||
           ov::as_type<const ov::op::v1::LogicalAnd>(node) != nullptr ||
           ov::as_type<const ov::op::v0::Xor>(node) != nullptr ||
           ov::as_type<const ov::op::v1::LogicalXor>(node) != nullptr ||
           ov::as_type<const ov::op::v1::Minimum>(node) != nullptr ||
           ov::as_type<const ov::op::v1::Multiply>(node) != nullptr ||

openvino/src/core/src/pattern/matcher.cpp, line 159 in a03e4c1:

if (ov::op::util::is_commutative(graph_node)) {

Suggested change:

-auto mul_cos1 = pattern::wrap_type<v1::Multiply>({x, cos}, {{"auto_broadcast", "numpy"}});
-auto mul_cos2 = pattern::wrap_type<v1::Multiply>({cos, x}, {{"auto_broadcast", "numpy"}});
-auto mul_cos = mul_cos1 | mul_cos2;
+auto mul_cos = pattern::wrap_type<v1::Multiply>({x, cos}, {{"auto_broadcast", "numpy"}});
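The commutative-matching mechanism being referenced can be illustrated with a toy matcher in plain C++ (an illustration of the idea only, unrelated to OpenVINO's actual Matcher internals; every name here is invented):

```cpp
#include <cassert>
#include <string>

// Toy expression node: an op name plus two operand labels.
struct ToyNode {
    std::string op, lhs, rhs;
};

// Subset of the commutative-op list from op_types.cpp.
bool toy_is_commutative(const std::string& op) {
    return op == "Multiply" || op == "Add";
}

// For commutative ops, also try the swapped operand order, so a single
// {x, cos} Multiply pattern matches both x*cos and cos*x.
bool toy_matches(const ToyNode& pattern, const ToyNode& node) {
    if (pattern.op != node.op) return false;
    if (pattern.lhs == node.lhs && pattern.rhs == node.rhs) return true;
    if (toy_is_commutative(pattern.op))
        return pattern.lhs == node.rhs && pattern.rhs == node.lhs;
    return false;
}
```

With this behavior a single {x, cos} Multiply pattern already covers the cos*x order, which is why the explicit mul_cos1 | mul_cos2 alternative may be redundant.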
thanks. I modified this part as per the comment #31949 (comment). @mitruska could you also help take a look at it, thanks.
This method is more universal; I think you could adopt this one.
reverted back to the original code
auto x = pattern::any_input(pattern::rank_equals(rank));
auto cos = pattern::any_input(pattern::rank_equals(rank));
auto t_sin = pattern::any_input(pattern::rank_equals(rank));
Different ranks could be handled within a single pattern match, so the ranks to be fused could be passed as a list of ranks instead of single value. But it's rather a suggestion to be considered not a blocker.
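A list-of-ranks predicate along the lines of this suggestion could look roughly like the sketch below (plain C++ with a predicate combinator; rank_in and RankPredicate are invented for illustration, not OpenVINO API):

```cpp
#include <cassert>
#include <functional>
#include <initializer_list>
#include <vector>

using RankPredicate = std::function<bool(int)>;

// Accept any of the listed ranks, e.g. rank_in({3, 4}) for a single
// GPTNEOX-style pattern that should fuse both 3D and 4D inputs.
RankPredicate rank_in(std::initializer_list<int> ranks) {
    std::vector<int> allowed(ranks);
    return [allowed](int rank) {
        for (int r : allowed)
            if (r == rank) return true;
        return false;
    };
}
```

The callback would still need a way to read back which rank actually matched, which is the open question raised below.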
ok, I will take a look at how to add a pattern for different ranks
I only found rank_equals and rank_more_than, so I am not sure whether we need to add a new one. The rank also needs to be used later, and I don't know how to get it from x:

auto varsplit = pattern::wrap_type<v1::VariadicSplit>({x, rank - 1, {"half_ndims", "?"}});

openvino/src/core/src/pattern/op/pattern.cpp, line 139 in 09d1712:

op::Predicate rank_equals(const Dimension& expected_rank) {
symbolic_ctx_manager->register_pass<ov::pass::RoPEFusionGPTNEOX>(4);
symbolic_ctx_manager->register_pass<ov::pass::RoPEFusionGPTJ>();
symbolic_ctx_manager->register_pass<ov::pass::RoPEFusionGPTNEOX>(3);
Is the order of the pass registration here intentional and crucial for correct functionality?
I wonder why RoPEFusionGPTNEOX(3) is not right after RoPEFusionGPTNEOX(4).
because when it was a draft PR, the code was like this, with the new registration at the end:

symbolic_ctx_manager->register_pass<ov::pass::RoPEFusionGPTNEOX>();
symbolic_ctx_manager->register_pass<ov::pass::RoPEFusionGPTJ>();
symbolic_ctx_manager->register_pass<ov::pass::RoPEFusionVIT3D>();

after fixing this #31949 (comment), it became:

symbolic_ctx_manager->register_pass<ov::pass::RoPEFusionGPTNEOX>(4);
symbolic_ctx_manager->register_pass<ov::pass::RoPEFusionGPTJ>();
symbolic_ctx_manager->register_pass<ov::pass::RoPEFusionGPTNEOX>(3);

I will add it right after RoPEFusionGPTNEOX(4).
fixed
INSTANTIATE_TEST_SUITE_P(smoke_RoPEQwenVL,
                         RoPETestQwenVL,
                         ::testing::Combine(
                             ::testing::Values(ov::element::f32),
please add f16 test too.
added
LGTM for GPU plugin change