Skip to content

Conversation

popovaan
Copy link
Contributor

ov::Tensor slices can be removed from EncodedImage, as it is used during resampling, which is currently a part of encode(), so there's no need to keep slices in encode() output.

Tocket: CVS-167405

@Wovchena Wovchena requested a review from Copilot May 23, 2025 07:56
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR removes the slices tensor and its size from the EncodedImage struct, replacing them with a lightweight slices_shape, and updates related functions to pass slices and target_size through as parameters rather than as part of the struct.

  • Remove ov::Tensor slices and ImageSize slices_size from EncodedImage
  • Add ov::Shape slices_shape to EncodedImage
  • Update llava_image_embed_make_with_bytes_slice, encode, and resample_encoded_image to forward slices and target_size, and populate slices_shape

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File Description
src/cpp/src/visual_language/vision_encoder.hpp Removed slices/slices_size, added slices_shape field
src/cpp/src/visual_language/minicpm/classes.hpp Updated resample_encoded_image signature to accept slices and target_sizes
src/cpp/src/visual_language/minicpm/classes.cpp Extended llava_image_embed_make_with_bytes_slice signature and body; updated encode and resample_encoded_image to use new parameters and set slices_shape; updated downstream slice-logic to use slices_shape
Comments suppressed due to low confidence (1)

src/cpp/src/visual_language/minicpm/classes.hpp:41

  • [nitpick] Rename parameter target_sizes to target_size to match its singular usage and improve readability.
ResampledImage resample_encoded_image(const EncodedImage& image, const ov::Tensor slices, const ImageSize& target_sizes);

@github-actions github-actions bot added the category: visual language Visual language pipeline label May 23, 2025
@as-suvorov as-suvorov requested a review from yatarkan May 23, 2025 08:54
@popovaan popovaan requested a review from Wovchena May 23, 2025 10:10
@Wovchena Wovchena requested a review from Copilot May 23, 2025 10:33
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR removes the actual slice tensors from EncodedImage and instead carries only their shape metadata, refactoring downstream resampling to receive slice data via a new helper struct.

  • Deleted ov::Tensor slices and ImageSize slices_size, added ov::Shape slices_shape in EncodedImage.
  • Introduced ImageSliceResult to bundle slice tensor and target size, and updated llava_image_embed_make_with_bytes_slice to return it.
  • Changed resample_encoded_image signature and all call sites to accept explicit slice tensor and target size.

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File Description
src/cpp/src/visual_language/vision_encoder.hpp Removed slices/slices_size, added slices_shape doc.
src/cpp/src/visual_language/minicpm/classes.hpp Updated resample_encoded_image declaration to take slices + size.
src/cpp/src/visual_language/minicpm/classes.cpp Added ImageSliceResult, refactored slicing and resample flows.
Comments suppressed due to low confidence (2)

src/cpp/src/visual_language/minicpm/classes.hpp:29

  • [nitpick] The parameter name target_sizes is plural but its type is a single ImageSize; consider renaming it to target_size for clarity.
ResampledImage resample_encoded_image(const EncodedImage& image, const ov::Tensor& slices, const ImageSize& target_sizes);

src/cpp/src/visual_language/minicpm/classes.cpp:288

  • No unit tests cover the new multi-slice code path or verify that slices_shape is populated and used correctly. Consider adding tests for both single- and multi-slice scenarios.
std::pair<EncodedImage, ImageSliceResult> llava_image_embed_make_with_bytes_slice(clip_ctx& ctx_clip, const ov::Tensor& img, ov::InferRequest& encoder, int max_slice_nums, int scale_resolution, size_t patch_size, bool never_split) {

@as-suvorov as-suvorov enabled auto-merge May 27, 2025 11:18
@as-suvorov as-suvorov added this pull request to the merge queue May 27, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks May 27, 2025
@as-suvorov as-suvorov added this pull request to the merge queue May 27, 2025
Merged via the queue into openvinotoolkit:master with commit b161e5c May 27, 2025
66 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
category: visual language Visual language pipeline
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants