yatarkan
Contributor

@yatarkan yatarkan commented Jun 11, 2025

Ticket: CVS-169069

@github-actions github-actions bot added the category: tokenizers Tokenizer class or submodule update label Jun 11, 2025
@Wovchena
Collaborator

ov::Core core;

#ifdef _WIN32
const wchar_t* ov_tokenizer_path_w = _wgetenv(ScopedVar::ENVIRONMENT_VARIABLE_NAME_W);

We can replace wchar_t* and char* with std::filesystem::path once PR #30938 is merged into OpenVINO. That PR enables core.add_extension() to accept Unicode paths via std::filesystem::path.

Collaborator

I think it's still necessary to specify how the env var is read, so it doesn't really matter whether it's later cast to wstring or path. Feel free to submit a patch after OV's PR is merged.
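
For illustration, the point in this thread can be sketched as follows: the read itself stays platform-specific (`_wgetenv` on Windows, `std::getenv` elsewhere), while the result can be carried as a `std::filesystem::path` either way. This is a minimal sketch, not the code from this PR; `read_env_path` and its parameters are hypothetical names, and passing both a narrow and a wide variable name is only an assumption that mirrors the separate `ENVIRONMENT_VARIABLE_NAME_W` constant.

```cpp
#include <cstdlib>
#include <filesystem>
#include <optional>

// Hypothetical helper (not part of the PR): read an environment variable with the
// platform's native character type and return it as a path.
// Returns std::nullopt when the variable is unset or empty.
std::optional<std::filesystem::path> read_env_path(const char* name, const wchar_t* name_w) {
#ifdef _WIN32
    (void)name;  // narrow name is unused on Windows
    // The wide variant preserves non-ASCII characters regardless of the ANSI code page.
    const wchar_t* value = _wgetenv(name_w);
#else
    (void)name_w;  // wide name is unused elsewhere
    const char* value = std::getenv(name);
#endif
    if (value == nullptr || *value == 0) {
        return std::nullopt;
    }
    // std::filesystem::path can be constructed from both narrow and wide strings.
    return std::filesystem::path(value);
}
```

With this shape, only the read differs per platform; a caller could hand the resulting path to an API that accepts `std::filesystem::path`, which is what the comment above expects `core.add_extension()` to do once the OpenVINO PR is merged.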

@yatarkan yatarkan marked this pull request as ready for review June 16, 2025 11:46
@yatarkan yatarkan requested review from Wovchena and Copilot June 16, 2025 11:47
Contributor

@Copilot Copilot AI left a comment

Pull Request Overview

This PR fixes path handling for tokenizers by introducing proper wide-character support on Windows.

  • Uses wide-character environment variables and Windows API functions.
  • Updates get_ov_genai_library_path to return a std::filesystem::path.
  • Adjusts tokenizer.cpp to handle wide-character environment variables in both core extension initialization and TokenizerImpl.

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| src/cpp/src/tokenizer/tokenizers_path.hpp | Added wide-character environment variable constant and updated Windows API calls. |
| src/cpp/src/tokenizer/tokenizers_path.cpp | Updated get_ov_genai_library_path to use wide-character API functions and return a filesystem path. |
| src/cpp/src/tokenizer/tokenizer.cpp | Modified environment variable usage and extension-loading code to support Unicode paths. |
Comments suppressed due to low confidence (2)

src/cpp/src/tokenizer/tokenizers_path.cpp:42

  • The cast reinterpret_cast(get_ov_genai_library_path) might not correctly convey that a function pointer is being passed; consider using a cast that better reflects the intended address pointer type for GetModuleHandleExW.
if (!GetModuleHandleExW(GET_MODULE_HANDLE_EX_FLAG_FROM_ADDRESS | GET_MODULE_HANDLE_EX_FLAG_UNCHANGED_REFCOUNT,

src/cpp/src/tokenizer/tokenizer.cpp:265

  • There is no null check for the return value of _wgetenv before converting it to a std::wstring; adding an assertion or error handling here would prevent potential null pointer dereference.
const wchar_t* ov_tokenizer_path_w = _wgetenv(ScopedVar::ENVIRONMENT_VARIABLE_NAME_W);
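
The concern in this suppressed comment is that constructing a `std::wstring` from a null pointer is undefined behavior. A minimal sketch of one way to guard the conversion is shown below; `require_wide_value` is a hypothetical helper, not code from this PR, and whether the variable can actually be unset at that point in tokenizer.cpp is not visible in this conversation.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of the PR): convert a possibly-null wide C string,
// such as the result of _wgetenv on Windows, into std::wstring.
// Throws instead of dereferencing a null pointer.
std::wstring require_wide_value(const wchar_t* value, const char* what) {
    if (value == nullptr) {
        throw std::runtime_error(std::string(what) + " is not set");
    }
    return std::wstring(value);
}
```

Throwing is only one option; an assertion or a fallback default would serve equally well if an unset variable is considered a programming error or an expected case.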

@Wovchena Wovchena enabled auto-merge June 16, 2025 14:08
@Wovchena Wovchena added this pull request to the merge queue Jun 16, 2025
Merged via the queue into openvinotoolkit:master with commit e6f12e3 Jun 16, 2025
85 of 87 checks passed
yatarkan added a commit to yatarkan/openvino.genai that referenced this pull request Jun 20, 2025
Wovchena pushed a commit that referenced this pull request Jun 24, 2025
Port "Fix paths with unicode for tokenizers (#2337)" to `releases/2025/2` branch

Ticket: CVS-169069
Labels: category: tokenizers (Tokenizer class or submodule update), port to 2025.2
4 participants