-
Notifications
You must be signed in to change notification settings - Fork 284
Fix paths with unicode for tokenizers #2337
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix paths with unicode for tokenizers #2337
Conversation
For your reference. this is how it was done for argv in samples: https://github.com/openvinotoolkit/openvino/blob/c168ebba59ec796e69f0b2b927cdb28bb8d9960c/samples/cpp/common/utils/include/samples/common.hpp#L47 |
ov::Core core; | ||
|
||
#ifdef _WIN32 | ||
const wchar_t* ov_tokenizer_path_w = _wgetenv(ScopedVar::ENVIRONMENT_VARIABLE_NAME_W); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can replace wchat_t *
and char *
with std::filesystem::path
after the PR#30938 is merged into OpenVINO. This PR enables the core.add_extension()
to support Unicode path via std::filesystem::path
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's still required to specify the way to read the env var so it doesn't really matter if it's casted to wsting or path later. Feel free to submit a patch after OV's PR is merged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR fixes path handling for tokenizers by introducing proper wide-character support on Windows.
- Uses wide-character environment variables and Windows API functions.
- Updates get_ov_genai_library_path to return a std::filesystem::path.
- Adjusts tokenizer.cpp to handle wide-character environment variables in both core extension initialization and TokenizerImpl.
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.
File | Description |
---|---|
src/cpp/src/tokenizer/tokenizers_path.hpp | Added wide-character environment variable constant and updated Windows API calls. |
src/cpp/src/tokenizer/tokenizers_path.cpp | Updated get_ov_genai_library_path to use wide-character API functions and return a filesystem path. |
src/cpp/src/tokenizer/tokenizer.cpp | Modified environment variable usage and extension-loading code to support unicode paths. |
Comments suppressed due to low confidence (2)
src/cpp/src/tokenizer/tokenizers_path.cpp:42
- The cast reinterpret_cast(get_ov_genai_library_path) might not correctly convey that a function pointer is being passed; consider using a cast that better reflects the intended address pointer type for GetModuleHandleExW.
if (!GetModuleHandleExW(GET_MODULE_HANDLE_EX_FLAG_FROM_ADDRESS | GET_MODULE_HANDLE_EX_FLAG_UNCHANGED_REFCOUNT,
src/cpp/src/tokenizer/tokenizer.cpp:265
- There is no null check for the return value of _wgetenv before converting it to a std::wstring; adding an assertion or error handling here would prevent potential null pointer dereference.
const wchar_t* ov_tokenizer_path_w = _wgetenv(ScopedVar::ENVIRONMENT_VARIABLE_NAME_W);
e6f12e3
Ticket: CVS-169069
Port "Fix paths with unicode for tokenizers (#2337)" to `releases/2025/2` branch Ticket: CVS-169069
Ticket: CVS-169069