Merged
61 commits
bffaa94
support multi images for vlm test
wgzintel May 12, 2025
cdd8f90
Merge branch 'master' into guozhong/support_multi_files_for_vlm_test
wgzintel May 13, 2025
9edc628
code format
wgzintel May 13, 2025
e512c31
using ov::genai::images to convert images
wgzintel May 13, 2025
9fccf73
Merge branch 'master' of https://github.com/openvinotoolkit/openvino.…
wgzintel May 13, 2025
eed1dd7
fix none Type
wgzintel May 13, 2025
82aa6aa
Merge branch 'master' into guozhong/support_multi_files_for_vlm_test
wgzintel May 13, 2025
503b74e
fix NoneTyPE in optimim-intel pipeline
wgzintel May 13, 2025
7c48e7e
Merge branch 'guozhong/support_multi_files_for_vlm_test' of https://g…
wgzintel May 13, 2025
fa68faa
Support read images from dir
wgzintel May 14, 2025
98e90e4
Merge branch 'master' into guozhong/support_multi_files_for_vlm_test
wgzintel May 14, 2025
8e62754
fix cmake_list.txt
wgzintel May 15, 2025
3c8b091
Merge branch 'master' of https://github.com/openvinotoolkit/openvino.…
wgzintel May 15, 2025
5585335
Output token size in benchmark_genai.cpp
wgzintel May 15, 2025
2c13704
Merge branch 'master' into guozhong/support_multi_files_for_vlm_test
wgzintel May 16, 2025
fd8c859
Merge branch 'master' into guozhong/support_multi_files_for_vlm_test
wgzintel May 18, 2025
771e928
Merge branch 'master' into guozhong/support_multi_files_for_vlm_test
wgzintel May 20, 2025
e902812
Merge branch 'master' into guozhong/support_multi_files_for_vlm_test
wgzintel May 21, 2025
f70ab0d
print ov version
wgzintel May 21, 2025
b6240fd
using load_image() in optimum pipeline
wgzintel May 21, 2025
36e3dad
Merge branch 'master' of https://github.com/openvinotoolkit/openvino.…
wgzintel May 22, 2025
e004cea
Make it an error if prompt_file and prompt are given at the same time
wgzintel May 22, 2025
537e2d6
Merge branch 'master' of https://github.com/openvinotoolkit/openvino.…
wgzintel May 22, 2025
10c9940
revert get prompt from default args
wgzintel May 22, 2025
65f5c02
Merge branch 'master' of https://github.com/openvinotoolkit/openvino.…
wgzintel May 22, 2025
9bb9081
Remove redundant code
wgzintel May 22, 2025
e1e5326
get prompt token size from shape[1]
wgzintel May 22, 2025
8a88215
Merge branch 'master' of https://github.com/openvinotoolkit/openvino.…
wgzintel May 22, 2025
6688a09
remove if
wgzintel May 23, 2025
16faddc
Update samples/cpp/text_generation/benchmark_genai.cpp
wgzintel May 23, 2025
f48ae43
Update samples/cpp/visual_language_chat/benchmark_vlm.cpp
wgzintel May 23, 2025
9c7fa07
Merge branch 'master' into guozhong/support_multi_files_for_vlm_test
wgzintel May 24, 2025
0936b50
Update benchmark_genai.py, benchmark_vlm.py and readme
wgzintel May 27, 2025
2dbba1b
Merge branch 'master' into guozhong/support_multi_files_for_vlm_test
wgzintel May 27, 2025
9d8aa7f
Merge branch 'master' into guozhong/support_multi_files_for_vlm_test
wgzintel May 30, 2025
d9000a9
Merge branch 'master' into guozhong/support_multi_files_for_vlm_test
wgzintel Jun 2, 2025
41ce10c
Merge branch 'master' into guozhong/support_multi_files_for_vlm_test
peterchen-intel Jun 7, 2025
ac12a98
Merge branch 'master' into guozhong/support_multi_files_for_vlm_test
wgzintel Jun 10, 2025
72e999f
Merge branch 'master' into guozhong/support_multi_files_for_vlm_test
peterchen-intel Jun 12, 2025
1348dd1
Merge branch 'master' into guozhong/support_multi_files_for_vlm_test
peterchen-intel Jun 13, 2025
9980152
Merge branch 'master' into guozhong/support_multi_files_for_vlm_test
wgzintel Jun 13, 2025
8ac1fc5
Merge branch 'master' into guozhong/support_multi_files_for_vlm_test
wgzintel Jun 13, 2025
f484244
Update samples/cpp/text_generation/read_prompt_from_file.cpp
wgzintel Jun 13, 2025
cb07e0b
Merge branch 'master' into guozhong/support_multi_files_for_vlm_test
wgzintel Jun 13, 2025
507f48a
Merge branch 'master' into guozhong/support_multi_files_for_vlm_test
wgzintel Jun 16, 2025
624e8fc
default values
wgzintel Jun 17, 2025
013fac4
Merge branch 'guozhong/support_multi_files_for_vlm_test' of https://g…
wgzintel Jun 17, 2025
689d264
Merge branch 'master' of https://github.com/openvinotoolkit/openvino.…
wgzintel Jun 17, 2025
5121bfb
Use the regular assignment for scheduler_config
wgzintel Jun 17, 2025
4537a2b
Merge branch 'master' of https://github.com/openvinotoolkit/openvino.…
wgzintel Jun 17, 2025
d4da4b6
Update samples/cpp/text_generation/read_prompt_from_file.cpp
wgzintel Jun 18, 2025
1b8ecba
Update tools/llm_bench/task/visual_language_generation.py
wgzintel Jun 18, 2025
9cc975b
Update tools/llm_bench/task/visual_language_generation.py
wgzintel Jun 18, 2025
cf910e0
Merge branch 'master' into guozhong/support_multi_files_for_vlm_test
wgzintel Jun 18, 2025
3e3a321
print input image nums for vlm
wgzintel Jun 18, 2025
2c6872f
Remove the corresponding return
wgzintel Jun 18, 2025
1a13411
remove if input_data.get("media", None)
wgzintel Jun 18, 2025
9b692e3
Merge branch 'master' into guozhong/support_multi_files_for_vlm_test
wgzintel Jun 19, 2025
ed896f5
Merge branch 'master' of https://github.com/openvinotoolkit/openvino.…
wgzintel Jun 19, 2025
61a2f22
Merge branch 'guozhong/support_multi_files_for_vlm_test' of https://g…
wgzintel Jun 19, 2025
62e627a
resolve conflict
wgzintel Jun 20, 2025
2 changes: 1 addition & 1 deletion samples/cpp/text_generation/CMakeLists.txt
@@ -46,7 +46,7 @@ FetchContent_Declare(cxxopts
URL_HASH SHA256=523175f792eb0ff04f9e653c90746c12655f10cb70f1d5e6d6d9491420298a08)
FetchContent_MakeAvailable(cxxopts)

add_executable(benchmark_genai benchmark_genai.cpp)
add_executable(benchmark_genai benchmark_genai.cpp read_prompt_from_file.cpp)
target_link_libraries(benchmark_genai PRIVATE openvino::genai cxxopts::cxxopts)
set_target_properties(benchmark_genai PROPERTIES
# Ensure out of box LC_RPATH on macOS with SIP
3 changes: 2 additions & 1 deletion samples/cpp/text_generation/README.md
@@ -161,7 +161,8 @@ For more information how performance metrics are calculated please follow [perfo
```
#### Options
- `-m, --model`: Path to the model and tokenizers base directory.
- `-p, --prompt` (default: `"The Sky is blue because"`): The prompt to generate text.
- `-p, --prompt` (default: ''): The prompt to generate text. If without `-p` and `--pf`, the default prompt is `"The Sky is blue because"`
- `--pf, --prompt_file` Read prompt from file.
- `--nw, --num_warmup` (default: `1`): Number of warmup iterations.
- `--mt, --max_new_tokens` (default: `20`): Maximal number of new tokens.
- `-n, --num_iter` (default: `3`): Number of iterations.
34 changes: 31 additions & 3 deletions samples/cpp/text_generation/benchmark_genai.cpp
@@ -3,13 +3,15 @@

#include "openvino/genai/llm_pipeline.hpp"
#include <cxxopts.hpp>
#include "read_prompt_from_file.h"

int main(int argc, char* argv[]) try {
cxxopts::Options options("benchmark_vanilla_genai", "Help command");

options.add_options()
("m,model", "Path to model and tokenizers base directory", cxxopts::value<std::string>())
("p,prompt", "Prompt", cxxopts::value<std::string>()->default_value("The Sky is blue because"))
("p,prompt", "Prompt", cxxopts::value<std::string>()->default_value(""))
("pf,prompt_file", "Read prompt from file", cxxopts::value<std::string>())
("nw,num_warmup", "Number of warmup iterations", cxxopts::value<size_t>()->default_value(std::to_string(1)))
("n,num_iter", "Number of iterations", cxxopts::value<size_t>()->default_value(std::to_string(3)))
("mt,max_new_tokens", "Maximal number of new tokens", cxxopts::value<size_t>()->default_value(std::to_string(20)))
@@ -30,7 +32,22 @@ int main(int argc, char* argv[]) try {
return EXIT_SUCCESS;
}

std::string prompt = result["prompt"].as<std::string>();
std::string prompt;
if (result.count("prompt") && result.count("prompt_file")) {
std::cout << "Prompt and prompt file should not exist together!" << std::endl;
return EXIT_FAILURE;
} else {
if (result.count("prompt_file")) {
prompt = utils::read_prompt(result["prompt_file"].as<std::string>());
} else {
prompt = result["prompt"].as<std::string>().empty() ? "The Sky is blue because" : result["prompt"].as<std::string>();
}
}
if (prompt.empty()) {
std::cout << "Prompt is empty!" << std::endl;
return EXIT_FAILURE;
}

const std::string models_path = result["model"].as<std::string>();
std::string device = result["device"].as<std::string>();
size_t num_warmup = result["num_warmup"].as<size_t>();
@@ -39,7 +56,17 @@ int main(int argc, char* argv[]) try {
ov::genai::GenerationConfig config;
config.max_new_tokens = result["max_new_tokens"].as<size_t>();

ov::genai::LLMPipeline pipe(models_path, device);
ov::genai::SchedulerConfig scheduler_config;
scheduler_config.enable_prefix_caching = false;
scheduler_config.max_num_batched_tokens = std::numeric_limits<std::size_t>::max();

std::cout << ov::get_openvino_version() << std::endl;

ov::genai::LLMPipeline pipe(models_path, device, ov::genai::scheduler_config(scheduler_config));

auto input_data = pipe.get_tokenizer().encode(prompt);
size_t prompt_token_size = input_data.input_ids.get_shape()[1];
std::cout << "Prompt token size:" << prompt_token_size << std::endl;

for (size_t i = 0; i < num_warmup; i++)
pipe.generate(prompt, config);
@@ -52,6 +79,7 @@
}

std::cout << std::fixed << std::setprecision(2);
std::cout << "Output token size:" << res.perf_metrics.get_num_generated_tokens() << std::endl;
std::cout << "Load time: " << metrics.get_load_time() << " ms" << std::endl;
std::cout << "Generate time: " << metrics.get_generate_duration().mean << " ± " << metrics.get_generate_duration().std << " ms" << std::endl;
std::cout << "Tokenization time: " << metrics.get_tokenization_duration().mean << " ± " << metrics.get_tokenization_duration().std << " ms" << std::endl;
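The same prompt/prompt-file selection logic appears again in benchmark_vlm.cpp below; as a design note, it could be factored into a shared helper. A minimal sketch, assuming the cxxopts options are declared as in the diff above and the new utils::read_prompt is available (the resolve_prompt name is hypothetical, not part of this PR):

```cpp
#include <stdexcept>
#include <string>
#include <cxxopts.hpp>
#include "read_prompt_from_file.h"

// Resolve the prompt from -p/--prompt, --pf/--prompt_file, or a default,
// rejecting the case where both options are passed on the command line.
std::string resolve_prompt(const cxxopts::ParseResult& result, const std::string& default_prompt) {
    if (result.count("prompt") && result.count("prompt_file")) {
        throw std::runtime_error("Prompt and prompt file should not be given together");
    }
    if (result.count("prompt_file")) {
        return utils::read_prompt(result["prompt_file"].as<std::string>());
    }
    const std::string prompt = result["prompt"].as<std::string>();
    return prompt.empty() ? default_prompt : prompt;
}
```

With such a helper, the prompt handling in both benchmarks would reduce to a single call, e.g. `resolve_prompt(result, "The Sky is blue because")`.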
19 changes: 19 additions & 0 deletions samples/cpp/text_generation/read_prompt_from_file.cpp
@@ -0,0 +1,19 @@
// Copyright (C) 2023-2025 Intel Corporation
// SPDX-License-Identifier: Apache-2.0

#include <iostream>
#include <fstream>
#include "read_prompt_from_file.h"

std::string utils::read_prompt(const std::string& file_path) {
std::ifstream file(file_path);
if (file.is_open()) {
std::stringstream buffer;
buffer << file.rdbuf();
return buffer.str();
} else {
std::stringstream error_message;
error_message << "Error opening prompt file: '" << file_path << "'";
throw std::runtime_error{error_message.str()};
}
}
11 changes: 11 additions & 0 deletions samples/cpp/text_generation/read_prompt_from_file.h
@@ -0,0 +1,11 @@

// Copyright (C) 2023-2025 Intel Corporation
// SPDX-License-Identifier: Apache-2.0

#pragma once

#include <sstream>

namespace utils {
std::string read_prompt(const std::string& file_path);
}
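utils::read_prompt throws std::runtime_error when the file cannot be opened, so standalone callers may want to wrap it. A minimal usage sketch, assuming the header above is on the include path (prompt.txt is only an example file name):

```cpp
#include <cstdlib>
#include <iostream>
#include <stdexcept>
#include <string>
#include "read_prompt_from_file.h"

int main() {
    try {
        // Read the whole file into a single prompt string.
        std::string prompt = utils::read_prompt("prompt.txt");
        std::cout << "Read " << prompt.size() << " characters from prompt.txt" << std::endl;
    } catch (const std::runtime_error& error) {
        // read_prompt reports the failing path in the exception message.
        std::cerr << error.what() << std::endl;
        return EXIT_FAILURE;
    }
    return EXIT_SUCCESS;
}
```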
3 changes: 1 addition & 2 deletions samples/cpp/visual_language_chat/CMakeLists.txt
@@ -44,8 +44,7 @@ install(TARGETS encrypted_model_vlm
EXCLUDE_FROM_ALL)

# create benchmark executable

add_executable(benchmark_vlm benchmark_vlm.cpp load_image.cpp)
add_executable(benchmark_vlm benchmark_vlm.cpp load_image.cpp ../text_generation/read_prompt_from_file.cpp)
target_include_directories(benchmark_vlm PRIVATE "${CMAKE_BINARY_DIR}")
target_link_libraries(benchmark_vlm PRIVATE openvino::genai cxxopts::cxxopts)
set_target_properties(benchmark_vlm PROPERTIES
3 changes: 2 additions & 1 deletion samples/cpp/visual_language_chat/README.md
@@ -40,7 +40,8 @@ benchmark_vlm [OPTIONS]
### Options

- `-m, --model`(default: `.`): Path to the model and tokenizers base directory.
- `-p, --prompt` (default: `What is on the image?`): The prompt to generate text.
- `-p, --prompt` (default: ''): The prompt to generate text. If without `-p` and `--pf`, the default prompt is `"What is on the image?"`
- `--pf, --prompt_file` Read prompt from file.
- `-i, --image` (default: `image.jpg`): Path to the image.
- `-nw, --num_warmup` (default: `1`): Number of warmup iterations.
- `-mt, --max_new_tokens` (default: `20`): Maximal number of new tokens.
48 changes: 38 additions & 10 deletions samples/cpp/visual_language_chat/benchmark_vlm.cpp
@@ -6,14 +6,15 @@

#include "load_image.hpp"
#include <openvino/genai/visual_language/pipeline.hpp>

#include "../text_generation/read_prompt_from_file.h"

int main(int argc, char* argv[]) try {
cxxopts::Options options("benchmark_vlm", "Help command");

options.add_options()
("m,model", "Path to model and tokenizers base directory", cxxopts::value<std::string>()->default_value("."))
("p,prompt", "Prompt", cxxopts::value<std::string>()->default_value("What is on the image?"))
("p,prompt", "Prompt", cxxopts::value<std::string>()->default_value(""))
("pf,prompt_file", "Read prompt from file", cxxopts::value<std::string>())
("i,image", "Image", cxxopts::value<std::string>()->default_value("image.jpg"))
("nw,num_warmup", "Number of warmup iterations", cxxopts::value<size_t>()->default_value(std::to_string(1)))
("n,num_iter", "Number of iterations", cxxopts::value<size_t>()->default_value(std::to_string(3)))
@@ -35,30 +36,57 @@
return EXIT_SUCCESS;
}

std::string prompt = result["prompt"].as<std::string>();
std::string prompt;
if (result.count("prompt") && result.count("prompt_file")) {
std::cout << "Prompt and prompt file should not exist together!" << std::endl;
return EXIT_FAILURE;
} else {
if (result.count("prompt_file")) {
prompt = utils::read_prompt(result["prompt_file"].as<std::string>());
} else {
prompt = result["prompt"].as<std::string>().empty() ? "What is on the image?" : result["prompt"].as<std::string>();
}
}
if (prompt.empty()) {
std::cout << "Prompt is empty!" << std::endl;
return EXIT_FAILURE;
}

const std::string models_path = result["model"].as<std::string>();
const std::string image_path = result["image"].as<std::string>();
std::string device = result["device"].as<std::string>();
size_t num_warmup = result["num_warmup"].as<size_t>();
size_t num_iter = result["num_iter"].as<size_t>();
ov::Tensor image = utils::load_image(image_path);
std::vector<ov::Tensor> images = utils::load_images(image_path);

ov::genai::GenerationConfig config;
config.max_new_tokens = result["max_new_tokens"].as<size_t>();
config.ignore_eos = true;

ov::genai::SchedulerConfig scheduler_config;
scheduler_config.enable_prefix_caching = false;
scheduler_config.max_num_batched_tokens = std::numeric_limits<std::size_t>::max();

std::cout << ov::get_openvino_version() << std::endl;

ov::genai::VLMPipeline pipe(models_path, device, ov::genai::scheduler_config(scheduler_config));

auto input_data = pipe.get_tokenizer().encode(prompt);
size_t prompt_token_size = input_data.input_ids.get_shape()[1];
std::cout << "Number of images:" << images.size() << ", prompt token size:" << prompt_token_size << std::endl;

ov::genai::VLMPipeline pipe(models_path, device);

for (size_t i = 0; i < num_warmup; i++)
pipe.generate(prompt, ov::genai::image(image), ov::genai::generation_config(config));
pipe.generate(prompt, ov::genai::images(images), ov::genai::generation_config(config));

auto res = pipe.generate(prompt, ov::genai::image(image), ov::genai::generation_config(config));
auto res = pipe.generate(prompt, ov::genai::images(images), ov::genai::generation_config(config));
auto metrics = res.perf_metrics;
for (size_t i = 0; i < num_iter - 1; i++) {
res = pipe.generate(prompt, ov::genai::image(image), ov::genai::generation_config(config));
res = pipe.generate(prompt, ov::genai::images(images), ov::genai::generation_config(config));
metrics = metrics + res.perf_metrics;
}

std::cout << std::fixed << std::setprecision(2);
std::cout << "Output token size:" << res.perf_metrics.get_num_generated_tokens() << std::endl;
std::cout << "Load time: " << metrics.get_load_time() << " ms" << std::endl;
std::cout << "Generate time: " << metrics.get_generate_duration().mean << " ± " << metrics.get_generate_duration().std << " ms" << std::endl;
std::cout << "Tokenization time: " << metrics.get_tokenization_duration().mean << " ± " << metrics.get_tokenization_duration().std << " ms" << std::endl;
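The switch from utils::load_image to utils::load_images is what enables passing a directory of images ("Support read images from dir" in the commit list); the loader itself lives in load_image.cpp and is not shown in this diff. A minimal sketch of what a directory-aware loader could look like, assuming a single-image helper utils::load_image(const std::filesystem::path&) returning ov::Tensor is available (the load_images_sketch name is illustrative only):

```cpp
#include <filesystem>
#include <vector>
#include <openvino/runtime/tensor.hpp>
#include "load_image.hpp"

std::vector<ov::Tensor> load_images_sketch(const std::filesystem::path& input_path) {
    std::vector<ov::Tensor> images;
    if (std::filesystem::is_directory(input_path)) {
        // Treat every regular file in the directory as an image.
        for (const auto& entry : std::filesystem::directory_iterator(input_path)) {
            if (entry.is_regular_file()) {
                images.push_back(utils::load_image(entry.path()));
            }
        }
    } else {
        // A single file falls back to the one-image case.
        images.push_back(utils::load_image(input_path));
    }
    return images;
}
```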
3 changes: 2 additions & 1 deletion samples/python/text_generation/README.md
@@ -153,7 +153,8 @@ For more information how performance metrics are calculated please follow [perfo
```
#### Options
- `-m, --model`: Path to the model and tokenizers base directory.
- `-p, --prompt` (default: `"The Sky is blue because"`): The prompt to generate text.
- `-p, --prompt` (default: `None`): The prompt to generate text. If without `-p` and `-pf`, the default prompt is `"The Sky is blue because"`
- `-pf, --prompt_file` Read prompt from file.
- `-nw, --num_warmup` (default: `1`): Number of warmup iterations.
- `-mt, --max_new_tokens` (default: `20`): Maximal number of new tokens.
- `-n, --num_iter` (default: `3`): Number of iterations.
30 changes: 27 additions & 3 deletions samples/python/text_generation/benchmark_genai.py
@@ -1,23 +1,38 @@
# Copyright (C) 2023-2025 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

import sys
import argparse
import openvino_genai as ov_genai
from openvino import get_version

def main():
parser = argparse.ArgumentParser(description="Help command")
parser.add_argument("-m", "--model", type=str, required=True, help="Path to model and tokenizers base directory")
parser.add_argument("-p", "--prompt", type=str, default="The Sky is blue because", help="Prompt")
parser.add_argument("-p", "--prompt", type=str, default=None, help="Prompt")
parser.add_argument("-pf", "--prompt_file", type=str, help="Read prompt from file")
parser.add_argument("-nw", "--num_warmup", type=int, default=1, help="Number of warmup iterations")
parser.add_argument("-n", "--num_iter", type=int, default=2, help="Number of iterations")
parser.add_argument("-mt", "--max_new_tokens", type=int, default=20, help="Maximal number of new tokens")
parser.add_argument("-d", "--device", type=str, default="CPU", help="Device")

args = parser.parse_args()

if args.prompt is not None and args.prompt_file is not None:
raise RuntimeError(f'Prompt and prompt file should not exist together!')
else:
if args.prompt_file is not None:
with open(args.prompt_file, 'r', encoding='utf-8') as f:
prompt = [f.read()]
else:
prompt = ['The Sky is blue because'] if args.prompt is None else [args.prompt]
if len(prompt) == 0:
raise RuntimeError(f'Prompt is empty!')

print(f'openvino runtime version: {get_version()}')

# Perf metrics is stored in DecodedResults.
# In order to get DecodedResults instead of a string input should be a list.
prompt = [args.prompt]
models_path = args.model
device = args.device
num_warmup = args.num_warmup
@@ -26,8 +41,16 @@ def main():
config = ov_genai.GenerationConfig()
config.max_new_tokens = args.max_new_tokens

pipe = ov_genai.LLMPipeline(models_path, device)
scheduler_config = ov_genai.SchedulerConfig()
scheduler_config.enable_prefix_caching = False
scheduler_config.max_num_batched_tokens = sys.maxsize

pipe = ov_genai.LLMPipeline(models_path, device, scheduler_config=scheduler_config)

input_data = pipe.get_tokenizer().encode(prompt)
prompt_token_size = input_data.input_ids.get_shape()[1]
print(f"Prompt token size: {prompt_token_size}")

for _ in range(num_warmup):
pipe.generate(prompt, config)

@@ -37,6 +60,7 @@ def main():
res = pipe.generate(prompt, config)
perf_metrics += res.perf_metrics

print(f"Output token size: {res.perf_metrics.get_num_generated_tokens()}")
print(f"Load time: {perf_metrics.get_load_time():.2f} ms")
print(f"Generate time: {perf_metrics.get_generate_duration().mean:.2f} ± {perf_metrics.get_generate_duration().std:.2f} ms")
print(f"Tokenization time: {perf_metrics.get_tokenization_duration().mean:.2f} ± {perf_metrics.get_tokenization_duration().std:.2f} ms")
3 changes: 2 additions & 1 deletion samples/python/visual_language_chat/README.md
@@ -40,7 +40,8 @@ python benchmark_vlm.py [OPTIONS]
### Options

- `-m, --model`(default: `.`): Path to the model and tokenizers base directory.
- `-p, --prompt` (default: `What is on the image?`): The prompt to generate text.
- `-p, --prompt` (default: `None`): The prompt to generate text. If without `-p` and `-pf`, the default prompt is `"What is on the image?"`
- `-pf, --prompt_file` Read prompt from file.
- `-i, --image` (default: `image.jpg`): Path to the image.
- `-nw, --num_warmup` (default: `1`): Number of warmup iterations.
- `-mt, --max_new_tokens` (default: `20`): Maximal number of new tokens.