Commit fa51774

Merge branch 'master' into adrianboguszewski-patch-1
# Conflicts:
#	samples/python/speech_generation/README.md
2 parents a310203 + 62fc868 commit fa51774

11 files changed: +74 -27 lines changed

.github/workflows/linux.yml

Lines changed: 1 addition & 2 deletions
@@ -389,8 +389,7 @@ jobs:
 test:
   - name: 'Whisper'
     cmd: 'tests/python_tests/test_whisper_pipeline.py tests/python_tests/test_whisper_pipeline_static.py'
-    run_condition: false
-    # run_condition: ${{ fromJSON(needs.smart_ci.outputs.affected_components).whisper.test }}
+    run_condition: ${{ fromJSON(needs.smart_ci.outputs.affected_components).whisper.test }}
     timeout: 45
   - name: 'Cacheopt E2E'
     cmd: 'tests/python_tests/test_kv_cache_eviction.py'
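
The restored `run_condition` reads a per-component flag from the `smart_ci` job output instead of the hard-coded `false`. As a rough Python analogue of what that expression evaluates, here is a minimal sketch. The payload is hypothetical: the real `affected_components` JSON comes from the Smart CI step and its exact shape is not shown in this commit. Treating a missing component as "do not run" is likewise an assumption of this sketch.

```python
import json

# Hypothetical job output. Job outputs are strings, so structured data
# has to travel as JSON text and be parsed on the consuming side.
affected_components = '{"whisper": {"test": true, "build": true}}'


def run_condition(output: str, component: str) -> bool:
    """Rough analogue of `fromJSON(output).<component>.test`.

    Missing components or flags are treated as False (an assumption).
    """
    parsed = json.loads(output)
    return bool(parsed.get(component, {}).get("test", False))


whisper_enabled = run_condition(affected_components, "whisper")
other_enabled = run_condition(affected_components, "llm")
print(whisper_enabled, other_enabled)
```

With this change the Whisper tests run only when Smart CI reports the `whisper` component as affected, rather than never.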

.github/workflows/mac.yml

Lines changed: 1 addition & 2 deletions
@@ -392,8 +392,7 @@ jobs:
 test:
   - name: 'Whisper'
     cmd: 'tests/python_tests/test_whisper_pipeline.py'
-    run_condition: false
-    # run_condition: ${{ fromJSON(needs.smart_ci.outputs.affected_components).whisper.test }}
+    run_condition: ${{ fromJSON(needs.smart_ci.outputs.affected_components).whisper.test }}
     timeout: 45
   - name: 'Cacheopt E2E'
     cmd: 'tests/python_tests/test_kv_cache_eviction.py'

.github/workflows/windows.yml

Lines changed: 1 addition & 2 deletions
@@ -441,8 +441,7 @@ jobs:
 test:
   - name: 'Whisper'
     cmd: 'tests/python_tests/test_whisper_pipeline.py tests/python_tests/test_whisper_pipeline_static.py'
-    run_condition: false
-    # run_condition: ${{ fromJSON(needs.smart_ci.outputs.affected_components).whisper.test }}
+    run_condition: ${{ fromJSON(needs.smart_ci.outputs.affected_components).whisper.test }}
     timeout: 45
   - name: 'Cacheopt E2E'
     cmd: 'tests/python_tests/test_kv_cache_eviction.py'

README.md

Lines changed: 56 additions & 0 deletions
@@ -23,6 +23,7 @@ OpenVINO™ GenAI library provides very lightweight C++ and Python APIs to run f
 - Image generation using Diffuser models, for example, generation using Stable Diffusion models
 - Speech recognition using Whisper family models
 - Text generation using Large Visual Models, for instance, Image analysis using LLaVa or miniCPM models family
+- Text-to-speech generation using SpeechT5 TTS models
 - Text embedding for Retrieval-Augmented Generation (RAG). For example, compute embeddings for documents and queries to enable efficient retrieval in RAG workflows.

 Library efficiently supports LoRA adapters for Text and Image generation scenarios:
@@ -391,6 +392,61 @@ See [here](https://openvinotoolkit.github.io/openvino_notebooks/?search=Automati

 </details>

+## Performing text-to-speech generation
+<details>
+
+For more examples check out our [Generative AI workflow](https://docs.openvino.ai/2025/openvino-workflow-generative.html)
+
+NOTE: Currently, text-to-speech in OpenVINO GenAI supports the SpeechT5 TTS model. The generated audio signal is a single-channel (mono) waveform with a sampling rate of 16 kHz.
+
+### Converting text-to-speech model from Hugging Face library
+```sh
+# Download and convert to OpenVINO
+optimum-cli export openvino --model microsoft/speecht5_tts --model-kwargs "{\"vocoder\": \"microsoft/speecht5_hifigan\"}" ov_speecht5_tts
+```
+
+### Run generation using Text-to-speech API in Python
+
+NOTE: This sample is a simplified version of the full sample that is available [here](./samples/python/speech_generation/text2speech.py)
+
+```python
+import openvino_genai
+import soundfile as sf
+
+pipe = openvino_genai.Text2SpeechPipeline("ov_speecht5_tts", "CPU")
+
+# additionally, a speaker embedding can be specified as the target voice input to the generate method
+result = pipe.generate("Hello OpenVINO GenAI")
+speech = result.speeches[0]
+sf.write("output_audio.wav", speech.data[0], samplerate=16000)
+```
+
+
+### Run generation using Text-to-speech API in C++
+
+NOTE: This sample is a simplified version of the full sample that is available [here](./samples/cpp/speech_generation/text2speech.cpp)
+
+```cpp
+#include "audio_utils.hpp"
+#include "openvino/genai/speech_generation/text2speech_pipeline.hpp"
+
+int main(int argc, char* argv[]) {
+    ov::genai::Text2SpeechPipeline pipe("ov_speecht5_tts", "CPU");
+
+    // additionally, a speaker embedding can be specified as the target voice input to the generate method
+    auto gen_speech = pipe.generate("Hello OpenVINO GenAI");
+
+    auto waveform_size = gen_speech.speeches[0].get_size();
+    auto waveform_ptr = gen_speech.speeches[0].data<const float>();
+    auto bits_per_sample = gen_speech.speeches[0].get_element_type().bitwidth();
+    utils::audio::save_to_wav(waveform_ptr, waveform_size, "output_audio.wav", bits_per_sample);
+
+    return 0;
+}
+```
+
+</details>
+
 ## Text Embeddings
 <details>

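
The added README note says the generated signal is mono at 16 kHz, and the Python sample passes `samplerate=16000` when saving. As a library-independent sketch of what those two properties mean for the output file, the following uses only the Python standard library. A synthetic tone stands in for generated speech, and `write_mono_wav` is a hypothetical helper, not part of OpenVINO GenAI.

```python
import math
import os
import struct
import tempfile
import wave

SAMPLE_RATE = 16000  # 16 kHz, as stated in the README note


def write_mono_wav(path, samples, sample_rate=SAMPLE_RATE):
    """Write float samples in [-1.0, 1.0] as a 16-bit PCM mono WAV file."""
    pcm = b"".join(
        struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767)) for s in samples
    )
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)           # single channel (mono)
        wav.setsampwidth(2)           # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)


# Placeholder signal: one second of a 440 Hz tone instead of real speech.
tone = [0.5 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(SAMPLE_RATE)]
out_path = os.path.join(tempfile.mkdtemp(), "tone_16k.wav")
write_mono_wav(out_path, tone)

# Read the header back: (channels, sample rate, number of frames).
with wave.open(out_path, "rb") as wav:
    info = (wav.getnchannels(), wav.getframerate(), wav.getnframes())
print(info)
```

If the sample rate written to the header did not match the rate the waveform was generated at, playback would be pitched and timed incorrectly, which is why the samples state it explicitly.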

samples/cpp/speech_generation/README.md

Lines changed: 2 additions & 2 deletions
@@ -12,10 +12,10 @@ Install [../../export-requirements.txt](../../export-requirements.txt) to conver

 ```sh
 pip install --upgrade-strategy eager -r ../../export-requirements.txt
-optimum-cli export openvino --model microsoft/speecht5_tts --model-kwargs '{\"vocoder\": \"microsoft/speecht5_hifigan\"}' speecht5_tts
+optimum-cli export openvino --model microsoft/speecht5_tts --model-kwargs "{\"vocoder\": \"microsoft/speecht5_hifigan\"}" speecht5_tts
 ```

-**Note:** Currently, text-to-speech in OpenVINO GenAI supports the `SpeechT5 TTS` model.
+**Note:** Currently, text-to-speech in OpenVINO GenAI supports the `SpeechT5 TTS` model.
 When exporting the model, you must specify a vocoder using the `--model-kwargs` option in JSON format.

 ## Prepare speaker embedding file
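
The commit does not explain why the outer quotes around the `--model-kwargs` value changed from single to double. One plausible reading, assuming a POSIX shell, is that single quotes pass the backslashes through literally, so the argument is no longer valid JSON. The sketch below uses Python's `shlex` (which follows POSIX quoting rules) to show what each variant hands to the program. It does not call `optimum-cli`, and other shells such as Windows `cmd` apply different rules.

```python
import json
import shlex

# The two command lines from the diff, copied verbatim.
old_cmd = r"""optimum-cli export openvino --model microsoft/speecht5_tts --model-kwargs '{\"vocoder\": \"microsoft/speecht5_hifigan\"}' speecht5_tts"""
new_cmd = r"""optimum-cli export openvino --model microsoft/speecht5_tts --model-kwargs "{\"vocoder\": \"microsoft/speecht5_hifigan\"}" speecht5_tts"""


def model_kwargs_arg(cmd):
    """Return the argument that follows --model-kwargs after POSIX-style splitting."""
    argv = shlex.split(cmd)
    return argv[argv.index("--model-kwargs") + 1]


def is_valid_json(text):
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


old_arg = model_kwargs_arg(old_cmd)
new_arg = model_kwargs_arg(new_cmd)
print(old_arg, is_valid_json(old_arg))  # backslashes survive inside single quotes
print(new_arg, is_valid_json(new_arg))  # \" collapses to " inside double quotes
```

Under these assumptions only the double-quoted form yields a string that parses as the JSON object the note above asks for.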

samples/export-requirements.txt

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@
 --extra-index-url https://storage.openvinotoolkit.org/simple/wheels/pre-release
 --extra-index-url https://storage.openvinotoolkit.org/simple/wheels/nightly
 openvino-tokenizers~=2025.3.0.0.dev
-optimum-intel[nncf] @ git+https://github.com/huggingface/optimum-intel.git@dba7dced0145b539bb0563e5d5741d00daeb8025
+optimum-intel[nncf] @ git+https://github.com/huggingface/optimum-intel.git@main
 numpy<2.0.0; sys_platform == 'darwin'
 einops==0.8.1 # For Qwen
 transformers_stream_generator==0.0.5 # For Qwen

samples/python/speech_generation/README.md

Lines changed: 3 additions & 3 deletions
@@ -17,7 +17,7 @@ pip install --upgrade-strategy eager -r ../../export-requirements.txt
 Then, run the export with Optimum CLI:

 ```sh
-optimum-cli export openvino --model microsoft/speecht5_tts --model-kwargs '{\"vocoder\": \"microsoft/speecht5_hifigan\"}' speecht5_tts
+optimum-cli export openvino --model microsoft/speecht5_tts --model-kwargs "{\"vocoder\": \"microsoft/speecht5_hifigan\"}" speecht5_tts
 ```

 Alternatively, you can do it in Python code:
@@ -27,7 +27,7 @@ from optimum.exporters.openvino.convert import export_tokenizer
 from optimum.intel import OVModelForTextToSpeechSeq2Seq
 from transformers import AutoTokenizer

-output_dir = "tts_model"
+output_dir = "speecht5_tts"

 model = OVModelForTextToSpeechSeq2Seq.from_pretrained("microsoft/speecht5_tts", vocoder="microsoft/speecht5_hifigan", export=True)
 model.save_pretrained(output_dir)
@@ -36,7 +36,7 @@ tokenizer = AutoTokenizer.from_pretrained("microsoft/speecht5_tts")
 export_tokenizer(tokenizer, output_dir)
 ```

-**Note:** Currently, text-to-speech in OpenVINO GenAI supports the `SpeechT5 TTS` model.
+**Note:** Currently, text-to-speech in OpenVINO GenAI supports the `SpeechT5 TTS` model.
 When exporting the model, you must specify a vocoder using the `--model-kwargs` option in JSON format.

 ## Prepare speaker embedding file

samples/python/speech_generation/tmp.py

Lines changed: 0 additions & 11 deletions
This file was deleted.
