openvinotoolkit
diff --git a/.github/workflows/linux.yml b/.github/workflows/linux.yml
Lines changed: 1 addition & 2 deletions
diff --git a/.github/workflows/mac.yml b/.github/workflows/mac.yml
Lines changed: 1 addition & 2 deletions
diff --git a/.github/workflows/windows.yml b/.github/workflows/windows.yml
Lines changed: 1 addition & 2 deletions
diff --git a/README.md b/README.md
Lines changed: 56 additions & 0 deletions
diff --git a/samples/cpp/speech_generation/README.md b/samples/cpp/speech_generation/README.md
Lines changed: 2 additions & 2 deletions
diff --git a/samples/export-requirements.txt b/samples/export-requirements.txt
Lines changed: 1 addition & 1 deletion
diff --git a/samples/python/speech_generation/README.md b/samples/python/speech_generation/README.md
Lines changed: 3 additions & 3 deletions
diff --git a/samples/python/speech_generation/tmp.py b/samples/python/speech_generation/tmp.py
Lines changed: 0 additions & 11 deletions
@@ -389,8 +389,7 @@ jobs:
         test:
           - name: 'Whisper'
             cmd: 'tests/python_tests/test_whisper_pipeline.py tests/python_tests/test_whisper_pipeline_static.py'
-            run_condition: false
-            # run_condition: ${{ fromJSON(needs.smart_ci.outputs.affected_components).whisper.test }}
+            run_condition: ${{ fromJSON(needs.smart_ci.outputs.affected_components).whisper.test }}
             timeout: 45
           - name: 'Cacheopt E2E'
             cmd: 'tests/python_tests/test_kv_cache_eviction.py'
 
@@ -392,8 +392,7 @@ jobs:
         test:
           - name: 'Whisper'
             cmd: 'tests/python_tests/test_whisper_pipeline.py'
-            run_condition: false
-            # run_condition: ${{ fromJSON(needs.smart_ci.outputs.affected_components).whisper.test }}
+            run_condition: ${{ fromJSON(needs.smart_ci.outputs.affected_components).whisper.test }}
             timeout: 45
           - name: 'Cacheopt E2E'
             cmd: 'tests/python_tests/test_kv_cache_eviction.py'
 
@@ -441,8 +441,7 @@ jobs:
         test:
           - name: 'Whisper'
             cmd: 'tests/python_tests/test_whisper_pipeline.py tests/python_tests/test_whisper_pipeline_static.py'
-            run_condition: false
-            # run_condition: ${{ fromJSON(needs.smart_ci.outputs.affected_components).whisper.test }}
+            run_condition: ${{ fromJSON(needs.smart_ci.outputs.affected_components).whisper.test }}
             timeout: 45
           - name: 'Cacheopt E2E'
             cmd: 'tests/python_tests/test_kv_cache_eviction.py'
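
The `run_condition` restored in the three workflow hunks above reads a per-component flag out of a JSON string produced by the `smart_ci` job. A rough Python equivalent of that lookup is sketched below. The payload shape (`{"whisper": {"test": true}}`), the other component names, and the helper name `is_affected` are illustrative assumptions, not taken from the workflow files.

```python
import json


def is_affected(affected_components: str, component: str, scope: str = "test") -> bool:
    """Mimic `fromJSON(<output>).<component>.<scope>` for a JSON-encoded job output.

    Hypothetical helper: the real evaluation happens inside GitHub Actions.
    """
    components = json.loads(affected_components)
    # A missing component or scope is treated as "not affected" here.
    return bool(components.get(component, {}).get(scope, False))


# Assumed example payload; the actual Smart CI output may carry more fields.
payload = '{"whisper": {"test": true}, "LLM": {"test": false}}'
whisper_runs = is_affected(payload, "whisper")
llm_runs = is_affected(payload, "LLM")
missing_runs = is_affected(payload, "image_generation")
```

With the hard-coded `run_condition: false` removed, the Whisper job is gated on this flag again instead of being skipped unconditionally.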
 
@@ -23,6 +23,7 @@ OpenVINO™ GenAI library provides very lightweight C++ and Python APIs to run f
  - Image generation using Diffuser models, for example, generation using Stable Diffusion models
  - Speech recognition using Whisper family models
  - Text generation using Large Visual Models, for instance, Image analysis using LLaVa or miniCPM models family
+ - Text-to-speech generation using SpeechT5 TTS models
  - Text embedding for Retrieval-Augmented Generation (RAG). For example, compute embeddings for documents and queries to enable efficient retrieval in RAG workflows.
 
 Library efficiently supports LoRA adapters for Text and Image generation scenarios:
@@ -391,6 +392,61 @@ See [here](https://openvinotoolkit.github.io/openvino_notebooks/?search=Automati
 
 </details>
 
+## Performing text-to-speech generation
+<details>
+
+For more examples, check out our [Generative AI workflow](https://docs.openvino.ai/2025/openvino-workflow-generative.html).
+
+NOTE: Currently, text-to-speech in OpenVINO GenAI supports the SpeechT5 TTS model. The generated audio signal is a single-channel (mono) waveform with a sampling rate of 16 kHz.
+ 
+### Converting text-to-speech model from Hugging Face library
+```sh
+# Download and convert to OpenVINO
+optimum-cli export openvino --model microsoft/speecht5_tts --model-kwargs "{\"vocoder\": \"microsoft/speecht5_hifigan\"}" ov_speecht5_tts
+```
+
+### Run generation using Text-to-speech API in Python
+
+NOTE: This sample is a simplified version of the full sample that is available [here](./samples/python/speech_generation/text2speech.py)
+
+```python
+import openvino_genai
+import soundfile as sf
+
+pipe = openvino_genai.Text2SpeechPipeline("ov_speecht5_tts", "CPU")
+
+# Optionally, pass a speaker embedding to generate() to select the target voice.
+result = pipe.generate("Hello OpenVINO GenAI")
+speech = result.speeches[0]
+sf.write("output_audio.wav", speech.data[0], samplerate=16000)
+```
+
+ 
+### Run generation using Text-to-speech API in C++
+
+NOTE: This sample is a simplified version of the full sample that is available [here](./samples/cpp/speech_generation/text2speech.cpp)
+
+```cpp
+#include "audio_utils.hpp"
+#include "openvino/genai/speech_generation/text2speech_pipeline.hpp"
+
+int main(int argc, char* argv[]) {
+    ov::genai::Text2SpeechPipeline pipe("ov_speecht5_tts", "CPU");
+
+    // Optionally, pass a speaker embedding to generate() to select the target voice.
+    auto gen_speech = pipe.generate("Hello OpenVINO GenAI");
+
+    auto waveform_size = gen_speech.speeches[0].get_size();
+    auto waveform_ptr = gen_speech.speeches[0].data<const float>();
+    auto bits_per_sample = gen_speech.speeches[0].get_element_type().bitwidth();
+    utils::audio::save_to_wav(waveform_ptr, waveform_size, "output_audio.wav", bits_per_sample);
+
+    return 0;
+}
+```
+
+</details>
+
 ## Text Embeddings
 <details>
 
 
@@ -12,10 +12,10 @@ Install [../../export-requirements.txt](../../export-requirements.txt) to conver
 
 ```sh
 pip install --upgrade-strategy eager -r ../../export-requirements.txt
-optimum-cli export openvino --model microsoft/speecht5_tts --model-kwargs '{\"vocoder\": \"microsoft/speecht5_hifigan\"}' speecht5_tts
+optimum-cli export openvino --model microsoft/speecht5_tts --model-kwargs "{\"vocoder\": \"microsoft/speecht5_hifigan\"}" speecht5_tts
 ```
 
-**Note:** Currently, text-to-speech in OpenVINO GenAI supports the `SpeechT5 TTS` model.  
+**Note:** Currently, text-to-speech in OpenVINO GenAI supports the `SpeechT5 TTS` model.
 When exporting the model, you must specify a vocoder using the `--model-kwargs` option in JSON format.
 
 ## Prepare speaker embedding file
 
@@ -2,7 +2,7 @@
 --extra-index-url https://storage.openvinotoolkit.org/simple/wheels/pre-release
 --extra-index-url https://storage.openvinotoolkit.org/simple/wheels/nightly
 openvino-tokenizers~=2025.3.0.0.dev
-optimum-intel[nncf] @ git+https://github.com/huggingface/optimum-intel.git@dba7dced0145b539bb0563e5d5741d00daeb8025
+optimum-intel[nncf] @ git+https://github.com/huggingface/optimum-intel.git@main
 numpy<2.0.0; sys_platform == 'darwin'
 einops==0.8.1  # For Qwen
 transformers_stream_generator==0.0.5  # For Qwen
 
@@ -17,7 +17,7 @@ pip install --upgrade-strategy eager -r ../../export-requirements.txt
 Then, run the export with Optimum CLI:
 
 ```sh
-optimum-cli export openvino --model microsoft/speecht5_tts --model-kwargs '{\"vocoder\": \"microsoft/speecht5_hifigan\"}' speecht5_tts
+optimum-cli export openvino --model microsoft/speecht5_tts --model-kwargs "{\"vocoder\": \"microsoft/speecht5_hifigan\"}" speecht5_tts
 ```
 
 Alternatively, you can do it in Python code:
@@ -27,7 +27,7 @@ from optimum.exporters.openvino.convert import export_tokenizer
 from optimum.intel import OVModelForTextToSpeechSeq2Seq
 from transformers import AutoTokenizer
 
-output_dir = "tts_model"
+output_dir = "speecht5_tts"
 
 model = OVModelForTextToSpeechSeq2Seq.from_pretrained("microsoft/speecht5_tts", vocoder="microsoft/speecht5_hifigan", export=True)
 model.save_pretrained(output_dir)
@@ -36,7 +36,7 @@ tokenizer = AutoTokenizer.from_pretrained("microsoft/speecht5_tts")
 export_tokenizer(tokenizer, output_dir)
 ```
 
-**Note:** Currently, text-to-speech in OpenVINO GenAI supports the `SpeechT5 TTS` model.  
+**Note:** Currently, text-to-speech in OpenVINO GenAI supports the `SpeechT5 TTS` model. 
 When exporting the model, you must specify a vocoder using the `--model-kwargs` option in JSON format.
 
 ## Prepare speaker embedding file
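
The quoting change to `--model-kwargs` in the two `speech_generation` READMEs above can be checked without running the export. The sketch below assumes a POSIX-style shell and uses Python's `shlex` as a stand-in for it. Other shells (for example Windows `cmd`) apply different rules and are not modeled here. The helper names are hypothetical.

```python
import json
import shlex

# The command before and after the change, copied from the README hunks.
OLD = r"""optimum-cli export openvino --model microsoft/speecht5_tts --model-kwargs '{\"vocoder\": \"microsoft/speecht5_hifigan\"}' speecht5_tts"""
NEW = r"""optimum-cli export openvino --model microsoft/speecht5_tts --model-kwargs "{\"vocoder\": \"microsoft/speecht5_hifigan\"}" speecht5_tts"""


def model_kwargs_arg(command: str) -> str:
    """Return the argument that follows --model-kwargs after POSIX word splitting."""
    argv = shlex.split(command)  # applies POSIX quoting and escaping rules
    return argv[argv.index("--model-kwargs") + 1]


def parses_as_json(text: str) -> bool:
    """Report whether the text is valid JSON."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


old_arg = model_kwargs_arg(OLD)
new_arg = model_kwargs_arg(NEW)
print(old_arg, parses_as_json(old_arg))
print(new_arg, parses_as_json(new_arg))
```

Under these rules the single-quoted form keeps its backslashes and is rejected by `json.loads`, while the double-quoted form reaches the program as valid JSON with a `vocoder` key.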