README.md: 56 additions, 0 deletions
@@ -23,6 +23,7 @@ OpenVINO™ GenAI library provides very lightweight C++ and Python APIs to run f
 - Image generation using Diffuser models, for example, generation using Stable Diffusion models
 - Speech recognition using Whisper family models
 - Text generation using Large Visual Models, for instance, Image analysis using LLaVa or miniCPM models family
+- Text-to-speech generation using SpeechT5 TTS models
 - Text embedding for Retrieval-Augmented Generation (RAG). For example, compute embeddings for documents and queries to enable efficient retrieval in RAG workflows.

 Library efficiently supports LoRA adapters for Text and Image generation scenarios:
@@ -391,6 +392,61 @@ See [here](https://openvinotoolkit.github.io/openvino_notebooks/?search=Automati

 </details>

+## Performing text-to-speech generation
+<details>
+
+For more examples, check out our [Generative AI workflow](https://docs.openvino.ai/2025/openvino-workflow-generative.html)
+
+NOTE: Currently, text-to-speech in OpenVINO GenAI supports the SpeechT5 TTS model. The generated audio signal is a single-channel (mono) waveform with a sampling rate of 16 kHz.
+
+### Converting text-to-speech model from Hugging Face library
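The NOTE in the added README text states that the generated audio is a single-channel (mono) waveform sampled at 16 kHz. As a minimal sketch of what that format means in practice, the Python code below saves such a waveform as a 16-bit PCM WAV file using only the standard library's `wave` and `struct` modules. It assumes the waveform is available as a plain sequence of floats nominally in [-1.0, 1.0]; that assumption, the 16-bit output width, and the helper names `float_to_pcm16`, `write_mono_wav`, and `duration_seconds` are hypothetical and are not part of the OpenVINO GenAI API.

```python
import io
import struct
import wave

# Format stated in the README NOTE: single-channel (mono) audio at 16 kHz.
SAMPLE_RATE_HZ = 16_000
NUM_CHANNELS = 1
SAMPLE_WIDTH_BYTES = 2  # 16-bit PCM (a choice made for this sketch)


def float_to_pcm16(samples):
    """Clip float samples to [-1.0, 1.0] and scale them to signed 16-bit integers.

    Assumption (hypothetical): the waveform is a sequence of floats nominally in [-1.0, 1.0].
    """
    return [int(round(max(-1.0, min(1.0, float(s))) * 32767)) for s in samples]


def write_mono_wav(dest, samples, sample_rate=SAMPLE_RATE_HZ):
    """Write float samples as a mono 16-bit PCM WAV to a file path or binary file object."""
    pcm = float_to_pcm16(samples)
    with wave.open(dest, "wb") as wav_file:
        wav_file.setnchannels(NUM_CHANNELS)
        wav_file.setsampwidth(SAMPLE_WIDTH_BYTES)
        wav_file.setframerate(sample_rate)
        # "<" = little-endian (as WAV requires), "h" = signed 16-bit integer.
        wav_file.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))


def duration_seconds(num_samples, sample_rate=SAMPLE_RATE_HZ):
    """Length of a mono waveform in seconds: one sample every 1/sample_rate seconds."""
    return num_samples / sample_rate


if __name__ == "__main__":
    # Usage: one second of silence written to an in-memory buffer instead of a file.
    buffer = io.BytesIO()
    write_mono_wav(buffer, [0.0] * SAMPLE_RATE_HZ)
    print(f"{duration_seconds(SAMPLE_RATE_HZ)} s of audio, {len(buffer.getvalue())} bytes")
```

Clipping before scaling keeps a slightly out-of-range sample from making `struct.pack` raise an error. Because the audio is mono at 16 kHz, the duration is simply the sample count divided by 16,000; for example, 32,000 samples correspond to 2 seconds.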