Merged
6 changes: 0 additions & 6 deletions .azure_pipelines/olive-ci.yaml
@@ -92,9 +92,6 @@ jobs:
       resnet_ptq_cpu:
         exampleFolder: resnet
         exampleName: resnet_ptq_cpu
-      whisper:
-        exampleFolder: whisper
-        exampleName: whisper
       mobilenet_qnn_ep:
         exampleFolder: mobilenet/qnn
         exampleName: mobilenet_qnn_ep
@@ -112,9 +109,6 @@ jobs:
       resnet_ptq_cpu:
         exampleFolder: resnet
         exampleName: resnet_ptq_cpu
-      whisper:
-        exampleFolder: whisper
-        exampleName: whisper
       mobilenet_qnn_ep:
         exampleFolder: mobilenet/qnn
         exampleName: mobilenet_qnn_ep
6 changes: 0 additions & 6 deletions .azure_pipelines/olive-ort-nightly.yaml
@@ -61,9 +61,6 @@ jobs:
       resnet_qat:
         exampleFolder: resnet
         exampleName: resnet_qat
-      whisper:
-        exampleFolder: whisper
-        exampleName: whisper
       mobilenet_qnn_ep:
         exampleFolder: mobilenet/qnn
         exampleName: mobilenet_qnn_ep
@@ -82,9 +79,6 @@ jobs:
       resnet_ptq_cpu:
         exampleFolder: resnet
         exampleName: resnet_ptq_cpu
-      whisper:
-        exampleFolder: whisper
-        exampleName: whisper
       mobilenet_qnn_ep:
         exampleFolder: mobilenet/qnn
         exampleName: mobilenet_qnn_ep
1 change: 0 additions & 1 deletion docs/source/examples.md
@@ -16,7 +16,6 @@
 ||deberta|[Link](https://github.com/microsoft/Olive/tree/main/examples/deberta)|`GPU`: Optimize Azureml Registry Model with ONNX Runtime optimizations and quantization
 ||gptj|[Link](https://github.com/microsoft/Olive/tree/main/examples/gptj)|`CPU`: with Intel® Neural Compressor static/dynamic quantization for INT8 ONNX model
 ||bge|[Link](https://github.com/microsoft/Olive/tree/main/examples/bge)|`NPU`: with ONNX Runtime optimizations for QNN EP
-|Audio|whisper|[Link](https://github.com/microsoft/Olive/tree/main/examples/whisper)|`CPU`: with ONNX Runtime optimizations for all-in-one ONNX model in FP32<br>`CPU`: with ONNX Runtime optimizations for all-in-one ONNX model in INT8<br>`CPU`: with ONNX Runtime optimizations and Intel® Neural Compressor Dynamic Quantization for all-in-one ONNX model in INT8<br>`GPU`: with ONNX Runtime optimizations for all-in-one ONNX model in FP32<br>`GPU`: with ONNX Runtime optimizations for all-in-one ONNX model in FP16<br>`GPU`: with ONNX Runtime optimizations for all-in-one ONNX model in INT8
 ||audio spectrogram<br>transformer|[Link](https://github.com/microsoft/Olive/tree/main/examples/ast)|`CPU`: with ONNX Runtime optimizations and quantization for optimized INT8 ONNX model
 |Vision|stable diffusion|[Link](https://github.com/microsoft/Olive/tree/main/examples/stable_diffusion)|`GPU`: with ONNX Runtime optimization for DirectML EP<br>`GPU`: with ONNX Runtime optimization for CUDA EP<br>`Intel CPU`: with OpenVINO toolkit<br>`QDQ`: with ONNX Runtime static Quantization for ONNX INT8 model with QDQ format
 ||stable diffusion XL|[Link](https://github.com/microsoft/Olive/tree/main/examples/directml/stable_diffusion_xl)|`GPU`: with ONNX Runtime optimizations with DirectML EP<br>`GPU`: with ONNX Runtime optimization for CUDA EP
23 changes: 0 additions & 23 deletions docs/source/features/onnx-transformations.md
@@ -1017,16 +1017,6 @@ graph {
 }
 ```
 
-```json
-{
-    "type": "AppendPrePostProcessingOps",
-    "tool_command": "whisper",
-    "tool_command_args": {
-        "use_audio_decoder": true
-    }
-}
-```
-
 `AppendPrePostProcessingOps` also supports pre/post processing ops by leveraging the [onnxruntime-extension steps](https://github.com/microsoft/onnxruntime-extensions/tree/main/onnxruntime_extensions/tools/pre_post_processing/steps) and `PrePostProcessor`.
 You can refer to [here](https://github.com/microsoft/onnxruntime-extensions/blob/main/onnxruntime_extensions/tools/Example%20usage%20of%20the%20PrePostProcessor.md) to see how to leverage `PrePostProcessor` to customize pre and post processing ops.
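As supplementary context for the pass touched in this hunk (not part of the diff itself): with the whisper `tool_command` removed, `AppendPrePostProcessingOps` can still be configured through explicit `pre`/`post` step lists. A rough sketch follows; the step names (`ConvertImageToBGR`, `Softmax`) and the `target_opset` field are assumptions to verify against the Olive pass reference and the onnxruntime-extensions step catalog.

```json
{
    "type": "AppendPrePostProcessingOps",
    "pre": [
        { "ConvertImageToBGR": {} }
    ],
    "post": [
        { "Softmax": {} }
    ],
    "target_opset": 16
}
```

Since JSON has no comments, the hedge above stands in for one: treat every key here as illustrative, not authoritative.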

@@ -1130,19 +1120,6 @@ Here are some examples to describe the pre/post processing which is exactly same
 }
 ```
 
-## Insert Beam Search Op
-
-`InsertBeamSearch` chains two model components (for example, encoder and decoder) together by inserting beam search op in between them.
-
-### Example Configuration
-
-```json
-{
-    "type": "InsertBeamSearch",
-    "no_repeat_ngram_size": 4
-}
-```
-
 ## ORT Performance Tuning
 
 ONNX Runtime provides high performance across a range of hardware options through its Execution Providers interface for different execution
1 change: 0 additions & 1 deletion docs/source/reference/options.md
@@ -405,7 +405,6 @@ Please also find the detailed options from following table for each pass:
 | [IncQuantization](../../reference/pass.rst#_inc_quantization) | Quantize ONNX model with Intel® Neural Compressor where we can search for best parameters for static/dynamic quantization at same time. |
 | [VitisAIQuantization](../../reference/pass.rst#_vitis_ai_quantization) | AMD-Xilinx Vitis-AI Quantization Pass. |
 | [AppendPrePostProcessingOps](../../reference/pass.rst#_append_pre_post_processing) | Add Pre/Post nodes to the input model. |
-| [InsertBeamSearch](../../reference/pass.rst#_insert_beam_search) | Insert Beam Search Op. Only used for whisper models. Uses WhisperBeamSearch contrib op if ORT version >= 1.17.1, else uses BeamSearch contrib op. |
 | [ExtractAdapters](../../reference/pass.rst#_extract_adapters) | Extract adapters from ONNX model |
 | [CaptureSplitInfo](../../reference/pass.rst#_capture_split_info) | Capture the split information of the model layers. Only splits the transformer layers. |
 | [SplitModel](../../reference/pass.rst#_split_model) | Split an ONNX model into multiple smaller sub-models based on predefined assignments. |
6 changes: 0 additions & 6 deletions docs/source/reference/pass.rst
@@ -152,12 +152,6 @@ AppendPrePostProcessingOps
 --------------------------
 .. autoconfigclass:: olive.passes.AppendPrePostProcessingOps
 
-.. _insert_beam_search:
-
-InsertBeamSearch
-----------------
-.. autoconfigclass:: olive.passes.InsertBeamSearch
-
 .. _extract_adapters:
 
 ExtractAdapters
53 changes: 0 additions & 53 deletions examples/test/local/test_whisper.py

This file was deleted.

2 changes: 0 additions & 2 deletions examples/whisper/.gitignore

This file was deleted.

169 changes: 0 additions & 169 deletions examples/whisper/README.md

This file was deleted.
