
Conversation

@mingmingtasd (Collaborator) commented on Apr 9, 2025

Remove FP32 support for all tensors on OV EP NPU except input, constant, and cast_input.
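
For context, the intended rule can be summarized as: fp32 stays in the NPU data type limits only for graph inputs, constants, and inputs consumed by a Cast. A minimal sketch of that intent (the enum and function are hypothetical illustrations, not the actual EP code):

```cpp
// Hypothetical sketch: which tensor roles keep float32 in the NPU data
// type limits. Only graph inputs, constants, and inputs feeding a Cast
// keep fp32; every other tensor is restricted to fp16.
enum class TensorRole { kInput, kConstant, kCastInput, kOther };

bool AllowsFp32OnNpu(TensorRole role) {
  switch (role) {
    case TensorRole::kInput:
    case TensorRole::kConstant:
    case TensorRole::kCastInput:
      return true;  // fp32->fp16 models rely on fp32 in these roles.
    default:
      return false;  // all other tensors must be fp16 on the NPU.
  }
}
```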

@mingmingtasd (Collaborator, Author) commented:

  1. The reason for excluding input, constant, and cast_input is that models which cast float32 inputs to float16 are widely used, for example the fp16 models at https://webmachinelearning.github.io/webnn-samples/image_classification/ and some transformer models. (See the sketch after this list.)
  2. The reason for not restricting the NPU to fp16 only is that some WPT tests with uint8 input can pass even with CPU fallback disabled:
     [screenshot: WPT test results]
  3. This PR impacts the preview demos sd-turbo and sd-1.5, which have fp32 instance norm: the models will be split into many pieces by the WebNN EP. This can be resolved by using whole fp16 models for these two demos, which is already verified.
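
Point 1 refers to the common pattern where an fp32 graph input is immediately cast to fp16 before the model body. A minimal, self-contained sketch of how such a "cast_input" could be recognized (the graph types and helper below are hypothetical illustrations, not the ONNX Runtime graph API):

```cpp
#include <string>
#include <vector>

// Hypothetical minimal graph types for illustration only; the real
// implementation would use the ONNX Runtime graph API.
struct Node {
  std::string op_type;              // e.g. "Cast"
  std::vector<std::string> inputs;  // names of consumed tensors
};

// A tensor is a "cast_input" when every consumer is a Cast node, i.e.
// the fp32-input -> Cast -> fp16-body pattern used by the linked
// image-classification samples.
bool IsCastInput(const std::string& tensor_name,
                 const std::vector<Node>& consumers) {
  if (consumers.empty()) return false;
  for (const Node& node : consumers) {
    if (node.op_type != "Cast") return false;
  }
  return true;
}
```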

@huningxin @Honry @BruceDai

@huningxin (Collaborator) commented:

> The reason for not restricting the NPU to fp16 only is that some WPT tests with uint8 input can pass even with CPU fallback disabled:

https://docs.openvino.ai/2025/openvino-workflow/running-inference/inference-devices-and-modes/npu-device.html only mentions:

> Supported Inference Data Types
> The NPU plugin supports the following data types as inference precision of internal primitives:
> - Floating-point data types: F32, F16
> - Quantized data types: U8 (quantized models may be INT8 or mixed FP16-INT8)
>
> Computation precision for the HW is FP16.

How about int32/uint32?
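
For reference, OpenVINO exposes a plugin's advertised capabilities through the public `ov::device::capabilities` property; a minimal sketch of probing it (it only returns coarse capability strings such as "FP16" or "INT8", so int32/uint32 coverage still needs empirical testing):

```cpp
#include <iostream>
#include <string>
#include <openvino/openvino.hpp>

// Sketch: print the NPU plugin's advertised capability strings.
// These are coarse (e.g. "FP16", "INT8"), not a per-op list of
// supported element types, so int32/uint32 support still has to be
// verified empirically.
int main() {
  ov::Core core;
  for (const std::string& cap :
       core.get_property("NPU", ov::device::capabilities)) {
    std::cout << cap << "\n";
  }
  return 0;
}
```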

> This PR impacts the preview demos sd-turbo and sd-1.5, which have fp32 instance norm: the models will be split into many pieces by the WebNN EP. This can be resolved by using whole fp16 models for these two demos, which is already verified.

Should we hold this PR until fp16 sd-turbo and sd-1.5 models are deployed?

@mingmingtasd (Collaborator, Author) commented:

> Should we hold this PR until fp16 sd-turbo and sd-1.5 models are deployed?

Sure!

> How about int32/uint32?

Based on this PR and the current ORT OV EP, we have no tests that can verify this. So I cleared `ops_supported_only_in_model` in the OV EP (made it empty) to check:
```cpp
std::set<std::string> ops_supported_only_in_model = {
    // "Add",
    // "Cast",
    // "Celu",
    // "Concat",
    // "ConstantOfShape",
    // "DequantizeLinear",
    // "Dropout",
    // "Einsum",
    // "Exp",
    // "Expand",
    // "EyeLike",
    // "GatherElements",
    // "GatherND",
    // "GridSample",
    // "Identity",
    // "LayerNormalization",
    // "Loop",
    // "LSTM",
    // "NonMaxSuppression",
    // "NonZero",
    // "Not",
    // "OneHot",
    // "Pad",
    // "QuantizeLinear",
    // "RandomNormalLike",
    // "Range",
    // "ReduceMin",
    // "Resize",
    // "Round",
    // "Shape",
    // "Slice",
    // "Split",
    // "Tile",
    // "TopK",
    // "Trilu"
};
```

At least, even with CPU fallback disabled:

  1. cast can pass some int32/int64/int8/uint8 tests, but the uint32 tests fail.
  2. sign int32 tests crash.

[screenshots: WPT test results]
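
If these results hold, the failing types could be carved out per op. A hypothetical sketch of such a restriction table (names invented for illustration; not the actual data type limits code):

```cpp
#include <set>
#include <string>

// Hypothetical sketch of per-op data type restrictions on the NPU,
// reflecting the empirical results above: Cast works for
// int32/int64/int8/uint8 but not uint32, and Sign crashes on int32.
const std::set<std::string> kCastUnsupportedOnNpu = {"uint32"};
const std::set<std::string> kSignUnsupportedOnNpu = {"int32"};

bool IsTypeSupported(const std::string& op, const std::string& data_type) {
  if (op == "Cast") return kCastUnsupportedOnNpu.count(data_type) == 0;
  if (op == "Sign") return kSignUnsupportedOnNpu.count(data_type) == 0;
  return true;  // other ops are left unchanged in this sketch
}
```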

@mingmingtasd changed the title from "remove fp32 in data type limits for OV NPU" to "[DO NOT SUBMIT] Remove fp32 in data type limits for OV NPU" on Apr 21, 2025