[Web] WebGPU issues tracking

This issue tracks WebGPU-related problems. The WebGPU EP has been available since ONNX Runtime Web v1.15.0 as an experimental feature. We are working on improving stability, operator coverage, and performance.

For the list of supported/WIP operators, comments, or any operator-specific issues, see #15952.

------------------

## Can not consume
Q: How to build?
A: To build ort-web with WebGPU support from source, please refer to this [gist](https://gist.github.com/fs-eire/a55b2c7e10a6864b9602c279b8b75dce).

Q: [[Web] An error occurred during model execution: "TypeError: Cannot read properties of undefined (reading 'apply')".](https://github.com/microsoft/onnxruntime/issues/15719#top)
A: https://github.com/microsoft/onnxruntime/pull/15780 <--- this PR fixed it

Q: `no available backend found. ERR: ...`
A: Make sure WebGPU is available in the current context: upgrade to the latest Chrome or Edge (v113 or later) and serve the page from a secure location (https or localhost).
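
To narrow down which of these conditions is failing, a quick check like the one below can be run in the page before creating a session. This is only a sketch: the function name and the returned labels are made up for illustration, and it relies only on the standard browser globals `isSecureContext` and `navigator.gpu`.

```javascript
// Hypothetical helper: reports why WebGPU might be unavailable in `scope`.
// `scope` defaults to the global object; passing it in keeps this testable.
async function diagnoseWebGpu(scope = globalThis) {
  if (scope.isSecureContext === false) {
    // The page is not served from https or localhost.
    return 'insecure-context';
  }
  const gpu = scope.navigator && scope.navigator.gpu;
  if (!gpu) {
    // The browser is too old or WebGPU is disabled.
    return 'no-navigator-gpu';
  }
  // requestAdapter() resolves to null when no suitable adapter exists.
  const adapter = await gpu.requestAdapter();
  return adapter ? 'ok' : 'no-adapter';
}
```

In a browser console, `diagnoseWebGpu().then(console.log)` prints one of the labels above.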

## Runtime failures

Q: `Non-zero status code returned while running Transpose node. ....`
A: #15819 <--- This PR should fix it

Q: Crash in the transpose optimizer for various models (#15869: cannot load model https://huggingface.co/runwayml/stable-diffusion-v1-5/tree/onnx/vae_encoder)
A: The issue is being investigated; see the PR for detailed info.

## Kernel coverage or running slow

Q: General investigation tips?
A: A few tools can help take a deeper look (don't enable them together; that generates too many logs):
- `env.logLevel = 'verbose'; env.debug = true;` - This makes onnxruntime-web output logs that help analyse the execution, including which operators run on WebGPU and which fall back to CPU. To reduce the performance cost of fallback we need to improve operator coverage. I can help implement the missing ops.
- `env.webgpu.profilingMode = 'default';` - This outputs a large number of logs to the console for each WebGPU shader; aggregating and analysing them shows which shader is slow. Chrome/Edge needs to be launched with the flag `--disable-dawn-features=disallow_unsafe_apis`.
- Set `sessionOptions.enableProfiling = true` when creating the inference session. This shows which operators run on GPU and which fall back to CPU.
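
Since the three settings in the list above should not be combined, one way to keep them apart is a small switch that applies exactly one of them. This is a sketch, not part of onnxruntime-web: the helper name and the mode names are hypothetical, while the flags themselves are the ones listed above.

```javascript
// Hypothetical helper: applies exactly one diagnostic setting.
// `env` is the onnxruntime-web env object; `sessionOptions` is the object
// later passed when creating the inference session.
function applyDiagnostics(env, mode, sessionOptions = {}) {
  switch (mode) {
    case 'verbose-log': // shows which ops run on WebGPU vs. CPU fallback
      env.logLevel = 'verbose';
      env.debug = true;
      break;
    case 'shader-profiling': // per-shader logs; needs the browser flag above
      env.webgpu.profilingMode = 'default';
      break;
    case 'session-profiling': // operator-level GPU vs. CPU report
      sessionOptions.enableProfiling = true;
      break;
    default:
      throw new Error(`unknown diagnostic mode: ${mode}`);
  }
  return sessionOptions;
}

// Usage in a page, assuming `ort` is the imported onnxruntime-web module
// and 'model.onnx' is a placeholder path:
//   const options = applyDiagnostics(ort.env, 'session-profiling',
//                                    { executionProviders: ['webgpu'] });
//   const session = await ort.InferenceSession.create('model.onnx', options);
```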

Q: running slow on [image classification model](https://huggingface.co/Xenova/transformers.js/tree/main/quantized/google/vit-base-patch16-224/image-classification). ([logs](https://github.com/microsoft/onnxruntime/files/11354629/localhost-1682691588176.log))
A: `jsepCopyGpuToCpu` occurred 114 times, which indicates frequent CPU <--> GPU data transfer. Implementing the missing operators may improve performance.
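
A count like this can be obtained from a saved console log with a plain substring count. The sketch below assumes nothing about the log format beyond the marker text appearing once per transfer; the function name is made up for illustration.

```javascript
// Hypothetical helper: counts non-overlapping occurrences of `marker`
// in the text of a saved console log.
function countOccurrences(logText, marker) {
  if (marker === '') return 0; // avoid counting every character boundary
  return logText.split(marker).length - 1;
}

// Usage, assuming `logText` holds the contents of the saved log file:
//   const transfers = countOccurrences(logText, 'jsepCopyGpuToCpu');
```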
