This issue tracks WebGPU-related problems. The WebGPU EP has been available since ONNX Runtime Web v1.15.0 as an experimental feature. We are working on improving stability, operator coverage, and performance.
For a list of supported/WIP operators, comments or any operator specific issues: #15952
Can not consume
Q: How to build?
A: To build ort-web with WebGPU support from source, please refer to this gist.
Q: [Web] An error occurred during model execution: "TypeError: Cannot read properties of undefined (reading 'apply')".
A: #15780 <--- this PR fixed it
Q: no available backend found. ERR: ...
A: Make sure WebGPU is available in the current context: upgrade to the latest Chrome or Edge (v113 or later), and serve the page from a secure location (https or localhost).
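The two prerequisites above can be checked before creating a session. The following is a minimal sketch, not part of onnxruntime-web: the names `ContextLike`, `webGpuPrerequisiteProblems`, and `hasWebGpuAdapter` are hypothetical helpers, and the small structural types exist only so the code compiles without extra WebGPU type packages. It relies on the standard browser globals `isSecureContext` and `navigator.gpu`.

```typescript
// Minimal structural types (assumption: only the members used below are modeled).
interface GpuLike {
  requestAdapter(): Promise<unknown>;
}
interface ContextLike {
  isSecureContext?: boolean;
  navigator?: { gpu?: GpuLike };
}

// Returns human-readable problems; an empty array means the basic
// prerequisites (secure context, navigator.gpu present) look satisfied.
function webGpuPrerequisiteProblems(ctx: ContextLike): string[] {
  const problems: string[] = [];
  if (ctx.isSecureContext !== true) {
    problems.push('page is not in a secure context (use https or localhost)');
  }
  if (!ctx.navigator || !ctx.navigator.gpu) {
    problems.push('navigator.gpu is missing (WebGPU is not available in this browser)');
  }
  return problems;
}

// navigator.gpu can exist while no adapter is available, so this
// additionally asks the browser for an adapter.
async function hasWebGpuAdapter(ctx: ContextLike): Promise<boolean> {
  const gpu = ctx.navigator?.gpu;
  if (!gpu) return false;
  const adapter = await gpu.requestAdapter();
  return adapter != null;
}
```

In a page, the helpers could be called with `globalThis as unknown as ContextLike`; a non-empty result from `webGpuPrerequisiteProblems` would be consistent with the "no available backend found" error.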
Runtime failures
Q: Non-zero status code returned while running Transpose node. ....
A: #15819 <--- This PR should fix it
Q: crash in the transpose optimizer for various models (#15869: cannot load model https://huggingface.co/runwayml/stable-diffusion-v1-5/tree/onnx/vae_encoder)
A: issue being investigated - see the PR for detailed info
Kernel coverage or running slow
Q: General investigation tips?
A: A few tools can help you take a deeper look (don't use them all together; that generates too many logs):
- env.logLevel = 'verbose'; env.debug = true;
  This makes onnxruntime-web output logs that help analyze the execution, including which operators run on WebGPU and which run on CPU (fallback). To fix slowness caused by fallback, we need to improve operator coverage. I can help implement the missing ops.
- env.webgpu.profilingMode = 'default';
  This outputs quite a lot of logs to the console for each WebGPU shader. By aggregating and analyzing them, we can find out which shader is slow. Chrome/Edge needs to be launched with the flag --disable-dawn-features=disallow_unsafe_apis.
- Set sessionOptions.enableProfiling = true when creating the inference session. This shows which operators run on the GPU and which fall back to CPU.
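Since the three tools are best used one at a time, one way to keep them separate is a small switch. This is only a sketch under stated assumptions: `OrtEnvLike` and `SessionOptionsLike` are hypothetical structural stand-ins for the `env` object and the session options of onnxruntime-web, modeling only the properties named above, and `configureDiagnostics` and the mode names are made up for illustration.

```typescript
// Hypothetical mode names, one per tool listed above.
type DiagnosticMode = 'verbose-logs' | 'shader-profiling' | 'session-profiling';

// Stand-ins for the real objects; only the properties mentioned in this issue.
interface OrtEnvLike {
  logLevel?: string;
  debug?: boolean;
  webgpu: { profilingMode?: string };
}
interface SessionOptionsLike {
  enableProfiling?: boolean;
}

// Enables exactly one diagnostic tool and returns the session options
// to pass when creating the inference session.
function configureDiagnostics(env: OrtEnvLike, mode: DiagnosticMode): SessionOptionsLike {
  const sessionOptions: SessionOptionsLike = {};
  switch (mode) {
    case 'verbose-logs':
      // Logs which operators run on WebGPU and which fall back to CPU.
      env.logLevel = 'verbose';
      env.debug = true;
      break;
    case 'shader-profiling':
      // Per-shader logs; Chrome/Edge needs the launch flag
      // --disable-dawn-features=disallow_unsafe_apis for this mode.
      env.webgpu.profilingMode = 'default';
      break;
    case 'session-profiling':
      // Shows which operators ran on GPU and which fell back to CPU.
      sessionOptions.enableProfiling = true;
      break;
  }
  return sessionOptions;
}
```

With the real library, the `env` export of onnxruntime-web would presumably be passed as the first argument, and the returned options merged into whatever is given at session creation.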
Q: running slow on image classification model. (logs)
A: jsepCopyGpuToCpu occurred 114 times, which indicates frequent CPU <--> GPU data transfer. Adding implementations of the missing operators may improve performance.
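A count like the one above can be obtained by scanning the captured console output for the marker string. The sketch below assumes the logs have been saved as plain text; `countLogMarker` is a hypothetical helper, and it makes no assumption about the log line format beyond the marker appearing verbatim.

```typescript
// Counts non-overlapping occurrences of `marker` in `logText`.
function countLogMarker(logText: string, marker: string): number {
  if (marker.length === 0) return 0; // an empty marker would match everywhere
  let count = 0;
  let index = logText.indexOf(marker);
  while (index !== -1) {
    count += 1;
    // Continue searching after the end of the current match.
    index = logText.indexOf(marker, index + marker.length);
  }
  return count;
}
```

For example, `countLogMarker(savedLogs, 'jsepCopyGpuToCpu')` would give the number of GPU-to-CPU copies recorded in `savedLogs`, where `savedLogs` stands for whatever string holds the captured output.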