Commit 087ef42

pull[bot], qti-yuduo, Akupadhye, NingW101, gedoensmax authored
[pull] main from microsoft:main (#152)
* [QNN EP] Fix pool with reshape name conflicts (microsoft#25332)
  Fixes naming conflicts that occur when the expand-pool2d-squeeze logic (implemented as reshape) is invoked during ONNX -> QNN op lowering. Models with multiple 1D pool ops would hit this issue.

* Added creation of QDQ for TopK node (microsoft#25309)
  - Added TopK in registry.py so as to create QDQ nodes for the op
  - Ensure that both the input and output quantization params are equal
  - Added unit test to verify the creation of QDQ nodes for TopK
  ### Description:
  Added support for creating QDQ nodes for TopK when quantizing with the ORT static quantization tool.
  ### Motivation and Context:
  ORT could already form a node unit for the TopK operator when QDQ nodes are present and the input and output quantization params are equal, but the ORT static quantization tool had no support for creating QDQ nodes for TopK.

* [WebNN] Refactor webnn op input rank check and add validation for ops (microsoft#25185)
  ### Description
  Adds an input rank range check for WebNN ops.
  ### Motivation and Context
  - refactor webnn op input rank check
  - add validation for various ops
  - use the `gemm` op as an example of performing input rank checks on decomposed ops
  @Honry @fdwr PTAL

* Make TRT plugins optional (microsoft#25261)
  ### Description
  The parser no longer links against the plugin library; it loads it dynamically instead. Because of that, I think we should also make the library optional in ORT.
  @chilo-ms

* [EP ABI] Add Graph_GetGraphView API to get an OrtGraph from a subset of nodes (microsoft#25191)
  Added an API that creates a sub-graph from a set of nodes in an OrtGraph. This API is needed for porting GetCapability to the EP ABI, when an EP wants to check whether a 'sub-graph' of the graph is supported by the hardware backend.

* [webgpu] a few optimizations to WGSL template (microsoft#25333)
  ### Description
  This change is a follow-up to microsoft#25130.
  - consume duktape from vcpkg if --use_vcpkg is specified
  - ~~add a Windows CI pipeline for dynamic WGSL template~~ (Will do in a separate PR)
  - upgrade wgsl-template package from 0.1.10 to 0.1.13
  - support adding contribop folder as input

---------

Co-authored-by: qti-yuduo <[email protected]>
Co-authored-by: Akupadhye <[email protected]>
Co-authored-by: Wang Ning <[email protected]>
Co-authored-by: Maximilian Müller <[email protected]>
Co-authored-by: Chi Lo <[email protected]>
Co-authored-by: Yulong Wang <[email protected]>
1 parent 14e0ad7 commit 087ef42

38 files changed, with 617 additions and 140 deletions

cmake/external/onnxruntime_external_deps.cmake

Lines changed: 18 additions & 7 deletions
@@ -774,13 +774,24 @@ if (onnxruntime_USE_WEBGPU)
 endif()

 if (NOT CMAKE_SYSTEM_NAME STREQUAL "Emscripten" AND onnxruntime_WGSL_TEMPLATE STREQUAL "dynamic")
-onnxruntime_fetchcontent_declare(
-duktape
-URL ${DEP_URL_duktape}
-URL_HASH SHA1=${DEP_SHA1_duktape}
-EXCLUDE_FROM_ALL
-)
-onnxruntime_fetchcontent_makeavailable(duktape)
+if(onnxruntime_USE_VCPKG)
+find_package(unofficial-duktape CONFIG REQUIRED)
+add_library(duktape_static ALIAS unofficial::duktape::duktape)
+else()
+onnxruntime_fetchcontent_declare(
+duktape
+URL ${DEP_URL_duktape}
+URL_HASH SHA1=${DEP_SHA1_duktape}
+EXCLUDE_FROM_ALL
+)
+onnxruntime_fetchcontent_makeavailable(duktape)
+
+if(NOT TARGET duktape_static)
+add_library(duktape_static STATIC "${duktape_SOURCE_DIR}/src/duktape.c")
+target_compile_features(duktape_static PRIVATE c_std_99)
+target_include_directories(duktape_static INTERFACE $<BUILD_INTERFACE:${duktape_SOURCE_DIR}/src>)
+endif()
+endif()
 endif()
 endif()

cmake/onnxruntime_providers_tensorrt.cmake

Lines changed: 5 additions & 18 deletions
@@ -72,26 +72,21 @@
 endif()

 # TensorRT 10 GA onwards, the TensorRT libraries will have major version appended to the end on Windows,
-# for example, nvinfer_10.dll, nvinfer_plugin_10.dll, nvonnxparser_10.dll ...
+# for example, nvinfer_10.dll, nvonnxparser_10.dll ...
 if (WIN32 AND TRT_GREATER_OR_EQUAL_TRT_10_GA)
 set(NVINFER_LIB "nvinfer_${NV_TENSORRT_MAJOR}")
-set(NVINFER_PLUGIN_LIB "nvinfer_plugin_${NV_TENSORRT_MAJOR}")
 set(PARSER_LIB "nvonnxparser_${NV_TENSORRT_MAJOR}")
 endif()

 if (NOT NVINFER_LIB)
 set(NVINFER_LIB "nvinfer")
 endif()

-if (NOT NVINFER_PLUGIN_LIB)
-set(NVINFER_PLUGIN_LIB "nvinfer_plugin")
-endif()
-
 if (NOT PARSER_LIB)
 set(PARSER_LIB "nvonnxparser")
 endif()

-MESSAGE(STATUS "Looking for ${NVINFER_LIB} and ${NVINFER_PLUGIN_LIB}")
+MESSAGE(STATUS "Looking for ${NVINFER_LIB}")

 find_library(TENSORRT_LIBRARY_INFER ${NVINFER_LIB}
 HINTS ${TENSORRT_ROOT}
@@ -101,14 +96,6 @@
 MESSAGE(STATUS "Can't find ${NVINFER_LIB}")
 endif()

-find_library(TENSORRT_LIBRARY_INFER_PLUGIN ${NVINFER_PLUGIN_LIB}
-HINTS ${TENSORRT_ROOT}
-PATH_SUFFIXES lib lib64 lib/x64)
-
-if (NOT TENSORRT_LIBRARY_INFER_PLUGIN)
-MESSAGE(STATUS "Can't find ${NVINFER_PLUGIN_LIB}")
-endif()
-
 if (onnxruntime_USE_TENSORRT_BUILTIN_PARSER)
 MESSAGE(STATUS "Looking for ${PARSER_LIB}")

@@ -120,7 +107,7 @@
 MESSAGE(STATUS "Can't find ${PARSER_LIB}")
 endif()

-set(TENSORRT_LIBRARY ${TENSORRT_LIBRARY_INFER} ${TENSORRT_LIBRARY_INFER_PLUGIN} ${TENSORRT_LIBRARY_NVONNXPARSER})
+set(TENSORRT_LIBRARY ${TENSORRT_LIBRARY_INFER} ${TENSORRT_LIBRARY_NVONNXPARSER})
 MESSAGE(STATUS "Find TensorRT libs at ${TENSORRT_LIBRARY}")
 else()
 if (TRT_GREATER_OR_EQUAL_TRT_10_GA)
@@ -153,15 +140,15 @@
 endif()
 # Static libraries are just nvonnxparser_static on all platforms
 set(onnxparser_link_libs nvonnxparser_static)
-set(TENSORRT_LIBRARY ${TENSORRT_LIBRARY_INFER} ${TENSORRT_LIBRARY_INFER_PLUGIN})
+set(TENSORRT_LIBRARY ${TENSORRT_LIBRARY_INFER})
 MESSAGE(STATUS "Find TensorRT libs at ${TENSORRT_LIBRARY}")
 endif()

 # ${TENSORRT_LIBRARY} is empty if we link nvonnxparser_static.
 # nvonnxparser_static is linked against tensorrt libraries in onnx-tensorrt
 # See https://github.com/onnx/onnx-tensorrt/blob/8af13d1b106f58df1e98945a5e7c851ddb5f0791/CMakeLists.txt#L121
 # However, starting from TRT 10 GA, nvonnxparser_static doesn't link against tensorrt libraries.
-# Therefore, the above code finds ${TENSORRT_LIBRARY_INFER} and ${TENSORRT_LIBRARY_INFER_PLUGIN}.
+# Therefore, the above code finds ${TENSORRT_LIBRARY_INFER}.
 if(onnxruntime_CUDA_MINIMAL)
 set(trt_link_libs ${CMAKE_DL_LIBS} ${TENSORRT_LIBRARY})
 else()
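The commit message says the TensorRT ONNX parser now loads the plugin library dynamically instead of linking against it, which is why the `nvinfer_plugin` lookup can be dropped from this CMake file. As a rough illustration of that general technique (not the parser's or ORT's actual code), the POSIX-only sketch below tries to `dlopen` an optional library and records a missing file as "feature unavailable" rather than failing. `OptionalLibrary` is a hypothetical helper; on Windows the equivalent calls would be `LoadLibrary`/`GetProcAddress`.

```cpp
// POSIX-only sketch: load an optional shared library at runtime instead of
// linking against it. OptionalLibrary is a hypothetical helper type.
#include <cassert>
#include <dlfcn.h>
#include <string>

class OptionalLibrary {
 public:
  explicit OptionalLibrary(const std::string& file_name)
      : handle_(dlopen(file_name.c_str(), RTLD_LAZY | RTLD_LOCAL)) {
    if (handle_ == nullptr) {
      // A missing library is recorded, not treated as a fatal error.
      const char* err = dlerror();
      error_ = err != nullptr ? err : "unknown dlopen error";
    }
  }

  ~OptionalLibrary() {
    if (handle_ != nullptr) dlclose(handle_);
  }

  OptionalLibrary(const OptionalLibrary&) = delete;
  OptionalLibrary& operator=(const OptionalLibrary&) = delete;

  bool available() const { return handle_ != nullptr; }
  const std::string& error() const { return error_; }

  // Looks up a symbol; returns nullptr when the library or symbol is absent.
  void* symbol(const char* name) const {
    return handle_ != nullptr ? dlsym(handle_, name) : nullptr;
  }

 private:
  void* handle_;
  std::string error_;
};
```

A caller might write `OptionalLibrary plugins("libnvinfer_plugin.so.10");` and only register plugin-dependent features when `plugins.available()` is true; that file name is an assumption for a TensorRT 10 Linux install. On glibc releases older than 2.34, link with `-ldl`.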

cmake/onnxruntime_providers_webgpu.cmake

Lines changed: 6 additions & 5 deletions
@@ -172,10 +172,12 @@
 file(MAKE_DIRECTORY ${WGSL_GENERATED_DIR})

 # Find all WGSL template input files
-file(GLOB_RECURSE WGSL_TEMPLATE_FILES "${ONNXRUNTIME_ROOT}/core/providers/webgpu/*.wgsl.template")
+file(GLOB_RECURSE WGSL_TEMPLATE_FILES
+"${ONNXRUNTIME_ROOT}/core/providers/webgpu/*.wgsl.template"
+"${ONNXRUNTIME_ROOT}/contrib_ops/webgpu/*.wgsl.template")

 # Set wgsl-gen command line options as a list
-set(WGSL_GEN_OPTIONS "-i" "../" "--output" "${WGSL_GENERATED_DIR}" "-I" "wgsl_template_gen/" "--preserve-code-ref" "--verbose")
+set(WGSL_GEN_OPTIONS "-i" "${ONNXRUNTIME_ROOT}/core/providers/webgpu/" "-i" "${ONNXRUNTIME_ROOT}/contrib_ops/webgpu/" "--output" "${WGSL_GENERATED_DIR}" "-I" "wgsl_template_gen/" "--preserve-code-ref" "--verbose")
 if (onnxruntime_WGSL_TEMPLATE STREQUAL "static")
 if (CMAKE_BUILD_TYPE STREQUAL "Debug")
 list(APPEND WGSL_GEN_OPTIONS "--generator" "static-cpp-literal")
@@ -207,10 +209,9 @@
 # Add the generated directory to include paths
 target_include_directories(onnxruntime_providers_webgpu PRIVATE ${WGSL_GENERATED_ROOT})
 elseif(onnxruntime_WGSL_TEMPLATE STREQUAL "dynamic")
-add_library(duktape_static STATIC "${duktape_SOURCE_DIR}/src/duktape.c")
-target_compile_features(duktape_static PRIVATE c_std_99)
 target_link_libraries(onnxruntime_providers_webgpu duktape_static)
-target_include_directories(onnxruntime_providers_webgpu PRIVATE ${duktape_SOURCE_DIR}/src)
+onnxruntime_add_include_to_target(onnxruntime_providers_webgpu duktape_static)
+
 # Define the path to the generated templates.js file
 target_compile_definitions(onnxruntime_providers_webgpu PRIVATE
 "ORT_WGSL_TEMPLATES_JS_PATH=\"${WGSL_GENERATED_TEMPLATES_JS}\"")

cmake/vcpkg.json

Lines changed: 8 additions & 0 deletions
@@ -93,6 +93,10 @@
 "webgpu-ep": {
 "description": "Build with WebGPU EP",
 "dependencies": []
+},
+"webgpu-ep-wgsl-template-dynamic": {
+"description": "Build with WebGPU EP with dynamic WGSL template code generator",
+"dependencies": ["duktape"]
 }
 },
 "overrides": [
@@ -103,6 +107,10 @@
 {
 "name": "flatbuffers",
 "version": "23.5.26"
+},
+{
+"name": "duktape",
+"version": "2.7.0#2"
 }
 ]
 }

include/onnxruntime/core/graph/graph.h

Lines changed: 4 additions & 1 deletion
@@ -952,9 +952,12 @@ class Graph { // NOLINT(clang-analyzer-optin.performance.Padding): preserve exi
 return const_cast<Graph*>(this)->GetNodeArg(name);
 }

-// search this and up through any parent_graph_ instance for a NodeArg
+// Searches for a NodeArg in the current graph and its parent graphs, and returns the corresponding mutable NodeArg
 NodeArg* GetNodeArgIncludingParentGraphs(const std::string& node_arg_name);

+// Searches for a NodeArg in the current graph and its parent graphs, and returns the corresponding const NodeArg
+const NodeArg* GetNodeArgIncludingParentGraphs(const std::string& node_arg_name) const;
+
 /** Gets a mutable NodeArg by name. Creates a new NodeArg that is owned by this Graph if not found.
 @param name The NodeArg name.
 @param[in] p_arg_type Optional TypeProto to use if the NodeArg needs to be created.

include/onnxruntime/core/session/onnxruntime_c_api.h

Lines changed: 18 additions & 0 deletions
@@ -5748,6 +5748,24 @@ struct OrtApi {
 */
 ORT_API2_STATUS(Graph_GetParentNode, _In_ const OrtGraph* graph, _Outptr_result_maybenull_ const OrtNode** node);

+/** \brief Returns an OrtGraph that contains a subset of nodes in the source OrtGraph.
+*
+* Note:
+* The lifetime of "dst_graph" is tied to that of "src_graph", as they both internally reference
+* the same underlying graph.
+*
+* \param[in] src_graph The source OrtGraph instance.
+* \param[in] nodes A subset of the nodes/OrtNodes in 'src_graph'.
+* \param[in] num_nodes Number of nodes.
+* \param[out] dst_graph An OrtGraph created from a given set of nodes. Must be released by calling ReleaseGraph.
+*
+* \snippet{doc} snippets.dox OrtStatus Return Value
+*
+* \since Version 1.23.
+*/
+ORT_API2_STATUS(Graph_GetGraphView, _In_ const OrtGraph* src_graph, _In_ const OrtNode** nodes,
+_In_ size_t num_nodes, _Outptr_ OrtGraph** dst_graph);
+
 /// @}

 /// \name OrtNode
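The doc comment states that `dst_graph` only references the graph underlying `src_graph`, so the view must not outlive its source. The following hypothetical, self-contained model illustrates that contract with toy types (`ToyGraph`, `ToyGraphView`, and `GetGraphViewSketch` are not ONNX Runtime names): the view stores pointers into the source instead of copying nodes. Rejecting nodes that do not belong to the source is an assumption of this sketch; the API description does not say how such nodes are handled.

```cpp
// Toy model of a graph "view" over a subset of nodes. None of these types
// are ONNX Runtime types; they only illustrate the lifetime relationship.
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_set>
#include <vector>

struct ToyNode {
  std::string op_type;
};

struct ToyGraph {
  std::vector<ToyNode> nodes;  // the source graph owns its nodes
};

// The view borrows from the source graph: it must not outlive `source`.
struct ToyGraphView {
  const ToyGraph* source = nullptr;
  std::vector<const ToyNode*> nodes;
};

// Builds a view over `num_nodes` nodes of `src`. Returns nullptr if any node
// does not belong to `src` (an assumption made for this sketch).
std::unique_ptr<ToyGraphView> GetGraphViewSketch(const ToyGraph& src,
                                                 const ToyNode* const* nodes,
                                                 std::size_t num_nodes) {
  assert(nodes != nullptr || num_nodes == 0);
  std::unordered_set<const ToyNode*> members;
  for (const ToyNode& n : src.nodes) members.insert(&n);

  auto view = std::make_unique<ToyGraphView>();
  view->source = &src;
  for (std::size_t i = 0; i < num_nodes; ++i) {
    if (members.count(nodes[i]) == 0) return nullptr;
    view->nodes.push_back(nodes[i]);  // no copy: pointer into src.nodes
  }
  return view;
}
```

Carried over to the real API, this means an EP should release the view with `ReleaseGraph` no later than the point at which the source `OrtGraph` goes away.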

onnxruntime/core/graph/ep_api_types.cc

Lines changed: 24 additions & 0 deletions
@@ -505,10 +505,34 @@ void EpGraph::IndexToEpNodeMap::SetEpNode(NodeIndex node_index, EpNode* ep_node)
 EpGraph::EpGraph(const GraphViewer& graph_viewer, PrivateTag)
 : OrtGraph(OrtGraphIrApi::kEpApi), graph_viewer_(graph_viewer) {}

+EpGraph::EpGraph(std::unique_ptr<GraphViewer> graph_viewer,
+std::unique_ptr<IndexedSubGraph> indexed_sub_graph,
+PrivateTag)
+: OrtGraph(OrtGraphIrApi::kEpApi),
+graph_viewer_(*graph_viewer.get()),
+owned_graph_viewer_(std::move(graph_viewer)),
+owned_indexed_sub_graph_(std::move(indexed_sub_graph)) {}
+
 // Static class function to create a std::unique_ptr<EpGraph>.
 Status EpGraph::Create(const GraphViewer& graph_viewer, /*out*/ std::unique_ptr<EpGraph>& result) {
 auto ep_graph = std::make_unique<EpGraph>(graph_viewer, PrivateTag{});

+return CreateImpl(std::move(ep_graph), graph_viewer, result);
+}
+
+// Static class function to create a std::unique_ptr<EpGraph>.
+Status EpGraph::Create(std::unique_ptr<GraphViewer> src_graph_viewer,
+std::unique_ptr<IndexedSubGraph> src_indexed_sub_graph,
+/*out*/ std::unique_ptr<EpGraph>& result) {
+auto& graph_viewer = *src_graph_viewer.get();
+auto ep_graph = std::make_unique<EpGraph>(std::move(src_graph_viewer),
+std::move(src_indexed_sub_graph),
+PrivateTag{});
+
+return CreateImpl(std::move(ep_graph), graph_viewer, result);
+}
+
+Status EpGraph::CreateImpl(std::unique_ptr<EpGraph> ep_graph, const GraphViewer& graph_viewer, /*out*/ std::unique_ptr<EpGraph>& result) {
 AllocatorPtr initializer_allocator = CPUAllocator::DefaultInstance();
 std::unordered_map<std::string, std::unique_ptr<EpValueInfo>> value_infos_map;

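The new owning constructor binds the `graph_viewer_` reference from `*graph_viewer.get()` and then moves the `unique_ptr` into `owned_graph_viewer_`. This is sound because moving a `std::unique_ptr` transfers ownership without relocating the pointee, and because `ep_api_types.h` declares `graph_viewer_` before `owned_graph_viewer_` (members are initialized in declaration order, not initializer-list order). The sketch below reproduces the pattern with hypothetical names (`Viewer`, `ViewerHolder`); it is a simplified model, not the EpGraph code.

```cpp
// Simplified model of a class that either borrows or owns the object it
// references. Viewer and ViewerHolder are illustrative names only.
#include <cassert>
#include <memory>
#include <string>
#include <utility>

struct Viewer {
  std::string name;
};

class ViewerHolder {
 public:
  // Borrowing mode: the caller keeps ownership and must outlive this object.
  explicit ViewerHolder(const Viewer& viewer) : viewer_(viewer) {}

  // Owning mode: bind the reference first, then take ownership. Moving the
  // unique_ptr does not move the Viewer itself, so viewer_ stays valid.
  explicit ViewerHolder(std::unique_ptr<Viewer> viewer)
      : viewer_(*viewer), owned_viewer_(std::move(viewer)) {}

  const Viewer& viewer() const { return viewer_; }
  bool owns_viewer() const { return owned_viewer_ != nullptr; }

 private:
  // Declaration order matters: viewer_ is declared (and so initialized)
  // before owned_viewer_, so *viewer is read before the pointer is moved.
  const Viewer& viewer_;
  std::unique_ptr<Viewer> owned_viewer_;  // null in borrowing mode
};
```

If the two members were declared in the opposite order, the reference would be bound through an already moved-from (null) pointer, which is undefined behavior; GCC and Clang can flag the mismatched initializer order with `-Wreorder`.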
onnxruntime/core/graph/ep_api_types.h

Lines changed: 30 additions & 0 deletions
@@ -251,15 +251,32 @@ struct EpGraph : public OrtGraph {

 public:
 EpGraph(const GraphViewer& graph_viewer, PrivateTag);
+EpGraph(std::unique_ptr<GraphViewer> graph_viewer,
+std::unique_ptr<IndexedSubGraph> indexed_sub_graph,
+PrivateTag);

 /// <summary>
 /// Creates an instance of EpGraph, which wraps a GraphViewer.
+/// This call is used when creating an EpGraph from a GraphViewer instance. The GraphViewer instance is not owned by this EpGraph.
 /// </summary>
 /// <param name="graph_viewer"></param>
 /// <param name="result"></param>
 /// <returns></returns>
 static Status Create(const GraphViewer& graph_viewer, /*out*/ std::unique_ptr<EpGraph>& result);

+/// <summary>
+/// Creates an instance of EpGraph, which wraps a GraphViewer.
+/// This call is used when creating an EpGraph from a subset of nodes in another EpGraph.
+/// In this case, due to the implementation of OrtApis::Graph_GetGraphView, the new EpGraph instance
+/// must take ownership of both the GraphViewer and IndexedSubGraph.
+/// </summary>
+/// <param name="graph_viewer"></param>
+/// <param name="result"></param>
+/// <returns></returns>
+static Status Create(std::unique_ptr<GraphViewer> graph_viewer,
+std::unique_ptr<IndexedSubGraph> indexed_sub_graph,
+/*out*/ std::unique_ptr<EpGraph>& result);
+
 // Defines ToExternal() and ToInternal() functions to convert between OrtGraph and EpGraph.
 DEFINE_ORT_GRAPH_IR_TO_EXTERNAL_INTERNAL_FUNCS(OrtGraph, EpGraph, OrtGraphIrApi::kEpApi)

@@ -331,9 +348,22 @@ struct EpGraph : public OrtGraph {
 const OrtValue* GetInitializerValue(std::string_view name) const;

 private:
+/// <summary>
+/// The real implementation of creating an EpGraph instance.
+/// Please use one of the above 'Create' functions that internally call this function, and avoid calling this function directly.
+/// </summary>
+/// <param name="ep_graph"></param>
+/// <param name="graph_viewer"></param>
+/// <param name="result"></param>
+/// <returns></returns>
+static Status CreateImpl(std::unique_ptr<EpGraph> ep_graph, const GraphViewer& graph_viewer, /*out*/ std::unique_ptr<EpGraph>& result);
+
 const GraphViewer& graph_viewer_;
 const EpNode* parent_node_ = nullptr;

+std::unique_ptr<GraphViewer> owned_graph_viewer_ = nullptr;
+std::unique_ptr<IndexedSubGraph> owned_indexed_sub_graph_ = nullptr;
+
 std::vector<std::unique_ptr<EpNode>> nodes_;
 IndexToEpNodeMap index_to_ep_node_;

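Both constructors take a `PrivateTag` argument, while callers are steered to the static `Create` overloads, which funnel into the private `CreateImpl`. This resembles the passkey idiom: the constructor is public so `std::make_unique` can call it, but only the class can produce the tag. The sketch below shows that idiom in isolation with hypothetical names (`Handle`, `Create`, `CreateImpl` here are illustrative). The `explicit` default constructor on the tag is an extra safeguard added for this sketch, since the diff does not show how `PrivateTag` is defined; from C++17 on it prevents outside code from bypassing `Create` by writing `Handle("x", {})`.

```cpp
// Passkey-idiom sketch: a public constructor that only the class itself can
// call, so std::make_unique works but outside code must use Create().
#include <cassert>
#include <memory>
#include <string>
#include <utility>

class Handle {
  struct PrivateTag {
    // explicit: outside code cannot conjure a tag with a bare {}.
    explicit PrivateTag() = default;
  };

 public:
  Handle(std::string name, PrivateTag) : name_(std::move(name)) {}

  // Public entry point; every Create overload funnels into CreateImpl.
  static std::unique_ptr<Handle> Create(std::string name) {
    return CreateImpl(std::make_unique<Handle>(std::move(name), PrivateTag{}));
  }

  const std::string& name() const { return name_; }
  bool ready() const { return ready_; }

 private:
  // Shared post-construction setup kept in one place.
  static std::unique_ptr<Handle> CreateImpl(std::unique_ptr<Handle> handle) {
    handle->ready_ = true;
    return handle;
  }

  std::string name_;
  bool ready_ = false;
};
```

Keeping the shared setup in one private function means each new `Create` overload only has to decide who owns what, which mirrors how the second `EpGraph::Create` differs from the first only in ownership.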
onnxruntime/core/graph/graph.cc

Lines changed: 4 additions & 0 deletions
@@ -1818,6 +1818,10 @@ NodeArg* Graph::GetNodeArgIncludingParentGraphs(const std::string& node_arg_name
 return node_arg;
 }

+const NodeArg* Graph::GetNodeArgIncludingParentGraphs(const std::string& node_arg_name) const {
+return const_cast<Graph*>(this)->GetNodeArgIncludingParentGraphs(node_arg_name);
+}
+
 void Graph::ReverseDFSFrom(gsl::span<NodeIndex const> from,
 const std::function<void(const Node*)>& enter,
 const std::function<void(const Node*)>& leave,
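The new const overload of `GetNodeArgIncludingParentGraphs` forwards to the existing mutable version through `const_cast`, the same pattern the graph.h context line uses for `GetNodeArg`. A minimal stand-alone sketch of that pattern, with a hypothetical `ArgTable` class in place of `Graph`:

```cpp
// Toy class (not onnxruntime::Graph) showing const-overload delegation.
#include <cassert>
#include <map>
#include <string>

class ArgTable {
 public:
  void Add(const std::string& name, int value) { args_[name] = value; }

  // Mutable lookup: returns nullptr when the name is absent.
  int* Find(const std::string& name) {
    auto it = args_.find(name);
    return it == args_.end() ? nullptr : &it->second;
  }

  // Const overload delegates to the mutable one instead of duplicating logic.
  // This is well-defined because Find() never writes to the table, and the
  // result is handed back as a pointer-to-const.
  const int* Find(const std::string& name) const {
    return const_cast<ArgTable*>(this)->Find(name);
  }

 private:
  std::map<std::string, int> args_;
};
```

The cast is only safe while the mutable lookup stays read-only; if it ever created or modified entries, the const overload would need its own implementation.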

onnxruntime/core/graph/graph_viewer.cc

Lines changed: 10 additions & 2 deletions
@@ -168,7 +168,15 @@ GraphViewer::GraphViewer(const Graph& graph, const IndexedSubGraph* filter_info)
 filtered_node_inputs_including_initializers_.reserve(metadef->inputs.size());

 for (const auto& input : metadef->inputs) {
-const auto* nodearg = graph.GetNodeArg(input);
+// NodeArgs from the current scope or any outer scopes should be handled correctly.
+//
+// There is an edge case where the model consists of a graph with subgraphs nested across three levels.
+// In this scenario, a third-layer subgraph consumes an input from the first-layer graph (not an initializer).
+// When constructing a new GraphViewer for the second- and third-layer subgraphs,
+// the second-layer graph may not have the corresponding value_info for that first-layer input,
+// because the second-layer graph itself doesn't consume it.
+// Therefore, when working within the second-layer graph, we need to search outer scopes for the missing value_info.
+const auto* nodearg = graph.GetNodeArgIncludingParentGraphs(input);
 ORT_ENFORCE(nodearg, "Mismatch between Graph and IndexedSubGraph. Input not found:", input);
 filtered_node_inputs_including_initializers_.push_back(nodearg);
 if (!graph.IsInitializedTensor(input)) {
@@ -177,7 +185,7 @@ GraphViewer::GraphViewer(const Graph& graph, const IndexedSubGraph* filter_info)
 }

 for (const auto& output : metadef->outputs) {
-const auto* nodearg = graph.GetNodeArg(output);
+const auto* nodearg = graph.GetNodeArgIncludingParentGraphs(output);
 ORT_ENFORCE(nodearg, "Mismatch between Graph and IndexedSubGraph. Output not found:", output);
 filtered_node_outputs_.push_back(nodearg);
 }
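The edge case described in the new comment can be modeled with three nested scopes: the outer graph defines `outer_input`, the level-3 subgraph consumes it, and the level-2 graph in between never records it. A local-only lookup from level 2 therefore fails (in the GraphViewer code, a null result would trip the `ORT_ENFORCE`), whereas walking the parent chain succeeds. `ScopedGraph` and its members are hypothetical toy names, not ONNX Runtime APIs.

```cpp
// Toy model of nested graph scopes; not ONNX Runtime code.
#include <cassert>
#include <set>
#include <string>

struct ScopedGraph {
  const ScopedGraph* parent = nullptr;  // enclosing graph, if any
  std::set<std::string> values;         // values recorded in this graph

  // Analogue of a local-only lookup such as GetNodeArg().
  bool HasLocal(const std::string& name) const { return values.count(name) > 0; }

  // Analogue of GetNodeArgIncludingParentGraphs(): walk outward until found.
  bool HasIncludingParents(const std::string& name) const {
    for (const ScopedGraph* g = this; g != nullptr; g = g->parent) {
      if (g->HasLocal(name)) return true;
    }
    return false;
  }
};
```

Note that the search only goes outward: a value defined in the level-3 subgraph stays invisible from level 2, which matches the scoping rule that inner graphs may read outer values but not the reverse.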
