- [Demo applications](demo/HuggingFace) showcasing TensorRT inference of [HuggingFace Transformers](https://huggingface.co/transformers).
  - Support is currently extended to GPT-2 and T5 models.
- Added support for the following ONNX operators:
  - `Einsum`
  - `IsNan`
  - `GatherND`
  - `Scatter`
  - `ScatterElements`
  - `ScatterND`
  - `Sign`
  - `Round`
- Added support for building TensorRT Python API on Windows.

### Updated

- Notable API updates in TensorRT 8.2.0.6 EA release. See [TensorRT Developer Guide](https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html) for details.
  - Added three new `IExecutionContext` APIs, `getEnqueueEmitsProfile()`, `setEnqueueEmitsProfile()`, and `reportToProfiler()`, which can be used to collect layer profiling info when inference is launched as a CUDA graph.
  - Eliminated the global logger; each `Runtime`, `Builder`, or `Refitter` now has its own logger.
  - Added new operators: `IAssertionLayer`, `IConditionLayer`, `IEinsumLayer`, `IIfConditionalBoundaryLayer`, `IIfConditionalOutputLayer`, `IIfConditionalInputLayer`, and `IScatterLayer`.
  - Added new `IGatherLayer` modes: `kELEMENT` and `kND`.
  - Added new `ISliceLayer` modes: `kFILL`, `kCLAMP`, and `kREFLECT`.
  - Added new `IUnaryLayer` operators: `kSIGN` and `kROUND`.
  - Added the new runtime class `IEngineInspector`, which can be used to inspect detailed information about an engine, including layer parameters, chosen tactics, precisions used, etc.
  - `ProfilingVerbosity` enums have been updated to show their functionality more explicitly.
- Updated TensorRT OSS container defaults to CUDA 11.4.
- Updated CMake to target C++14 builds.
- Updated the following ONNX operators:
  - `Gather` and `GatherElements` implementations to natively support negative indices.
  - `Pad` layer to support ND padding, along with `edge` and `reflect` padding mode support.
  - `If` layer with general performance improvements.

### Removed

- Removed `sampleMLP`.
- Several flags of trtexec have been deprecated:
  - `--explicitBatch` flag has been deprecated and has no effect. When the input model is in UFF or in Caffe prototxt format, the implicit batch dimension mode is used automatically; when the input model is in ONNX format, the explicit batch mode is used automatically.
  - `--explicitPrecision` flag has been deprecated and has no effect. When the input ONNX model contains Quantization/Dequantization nodes, TensorRT automatically uses explicit precision mode.
  - `--nvtxMode=[verbose|default|none]` has been deprecated in favor of `--profilingVerbosity=[detailed|layer_names_only|none]` to show its functionality more explicitly.
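For instance, a profiling run that previously passed `--nvtxMode=verbose` now uses the replacement flag. A minimal sketch (the ONNX model path is a placeholder):

```bash
# Deprecated form: trtexec --onnx=model.onnx --nvtxMode=verbose
# Equivalent run using the replacement flag:
trtexec --onnx=model.onnx --profilingVerbosity=detailed
```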
* (Cross compilation for Jetson platform) [NVIDIA JetPack](https://developer.nvidia.com/embedded/jetpack) >= 4.6 (current support only for TensorRT 8.0.1)
* (For Windows builds) [Visual Studio](https://visualstudio.microsoft.com/vs/older-downloads/) 2017 Community or Enterprise edition
* (Cross compilation for QNX platform) [QNX Toolchain](https://blackberry.qnx.com/en)
3. #### (Optional - for Jetson builds only) Download the JetPack SDK
   1. Download and launch the JetPack SDK manager. Login with your NVIDIA developer account.
   2. Select the platform and target OS (example: Jetson AGX Xavier, `Linux Jetpack 4.6`), and click Continue.
   3. Under `Download & Install Options` change the download folder and select `Download now, Install later`. Agree to the license terms and click Continue.
   4. Move the extracted files into the `<TensorRT-OSS>/docker/jetpack_files` folder.
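The download location depends on the folder chosen in the SDK manager; as a rough sketch, assuming the default SDK Manager download directory (a placeholder path):

```bash
# Copy the JetPack packages downloaded by SDK Manager into the TensorRT-OSS tree.
# The source directory below is an assumption -- substitute the folder you selected above.
cp ~/Downloads/nvidia/sdkm_downloads/* <TensorRT-OSS>/docker/jetpack_files/
```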
For Linux platforms, we recommend that you generate a docker container for building TensorRT OSS.

1. #### Generate the TensorRT-OSS build container.
The TensorRT-OSS build container can be generated using the supplied Dockerfiles and build script. The build container is configured for building TensorRT OSS out-of-the-box.
**Example: Ubuntu 18.04 on x86-64 with cuda-11.4.2 (default)**
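A representative build invocation, assuming the repository's build script is `docker/build.sh` and the Dockerfile is `docker/ubuntu-18.04.Dockerfile` (both names are assumptions; verify them in your checkout):

```bash
# Assumed invocation -- check docker/build.sh and the docker/ directory for the exact flags and file name.
./docker/build.sh --file docker/ubuntu-18.04.Dockerfile --tag tensorrt-ubuntu18.04-cuda11.4
```

The `--tag` value matches the one passed to `docker/launch.sh` in the next step.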
**Example: Ubuntu 18.04 cross-compile for Jetson (aarch64) with cuda-10.2 (JetPack SDK)**
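Similarly for the cross-compile container; the Dockerfile name and tag below are likewise assumptions, so verify them against the `docker/` directory in your checkout:

```bash
# Assumed invocation for the JetPack cross-compile container (names are placeholders).
./docker/build.sh --file docker/ubuntu-cross-aarch64.Dockerfile --tag tensorrt-jetpack-cuda10.2
```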
2. #### Launch the TensorRT-OSS build container.
**Example: Ubuntu 18.04 build container**
```bash
./docker/launch.sh --tag tensorrt-ubuntu18.04-cuda11.4 --gpus all
```
> NOTE:
1. Use the `--tag` corresponding to the build container generated in Step 1.
## Building TensorRT-OSS
* Generate Makefiles or VS project (Windows) and build.
**Example: Linux (x86-64) build with default cuda-11.4.2**
```bash
cd $TRT_OSSPATH
mkdir -p build && cd build
# Representative cmake/make invocation (assumed), using the required arguments documented
# below; adjust $TRT_LIBPATH to point at your TensorRT libraries.
cmake .. -DTRT_LIB_DIR=$TRT_LIBPATH -DTRT_OUT_DIR=`pwd`/out
make -j$(nproc)
```

**Example: Windows (x86-64) build**

```powershell
msbuild ALL_BUILD.vcxproj
```
> NOTE:
1. The default CUDA version used by CMake is 11.4.2. To override this, for example to 10.2, append `-DCUDA_VERSION=10.2` to the cmake command (see the sketch after these notes).
2. If samples fail to link on CentOS7, create this symbolic link: `ln -s $TRT_OUT_DIR/libnvinfer_plugin.so $TRT_OUT_DIR/libnvinfer_plugin.so.8`
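For example, reusing the cmake invocation sketched in the Linux build example above (argument values are assumptions), with the override appended:

```bash
# Same cmake invocation as the Linux example above, targeting CUDA 10.2 instead of the default.
cmake .. -DTRT_LIB_DIR=$TRT_LIBPATH -DTRT_OUT_DIR=`pwd`/out -DCUDA_VERSION=10.2
```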
* Required CMake build arguments are:
- `TRT_LIB_DIR`: Path to the TensorRT installation directory containing libraries.
- `TRT_OUT_DIR`: Output directory where generated build artifacts will be copied.
* Optional CMake build arguments:
- `CMAKE_BUILD_TYPE`: Specify whether the generated binaries are for release or debug (contain debug symbols). Values consist of [`Release`] | `Debug`.
- `CUDA_VERSION`: The version of CUDA to target, for example [`11.4.2`].
- `CUDNN_VERSION`: The version of cuDNN to target, for example [`8.2`].
- `PROTOBUF_VERSION`: The version of Protobuf to use, for example [`3.0.0`]. Note: Changing this will not configure CMake to use a system version of Protobuf; it will configure CMake to download and try building that version.
- `CMAKE_TOOLCHAIN_FILE`: The path to a toolchain file for cross compilation.
- `BUILD_PARSERS`: Specify if the parsers should be built, for example [`ON`] | `OFF`. If turned OFF, CMake will try to find precompiled versions of the parser libraries to use in compiling samples: first in `${TRT_LIB_DIR}`, then on the system. If the build type is Debug, it will prefer debug builds of the libraries over release versions if available.
- `BUILD_PLUGINS`: Specify if the plugins should be built, for example [`ON`] | `OFF`. If turned OFF, CMake will try to find a precompiled version of the plugin library to use in compiling samples: first in `${TRT_LIB_DIR}`, then on the system. If the build type is Debug, it will prefer debug builds of the libraries over release versions if available.
- `BUILD_SAMPLES`: Specify if the samples should be built, for example [`ON`] | `OFF`.
- `GPU_ARCHS`: GPU (SM) architectures to target. By default we generate CUDA code for all major SMs. Specific SM versions can be specified here as a quoted space-separated list to reduce compilation time and binary size. Table of compute capabilities of NVIDIA GPUs can be found [here](https://developer.nvidia.com/cuda-gpus). Examples:
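For instance (the SM list below is illustrative; pick the architectures that match your GPUs):

```bash
# Build only for Turing (SM 75) and Ampere (SM 86) GPUs to reduce compile time and binary size.
cmake .. -DTRT_LIB_DIR=$TRT_LIBPATH -DTRT_OUT_DIR=`pwd`/out -DGPU_ARCHS="75 86"
```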