Description
If the input layer of an object detector is not square, libnvinfer_plugin does not produce correct bounding boxes: the GridAnchor kernel derives the anchor stride and offset from param.H alone for both axes, and the prior-box widths ignore the input aspect ratio. An incomplete fix by @rajeevsrao was merged to master in #679, but parts of it are no longer present in master. The issue was also discussed in #807.
Environment
- NVIDIA Jetson TX2
- Jetpack 4.6 [L4T 32.6.1]
- NV Power Mode: MAXP_CORE_ARM - Type: 3
- jetson_stats.service: active
- Libraries:
- CUDA: 10.2.300
- cuDNN: 8.2.1.32
- TensorRT: 8.0.1.6
- Visionworks: 1.6.0.501
- OpenCV: 4.1.1 (compiled with CUDA: NO)
- VPI: libnvvpi1 1.1.12
- Vulkan: 1.2.70
Relevant Files and Fix
The following changes fix the issue.
modified plugin/common/kernels/gridAnchorLayer.cu
@@ -34,8 +34,10 @@ __launch_bounds__(nthdsPerCTA) __global__ void gridAnchorKernel(const GridAnchor
* the image
* Every coordinate will go back to the pixel coordinates in the input image if being multiplied by
* image_input_size
* Here we implicitly assumes the image input and feature map are square
*/
- float anchorStride = (1.0 / param.H);
- float anchorOffset = 0.5 * anchorStride;
+ float anchorStrideH = (1.0 / param.H);
+ float anchorOffsetH = 0.5 * anchorStrideH;
+ float anchorStrideW = (1.0 / param.W);
+ float anchorOffsetW = 0.5 * anchorStrideW;
int tid = blockIdx.x * blockDim.x + threadIdx.x;
if (tid >= dim)
@@ -47,8 +49,8 @@ __launch_bounds__(nthdsPerCTA) __global__ void gridAnchorKernel(const GridAnchor
const int h = currIndex / param.W;
// Center coordinates
- float yC = h * anchorStride + anchorOffset;
- float xC = w * anchorStride + anchorOffset;
+ float yC = h * anchorStrideH + anchorOffsetH;
+ float xC = w * anchorStrideW + anchorOffsetW;
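For intuition, here is a minimal standalone host-side sketch (not part of the patch; the 40x23 feature-map size is hypothetical, borrowed from the first two featureMapShapes values below). With the old square assumption the x centers are spaced 1/H apart even though there are W columns, so they run well past the right edge of the image:

#include <cstdio>

int main()
{
    const int W = 40, H = 23; // hypothetical rectangular feature map

    // Old kernel: a single stride derived from H is used for both axes.
    const float strideOld = 1.0f / H;
    const float xOld = (W - 1) * strideOld + 0.5f * strideOld;

    // Patched kernel: a per-axis stride keeps the x centers inside [0, 1].
    const float strideW = 1.0f / W;
    const float xNew = (W - 1) * strideW + 0.5f * strideW;

    std::printf("rightmost xC: old = %.3f (outside image), fixed = %.3f\n", xOld, xNew);
    return 0;
}

This prints roughly old = 1.717 and fixed = 0.988, i.e. the unpatched kernel places the rightmost anchor centers far outside the normalized image.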
modified plugin/gridAnchorPlugin/gridAnchorPlugin.cpp
@@ -109,11 +109,13 @@ GridAnchorGenerator::GridAnchorGenerator(const GridAnchorParameters* paramIn, in
std::vector<float> tmpWidths;
std::vector<float> tmpHeights;
+ float featMapAspectRatio = (float) (mParam[0].H) / (float) (mParam[0].W);
+ // TODO: calculate the ratio with the input layer height and width instead.
// Calculate the width and height of the prior boxes
for (int i = 0; i < mNumPriors[id]; i++)
{
float sqrt_AR = sqrt(aspect_ratios[i]);
- tmpWidths.push_back(scales[i] * sqrt_AR);
+ tmpWidths.push_back(scales[i] * sqrt_AR * featMapAspectRatio);
tmpHeights.push_back(scales[i] / sqrt_AR);
}
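A quick sanity check of the width correction (a standalone sketch, not the plugin code; the 400x300 input comes from the repro steps below, the aspect ratio and scale are illustrative values from the plugin example). It uses the input-layer ratio directly, which is what the TODO above suggests, while the patch approximates it with the feature-map ratio mParam[0].H / mParam[0].W:

#include <cmath>
#include <cstdio>

int main()
{
    const float inputW = 400.0f, inputH = 300.0f; // rectangular input layer
    const float aspectRatio = 2.0f;               // one entry of aspectRatios
    const float scale = 0.2f;                     // minSize, illustrative

    const float sqrtAR = std::sqrt(aspectRatio);
    const float heightN = scale / sqrtAR;  // normalized prior height
    const float widthOld = scale * sqrtAR; // normalized width, square assumption
    const float widthNew = widthOld * (inputH / inputW); // corrected width

    // Aspect ratio measured in input pixels:
    std::printf("old = %.2f, fixed = %.2f, intended = %.2f\n",
        (widthOld * inputW) / (heightN * inputH),
        (widthNew * inputW) / (heightN * inputH), aspectRatio);
    return 0;
}

This prints old = 2.67, fixed = 2.00, intended = 2.00: without the correction, a normalized aspect ratio of 2.0 on a 400x300 input becomes 2.67 in pixel space.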
Steps To Reproduce
- Use an SSD object detection model with a rectangular input layer, for example 400x300.
- Convert it to TensorRT as in https://github.com/NVIDIA/TensorRT/tree/main/samples/python/uff_ssd.
- Run inference on an image with the TensorRT engine.
- The x coordinates of the output bounding boxes are incorrect.
Example of how the plugin is used with graphsurgeon:

import graphsurgeon as gs

gs.create_plugin_node(
    name="MultipleGridAnchorGenerator",
    op="GridAnchorRect_TRT",
    minSize=0.2,
    maxSize=0.95,
    aspectRatios=[1.0, 2.0, 0.5, 3.0, 0.33],
    variance=[0.1, 0.1, 0.2, 0.2],
    featureMapShapes=[40, 23, 20, 12, 10, 6, 5, 3, 3, 2, 2, 1],
    numLayers=6,
)
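Note that featureMapShapes has 12 values for numLayers=6; for the rectangular GridAnchorRect_TRT variant these appear to be consumed pairwise, one rectangular shape per layer, matching the per-axis handling in the kernel patch above.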