🚀 feat(model): Make coreset selection for patchcore faster #2968

rajeshgangireddy · 2025-09-24T14:09:24Z

📝 Description

Coreset selection is the most time-consuming part of PatchCore. Profiling revealed two problems:

F.pairwise_distance() is slow and not required here, so it is replaced with torch.linalg.norm().
CPU-GPU/XPU transfer overhead: urEnqueueKernelLaunch (20.94% CPU time) and urEnqueueUSMMemcpy (5.67% CPU time) indicate significant kernel launch overhead and memory copy operations. To address this, I reduced the number of .item() calls and avoided indexing with CPU-resident indices while the features are on a different device.
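
As a rough illustration of the two points above, the sketch below compares the two distance computations on a small made-up feature matrix and keeps the selected index as a tensor instead of calling .item(). The tensor names and sizes are hypothetical and are not taken from the anomalib code. Note that F.pairwise_distance adds a small eps to the difference by default, so the two results match only up to a tolerance.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
features = torch.randn(8, 4)  # hypothetical (num_patches, dim) embedding
center = features[3]          # a single selected center, shape (4,)

# Original approach: pairwise_distance broadcasts the center against every row.
old = F.pairwise_distance(features, center.unsqueeze(0), p=2).reshape(-1, 1)

# Replacement described in this PR: explicit L2 norm of the difference.
new = torch.linalg.norm(features - center, ord=2, dim=1, keepdim=True)

# Keep the next index as a 0-d tensor on the same device as the features,
# so no host synchronisation is forced inside the selection loop.
idx = torch.argmax(new.squeeze(1))
next_center = features[idx]   # indexing with a 0-d tensor returns one row
```

Both results have shape (8, 1), so the rest of the distance bookkeeping can stay unchanged.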

Results

These are the results on different hardware with different dataset sizes.
Numbers are averaged across 5 runs with different seeds.

Additional info

There is scope for cleaning up the code in class KCenterGreedy and making it better suited to PatchCore. For example, def select_coreset_idxs(self, selected_idxs: list[int] | None = None) is always called without any arguments when used in PatchCore.

✨ Changes

Select what type of change your PR is:

✅ Checklist

Before you submit your pull request, please make sure you have completed the following steps:

📚 I have made the necessary updates to the documentation (if applicable).
🧪 I have written tests that support my changes and prove that my fix is effective or my feature works (if applicable).
🏷️ My PR title follows conventional commit format.

For more information about code review checklists, see the Code Review Checklist.

rajeshgangireddy · 2025-09-29T15:05:47Z

Setting this to draft for now, as I want to check whether we even need the case with more than one center, and remove it if not.
I am 90% sure it is not needed, but I haven't spent the time to fully confirm this.

rajeshgangireddy · 2025-10-03T16:03:28Z

There is no difference between the accuracy metrics.
The above numbers are averaged across 5 runs with different seed numbers.

rajeshgangireddy · 2025-10-03T16:04:28Z

> Setting this to draft for now, as I want to check whether we even need the case with more than one center, and remove it if not. I am 90% sure it is not needed, but I haven't spent the time to fully confirm this.

I have made some changes to clean up dead code (the multiple-centers case).

Copilot

Pull Request Overview

This PR optimizes the coreset selection algorithm in PatchCore by replacing inefficient distance calculations and reducing CPU-GPU memory transfer overhead. The changes aim to significantly improve performance, particularly for large datasets.

Replaced F.pairwise_distance() with torch.linalg.norm() for faster distance calculations
Reduced CPU-GPU transfer overhead by minimizing .item() calls and keeping tensor operations on device
Simplified the API by removing unused selected_idxs parameter from coreset selection methods


Copilot · 2025-10-03T16:12:57Z

src/anomalib/models/components/sampling/k_center_greedy.py


-            distance = F.pairwise_distance(self.features, centers, p=2).reshape(-1, 1)
+            # Ensure centers is a 1-d tensor for broadcasting
+            centers = centers.squeeze(0) if centers.dim() == 2 else centers


The condition centers.dim() == 2 assumes centers will only have 2 dimensions when it needs squeezing, but if cluster_centers is a 1-d tensor with shape [1], centers could already be 1-d and not need squeezing. This could cause issues if centers has other dimensions. Consider using centers = centers.squeeze() to remove all dimensions of size 1, or add more specific dimension checks.

Suggested change

- centers = centers.squeeze(0) if centers.dim() == 2 else centers
+ centers = centers.squeeze()
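
To make the shape question concrete, here is a small sketch of how the two variants behave for a few input shapes. The helper names and tensors are made up for illustration; the only assumption is that centers comes from indexing a (num_patches, dim) feature matrix.

```python
import torch


def squeeze_checked(centers: torch.Tensor) -> torch.Tensor:
    # Variant from the PR: drop only the leading dimension of a 2-d tensor.
    return centers.squeeze(0) if centers.dim() == 2 else centers


def squeeze_all(centers: torch.Tensor) -> torch.Tensor:
    # Suggested variant: drop every dimension of size 1.
    return centers.squeeze()


cases = {
    "single_center": torch.zeros(1, 4),    # one center, 4 feature dims
    "already_1d": torch.zeros(4),          # already flat
    "one_dim_feature": torch.zeros(1, 1),  # one center, a single feature dim
}

# Map each case to (shape after squeeze_checked, shape after squeeze_all).
shapes = {
    name: (tuple(squeeze_checked(t).shape), tuple(squeeze_all(t).shape))
    for name, t in cases.items()
}
```

The two variants agree for the common (1, dim) case and for an already flat tensor. They differ only when the feature dimension itself has size 1, where squeeze() also removes that dimension and returns a 0-d tensor.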

Copilot · 2025-10-03T16:12:57Z

src/anomalib/models/components/sampling/k_center_greedy.py

        """
        if isinstance(self.min_distances, torch.Tensor):
-            idx = int(torch.argmax(self.min_distances).item())
+            _, idx = torch.max(self.min_distances.squeeze(), dim=0)


Using squeeze() without arguments removes all dimensions of size 1, which could cause issues if min_distances has multiple dimensions that shouldn't be squeezed. Since the method expects a column vector (from keepdim=True in line 78), consider using squeeze(1) to only remove the second dimension, or verify the tensor shape before squeezing.

Suggested change

- _, idx = torch.max(self.min_distances.squeeze(), dim=0)
+ _, idx = torch.max(self.min_distances.squeeze(1), dim=0)
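
A minimal sketch of the suggested form, using a made-up column vector in place of self.min_distances. It also shows the in-place scatter_ update used elsewhere in this PR, which takes the index as a tensor rather than a Python int.

```python
import torch

min_distances = torch.tensor([[0.5], [2.0], [1.0]])  # column vector, shape (3, 1)

# squeeze(1) removes only the trailing column dimension, leaving shape (3,).
_, idx = torch.max(min_distances.squeeze(1), dim=0)

# idx is a 0-d tensor on the same device as min_distances.
# Zero out the selected entry in place without converting idx to a Python int.
min_distances.scatter_(0, idx.unsqueeze(0).unsqueeze(1), 0.0)

# Edge case: with a single row, squeeze(1) still yields a 1-d tensor,
# whereas squeeze() would collapse it to 0-d.
single_row = torch.zeros(1, 1)
shape_squeeze_1 = tuple(single_row.squeeze(1).shape)
shape_squeeze_all = tuple(single_row.squeeze().shape)
```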

Copilot · 2025-10-03T16:12:58Z

src/anomalib/models/components/sampling/k_center_greedy.py

+        selected_coreset_idxs: list[torch.Tensor] = []
        for _ in tqdm(range(self.coreset_size), desc="Selecting Coreset Indices."):
-            self.update_distances(cluster_centers=[idx])
+            self.update_distances(cluster_centers=idx.unsqueeze(0))
            idx = self.get_new_idx()
-            if idx in selected_idxs:
-                msg = "New indices should not be in selected indices."
-                raise ValueError(msg)
-            self.min_distances[idx] = 0
+            self.min_distances.scatter_(0, idx.unsqueeze(0).unsqueeze(1), 0.0)
            selected_coreset_idxs.append(idx)

-        return selected_coreset_idxs
+        return [int(tensor_idx.item()) for tensor_idx in selected_coreset_idxs]


The type annotation indicates a list of tensors, but the final return statement converts these to integers. Consider using list[int] as the type annotation and converting tensors to integers immediately when appending, or keep tensors throughout and convert only at return.
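
One way to read the second option (keep tensors throughout, convert only at return) is sketched below as a standalone function. This is a hypothetical simplification for illustration and not the KCenterGreedy class from anomalib: the function name, the start argument, and the use of torch.stack(...).tolist() for a single conversion at the end are all assumptions.

```python
import torch


def k_center_greedy(features: torch.Tensor, coreset_size: int, start: int = 0) -> list[int]:
    """Hypothetical standalone sketch of greedy k-center selection."""
    device = features.device
    idx = torch.tensor(start, device=device)
    min_distances = torch.full((features.shape[0], 1), float("inf"), device=device)
    selected: list[torch.Tensor] = []
    for _ in range(coreset_size):
        # Distance from every feature to the most recently chosen center.
        distances = torch.linalg.norm(features - features[idx], ord=2, dim=1, keepdim=True)
        min_distances = torch.minimum(min_distances, distances)
        # The farthest point becomes the next center; idx stays a 0-d device tensor.
        idx = torch.argmax(min_distances.squeeze(1))
        selected.append(idx)
    # Convert all indices to Python ints in one step instead of one .item() per iteration.
    return torch.stack(selected).tolist()


points = torch.tensor([[0.0, 0.0], [1.0, 0.0], [5.0, 0.0], [10.0, 0.0]])
coreset = k_center_greedy(points, coreset_size=2, start=0)
```

With this layout the list[torch.Tensor] annotation is accurate inside the loop, and the list[int] return type is satisfied by the final conversion.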

Copilot · 2025-10-03T16:12:58Z

src/anomalib/models/components/sampling/k_center_greedy.py

+        selected_coreset_idxs: list[torch.Tensor] = []
        for _ in tqdm(range(self.coreset_size), desc="Selecting Coreset Indices."):
-            self.update_distances(cluster_centers=[idx])
+            self.update_distances(cluster_centers=idx.unsqueeze(0))


The method signature accepts int | torch.Tensor | None but here you're always passing a tensor with an extra dimension. This inconsistency with the parameter handling in update_distances could lead to confusion. Consider either updating the method to handle the tensor directly or consistently passing the same type.

Suggested change

- self.update_distances(cluster_centers=idx.unsqueeze(0))
+ self.update_distances(cluster_centers=idx)

rajeshgangireddy added 5 commits September 22, 2025 17:13

🚀 feat(model): make improvements for patchcore on xpu

cd833a9

🔄 refactor(k_center_greedy): remove profiling code.

7d637d7

Signed-off-by: rajeshgangireddy <[email protected]>

🚀 feat(benchmark): add seed setting for reproducibility in run_benchmark function

55b40a9

🔄 refactor(k_center_greedy): enhance device handling and remove unused benchmark scripts

86f1488

Merge branch 'main' into faster_patchcore

251df36

🔄 refactor(k_center_greedy): remove the unused case of multiple centers to simplify

58a9e1a

rajeshgangireddy marked this pull request as ready for review October 3, 2025 16:03

rajeshgangireddy requested a review from samet-akcay as a code owner October 3, 2025 16:03

rajeshgangireddy requested review from Copilot and ashwinvaidya17 October 3, 2025 16:12

Copilot AI reviewed Oct 3, 2025
