Merged
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -8,6 +8,8 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).

### Added

- 🚀 Add BMAD dataset by @code-dev05 in https://github.com/open-edge-platform/anomalib/pull/2900

### Removed

### Changed
@@ -7,6 +7,13 @@ Image datamodules in Anomalib are designed to handle image-based anomaly detecti
```{grid} 3
:gutter: 2
:::{grid-item-card} BMAD
:link: anomalib.data.datamodules.image.bmad
:link-type: doc
BMAD dataset datamodule for medical anomaly detection.
:::
:::{grid-item-card} BTech
:link: anomalib.data.datamodules.image.BTech
:link-type: doc
@@ -89,7 +96,7 @@ Visual Anomaly dataset.

```{eval-rst}
.. automodule:: anomalib.data
:members: BTech, Datumaro, Folder, Kolektor, MVTecAD, MVTecAD2, MVTecLOCO, RealIAD, Tabular, VAD, Visa
:members: BMAD, BTech, Datumaro, Folder, Kolektor, MVTecAD, MVTecAD2, MVTecLOCO, RealIAD, Tabular, VAD, Visa
:undoc-members:
:show-inheritance:
```
@@ -0,0 +1,7 @@
# BMAD Datamodule

```{eval-rst}
.. automodule:: anomalib.data.datamodules.image.bmad
:members:
:show-inheritance:
```
@@ -7,6 +7,13 @@ Anomalib provides various datamodules for handling image-based anomaly detection
```{grid} 3
:gutter: 2

:::{grid-item-card} BMAD
:link: bmad
:link-type: doc

BMAD dataset datamodule for medical anomaly detection.
:::

:::{grid-item-card} BTech
:link: btech
:link-type: doc
@@ -54,6 +61,7 @@ Visual Anomaly (VisA) dataset datamodule.
:hidden:
:maxdepth: 1

bmad
btech
datumaro
folder
@@ -7,6 +7,13 @@ Anomalib provides various datamodules for different types of data modalities. Th
```{grid} 3
:gutter: 2

:::{grid-item-card} BMAD
:link: image/bmad
:link-type: doc

BMAD dataset datamodule for medical anomaly detection.
:::

:::{grid-item-card} BTech
:link: image/btech
:link-type: doc
1 change: 1 addition & 0 deletions examples/configs/README.md
@@ -8,6 +8,7 @@ The configurations in this folder are organized as follows.
configs/
├── data
│ ├── avenue.yaml
│ ├── bmad.yaml
│ ├── btech.yaml
│ ├── folder_3d.yaml
│ ├── folder.yaml
12 changes: 12 additions & 0 deletions examples/configs/data/bmad.yaml
@@ -0,0 +1,12 @@
class_path: anomalib.data.BMAD
init_args:
root: ./datasets/BMAD
category: Brain
train_batch_size: 32
eval_batch_size: 32
num_workers: 8
test_split_mode: from_dir
test_split_ratio: 0.2
val_split_mode: from_dir
val_split_ratio: null
seed: null
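The config uses the `class_path`/`init_args` convention of LightningCLI (via jsonargparse): `class_path` names the class to import and `init_args` are passed as keyword arguments, so under that reading the file amounts to calling `anomalib.data.BMAD(root="./datasets/BMAD", category="Brain", ...)`. The sketch below is a simplified, hypothetical resolver (`instantiate_from_config` is not an anomalib or jsonargparse function) that shows the idea with the standard library only; it skips the type validation jsonargparse performs and demonstrates the mechanism on a stdlib class so it runs without anomalib installed.

```python
from __future__ import annotations

import importlib
from typing import Any


def instantiate_from_config(config: dict[str, Any]) -> Any:
    """Import ``class_path`` and call it with ``init_args`` as keyword arguments.

    Hypothetical helper: a simplified stand-in for how LightningCLI/jsonargparse
    treat ``class_path``/``init_args`` blocks (no type validation here).
    """
    module_name, _, class_name = config["class_path"].rpartition(".")
    cls = getattr(importlib.import_module(module_name), class_name)
    return cls(**config.get("init_args", {}))


# Same shape as examples/configs/data/bmad.yaml, with YAML null mapped to None.
bmad_config = {
    "class_path": "anomalib.data.BMAD",
    "init_args": {
        "root": "./datasets/BMAD",
        "category": "Brain",
        "train_batch_size": 32,
        "eval_batch_size": 32,
        "num_workers": 8,
        "test_split_mode": "from_dir",
        "test_split_ratio": 0.2,
        "val_split_mode": "from_dir",
        "val_split_ratio": None,
        "seed": None,
    },
}

# Demonstrate the mechanism with a stdlib class so the sketch runs without anomalib.
delta = instantiate_from_config(
    {"class_path": "datetime.timedelta", "init_args": {"days": 1, "hours": 2}}
)
print(delta)  # 1 day, 2:00:00
```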
3 changes: 3 additions & 0 deletions src/anomalib/data/datamodules/image/__init__.py
@@ -6,6 +6,7 @@
This module contains data modules for loading and processing image datasets for
anomaly detection. The following data modules are available:

- ``BMAD``: BMAD Dataset for Medical Anomaly Detection
- ``BTech``: BTech Surface Defect Dataset
- ``Datumaro``: Dataset in Datumaro format (Intel Geti™ export)
- ``Folder``: Custom folder structure with normal/abnormal images
@@ -50,6 +51,7 @@ class ImageDataFormat(str, Enum):

The following dataset formats are supported:

- ``BMAD``: BMAD Dataset for Medical Anomaly Detection
- ``BTECH``: BTech Surface Defect Dataset
- ``DATUMARO``: Dataset in Datumaro format
- ``FOLDER``: Custom folder structure
@@ -66,6 +68,7 @@ class ImageDataFormat(str, Enum):
- ``VISA``: Visual Anomaly Dataset
"""

BMAD = "bmad"
BTECH = "btech"
DATUMARO = "datumaro"
FOLDER = "folder"
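Because `ImageDataFormat` subclasses both `str` and `Enum`, the new `BMAD = "bmad"` member can be looked up from the plain string `"bmad"` and also compares equal to it. A minimal sketch of that standard-library behavior, re-declaring a trimmed, hypothetical copy of the enum (four members only) so it runs on its own:

```python
from enum import Enum


class ImageDataFormat(str, Enum):
    """Trimmed, hypothetical mirror of the enum in the diff (four members only)."""

    BMAD = "bmad"
    BTECH = "btech"
    DATUMARO = "datumaro"
    FOLDER = "folder"


# Lookup by value returns the member, so a string from a config maps to the enum.
fmt = ImageDataFormat("bmad")
assert fmt is ImageDataFormat.BMAD

# Because the enum also subclasses str, members compare equal to plain strings.
assert ImageDataFormat.BMAD == "bmad"

# Unknown format strings raise ValueError instead of silently passing through.
try:
    ImageDataFormat("bmad2")
except ValueError as err:
    print(f"rejected: {err}")
```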
37 changes: 19 additions & 18 deletions src/anomalib/data/datamodules/image/bmad.py
@@ -1,4 +1,4 @@
# Copyright (C) 2022-2025 Intel Corporation
# Copyright (C) 2025 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

"""BMAD Data Module.
@@ -14,20 +14,22 @@
Example:
Create a BMAD datamodule::

>>> from your_module import BMADDataModule
>>> datamodule = BMADDataModule(
>>> from anomalib.data import BMAD
>>> datamodule = BMAD(
... root="./datasets/BMAD",
... dataset="brain", # options: "Brain", "Chest", "Histopathology", "Liver", "Retina_OCT2017",
... dataset="Brain", # options: "Brain", "Chest", "Histopathology", "Liver", "Retina_OCT2017",
"Retina_RESC"
... )

Notes:
The dataset will be automatically downloaded and reorganized upon first usage.
Directory structure after preparation may look like:

.. code-block:: text

datasets/
└── BMAD/
├── brain/
├── Brain/
│ ├── train/
│ │ └── good/
│ ├── valid/
@@ -67,12 +69,10 @@
logger = logging.getLogger(__name__)

DOWNLOAD_INFO = DownloadInfo(
name="mvtecad",
url="https://www.mydrive.ch/shares/38536/3830184030e49fe74747669442f0f282/"
"download/420938113-1629952094/mvtec_anomaly_detection.tar.xz",
hashsum="cf4313b13603bec67abb49ca959488f7eedce2a9f7795ec54446c649ac98cd3d",
name="bmad",
url="https://huggingface.co/datasets/code-dev05/BMAD/resolve/main/bmad.zip",
hashsum="df655def31f3f638a91c567550de54d6e45a74b2368f666c13a6a3052c063165",
)
# TODO(Devansh Agarwal): Update the download url. # noqa: TD003


class BMAD(AnomalibDataModule):
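The new `hashsum` is a 64-character hex string, which is the length of a SHA-256 digest. Assuming SHA-256 is the algorithm used, a downloaded `bmad.zip` could be checked against it as sketched below. `file_sha256` and `verify_archive` are hypothetical helpers written for illustration, not anomalib's download utilities.

```python
import hashlib
import tempfile
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so a multi-GB archive is never fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path: Path, expected: str) -> None:
    """Raise ValueError if the file's SHA-256 differs from the expected hex digest."""
    actual = file_sha256(path)
    if actual != expected.lower():
        msg = f"Hash mismatch for {path}: expected {expected}, got {actual}"
        raise ValueError(msg)


# Demo on a small placeholder file instead of the real archive.
with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp) / "bmad.zip"
    archive.write_bytes(b"not really a zip")
    verify_archive(archive, hashlib.sha256(b"not really a zip").hexdigest())
    print("hash ok")
```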
@@ -149,8 +149,8 @@ def __init__(
augmentations: Transform | None = None,
test_split_mode: TestSplitMode | str = TestSplitMode.FROM_DIR,
test_split_ratio: float = 0.2,
val_split_mode: ValSplitMode | str = ValSplitMode.SAME_AS_TEST,
val_split_ratio: float = 0.5,
val_split_mode: ValSplitMode | str = ValSplitMode.FROM_DIR,
val_split_ratio: float | None = None,
seed: int | None = None,
) -> None:
super().__init__(
@@ -177,11 +177,12 @@ def _setup(self, _stage: str | None = None) -> None:
root=self.root,
category=self.category,
)
self.validation_data = BMADDataset(
split="valid",
root=self.root,
category=self.category,
)
if self.val_split_mode == ValSplitMode.FROM_DIR:
self.val_data = BMADDataset(
split="valid",
root=self.root,
category=self.category,
)
self.test_data = BMADDataset(
split=Split.TEST,
root=self.root,
@@ -208,7 +209,7 @@ def prepare_data(self) -> None:

datasets/
└── BMAD/
├── brain/
├── Brain/
├── Liver/
└── ...
"""
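After this change, `_setup` builds `val_data` from the dataset's own `valid` folder only when `val_split_mode` is `FROM_DIR`, which is now the default (together with `val_split_ratio=None`). The sketch below mirrors that branch in plain Python. `ValSplitMode` here is a simplified local stand-in with two members, `build_splits` and the `load_split` callable are hypothetical, and the fallback that reuses the test samples for `SAME_AS_TEST` is an assumption made for illustration rather than anomalib's code.

```python
from __future__ import annotations

from enum import Enum
from typing import Callable


class ValSplitMode(str, Enum):
    """Simplified local stand-in with only the two modes this sketch uses."""

    FROM_DIR = "from_dir"
    SAME_AS_TEST = "same_as_test"


def build_splits(
    val_split_mode: ValSplitMode | str,
    load_split: Callable[[str], list[str]],
) -> dict[str, list[str]]:
    """Return train/val/test sample lists for the requested validation mode."""
    mode = ValSplitMode(val_split_mode)  # accepts a member or its string value
    splits = {"train": load_split("train"), "test": load_split("test")}
    if mode == ValSplitMode.FROM_DIR:
        # Mirrors the new _setup branch: read the dataset's own "valid" folder.
        splits["val"] = load_split("valid")
    else:
        # Assumption for illustration only: validate on a copy of the test samples.
        splits["val"] = list(splits["test"])
    return splits


fake_dataset = {"train": ["t0.png"], "valid": ["v0.png"], "test": ["x0.png", "x1.png"]}
splits = build_splits("from_dir", fake_dataset.__getitem__)
print(splits["val"])  # ['v0.png']
```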
1 change: 1 addition & 0 deletions src/anomalib/data/datasets/image/__init__.py
@@ -6,6 +6,7 @@
This module provides dataset implementations for various image anomaly detection
datasets:

- ``BMADDataset``: BMAD dataset containing medical images
- ``BTechDataset``: BTech dataset containing industrial objects
- ``DatumaroDataset``: Dataset in Datumaro format (Intel Geti™ export)
- ``FolderDataset``: Custom dataset from folder structure
39 changes: 30 additions & 9 deletions src/anomalib/data/datasets/image/bmad.py
@@ -1,4 +1,4 @@
# Copyright (C) 2024 Intel Corporation
# Copyright (C) 2025 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

"""BMAD Dataset.
@@ -109,12 +109,34 @@ def make_bmad_dataset(path: Path, split: str | Split | None = None) -> DataFrame
"""Create BMAD samples by parsing the dataset structure.

The files are expected to follow the structure:
``path/to/dataset/category/split/image_filename.png``
``path/to/dataset/category/split/mask_filename.png``

.. code-block:: text

path/to/dataset
├── category
│ ├── train
│ │ ├── good
│ │ │ ├── image_filename.png
│ │ ├── Ungood
│ │ │ ├── img
│ │ │ │ ├── image_filename.png
│ │ │ │ └── label
│ │ │ │ └── image_filename.png
│ │ └── test
│ │ ├── good
│ │ │ ├── img
│ │ │ │ ├── image_filename.png
│ │ │ │ └── label
│ │ │ │ └── image_filename.png
│ │ └── Ungood
│ │ ├── img
│ │ │ ├── image_filename.png
│ │ │ └── label


Args:
root (Path | str): Path to dataset root directory.
path (Path | str): Path to dataset root directory.
split (str | Split | None, optional): Dataset split (train/test/val). Defaults to ``None``.
extensions (Sequence[str] | None, optional): Valid image extensions. Defaults to ``None``.

Returns:
DataFrame with columns:
@@ -175,7 +197,7 @@ def make_bmad_dataset(path: Path, split: str | Split | None = None) -> DataFrame
samples["mask_path"] = None
if len(mask_samples):
samples.loc[
(samples.split == "test") | (samples.split == "valid"),
((samples.split == "test") | (samples.split == "valid")) & (samples.label_index == LabelName.ABNORMAL),
"mask_path",
] = mask_samples.image_path.to_numpy()

@@ -190,14 +212,13 @@ def make_bmad_dataset(path: Path, split: str | Split | None = None) -> DataFrame
):
msg = (
"Mismatch between anomalous images and ground truth masks. Make sure "
"mask files in 'ground_truth' folder follow the same naming "
"mask files in 'Ungood/label/' folder follow the same naming "
"convention as the anomalous images (e.g. image: '000.png', "
"mask: '000.png' or '000_mask.png')."
"mask: '000.png')."
)
raise MisMatchError(msg)

samples.attrs["task"] = "classification" if (samples["mask_path"] == "").all() else "segmentation"
split = "train"
if split:
samples = samples[samples.split == split].reset_index(drop=True)
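The updated `make_bmad_dataset` restricts mask assignment to abnormal samples in the `test` and `valid` splits, and its error message now points at `Ungood/label/`. The following stdlib-only sketch illustrates that pairing rule under stated assumptions: relative paths follow the `category/split/label/{img,label}/name.png` layout of the dummy helper (training images sit directly in `train/good/`), abnormal images are the ones under `Ungood`, and each image is matched to the mask with the same file stem. `parse_bmad_paths` and the local `MisMatchError` are hypothetical; the real function builds a pandas `DataFrame` and assigns mask paths positionally from `mask_samples`, so this is a simplified variant, not its implementation.

```python
from __future__ import annotations

from pathlib import PurePosixPath


class MisMatchError(Exception):
    """Local stand-in for the MisMatchError raised by make_bmad_dataset."""


def parse_bmad_paths(paths: list[str]) -> list[dict]:
    """Turn paths like 'Brain/test/Ungood/img/000.png' into sample records."""
    images: list[dict] = []
    masks: dict[tuple[str, str, str], str] = {}
    for raw in sorted(paths):
        path = PurePosixPath(raw)
        parts = path.parts
        split, label = parts[1], parts[2]
        # train/good/name.png has no img/label level; test and valid splits do.
        kind = parts[3] if len(parts) == 5 else "img"
        if kind == "label":
            masks[(split, label, path.stem)] = raw
            continue
        images.append({"image_path": raw, "split": split, "label": label, "mask_path": None})

    for sample in images:
        # Only abnormal images in the evaluation splits receive a mask.
        if sample["split"] in ("test", "valid") and sample["label"] == "Ungood":
            key = (sample["split"], sample["label"], PurePosixPath(sample["image_path"]).stem)
            if key not in masks:
                msg = f"No mask with the same file name as {sample['image_path']}"
                raise MisMatchError(msg)
            sample["mask_path"] = masks[key]
    return images


paths = [
    "Brain/train/good/000.png",
    "Brain/test/good/img/000.png",
    "Brain/test/Ungood/img/000.png",
    "Brain/test/Ungood/label/000.png",
    "Brain/valid/Ungood/img/001.png",
    "Brain/valid/Ungood/label/001.png",
]
samples = parse_bmad_paths(paths)
for s in samples:
    print(s["image_path"], "->", s["mask_path"])
```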

30 changes: 30 additions & 0 deletions tests/helpers/data.py
@@ -415,6 +415,36 @@ def _generate_dummy_tabular_dataset(self) -> None:
"""Generate dummy folder structure for tabular dataset in a temporary directory."""
self._generate_dummy_folder_dataset()

def _generate_dummy_bmad_dataset(self) -> None:
"""Generate dummy BMAD dataset in directory."""
dataset_category = "dummy"
# train split. Images are in train/good
split_path = self.dataset_root / dataset_category / "train" / self.normal_category
for i in range(self.num_train):
label = LabelName.NORMAL
image_filename = split_path / f"{i:03}.png"
self.image_generator.generate_image(label=label, image_filename=image_filename)
# Good images are in subset/normal_category/img/i000.png
for split in ("test", "valid"):
split_path = self.dataset_root / dataset_category / split / self.normal_category / "img"
for i in range(self.num_test):
label = LabelName.NORMAL
image_filename = split_path / f"{i:03}.png"
self.image_generator.generate_image(label=label, image_filename=image_filename)
# Abnormal images are in subset/abnormal_category/img/i000.png
# and subset/abnormal_category/label/i000.png
for split in ("test", "valid"):
split_path = self.dataset_root / dataset_category / split / self.abnormal_category
for i in range(self.num_test):
label = LabelName.ABNORMAL
image_filename = split_path / "img" / f"{i:03}.png"
mask_filename = split_path / "label" / f"{i:03}.png"
self.image_generator.generate_image(
label=label,
image_filename=image_filename,
mask_filename=mask_filename,
)

def _generate_dummy_btech_dataset(self) -> None:
"""Generate dummy BeanTech dataset in directory using the same convention as BeanTech AD."""
# BeanTech AD follows the same convention as MVTec AD.
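`_generate_dummy_bmad_dataset` writes synthetic images through `image_generator`. To see the resulting tree without that generator, the sketch below recreates the same layout with empty placeholder files using only `pathlib`. `make_dummy_bmad_tree` is hypothetical, and the folder names `good`/`Ungood` stand in for `self.normal_category`/`self.abnormal_category`, whose actual values are not shown in this diff.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def make_dummy_bmad_tree(
    root: Path,
    category: str = "dummy",
    num_train: int = 5,
    num_test: int = 3,
    normal: str = "good",
    abnormal: str = "Ungood",
) -> list[Path]:
    """Create empty placeholder files in the layout the dummy BMAD helper uses."""
    base = root / category
    # Normal training images sit directly in train/<normal>/.
    files = [base / "train" / normal / f"{i:03}.png" for i in range(num_train)]
    for split in ("test", "valid"):
        # Normal evaluation images get an img/ level but no masks.
        files += [base / split / normal / "img" / f"{i:03}.png" for i in range(num_test)]
        # Abnormal images and their masks share the same file name.
        for i in range(num_test):
            files.append(base / split / abnormal / "img" / f"{i:03}.png")
            files.append(base / split / abnormal / "label" / f"{i:03}.png")
    for path in files:
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()
    return files


with tempfile.TemporaryDirectory() as tmp:
    created = make_dummy_bmad_tree(Path(tmp))
    print(len(created))  # 5 + 2 * (3 + 2 * 3) = 23
```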
40 changes: 40 additions & 0 deletions tests/unit/data/datamodule/image/test_bmad.py
@@ -0,0 +1,40 @@
# Copyright (C) 2025 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

"""Unit Tests - BMAD Datamodule."""

from pathlib import Path

import pytest
from torchvision.transforms.v2 import Resize

from anomalib.data import BMAD
from anomalib.data.utils import ValSplitMode
from tests.unit.data.datamodule.base.image import _TestAnomalibImageDatamodule


class TestBMAD(_TestAnomalibImageDatamodule):
"""BMAD Datamodule Unit Tests."""

@pytest.fixture()
@staticmethod
def datamodule(dataset_path: Path) -> BMAD:
"""Create and return a BMAD datamodule."""
datamodule_ = BMAD(

Contributor: Would this download the dataset? (Downloading and extracting 20GB could take a while.)

Contributor Author: Yeah, this will download the entire dataset.

root=dataset_path / "bmad",
category="dummy",
train_batch_size=4,
eval_batch_size=4,
val_split_mode=ValSplitMode.FROM_DIR,
augmentations=Resize((256, 256)),
)
datamodule_.prepare_data()
datamodule_.setup()

return datamodule_

@pytest.fixture()
@staticmethod
def fxt_data_config_path() -> str:
"""Return the path to the test data config."""
return "examples/configs/data/bmad.yaml"
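The review thread above raises a practical concern: a unit test that triggers a full download of a roughly 20GB archive is slow. One common way to keep such tests offline is to skip the download whenever the category folder already exists, so a dummy tree generated under `dataset_path / "bmad"` is used as-is. This is a hedged sketch of that guard, not a description of how `BMAD.prepare_data` behaves; `prepare_data_if_missing` and its `download` callback are hypothetical.

```python
from __future__ import annotations

import tempfile
from pathlib import Path
from typing import Callable


def prepare_data_if_missing(root: Path, category: str, download: Callable[[Path], None]) -> bool:
    """Call ``download(root)`` only when ``root / category`` is missing.

    Returns True if a download was triggered, False if local data was reused.
    """
    if (root / category).is_dir():
        return False  # dummy or previously downloaded data is already present
    download(root)
    return True


calls: list[Path] = []
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "bmad"
    (root / "dummy").mkdir(parents=True)
    # The category folder exists, so the (fake) download is skipped.
    print(prepare_data_if_missing(root, "dummy", calls.append))  # False
    # A category with no local folder triggers the download callback.
    print(prepare_data_if_missing(root, "Brain", calls.append))  # True
```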