Commit 0193c44

Landanjs authored and ravi-mosaicml committed
DeepLabv3+ README (#1066)
Adds DeepLabv3+ README
1 parent 8110f21 commit 0193c44

File tree

3 files changed: +138, -0 lines changed

composer/models/deeplabv3/README.md

Lines changed: 136 additions & 0 deletions
@@ -0,0 +1,136 @@
# 🤿 DeepLabv3+

[\[Example\]](#example) · [\[Architecture\]](#architecture) · [\[Training Hyperparameters\]](#training-hyperparameters) · [\[Attribution\]](#attribution) · [\[API Reference\]](#api-reference)

[DeepLabv3+](https://arxiv.org/abs/1802.02611) is an architecture designed for semantic segmentation, i.e. per-pixel classification. DeepLabv3+ takes in a feature map from a backbone architecture (e.g. ResNet-101), then outputs classifications for each pixel in the input image. Our implementation is a simple wrapper around [torchvision’s ResNet](https://pytorch.org/vision/stable/models.html#id10) for the backbone and [mmsegmentation’s DeepLabv3+](https://github.com/open-mmlab/mmsegmentation/tree/master/configs/deeplabv3plus) for the head.

## Example

<!--pytest-codeblocks:skip-->
```python
from composer.models import ComposerDeepLabV3

model = ComposerDeepLabV3(num_classes=150,
                          backbone_arch="resnet101",
                          is_backbone_pretrained=True,
                          backbone_url="https://download.pytorch.org/models/resnet101-cd907fc2.pth",
                          sync_bn=False)
```
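
The constructed model is a standard Composer model, so it can be handed directly to Composer's `Trainer`. The snippet below is a minimal sketch only: `train_dataloader` is a placeholder you would replace with an ADE20k (or other segmentation) dataloader, and the duration is illustrative.

<!--pytest-codeblocks:skip-->
```python
from composer import Trainer

# Hypothetical training setup: `model` is the ComposerDeepLabV3 created above and
# `train_dataloader` is assumed to yield (image, target) segmentation batches.
trainer = Trainer(model=model,
                  train_dataloader=train_dataloader,
                  max_duration="127ep")
trainer.fit()
```
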
## Architecture

Based on [Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation](https://arxiv.org/abs/1802.02611).

<div align=center>
<img src="https://storage.googleapis.com/docs.mosaicml.com/images/models/deeplabv3_v2.png" alt="deeplabv3plus" width="650">
</div>

- **Backbone network**: converts the input image into a feature map (a torchvision-based sketch follows this list).
    * Usually ResNet-101 with the strided convolutions converted to dilated convolutions in stages 3 and 4.
    * The 3x3 convolutions in stages 3 and 4 have dilation sizes of 2 and 4, respectively, to compensate for the decreased receptive field.
    * The average pooling and classification layers are ignored.
- **Spatial Pyramid Pooling**: extracts multi-resolution features from the stage 4 backbone feature map (a PyTorch sketch of the SPP and decoder steps also follows this list).
    * The backbone feature map is processed with four parallel convolution layers with dilations {1, 12, 24, 36} and kernel sizes {1x1, 3x3, 3x3, 3x3}.
    * In parallel to the convolutions, the backbone feature map is global-average-pooled, then bilinearly upsampled to the same spatial dimensions as the feature map.
    * The outputs of the convolutions and the global average pool are concatenated, then processed with a 1x1 convolution.
    * The 3x3 convolutions are implemented as depth-wise convolutions to reduce memory and computation cost.
- **Decoder**: converts the output of spatial pyramid pooling (SPP) to class predictions with the same spatial dimensions as the input image.
    * The SPP output is bilinearly upsampled to the same spatial dimensions as the output of the first stage of the backbone network.
    * A 1x1 convolution is applied to the first-stage activations, and the result is concatenated with the upsampled SPP output.
    * The concatenation is processed by a 3x3 convolution with dropout, followed by a classification layer.
    * The predictions are bilinearly upsampled to the same resolution as the input image.
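
As a rough illustration of the dilated backbone, torchvision's ResNet constructors accept a `replace_stride_with_dilation` flag that swaps the stride-2 convolutions in the later stages for dilated ones. This is only a sketch of the idea, not the exact backbone construction used in the Composer implementation.

<!--pytest-codeblocks:skip-->
```python
import torch
from torchvision.models import resnet101

# Dilate stages 3 and 4 (keep stage 2 strided): the output stride stays at 8
# instead of 32, trading compute for a higher-resolution stage 4 feature map.
# (Newer torchvision versions use the `weights=` argument instead of `pretrained=`.)
backbone = resnet101(pretrained=True, replace_stride_with_dilation=[False, True, True])

x = torch.randn(1, 3, 512, 512)
out = backbone.maxpool(backbone.relu(backbone.bn1(backbone.conv1(x))))
low_level = backbone.layer1(out)   # 1/4 resolution; reused later by the decoder
out = backbone.layer2(low_level)   # 1/8 resolution
out = backbone.layer3(out)         # still 1/8 resolution (dilation 2)
out = backbone.layer4(out)         # still 1/8 resolution (dilation 4)
```
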
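The spatial pyramid pooling and decoder steps can be pieced together from standard PyTorch layers. `SimpleDeepLabV3PlusHead` below is an illustrative, self-contained sketch of that head, not the class used by Composer or mmsegmentation; channel counts, dropout rate, and normalization placement are simplified.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SimpleDeepLabV3PlusHead(nn.Module):
    """Illustrative DeepLabv3+ head: pyramid pooling over the stage 4 features plus a decoder."""

    def __init__(self, in_channels=2048, low_level_channels=256, channels=256, num_classes=150):
        super().__init__()
        # Four parallel branches: a 1x1 convolution and three 3x3 depth-wise
        # separable convolutions with dilations 12, 24, and 36.
        self.branches = nn.ModuleList([self._conv(in_channels, channels, kernel=1)])
        for dilation in (12, 24, 36):
            self.branches.append(self._sep_conv(in_channels, channels, dilation))
        # Image-level branch: global average pool, then a 1x1 convolution.
        self.image_pool = nn.Sequential(nn.AdaptiveAvgPool2d(1), self._conv(in_channels, channels, kernel=1))
        # Fuse the five branches with a 1x1 convolution.
        self.fuse = self._conv(5 * channels, channels, kernel=1)
        # Decoder: project the low-level (stage 1) features, concatenate, classify.
        self.project = self._conv(low_level_channels, 48, kernel=1)
        self.classify = nn.Sequential(self._conv(channels + 48, channels, kernel=3),
                                      nn.Dropout2d(0.1),
                                      nn.Conv2d(channels, num_classes, kernel_size=1))

    @staticmethod
    def _conv(in_ch, out_ch, kernel, dilation=1):
        padding = dilation if kernel == 3 else 0
        return nn.Sequential(nn.Conv2d(in_ch, out_ch, kernel, padding=padding, dilation=dilation, bias=False),
                             nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))

    @staticmethod
    def _sep_conv(in_ch, out_ch, dilation):
        # Depth-wise 3x3 followed by a point-wise 1x1 to cut memory and compute.
        return nn.Sequential(
            nn.Conv2d(in_ch, in_ch, 3, padding=dilation, dilation=dilation, groups=in_ch, bias=False),
            nn.BatchNorm2d(in_ch), nn.ReLU(inplace=True),
            nn.Conv2d(in_ch, out_ch, 1, bias=False), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))

    def forward(self, low_level, stage4, input_size):
        # Spatial pyramid pooling over the stage 4 feature map.
        feats = [branch(stage4) for branch in self.branches]
        pooled = F.interpolate(self.image_pool(stage4), size=stage4.shape[-2:], mode='bilinear', align_corners=False)
        spp = self.fuse(torch.cat(feats + [pooled], dim=1))
        # Decoder: upsample to the low-level resolution and concatenate.
        spp = F.interpolate(spp, size=low_level.shape[-2:], mode='bilinear', align_corners=False)
        logits = self.classify(torch.cat([spp, self.project(low_level)], dim=1))
        # Upsample predictions back to the input image resolution.
        return F.interpolate(logits, size=input_size, mode='bilinear', align_corners=False)


head = SimpleDeepLabV3PlusHead()
head.eval()  # eval mode so BatchNorm works with this single dummy example
low_level = torch.randn(1, 256, 128, 128)   # stage 1 features (1/4 resolution)
stage4 = torch.randn(1, 2048, 64, 64)       # dilated stage 4 features (1/8 resolution)
logits = head(low_level, stage4, input_size=(512, 512))  # -> (1, 150, 512, 512)
```

The two inputs correspond to the stage 1 and stage 4 feature maps produced by the dilated backbone sketched above.
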
## Training Hyperparameters

We tested two sets of hyperparameters for DeepLabv3+ trained on the ADE20k dataset.

### Typical ADE20k Model Hyperparameters

```yaml
model:
  deeplabv3:
    initializers:
      - kaiming_normal
      - bn_ones
    num_classes: 150
    backbone_arch: resnet101
    is_backbone_pretrained: true
    use_plus: true
    sync_bn: true
optimizer:
  sgd:
    lr: 0.01
    momentum: 0.9
    weight_decay: 5.0e-4
    dampening: 0
    nesterov: false
schedulers:
  - polynomial:
      alpha_f: 0.01
      power: 0.9
max_duration: 127ep
train_batch_size: 16
precision: amp
```

| Model | mIoU | Time-to-Train on 8xA100 |
| --- | --- | --- |
| ResNet101-DeepLabv3+ | 44.17 +/- 0.17 | 6.385 hr |

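For reference, the `polynomial` scheduler in the configuration above is the "poly" learning-rate schedule that is standard for semantic segmentation. Assuming the usual convention, where `alpha_f` is the final learning-rate multiplier, the multiplier at training fraction `t` decays roughly as `alpha_f + (1 - alpha_f) * (1 - t)^power`; with `power: 0.9` and `alpha_f: 0.01`, the learning rate decays smoothly to 1% of its initial value by the end of training.
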
### Composer ADE20k Model Hyperparameters

```yaml
model:
  deeplabv3:
    initializers:
      - kaiming_normal
      - bn_ones
    num_classes: 150
    backbone_arch: resnet101
    is_backbone_pretrained: true
    use_plus: true
    sync_bn: true
    # New PyTorch pretrained weights
    backbone_url: https://download.pytorch.org/models/resnet101-cd907fc2.pth
optimizer:
  decoupled_sgdw:
    lr: 0.01
    momentum: 0.9
    weight_decay: 2.0e-5
    dampening: 0
    nesterov: false
schedulers:
  - cosine_decay:
      t_max: 1dur
max_duration: 128ep
train_batch_size: 32
precision: amp
```

| Model | mIoU | Time-to-Train on 8xA100 |
| --- | --- | --- |
| ResNet101-DeepLabv3+ | 45.764 +/- 0.29 | 4.67 hr |

Improvements over the typical hyperparameters:

- New PyTorch pretrained weights
- Cosine decay learning rate schedule
- Decoupled weight decay
- Increased batch size to 32
- Decreased weight decay to 2e-5

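For anyone configuring these changes in Python rather than YAML, the setup looks roughly like the sketch below. The class names (`DecoupledSGDW`, `CosineAnnealingScheduler`) are taken from `composer.optim` and should be treated as assumptions to verify against the API reference; `model` and `train_dataloader` are the objects from the earlier example.

<!--pytest-codeblocks:skip-->
```python
from composer import Trainer
from composer.optim import CosineAnnealingScheduler, DecoupledSGDW

# Decoupled weight decay (SGDW) with the reduced weight decay value,
# paired with cosine decay over the full training duration.
optimizer = DecoupledSGDW(model.parameters(), lr=0.01, momentum=0.9, weight_decay=2.0e-5)
scheduler = CosineAnnealingScheduler(t_max="1dur")

trainer = Trainer(model=model,
                  train_dataloader=train_dataloader,
                  optimizers=optimizer,
                  schedulers=scheduler,
                  max_duration="128ep",
                  precision="amp")
trainer.fit()
```
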
## Attribution

[Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation](https://arxiv.org/abs/1802.02611) by Liang-Chieh Chen, Yukun Zhu, George Papandreou, Florian Schroff, Hartwig Adam

[OpenMMLab Semantic Segmentation Toolbox and Benchmark](https://github.com/open-mmlab/mmsegmentation)

[How to Train State-Of-The-Art Models Using TorchVision’s Latest Primitives](https://pytorch.org/blog/how-to-train-state-of-the-art-models-using-torchvision-latest-primitives/) by Vasilis Vryniotis

## API Reference

```{eval-rst}
.. autoclass:: composer.models.deeplabv3.ComposerDeepLabV3
    :noindex:
```

docs/source/index.rst

Lines changed: 1 addition & 0 deletions
@@ -113,6 +113,7 @@ Composer is part of the broader Machine Learning community, and we welcome any c
     :caption: Model Library

     model_cards/cifar_resnet.md
+    model_cards/deeplabv3.md
     model_cards/efficientnet.md
     model_cards/GPT2.md
     model_cards/resnet.md
docs/source/model_cards/deeplabv3.md

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
../../../composer/models/deeplabv3/README.md
