Newer
Older
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
# Benchmark and Model Zoo
## Common settings
* We use distributed training with 4 GPUs by default.
* All pytorch-style pretrained backbones on ImageNet are train by ourselves, with the same procedure in the [paper](https://arxiv.org/pdf/1812.01187.pdf).
Our ResNet style backbone are based on ResNetV1c variant, where the 7x7 conv in the input stem is replaced with three 3x3 convs.
* For the consistency across different hardwares, we report the GPU memory as the maximum value of `torch.cuda.max_memory_allocated()` for all 4 GPUs with `torch.backends.cudnn.benchmark=False`.
Note that this value is usually less than what `nvidia-smi` shows.
* We report the inference time as the total time of network forwarding and post-processing, excluding the data loading time.
Results are obtained with the script `tools/benchmark.py` which computes the average time on 200 images with `torch.backends.cudnn.benchmark=False`.
* There are two inference modes in this framework.
* `slide` mode: The `test_cfg` will be like `dict(mode='slide', crop_size=(769, 769), stride=(513, 513))`.
In this mode, multiple patches will be cropped from input image, passed into network individually.
The crop size and stride between patches are specified by `crop_size` and `stride`.
The overlapping area will be merged by average
* `whole` mode: The `test_cfg` will be like `dict(mode='whole')`.
In this mode, the whole imaged will be passed into network directly.
* For input size of 8x+1 (e.g. 769), `align_corner=True` is adopted as a traditional practice.
Otherwise, for input size of 8x (e.g. 512, 1024), `align_corner=False` is adopted.
## Baselines
### FCN
Please refer to [FCN](https://github.com/open-mmlab/mmsegmentation/tree/master/configs/fcn) for details.
### PSPNet
Please refer to [PSPNet](https://github.com/open-mmlab/mmsegmentation/tree/master/configs/pspnet) for details.
### DeepLabV3
Please refer to [DeepLabV3](https://github.com/open-mmlab/mmsegmentatio/tree/master/configs/deeplabv3) for details.
### PSANet
Please refer to [PSANet](https://github.com/open-mmlab/mmsegmentation/tree/master/configs/psanet) for details.
### DeepLabV3+
Please refer to [DeepLabV3+](https://github.com/open-mmlab/mmsegmentatio/tree/master/configs/deeplabv3plus) for details.
### UPerNet
Please refer to [UPerNet](https://github.com/open-mmlab/mmsegmentation/tree/master/configs/upernet) for details.
### NonLocal Net
Please refer to [NonLocal Net](https://github.com/open-mmlab/mmsegmentatio/tree/master/configs/nlnet) for details.
### CCNet
Please refer to [CCNet](https://github.com/open-mmlab/mmsegmentation/tree/master/configs/ccnet) for details.
### DANet
Please refer to [DANet](https://github.com/open-mmlab/mmsegmentation/tree/master/configs/danet) for details.
### HRNet
Please refer to [HRNet](https://github.com/open-mmlab/mmsegmentation/tree/master/configs/hrnet) for details.
### GCNet
Please refer to [GCNet](https://github.com/open-mmlab/mmsegmentation/tree/master/configs/gcnet) for details.
### ANN
Please refer to [ANN](https://github.com/open-mmlab/mmsegmentation/tree/master/configs/ann) for details.
### OCRNet
Please refer to [OCRNet](https://github.com/open-mmlab/mmsegmentation/tree/master/configs/ocrnet) for details.
## Speed benchmark
### Hardware
- 8 NVIDIA Tesla V100 (32G) GPUs
- Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz
### Software environment
- Python 3.7
- PyTorch 1.5
- CUDA 10.1
- CUDNN 7.6.03
- NCCL 2.4.08
### Training speed
For fair comparison, we benchmark all implementations with ResNet-101V1c.
The input size is fixed to 1024x512 with batch size 2.
The training speed is reported as followed, in terms of second per iter (s/iter). The lower, the better.
| Implementation | PSPNet (s/iter) | DeepLabV3+ (s/iter) |
|----------------|-----------------|---------------------|
| [MMSegmentation](https://github.com/open-mmlab/mmsegmentation) | **0.83** | **0.85** |
| [SegmenTron](https://github.com/LikeLy-Journey/SegmenTron) | 0.84 | 0.85 |
| [CASILVision](https://github.com/CSAILVision/semantic-segmentation-pytorch) | 1.15 | N/A |
| [vedaseg](https://github.com/Media-Smart/vedaseg) | 0.95 | 1.25 |
Note: The output stride of DeepLabV3+ is 8.