


Benchmark and Model Zoo

Common settings


We use distributed training with 4 GPUs by default.


All PyTorch-style backbones pretrained on ImageNet are trained by ourselves, following the same procedure as in the paper.
Our ResNet-style backbones are based on the ResNetV1c variant, in which the 7x7 conv in the input stem is replaced with three 3x3 convs.
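The deep stem described above can be sketched in PyTorch as follows. This is a minimal illustration, not the framework's backbone code: the channel split (stem_channels // 2 for the first two convs), the BatchNorm and ReLU after each conv, and the trailing 3x3 max-pool are assumptions modeled on the usual ResNetV1c layout, and DeepStem and conv_bn_relu are hypothetical names.

```python
import torch
from torch import nn


def conv_bn_relu(in_ch: int, out_ch: int, stride: int) -> nn.Sequential:
    """3x3 conv (padding 1) followed by BatchNorm and ReLU."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=stride, padding=1, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )


class DeepStem(nn.Module):
    """Hypothetical ResNetV1c-style stem: three 3x3 convs instead of one 7x7 conv."""

    def __init__(self, in_channels: int = 3, stem_channels: int = 64):
        super().__init__()
        mid = stem_channels // 2  # assumed width of the first two convs
        self.stem = nn.Sequential(
            # Only the first conv is strided, mirroring the stride-2 7x7 conv it replaces.
            conv_bn_relu(in_channels, mid, stride=2),
            conv_bn_relu(mid, mid, stride=1),
            conv_bn_relu(mid, stem_channels, stride=1),
        )
        # Same max-pool as the standard ResNet stem.
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.maxpool(self.stem(x))
```

With a 224x224 input the stem downsamples by 4 overall (224 to 112 to 56), which matches the output resolution of the original 7x7 stride-2 conv followed by max-pooling, so the rest of the backbone is unchanged.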


For consistency across different hardware, we report GPU memory as the maximum value of torch.cuda.max_memory_allocated() over all 4 GPUs, with torch.backends.cudnn.benchmark=False.
Note that this value is usually less than what nvidia-smi shows.
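A sketch of how such a number can be collected in a distributed run, assuming torch.distributed is already initialized with one process per GPU: each rank reads its own peak with torch.cuda.max_memory_allocated(), and an all-reduce with ReduceOp.MAX keeps the largest value across ranks. The helper name peak_gpu_memory_mb is hypothetical, and returning 0 on machines without CUDA is only a convenience for this example.

```python
import torch
import torch.distributed as dist

# Disable cuDNN autotuning so memory and timing are comparable across runs.
torch.backends.cudnn.benchmark = False


def peak_gpu_memory_mb() -> int:
    """Peak tensor memory (MB) on this process's GPU, max-reduced over all ranks."""
    if not torch.cuda.is_available():
        return 0  # no GPU in this environment: nothing to report
    mem_mb = torch.cuda.max_memory_allocated() // (1024 * 1024)
    mem = torch.tensor([mem_mb], dtype=torch.long, device="cuda")
    if dist.is_available() and dist.is_initialized():
        # Each rank holds its own peak; keep the largest one (e.g. over 4 GPUs).
        dist.all_reduce(mem, op=dist.ReduceOp.MAX)
    return int(mem.item())
```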


We report the inference time as the total time of network forwarding and post-processing, excluding the data loading time.
Results are obtained with the script tools/benchmark.py, which computes the average time over 200 images with torch.backends.cudnn.benchmark=False.
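The timing protocol can be illustrated with a framework-agnostic sketch. Samples are loaded before the loop so that data loading never enters the measured interval, and an optional sync callable (for example torch.cuda.synchronize on GPU) waits for asynchronous kernels before the clock is read. The function name, the warm-up count of 5 and the exclusion of warm-up iterations from the average are assumptions for illustration, not a description of tools/benchmark.py.

```python
import time
from typing import Any, Callable, Optional, Sequence


def average_inference_time(
    infer: Callable[[Any], Any],
    samples: Sequence[Any],
    num_warmup: int = 5,
    sync: Optional[Callable[[], None]] = None,
) -> float:
    """Average seconds per sample for forward pass plus post-processing.

    samples are preloaded, so data loading is excluded from the timing.
    The first num_warmup samples are run but not counted.
    """
    if len(samples) <= num_warmup:
        raise ValueError("need more samples than warm-up iterations")
    total, timed = 0.0, 0
    for i, sample in enumerate(samples):
        if sync is not None:
            sync()  # make sure previous GPU work has finished
        start = time.perf_counter()
        infer(sample)  # network forwarding + post-processing
        if sync is not None:
            sync()  # wait for this sample's kernels before stopping the clock
        elapsed = time.perf_counter() - start
        if i >= num_warmup:
            total += elapsed
            timed += 1
    return total / timed
```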


There are two inference modes in this framework.


slide mode: The test_cfg looks like dict(mode='slide', crop_size=(769, 769), stride=(513, 513)).
In this mode, multiple patches are cropped from the input image and passed into the network individually.
The crop size and the stride between patches are specified by crop_size and stride.
Overlapping areas are merged by averaging.
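A minimal NumPy sketch of slide inference, assuming model_fn maps a (C, h, w) crop to (num_classes, h, w) logits at the same resolution. Windows start every stride pixels, the last window along each axis is shifted back so that it ends at the image border, and each pixel's accumulated logits are divided by the number of windows that covered it. The function and argument names are hypothetical; the framework's own implementation may differ in details such as padding.

```python
import numpy as np


def slide_inference(image, model_fn, num_classes, crop_size=(769, 769), stride=(513, 513)):
    """Sliding-window inference on a (C, H, W) array; overlaps are averaged."""
    _, h_img, w_img = image.shape
    h_crop, w_crop = crop_size
    h_stride, w_stride = stride
    # Number of window positions along each axis (always at least one).
    h_grids = max(h_img - h_crop + h_stride - 1, 0) // h_stride + 1
    w_grids = max(w_img - w_crop + w_stride - 1, 0) // w_stride + 1
    preds = np.zeros((num_classes, h_img, w_img), dtype=np.float64)
    count = np.zeros((1, h_img, w_img), dtype=np.float64)
    for h_idx in range(h_grids):
        for w_idx in range(w_grids):
            y2 = min(h_idx * h_stride + h_crop, h_img)
            x2 = min(w_idx * w_stride + w_crop, w_img)
            # Shift the window back so it stays inside the image.
            y1 = max(y2 - h_crop, 0)
            x1 = max(x2 - w_crop, 0)
            preds[:, y1:y2, x1:x2] += model_fn(image[:, y1:y2, x1:x2])
            count[:, y1:y2, x1:x2] += 1  # how many windows saw each pixel
    # Average the logits over all windows covering each pixel.
    return preds / count
```

Because every pixel is covered by at least one window, the division never hits zero, and an image smaller than crop_size simply yields a single window over the whole image.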


whole mode: The test_cfg looks like dict(mode='whole').
In this mode, the whole image is passed into the network directly.
By default, we use slide inference for models trained with 769x769 crops and whole inference for the rest.
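The default rule can be written as a small config helper. default_test_cfg is a hypothetical name, and keying the decision on the training crop size is an assumption about how the rule is applied; the two dicts themselves are the test_cfg values quoted above.

```python
def default_test_cfg(train_crop_size):
    """Hypothetical helper: slide inference for 769x769-trained models, whole otherwise."""
    if tuple(train_crop_size) == (769, 769):
        return dict(mode='slide', crop_size=(769, 769), stride=(513, 513))
    return dict(mode='whole')
```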


For input sizes of the form 8x+1 (e.g. 769), align_corners=True is adopted, following common practice.
Otherwise, for input sizes of the form 8x (e.g. 512, 1024), align_corners=False is adopted.
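The reasoning behind the 8x+1 convention can be checked with the coordinate mapping PyTorch documents for linear interpolation, assuming an output stride of 8 (so a 769 input gives a (769 - 1) / 8 + 1 = 97 feature map). With align_corners=True, output pixel 8k maps exactly onto feature pixel k, and both corner pixels coincide. The helpers below are hypothetical and only reproduce that arithmetic.

```python
def use_align_corners(size: int, output_stride: int = 8) -> bool:
    """Hypothetical helper: sizes of the form 8x+1 use align_corners=True."""
    return (size - 1) % output_stride == 0


def source_coord(dst: float, in_size: int, out_size: int, align_corners: bool) -> float:
    """Map an output (upsampled) pixel index to a coordinate in the input feature map."""
    if align_corners:
        # Corner pixel centers of input and output are aligned exactly.
        return dst * (in_size - 1) / (out_size - 1)
    # Pixel edges are aligned instead; centers are offset by half a pixel.
    scale = in_size / out_size
    return (dst + 0.5) * scale - 0.5
```

For example, upsampling a 97-pixel feature map to 769 pixels with align_corners=True gives source_coord(8, 97, 769, True) == 1.0 and source_coord(768, 97, 769, True) == 96.0, so every eighth image pixel lands exactly on a feature pixel. For a 1024 input and a 128 feature map, align_corners=False instead maps dst to (dst + 0.5) / 8 - 0.5.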


Baselines

FCN
Please refer to FCN for details.

PSPNet
Please refer to PSPNet for details.

DeepLabV3
Please refer to DeepLabV3 for details.

PSANet
Please refer to PSANet for details.

DeepLabV3+
Please refer to DeepLabV3+ for details.

UPerNet
Please refer to UPerNet for details.

NonLocal Net
Please refer to NonLocal Net for details.

EncNet
Please refer to EncNet for details.