Skip to content
Snippets Groups Projects
transforms.md 5.53 KiB
Newer Older
谢昕辰's avatar
谢昕辰 committed
# Data Transforms
Jiarui XU's avatar
Jiarui XU committed

In this tutorial, we introduce the design of transforms pipeline in MMSegmentation.

The structure of this guide is as follows:

- [Data Transforms](#data-transforms)
  - [Design of Data pipelines](#design-of-data-pipelines)
  - [Customization data transformation](#customization-data-transformation)

Jiarui XU's avatar
Jiarui XU committed
## Design of Data pipelines

Following typical conventions, we use `Dataset` and `DataLoader` for data loading
with multiple workers. `Dataset` returns a dict of data items corresponding
the arguments of models' forward method.
Since the data in semantic segmentation may not be the same size,
we introduce a new `DataContainer` type in MMCV to help collect and distribute
data of different size.
See [here](https://github.com/open-mmlab/mmcv/blob/master/mmcv/parallel/data_container.py) for more details.

MengzhangLI's avatar
MengzhangLI committed
In 1.x version of MMSegmentation, all data transformations are inherited from [`BaseTransform`](https://github.com/open-mmlab/mmcv/blob/2.x/mmcv/transforms/base.py#L6).
The input and output types of transformations are both dict. A simple example is as follows:

```python
>>> from mmseg.datasets.transforms import LoadAnnotations
>>> transforms = LoadAnnotations()
>>> img_path = './data/cityscapes/leftImg8bit/train/aachen/aachen_000000_000019_leftImg8bit.png.png'
>>> gt_path = './data/cityscapes/gtFine/train/aachen/aachen_000015_000019_gtFine_instanceTrainIds.png'
>>> results = dict(
>>>     img_path=img_path,
>>>     seg_map_path=gt_path,
>>>     reduce_zero_label=False,
>>>     seg_fields=[])
>>> data_dict = transforms(results)
>>> print(data_dict.keys())
dict_keys(['img_path', 'seg_map_path', 'reduce_zero_label', 'seg_fields', 'gt_seg_map'])
```

MengzhangLI's avatar
MengzhangLI committed
The data preparation pipeline and the dataset are decomposed. Usually a dataset
Jiarui XU's avatar
Jiarui XU committed
defines how to process the annotations and a data pipeline defines all the steps to prepare a data dict.
MengzhangLI's avatar
MengzhangLI committed
A pipeline consists of a sequence of operations. Each operation takes a dict as input and also outputs a dict for the next transform.
Jiarui XU's avatar
Jiarui XU committed

The operations are categorized into data loading, pre-processing, formatting and test-time augmentation.

MengzhangLI's avatar
MengzhangLI committed
Here is a pipeline example for PSPNet.
Jiarui XU's avatar
Jiarui XU committed

```python
crop_size = (512, 1024)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations'),
谢昕辰's avatar
谢昕辰 committed
    dict(
        type='RandomResize',
        scale=(2048, 1024),
        ratio_range=(0.5, 2.0),
        keep_ratio=True),
Jiarui XU's avatar
Jiarui XU committed
    dict(type='RandomCrop', crop_size=crop_size, cat_max_ratio=0.75),
谢昕辰's avatar
谢昕辰 committed
    dict(type='RandomFlip', prob=0.5),
Jiarui XU's avatar
Jiarui XU committed
    dict(type='PhotoMetricDistortion'),
谢昕辰's avatar
谢昕辰 committed
    dict(type='PackSegInputs')
Jiarui XU's avatar
Jiarui XU committed
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
谢昕辰's avatar
谢昕辰 committed
    dict(type='Resize', scale=(2048, 1024), keep_ratio=True),
    # add loading annotation after ``Resize`` because ground truth
MengzhangLI's avatar
MengzhangLI committed
    # does not need to resize data transform
谢昕辰's avatar
谢昕辰 committed
    dict(type='LoadAnnotations'),
    dict(type='PackSegInputs')
Jiarui XU's avatar
Jiarui XU committed
]
```

For each operation, we list the related dict fields that are `added`/`updated`/`removed`.
Before pipelines, the information we can directly obtain from the datasets are `img_path` and `seg_map_path`.
Jiarui XU's avatar
Jiarui XU committed

### Data loading

`LoadImageFromFile`: Load an image from file.
- add: `img`, `img_shape`, `ori_shape`
Jiarui XU's avatar
Jiarui XU committed

`LoadAnnotations`: Load semantic segmentation maps provided by dataset.
- add: `seg_fields`, `gt_seg_map`
Jiarui XU's avatar
Jiarui XU committed

### Pre-processing

`RandomResize`: Random resize image & segmentation map.
- add: `scale`, `scale_factor`, `keep_ratio`
- update: `img`, `img_shape`, `gt_seg_map`
Jiarui XU's avatar
Jiarui XU committed

`Resize`: Resize image & segmentation map.
- add: `scale`, `scale_factor`, `keep_ratio`
- update: `img`, `gt_seg_map`, `img_shape`
Jiarui XU's avatar
Jiarui XU committed

`RandomCrop`: Random crop image & segmentation map.
- update: `img`, `gt_seg_map`, `img_shape`.
Jiarui XU's avatar
Jiarui XU committed

`RandomFlip`: Flip the image & segmentation map.
- add: `flip`, `flip_direction`
- update: `img`, `gt_seg_map`
Jiarui XU's avatar
Jiarui XU committed

`PhotoMetricDistortion`: Apply photometric distortion to image sequentially,
every transformation is applied with a probability of 0.5.
The position of random contrast is in second or second to last(mode 0 or 1 below, respectively).
```
1. random brightness
2. random contrast (mode 0)
3. convert color from BGR to HSV
4. random saturation
5. random hue
6. convert color from HSV to BGR
7. random contrast (mode 1)
```

- update: `img`
Jiarui XU's avatar
Jiarui XU committed

### Formatting

`PackSegInputs`: Pack the inputs data for the semantic segmentation.
Jiarui XU's avatar
Jiarui XU committed

- add: `inputs`, `data_sample`
谢昕辰's avatar
谢昕辰 committed
- remove: keys specified by `meta_keys` (merged into the metainfo of data_sample), all other keys

## Customization data transformation

MengzhangLI's avatar
MengzhangLI committed
The customized data transformation must inherited from `BaseTransform` and implement `transform` function.
Here we use a simple flipping transformation as example:

```python
import random
import mmcv
from mmcv.transforms import BaseTransform, TRANSFORMS

@TRANSFORMS.register_module()
class MyFlip(BaseTransform):
    def __init__(self, direction: str):
        super().__init__()
        self.direction = direction

    def transform(self, results: dict) -> dict:
        img = results['img']
        results['img'] = mmcv.imflip(img, direction=self.direction)
        return results
```

Thus, we can instantiate a `MyFlip` object and use it to process the data dict.

```python
import numpy as np

transform = MyFlip(direction='horizontal')
data_dict = {'img': np.random.rand(224, 224, 3)}
data_dict = transform(data_dict)
processed_img = data_dict['img']
```

Or, we can use `MyFlip` transformation in data pipeline in our config file.

```python
pipeline = [
    ...
    dict(type='MyFlip', direction='horizontal'),
    ...
]
```

MengzhangLI's avatar
MengzhangLI committed
Note that if you want to use `MyFlip` in config, you must ensure the file containing `MyFlip` is imported during runtime.