Merge pull request #2088 from MengzhangLI/transforms_doc

[Doc] Update transforms Doc

124ec820 · Miao Zheng · GitHub · 0c87f7a0 · 5b5968d2 · 124ec820
Unverified Commit 124ec820 authored 2 years ago by Miao Zheng Committed by GitHub 2 years ago
--- a/docs/en/advanced_guides/transforms.md
+++ b/docs/en/advanced_guides/transforms.md
 # Data Transforms

+In this tutorial, we introduce the design of the transforms pipeline in MMSegmentation.
+
+The structure of this guide is as follows:
+
+- [Data Transforms](#data-transforms)
+  - [Design of Data pipelines](#design-of-data-pipelines)
+  - [Customization data transformation](#customization-data-transformation)
+
 ## Design of Data pipelines

 Following typical conventions, we use `Dataset` and `DataLoader` for data loading
@@ -10,13 +18,31 @@ we introduce a new `DataContainer` type in MMCV to help collect and distribute
 data of different size.
 See [here](https://github.com/open-mmlab/mmcv/blob/master/mmcv/parallel/data_container.py) for more details.

-The data preparation pipeline and the dataset is decomposed. Usually a dataset
+In the 1.x version of MMSegmentation, all data transformations inherit from [`BaseTransform`](https://github.com/open-mmlab/mmcv/blob/2.x/mmcv/transforms/base.py#L6).
+The input and output of each transformation are both dicts. A simple example is as follows:
+
+```python
+>>> from mmseg.datasets.transforms import LoadAnnotations
+>>> transforms = LoadAnnotations()
+>>> img_path = './data/cityscapes/leftImg8bit/train/aachen/aachen_000000_000019_leftImg8bit.png'
+>>> gt_path = './data/cityscapes/gtFine/train/aachen/aachen_000015_000019_gtFine_instanceTrainIds.png'
+>>> results = dict(
+...     img_path=img_path,
+...     seg_map_path=gt_path,
+...     reduce_zero_label=False,
+...     seg_fields=[])
+>>> data_dict = transforms(results)
+>>> print(data_dict.keys())
+dict_keys(['img_path', 'seg_map_path', 'reduce_zero_label', 'seg_fields', 'gt_seg_map'])
+```
+
+The data preparation pipeline and the dataset are decomposed. Usually a dataset
 defines how to process the annotations and a data pipeline defines all the steps to prepare a data dict.
-A pipeline consists of a sequence of operations. Each operation takes a dict as input and also output a dict for the next transform.
+A pipeline consists of a sequence of operations. Each operation takes a dict as input and also outputs a dict for the next transform.

 The operations are categorized into data loading, pre-processing, formatting and test-time augmentation.

-Here is an pipeline example for PSPNet.
+Here is a pipeline example for PSPNet.

 ```python
 crop_size = (512, 1024)
@@ -37,53 +63,110 @@ test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='Resize', scale=(2048, 1024), keep_ratio=True),
    # add loading annotation after ``Resize`` because ground truth
-    # does not need to do resize data transform
+    # does not need the resize data transform
    dict(type='LoadAnnotations'),
    dict(type='PackSegInputs')
 ]
 ```

-For each operation, we list the related dict fields that are added/updated/removed.
-Before pipelines, the information we can directly obtain from the datasets are img_path, seg_map_path.
+For each operation, we list the related dict fields that are `added`/`updated`/`removed`.
+Before the pipeline runs, the information we can obtain directly from the datasets is `img_path` and `seg_map_path`.

 ### Data loading

-`LoadImageFromFile`
+`LoadImageFromFile`: Load an image from file.

- add: img, img_shape, ori_shape
+- add: `img`, `img_shape`, `ori_shape`

-`LoadAnnotations`
+`LoadAnnotations`: Load semantic segmentation maps provided by the dataset.

- add: seg_fields, gt_seg_map
+- add: `seg_fields`, `gt_seg_map`

 ### Pre-processing

-`RandomResize`
+`RandomResize`: Randomly resize the image & segmentation map.

- add: scale, scale_factor, keep_ratio
- update: img, img_shape, gt_seg_map
+- add: `scale`, `scale_factor`, `keep_ratio`
+- update: `img`, `img_shape`, `gt_seg_map`

-`Resize`
+`Resize`: Resize the image & segmentation map.

- add: scale, scale_factor, keep_ratio
- update: img, gt_seg_map, img_shape
+- add: `scale`, `scale_factor`, `keep_ratio`
+- update: `img`, `gt_seg_map`, `img_shape`

-`RandomCrop`
+`RandomCrop`: Randomly crop the image & segmentation map.

- update: img, pad_shape, gt_seg_map
+- update: `img`, `gt_seg_map`, `img_shape`

-`RandomFlip`
+`RandomFlip`: Flip the image & segmentation map.

- add: flip, flip_direction
- update: img, gt_seg_map
+- add: `flip`, `flip_direction`
+- update: `img`, `gt_seg_map`

-`PhotoMetricDistortion`
+`PhotoMetricDistortion`: Apply photometric distortions to the image sequentially.
+Each transformation is applied with a probability of 0.5.
+Random contrast is applied either second or last (mode 0 or mode 1 below, respectively).

- update: img
+```
+1. random brightness
+2. random contrast (mode 0)
+3. convert color from BGR to HSV
+4. random saturation
+5. random hue
+6. convert color from HSV to BGR
+7. random contrast (mode 1)
+```
+
+- update: `img`

 ### Formatting

-`PackSegInputs`
+`PackSegInputs`: Pack the input data for semantic segmentation.

- add: inputs, data_sample
+- add: `inputs`, `data_sample`
 - remove: keys specified by `meta_keys` (merged into the metainfo of data_sample), all other keys
+
+## Customization data transformation
+
+A customized data transformation must inherit from `BaseTransform` and implement the `transform` function.
+Here we use a simple flipping transformation as an example:
+
+```python
+import mmcv
+from mmcv.transforms import BaseTransform, TRANSFORMS
+
+@TRANSFORMS.register_module()
+class MyFlip(BaseTransform):
+    def __init__(self, direction: str):
+        super().__init__()
+        self.direction = direction
+
+    def transform(self, results: dict) -> dict:
+        img = results['img']
+        results['img'] = mmcv.imflip(img, direction=self.direction)
+        return results
+```
+
+Thus, we can instantiate a `MyFlip` object and use it to process the data dict.
+
+```python
+import numpy as np
+
+transform = MyFlip(direction='horizontal')
+data_dict = {'img': np.random.rand(224, 224, 3)}
+data_dict = transform(data_dict)
+processed_img = data_dict['img']
+```
+
+Or, we can use the `MyFlip` transformation in the data pipeline of our config file.
+
+```python
+pipeline = [
+    ...
+    dict(type='MyFlip', direction='horizontal'),
+    ...
+]
+```
+
+Note that to use `MyFlip` in a config, you must ensure the file containing `MyFlip` is imported at runtime.
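
Not part of the diff above: a minimal sketch of the dict-in/dict-out contract the updated guide describes, written in plain Python so it runs without MMCV or MMSegmentation installed. The names `toy_load`, `toy_flip`, and `run_pipeline` are hypothetical and exist only for illustration; the real transforms inherit from `BaseTransform`, and the library composes pipelines with its own machinery. The "image" is a nested list rather than an array, which is also an assumption made to keep the example dependency-free.

```python
from typing import Callable, List

# A transform is any callable that takes a results dict and returns one.
Transform = Callable[[dict], dict]


def toy_load(results: dict) -> dict:
    """Hypothetical loading step: adds `img` and `img_shape`."""
    # A 2x3 "image" stored as nested lists instead of a real array.
    results['img'] = [[1, 2, 3], [4, 5, 6]]
    results['img_shape'] = (2, 3)
    return results


def toy_flip(results: dict) -> dict:
    """Hypothetical horizontal flip: updates `img`, adds `flip` fields."""
    results['img'] = [row[::-1] for row in results['img']]
    results['flip'] = True
    results['flip_direction'] = 'horizontal'
    return results


def run_pipeline(transforms: List[Transform], results: dict) -> dict:
    """Feed the dict returned by each transform into the next one."""
    for transform in transforms:
        results = transform(results)
    return results


# Before the pipeline runs, the dict only carries what the dataset provides.
data = run_pipeline([toy_load, toy_flip], {'img_path': 'demo.png'})
print(sorted(data.keys()))
print(data['img'])
```

Each step either adds keys (`img`, `img_shape`, `flip`, `flip_direction`) or updates existing ones (`img`), which mirrors the `add`/`update` bookkeeping listed for each transform in the guide.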