TinyU-Net: Lighter yet Better U-Net with Cascaded Multi-Receptive Fields
- Link to the code here
- The goal of the paper is to propose a lightweight segmentation network without compromising on the quality of the segmentation maps.
- The authors propose TinyU-Net, a U-Net-like network built on a new CMRF (Cascaded Multi-Receptive Fields) module.
Developing models that can run on limited resources is one of the challenges the medical field faces in guaranteeing health equity. Such lightweight models usually aim to reduce the number of parameters and the computational complexity. For segmentation, however, existing lightweight methods produce lower-quality results than their non-lightweight counterparts because of their lower representation capacity.
There is therefore a need to develop lightweight models which also improve the segmentation quality.
Figure 1: Details of the CMRF module (left) and the architecture of TinyU-Net (right).
Main idea: fuse information from multi-receptive fields with a lightweight cascading strategy
- A pointwise convolution is a convolution that uses a 1x1 kernel.
- A depthwise convolution applies a single convolutional filter to each input channel. Unlike regular convolutions, which mix the channels, it keeps each channel separate. This leads to fewer parameters and fewer operations.
Figure 2: Depthwise convolution, from [1]
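To make the savings concrete, the arithmetic below counts the weights and multiply-accumulate operations (MACs) of a standard 3x3 convolution versus a depthwise 3x3 convolution followed by a pointwise 1x1 convolution. It is a back-of-the-envelope sketch assuming stride 1, "same" padding and no bias; the channel count (64), the resolution (128x128) and the function names are illustrative choices, not values from the paper.

```python
def standard_conv_weights(c_in: int, c_out: int, k: int) -> int:
    # every one of the c_out filters spans all c_in input channels
    return c_out * c_in * k * k


def depthwise_conv_weights(channels: int, k: int) -> int:
    # one k x k filter per channel; channels are never mixed
    return channels * k * k


def pointwise_conv_weights(c_in: int, c_out: int) -> int:
    # 1 x 1 kernel: only mixes channels, no spatial context
    return c_out * c_in


def macs(weights: int, height: int, width: int) -> int:
    # stride 1 + "same" padding: each weight is used once per output pixel
    return weights * height * width


C, K, H, W = 64, 3, 128, 128
standard = standard_conv_weights(C, C, K)
separable = depthwise_conv_weights(C, K) + pointwise_conv_weights(C, C)
print(standard, separable, round(standard / separable, 1))  # 36864 4672 7.9
print(macs(standard, H, W), macs(separable, H, W))          # 603979776 76546048
```

With these example sizes the depthwise + pointwise pair needs roughly 7.9x fewer weights and MACs than the standard convolution.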
- First step: the input of dimension \((C_{in}\times H\times W)\) is processed by a PWConv-BN-Act block (pointwise convolution + batch norm + activation)
- goal: extract feature information while regulating the number of output channels
- Second step: separation of “odd” feature maps and “even” feature maps to apply two types of operations:
- linear operations: element-wise addition of the “odd” and “even” feature maps (inspired by mixup data augmentation) to have richer features
- cascade operations: the “even” feature maps are fed to \(N-1\) cascaded DWConv-BN blocks (depthwise convolution + batch norm), which compute features at various receptive fields. Since each block processes the output of the previous one, the later a block sits in the cascade, the larger its receptive field.
- Third step: the output of the addition and the outputs of the different DWConv-BN blocks are concatenated and processed by a PWConv-BN-Act block, which fuses the information from the multiple receptive fields while regulating the number of output channels
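The three steps above can be sketched as a PyTorch module. This is a hypothetical re-implementation written from the description, not the authors' code: the class names, the 3x3 depthwise kernel, the hidden width `c_mid` (defaulting to `c_out`) and the reading of “odd”/“even” as alternating channels are all assumptions.

```python
from typing import Optional

import torch
import torch.nn as nn


class PWConvBNAct(nn.Sequential):
    """Pointwise (1x1) convolution -> BatchNorm -> GELU."""

    def __init__(self, c_in: int, c_out: int):
        super().__init__(
            nn.Conv2d(c_in, c_out, kernel_size=1, bias=False),
            nn.BatchNorm2d(c_out),
            nn.GELU(),
        )


class DWConvBN(nn.Sequential):
    """Depthwise k x k convolution (groups == channels) -> BatchNorm, no activation."""

    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__(
            nn.Conv2d(channels, channels, kernel_size, padding=kernel_size // 2,
                      groups=channels, bias=False),
            nn.BatchNorm2d(channels),
        )


class CMRFSketch(nn.Module):
    """Hypothetical CMRF-style block; not the authors' implementation."""

    def __init__(self, c_in: int, c_out: int, n: int = 8,
                 c_mid: Optional[int] = None, kernel_size: int = 3):
        super().__init__()
        c_mid = c_mid if c_mid is not None else c_out  # assumption: hidden width = c_out
        if c_mid % 2 != 0:
            raise ValueError("c_mid must be even to split into odd/even halves")
        half = c_mid // 2
        self.pw_in = PWConvBNAct(c_in, c_mid)  # step 1
        # step 2 (cascade): N-1 depthwise blocks applied one after another
        self.cascade = nn.ModuleList([DWConvBN(half, kernel_size) for _ in range(n - 1)])
        # step 3: 1 sum branch + (N-1) cascade outputs, each with `half` channels
        self.pw_out = PWConvBNAct(n * half, c_out)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.pw_in(x)
        odd, even = x[:, 0::2], x[:, 1::2]  # channels 1, 3, 5, ... and 2, 4, 6, ... (1-indexed)
        branches = [odd + even]  # step 2 (linear): element-wise addition
        y = even
        for block in self.cascade:
            y = block(y)  # input is the previous block's output, so the receptive field grows
            branches.append(y)
        return self.pw_out(torch.cat(branches, dim=1))


block = CMRFSketch(c_in=16, c_out=32, n=8)
out = block(torch.randn(2, 16, 64, 64))
print(out.shape)  # torch.Size([2, 32, 64, 64])
```

Keeping every convolution either 1x1 or depthwise is what keeps such a block cheap: the only operations that mix channels are the two pointwise convolutions.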
UNet-like architecture using the CMRF module:
- encoder with four CMRF-Downsampling blocks
- decoder with four Upsampling-concat-CMRF blocks
- final PWConv to output C channels (where C is the number of segmentation labels) at minimal cost
- GELU activation function (Gaussian Error Linear Unit)
- The loss used is the sum of binary cross-entropy loss and dice loss.
- Number of feature maps at the different stages: C1 = 64, C2 = 128, C3 = 256, and C4 = 512
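A possible form of the training loss (binary cross-entropy + Dice) is sketched below. The paper only states that the two terms are summed; the sigmoid applied per output channel, the smoothing constant `eps`, averaging Dice over samples and channels, and the function names are assumptions for illustration.

```python
import torch
import torch.nn.functional as F


def dice_loss(probs: torch.Tensor, target: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # probs, target: (batch, channels, H, W); Dice computed per sample and channel
    dims = tuple(range(2, probs.dim()))  # spatial dimensions
    inter = (probs * target).sum(dims)
    denom = probs.sum(dims) + target.sum(dims)
    dice = (2 * inter + eps) / (denom + eps)
    return 1 - dice.mean()


def bce_dice_loss(logits: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    # unweighted sum of the two terms (assumption: no weighting coefficients)
    bce = F.binary_cross_entropy_with_logits(logits, target)
    return bce + dice_loss(torch.sigmoid(logits), target)


target = torch.zeros(2, 1, 8, 8)
target[..., :4] = 1.0  # left half of each mask is foreground
confident_right = (2 * target - 1) * 20  # large logits with the correct sign
confident_wrong = -confident_right
print(bce_dice_loss(confident_right, target).item())  # close to 0
print(bce_dice_loss(confident_wrong, target).item())  # large
```

The BCE term gives a dense per-pixel signal, while the Dice term directly rewards overlap and is less sensitive to the foreground/background imbalance typical of lesion masks.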
Other training details:
- Architecture implemented in PyTorch
- Adam Optimizer, learning rate of 0.0001
- 300 epochs
- Cosine annealing learning rate scheduler
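The optimizer and scheduler settings above map onto standard PyTorch classes. In this minimal sketch the `nn.Conv2d` model is a hypothetical stand-in for TinyU-Net, and stepping the scheduler once per epoch with the default minimum learning rate of 0 is an assumption.

```python
import torch
import torch.nn as nn

model = nn.Conv2d(3, 1, kernel_size=1)  # hypothetical stand-in for TinyU-Net
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=300)

lrs = []
for epoch in range(300):
    # one epoch of training would go here: forward pass, loss.backward(), optimizer.step()
    optimizer.step()
    lrs.append(optimizer.param_groups[0]["lr"])
    scheduler.step()  # one scheduler step per epoch

print(lrs[0], lrs[150])  # starts at 1e-4, about half of it midway through
```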
International Skin Imaging Collaboration (ISIC 2018)
- 3694 camera-acquired dermatologic images
- binary segmentation of skin lesions
Novel Coronavirus Pneumonia (NCP)
- 750 CT slices from 150 COVID-19 patients
- multi-label segmentation: background, lung field, ground-glass opacity, consolidation
The authors compare TinyU-Net with lightweight models (fewer than 5M parameters) and with non-lightweight state-of-the-art models. They evaluate the models on segmentation performance, number of parameters, and computational complexity measured in FLOPs (floating point operations).
Main Segmentation results
Table 1: Results for ISIC2018 dataset
Table 2: Results for NCP dataset
On both datasets, TinyU-Net achieves the best or second-best mean IoU and mean Dice compared to the baselines (both lightweight and non-lightweight).
TinyU-Net is also the lightest model in terms of number of parameters.
It is only the third smallest in terms of FLOPs, behind UNeXt (see paper [2] and previous post [3]) and U-Lite [4].
However, these two lightweight networks achieve lower segmentation performance.
TinyU-Net is not the only lightweight model to outperform non-lightweight models: CMUNeXt [5] achieves the second-best mean IoU and mean Dice. The authors suggest that models with high computational complexity (like the non-lightweight models) may not have an advantage when training data are limited.
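For reference, the two metrics reported in Tables 1 and 2 can be computed as below for one label map. This plain-Python sketch is not the authors' evaluation code: it assumes flattened integer label maps, averages over classes, and skips classes absent from both prediction and ground truth. Benchmarks differ on whether background is included and whether averaging is done per image or over the whole dataset.

```python
def per_class_iou_dice(pred, target, num_classes):
    """pred, target: flat sequences of integer class labels of equal length."""
    ious, dices = [], []
    for c in range(num_classes):
        p = {i for i, v in enumerate(pred) if v == c}
        t = {i for i, v in enumerate(target) if v == c}
        inter, union = len(p & t), len(p | t)
        if union == 0:  # class absent from both maps: skipped (assumption)
            continue
        ious.append(inter / union)                 # |P ∩ T| / |P ∪ T|
        dices.append(2 * inter / (len(p) + len(t)))  # 2|P ∩ T| / (|P| + |T|)
    return ious, dices


def mean_iou_dice(pred, target, num_classes):
    ious, dices = per_class_iou_dice(pred, target, num_classes)
    return sum(ious) / len(ious), sum(dices) / len(dices)


m_iou, m_dice = mean_iou_dice([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
print(round(m_iou, 4), round(m_dice, 4))  # 0.5833 0.7333
```

Per class, Dice and IoU are linked by Dice = 2 IoU / (1 + IoU), so the two metrics rank models similarly, with Dice always at least as high as IoU.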
Figure 3: Comparative qualitative results on ISIC2018 (top two rows) and NCP (bottom two rows) datasets.
Visually, TinyU-Net produces satisfactory segmentation maps.
Ablation study
CMRF module
A first ablation study replaces the blocks of two methods (one lightweight and one non-lightweight) with CMRF blocks. This improves the segmentation performance of both methods while reducing their cost.
Table 3: Ablation results (IoU (%)) for CMRF
Number of DWConv
A second ablation study finds the optimal number of cascaded DWConv-BN blocks to be 8.
Table 4: Ablation results (mIoU (%)) for the number of DWConv-BN blocks on the NCP dataset
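A simple calculation helps interpret this ablation. With stride-1 k x k depthwise kernels, each extra cascaded block widens the receptive field of the deepest branch by k - 1 pixels, while the 1x1 convolutions around the cascade do not enlarge it. The hypothetical helper below lists the receptive field of every concatenated branch, relative to the CMRF input; the 3x3 kernel size is an assumption.

```python
def cascade_receptive_fields(num_cascaded: int, kernel_size: int = 3) -> list:
    # branch 0: odd + even sum, only 1x1 convolutions so far -> receptive field 1
    # branch i: after i stacked stride-1 k x k convs -> 1 + i * (k - 1)
    return [1 + i * (kernel_size - 1) for i in range(num_cascaded + 1)]


print(cascade_receptive_fields(7))     # [1, 3, 5, 7, 9, 11, 13, 15]
print(cascade_receptive_fields(2, 5))  # [1, 5, 9]
```

Adding blocks therefore grows the largest receptive field only linearly, while each block still adds parameters and FLOPs, which is one reason a moderate number of blocks can be the best trade-off.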
Thanks to its lightweight CMRF module, the proposed TinyU-Net achieves competitive segmentation performance with only 0.48M parameters. The authors also show that their CMRF block can be adapted for other networks.