Notes

The generated synthetic dataset is available at Academics Torrents
A re-implementation is available at the following Monai link

Highlights

Application of the Latent Diffusion Model framework for MR image synthesis of the human brain
An encoder/decoder model dedicated to brain MRI reconstruction is proposed
Investigation of the application of conditioning to age, gender, ventricular volumes and brain volumes
The model achieves new SOTA for brain MR image synthesis
A synthetic dataset of 100,000 volumes, along with the conditioning information, is publicly available

Introduction

The objective of the paper is to generate a realistic large scale dataset with additional related “low dimensional” information such as age, sex or volumes.
31,740 T1w 3D MR images from the UK Biobank datas are used during training
One interest of such a dataset would be to provide enough data to learn to retrieve the age of a patient based on their brain MR image while guaranteeing privacy.

Methodology

Preprocessing steps

An existing network called UnitRes was used to perform a rigid body registration to a common MNI space
The final images are resampled to a uniform resolution of \(1 \, mm^3\)
The images are all cropped to a consistent volume size of \(160 \times 224 \times 160\) voxels

LDM architecture

The method is directly inspired by the latent diffusion model whose architecture is summarized below:

The autoencoder is first trained with a combination of L1 loss, perceptual loss, a patch-based adversarial objective and a KL regularization of the latent space
The encoder maps the brain image to a latent representation with a size of \(20 \times 28 \times 20\) voxels
The diffusion model is then trained using \(1000\) steps for the Markov chain process
The model is conditioned according to age, gender, ventricular volume and brain volume
The conditioning is performed by combining the concatenation of the conditioning with the input data and the use of cross-attention mechanisms

Results

The autoencoder compressed each dimension of the input data by a factor of 8
DDIM is used during inference to reduce from \(1000\) to \(50\) the number of time steps during sampling. This reduces the average sampling time from \(142 \pm 1.6\)s to \(7.6 \pm 0.2\)s
The degree of realism of the synthetic data is measured using the Fréchet Inception Distance(FID), and the diversity of the data is measured with the Multi-Scale Structural Similarity metric (MS-SSIM) and the 4-G-R-SSIM

Quality of the synthetic data

Measures were computed from 1000 sample pairs from the UK Biobank and the synthetic data
The model achieves new SOTA for brain MR image synthesis

Figure 1. Quantitative evaluation of the synthetic images on the UK Biobank

Figure 2. Real and synthetic samples of brain MRI

Conditioning on the ventricular volumes

To quantitatively evaluate the conditioning, SynthSeg was used to measure the volumes of the ventricles of 1000 synthetic brains
The Pearson correlation was computed between the obtained volumes and the inputted conditioning values
High correlation score of \(0.972\)

Figure 3. Correlation between inputted ventricular volumes and ventricular measured with SynthSeg

Conditioning on the age

A 3D CNN proposed in [1] was trained from the same UK Biobank dataset. The model takes as input a 3D brain image and predicts chronological age
The same model is then used on the synthetic dataset to verify how closely the predicted age matches the inputted age of the synthetic dataset
Good correlation score of \(0.692\)

Figure 4. Correlation between inputted age and predicted brain age

Synthetic dataset

A synthetic dataset of 100,000 human brain images was generated and made publicly available together with the conditioning information

Conclusions

Latent diffusion model is cool ;)
The key resides in the autoencoder performance !
Is a database of 31,740 images really necessary ?
We need to think carefully about the additive value of the conditioning information chosen to simulate a useful synthetic dataset !

References

[1] Cole, J.H., Poudel, R.P., Tsagkrasoulis, D., Caan, M.W., Steves, C., Spector, T.D., Montana, G., Predicting brain age with deep learning from raw imaging data results in a reliable and heritable biomarker, NeuroImage 163, 115–124 (2017)