Abstract:
Automated medical image segmentation for organs or lesions plays an essential role in clinical diagnosis and treatment planning. However, training an accurate and robust segmentation model remains a long-standing challenge due to the time-consuming and expertise-intensive annotation of training data, especially for 3-D medical images. Recently, self-supervised learning has emerged as a promising approach for unsupervised visual representation learning, showing great potential to reduce the need for expert annotations on medical images. Although global representation learning has attained remarkable results on iconic datasets such as ImageNet, it cannot be applied directly to medical image segmentation, because the segmentation task is non-iconic and the targets vary in physical scale. To address these problems, we propose a Multi-scale Visual Representation self-supervised Learning (MsVRL) model to learn finer-grained representations and handle different target scales. Specifically, a multi-scale representation conception, a canvas matching method, an embedding pre-sampling module, a center-ness branch, and a cross-level consistency loss are introduced to improve performance. After being pre-trained on unlabeled datasets (RibFrac and part of MSD), MsVRL is fine-tuned on downstream segmentation tasks with labeled datasets (BCV, the spleen subset of MSD, and KiTS). Experimental results show that MsVRL outperforms state-of-the-art methods on these medical image segmentation tasks.