Abstract:
Accurately delineating individual teeth and the gingiva in three-dimensional (3D) intraoral scanned (IOS) mesh data plays a pivotal role in many digital dental applications, such as orthodontics. Recent research shows that deep-learning-based methods can achieve promising results for 3D tooth segmentation; however, most of them rely on high-quality labeled datasets, which are usually small because annotating IOS meshes requires intensive human effort. In this paper, we propose a novel self-supervised learning framework, named STSNet, that boosts the performance of 3D tooth segmentation by leveraging large-scale unlabeled IOS data. The framework follows a two-stage training scheme, i.e., pre-training and fine-tuning. In pre-training, contrastive losses at three hierarchical levels, i.e., point-level, region-level, and cross-level, are proposed for unsupervised representation learning on a set of predefined matched points from different augmented views. The pretrained segmentation backbone is then fine-tuned in a supervised manner with a small number of labeled IOS meshes. With the same amount of annotated samples, our method achieves an mIoU of 89.88%, significantly outperforming its supervised counterparts. The performance gain becomes more remarkable when only a few labeled samples are available. Furthermore, STSNet outperforms the fully supervised baselines while using only 40% of the annotated samples. To the best of our knowledge, this is the first attempt at unsupervised pre-training for 3D tooth segmentation, and it demonstrates strong potential for reducing the human effort required for annotation and verification.
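The abstract does not give the exact form of the contrastive losses. As a minimal sketch, assuming a standard InfoNCE objective (a generic technique, not necessarily STSNet's point-level loss), the code below shows how matched points from two augmented views can serve as positive pairs, with every other point in the other view acting as a negative. The function name `point_infonce`, the temperature of 0.1, and the toy embeddings are hypothetical choices for illustration only.

```python
import math
from typing import List, Sequence


def _l2_normalize(v: Sequence[float]) -> List[float]:
    # Unit-normalize an embedding so dot products become cosine similarities.
    # A zero vector is left unchanged (divide by 1.0) to avoid division by zero.
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def point_infonce(feats_a: Sequence[Sequence[float]],
                  feats_b: Sequence[Sequence[float]],
                  temperature: float = 0.1) -> float:
    """Generic InfoNCE loss over matched points (hypothetical sketch).

    Assumption: feats_a[i] and feats_b[i] are embeddings of the same
    predefined matched point seen in two different augmented views.
    For each anchor in view A, its match in view B is the positive and
    all other points in view B are negatives.
    """
    if len(feats_a) != len(feats_b) or not feats_a:
        raise ValueError("views must contain the same non-zero number of matched points")
    if temperature <= 0:
        raise ValueError("temperature must be positive")

    a = [_l2_normalize(f) for f in feats_a]
    b = [_l2_normalize(f) for f in feats_b]

    total = 0.0
    for i, anchor in enumerate(a):
        # Similarity of the anchor to every candidate point in the other view.
        logits = [sum(x * y for x, y in zip(anchor, cand)) / temperature for cand in b]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        # Cross-entropy with the matched point (index i) as the target.
        total += log_denom - logits[i]
    return total / len(a)


if __name__ == "__main__":
    view_a = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    aligned = point_infonce(view_a, view_a)
    shuffled = point_infonce(view_a, [view_a[1], view_a[2], view_a[0]])
    print(f"aligned loss:  {aligned:.6f}")
    print(f"shuffled loss: {shuffled:.6f}")
```

When the two views produce consistent embeddings for matched points the loss is close to zero, and it grows when correspondences are broken, which is the signal that drives representation learning without labels. Region-level and cross-level variants would apply the same idea to pooled region features, but how STSNet defines regions is not stated here.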