Abstract:
Passable area segmentation is an essential component of navigation assistance for visually impaired people. Pixelwise semantic segmentation of red, green, and blue (RGB) images can be improved by exploiting informative features from the depth modality. Simple fusion, however, may not lead to satisfactory results because real depth data are generally noisy, which can degrade accuracy as the network goes deeper. In this article, we construct a wearable system with a novel lightweight multimodality collaborative network (LM-MCN) that performs real-time passable area segmentation to help visually impaired people walk alone more safely. First, we introduce a normal map estimation (NME) module, which converts dense depth into a normal map with high accuracy and efficiency. Moreover, we propose a multimodality feature fusion module (M-MFFM), which recalibrates and fuses features from both RGB images and the inferred depth information for accurate passable area segmentation. Furthermore, we release a new passable area segmentation dataset, named SensingAI-Easy-to-Walk (SETW), collected from sidewalk views in urban areas. Extensive experiments demonstrate that depth information is a valuable complement, as it provides a geometric counterpart to the RGB representation. In addition, our method achieves 96.50% MaxF at a real-time inference speed of 50 FPS on the KITTI road benchmark, which demonstrates the validity and superiority of LM-MCN.
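The abstract does not specify how the NME module derives normals from depth; as a point of reference only, the sketch below shows a common gradient-based approximation for converting a dense depth map into a three-channel normal map. The function name, focal-length handling, and input sizes are illustrative assumptions, not the paper's actual NME design, which is described as a learned, more accurate module.

```python
import numpy as np

def depth_to_normal_map(depth):
    """Approximate per-pixel surface normals from a dense depth map.

    Hypothetical baseline: the normal is taken proportional to
    (-dz/du, -dz/dv, 1) and normalized to unit length. The paper's NME
    module may use a different, learned formulation.
    """
    # Spatial gradients of depth along image columns (u) and rows (v).
    dz_du = np.gradient(depth, axis=1)
    dz_dv = np.gradient(depth, axis=0)

    # Stack into an H x W x 3 normal field and normalize each vector.
    normals = np.stack([-dz_du, -dz_dv, np.ones_like(depth)], axis=-1)
    norm = np.linalg.norm(normals, axis=-1, keepdims=True)
    return normals / np.clip(norm, 1e-6, None)

# Example with a synthetic 480 x 640 depth map (values in meters).
depth = np.random.uniform(1.0, 10.0, size=(480, 640)).astype(np.float32)
normal_map = depth_to_normal_map(depth)
print(normal_map.shape)  # (480, 640, 3)
```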