Visual Perception Generalization for Vision and Language Navigation via Meta Learning
Abstract:

Precision farming robots offer the potential to reduce the amount of agrochemicals used through targeted interventions and are thus a promising step towards sustainable agriculture. A prerequisite for such systems is a robust plant classification system that can identify crops and weeds in various agricultural fields. Most vision-based systems train convolutional neural networks (CNNs) on a given dataset, i.e., the source domain, to perform semantic segmentation of images. However, these models often generalize poorly when deployed on unseen fields, i.e., in the target domain. Enhancing the generalization capability of CNNs is critical to increasing their performance on target domains with different operational conditions. In this letter, we present a domain-generalized semantic segmentation approach for robust crop and weed detection that effectively extends and diversifies the source domain to achieve high performance across different agricultural field conditions. We propose to leverage unlabeled images captured in various agricultural fields during training in a two-step framework. First, we propose a method to automatically compute sparse annotations and use them to present the model with more plant varieties and growth stages, enhancing its generalization capability. In particular, we exploit unlabeled images from fields in which crops are sown in rows. Second, we propose a style transfer method that renders the source-domain images in the style of images from various fields to further diversify the training data. We conduct extensive experiments and show that we achieve superior performance in crop-weed segmentation across various fields compared to state-of-the-art methods.
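The style transfer step is described above only at a high level, and the authors' specific method is not given in this abstract. As a minimal illustrative sketch of the general idea, the snippet below (an assumption, not the paper's implementation) re-colours a labeled source-domain image using the channel-wise colour statistics of an unlabeled image from another field, so that the labeled data can be trained on under a different field appearance. All function and variable names here are hypothetical.

```python
import numpy as np

def match_channel_statistics(source_img, style_img):
    """Render source_img in the global colour statistics of style_img.

    Both inputs are float arrays of shape (H, W, 3) with values in [0, 1].
    Each source channel is normalised to zero mean / unit variance and then
    rescaled to the mean and standard deviation of the corresponding style
    channel. This is only a simple stand-in for the style transfer step
    sketched in the abstract, not the authors' actual method.
    """
    out = np.empty_like(source_img, dtype=np.float64)
    for c in range(source_img.shape[2]):
        src = source_img[..., c].astype(np.float64)
        sty = style_img[..., c].astype(np.float64)
        src_mean, src_std = src.mean(), src.std() + 1e-8
        sty_mean, sty_std = sty.mean(), sty.std()
        out[..., c] = (src - src_mean) / src_std * sty_std + sty_mean
    return np.clip(out, 0.0, 1.0)

# Hypothetical usage: diversify the labeled source domain with the
# appearance of an unlabeled image captured in a different field.
rng = np.random.default_rng(0)
source = rng.random((256, 256, 3))           # labeled source-domain image
unlabeled_field = rng.random((256, 256, 3))  # unlabeled target-field image
stylised = match_channel_statistics(source, unlabeled_field)
```

Because the stylised image keeps the original pixel layout, the existing source-domain labels remain valid for it, which is what makes this kind of appearance-level diversification attractive for segmentation training.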