Vision and Language Navigation Based on Cross Modal Feature Fusion in Indoor Environment

admin

Feb 3, 2024 - 15:50

Abstract:

It is challenging for an agent to interpret visual and language information simultaneously and to decide which actions to perform. The vision-and-language navigation task was recently proposed to have an agent navigate a real 3-D indoor environment based on a language instruction and the visual information available at its current viewpoint. The key to this task is that the agent must understand both modalities, vision and language, in an unknown environment in order to navigate effectively. In this study, we capture the alignment between visual features and language features using a cross-modal feature fusion method. The cross-modal fusion module is built on attention so that visual features contain language information and language features contain visual information, which allows the model to learn more feature relationships and improves the success rate (SR) of agent navigation. Considering the practical significance of agent navigation, we aim to shorten the agent's trajectory as much as possible while ensuring that it still reaches the target position. We employ a reinforcement learning algorithm based on advantage actor-critic to constrain the agent's action selection and thereby shorten the trajectory length. To further improve model performance and reduce the gap between the agent's performance in known and unknown environments, we propose the data augmentation method Cro-Speaker and, based on it, three training methods: Speaker data augmentation (SD), Cro-Speaker data augmentation (CSD), and Speaker and Cro-Speaker data augmentation (SCSD). We evaluate the proposed method on the Room-to-Room data set. The results show that the proposed method improves the SR of agent navigation, shortens the navigation trajectory, and generalizes well in both known and unknown environments.
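The attention-based fusion described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the architecture, so the function names (`cross_attend`, `cross_modal_fusion`), the plain scaled dot-product attention without learned projections, the residual addition, and the toy feature sizes are all assumptions made for illustration. It only shows the general idea that each visual feature is enriched with a weighted summary of the language features, and each language feature with a weighted summary of the visual features.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, context):
    """Scaled dot-product attention: each query vector attends over `context`.

    Returns one vector per query: the query plus its attention-weighted
    summary of the context (the residual addition is an assumption).
    """
    dim = len(queries[0])
    fused = []
    for q in queries:
        # Similarity of this query to every context vector.
        scores = [sum(a * b for a, b in zip(q, c)) / math.sqrt(dim) for c in context]
        weights = softmax(scores)
        # Weighted average of the context vectors.
        summary = [sum(w * c[i] for w, c in zip(weights, context)) for i in range(dim)]
        fused.append([qi + si for qi, si in zip(q, summary)])
    return fused


def cross_modal_fusion(visual, language):
    """Fuse in both directions: vision attends to language and vice versa."""
    visual_fused = cross_attend(visual, language)    # visual features + language info
    language_fused = cross_attend(language, visual)  # language features + visual info
    return visual_fused, language_fused


# Toy inputs (hypothetical sizes): 3 candidate views and 5 instruction tokens,
# each represented by a 4-dimensional feature vector.
rng = random.Random(0)
visual = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(3)]
language = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(5)]
visual_fused, language_fused = cross_modal_fusion(visual, language)
```

Each output keeps the shape of its own modality, so the fused features could be passed on to whatever policy network selects the next action. A practical model would normally use a tensor library with learned query, key, and value projections rather than the raw features used here.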
