Learning Policies for Automated Racing Using Vehicle Model Gradients

Abstract:

Safe autonomous driving approaches should be capable of learning as quickly and efficiently as professional drivers do, while also using all of the available road-tire friction for safety. Inspired by how skilled drivers learn, we demonstrate improvement upon an initial optimization-generated racing trajectory using model-based reinforcement learning. By using a simple physics-based dynamics model and gradients of the performance objective, we show that a full-scale automated race car can improve its lap time in experiments on high- and low-friction race tracks. Using recorded vehicle data, this approach improves a twenty-nine-second lap time by almost two full seconds. Beyond improving upon the initial optimization-based solution, it requires only two laps' worth of data from an ice track, where conditions can change substantially from lap to lap. These results suggest that by combining an approximate model with simple learning techniques, significant improvements to automated racing strategies are possible.
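
To illustrate the core idea named in the abstract, the sketch below shows gradients of a performance objective flowing through a simple physics-based vehicle model to refine a control sequence. This is a minimal illustration, not the paper's implementation: the kinematic bicycle model, the friction-circle penalty, the mean-speed objective (a crude lap-time proxy), and all constants are assumptions chosen for clarity.

```python
import jax
import jax.numpy as jnp

DT = 0.05          # integration step [s] (assumed)
WHEELBASE = 2.8    # wheelbase [m] (assumed)
MU, G = 0.9, 9.81  # friction coefficient (assumed), gravity [m/s^2]
HORIZON = 200      # rollout length in steps (assumed)

def step(state, control):
    """Kinematic bicycle model; state = (x, y, heading, speed)."""
    x, y, psi, v = state
    steer, accel = control
    return jnp.array([
        x + DT * v * jnp.cos(psi),
        y + DT * v * jnp.sin(psi),
        psi + DT * v / WHEELBASE * jnp.tan(steer),
        v + DT * accel,
    ])

def cost(controls, init_state):
    """Negative mean speed (lap-time proxy) plus a penalty for
    exceeding the available road-tire friction circle."""
    def body(state, control):
        nxt = step(state, control)
        steer, accel = control
        a_lat = nxt[3] ** 2 * jnp.tan(steer) / WHEELBASE   # lateral accel
        overshoot = jnp.maximum(a_lat**2 + accel**2 - (MU * G) ** 2, 0.0)
        return nxt, nxt[3] - 10.0 * overshoot
    _, rewards = jax.lax.scan(body, init_state, controls)
    return -jnp.mean(rewards)

# The "policy" here is an open-loop (steer, accel) sequence refined by
# gradient descent through the model; the paper's policy class may differ.
controls = jnp.zeros((HORIZON, 2))
init_state = jnp.array([0.0, 0.0, 0.0, 15.0])  # start at 15 m/s

grad_fn = jax.jit(jax.grad(cost))
for _ in range(200):
    controls = controls - 1e-2 * grad_fn(controls, init_state)
```

Because the rollout is differentiable end to end, each gradient step uses the model to credit every control input for its effect on the objective, which is what lets improvement happen from only a few laps of data rather than extensive model-free trial and error.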