A Comparison of Deep Reinforcement Learning Models for Isolated Traffic Signal Control

Abstract:

Traditional signal control methods may not adapt well enough to constantly changing traffic dynamics. Hence, deep reinforcement learning (DRL) methods have been widely applied to the traffic signal timing control problem, because DRL can adaptively learn an optimal policy from experience samples generated through interaction with the environment. However, researchers urgently need answers to two questions: which DRL algorithm should be adopted in practice, and how its model settings should be selected. To address them, we introduce a suitable simulation platform for testing and comparing different DRL methods. Specifically, we evaluate seven prevailing DRL algorithms under our defined model settings in terms of both training performance and execution performance. Test results indicate that soft actor–critic (SAC) outperforms the other DRL algorithms and the maximum pressure method in most cases. To the best of our knowledge, this is also the first study to apply SAC and value-distribution methods to traffic signal control. To determine how model settings should be selected, we compare the execution performance of the DRL algorithms under different state, action, and reward settings. Experimental results demonstrate the advantages of our model-setting choices. These findings also offer insights for other traffic management decision problems, such as ramp control and multi-intersection control.