RLCharge: Imitative Multi-Agent Spatio-Temporal Reinforcement Learning for Electric Vehicle Charging Station Recommendation


Abstract:

Electric vehicles (EVs) have become a preferred choice in modern transportation systems due to their environmental and energy sustainability. However, in many large cities, EV drivers often fail to find suitable charging spots because of limited charging infrastructure and spatiotemporally unbalanced charging demand. The recent emergence of deep reinforcement learning offers great potential to improve the charging experience in various respects over a long-term horizon. In this paper, we propose an Imitative Multi-Agent Spatio-Temporal Reinforcement Learning (RLCharge) framework for intelligently recommending publicly accessible charging stations by jointly considering various long-term spatio-temporal factors. Specifically, by regarding each charging station as an individual agent, we formulate the problem as a multi-objective multi-agent reinforcement learning task. We first develop a multi-agent actor-critic framework with centralized training and decentralized execution. In particular, we propose a tailored centralized attentive critic to coordinate recommendations among geo-distributed agents, and introduce a delayed access strategy to exploit knowledge of future charging competition during centralized training. Moreover, to handle the partial observability problem during decentralized execution in a large-scale multi-agent system, we propose a spatio-temporal heterogeneous graph convolution module, comprising (1) a dynamic graph convolution block that generates real-time representations of observable forthcoming EVs, and (2) a spatial graph convolution block that shares agent observations by propagating messages between spatially adjacent agents. After that, to effectively optimize multiple divergent learning objectives, we extend the centralized attentive critic to multiple critics and develop a dynamic gradient re-weighting strategy to adaptively guide the optimization direction.
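To make the centralized attentive critic concrete, the NumPy sketch below computes one Q-value per agent by attending over the embedded observation-action pairs of all *other* agents. All layer sizes, weight shapes, and the single-head attention form are illustrative assumptions for exposition, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    # Numerically stable row-wise softmax.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

class AttentiveCritic:
    """Toy centralized critic: during centralized training it sees every
    agent's observation and action, and agent i's value attends over the
    embeddings of the other agents (hypothetical sizes/initialization)."""

    def __init__(self, n_agents, obs_dim, act_dim, hid=16):
        d = obs_dim + act_dim
        self.We = rng.normal(scale=0.1, size=(d, hid))    # shared embedding
        self.Wq = rng.normal(scale=0.1, size=(hid, hid))  # attention query
        self.Wk = rng.normal(scale=0.1, size=(hid, hid))  # attention key
        self.Wv = rng.normal(scale=0.1, size=(hid, hid))  # attention value
        self.w_out = rng.normal(scale=0.1, size=(2 * hid,))

    def q_values(self, obs, acts):
        # obs: (n_agents, obs_dim), acts: (n_agents, act_dim)
        e = np.tanh(np.concatenate([obs, acts], axis=1) @ self.We)  # (n, hid)
        q, k, v = e @ self.Wq, e @ self.Wk, e @ self.Wv
        scores = q @ k.T / np.sqrt(k.shape[1])            # (n, n) pairwise scores
        np.fill_diagonal(scores, -np.inf)                 # attend to *other* agents only
        ctx = softmax(scores) @ v                         # (n, hid) context per agent
        # Each agent's Q combines its own embedding with the attended context.
        return np.concatenate([e, ctx], axis=1) @ self.w_out  # (n,)
```

At execution time each agent would act from its local (graph-augmented) observation alone; the attentive critic above is only used during centralized training.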
In addition, we propose an adaptive imitation learning scheme to further accelerate and stabilize policy convergence. Finally, extensive experiments on two real-world datasets demonstrate that RLCharge achieves the best overall performance compared with ten baseline approaches.
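One common way to realize an adaptive imitation scheme of the kind mentioned above is to blend the RL actor loss with a behavior-cloning loss whose weight anneals over training and is gated by whether the learned policy still lags the demonstrations. The decay schedule and gating rule below are illustrative assumptions, not the paper's actual scheme.

```python
def imitation_weight(step, policy_return, demo_return, warmup=1000):
    """Weight on the imitation (behavior-cloning) loss: decays linearly
    over `warmup` steps, and drops to zero once the policy matches or
    beats the demonstrations (both rules are hypothetical examples)."""
    anneal = max(0.0, 1.0 - step / warmup)
    lag = 1.0 if policy_return < demo_return else 0.0
    return anneal * lag

def actor_loss(rl_loss, bc_loss, step, policy_return, demo_return):
    """Combined actor objective: RL loss plus adaptively weighted imitation loss."""
    lam = imitation_weight(step, policy_return, demo_return)
    return rl_loss + lam * bc_loss
```

Early in training the imitation term dominates and keeps policy updates close to demonstrated behavior, which is what stabilizes and accelerates convergence; as the weight decays, the pure RL objective takes over.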