Abstract:
To curb the skyrocketing energy consumption of data centers, prevailing approaches control IT and cooling subsystems in a time-driven manner. These methods struggle with highly dynamic system states and complex action spaces, and they risk instability caused by frequent, unnecessary control operations. To tackle these problems, we propose a novel event-driven control paradigm and an optimization algorithm under the deep reinforcement learning (DRL) framework. The principle is to make decisions upon certain critical events (e.g., overheating) rather than on a fixed periodic schedule. Specifically, we design an event-driven optimization framework to trigger control operations. We then present several models that describe the IT and cooling subsystems, and mathematically define events to capture four types of prior factors that affect system performance. Furthermore, we develop an event-driven DRL (E-DRL) optimization algorithm to dispatch jobs and regulate cooling facilities for energy efficiency. Extensive experiments on two different types of real workload traces demonstrate that: 1) E-DRL reduces the number of control decisions by 70%–95% while achieving comparable or even better energy efficiency than the state-of-the-art algorithm; and 2) E-DRL adapts the control frequency to changing operational conditions and diverse workloads.