Intelligent Access to Unlicensed Spectrum: A Mean Field Based Deep Reinforcement Learning Approach

Abstract:

As the demand for mobile data traffic continues to grow, offloading traffic to unlicensed spectrum is a promising way to relieve the pressure on cellular systems. There is therefore an urgent need for an unlicensed spectrum access method that guarantees harmonious and efficient coexistence between cellular network technologies such as LTE and incumbent users such as WiFi. However, existing coexistence schemes such as licensed assisted access (LAA) and LTE-unlicensed (LTE-U) still suffer from inefficient spectrum utilization and unsatisfactory fairness. In this paper, we formulate the optimization problem of unlicensed spectrum access among multiple small base stations (SBSs) as a game, and then solve for the Nash equilibrium (NE) with cooperative and distributed multi-agent deep reinforcement learning (MADRL). Specifically, we first propose a two-level access framework for the coexistence scenario, consisting of a feedback cycle and an executive cycle, and then design the key elements of MADRL, including the state, action, reward, and Q-network, in detail based on this framework. To overcome the learning divergence and prohibitive computation overhead caused by non-stationarity in the coexistence scenario with multiple SBSs, we adopt the mean field technique, which simplifies solving for the NE by converting the interaction of an agent with all remaining agents into an interaction with their average effect. Simulation results show that 1) the proposed algorithm overcomes the learning divergence problem and converges to the NE quickly, and 2) the proposed algorithm achieves bi-objective optimization of the total throughput and fairness of the coexistence network, and outperforms baseline methods such as Cat-4 LBT, Cooperative LBT, and Random schemes in terms of both throughput and fairness.
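
The mean field idea described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes that each SBS picks one of a small set of discrete actions, that actions are one-hot encoded, and that the Q-network input is the agent's state concatenated with its own action and the mean action of the other agents. The function names `mean_action` and `mean_field_q_input` and the toy values are hypothetical.

```python
import numpy as np

def mean_action(actions, agent, num_actions):
    """Average one-hot action of every agent except `agent` (assumed encoding)."""
    others = np.delete(np.asarray(actions), agent)   # drop the agent's own action
    one_hot = np.eye(num_actions)[others]            # shape: (N - 1, num_actions)
    return one_hot.mean(axis=0)                      # empirical action distribution

def mean_field_q_input(state, own_action, actions, agent, num_actions):
    """Build a Q-network input whose size does not grow with the number of agents."""
    own = np.eye(num_actions)[own_action]
    return np.concatenate([state, own, mean_action(actions, agent, num_actions)])

# Toy example: four agents, three possible actions (illustrative values only).
joint_actions = [0, 1, 1, 2]
mean_a = mean_action(joint_actions, agent=0, num_actions=3)
q_input = mean_field_q_input(np.array([0.5, 0.2]), 0, joint_actions, 0, 3)
```

Under these assumptions, the input seen by each agent has a fixed length of state size plus twice the number of actions, regardless of how many SBSs share the channel, which is the source of the reduced computation overhead.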