Abstract:
Scheduling for multiple sensors to observe multiple systems is investigated. Only one sensor can transmit a measurement to the remote estimator over a Markovian fading channel at each time instant. A stochastic scheduling protocol is proposed, which first chooses the system to be observed via a probability distribution, and then chooses the sensor to transmit the measurement via another distribution. The stochastic sensor scheduling is modeled as a Markov decision process (MDP). A sufficient condition is derived to ensure the stability of remote estimation error covariance by a contraction mapping operator. In addition, the existence of an optimal deterministic and stationary policy is proved. To overcome the curse of dimensionality, the deep deterministic policy gradient, a recent deep reinforcement learning algorithm, is utilized to obtain an optimal policy for the MDP. Finally, a practical example is given to demonstrate that the developed scheduling algorithm significantly outperforms other policies.