Controlling blood glucose levels in diabetic patients is important for managing their health and quality of life. Several algorithms based on model predictive control and reinforcement learning (RL) have been proposed, most of which rely on prior knowledge of physiological systems and of the mathematical structure of blood glucose dynamics, and which require many training episodes, including failures, to train the RL policy network. To facilitate adoption in clinical settings, we propose a fast online learning method that emphasizes safety and interpretability. A random forest regressor and a dual attention network were used to predict glucose and to extend the state variables. The soft actor-critic (SAC) network that determines insulin dosing was guided by proportional-integral-derivative (PID) control during the early phase of learning, and an adaptive safe actor that can suspend insulin delivery or administer additional insulin was incorporated. The models were validated using an FDA-approved type 1 diabetes simulator, and the results were comparable to those of PID control. With this system, glucose dynamics could be captured despite minimal prior knowledge. The extended state variables were correlated with basic states such as glucose, insulin, and meal intake, as well as with their derivatives and integrals, which are fundamental elements of mathematical models of physiological responses. Attention scores and attribution scores in the prediction and control models revealed the features the models focused on and their internal operation, providing interpretability. We expect this study to provide insights into how RL can be practically adopted in clinical environments and how interpretability can offer hints about the models' reasoning in clinical applications.
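The abstract does not specify how PID guidance or the safe actor are implemented. The following is a minimal sketch under stated assumptions, not the authors' method: a discrete PID controller maps glucose error to a non-negative insulin rate; "guidance in the early phase" is modeled, as an assumption, by linearly blending the PID dose with the RL (e.g., SAC) dose using a weight that decays to zero over a hypothetical `warmup_steps`; and the safe actor is modeled as a rule that suspends insulin when predicted glucose is low and adds a fixed extra dose when it is high. All gains, targets, thresholds, and names (`blended_dose`, `safe_dose`, `extra`) are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PIDController:
    """Discrete PID controller mapping glucose error to an insulin rate.

    Gains, target, and sampling interval are hypothetical placeholders,
    not values from the study.
    """
    kp: float
    ki: float
    kd: float
    target: float = 120.0  # hypothetical glucose target (mg/dL)
    dt: float = 5.0        # hypothetical sampling interval (min)
    integral: float = 0.0
    prev_error: Optional[float] = None

    def step(self, glucose: float) -> float:
        error = glucose - self.target
        self.integral += error * self.dt
        # No derivative term on the first sample (no previous error yet).
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / self.dt
        self.prev_error = error
        u = self.kp * error + self.ki * self.integral + self.kd * derivative
        return max(u, 0.0)  # insulin delivery cannot be negative


def blended_dose(a_rl: float, a_pid: float, step: int, warmup_steps: int) -> float:
    """Hypothetical PID guidance: weight on the PID dose decays linearly to 0."""
    beta = max(0.0, 1.0 - step / warmup_steps)
    return beta * a_pid + (1.0 - beta) * a_rl


def safe_dose(dose: float, predicted_glucose: float,
              low: float = 80.0, high: float = 250.0, extra: float = 0.5) -> float:
    """Hypothetical safety rule: suspend insulin if predicted glucose is low,
    add a fixed extra dose if it is high, otherwise pass the dose through."""
    if predicted_glucose < low:
        return 0.0
    if predicted_glucose > high:
        return dose + extra
    return dose
```

In a closed loop, the predicted glucose passed to `safe_dose` could come from a forecasting model such as the random forest regressor mentioned above; the linear decay is only one simple way to hand control from PID to the learned policy.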