Abstract:
Urban flow monitoring and forecasting systems play an important role in smart city management. However, because ubiquitous traffic monitoring devices (e.g., loop detectors, traffic video detection) are slow and expensive to deploy, it is very difficult to predict flow at high resolution (HR) with a limited number of monitoring devices. Existing spatiotemporal network-based methods usually predict urban flow at a single spatial scale, without considering the spatial correlations between coarse-grained and fine-grained urban flows. To tackle these issues, we propose an HR spatiotemporal transformer network (HRSTT) to predict fine-grained urban flow. Specifically, residual convolution units are employed to construct high-level features from three temporal views of the flow data (closeness, period, and trend). Then, a transformer block is designed to jointly learn the spatiotemporal dynamic features of each temporal view with a self-attention mechanism. Features of external factors (e.g., holidays, weather conditions) are extracted by embedding dense networks and fused with the high-level coarse-grained flow feature maps through a gated-fusion scheme. Finally, the fused coarse-grained feature maps are passed to the distributional upsampling module, which generates the fine-grained flow map for the target prediction time. The proposed model is evaluated against several baselines on the real-world TaxiBJ datasets and achieves state-of-the-art performance on the fine-grained urban flow forecasting problem.
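
The abstract names a distributional upsampling module but does not define it. The sketch below shows one assumed reading: each coarse cell's flow is split across its n x n fine-grained sub-cells according to a learned distribution, so the fine-grained values in a block sum to the coarse value. The function name `distributional_upsample`, the softmax normalization, and the scale factor `n` are illustrative assumptions, not the HRSTT implementation.

```python
import numpy as np


def distributional_upsample(coarse, scores, n):
    """Split each coarse cell's flow over its n x n fine-grained block.

    coarse: (H, W) non-negative coarse-grained flow map.
    scores: (H*n, W*n) unnormalized scores (assumed to come from a network).
    Returns a (H*n, W*n) fine-grained map whose n x n blocks sum to `coarse`.
    """
    h, w = coarse.shape
    if scores.shape != (h * n, w * n):
        raise ValueError("scores must have shape (H*n, W*n)")

    # View the score map as blocks: index [i, a, j, b] is fine pixel
    # (i*n + a, j*n + b), which belongs to coarse cell (i, j).
    blocks = scores.reshape(h, n, w, n)

    # Numerically stable softmax over each n x n block (axes 1 and 3).
    blocks = blocks - blocks.max(axis=(1, 3), keepdims=True)
    weights = np.exp(blocks)
    weights /= weights.sum(axis=(1, 3), keepdims=True)

    # Scale each block's distribution by the flow of its coarse cell.
    fine = weights * coarse[:, None, :, None]
    return fine.reshape(h * n, w * n)


# Usage with hypothetical toy data: a 2 x 2 coarse map upsampled by n = 2.
rng = np.random.default_rng(0)
coarse = np.array([[4.0, 8.0], [0.0, 2.0]])
scores = rng.normal(size=(4, 4))
fine = distributional_upsample(coarse, scores, n=2)
block_sums = fine.reshape(2, 2, 2, 2).sum(axis=(1, 3))
```

Under this assumption the network only has to learn how flow is distributed inside each coarse cell, while the total per cell is fixed by the coarse-grained prediction; `block_sums` reproduces `coarse` up to floating-point error.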