Outlier Concerned Data Completion Exploiting Intra and Inter Data Correlations in Sparse CrowdSensing

Abstract:

Mobile CrowdSensing (MCS) is a popular data collection paradigm that usually suffers from sparse sensed data because of the limited sensing cost. To address this sparsity, sparse MCS recruits users to sense only important areas and infers the complete data through data completion, which is crucial for urban sensing applications (e.g., enhancing data expression, improving urban analysis, and guiding city planning). To achieve accurate completion, previous methods usually exploit universal similarity and conventional tendencies while relying on only a single dataset to infer the full map. However, real-world scenarios may involve many kinds of data (inter-data) that can complement each other. Moreover, each kind of data (intra-data) usually contains a few but important outliers caused by special events (e.g., parking peaks, traffic congestion, or festival parades), which may deviate from the usual statistical patterns. These outliers cannot be ignored, yet they are difficult to detect and recover during data completion because of three challenges: 1) the infrequent and unpredictable occurrence of outliers; 2) their large deviations from the mean compared with normal values; and 3) the complex spatiotemporal relations among inter-data. To this end, focusing on spatiotemporal data with both intra- and inter-data correlations, we propose a matrix completion method that accounts for the effects of outliers and exploits both intra- and inter-data correlations to improve performance. Specifically, we first perform Deep Matrix Factorization (DMF) with multiple auxiliary neural networks, a combination we name Stacked Deep Matrix Factorization (SDMF). Instead of the conventional mean squared error (MSE) loss, SDMF uses an Outlier Value Loss (OVL) function to detect and recover outliers effectively.
Moreover, a spatiotemporal outlier value memory network is added to further enhance outlier inference. Finally, we conduct extensive qualitative and quantitative experiments on two popular datasets, each with two types of sensing data, and the results show that our method outperforms state-of-the-art methods.
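To make the core idea concrete (completing a sparse sensing matrix from its observed cells with a loss that treats outliers differently from plain MSE), here is a minimal sketch under stated assumptions. It is not SDMF and not the paper's OVL, which the abstract does not define. It uses plain (shallow) matrix factorization trained by full-batch gradient descent, and a hypothetical outlier-aware loss in which any observed cell whose z-score exceeds `tau` gets weight `1 + alpha` in the squared error. The function name `complete_matrix` and the parameters `alpha`, `tau`, `lr`, `reg`, and `epochs` are illustrative choices. Inter-data correlations and the memory network are not modeled.

```python
import random
import statistics


def complete_matrix(data, rank=1, alpha=0.0, tau=2.0, lr=0.01, reg=1e-3, epochs=5000, seed=0):
    """Fill the None cells of `data` with a low-rank factorization U @ V^T.

    Hypothetical outlier-aware loss (NOT the paper's OVL): an observed cell whose
    z-score exceeds `tau` is weighted 1 + alpha in the squared error; all other
    cells are weighted 1. With alpha == 0 this is plain masked MSE.
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(data), len(data[0])
    observed = [(i, j, x) for i, row in enumerate(data) for j, x in enumerate(row) if x is not None]
    values = [x for _, _, x in observed]
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values) or 1.0
    # Flag outliers by z-score and assign a per-cell loss weight.
    weight = {(i, j): (1.0 + alpha) if abs(x - mu) / sigma > tau else 1.0 for i, j, x in observed}
    # Scale (without centering, which would change the rank) for stable steps.
    scaled = [(i, j, x / sigma) for i, j, x in observed]
    U = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(n_rows)]
    V = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(n_cols)]
    for _ in range(epochs):
        # Gradients of 0.5*sum(w*(u.v - x)^2) + 0.5*reg*(|U|^2 + |V|^2).
        gU = [[reg * u for u in row] for row in U]
        gV = [[reg * v for v in row] for row in V]
        for i, j, x in scaled:  # only observed cells contribute
            residual = sum(U[i][k] * V[j][k] for k in range(rank)) - x
            g = weight[(i, j)] * residual
            for k in range(rank):
                gU[i][k] += g * V[j][k]
                gV[j][k] += g * U[i][k]
        for i in range(n_rows):
            for k in range(rank):
                U[i][k] -= lr * gU[i][k]
        for j in range(n_cols):
            for k in range(rank):
                V[j][k] -= lr * gV[j][k]
    # Predict every cell (observed and missing) and undo the scaling.
    return [[sigma * sum(U[i][k] * V[j][k] for k in range(rank)) for j in range(n_cols)]
            for i in range(n_rows)]


if __name__ == "__main__":
    # Toy 3 regions x 4 time slots; one missing cell and one event spike (20).
    spiky = [[1, 2, 3, 4], [2, 4, None, 8], [3, 20, 9, 12]]
    mse_fit = complete_matrix(spiky, alpha=0.0)
    weighted_fit = complete_matrix(spiky, alpha=4.0)
    print("spike under MSE:", round(mse_fit[2][1], 2))
    print("spike under weighted loss:", round(weighted_fit[2][1], 2))
```

With `alpha=0` the spike is pulled toward the low-rank pattern, while a larger `alpha` makes the fit track it more closely, which is the behavior an outlier-aware loss is meant to encourage. A learned deep model such as SDMF would replace the bilinear `U @ V^T` predictor.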