YUAN Wenyan, DU Hongchuan, LI Jieyi, LI Ling
In the age of big data, Internet big data can closely reflect public attention to air pollution, which greatly impacts ambient PM$_{2.5}$ concentrations; however, such data have not yet been applied to PM$_{2.5}$ prediction. This study therefore introduces informative Internet big data as an effective predictor of PM$_{2.5}$, alongside other big data. To capture the multi-scale relationship between PM$_{2.5}$ concentrations and multi-source big data, a novel multi-source big data and multi-scale forecasting methodology for PM$_{2.5}$ is proposed. It consists of three major steps: 1) multi-source big data processing, which collects big data from different sources (e.g., devices and the Internet) and extracts the hidden predictive features; 2) multi-scale analysis, which addresses the non-uniform and misaligned timescales of the sources by extracting the scale-aligned modes hidden in the multi-source data; and 3) PM$_{2.5}$ prediction, which makes an individual prediction at each timescale and combines these into an ensemble prediction as the final result. The empirical study focuses on the most heavily polluted cities and shows that, in terms of prediction accuracy, the proposed multi-source big data and multi-scale forecasting method outperforms its original forms (with neither big data nor multi-scale analysis), semi-extended variants (with big data but without multi-scale analysis), and similar counterparts (with multi-scale analysis but with big data from only a single source).
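The three-step decompose, predict-per-scale, and ensemble structure can be illustrated with a minimal sketch. The abstract does not name the decomposition technique or the per-scale predictors, so this sketch swaps in its own assumptions: a causal (trailing) moving-average pyramid serves as the multi-scale decomposition, applied identically to PM$_{2.5}$ and to every big-data series so that their modes align by scale, and an ordinary least-squares autoregression with lagged big-data modes serves as the predictor at each scale. All function names and parameters (`trailing_average`, `multiscale_decompose`, `forecast_next`, `windows`, `lags`) are hypothetical; this is not the authors' implementation.

```python
import numpy as np


def trailing_average(x, window):
    """Causal moving average: output[t] averages x[t-window+1..t] (edge-padded)."""
    if window <= 1:
        return x.astype(float)
    padded = np.pad(x.astype(float), (window - 1, 0), mode="edge")
    return np.convolve(padded, np.ones(window) / window, mode="valid")


def multiscale_decompose(x, windows=(2, 4, 8)):
    """Split x into len(windows) detail modes plus one trend mode (finest first).

    The modes sum back to x, so per-scale forecasts can simply be added together.
    """
    modes = []
    residual = np.asarray(x, dtype=float)
    for w in windows:
        smooth = trailing_average(residual, w)
        modes.append(residual - smooth)  # fluctuations faster than window w
        residual = smooth
    modes.append(residual)  # slowest component (trend)
    return np.vstack(modes)


def lag_features(series_list, t, lags):
    """Intercept plus s[t-1], ..., s[t-lags] for every series s."""
    return [1.0] + [s[t - l] for s in series_list for l in range(1, lags + 1)]


def forecast_next(pm25, big_data, windows=(2, 4, 8), lags=3):
    """One-step-ahead PM2.5 forecast: predict each scale, then sum (ensemble)."""
    pm_modes = multiscale_decompose(pm25, windows)
    bd_modes = [multiscale_decompose(s, windows) for s in big_data]
    n = pm_modes.shape[1]
    forecast = 0.0
    for k in range(pm_modes.shape[0]):
        # Scale-aligned inputs: the k-th mode of PM2.5 and of every big-data source.
        series_list = [pm_modes[k]] + [modes[k] for modes in bd_modes]
        X = np.array([lag_features(series_list, t, lags) for t in range(lags, n)])
        y = pm_modes[k][lags:]
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)  # per-scale linear model
        forecast += float(np.dot(lag_features(series_list, n, lags), coef))
    return forecast


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    search_index = rng.normal(100, 10, 120)  # hypothetical Internet search volume
    sensor = rng.normal(20, 3, 120)          # hypothetical device reading
    pm25 = 50 + 0.5 * np.roll(search_index, 1) + rng.normal(0, 1, 120)
    print(forecast_next(pm25, [search_index, sensor]))
```

The trailing (rather than centered) average is a deliberate choice in this sketch: each mode at time $t$ depends only on values up to $t$, so the per-scale models never see future observations. Because the modes add back to the original series, the ensemble step reduces to summing the per-scale forecasts; a learned weighting would be an equally plausible alternative.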