
A Hybrid PM$_{2.5}$ Prediction Model Based on Multi-Source Data Features and Multi-Scale Analysis

YUAN Wenyan1, DU Hongchuan1, LI Jieyi2, LI Ling3, TANG Ling4

  1. School of Science, Beijing University of Chemical Technology, Beijing 100029;
  2. School of Economics and Management, Beijing University of Chemical Technology, Beijing 100029;
  3. International School of Economics and Management, Capital University of Economics and Business, Beijing 100070;
  4. School of Economics and Management, Beihang University, Beijing 100191
  • Received: 2022-04-09 Revised: 2022-07-14 Online: 2023-02-25 Published: 2023-03-16
  • Corresponding author: LI Ling, Email: liling890119@163.com
  • Funding:
    Supported by the National Natural Science Foundation of China (72004144, 71971007) and the Beijing Natural Science Foundation (JQ21033).

YUAN Wenyan, DU Hongchuan, LI Jieyi, LI Ling, TANG Ling. A Hybrid PM$_{2.5}$ Prediction Model Based on Multi-Source Data Features and Multi-Scale Analysis[J]. Journal of Systems Science and Mathematical Sciences, 2023, 43(2): 399-416.
Accurately capturing the dynamic evolution of PM$_{2.5}$ pollution is essential for air pollution control decisions by governments and enterprises. This study therefore proposes a hybrid prediction framework driven by multi-source data features and multi-scale analysis to improve PM$_{2.5}$ prediction accuracy. The framework consists of three steps: 1) multi-source data analysis, which fuses meteorological, pollution, and public opinion data related to PM$_{2.5}$ pollution; 2) multi-scale analysis, which uses multivariate empirical mode decomposition (MEMD) to decompose the multi-source data into predictive features at different modes; 3) hybrid prediction, which combines econometric and machine learning models in sequence and integrates the predictions of the individual modes into the final result. An empirical study of PM$_{2.5}$ in Beijing shows that: 1) the proposed hybrid model is more accurate than all benchmark models; 2) the number and the sentiment of microblog posts jointly improve PM$_{2.5}$ prediction accuracy, and using both outperforms using either factor alone; 3) models that incorporate MEMD decomposition are significantly more accurate than the benchmark models.
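The decompose-then-ensemble idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: a trailing moving-average split stands in for MEMD, a one-lag least-squares model stands in for the econometric and machine learning predictors, the input is a single series rather than multi-source data, and all function names (`decompose`, `fit_ar1`, `forecast_next`) are hypothetical.

```python
# Illustrative decompose-then-ensemble sketch (assumptions throughout):
# a moving-average split replaces MEMD, and a one-lag least-squares model
# replaces the paper's per-mode predictors.

def moving_average(series, window):
    """Trailing moving average; uses shorter windows at the start."""
    out = []
    for i in range(len(series)):
        chunk = series[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def decompose(series, window=3):
    """Split a series into a fast residual and a slow trend that sum to it."""
    trend = moving_average(series, window)
    residual = [x - t for x, t in zip(series, trend)]
    return [residual, trend]


def fit_ar1(component):
    """Least-squares fit of x[t] = a * x[t-1] + b; returns (a, b)."""
    xs, ys = component[:-1], component[1:]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        # Constant component: predict its mean.
        return 0.0, mean_y
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = cov / var_x
    return a, mean_y - a * mean_x


def forecast_next(series, window=3):
    """Predict each component one step ahead, then sum the predictions."""
    total = 0.0
    for component in decompose(series, window):
        a, b = fit_ar1(component)
        total += a * component[-1] + b
    return total
```

Calling `forecast_next` on a list of past PM$_{2.5}$ readings returns a one-step-ahead value. The point of the sketch is only the structure: each scale is modelled separately and the per-scale predictions are added back together.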
