Research and Application of Bayesian Additive Regression Trees Model for Asymmetric Error Distribution

CAO Taoyun1,2, ZHANG Riquan3

  1. School of Statistics and Mathematics, Guangdong University of Finance & Economics, Guangzhou 510320;
  2. Big Data and Educational Statistics Application Laboratory, Guangdong University of Finance & Economics, Guangzhou 510320;
  3. School of Statistics and Information, Shanghai University of International Business and Economics, Shanghai 201620
  • Received: 2021-10-09 Revised: 2022-05-29 Published: 2022-12-13
  • Corresponding author: ZHANG Riquan, Email: rqzhang@suibe.edu.cn
  • Funding:
    Supported by the General Program of the Natural Science Foundation of Guangdong Province (2020A1515011580) and the General Program of the National Natural Science Foundation of China (11971171).

CAO Taoyun, ZHANG Riquan. Research and Application of Bayesian Additive Regression Trees Model for Asymmetric Error Distribution[J]. Journal of Systems Science and Mathematical Sciences, 2022, 42(11): 3119-3133.
Bayesian additive regression trees (BART) is a nonparametric Bayesian regression method that performs strongly in prediction and in measuring variable importance. Standard BART assumes that the random error term is normally distributed; this paper proposes a generalization of BART for asymmetric data. Exploiting the tree structure of BART, we first derive the asymptotic distribution of the mean response in each terminal node via the central limit theorem. We then obtain the asymptotic distribution of the response variance from properties of U-statistics. Finally, sampling iterations and parameter estimation are carried out with the Backfitting MCMC algorithm. Simulation studies, including a comparison with random forests, demonstrate the feasibility and superiority of the proposed model, and a real-data analysis illustrates its practical usefulness.
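The two asymptotic ingredients named in the abstract can be illustrated with a small numerical sketch. This is not the authors' implementation. It only assumes a hypothetical terminal node holding 50 residuals with right-skewed, mean-zero errors (a shifted exponential, chosen purely for illustration) and shows that the sample variance is a U-statistic with kernel h(a, b) = (a - b)^2 / 2, which in turn gives a central-limit-theorem standard error for the node mean. The names `u_statistic_variance` and `leaf_residuals` are made up for this example.

```python
import itertools
import random
import statistics


def u_statistic_variance(values):
    """Average the kernel h(a, b) = (a - b)^2 / 2 over all unordered pairs."""
    pairs = list(itertools.combinations(values, 2))
    return sum((a - b) ** 2 / 2 for a, b in pairs) / len(pairs)


random.seed(0)

# Hypothetical terminal node: right-skewed errors with mean zero
# (exponential with rate 1, shifted left by its mean). Assumption for
# illustration only; the paper does not prescribe this distribution.
leaf_residuals = [random.expovariate(1.0) - 1.0 for _ in range(50)]

leaf_mean = statistics.fmean(leaf_residuals)
u_var = u_statistic_variance(leaf_residuals)
sample_var = statistics.variance(leaf_residuals)  # unbiased, divides by n - 1

# CLT-based standard error of the node mean, valid for large node sizes
# even though the errors themselves are not normal.
std_err = (u_var / len(leaf_residuals)) ** 0.5

print(f"node mean: {leaf_mean:.4f}, standard error: {std_err:.4f}")
print(f"U-statistic variance: {u_var:.6f}, sample variance: {sample_var:.6f}")
```

The two variance values agree up to floating-point error, which is the algebraic identity behind treating the variance estimate as a U-statistic; the asymptotic normality of both quantities is what lets a sampler keep normal-type updates when the errors are skewed.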
