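Optimal Subsampling Algorithm for Big Data Ridge Regression

LI Lili, JIN Shilei, ZHOU Kaihe

  1. School of Economics, Qingdao University, Qingdao 266100
  • Online: 2021-12-28 Published: 2021-12-28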

LI Lili, JIN Shilei, ZHOU Kaihe. Optimal Subsampling Algorithm for Big Data Ridge Regression[J]. Journal of Systems Science and Mathematical Sciences, 2022, 42(1): 50-63.

With the advent of the big data era, Wang et al. (2018) proposed an optimal subsampling algorithm for logistic regression to improve computational efficiency; it saves substantial computation time while preserving the accuracy of the parameter estimates. To address multicollinearity among variables, this paper proposes an optimal subsampling algorithm for the ridge regression model and proves the consistency and asymptotic normality of the resulting parameter estimator. The algorithm is evaluated through numerical simulation and an empirical analysis on real data. The results show that the model built from the optimal subsample achieves parameter estimation accuracy close to that of the full-sample model while substantially reducing computation time.
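
The abstract does not state the sampling probabilities, so the following is only a minimal sketch of how a two-step subsampling scheme for ridge regression might look, not the authors' algorithm. It assumes a uniform pilot subsample, followed by sampling with probabilities proportional to |pilot residual| times the covariate norm (an illustrative choice made here, not taken from the paper), and an inverse-probability-weighted ridge fit. The function names `ridge_fit` and `two_step_subsample_ridge`, the penalty `lam`, the subsample sizes `r0` and `r`, and the simulated data are all hypothetical.

```python
import numpy as np


def ridge_fit(X, y, lam, weights=None):
    """Weighted ridge estimate: argmin_b sum_i w_i (y_i - x_i'b)^2 + lam * ||b||^2."""
    if weights is None:
        weights = np.ones(len(y))
    Xw = X * weights[:, None]
    p = X.shape[1]
    return np.linalg.solve(Xw.T @ X + lam * np.eye(p), Xw.T @ y)


def two_step_subsample_ridge(X, y, lam, r0, r, rng):
    """Two-step subsampling sketch (assumed scheme, not the paper's formulas)."""
    n = len(y)

    # Step 1: uniform pilot subsample of size r0. The weights n / r0 make the
    # weighted loss an unbiased estimate of the full-data sum of squares.
    pilot = rng.choice(n, size=r0, replace=True)
    beta0 = ridge_fit(X[pilot], y[pilot], lam, np.full(r0, n / r0))

    # Step 2: sampling probabilities proportional to |residual| * ||x||
    # (an assumption made for illustration).
    score = np.abs(y - X @ beta0) * np.linalg.norm(X, axis=1)
    probs = score / score.sum()
    idx = rng.choice(n, size=r, replace=True, p=probs)

    # Inverse-probability weights keep the subsample loss unbiased for the
    # full-data loss, so the same penalty lam is used.
    w = 1.0 / (r * probs[idx])
    return ridge_fit(X[idx], y[idx], lam, w)


# Simulated data with correlated covariates (illustrative only).
rng = np.random.default_rng(0)
n, p = 100_000, 5
cov = 0.5 * np.ones((p, p)) + 0.5 * np.eye(p)  # unit variance, correlation 0.5
X = rng.multivariate_normal(np.zeros(p), cov, size=n)
y = X @ np.ones(p) + rng.normal(size=n)

beta_full = ridge_fit(X, y, lam=1.0)
beta_sub = two_step_subsample_ridge(X, y, lam=1.0, r0=500, r=2000, rng=rng)
print("distance between subsample and full-data estimates:",
      np.linalg.norm(beta_sub - beta_full))
```

In this sketch the subsample fit solves a ridge problem on 2,000 rows instead of 100,000; computing the scores still requires one pass over the full data.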