Abstract

This paper investigates the scope of application of the sufficient dimension reduction algorithms SIR, SAVE, and CP-SAVE, and improves their robustness from two directions. First, a hybrid of SIR and SAVE is constructed that combines the strengths of both, so as to accommodate a wider range of data types and link functions. Second, when the observations are contaminated, the robust mean and covariance estimated by a soft-trimming method replace the traditional estimators, yielding robust sufficient dimension reduction algorithms. Numerical experiments show the following. The first-order algorithm SIR performs poorly when the link function is symmetric about the mean of the predictors, but it is relatively robust to the predictor distribution and the number of slices. Compared with SIR, the second-order algorithms SAVE and CP-SAVE impose more stringent requirements and are sensitive to both the number of slices and the predictor distribution, but they can find directions that SIR cannot. When the predictors follow a heavy-tailed distribution, CP-SAVE is usually better than SAVE. The SIR-SAVE hybrid adapts better to the predictor distribution and the link function, and improves the dimension reduction effect in a variety of settings. The soft-trimming robust estimator is insensitive to the trimming parameter; it is suggested that the trimming parameter be set slightly larger than the proportion of outliers. Compared with robust SAVE, robust SIR only requires robust means within slices, so its conditions are weaker and more realistic, and it is recommended first.
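As a minimal numpy sketch of the hybrid idea described above: both SIR and SAVE slice the sample by the response, build a candidate matrix from within-slice moments of the standardized predictors, and take its leading eigenvectors; a convex combination of the two matrices yields a hybrid. The function name `hybrid_sir_save` and the mixing weight `alpha` are illustrative choices, not notation from the paper, and this is a textbook form of SIR/SAVE rather than the authors' exact implementation.

```python
import numpy as np

def hybrid_sir_save(X, y, n_slices=5, n_dirs=1, alpha=0.5):
    """Sketch of a SIR/SAVE hybrid for sufficient dimension reduction.

    The candidate matrix is the convex combination
        M = alpha * M_SIR + (1 - alpha) * M_SAVE,
    so alpha=1 recovers SIR and alpha=0 recovers SAVE.
    """
    n, p = X.shape
    mu = X.mean(axis=0)
    Sigma = np.cov(X, rowvar=False)

    # Standardize the predictors: Z = (X - mu) Sigma^{-1/2}
    evals, evecs = np.linalg.eigh(Sigma)
    Sigma_inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z = (X - mu) @ Sigma_inv_sqrt

    # Partition the observations into slices by the sorted response
    order = np.argsort(y)
    M_sir = np.zeros((p, p))
    M_save = np.zeros((p, p))
    for idx in np.array_split(order, n_slices):
        f_h = len(idx) / n                  # slice proportion
        m_h = Z[idx].mean(axis=0)           # within-slice mean of Z
        V_h = np.cov(Z[idx], rowvar=False)  # within-slice covariance of Z
        M_sir += f_h * np.outer(m_h, m_h)
        D = np.eye(p) - V_h
        M_save += f_h * (D @ D)

    M = alpha * M_sir + (1 - alpha) * M_save

    # Leading eigenvectors of M, mapped back to the original X scale
    w, v = np.linalg.eigh(M)
    B = Sigma_inv_sqrt @ v[:, ::-1][:, :n_dirs]
    return B / np.linalg.norm(B, axis=0)
```

Replacing the within-slice sample mean `m_h` (and, for SAVE, the covariance `V_h`) by soft-trimmed robust estimators in this loop is exactly where the robust variants described in the abstract would plug in.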
Key words
Sufficient dimension reduction / sliced inverse regression / sliced average variance estimation / hybrid algorithm / robust estimation
Classification codes: 62G05, 62G20
References
[1] Wang B C, Wei Y H, Zhang B B. Evaluation and comparison of functional data clustering algorithms. Statistics and Decision, 2021, 37(16): 38-42. (in Chinese)
[2] Lian H, Li G R. Series expansion for functional sufficient dimension reduction. Journal of Multivariate Analysis, 2014, 124(1): 150-165.
[3] Li B. Sufficient Dimension Reduction: Methods and Applications with R. New York: Chapman and Hall/CRC, 2018.
[4] Li K C. Sliced inverse regression for dimension reduction. Journal of the American Statistical Association, 1991, 86(414): 316-327.
[5] Yu Z. Extending the Scope of Sufficient Dimension Reduction Theory and Its Related Methods. Doctoral thesis. Shanghai: East China Normal University, 2010. (in Chinese)
[6] Zhao J L, Xu X Z. Dimension reduction based on variance estimation. Science in China Series A: Mathematics, 2008, 38(9): 1046-1065. (in Chinese)
[7] Zhao J L, Xu X Z. Generalization of some dimension reduction methods. Chinese Annals of Mathematics, Series A, 2008, 29(2): 231-240. (in Chinese)
[8] Ma Y Y, Zhu L P. A semiparametric approach to dimension reduction. Journal of the American Statistical Association, 2012, 107(497): 168-179.
[9] Ferré L, Yao F. Functional sliced inverse regression analysis. Statistics, 2003, 37(6): 475-488.
[10] Wang G C, Zhou Y, Feng X N, et al. The hybrid method of FSIR and FSAVE for functional effective dimension reduction. Computational Statistics & Data Analysis, 2015, 91: 64-77.
[11] Wang G C, Zhou J J, Wu W Q, et al. Robust functional sliced inverse regression. Statistical Papers, 2017, 58(1): 227-245.
[12] Wang G C, Song X Y. Functional sufficient dimension reduction for functional data classification. Journal of Classification, 2018, 35(2): 250-272.
[13] Wang B C, Zhang B X, Yan H B. Functional sufficient dimension reduction based on weighted method. Communications in Statistics - Simulation and Computation, 2022, 51(11): 6902-6923.
[14] Gan S J, Tu K R, You W J. A class of estimators and their modified versions for dimension reduction subspace with multivariate responses and its application. Journal of Statistics and Information, 2017, 33(10): 18-23. (in Chinese)
[15] Gan S J, You W J. Estimation of the dimension reduction subspace with multivariate responses based on the moment generating function. Journal of Northeast Normal University (Natural Science Edition), 2017, 49(1): 43-47. (in Chinese)
[16] Zhu L X, Ohtaki M, Li Y X. On hybrid methods of inverse regression-based algorithms. Computational Statistics & Data Analysis, 2007, 51(5): 2621-2635.
[17] Li X J, Wu Y Y, Zhang J X. Robust dimension reduction method based on sliced inverse regression. Statistical Research, 2018, 35(7): 116-124. (in Chinese)
[18] Eaton M L. A characterization of spherical distributions. Journal of Multivariate Analysis, 1986, 20(2): 272-276.
Funding
Supported by the National Natural Science Foundation of China (Grant No. 12071308).