Variable Selection for High-dimensional Cox Model with Error Rate Control

HE Baihua, SHI Hongwei, GUO Xu, ZOU Changliang, ZHU Lixing

Journal of Systems Science & Complexity ›› 2025, Vol. 38 ›› Issue (3) : 1162-1185.

DOI: 10.1007/s11424-024-3484-6

Abstract

Simultaneously finding active predictors and controlling the false discovery rate (FDR) for high-dimensional survival data is an important but challenging statistical problem. In this paper, the authors propose a novel variable selection procedure with error rate control for the high-dimensional Cox model. By adopting a data-splitting strategy, the authors construct a series of symmetric statistics and then utilize the symmetry property to derive a data-driven threshold to achieve error rate control. The authors establish finite-sample and asymptotic FDR control results under some mild conditions. Simulation results as well as a real data application show that the proposed approach successfully controls FDR and is often more powerful than the competing approaches.
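The abstract does not give the form of the symmetric statistics or the threshold, so the following is only a minimal sketch of the general idea, not the authors' procedure. It assumes each predictor j already has a statistic W_j (for instance, built from estimates on the two halves of a data split) that is symmetric about zero when the predictor is inactive and large and positive when it is active. Under that assumption, the count of statistics below -t can stand in for the number of false discoveries above t, which yields a data-driven threshold of the kind commonly used in knockoff-type and symmetrized methods. The function names `symmetric_threshold` and `select` are hypothetical.

```python
from typing import List, Sequence


def symmetric_threshold(w: Sequence[float], alpha: float) -> float:
    """Return the smallest t > 0 whose estimated false discovery
    proportion is at most alpha, or infinity if no such t exists.

    Assumption (not taken from the paper): null statistics are symmetric
    about zero, so #{j: w_j <= -t} estimates the false discoveries
    among {j: w_j >= t}.
    """
    # Only the observed magnitudes can change the counts, so scan those.
    candidates = sorted({abs(x) for x in w if x != 0})
    for t in candidates:
        false_est = 1 + sum(1 for x in w if x <= -t)  # +1 keeps the estimate conservative
        n_selected = max(sum(1 for x in w if x >= t), 1)
        if false_est / n_selected <= alpha:
            return t
    return float("inf")


def select(w: Sequence[float], alpha: float) -> List[int]:
    """Indices of predictors whose statistic reaches the threshold."""
    t = symmetric_threshold(w, alpha)
    return [j for j, x in enumerate(w) if x >= t]


# Toy statistics: six clearly positive values and six small, sign-balanced ones.
w_toy = [5.0, 4.2, 3.9, 3.1, 2.8, 2.5, 0.4, -0.3, 0.2, -0.5, 0.1, -0.2]
selected = select(w_toy, alpha=0.2)
```

In this toy input the small statistics are roughly balanced in sign, so the threshold has to rise above them before the estimated proportion drops below the target level. How the statistics are constructed for the Cox model, and the conditions under which the error rate is controlled, are matters for the paper itself.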

Key words

Data-splitting / false discovery rate / high-dimensional survival data / symmetry

Cite this article

HE Baihua, SHI Hongwei, GUO Xu, ZOU Changliang, ZHU Lixing. Variable Selection for High-dimensional Cox Model with Error Rate Control. Journal of Systems Science & Complexity, 2025, 38(3): 1162-1185. https://doi.org/10.1007/s11424-024-3484-6

Funding

This research was supported by the National Natural Science Foundation of China under Grant Nos. 12301364, 12322112, 12071038, 11925106, 12231011, 11931001, 12226007, 12326325, and 12131006, the National Key R&D Program of China under Grant Nos. 2022YFA1003703 and 2022YFA1003800, the Natural Science Foundation of Anhui Province under Grant No. 2308085QA09, the Fundamental Research Funds for the Central Universities under Grant No. 2243200006, the Scientific and Technological Innovation Project of China Academy of Chinese Medical Sciences under Grant No. CI2023C063YLL, and the University Grant Council of Hong Kong.