Abstract
Under the Bayesian framework, this paper develops a semiparametric Tweedie compound Poisson partially linear mixed-effects model for semicontinuous longitudinal data, with the nonparametric function approximated by Bayesian P-splines. Because the density function of the Tweedie compound Poisson distribution has no closed-form expression, direct Bayesian computation is difficult. Following the data-augmentation idea, we therefore introduce a latent variable, derive the joint probability density function of the semicontinuous random variable and the latent variable, and base Bayesian inference on this joint density. A hybrid algorithm combining the block Gibbs sampler with the Metropolis-Hastings (MH) algorithm then yields joint Bayesian estimates of the unknown parameters, random effects, and nonparametric function, together with predicted values of the latent variables. Finally, simulation studies and a real-data example illustrate the effectiveness of the proposed method.
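The data-augmentation step described in the abstract can be illustrated with a minimal sketch. It assumes the standard compound Poisson-gamma representation of a Tweedie variable with power parameter 1 < p < 2, in which the latent variable is the Poisson count N of gamma-distributed jumps; the paper's own notation and sampler may differ. The function names (`cp_params`, `log_joint`, `sample_latent`) and the truncation bound `n_max` are hypothetical choices made for illustration only.

```python
import math
import random

def cp_params(mu, phi, p):
    """Map Tweedie (mu, phi, p), 1 < p < 2, to compound Poisson-gamma parameters."""
    lam = mu ** (2.0 - p) / (phi * (2.0 - p))   # Poisson rate of the latent count
    alpha = (2.0 - p) / (p - 1.0)               # gamma shape of each jump
    gam = phi * (p - 1.0) * mu ** (p - 1.0)     # gamma scale of each jump
    return lam, alpha, gam

def log_joint(y, n, mu, phi, p):
    """Joint log-density of the response y and the latent count n."""
    lam, alpha, gam = cp_params(mu, phi, p)
    if y == 0.0:
        # y = 0 occurs only when no jumps occurred: P(N = 0) = exp(-lam)
        return -lam if n == 0 else -math.inf
    if n == 0:
        return -math.inf
    # Poisson(n; lam) times Gamma(y; shape = n * alpha, scale = gam)
    log_pois = -lam + n * math.log(lam) - math.lgamma(n + 1)
    shape = n * alpha
    log_gam = ((shape - 1.0) * math.log(y) - y / gam
               - shape * math.log(gam) - math.lgamma(shape))
    return log_pois + log_gam

def sample_latent(y, mu, phi, p, rng, n_max=200):
    """Draw n from p(n | y) by normalizing the joint density over n = 1..n_max.

    Truncating at n_max is an assumption of this sketch, not part of the model.
    """
    if y == 0.0:
        return 0
    ns = list(range(1, n_max + 1))
    logw = [log_joint(y, n, mu, phi, p) for n in ns]
    m = max(logw)
    weights = [math.exp(v - m) for v in logw]   # subtract max for numerical stability
    return rng.choices(ns, weights=weights, k=1)[0]

rng = random.Random(0)
n_draw = sample_latent(3.0, mu=2.0, phi=1.5, p=1.5, rng=rng)
```

Because the joint density factorizes into a Poisson term and a gamma term, it is available in closed form even though the marginal density of y is an infinite series. In a Gibbs-type scheme, a draw such as `n_draw` would be refreshed at each iteration alongside the other unknowns.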
Key words
Longitudinal data /
Tweedie compound Poisson distribution /
Gibbs sampler /
MH algorithm /
Bayesian P-spline
CLC number:
60J22
62F15
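Funding
Supported by the National Natural Science Foundation of China (12161014), the National Statistical Science Research Project of China (2021LY011), and the Guizhou Provincial Science and Technology Program (Qiankehe Jichu [2020]1Y009).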