Journals Website of the Academy of Mathematics and Systems Science, Chinese Academy of Sciences

2026, Vol. 39, No. 1    Published: 2026-01-21
  

  • ZHU Liping, XU Wangli, LI Yingxing
    Journal of Systems Science and Complexity. 2026, 39(1): 1-2. https://doi.org/10.1007/s11424-026-6000-3
  • DONG Yuexiao, LI Lei
    Journal of Systems Science and Complexity. 2026, 39(1): 3-16. https://doi.org/10.1007/s11424-026-5408-0
    The authors extend the marginal coordinate test for predictor contributions (Cook, 2004) to multivariate responses. Without explicitly specifying the link functions between the responses and the predictors, the authors propose an asymptotic test under a normality assumption on the predictors and an asymmetry assumption on the unknown regression mean function. When these assumptions are violated, the asymptotic test combined with elliptical trimming and clustering remains valid and performs well numerically.
  • WANG Chuhan, HUANG Jiaqi, LI Xuerui
    Journal of Systems Science and Complexity. 2026, 39(1): 17-37. https://doi.org/10.1007/s11424-026-4608-y
    This paper examines whether a parametric regression model is correctly specified for both the source and target data, and whether the regression pattern in the source domain aligns with that in the target domain. This evaluation is a critical prerequisite for applying model-based transfer learning methods under covariate shift assumptions. Traditional regression model checks and two-sample regression tests are insufficient for this purpose. To overcome these limitations, the authors propose a novel adaptive-to-regression test statistic that is asymptotically distribution-free. Under the null hypothesis, the statistic converges weakly to a chi-square distribution, so the significance level is maintained and critical values can be determined without resampling. The authors also systematically analyze the test’s power, highlighting its sensitivity to different sub-local alternatives that deviate from the null hypothesis. Numerical studies, including simulations, assess finite-sample performance, and a real-world data example is provided for illustration.
  • WANG Shuailin, LIN Lu
    Journal of Systems Science and Complexity. 2026, 39(1): 38-78. https://doi.org/10.1007/s11424-025-5049-8
    In this article, the authors study online updating estimation for general estimating equations (EEs) with heterogeneous streaming data. The framework relies on more conservative model assumptions, which yields more robust estimation and guards against model misspecification. The authors first establish the standard renewable estimator under a blockwise heterogeneity assumption, under which the model can be regarded as correctly specified in a certain sense. To mitigate heterogeneity and improve estimation accuracy, the authors propose two novel online detection and fusion strategies, together with corresponding algorithms. Theoretical properties of the proposed methods are established for small block sizes. Extensive numerical experiments support the theoretical findings, and an analysis of the Ford GoBike docked bike-sharing dataset demonstrates the feasibility and robustness of the proposed methods.
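The core of renewable (online updating) estimation is that each new data block updates a fixed-size summary, so earlier raw blocks never need to be revisited. As an illustrative special case only, assuming a homogeneous linear model with least-squares estimating equations, the sketch below accumulates X^T X and X^T y block by block. It is not the authors' heterogeneous estimator or their detection and fusion strategies, and the class name RenewableOLS is hypothetical.

```python
import numpy as np


class RenewableOLS:
    """Online least squares that stores only X^T X and X^T y across blocks."""

    def __init__(self, p):
        self.xtx = np.zeros((p, p))  # running sum of X_b^T X_b
        self.xty = np.zeros(p)       # running sum of X_b^T y_b
        self.n = 0                   # number of observations absorbed so far

    def update(self, X_block, y_block):
        # Absorb one new block; raw data from earlier blocks is never revisited.
        self.xtx += X_block.T @ X_block
        self.xty += X_block.T @ y_block
        self.n += len(y_block)
        return self.estimate()

    def estimate(self):
        # Solve the accumulated normal equations (X^T X) beta = X^T y.
        return np.linalg.solve(self.xtx, self.xty)


rng = np.random.default_rng(0)
beta_true = np.array([1.0, -2.0, 0.5])
model = RenewableOLS(p=3)
X_seen, y_seen = [], []  # kept here only to compare with a full-data fit
for _ in range(5):       # five streaming blocks of 40 observations each
    X = rng.normal(size=(40, 3))
    y = X @ beta_true + rng.normal(scale=0.1, size=40)
    model.update(X, y)
    X_seen.append(X)
    y_seen.append(y)

full_fit = np.linalg.lstsq(np.vstack(X_seen), np.concatenate(y_seen), rcond=None)[0]
print(np.allclose(model.estimate(), full_fit))  # streaming and batch fits agree
```

In the linear case the streaming estimate coincides exactly with the full-data fit; for general nonlinear estimating equations, renewable estimators typically rely on an approximate Newton-type update built from accumulated information matrices instead.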
  • SOALE Abdul-Nasah, DONG Yuexiao
    Journal of Systems Science and Complexity. 2026, 39(1): 79-87. https://doi.org/10.1007/s11424-026-4626-9
    Classical linear discriminant analysis (LDA) (Fisher, 1936) implicitly assumes that the classification boundary depends on only one linear combination of the predictors. This restriction can lead to poor classification in applications where the decision boundary depends on several linear combinations of the predictors. To overcome this limitation, the authors first project the predictors onto an envelope central space and then perform LDA on the resulting sufficient predictors. The improvement in classification accuracy achieved by the proposed method is demonstrated on both synthetic data and real applications.
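The reduce-then-classify pipeline can be sketched as follows, assuming a basis matrix B of the reduction subspace is already available. Here B is simply fixed for illustration, whereas the authors estimate it with an envelope central-space method that is not reproduced. The LDA step uses a pooled within-class covariance and empirical class priors; the function names fit_lda and predict_lda are hypothetical.

```python
import numpy as np


def fit_lda(Z, y):
    """Gaussian LDA with a pooled within-class covariance and empirical priors."""
    classes = np.unique(y)
    means = np.array([Z[y == c].mean(axis=0) for c in classes])
    resid = Z - means[np.searchsorted(classes, y)]   # center each row by its class mean
    cov = resid.T @ resid / (len(y) - len(classes))   # pooled covariance
    priors = np.array([np.mean(y == c) for c in classes])
    return classes, means, np.linalg.inv(cov), np.log(priors)


def predict_lda(model, Z):
    classes, means, prec, log_priors = model
    # Linear discriminant score: z' S^-1 mu_k - mu_k' S^-1 mu_k / 2 + log pi_k
    scores = Z @ prec @ means.T - 0.5 * np.sum(means @ prec * means, axis=1) + log_priors
    return classes[np.argmax(scores, axis=1)]


rng = np.random.default_rng(1)
n, p = 300, 10
y = rng.integers(0, 3, size=n)
centers = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]])
X = rng.normal(size=(n, p))
X[:, :2] += centers[y]  # the classes differ along two linear combinations
B = np.eye(p)[:, :2]    # assumed basis of the reduction subspace (estimated in practice)
Z = X @ B               # sufficient predictors
model = fit_lda(Z, y)
accuracy = np.mean(predict_lda(model, Z) == y)
```

Only the two projected coordinates enter the classifier, so the eight noise predictors play no role in estimating the discriminant rule.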
  • LIANG Jia, SONG Weixing, SHI Jianhong
    Journal of Systems Science and Complexity. 2026, 39(1): 88-114. https://doi.org/10.1007/s11424-026-5100-4
    In this paper, the authors propose a class of test procedures for checking the adequacy of parametric forms of the variance function in regression models when the mean function is unknown. With the unknown mean function estimated by the classical kernel estimator, the proposed test statistics are based on a modified minimum distance between a nonparametric fit and a parametric estimator of the variance function under the null hypothesis. Asymptotic properties of the estimator of the variance-function parameters are discussed, and the large-sample distribution of the test statistics under the null hypothesis is established, together with the consistency of the tests and their power under local alternatives. Extensive numerical studies show that the proposed test procedures have satisfactory finite-sample performance. Finally, two real data examples further illustrate the effectiveness of the proposed tests in practice.
  • LIU Yanhong, JIA Yinxu, WANG Guanghui, WANG Zhaojun, ZOU Changliang
    Journal of Systems Science and Complexity. 2026, 39(1): 115-135. https://doi.org/10.1007/s11424-026-5075-1
    Model checking evaluates whether a statistical model faithfully captures the underlying data-generating process. Classical tests, such as local-smoothing and empirical-process methods, break down in high dimensions. More recent approaches compare predictiveness using flexible machine-learning fitting procedures to obtain algorithm-agnostic tests, but they require large labeled samples. The authors introduce a prediction-powered, semi-supervised framework that 1) imputes responses for unlabeled data via a pretrained model; 2) corrects the imputation bias with a rectifier calibrated on labeled data; and 3) adaptively balances these components through a data-driven power-tuning parameter. Building on algorithm-agnostic out-of-sample predictiveness comparisons, the proposed method integrates unlabeled information to enhance power. Theoretical analyses and numerical results show that the proposed test controls the Type I error and substantially improves power over fully supervised counterparts, even when the imputation model is misspecified.
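The three ingredients listed in this abstract (imputation by a pretrained model, a rectifier fitted on labeled data, and a data-driven power-tuning parameter) can be illustrated on the simplest possible target, a population mean, in the style of prediction-powered inference. This is a hedged sketch of that general idea, not the authors' model-checking test; the pretrained predictor f, the simulated data, and the function name ppi_mean are assumptions.

```python
import numpy as np


def ppi_mean(y_lab, f_lab, f_unlab):
    """Power-tuned prediction-powered estimate of E[Y] and the tuning weight lam."""
    n, N = len(y_lab), len(f_unlab)
    f_all = np.concatenate([f_lab, f_unlab])
    # lam minimizes Var(mean(y - lam*f_lab)) + Var(lam*mean(f_unlab))
    lam = np.cov(y_lab, f_lab)[0, 1] / ((1 + n / N) * np.var(f_all, ddof=1))
    imputed = lam * np.mean(f_unlab)          # 1) predictions on unlabeled data
    rectifier = np.mean(y_lab - lam * f_lab)  # 2) bias correction from labeled data
    return imputed + rectifier, lam           # 3) lam balances 1) and 2)


def f(x):
    # Hypothetical pretrained model, biased upward by 0.5.
    return 2.5 + x


rng = np.random.default_rng(0)
n, N = 200, 5000
x_lab, x_unlab = rng.normal(size=n), rng.normal(size=N)
y_lab = 2.0 + x_lab + rng.normal(scale=0.5, size=n)  # true E[Y] = 2
estimate, lam = ppi_mean(y_lab, f(x_lab), f(x_unlab))
naive = np.mean(f(x_unlab))                           # imputation alone inherits the bias
```

The rectifier removes the bias that imputation alone would carry, and when the predictions are uninformative the weight lam shrinks toward 0, so the estimator falls back to the labeled-data mean.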
  • YANG Xiaojie, WANG Qihua
    Journal of Systems Science and Complexity. 2026, 39(1): 136-157. https://doi.org/10.1007/s11424-026-5188-6
    Within the sufficient dimension reduction framework, research on nonignorable missing data remains relatively scarce, primarily because of the associated identifiability issues. This paper considers sufficient dimension reduction when the response is subject to nonignorable missingness. By adopting a flexible semiparametric missingness mechanism to ensure identifiability, the authors construct three classes of estimating equations based on inverse probability weighting, regression imputation and augmented inverse probability weighting. Further novel aspects of the proposed methods are the use of sufficient dimension reduction techniques in implementing these estimating equations, which mitigates the effect of high dimensionality, and the construction of an estimator for the conditional expectation of the estimating functions given both the covariates and the missingness indicator. The authors prove that the three resulting estimators are asymptotically normal. Comprehensive simulation studies assess the finite-sample performance of the proposed methods, and an application to PM2.5 concentration data is also presented.
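The three estimating-equation classes named here have familiar forms for the simplest parameter, the mean of a partially missing response. Under the nonignorable mechanism studied in the paper, the response probability also depends on the missing response and must be modeled and estimated; this sketch sidesteps that by treating the response probability pi(x) and the regression function m(x) = E[Y | x] as known functions of the covariates only (a missing-at-random simplification). The simulated data and the function name mean_estimators are hypothetical.

```python
import numpy as np


def mean_estimators(y, observed, pi, m):
    """IPW, regression-imputation and AIPW estimates of E[Y].

    observed: 1 if Y is observed, else 0; entries of y where observed == 0 are ignored.
    pi: response probabilities P(observed = 1 | x), treated as known.
    m: values of E[Y | x], treated as known.
    """
    d = observed.astype(float)
    y0 = np.where(d == 1, y, 0.0)                    # missing responses never enter
    ipw = np.mean(d * y0 / pi)
    imputation = np.mean(d * y0 + (1 - d) * m)
    aipw = np.mean(d * y0 / pi - (d - pi) / pi * m)  # IPW plus augmentation term
    return ipw, imputation, aipw


rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)
y = 1.0 + x + rng.normal(scale=0.5, size=n)          # true E[Y] = 1
pi = 1.0 / (1.0 + np.exp(-(1.0 + x)))                # response more likely for large x
observed = (rng.random(n) < pi).astype(int)
y = np.where(observed == 1, y, np.nan)
ipw, imputation, aipw = mean_estimators(y, observed, pi, m=1.0 + x)
complete_case = np.nanmean(y)                         # biased upward in this design
```

The complete-case mean is biased here because responses with large x are more likely to be observed; all three corrected estimators remove that bias when pi and m are correct.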
  • ZENG Bilin, ADEKPEDJOU Akim, WEN Xuerong Meggie
    Journal of Systems Science and Complexity. 2026, 39(1): 158-179. https://doi.org/10.1007/s11424-026-5072-4
    Multi-dimensional arrays are referred to as tensors. Tensor-valued predictors are common in modern biomedical applications, such as electroencephalogram (EEG), magnetic resonance imaging (MRI), functional MRI (fMRI), diffusion-weighted MRI, and longitudinal health data. In survival analysis, it is both important and challenging to integrate clinically relevant information, such as gender, age, and disease state, with medical imaging tensor data or longitudinal health data to predict disease outcomes. Most existing higher-order sufficient dimension reduction methods for matrix- or array-valued data focus solely on the tensor data, often neglecting established clinical covariates that are readily available and known to have predictive value. Building on Folded-Minimum Average Variance Estimation (Folded-MAVE: Xue and Yin, 2014), the authors propose a new method, Partial Dimension Folded-MAVE (PF-MAVE), for regression mean functions with tensor-valued covariates that simultaneously incorporates clinical covariates, which are typically categorical. Theoretical results and simulation studies demonstrate the importance of incorporating these categorical clinical predictors. A survival analysis of data from a longitudinal study of primary biliary cirrhosis (PBC) illustrates the proposed method.
  • WANG Xiaofeng, LIU Xingwei, XU Wangli
    Journal of Systems Science and Complexity. 2026, 39(1): 180-202. https://doi.org/10.1007/s11424-026-5104-0
    The support vector machine, a widely used binary classification method, may leak sensitive information about its training data. To address this, the authors propose methods based on personalized differential privacy, an extension of differential privacy. Specifically, the authors introduce personalized differentially private support vector machines that meet different individuals’ privacy requirements, using a reweighting strategy and the Laplace mechanism. Theoretical analysis shows that the proposed methods satisfy personalized differential privacy while maintaining model prediction accuracy at the specified privacy levels. Extensive experiments show that the proposed methods outperform existing methods.
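The Laplace mechanism mentioned here releases a quantity after adding Laplace noise with scale equal to its L1 sensitivity divided by the privacy parameter epsilon. The sketch shows only that basic mechanism applied to a numeric vector such as trained SVM weights; the weight vector, the sensitivity value, and the function name are hypothetical, and the authors' reweighting strategy for honoring different individuals' privacy levels is not reproduced.

```python
import numpy as np


def laplace_mechanism(value, sensitivity, epsilon, rng):
    """Release value plus Laplace(0, sensitivity / epsilon) noise.

    With sensitivity measured in the L1 norm, this satisfies epsilon-differential privacy.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    scale = sensitivity / epsilon
    return value + rng.laplace(loc=0.0, scale=scale, size=np.shape(value))


rng = np.random.default_rng(0)
w = np.array([0.8, -1.2, 0.3])  # hypothetical trained SVM weight vector
w_strict = laplace_mechanism(w, sensitivity=0.1, epsilon=0.5, rng=rng)  # stronger privacy, more noise
w_loose = laplace_mechanism(w, sensitivity=0.1, epsilon=5.0, rng=rng)   # weaker privacy, less noise
```

A smaller epsilon gives a stronger privacy guarantee at the cost of a noisier released quantity, which is the trade-off a personalized scheme has to balance across individuals.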
  • YANG Lin, GAO Yuzhao, QU Lianqiang
    Journal of Systems Science and Complexity. 2026, 39(1): 203-229. https://doi.org/10.1007/s11424-026-5463-6
    The authors consider hypothesis testing in varying-coefficient regression models with high-dimensional data. Using kernel smoothing, the authors propose a locally concerned U-statistic to assess the overall significance of the coefficients, and establish that the resulting test is asymptotically normal under both the null hypothesis and local alternatives. Building on the locally concerned U-statistic, the authors further develop a globally concerned U-statistic to test whether the coefficient function is identically zero. A stochastic perturbation method is used to approximate the distribution of the globally concerned test statistic. Monte Carlo simulations demonstrate the validity of the proposed tests in finite samples.
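A hedged illustration of a kernel-weighted U-statistic of this general type: at an index value u0, each pair of distinct observations is weighted by kernel weights in the index variable U and combined through Y_i Y_j X_i'X_j, a cross-product form familiar from high-dimensional significance tests for linear models. The abstract does not give the exact statistic, its scaling, or its variance estimator, so this form, the Gaussian kernel, the bandwidth, the simulated data, and the function name local_u_statistic are assumptions.

```python
import numpy as np


def local_u_statistic(X, Y, U, u0, h):
    """Kernel-weighted U-statistic at index value u0.

    Computes sum over i != j of K_h(U_i - u0) K_h(U_j - u0) Y_i Y_j X_i'X_j,
    divided by n(n-1), using a Gaussian kernel K (an assumption).
    """
    n = len(Y)
    k = np.exp(-0.5 * ((U - u0) / h) ** 2) / (h * np.sqrt(2 * np.pi))
    a = k * Y
    G = X @ X.T                         # Gram matrix of inner products X_i'X_j
    total = a @ G @ a                   # sum over all pairs, including i == j
    diag = np.sum(a ** 2 * np.diag(G))  # the i == j terms to exclude
    return (total - diag) / (n * (n - 1))


rng = np.random.default_rng(0)
n, p = 400, 200                         # high-dimensional: p comparable to n
X = rng.normal(size=(n, p))
U = rng.random(n)                       # index variable of the varying coefficients
eps = rng.normal(size=n)
b = np.full(p, 0.3)
Y_null = eps                            # all coefficient functions are zero
Y_alt = (1.0 + U) * (X @ b) + eps       # coefficients beta(U) = (1 + U) * b
T_null = local_u_statistic(X, Y_null, U, u0=0.5, h=0.3)
T_alt = local_u_statistic(X, Y_alt, U, u0=0.5, h=0.3)
```

Excluding the i == j terms is what centers the statistic: under the null hypothesis with E[eps | X, U] = 0, every cross term has mean zero. The paper's normalization and the stochastic perturbation step for the global version are not shown.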
  • HUANG Xueyan, LI Yunchen, YING Chao, YU Zhou
    Journal of Systems Science and Complexity. 2026, 39(1): 230-254. https://doi.org/10.1007/s11424-026-5102-2
    In this paper, the authors propose a nonlinear dimension reduction technique based on Fréchet inverse regression to achieve sufficient dimension reduction for responses in metric spaces and predictors in Riemannian manifolds. The authors rigorously establish the statistical properties of the estimators, including formal proofs of their consistency and asymptotic behavior. The effectiveness of the proposed method is demonstrated through extensive simulations and applications to real-world datasets, which highlight its practical utility for complex data with non-Euclidean structures.
  • JIA Xinru, ZHU Xuehu, ZHANG Jun
    Journal of Systems Science and Complexity. 2026, 39(1): 255-283. https://doi.org/10.1007/s11424-026-5101-3
    Model checking is crucial in statistical analysis and has received significant attention in the literature. However, challenges persist in settings that involve large-scale datasets and limited computational resources. This research introduces a novel subsampling methodology for testing regression models with continuous and categorical predictors, referred to as the Subsampling Adaptive Projection-Test (SAPT). Compared with conventional uniform subsampling, the approach substantially improves test power against both local and global alternatives. The authors rigorously establish the asymptotic properties of SAPT and characterize its maximum achievable power asymptotically. Comprehensive simulations and real-world data applications support the theoretical results.
  • ZENG Jing, WANG Ning, ZHANG Xin
    Journal of Systems Science and Complexity. 2026, 39(1): 284-308. https://doi.org/10.1007/s11424-026-5097-8
    In this note, the authors revisit envelope dimension reduction, which was first introduced to estimate a sufficient dimension reduction subspace without inverting the sample covariance matrix. Motivated by recent developments in envelope methods and algorithms, the authors update envelope inverse regression as a flexible alternative to existing inverse regression methods for dimension reduction. The authors discuss the versatility of the envelope approach and demonstrate the advantages of envelope dimension reduction through simulation studies.
  • SONG Minghui, QU Tianyao, ZHAO Zhihao, ZOU Guohua
    Journal of Systems Science and Complexity. 2026, 39(1): 309-333. https://doi.org/10.1007/s11424-025-5054-y
    In the era of massive data, the analysis of distributed data is an important topic. Model averaging can be effectively applied to distributed data by combining information from all machines. Model averaging approaches have been developed for distributed data under linear models, but more complex models require further investigation. In this paper, the authors propose a distributed optimal model averaging approach for multivariate additive models, which approximates the unknown functions with B-splines and allows each machine to use a different smoothing degree. To exploit the covariance matrix of the dependent errors in multivariate multiple regression, the authors use the Mahalanobis distance to construct a Mallows-type weight choice criterion. The criterion can be computed in two steps by transmitting information between the local machines and the central machine. The authors prove the asymptotic optimality of the proposed model averaging estimator when the covariates are subject to uncertainty, and obtain the convergence rate of the weight vector to the theoretically optimal weights. These results are new even for additive models with a single response variable. Numerical examples show that the proposed method performs well.
  • CHEN Kangan, LIU Jian, HU Qingpei, XIE Min
    Journal of Systems Science and Complexity. 2026, 39(1): 334-362. https://doi.org/10.1007/s11424-026-4556-6
    While parametric Software Reliability Growth Models (SRGMs) are a cornerstone of software reliability assessment, their reliance on known fault-detection time distributions is often a significant limitation in practical software testing. In this study, the authors develop a novel shape-restricted spline estimator for quantifying software reliability. The proposed estimator shares a key characteristic with parametric SRGMs while removing the need to specify fault-detection time distributions. More importantly, it exploits the critical shape information of the mean value function (MVF) of the fault-detection process, which has seldom been considered in prior work. The authors also investigate the predictive performance of the proposed methods using one-step look-ahead prediction, and show that, under certain conditions, the shape-restricted spline estimator attains the pointwise convergence rate $O_P(n^{-3/7})$. In numerical experiments, the shape-restricted spline estimators perform competitively against parametric and certain nonparametric models.
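One concrete piece of shape information for an MVF is that it is nondecreasing and, for many reliability-growth patterns, concave, which is equivalent to a nonincreasing expected number of detections per test interval. As a simpler stand-in for the authors' shape-restricted spline estimator (which is not reproduced), the sketch fits a nonincreasing detection intensity by the pool-adjacent-violators algorithm and accumulates it into a nondecreasing, concave MVF estimate. The concavity assumption, the weekly counts, and the function names are hypothetical.

```python
import numpy as np


def pava_increasing(y):
    """Least-squares nondecreasing fit by pool-adjacent-violators (equal weights)."""
    blocks = []  # each block is [mean, size]
    for v in map(float, y):
        blocks.append([v, 1])
        # Merge with the previous block while monotonicity is violated.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, n2 = blocks.pop()
            m1, n1 = blocks.pop()
            blocks.append([(m1 * n1 + m2 * n2) / (n1 + n2), n1 + n2])
    return np.concatenate([np.full(size, mean) for mean, size in blocks])


def concave_mvf(counts):
    """Nondecreasing, concave MVF estimate from per-interval fault counts.

    Fits a nonincreasing detection intensity, then accumulates it over intervals.
    """
    intensity = -pava_increasing(-np.asarray(counts, dtype=float))
    return np.cumsum(intensity)


weekly_faults = [10, 12, 7, 8, 5, 6, 3, 2]  # hypothetical fault counts per test week
mvf_hat = concave_mvf(weekly_faults)
```

Pooling preserves the total number of detected faults, so the fitted MVF ends at the observed cumulative count while smoothing out local violations of the assumed shape.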
  • QIAO Xinhui, YE Peng, HE Hua, FENG Han, FANG Xiangzhong
    Journal of Systems Science and Complexity. 2026, 39(1): 363-384. https://doi.org/10.1007/s11424-026-4622-0
    Smartphone-based electrocardiograms (ECGs) are increasingly used to monitor atrial fibrillation (AF) recurrence after catheter ablation (CA); the resulting measure is referred to as smartphone AF burden (SMURDEN). SMURDEN data often exhibit complex patterns of zero AF episodes, which may arise either from a truly AF-free status (structural zeros) or from AF episodes missed by intermittent monitoring (random zeros). Such a mixture of AF-free and at-risk patients can lead to zero-inflation in the data. The authors propose a novel zero-inflation test for binomial regression models to identify recurrence-free AF populations. Unlike traditional approaches that require fully specified zero-inflated models, the proposed test uses a weighted average of the discrepancies between observed and expected zero proportions, with weights determined by the binomial sizes. A closed-form test statistic is developed, and its asymptotic distribution is derived using estimating equations. Simulations demonstrate superior performance over existing methods, and real-world AF monitoring data validate the practical utility of the proposed test.
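A hedged sketch of the kind of discrepancy described: for each subject with binomial size m_i and success probability p_i, compare the observed zero indicator with the model-implied zero probability (1 - p_i)^m_i, then average with weights that depend on m_i. Using w_i = m_i, treating p_i as known, and the naive standardization below are all assumptions; the authors' closed-form statistic accounts for parameter estimation through estimating equations, which this sketch does not. The simulated data and function name are hypothetical.

```python
import numpy as np


def zero_discrepancy(y, m, p_hat, weights=None):
    """Weighted mean of observed-minus-expected zeros under a binomial model.

    Returns the statistic and a naive z-score that treats p_hat as known.
    Default weights w_i = m_i (binomial sizes) are an assumption.
    """
    y, m, p_hat = (np.asarray(a) for a in (y, m, p_hat))
    w = m.astype(float) if weights is None else np.asarray(weights, dtype=float)
    p0 = (1.0 - p_hat) ** m                # model-implied P(Y_i = 0)
    d = (y == 0).astype(float) - p0        # observed zero indicator minus expectation
    stat = np.sum(w * d) / np.sum(w)
    se = np.sqrt(np.sum(w ** 2 * p0 * (1 - p0))) / np.sum(w)
    return stat, stat / se


rng = np.random.default_rng(0)
n = 500
m = rng.integers(5, 21, size=n)            # hypothetical binomial sizes
p = np.full(n, 0.2)                        # success probability, taken as known
y_binom = rng.binomial(m, p)               # no zero inflation
structural = rng.random(n) < 0.3           # 30% structural (AF-free) zeros
y_zi = np.where(structural, 0, rng.binomial(m, p))
_, z_binom = zero_discrepancy(y_binom, m, p)
_, z_zi = zero_discrepancy(y_zi, m, p)
```

Excess zeros push the weighted discrepancy, and hence the z-score, upward, while data generated from the binomial model itself give a z-score near zero.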
  • CHEN Dan, CHEN Ruijing, TANG Jiarui, LI Huimin
    Journal of Systems Science and Complexity. 2026, 39(1): 385-409. https://doi.org/10.1007/s11424-026-4566-4
    Quantile regression (QR) has become an important tool for measuring how the quantiles of a response depend on a number of predictors in heterogeneous data, especially heavy-tailed data and data with outliers. However, statistical inference for distributed high-dimensional QR with missing data is challenging because of the distributed storage, sparsity, and missingness of the data, and the non-differentiability of the quantile loss function. To address this challenge, this paper develops a communication-efficient method for variable selection and parameter estimation that approximates the non-differentiable quantile loss with a smooth function and incorporates inverse probability weighting and a penalty function. The proposed approach has three merits. First, it is efficient in both computation and communication, because only first- and second-order information of the approximate objective function is communicated at each iteration. Second, the proposed estimators possess the oracle property after a limited number of iterations, with no constraint on the number of machines. Third, the proposed method simultaneously selects variables and estimates parameters within a distributed framework, ensuring robustness to the specified response probability or propensity score function of the missing-data mechanism. Simulation studies and a real example illustrate the effectiveness of the proposed methodologies.
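The non-differentiable check loss rho_tau(u) = u(tau - 1{u < 0}) can be replaced by a smooth surrogate so that gradient and Hessian information exists. The sketch uses one possible smoothing, tau*u + h*log(1 + exp(-u/h)), which tends to the check loss as h goes to 0, together with inverse-probability weights observed_i / pi_i. The softplus smoothing, the known response probabilities, the single-machine gradient descent, and the function names are assumptions; the paper's particular smoothing function, penalty, and distributed communication scheme are not reproduced.

```python
import numpy as np


def smooth_check_loss(u, tau, h):
    """Softplus-smoothed check loss tau*u + h*log(1 + exp(-u/h))."""
    return tau * u + h * np.logaddexp(0.0, -u / h)


def ipw_smoothed_qr_grad(beta, X, y, observed, pi, tau, h):
    """Gradient in beta of mean_i (observed_i / pi_i) * smooth_check_loss(y_i - x_i'beta)."""
    w = observed / pi                                # zero weight for missing responses
    u = np.where(observed == 1, y, 0.0) - X @ beta   # missing y (NaN) never enters
    psi = tau - np.exp(-np.logaddexp(0.0, u / h))    # derivative: tau - 1/(1 + exp(u/h))
    return -(X.T @ (w * psi)) / len(u)


rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.standard_normal(n)           # true median line: 1 + 2x
pi = 1.0 / (1.0 + np.exp(-(0.5 + x)))                # assumed-known response probability
observed = (rng.random(n) < pi).astype(float)
y = np.where(observed == 1, y, np.nan)

beta = np.zeros(2)
for _ in range(4000):                                # plain gradient descent, tau = 0.5
    beta -= 0.3 * ipw_smoothed_qr_grad(beta, X, y, observed, pi, tau=0.5, h=0.1)
```

In a distributed version, each machine would compute this gradient (and a Hessian) on its own data and send only those summaries to a central machine.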
  • GONG Fuzhou, XIA Zigeng
    Journal of Systems Science and Complexity. 2026, 39(1): 410-431. https://doi.org/10.1007/s11424-026-4315-8
    Automatic synthesis of images or text has become an important research area in artificial intelligence. Generative adversarial networks (GANs), proposed by Goodfellow et al. in 2014, make this task more efficient by using deep neural networks (DNNs). The authors consider generating images from a single-sentence input text description using a GAN. Specifically, the authors analyze the GAN-CLS algorithm, an advanced GAN method proposed by Reed et al. in 2016. In this paper, the authors identify a theoretical problem with this algorithm and correct it by modifying the model's objective function. Experiments on the Oxford-102 and CUB datasets support the theoretical results. Since the modification is a general idea that can be applied to improve GAN models of this kind, the authors apply it to two models, GAN-CLS and AttnGANGPT. In both models, the modified algorithm is more stable and generates more plausible images than the original algorithm. Some of the generated images also match the input texts better, and the modified algorithm performs better on quantitative indicators, including FID and Inception Score. Finally, the authors discuss future applications of the modification idea, especially in the area of large language models.
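For reference, the matching-aware discriminator of the original GAN-CLS algorithm (Reed et al., 2016) scores three kinds of pairs: a real image with matching text (s_r), a real image with mismatching text (s_w), and a generated image with matching text (s_f). The discriminator maximizes log s_r + (log(1 - s_w) + log(1 - s_f))/2 and the generator maximizes log s_f. The sketch computes these original objectives as losses from given discriminator probabilities; it does not implement the authors' modified objective, which the abstract does not specify, and the function name is hypothetical.

```python
import numpy as np


def gan_cls_losses(s_r, s_w, s_f, eps=1e-12):
    """Original GAN-CLS objectives, written as losses to minimize (batch means).

    s_r: D(real image, matching text); s_w: D(real image, mismatching text);
    s_f: D(generated image, matching text). All are probabilities in (0, 1).
    """
    s_r, s_w, s_f = (np.clip(np.asarray(s, dtype=float), eps, 1 - eps) for s in (s_r, s_w, s_f))
    # Discriminator maximizes log s_r + (log(1 - s_w) + log(1 - s_f)) / 2
    d_loss = -np.mean(np.log(s_r) + 0.5 * (np.log(1 - s_w) + np.log(1 - s_f)))
    # Generator maximizes log s_f
    g_loss = -np.mean(np.log(s_f))
    return d_loss, g_loss


d_loss, g_loss = gan_cls_losses([0.9, 0.8], [0.1, 0.3], [0.2, 0.1])
```

An uninformative discriminator that outputs 0.5 for every pair gives a discriminator loss of 2 log 2; the mismatching-text term is what forces the discriminator to judge text-image agreement and not only image realism.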
  • YE Gen, ZHAO Puying, TANG Niansheng
    Journal of Systems Science and Complexity. 2026, 39(1): 432-455. https://doi.org/10.1007/s11424-026-4619-8
    This paper develops a unified Bayesian approach for analyzing clustered data when observations are missing at random. The authors consider a general framework in which the parameters of interest are defined through estimating equations and the probability of missingness follows a general parametric form. The generalized method of moments framework is used to derive an optimal combination of inverse-probability-weighted estimating equations for the parameters of interest and score equations for the propensity score. Using this framework, the authors develop a quasi-Bayesian analysis for clustered samples with missing values. A unified model selection approach is also proposed to compare models characterized by different moment conditions. The authors systematically evaluate the large-sample properties of the proposed quasi-posterior density under both fixed and shrinking priors and establish the selection consistency of the proposed model selection criterion. The results hold under very mild conditions and offer significant advantages for parameters defined through non-smooth estimating functions. Extensive numerical studies demonstrate that the proposed method performs exceptionally well in finite samples.
  • LIU Xirui, WU Mixia, LIU Bangshu
    Journal of Systems Science and Complexity. 2026, 39(1): 456-480. https://doi.org/10.1007/s11424-025-4539-z
    Distributed learning is a well-established approach to estimation on widely distributed datasets. However, non-randomly stored data can bias local parameter estimates, significantly degrading the performance of classical distributed algorithms. In this paper, the authors propose a novel Distributed Quasi-Newton Pilot (DQNP) method for distributed learning with non-randomly distributed data. The approach accommodates both randomly and non-randomly distributed data and does not require equal local sample sizes. It also avoids transferring the Hessian matrix or computing its inverse, greatly reducing computational and communication costs. The authors show theoretically that the resulting estimator achieves statistical efficiency under mild conditions. Extensive numerical experiments on synthetic and real-world data support the theoretical findings and illustrate the effectiveness of the proposed method.