
Distance-Based Regression Analysis for Measuring Associations

SHI Yuke1,2, ZHANG Wei1, LIU Aiyi3, LI Qizhai1,2   

  1. LSC, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China;
    2. University of Chinese Academy of Sciences, Beijing 100049, China;
    3. Biostatistics and Bioinformatics Branch, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, MD 20847, USA
  • Received: 2022-01-24 Revised: 2022-02-13 Online: 2023-01-25 Published: 2023-02-09
  • Supported by:
    This work was partially supported by Beijing Natural Science Foundation under Grant No. Z180006.

SHI Yuke, ZHANG Wei, LIU Aiyi, LI Qizhai. Distance-Based Regression Analysis for Measuring Associations[J]. Journal of Systems Science and Complexity, 2023, 36(1): 393-411.

The distance-based regression model, a nonparametric multivariate method, is widely used in genetic association studies, genomic analyses, and many other research areas to detect associations between the variation in a distance or dissimilarity matrix of outcomes and predictor variables of interest. For this purpose, a pseudo-$F$ statistic, which partitions the variation in the distance matrix, is usually constructed. To the best of our knowledge, the statistical properties of the pseudo-$F$ statistic have not yet been well established in the literature. To fill this gap, the authors study the asymptotic null distribution of the pseudo-$F$ statistic and show that it is asymptotically equivalent to a mixture of chi-squared random variables. Because the pseudo-$F$ test has unsatisfactory power when the response variables are highly correlated, the authors propose a square-root $F$-type test statistic, which replaces the similarity matrix with its square root. The asymptotic null distribution of the new test statistic and the power of both tests are also investigated. Simulation studies validate the asymptotic distributions of the two tests and show that the proposed test has more robust power than the pseudo-$F$ test. Both test statistics are illustrated with a gene expression dataset for a prostate cancer pathway.
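To make the two statistics concrete, here is a minimal NumPy sketch under stated assumptions. It uses the common textbook form of the pseudo-$F$ ratio (Gower-centered similarity matrix, hat matrix of the predictors with an intercept), which may differ in normalization from the form analyzed in the paper. The square-root variant is simply the same ratio computed on a positive semidefinite square root of the similarity matrix, and negative eigenvalues are clipped to zero, which is our own choice for non-Euclidean distances. All function names are hypothetical. The p-value comes from permutation and is not the asymptotic chi-squared mixture studied by the authors.

```python
import numpy as np

def gower_center(D):
    """Gower-centered similarity matrix G = C (-D^2 / 2) C, with C = I - 11'/n."""
    n = D.shape[0]
    C = np.eye(n) - np.ones((n, n)) / n
    return C @ (-0.5 * D ** 2) @ C

def hat_matrix(X):
    """Projection onto the span of an intercept plus the columns of X (n x m)."""
    Z = np.column_stack([np.ones(X.shape[0]), X])
    return Z @ np.linalg.pinv(Z)

def pseudo_f(G, X):
    """Pseudo-F ratio: explained over residual variation in G, per degree of freedom."""
    n, m = X.shape
    H = hat_matrix(X)
    R = np.eye(n) - H
    explained = np.trace(H @ G @ H) / m
    residual = np.trace(R @ G @ R) / (n - m - 1)
    return explained / residual

def psd_sqrt(G):
    """Symmetric square root of G; negative eigenvalues are clipped (assumption)."""
    w, V = np.linalg.eigh((G + G.T) / 2)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T

def permutation_pvalue(G, X, n_perm=999, seed=0):
    """Permutation p-value obtained by shuffling the rows of X against a fixed G."""
    rng = np.random.default_rng(seed)
    observed = pseudo_f(G, X)
    hits = 0
    for _ in range(n_perm):
        perm = rng.permutation(X.shape[0])
        if pseudo_f(G, X[perm]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(40, 1))                    # one predictor (simulated)
    Y = 0.8 * X + rng.normal(size=(40, 5))          # five correlated responses
    D = np.sqrt(((Y[:, None, :] - Y[None, :, :]) ** 2).sum(-1))  # Euclidean distances
    G = gower_center(D)
    print("pseudo-F p-value:    ", permutation_pvalue(G, X))
    print("square-root F p-value:", permutation_pvalue(psd_sqrt(G), X))
```

With a single response and a single predictor under Euclidean distance, this pseudo-$F$ ratio reduces to the ordinary regression $F$ statistic, which is a convenient sanity check on the implementation.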