
Sampling Survey in the Context of Big Data

JIN Yongjin1,2,3, LIU Xiaoyu2

  1. Center for Applied Statistics, Renmin University of China, Beijing 100872; 2. School of Statistics, Renmin University of China, Beijing 100872; 3. Institute of Survey Technology, Renmin University of China, Beijing 100872
  • Online: 2021-12-28 Published: 2021-12-28

JIN Yongjin, LIU Xiaoyu. Sampling Survey in the Context of Big Data[J]. Journal of Systems Science and Mathematical Sciences, 2022, 42(1): 2-16.
Big data is characterized by large volume, rich variety, and rapid growth, but it also suffers from low value density and poor representativeness, which brings both opportunities and challenges to sampling surveys. How should sampling adapt to these changes in the context of big data, and how is it developing and being applied? This paper discusses the question from three perspectives. First, new and highly adaptable sampling methods have emerged for the data stream environment; they obtain representative samples efficiently and accurately while taking storage space, processing time, and processing capacity into account. Second, surveys conducted over the internet and the collection of social network data have given rise to non-probability sampling methods that need no sampling frame and can obtain a large number of samples for analysis in a short time at low cost. Third, the advantages of big data and sampling surveys can be combined by integrating online and offline survey data. For the case in which the online sample is a non-probability sample and the offline sample is a probability sample, this paper puts forward a basic approach to integration: on the one hand, the probability sample is used to carry out a "probability test" on the non-probability sample; on the other hand, information is extracted from the probability sample and inference about the population is made on the basis of a model or of pseudo-randomization.
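As an illustration of the first perspective, the following is a minimal sketch of reservoir sampling, a well-known technique for drawing a uniform random sample from a data stream of unknown length using only a fixed amount of memory. The abstract does not name the specific data stream sampling methods the paper covers, so this is an assumed example and not necessarily one of the authors' methods; the function name `reservoir_sample` and its parameters are hypothetical.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Draw a uniform random sample of up to k items from a stream.

    Illustrative sketch only (classic reservoir sampling); not taken
    from the paper. Memory use is O(k) regardless of stream length.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i (0-based) is kept with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


if __name__ == "__main__":
    # The stream is consumed once and never held in memory as a whole.
    print(reservoir_sample(range(1_000_000), k=5, seed=42))
```

Each item is seen exactly once and the stream length need not be known in advance, which is the kind of trade-off between representativeness, storage, and processing time that the abstract describes.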