Analysis of the Retweet Mechanism of Social Media: Based on Topic Filtering and Causal Inference

HUANG Xiaohui, YAN Zhihua, TANG Xijin

Journal of Systems Science and Mathematical Sciences, 2024, Vol. 44, Issue 6: 1534-1549. DOI: 10.12341/jssmsKSS23868


Abstract

Identifying the primary factors that influence information diffusion on social media platforms is important for containing the spread of harmful information. Previous research has mainly used regression analysis to identify variables that significantly affect retweets, but these approaches offer limited interpretability. Using statistical modeling and causal inference, this study analyzes the variables that affect retweets in terms of user features and text features. A dose-response function is then estimated to characterize the causal effect of text sentiment on retweets. In addition, because observed social media datasets may suffer from collection bias, this study uses topic clustering to filter the data. In an experimental analysis of Twitter data related to the vaccine discussion and the presidential election, we identify the variables that affect retweets and investigate the causal effect of text sentiment on retweets.

Key words

Causal inference / topic filtering / Poisson regression / information diffusion

Cite this article

HUANG Xiaohui, YAN Zhihua, TANG Xijin. Analysis of the Retweet Mechanism of Social Media: Based on Topic Filtering and Causal Inference. Journal of System Science and Mathematical Science Chinese Series, 2024, 44(6): 1534-1549. https://doi.org/10.12341/jssmsKSS23868
