MIR (Modernization. Innovation. Research)

Methodology for extracting narratives from social media big data

https://doi.org/10.18184/2079-4665.2024.15.3.404-420

Abstract

Purpose: the article presents the experience of developing and testing a methodology for extracting a system of narratives about a socially significant phenomenon from authentic social network big data, using narratives about COVID-19 vaccination in the Russian social network VKontakte during the pandemic as an example.

Methods: automated data analysis was performed with the tools of the PolyAnalyst analytical platform: topic modeling (PLSA method), text indexing algorithms with a sentence identification stage, clustering, data aggregation, data normalization, and calculation of a quantitative index. A keyword proximity measure was calculated in Python, and partial manual markup and data validation were also carried out.
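For readers unfamiliar with the PLSA method named above, the following is a minimal sketch of PLSA fitted with the EM algorithm on a small document-term count matrix. It is not the PolyAnalyst implementation used in the study; the function name plsa, its parameters, and the toy counts are assumptions made purely for illustration.

```python
import random


def plsa(counts, n_topics, n_iter=50, seed=0):
    """Toy PLSA fitted with EM (illustrative sketch, not PolyAnalyst's code).

    counts   -- document-term count matrix as a list of equal-length lists
    returns  -- (P(topic | doc), P(word | topic)) as lists of lists
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])

    def normalize(row):
        total = sum(row)
        return [v / total for v in row]

    # Random initialisation of both conditional distributions.
    p_z_d = [normalize([rng.random() + 1e-3 for _ in range(n_topics)])
             for _ in range(n_docs)]
    p_w_z = [normalize([rng.random() + 1e-3 for _ in range(n_words)])
             for _ in range(n_topics)]

    for _ in range(n_iter):
        new_z_d = [[0.0] * n_topics for _ in range(n_docs)]
        new_w_z = [[0.0] * n_words for _ in range(n_topics)]
        for d in range(n_docs):
            for w in range(n_words):
                n_dw = counts[d][w]
                if n_dw == 0:
                    continue
                # E-step: P(z | d, w) is proportional to P(z | d) * P(w | z).
                joint = [p_z_d[d][z] * p_w_z[z][w] for z in range(n_topics)]
                norm = sum(joint) or 1.0
                for z in range(n_topics):
                    resp = n_dw * joint[z] / norm
                    new_z_d[d][z] += resp
                    new_w_z[z][w] += resp
        # M-step: renormalise expected counts (tiny constant avoids 0/0).
        p_z_d = [normalize([v + 1e-12 for v in row]) for row in new_z_d]
        p_w_z = [normalize([v + 1e-12 for v in row]) for row in new_w_z]
    return p_z_d, p_w_z


# Hypothetical toy matrix: 3 documents over a 4-word vocabulary.
toy_counts = [[2, 1, 0, 0], [0, 0, 3, 1], [1, 2, 0, 0]]
doc_topics, topic_words = plsa(toy_counts, n_topics=2, n_iter=20)
```

Each row of doc_topics is a distribution over topics for one document, and each row of topic_words is a distribution over the vocabulary for one topic. In a pipeline like the one described, the highest-probability words per topic would be the natural input to later keyword comparison, although the abstract does not state how the stages were connected.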

Results: 4.5 million messages relevant to COVID-19 vaccination, published in VKontakte from 1 January 2020 to 1 March 2023, were reduced to 237 stable narratives. A popularity index was calculated for each narrative. The most popular narrative was "Employers put pressure on people to get vaccinated", which was supported by 76,118 texts. The study produced a dataset comprising these 237 narratives.
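The abstract does not give the formula for the popularity index. As a hedged sketch only, assuming the index is based on the number of texts supporting each narrative and is scaled relative to the most supported narrative, the calculation might look as follows. The function name popularity_index and the placeholder labels are hypothetical and do not come from the study's dataset.

```python
from collections import Counter


def popularity_index(assignments):
    """Return {narrative: (supporting_text_count, count / max_count)}.

    assignments -- one narrative label per text (hypothetical input format).
    The max-scaling is an assumption; the paper's actual index may differ.
    """
    counts = Counter(assignments)
    if not counts:
        return {}
    top = max(counts.values())
    return {name: (n, n / top) for name, n in counts.items()}


# Placeholder labels, not real narratives from the study.
labels = ["narrative_a", "narrative_a", "narrative_b"]
index = popularity_index(labels)
```

Under this assumption the most supported narrative always scores 1.0, and every other narrative is expressed as a fraction of it.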

Conclusions and Relevance: the developed toolkit is universal: the methodology can be adapted to any relevant topic and requires only adjustments to the input parameters of topic modeling. The authors plan to introduce the resulting dataset into scientific circulation as up-to-date material for studying public opinion on vaccination in Russia. The results contribute to international research on public opinion and crisis communication and can serve as a basis for practical actions aimed at improving the quality of public communications and decision-making at all levels of government.

About the Authors

E. Yu. Petrov
National Research Tomsk State University
Russian Federation

Evgeny Yu. Petrov, Technician of the Supercomputer Center

Scopus ID: 57224334888 

Tomsk



A. Yu. Sarkisova
Lomonosov Moscow State University
Russian Federation

Anna Yu. Sarkisova, Candidate of Philological Sciences, Associate Professor, Research Associate of the School of Public Administration

Researcher ID: ABF-4692-2020, Scopus ID: 58125063500

Moscow



D. O. Dunaeva
Lomonosov Moscow State University
Russian Federation

Daria O. Dunaeva, Research Associate of the School of Public Administration

Researcher ID: ADT-1114-2022, Scopus ID: 57328403000 

Moscow



A. S. Voronov
Lomonosov Moscow State University
Russian Federation

Aleksandr S. Voronov, Doctor of Economic Sciences, Associate Professor, Professor of the School of Public Administration

Moscow



M. G. Myagkov
Lomonosov Moscow State University
Russian Federation

Mikhail G. Myagkov, PhD, Leading Researcher of the School of Public Administration

Researcher ID: G-6049-2017, Scopus ID: 6602445231

Moscow




For citations:


Petrov E.Yu., Sarkisova A.Yu., Dunaeva D.O., Voronov A.S., Myagkov M.G. Methodology for extracting narratives from social media big data. MIR (Modernization. Innovation. Research). 2024;15(3):404-420. (In Russ.) https://doi.org/10.18184/2079-4665.2024.15.3.404-420


Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 2079-4665 (Print)
ISSN 2411-796X (Online)