Blog 8: Linguistic and quantitative processing of the ‘vaccination’ corpus (English, part.2)

January 6, 2022


Research Notebook

'Mixology' is an open research project that aims to extract opinions in times of crisis, here from a corpus collected via the Twitter API between December 12 and 31, 2021.

The analysis of bigrams and trigrams (with the R tidytext package) is a reminder that the vaccination campaign is as much about politics as about health. In the sub-corpus relating to the United Kingdom, the names of Prime Minister Boris Johnson and Health Secretary Sajid Javid rank among the most frequent occurrences. They are notably linked to the vote on a vaccination passport. However, health is also at the center of Twitter users’ concerns, particularly the vaccines’ side effects. Tweets are thus shaped both by the news and by the author's own well-being (“sore arm”, “serious illness”, “freezing sick scared”, “feel better soon”). These findings hold across the entire “vaccination” corpus.
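To make the counting step concrete, here is a minimal sketch of bigram and trigram counting. The analysis itself was done with the R tidytext package; this is only a Python standard-library analogue of the same idea, not the project's code. The tokenizer, the function names `ngrams` and `count_ngrams`, and the two sample tweets are assumptions made up for illustration.

```python
import re
from collections import Counter

# Simple tokenizer (an assumption): lowercase runs of letters, digits, apostrophes.
TOKEN_RE = re.compile(r"[a-z0-9']+")

def ngrams(text, n):
    """Return the n-word sequences found in one tweet, lowercased."""
    tokens = TOKEN_RE.findall(text.lower())
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def count_ngrams(tweets, n):
    """Count n-grams over many tweets; an n-gram never spans two tweets."""
    counts = Counter()
    for tweet in tweets:
        counts.update(ngrams(tweet, n))
    return counts

# Invented sample tweets, for illustration only (not taken from the corpus).
sample = [
    "Got my booster jab today, sore arm already",
    "Sore arm after the booster jab but feel better soon",
]

bigrams = count_ngrams(sample, 2)
trigrams = count_ngrams(sample, 3)
print(bigrams.most_common(3))
```

In a real pipeline one would probably also remove stop words before counting, so that frequent but uninformative pairs do not dominate the ranking.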



In the 58,425 observations relating to all countries other than the United Kingdom (Luxembourg, France, the Netherlands, Belgium, Ireland, Germany, Switzerland, i.e. 20% of this first corpus), the topic of mandatory vaccination is also prominent. The first politician to be cited in the six EU countries is Ursula von der Leyen, President of the European Commission. Twitter users questioned her relationship with Pfizer/BioNTech. Note that one news outlet appears at the top of the trigrams of the European sub-corpus: the New York Times.
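Splitting a corpus into a United Kingdom part and a remainder, then measuring the remainder's share, can be sketched as follows. This is an illustrative Python sketch under stated assumptions: the `country` field name, the helper names `split_by_country` and `share`, and all five toy records are hypothetical and do not come from the corpus.

```python
# Toy records for illustration only: the `country` field name and every row
# below are assumptions, not data from the corpus.
records = [
    {"country": "United Kingdom", "text": "booster jab booked"},
    {"country": "United Kingdom", "text": "sore arm today"},
    {"country": "United Kingdom", "text": "vaccine passport vote"},
    {"country": "France", "text": "mandatory vaccination debate"},
    {"country": "United Kingdom", "text": "feel better soon"},
]

def split_by_country(rows, country):
    """Return (rows from `country`, rows from everywhere else)."""
    inside = [r for r in rows if r["country"] == country]
    outside = [r for r in rows if r["country"] != country]
    return inside, outside

def share(part, whole):
    """Fraction of `whole` represented by `part` (0.0 for an empty corpus)."""
    return len(part) / len(whole) if whole else 0.0

uk, rest = split_by_country(records, "United Kingdom")
rest_share = share(rest, records)
print(f"non-UK share: {rest_share:.0%}")
```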



In this second level of analysis, the debate seems less polarized between pro- and anti-vaccine positions: the bigrams “unvaccinated people” and “antivax people” occur 667 and 252 times respectively, while “booster jab” occurs 7,776 times. The trigrams confirm this observation but show some animosity towards those vaccinated, who are associated with the terms denial, extremism, propaganda, activism, and idiots (to be continued).
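The gap between these three counts is easier to judge as ratios. The short Python sketch below uses only the figures reported above; the dictionary name `reported` and the helper `ratio` are illustrative names, not part of the project's code.

```python
# Bigram counts as reported in the text above.
reported = {
    "unvaccinated people": 667,
    "antivax people": 252,
    "booster jab": 7776,
}

def ratio(counts, numerator, denominator):
    """How many times more frequent `numerator` is than `denominator`."""
    return counts[numerator] / counts[denominator]

booster_vs_unvaccinated = ratio(reported, "booster jab", "unvaccinated people")
booster_vs_antivax = ratio(reported, "booster jab", "antivax people")

# Compare "booster jab" with the two polarizing bigrams taken together.
combined = reported["unvaccinated people"] + reported["antivax people"]
booster_vs_combined = reported["booster jab"] / combined

print(round(booster_vs_unvaccinated, 1),
      round(booster_vs_antivax, 1),
      round(booster_vs_combined, 1))
```

On these figures, “booster jab” is more than eight times as frequent as the two polarizing bigrams combined, which is consistent with the reading that practical concerns outweigh the pro/anti divide in this sub-corpus.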



# # #

Read more

Blog 21: Politicians, experts, and journalists

Blog 20: For vaccination, against restrictions

Blog 19: Comparative Sentiment Analysis

Blog 18: A health and political crisis

Blog 17: Anatomy of the “political/sanitary measures” sub-corpus (en)

Blog 16: Sentiment analysis of the ‘vaccination’ sub-corpus (en, part.2)

Blog 15: Comparative sentiment analysis of the ‘vaccination’ sub-corpus (en, part.1)

Blog 14: An adapted dictionary for the Covid crisis and sentiment analysis

Blog 13: Building a stop words list

Blog 12: Main Dictionaries for Sentiment Analysis

Blog 11: Statistical description of the corpus #RStats

Blog 10: Sentiment analysis or the assessment of subjectivity

Blog 9: Topic modeling of the ‘vaccination’ corpus (English)

Blog 8: Linguistic and quantitative processing of the ‘vaccination’ corpus (English, part.2)

Blog 7: Linguistic and quantitative processing of the ‘vaccination’ corpus (English, part.1)

Blog 6: Collecting the corpus and preparing the lexical analysis

Blog 5: The textclean package

Blog 4: Refining the queries

Blog 3: The rtweet package

Blog 2: Collecting the corpus

Blog 1: An open research project

The challenges of research on media use in times of crisis