Blog 17: Anatomy of the “political/sanitary measures” sub-corpus (en)

February 18, 2022


Research Notebook

'Mixology' is an open research project that aims to extract opinions expressed in times of crisis, here from a corpus collected via the Twitter API between December 12 and 31, 2021.

This sub-corpus includes 153,558 observations published between December 12 and 21, 2021, by 80,039 unique users. Statistically, the results do not differ from those of the previous corpus: the proportion of negative terms (« not », « no », « never », « without ») is similar (1.3%), and the scores on the Coleman-Liau Index and the Automated Readability Index are equivalent. The readability of the text samples, however, is less clear-cut. The Flesch-Kincaid Grade Level score is very low (but let’s agree that a tweet is not literature). At the same time, the Flesch-Kincaid Reading Ease score is better than that of the « vaccination » sub-corpus.
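As a rough illustration of two of the measures mentioned above, here is a minimal Python sketch of a negative-term share and of the Automated Readability Index. It is not the project's own code: the tokenizer, the sentence splitter, the function names and the choice to compute the share over tokens (rather than over tweets) are all assumptions made for the example. Only the four negative terms come from the post, and the ARI formula is the standard published one.

```python
import re

# The four negative terms quoted in the post; the project's full list
# is on its GitHub page and may contain more entries.
NEGATIVE_TERMS = {"not", "no", "never", "without"}


def tokenize(text):
    """Lowercase word tokens (a deliberately simple, assumed tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def negative_term_share(texts):
    """Share of all tokens that are negative terms (assumption: token-level)."""
    tokens = [tok for text in texts for tok in tokenize(text)]
    if not tokens:
        return 0.0
    hits = sum(1 for tok in tokens if tok in NEGATIVE_TERMS)
    return hits / len(tokens)


def automated_readability_index(text):
    """ARI = 4.71 * (chars / words) + 0.5 * (words / sentences) - 21.43."""
    words = text.split()
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words or not sentences:
        return 0.0
    chars = sum(ch.isalnum() for ch in text)  # letters and digits only
    return 4.71 * (chars / len(words)) + 0.5 * (len(words) / len(sentences)) - 21.43


# Invented sample tweets, for illustration only.
sample_tweets = [
    "No lockdown, never again.",
    "We can not travel without a pass.",
]

print(f"negative-term share: {negative_term_share(sample_tweets):.1%}")
print(f"ARI: {automated_readability_index(' '.join(sample_tweets)):.2f}")
```

Short texts such as tweets give unstable readability scores, so it probably makes more sense to score a large sample as a whole than to score tweets one by one.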


The breakdown by country shows that the UK dominates this sub-corpus (85.49%), as was observed in the previous sub-corpus.
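A country breakdown of this kind comes down to counting observations per country and dividing by the total. The sketch below assumes each observation carries a country label, with empty labels skipped; the function name and the sample data are hypothetical and do not reproduce the project's figures.

```python
from collections import Counter


def country_shares(countries):
    """Percentage share of observations per country, largest first."""
    counts = Counter(c for c in countries if c)  # skip missing labels
    total = sum(counts.values())
    if total == 0:
        return []
    return [(country, 100 * n / total) for country, n in counts.most_common()]


# Invented labels, for illustration only.
sample_countries = ["UK", "UK", "UK", "NL", None]

for country, share in country_shares(sample_countries):
    print(f"{country}: {share:.2f}%")
```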


The analysis of n-gram frequencies shows that this corpus deals with the health measures and policies put in place to fight the pandemic, and that the question of vaccination is also significant. The themes observed in the « vaccination » sub-corpus recur in this second sub-corpus, which raises the question of whether separating the two sub-corpora is relevant, since the crisis has both health and political aspects. However, this sub-corpus places more emphasis on the role of information and media.


The analysis of the bigrams shows results influenced mainly by British politics and current events. At the time, the shadow of a new lockdown hung over the country. The top 30 includes three political figures (Boris Johnson; the Secretary of State for Health and Social Care, Sajid Javid; and the First Minister of Scotland, Nicola Sturgeon) and the journalist Piers Morgan. Excluding the UK, the bigrams show an evident influence from the Netherlands, where a strict lockdown came into effect on Monday, December 20, which included the closure of non-essential businesses. The theme of the vaccine passport (or Covid certificate, involving vaccination, a negative PCR test or proof of recovery depending on national policy) appears clearly with or without the UK. In contrast, the theme of the side effects of anti-Covid vaccines ranks fifth among concerns once the UK is excluded.


The examination of the trigrams adds a semantic layer that shows clear opposition to vaccine passport and lockdown policies. Another trend is criticism of vaccines on the grounds that they do not work or do not prevent transmission, along with a debate on side effects that pits « pro » against « anti » Covid vaccine positions. In addition, a sign of civil disobedience appears very clearly: « breaking lockdown rules ». Excluding tweets posted from the UK, the strongest trend concerns the rules imposed on tourists abroad, particularly in Thailand. It is followed by questions about the effectiveness of vaccines and their side effects, which cut across the two English-language sub-corpora and all the countries examined.
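The bigram and trigram counts discussed above, with and without the UK, could be reproduced along the following lines. This is only a sketch under stated assumptions: the stop word list is a tiny placeholder (the project's own list is on its GitHub page), observations are assumed to be (country, text) pairs, and stop words are removed before n-grams are formed, so words that were separated by a stop word become adjacent. All names are hypothetical.

```python
import re
from collections import Counter

# Placeholder stop words, NOT the project's v3 list.
STOP_WORDS = {"a", "an", "and", "the", "to", "of", "in", "is", "are", "for", "on"}


def tokenize(text):
    """Lowercase tokens with stop words removed (assumed preprocessing)."""
    return [t for t in re.findall(r"[a-z']+", text.lower()) if t not in STOP_WORDS]


def ngram_counts(records, n, exclude_country=None):
    """Count n-grams over (country, text) records, optionally skipping a country."""
    counts = Counter()
    for country, text in records:
        if exclude_country is not None and country == exclude_country:
            continue
        tokens = tokenize(text)
        # zip over n shifted copies yields tuples of n consecutive tokens;
        # n-grams never span two different tweets.
        counts.update(zip(*(tokens[i:] for i in range(n))))
    return counts


# Invented records, for illustration only.
sample_records = [
    ("UK", "Breaking lockdown rules again"),
    ("UK", "Stop breaking lockdown rules"),
    ("NL", "Strict lockdown rules in the Netherlands"),
]

print(ngram_counts(sample_records, 3).most_common(3))
print(ngram_counts(sample_records, 2, exclude_country="UK").most_common(3))
```

Running the same count twice, once on all records and once with `exclude_country="UK"`, gives the two rankings that the comparison in the text relies on.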


Version 3 of the stop word list, the list of negative terms, and an anonymized sample of this sub-corpus (4,371 lines) are available on the project’s GitHub page.