Blog 2: Collecting the corpus

December 11, 2021


Research Notebook

'Mixology' is an open research project that aims to extract opinions expressed in times of crisis, here from a corpus collected via the Twitter API between December 12 and 31, 2021.

The data is gathered from Twitter with the rtweet package (#Rstats). The first requests used the following code:

The geographical perimeter is defined by a radius of 1,367 kilometers, with Juliandorp in the Netherlands at the center and reaching as far as Rozan in Poland. Despite initial obstacles in completing the requests (retrieval stalling at 10%, 1%, and 5% of the content), it was possible to identify 90 variables.
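To make the idea of a radius-based perimeter concrete, here is a minimal sketch in Python (not the rtweet code used for this project). It formats a "latitude,longitude,radius" string, the form the Twitter search API documents for its geocode parameter, and checks whether a point falls inside the circle using the haversine formula. The center coordinates, the radius value, and all function names are assumptions for illustration only; they are not the project's actual settings.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, in kilometers


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def geocode_string(lat, lon, radius_km):
    """Build a 'latitude,longitude,radius' string with the radius in km."""
    return f"{lat},{lon},{radius_km}km"


def within_radius(center, point, radius_km):
    """True if `point` (lat, lon) lies within `radius_km` of `center` (lat, lon)."""
    return haversine_km(center[0], center[1], point[0], point[1]) <= radius_km


# Hypothetical values: a placeholder center and an assumed radius in km.
CENTER = (52.0, 5.0)
RADIUS_KM = 1367

if __name__ == "__main__":
    print(geocode_string(CENTER[0], CENTER[1], RADIUS_KM))
    print(within_radius(CENTER, (52.0, 21.0), RADIUS_KM))   # a point further east
    print(within_radius(CENTER, (40.0, -74.0), RADIUS_KM))  # a point across the Atlantic
```

A circle is a blunt instrument: it is defined by straight-line distance from the center, so it covers every country inside the radius, not only those along the line between two towns.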


Not all of these variables are used in building the corpus, partly for ethical reasons:

  1. Users are not informed of this research.
  2. Users cannot exercise their right to withdraw.
  3. Respect for the privacy of users is fundamental.
  4. This research is less interested in who is speaking (even if that is also scientifically interesting) than in what is said.

The variables retained are therefore: text (the content of the tweet), lang (language), location, and country (the location of the user, which is not always filled in), while user_id serves only to count the number of distinct users.
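The selection described above can be sketched as follows, in Python rather than R and with made-up records. This is an illustration under stated assumptions, not the project's code. The field names come from the text, while `build_corpus`, `KEPT_FIELDS`, and the sample data are hypothetical. The point is that user_id is read once to count distinct users and is then left out of the corpus rows.

```python
# Fields kept in the corpus, as listed in the text above.
KEPT_FIELDS = ("text", "lang", "location", "country")


def build_corpus(tweets):
    """Return (rows, n_users).

    rows    : one dict per tweet, restricted to KEPT_FIELDS
              (missing fields become None).
    n_users : number of distinct, non-missing user_id values.
    """
    user_ids = {t["user_id"] for t in tweets if t.get("user_id") is not None}
    rows = [{field: t.get(field) for field in KEPT_FIELDS} for t in tweets]
    return rows, len(user_ids)


# Made-up records, for illustration only.
SAMPLE = [
    {"user_id": "u1", "screen_name": "a", "text": "first", "lang": "en",
     "location": "Utrecht", "country": "Netherlands"},
    {"user_id": "u1", "screen_name": "a", "text": "second", "lang": "en",
     "location": "Utrecht", "country": None},
    {"user_id": "u2", "screen_name": "b", "text": "third", "lang": "pl",
     "location": None, "country": "Poland"},
]

if __name__ == "__main__":
    rows, n_users = build_corpus(SAMPLE)
    print(n_users)
    print(rows[0])
```

Dropping the identifying columns at this stage, rather than later in the analysis, means that no downstream file carries them.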
