Several test runs were needed to calibrate the queries, which seemed to perform poorly when a hashtag (#) was included. Monitoring the topics trending on Twitter also led to the addition of the keywords ARN (the French acronym for RNA) and mRNA, since the corpus analysis will be carried out in both French and English.
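Because the calibration involves repeatedly adjusting keyword lists, it can help to build the OR queries from keyword vectors instead of typing them by hand. The sketch below is a minimal illustration. It uses a hypothetical helper, `build_query()`, which is not part of rtweet. Following the observation about hashtags, the helper drops a leading # from each term and wraps multi-word terms in double quotes so they are searched as exact phrases. It only produces plain OR lists, not the AND combinations used in some of the queries.

```r
# Hypothetical helper (not an rtweet function): joins keywords into an OR query
build_query <- function(keywords) {
  # Drop a leading "#" so each term is sent as a plain keyword
  terms <- sub("^#", "", keywords)
  # Wrap multi-word terms in double quotes to search the exact phrase
  terms <- ifelse(grepl(" ", terms), paste0("\"", terms, "\""), terms)
  # Remove duplicates and join with the OR operator
  paste(unique(terms), collapse = " OR ")
}

vac_terms <- c("#vaccination", "vaccine", "ARN", "mRNA", "booster")
q_vac <- build_query(vac_terms)
print(q_vac)
# [1] "vaccination OR vaccine OR ARN OR mRNA OR booster"
```

The resulting string can be passed as the `q` argument of `search_tweets()`.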
```r
library(rtweet)

# Search area shared by the three geolocated corpora: 850-mile radius around 52.897449 N, 4.753000 E
area <- "52.897449,4.753000,850mi"

# Vaccination (French and English keywords)
corpus_vac <- search_tweets(
  q = "vaccination OR vaccine OR ARN OR mRNA OR booster",
  retryonratelimit = TRUE, geocode = area, type = "mixed", include_rts = FALSE
)
# Health pass
corpus_pass2 <- search_tweets(
  q = "covid OR sanitary AND pass OR \"safe ticket\"",
  retryonratelimit = TRUE, geocode = area, type = "mixed", include_rts = FALSE
)
# Protests against the health pass
corpus_protest <- search_tweets(
  q = "anti-pass OR coronaprotest OR covid AND manifestation OR sanitary AND pass",
  retryonratelimit = TRUE, geocode = area, type = "mixed", include_rts = FALSE
)
# General corpus: no geographical restriction
corpus_gen <- search_tweets(
  q = "vaccination OR vaccine OR anti-pass OR coronaprotest OR mRNA OR ARN OR Pfizer OR Moderna",
  retryonratelimit = TRUE, type = "mixed", include_rts = FALSE
)
```
Each retrieved dataset is then cleaned in OpenRefine. Columns sometimes have to be merged, because the "text" column is occasionally split across several columns when the data is saved with a comma separator: commas inside a tweet are read as field delimiters. The three corpora restricted to a geographical area have fewer quality problems than the general corpus, which has no geographical restriction. Big data does not necessarily mean good data.
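This kind of column splitting typically occurs when tweet text containing commas is written to CSV without quoting. The sketch below is a minimal illustration that uses toy data rather than the actual corpora. It shows that a base R round trip with `write.csv()` keeps the "text" column whole, because `write.csv()` quotes character fields by default. The sketch assumes a plain data frame. Real `search_tweets()` output may also contain list columns, which would have to be flattened before writing to CSV.

```r
# Toy data frame standing in for a search_tweets() result (hypothetical values)
tweets <- data.frame(
  status_id = c("1", "2"),
  text = c("Booster, ARN, mRNA: what changes?", "Pass sanitaire, manifestation samedi"),
  stringsAsFactors = FALSE
)

path <- tempfile(fileext = ".csv")
# write.csv() quotes character fields by default (quote = TRUE),
# so commas inside a tweet are not read as column separators
write.csv(tweets, path, row.names = FALSE)

back <- read.csv(path, colClasses = "character")
print(ncol(back))                          # 2: "text" was not split
print(identical(back$text, tweets$text))   # TRUE
```

Alternatively, `saveRDS()` and `readRDS()` store the R object as-is, list columns included, which avoids the CSV delimiter problem entirely.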