Blog 4: Refining the queries

Several test runs were needed to calibrate the queries, which seem to perform poorly when a hashtag (#) is used. Monitoring the trends on Twitter also led to the addition of the keywords ARN (the French abbreviation for RNA) and mRNA, since the corpus analysis will be carried out in both French and English.

	library(rtweet)

	# Vaccination keywords (French and English), within 850 miles of the given point
	corpus_vac <- search_tweets(
	  q = "vaccination OR vaccine OR ARN OR mRNA OR booster",
	  retryonratelimit = TRUE, geocode = "52.897449,4.753000,850mi", type = "mixed", include_rts = FALSE
	)

	# Health pass keywords, same geographic area
	corpus_pass2 <- search_tweets(
	  q = "covid OR sanitary AND pass OR \"safe ticket\"",
	  retryonratelimit = TRUE, geocode = "52.897449,4.753000,850mi", type = "mixed", include_rts = FALSE
	)

	# Protest keywords, same geographic area
	corpus_protest <- search_tweets(
	  q = "anti-pass OR coronaprotest OR covid AND manifestation OR sanitary AND pass",
	  retryonratelimit = TRUE, geocode = "52.897449,4.753000,850mi", type = "mixed", include_rts = FALSE
	)

	# General corpus: broader keyword list, no geographic restriction
	corpus_gen <- search_tweets(
	  q = "vaccination OR vaccine OR anti-pass OR coronaprotest OR mRNA OR ARN OR Pfizer OR Moderna",
	  retryonratelimit = TRUE, type = "mixed", include_rts = FALSE
	)

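Since the keyword lists change from one test run to the next, it may be easier to keep them as vectors and assemble the query string from them. The following is a minimal sketch in base R, not the method used for the queries above: `build_or_query()` is a hypothetical helper, and the 500-character check reflects the limit stated in the rtweet documentation for the `q` argument.

```r
# Hypothetical helper: build an OR query from a vector of keywords.
build_or_query <- function(keywords) {
  # Drop repeated keywords so that no clause appears twice.
  keywords <- unique(keywords)
  # Wrap multi-word keywords in double quotes so they are searched as phrases.
  has_space <- grepl(" ", keywords, fixed = TRUE)
  keywords[has_space] <- paste0('"', keywords[has_space], '"')
  query <- paste(keywords, collapse = " OR ")
  # Assumption: the rtweet documentation gives 500 characters as the maximum for q.
  stopifnot(nchar(query) <= 500)
  query
}

vac_terms <- c("vaccination", "vaccine", "ARN", "mRNA", "booster")
q_vac <- build_or_query(vac_terms)
q_pass <- build_or_query(c("covid", "safe ticket"))
```

The resulting strings can then be passed as `q` to `search_tweets()`, which keeps each keyword list easy to review and to modify between runs.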

Each retrieved dataset is first cleaned with OpenRefine: columns sometimes have to be merged, because the “text” column is occasionally split across several columns (the data is saved with a comma as separator). The three corpora restricted to a defined geographical area present fewer quality problems than the general corpus, which has no geographical restriction: big data does not necessarily mean good data.
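The column merge can also be sketched in R, for readers who prefer to stay in one environment. This is only an illustration under stated assumptions, not the OpenRefine procedure used here: the data frame, the column names `text_2` and `text_3`, and the helper `merge_text()` are all hypothetical.

```r
# Hypothetical data: the second tweet contained commas, so its text was
# split across three columns when the file was read back in.
tweets <- data.frame(
  status_id = c("1", "2"),
  text      = c("Got my booster today", "Vaccine pass"),
  text_2    = c(NA, " safe ticket"),
  text_3    = c(NA, " what next?"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: paste the fragments of each row back together,
# restoring the comma separator and skipping empty (NA) fragments.
merge_text <- function(df, cols, sep = ",") {
  merged <- apply(df[, cols, drop = FALSE], 1, function(parts) {
    paste(parts[!is.na(parts)], collapse = sep)
  })
  unname(merged)
}

text_cols <- c("text", "text_2", "text_3")
tweets$text <- merge_text(tweets, text_cols)
# Remove the leftover fragment columns.
tweets[setdiff(text_cols, "text")] <- NULL
```

After this step the data frame has a single “text” column again, with the commas of the original tweet restored.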

Academic readings:

Deng, S., Sinha, A. P., & Zhao, H. (2017). Adapting sentiment lexicons to domain-specific social media texts. Decision Support Systems, 94, 65-76.
Mowlaei, M. E., Abadeh, M. S., & Keshavarz, H. (2020). Aspect-based sentiment analysis using adaptive aspect-based lexicons. Expert Systems with Applications, 148, 113234.
