As already noted (see Blog 10, Blog 12 and Blog 13), the results of a lexicon-based sentiment analysis can vary considerably depending on the lexicon used. This research confirmed two observations. First, the better a dictionary fits the domain, the more precise the results. Second, the number of terms in the lexicon matters just as much, because language is not limited to a handful of words (that is also its richness). In the comparative sentiment analysis of the entire "sanitary measures" sub-corpus (in English), the observations do not support the claim that one type of sentiment clearly prevails over another, whichever lexicon is used. Looking at the average scores obtained by sentiment type, negative sentiment totals 37.43%, against 44.75% for positive sentiment. This lack of a significant difference confirms the trend towards polarized debate already observed around the Covid crisis, whether approached from a political or a health angle. However, this picture must be refined according to the tweets' country of origin and the topic addressed.
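To make the effect of lexicon choice concrete, here is a minimal base R sketch, not the method used in this research. It assumes two toy lexicons invented purely for illustration (`lexicon_general` and `lexicon_domain`), a hypothetical helper `sentiment_share()`, and a deliberately simple scoring rule: the percentage of all tokens that match a negative or a positive entry.

```r
# Toy lexicons, invented for illustration only (not the lexicons used in this research).
lexicon_general <- data.frame(
  word      = c("good", "bad", "safe", "fear"),
  sentiment = c("positive", "negative", "positive", "negative"),
  stringsAsFactors = FALSE
)

# Same entries plus two hypothetical domain-specific terms.
lexicon_domain <- data.frame(
  word      = c("good", "bad", "safe", "fear", "lockdown", "vaccine"),
  sentiment = c("positive", "negative", "positive", "negative",
                "negative", "positive"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: percentage of tokens matched per sentiment type.
sentiment_share <- function(texts, lexicon) {
  # Lowercase, then split on anything that is not a letter or an apostrophe.
  tokens <- unlist(strsplit(tolower(texts), "[^a-z']+"))
  tokens <- tokens[nzchar(tokens)]

  # Look up each token; tokens absent from the lexicon become NA.
  matched <- lexicon$sentiment[match(tokens, lexicon$word)]

  # Count matches per sentiment type (NA values are ignored by table()).
  counts <- table(factor(matched, levels = c("negative", "positive")))

  # Express the counts as a percentage of all tokens.
  shares <- round(100 * as.numeric(counts) / length(tokens), 2)
  setNames(shares, names(counts))
}

# Made-up example texts.
texts <- c("The lockdown is bad", "The vaccine is safe and good")

sentiment_share(texts, lexicon_general)
sentiment_share(texts, lexicon_domain)
```

On the same two texts, the domain lexicon matches more tokens than the general one, so both shares change. This mirrors, on a very small scale, why domain fit and lexicon size both affect the scores.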
Below is the R function used to perform the comparative sentiment analysis. It draws on seven English-language lexicons, including the two (Mixology Lexicon and Mixology Covid Lexicon) developed as part of this research (see Blog 14).