Abstract Vaccine hesitancy is currently recognized by the WHO as a major threat to global health. Recently, especially during the COVID-19 pandemic, there has been a growing interest in the role of social media in the propagation of false information and fringe narratives regarding vaccination. Using a sample of approximately 60 billion tweets, we conduct a large-scale analysis of the vaccine discourse on Twitter. We use methods from deep learning and transfer learning to estimate the vaccine sentiments expressed in tweets, then categorize individual-level user attitudes towards vaccines. Drawing on an interaction graph representing mutual interactions between users, we analyze the interplay between vaccine stances, the interaction network, and the information sources shared by users in vaccine-related contexts. We find that strongly anti-vaccine users frequently share content from sources of a commercial nature; typically sources which sell alternative health products for profit. An interesting aspect of this finding is that concerns regarding commercial conflicts of interest are often cited as one of the major factors in vaccine hesitancy. Further, we show that the debate is highly polarized, in the sense that users with similar stances on vaccination interact preferentially with one another. Extending this insight, we provide evidence of an epistemic echo chamber effect, where users are exposed to highly dissimilar sources of vaccine information, depending on the vaccination stance of their contacts. Our findings highlight the importance of understanding and addressing vaccine mis- and disinformation in the context in which they are disseminated in social networks.
Citation: Mønsted B, Lehmann S (2022) Characterizing polarization in online vaccine discourse—A large-scale study. PLoS ONE 17(2): e0263746. https://doi.org/10.1371/journal.pone.0263746
Editor: Christopher M. Danforth, University of Vermont, UNITED STATES
Received: August 23, 2021; Accepted: January 23, 2022; Published: February 9, 2022
Copyright: © 2022 Mønsted, Lehmann. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: Data cannot be shared publicly, as this was a condition for IRB approval. For inquiries regarding data access, contact compute@compute.dtu.dk.
Funding: This study was funded entirely by the Danish Council for Independent Research (Project: Microdynamics of Social Interactions, grant number 4184-00556a). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Introduction Vaccine hesitancy, defined as the reluctance or refusal to vaccinate [1], is a growing threat to global health, and is believed to be driven mainly by the ‘three C’s’: Confidence, Complacency, and Convenience [2]. Social media platforms may potentially influence vaccine hesitancy through the former two, for example by enabling easy and wide-spread sharing of content that exaggerates the risks of vaccination and/or understates the risks of vaccine-preventable diseases [3]. Vaccine hesitancy exists on a continuous spectrum [4], where the extreme positions of rejecting or accepting all vaccines tend to be overrepresented in online settings [5]. While vaccine hesitancy is a nuanced and context-dependent phenomenon, some general factors influencing hesitancy have been identified in the literature [6]. Key among those factors are the availability of information regarding vaccines [4, 7], the accuracy of beliefs about the risks and benefits of vaccines and vaccine-preventable diseases [8, 9], social norms regarding vaccination, i.e. whether or not vaccinating is perceived as a ‘normal’ thing to do [5, 10], and trust in health authorities and/or the pharmaceutical industry, particularly concerns regarding commercial conflicts of interest [4, 7, 8]. The list above is by no means exhaustive; rather, it highlights a few central factors which are well described in the literature and of particular relevance for this paper. These factors are strongly connected to the topic of social networks and online misinformation: Anti-vaccine messages on Twitter typically aim to alter the reader’s perception of the risks and benefits of vaccination, often drawing on conspiracy theories [9]. In addition to an inaccurate risk picture, anti-vaccine content on Twitter, especially during the COVID-19 pandemic, has focused on commercial interests in the pharmaceutical sector [11], and often relies on conspiracy theories in doing so [12].
The detrimental effects of reduced vaccine uptake on public health are well described in the literature [13–18]. Somewhat paradoxically, vaccination rates have declined in part due to the success of vaccines in preventing disease, leading to complacency [19, 20]. However, online misinformation has also been linked to decreasing vaccine uptake [21–24], and outbreaks of vaccine-preventable diseases have been observed in areas where anti-vaccine activists have organized disinformation campaigns [25]. In this sense, the growing amount of online misinformation [26, 27] can be characterized as a threat to public health [28]. Countering medical misinformation in online systems is no easy task. While the scientific literature is rife with evidence which disproves the narratives outlined above [29, 30], individuals at the ‘rejection’ extreme of the vaccine hesitancy continuum often have a strong sense of identity regarding their stance on vaccines [31]. Individuals tend to reinterpret or disregard information if it conflicts with a stance that they strongly identify with [32], an effect which has been demonstrated in numerous contexts [33] including vaccination [34]. The challenges of countering misinformation are compounded by the fact that strongly anti-vaccine individuals often form tightly knit communities in large social networks, such as Twitter [35, 36] and Facebook [37]. In such environments, evidence challenging in-group beliefs is dismissed as untrustworthy [8], and often ends up only reinforcing said beliefs [37, 38]. Therefore, studying the interplay between vaccination attitudes and vaccine-related online (mis)information is essential to inform policy [39–41], also at the community level [42]. We utilize two large datasets to study this interplay. The first (Dataset 1) is a large, random sample consisting of approximately 60 billion tweets.
The second (Dataset 2) consists of 6.75 million tweets obtained via Twitter’s search API for tweets containing vaccination-related terms. Both datasets are discussed in detail in the Methods section. Using these datasets, we construct a large network which captures interactions on Twitter, and use machine learning methods to identify Twitter profiles with vaccine stances at the ‘rejection’ and ‘acceptance’ extremes of the hesitancy continuum, known colloquially as anti- and pro-vaxxers, respectively. Based on the data and methods outlined above (which are elaborated upon in the materials and methods section), the remainder of the paper presents a number of analyses on the interplay between strong vaccination stances, social network structure, and online information.
Polarization and epistemic echo chambers Using Dataset 1, we construct a large network representing observed mutual interactions between profiles on Twitter. In this network, profiles are linked if there exists a reciprocal @-mention, or a reciprocal retweet, within a 3-month time window. We refer to the interaction network as the MMR (mutual mention/retweet) network for this reason. Additional details regarding the MMR network, as well as some analyses of the network structure and temporal stability, are provided in the materials and methods section. Links are constructed in this fashion for consecutive time windows, where the number of such 3-month time windows in which two users have interacted can then be viewed as the weight of the link. In addition to thresholding on this weight, which is illustrated in Fig 5, the graph may be thresholded according to the number of vaccine-related tweets from each user, such that only nodes corresponding to users who posted at least a desired number of tweets with vaccination-related keywords are retained in the graph. We initially consider a version of the MMR graph constructed using very strict criteria for node and link inclusion, then subsequently investigate the effects of easing those criteria. We first include, in each time window, only nodes that are assigned a pro- or anti-vaccine stance. Further, we only include links between nodes that interacted in several windows. The strictness of these criteria retains only nodes which consistently express strong vaccine sentiments, and which interact repeatedly with nodes that do so as well. As a consequence of the strict criteria, the resulting graph contains only 4894 nodes, of which 3359 (69%) form a giant connected component. The remaining connected components are loosely scattered, each with fewer than 30 nodes, and contain only 5.6% anti-vaccine users. 395 nodes (11.76%) in the giant connected component represent antivaxx profiles.
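As an illustration, the link construction described above can be sketched as follows. This is a minimal, hypothetical implementation (not the authors' code), assuming an `interactions` list of `(window, source, target)` tuples recording one user mentioning or retweeting another within a given 3-month window:

```python
from collections import defaultdict

def build_mmr_graph(interactions, min_windows=2):
    """Return a {(user, user): weight} edge dict, where an edge exists if the
    pair had a reciprocal interaction in at least `min_windows` time windows."""
    directed = defaultdict(set)              # (window, user) -> users they mentioned/retweeted
    for window, u, v in interactions:
        directed[(window, u)].add(v)

    windows_per_pair = defaultdict(set)      # (u, v) -> windows with a reciprocal interaction
    for (window, u), targets in directed.items():
        for v in targets:
            if u in directed.get((window, v), set()):
                windows_per_pair[tuple(sorted((u, v)))].add(window)

    # The link weight is the number of windows with reciprocal interaction;
    # thresholding on it retains only repeatedly interacting pairs.
    return {pair: len(ws) for pair, ws in windows_per_pair.items()
            if len(ws) >= min_windows}
```

Per-user tweet-count thresholds would then be applied on top of this edge set by dropping nodes with too few vaccine-related tweets.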
A representation of the graph using a force layout algorithm [45] is shown in Fig 4. The interplay between the stances of users and their neighborhoods, as well as user connectivity and activity, is visualized in S4 Appendix.
Fig 4. Representation of the repeated mutual interaction graph from 2013–2016. Profiles frequently interact with others who share their own stance, and antivaxx profiles are localized in relatively few, tightly knit clusters. Profiles with anti- and pro-vaccine stances are illustrated in red and blue, respectively. Only the giant connected component of the interaction graph is depicted. https://doi.org/10.1371/journal.pone.0263746.g004
The graph is heavily stratified with regard to vaccination stance. The assortativity coefficient (the Pearson correlation between the stances of connected users) is r = 0.813. The analyses above, however, depend on discretely partitioning users into two distinct categories. Considering instead user stance as a continuous variable—given by e.g. the average anti-/pro-vaccine sentiment expressed in their tweets—we obtain similar findings. Discarding users with fewer than 5 vaccination-related tweets, we rebuild the interaction graph while varying the minimum number of 3-month time windows in which users must have interacted before being connected in the graph. Results on the interplay of (continuous) stance and (repeated) connectivity are summarized in Fig 5.
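The assortativity coefficient can be computed as a Pearson correlation over the endpoints of every edge. A minimal sketch, with hypothetical `edges` and `stance` inputs; each undirected edge is counted in both directions so the measure is symmetric:

```python
import math

def stance_assortativity(edges, stance):
    """Pearson correlation between the stances at the two ends of each edge."""
    xs, ys = [], []
    for u, v in edges:
        xs += [stance[u], stance[v]]     # edge counted in both directions
        ys += [stance[v], stance[u]]
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    return cov / (sx * sy)
```

A value near +1 indicates that edges overwhelmingly connect users of like stance, as observed here.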
Fig 5. Interplay between average vaccination sentiment and user interactions. a: Users tend to disproportionately interact with users of similar stance, both when users interact during only a single three-month time window and when they interact during multiple windows. Specifically, we compute for all users the average probabilities of that user’s tweets expressing pro-/anti-vaccine sentiment. Comparing these averages for all nodes and their neighbors, we find a positive correlation between the average pro-vaccine (respectively anti-vaccine) sentiments of neighboring users. Similarly, the average pro-vaccine sentiment of nodes exhibits a negative correlation with the anti-vaccine sentiments of their neighbors. The number of nodes in the interaction network decreases exponentially as the minimum number of time windows is increased. The negative correlation between pro- and antivaxx probabilities of neighbors tends slightly toward zero as the threshold for repeated interaction grows. b: As increasingly repeated interactions are considered, users in the interaction graph are increasingly well connected. However, the number of vaccination-related tweets posted by users decreases for interactions occurring very frequently, indicating that at this point, the graph likely includes users who are highly active on Twitter, yet do not discuss vaccination-related topics very often. Error bars on Pearson correlations represent one standard deviation of the Fisher-transformed variables z, i.e. the bounds of the error bar on a correlation r of n data points are given by tanh(z ± σ_z), where z = arctanh(r) and σ_z = 1/√(n − 3). https://doi.org/10.1371/journal.pone.0263746.g005
Correlations between the mean vaccine sentiment expressed in neighboring users’ tweets were roughly the same independently of how frequently the users interacted, as shown in Fig 5a, although the number of users in the interaction graph decreases quickly when using strict inclusion criteria (additional analyses in the methods section).
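The error-bar construction described in the caption follows directly from the standard Fisher transform of a correlation coefficient; a small sketch:

```python
import math

def correlation_error_bounds(r, n):
    """One-standard-deviation bounds on a Pearson correlation r over n points,
    via the Fisher transform: z = arctanh(r), sigma_z = 1/sqrt(n - 3),
    bounds = tanh(z - sigma_z), tanh(z + sigma_z)."""
    z = math.atanh(r)
    sigma_z = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - sigma_z), math.tanh(z + sigma_z)
```

Because tanh maps back into (-1, 1), the resulting error bars always stay within the valid range for a correlation, unlike a naive r ± sigma interval.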
Similarly, we observe an anti-correlation between the pro- and anti-vaxx probabilities of neighbors, which seems to diminish somewhat when considering repeated interactions. However, this decrease appears to be driven by a few nodes which have many connections, yet do not frequently discuss vaccines, as shown in Fig 5b. The finding that users interact disproportionately with other users sharing their stance aligns with previous findings that long-time anti-vaccine users of social media tend to form tightly knit clusters which exhibit a high degree of in-group solidarity [46], and in which misinformation may thrive unquestioned [47]. To qualify the latter, we turn again to the URLs most frequently shared by users discussing vaccines, shown in Fig 2: we probe regions in the MMR network around individuals of various stances and assess whether the URLs shared in those regions differ more or less from the overall distribution depending on stance. Considering only the approximately 32 thousand users with at least 5 vaccination-related tweets, we group users based on the mean anti-vaccine probability (p_av) of their tweets. We computed the deciles of p_av for all tweets and grouped users based on which deciles their mean score fell between, i.e. one bin for mean p_av values below the first decile, one for values between the first and second deciles, and so forth. For each such group of users, we observed the regions surrounding them in the MMR network, and extracted the URLs shared by all users who were located in that region, and who had shared at least 5 URLs and posted at least 5 vaccine-related tweets. We then computed the frequency of each of the top URLs for the regions (locally), and observed the difference from the overall (global) frequency distribution. The frequency distributions may be interpreted as maximum likelihood estimates of the probability distributions over links shared in the regions around specific users, and globally.
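The decile-based binning described above can be sketched as follows. This is an illustrative reconstruction, assuming a hypothetical list `tweet_scores` of per-tweet anti-vaccine probabilities and a dict `user_means` mapping each user to the mean score of their tweets:

```python
from bisect import bisect_left

def decile_bins(tweet_scores, user_means):
    """Assign each user to one of 10 bins (0..9) according to where the user's
    mean anti-vaccine probability falls among the deciles of all tweet scores."""
    scores = sorted(tweet_scores)
    # Decile boundaries: the 10th, 20th, ..., 90th percentiles of all tweets.
    cuts = [scores[int(len(scores) * q / 10)] for q in range(1, 10)]
    # bisect_left returns the bin index: 0 below the first decile, 9 above the ninth.
    return {user: bisect_left(cuts, m) for user, m in user_means.items()}
```

Note that the boundaries are computed over all tweets, so users with many low-probability tweets cluster in the lower bins even if the per-user means are not uniformly distributed.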
Therefore, we quantify the difference between such distributions using the Jensen-Shannon (JS) distance [48]—an information-theoretical measure of distance between probability distributions which takes values in the range between zero (identical distributions) and one (no overlap between distributions). Fig 6 shows the JS-distances between overall link frequencies, and links shared by users adjacent to users with a given mean p_av. The figure shows that Twitter profiles that engage in online vaccine discourse are not only disproportionately connected to other users who share their stance, but that users with stronger anti-vaccine stances are also exposed to increasingly atypical sources of information. This is indicative of ‘epistemic echo chambers’ in online vaccine discourse in the sense that users, depending on their stance, are exposed not only to a skewed distribution of stances from other users (i.e. network homophily), but also to information sources that are highly dissimilar to those typically partaking in the overall discussion. Although we do not attempt to explain how these echo chambers arise in the first place, we can point to some mechanisms described in the literature which are consistent with our results. First, it is a well-known result in sociology and network science that links tend to form between nodes that share similar attributes [49, 50]. Second, some studies indicate that people are highly selective in sharing information that aligns well with their convictions [51], which in turn can cause polarization by opinion reinforcement [52], and by users cutting ties to avoid exposure to information causing cognitive dissonance [53].
Fig 6. Profiles that express fringe vaccine sentiments are also exposed via their interaction networks to sources of information that are highly dissimilar to link frequencies in the overall discussion. Here we consider users who posted a minimum of 5 tweets containing vaccine-related keywords, and partition them into deciles based on their tweets’ mean probability of expressing anti- and pro-vaccine sentiment. For each such decile and vaccine stance, the plot shows the Jensen-Shannon distance between the frequencies at which links from the domains shown in Fig 2 are shared in the vicinity of users in that decile, and in the interaction network overall. The error bars are computed using a bootstrap technique in which users in the target stance-decile combination were randomly sampled with replacement and the JS-distance to the overall distribution calculated. The error bars depict the standard deviation across 1000 such samples. https://doi.org/10.1371/journal.pone.0263746.g006
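The JS distance and the bootstrap error bars described in the caption can be sketched as follows. This is an illustrative reconstruction, not the authors' code; `local_counts_per_user` is a hypothetical list of per-user URL-domain count vectors, aligned index-by-index with the global frequency distribution `global_dist`:

```python
import math
import random

def js_distance(p, q):
    """Jensen-Shannon distance with base-2 logs: 0 for identical
    distributions, 1 for distributions with disjoint support."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    def kl(a, b):
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)
    return math.sqrt((kl(p, m) + kl(q, m)) / 2)

def bootstrap_js(local_counts_per_user, global_dist, n_samples=1000, seed=0):
    """Resample users with replacement, pool their domain counts into a local
    frequency distribution, and record the JS distance to the global one;
    return the mean and standard deviation over all samples."""
    rng = random.Random(seed)
    dists = []
    for _ in range(n_samples):
        sample = [rng.choice(local_counts_per_user) for _ in local_counts_per_user]
        totals = [sum(c[i] for c in sample) for i in range(len(global_dist))]
        s = sum(totals)
        dists.append(js_distance([t / s for t in totals], global_dist))
    mean = sum(dists) / len(dists)
    sd = math.sqrt(sum((d - mean) ** 2 for d in dists) / len(dists))
    return mean, sd
```

With base-2 logarithms the distance is bounded by 1, which matches the interpretation of the axis in Fig 6.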
Discussion In summary, our findings paint a picture of the vaccine discourse on Twitter as highly polarized, where users who express similar sentiments regarding vaccinations are more likely to interact with one another, and tend to share content from similar sources. Focusing on users whose vaccination stances are the positive and negative extremes of the spectrum, we observe relatively disjoint ‘epistemic echo chambers’ which imply that members of the two groups of users rarely interact, and in which users experience highly dissimilar ‘information landscapes’ depending on their stance. Finally, we find that strongly anti-vaccine users much more frequently share information from actors with a vested commercial interest in promoting medical misinformation. One implication of these findings is that online (medical) misinformation may present an even greater problem than previously thought, because beliefs and behaviors in tightly knit, internally homogeneous communities are more resilient [37, 54], and provide fertile ground for fringe narratives [55, 56], while mainstream information is attenuated [57]. Furthermore, such polarization of communities may become self-perpetuating, because individuals avoid those not sharing their views [58], or because exposure to mainstream information might further entrench fringe viewpoints [59]. A further problem exacerbated by the structure of the debate is that parents often base their vaccination decisions on their impression of what other parents do [60], so vaccine hesitant parents who encounter a strongly anti-vaccine community might get the impression that not vaccinating is the norm and opt not to vaccinate. This risk is compounded by the fact that anti-vaccine communities are highly effective at reaching out to undecided individuals [61], which highlights the need to reach undecided individuals with accurate information to overcome vaccine hesitancy [62].
In summary, the characteristics of the online vaccine discourse may contribute to increasing vaccine hesitancy, possibly to the extreme of vaccine denial. A brief discussion of measures that have proven successful in decreasing hesitancy and increasing vaccine uptake therefore seems in order. One such measure is encouraging direct communication between hesitant individuals and healthcare professionals. Parents who interact with health care professionals are significantly more likely to vaccinate their children [63, 64], whereas parents of underimmunized children are significantly more likely to obtain medical information online [23]. Another measure is implementing policies which incentivize vaccination or discourage rejection [65–67]. In terms of digital interventions, our findings highlight the need for measures based—not just on whether content is true or false—but on a more nuanced understanding of the interplay between vaccination attitudes, social network structure, and information sources, including actors with a vested interest in promoting false beliefs. With disinformation campaigns aiming to erode consensus [24, 68], fact-checking at the level of individual stories being shared online might need to be complemented by an understanding of the complex interplay between community structure and information content. Future work based on the findings presented here could investigate e.g. the text content of the communication between users with highly similar and dissimilar stances regarding vaccination, as well as interactions between text topics and community structure.
Acknowledgments The authors wish to thank Alan Mislove for his invaluable help with collection and analysis of Twitter data, and Bjarke Felbo for sharing his wisdom of machine learning.