Check out this new article by Rosen and Walther. It investigates how social interaction on X shaped subsequent hate messaging targeting Muslims and Jews in the month following the 7 October 2023 Hamas attack and Israel's ensuing military response in Gaza.
The paper's main goal is to test a "social approval theory" of online hate: to see whether hate posts generate more extreme and more frequent hate messaging when they receive approving replies (i.e., replies that converge linguistically with the original message), and whether the timing of subsequent posts is influenced by the nature of replies and by simple engagement signals (Likes, reposts). The study focuses on anti-Jewish and anti-Muslim hate messages posted in the wake of the 7 October 2023 crisis.
The authors sampled original hate posts and engagement data on X between 8 October and 4 November 2023. They identified users who posted messages containing any of 61 anti-Islamic or 22 anti-Semitic terms from the Weaponized Word database. These keywords were fed to the Brandwatch platform to retrieve the original posts plus all replies, Likes, and reposts. The final dataset comprised 17,013 original posts and replies by 7,137 users, with 10,718 replies (mean ≈1.68 replies per original post) and 65,331 Likes (mean ≈10.21 Likes per original post).
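As a rough illustration of this retrieval step (Brandwatch's query interface is proprietary, and the lexicon and post schema below are hypothetical stand-ins, not the study's actual terms), a minimal keyword filter in Python might look like this:

```python
import re

# Hypothetical stand-ins: the study used 61 anti-Islamic and 22 anti-Semitic
# terms from the Weaponized Word database, queried via Brandwatch.
HATE_LEXICON = {"term_a", "term_b", "term_c"}

# One word-boundary pattern so a term does not match inside longer words.
pattern = re.compile(
    r"\b(" + "|".join(re.escape(t) for t in HATE_LEXICON) + r")\b",
    re.IGNORECASE,
)

def contains_hate_keyword(text: str) -> bool:
    """Return True if the text contains any lexicon term as a whole word."""
    return bool(pattern.search(text))

def filter_original_posts(posts):
    """Keep posts whose text matches the lexicon (hypothetical dict schema)."""
    return [p for p in posts if contains_hate_keyword(p["text"])]
```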
Hatefulness of messages was scored using the HSD classifier by Vidgen et al. (2021), which estimates the degree to which a text contains hateful content (derogation, animosity, threats, dehumanisation). The scores ranged from .00013 to 1.0 (mean ≈.406).
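The article does not specify which checkpoint the authors ran, but a public classifier trained on Vidgen et al.'s (2021) dynamically generated hate-speech data is available on Hugging Face. A sketch of scoring with it (the model choice and label names are assumptions to verify against the model card):

```python
from transformers import pipeline

# A public checkpoint trained on Vidgen et al.'s (2021) dynamically generated
# hate-speech data; whether the authors used this exact model is an assumption.
clf = pipeline(
    "text-classification",
    model="facebook/roberta-hate-speech-dynabench-r4-target",
    top_k=None,  # return scores for all classes, not just the top one
)

def hatefulness(text: str) -> float:
    """Probability mass the classifier assigns to the hate class (0..1)."""
    scores = clf(text)[0]  # list of {"label": ..., "score": ...} dicts
    # Label string per this checkpoint's config; check the model card.
    return next(s["score"] for s in scores if s["label"].lower() == "hate")
```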
To measure the linguistic convergence/divergence of replies to original hate posts, the authors used a Convergence-Entropy Measurement (CEM) technique: they derived word vectors using the RoBERTa-base transformer model, then calculated the Shannon entropy between word vectors from the original post and its replies to estimate how predictable, and hence how semantically similar, the replies were. Negative residuals from predicted CEM values indicated convergence (i.e., replies highly similar to the original post).
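The paper's exact CEM implementation is not reproduced here; the sketch below illustrates the core idea under my own simplifying assumptions: embed tokens with RoBERTa-base, then score how predictable each reply token's vector is from the original post's vectors, with lower entropy indicating greater convergence.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModel.from_pretrained("roberta-base")
model.eval()

@torch.no_grad()
def token_vectors(text: str) -> torch.Tensor:
    """Last-layer hidden states for each token (special tokens dropped)."""
    enc = tok(text, return_tensors="pt", truncation=True)
    hidden = model(**enc).last_hidden_state[0]  # (seq_len, 768)
    return hidden[1:-1]                         # drop <s> and </s>

@torch.no_grad()
def convergence_entropy(original: str, reply: str) -> float:
    """CEM-style score: softmax each reply token's cosine similarities to the
    original post's tokens, then average the Shannon entropies. Lower values
    mean the reply's vectors are more predictable from the original, i.e.
    greater semantic convergence. Details here are a sketch, not the paper's."""
    a = torch.nn.functional.normalize(token_vectors(original), dim=-1)
    b = torch.nn.functional.normalize(token_vectors(reply), dim=-1)
    sims = b @ a.T                              # (reply_len, orig_len)
    probs = torch.softmax(sims / 0.1, dim=-1)   # temperature chosen arbitrarily
    ent = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)
    return ent.mean().item()
```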
For hypothesis testing the authors developed two stochastic linear regression models implemented in PyJAGS (run via Google Colab). Model 1 predicted the hatefulness of a user's next post (S_{t+1}) as a function of the number of convergent replies (nc), divergent replies (nd), Likes (nl), reposts (nr), and the hatefulness of the current post (S_t). Model 2 predicted the time interval between posts (Δt) as a function of the same predictors.
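The article does not quote the authors' JAGS code, but Model 1 as described maps onto a short PyJAGS specification. The vague priors and toy data below are my assumptions, not the paper's:

```python
import numpy as np
import pyjags

# Model 1 as described above: hatefulness of the next post (S_next) regressed
# on convergent replies (nc), divergent replies (nd), Likes (nl), reposts (nr),
# and current-post hatefulness (S).
MODEL1 = """
model {
  for (i in 1:N) {
    mu[i] <- b[1] + b[2]*nc[i] + b[3]*nd[i] + b[4]*nl[i] + b[5]*nr[i] + b[6]*S[i]
    S_next[i] ~ dnorm(mu[i], tau)
  }
  for (j in 1:6) { b[j] ~ dnorm(0, 1.0E-3) }  # vague priors (my assumption)
  tau ~ dgamma(1.0E-3, 1.0E-3)
}
"""

# Toy data with the expected fields; replace with the real per-post variables.
rng = np.random.default_rng(0)
N = 200
data = dict(
    N=N,
    nc=rng.poisson(1, N), nd=rng.poisson(1, N),
    nl=rng.poisson(10, N), nr=rng.poisson(2, N),
    S=rng.uniform(0, 1, N), S_next=rng.uniform(0, 1, N),
)

model = pyjags.Model(code=MODEL1, data=data, chains=4, adapt=1000)
samples = model.sample(5000, vars=["b", "tau"])
print(samples["b"].mean(axis=(1, 2)))  # posterior mean of each coefficient
```

Model 2 would swap S_next for the inter-post interval Δt (plausibly log-transformed, though the summary does not say) while keeping the same predictors.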
The study found no significant differences between the anti-Jewish and anti-Muslim subsamples, so the analyses pooled both. Key findings: replies that converged with an original hate post predicted greater hatefulness in that user's next post and shorter intervals between posts; divergent replies were associated with reduced subsequent hate posting; and simple engagement signals (Likes, reposts) were comparatively weak predictors of either outcome.
Implications for policy and research
From a policy perspective, this research suggests that moderation strategies focused purely on simple engagement metrics (Likes, repost counts) may miss important dynamics: verbal replies that align with hateful content appear to be stronger drivers of escalation in both hate severity and posting frequency. Platforms may need to monitor and intervene not just when engagement volumes spike but when conversational replies show semantic alignment with hate posts. The divergence effect likewise suggests that replies which do not converge may reduce subsequent hate posting, so features or moderation practices that encourage disconfirmation or challenge could help.
For research, the paper underscores the value of analysing the content of replies, not just engagement counts, and highlights the utility of vector-based semantic measures such as CEM for operationalising convergence/divergence in replies. Future work could extend to other protected groups, platforms, and time frames, and examine offline spill-over from online convergence processes. The paper also raises questions about the role of user status (influencers vs. general users) and how reply content interacts with platform moderation and algorithmic amplification.