Research Shows Why Platforms Struggle to Curb Online Toxicity

Check out this article by Beknazar-Yuzbashev, McCrosky, Jiménez-Durán, and Stalinski: Toxic Content and User Engagement on Social Media. The study takes a close look at whether toxic content (like hate speech or rude comments) actually drives more engagement on social media platforms—and what happens if you hide that content.

The authors set out to answer a fundamental but under-explored question: does exposure to toxic content increase how much time people spend on platforms like Facebook, YouTube, and Twitter (now X)? And if so, what are the trade-offs between platform engagement and user wellbeing?

To get credible evidence, the researchers designed a browser-based field experiment. They recruited 742 U.S.-based social media users and randomly assigned half to a treatment group where a custom-built browser extension automatically hid toxic text content across Facebook, Twitter, and YouTube for six weeks. The other half saw their feeds as usual.

“Toxic” content was identified using a machine learning model (Unitary’s open-source Detoxify) that scores how likely a post is to be perceived as rude, disrespectful, or discouraging. Anything scoring above 0.3 on a 0–1 scale was hidden in real time, often within milliseconds.
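
As a concrete illustration, here is a minimal Python sketch of that decision rule using the open-source detoxify package; the 0.3 cutoff matches the paper, but the function names and the feed-filtering helper are illustrative assumptions, not the authors’ actual extension code.

```python
# A minimal sketch of threshold-based toxicity filtering with Unitary's
# open-source Detoxify model. The 0.3 cutoff mirrors the paper; everything
# else (function names, filter_feed) is illustrative.
from detoxify import Detoxify

TOXICITY_THRESHOLD = 0.3

# Load the pretrained "original" Detoxify model once at startup.
model = Detoxify("original")

def should_hide(text: str) -> bool:
    """Return True if the post's predicted toxicity exceeds the threshold."""
    scores = model.predict(text)  # dict of per-category scores in [0, 1]
    return scores["toxicity"] > TOXICITY_THRESHOLD

def filter_feed(posts: list[str]) -> list[str]:
    """Keep only posts that fall below the toxicity threshold."""
    return [p for p in posts if not should_hide(p)]

if __name__ == "__main__":
    feed = ["Great photo, thanks for sharing!", "You are an idiot."]
    print(filter_feed(feed))  # only the non-toxic post survives
```

In the experiment itself, the equivalent logic ran inside a browser extension across Facebook, Twitter, and YouTube; the sketch above only shows the thresholding step.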

The extension tracked over 11 million pieces of content and 30,000 hours of social media activity. Users’ engagement—time spent, posts viewed, ads seen, and even toxicity of their own comments—was monitored throughout the experiment.
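
To picture what monitoring engagement means in practice, here is a hypothetical sketch of the kind of per-session record such an extension might log; the field names and the aggregation helper are assumptions for illustration, not the authors’ actual schema.

```python
# A hypothetical sketch of the engagement telemetry described above.
# Field names and the helper are illustrative assumptions, not the
# schema used in the paper.
from dataclasses import dataclass

@dataclass
class EngagementEvent:
    user_id: str
    platform: str            # "facebook", "twitter", or "youtube"
    seconds_active: float    # time spent in this session slice
    posts_viewed: int
    ads_seen: int
    own_comment_toxicity: float | None  # Detoxify score of a comment, if any

def total_hours(events: list[EngagementEvent], platform: str) -> float:
    """Aggregate active time on one platform, in hours."""
    return sum(e.seconds_active for e in events if e.platform == platform) / 3600
```

Aggregating records like these per user and treatment arm is what yields the time-spent and ad-impression comparisons reported below.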

The paper’s main findings are:

  • Less toxicity = less engagement. Hiding toxic content led to a drop in time spent (−9.2% on Facebook, −6.8% on YouTube), fewer sessions, fewer ad impressions (−27% on Facebook), and reduced content consumption (−23% on Facebook).
  • Toxicity is contagious. Users exposed to less toxic content wrote less toxic content themselves: the toxicity of posts and comments authored by treated users fell by 25–30%.
  • Users clicked less. Post clicks and ad clicks also fell in the treatment group, consistent with toxic content capturing attention.
  • No clear wellbeing benefit. A follow-up survey showed that users exposed to less toxicity were not significantly happier or more willing to keep using the tool, even though they were engaging less.

This study documents a real trade-off between engagement and moderation. Platforms that reduce toxicity may see less ad revenue, which weakens their incentive to act, even if users would benefit in the long run. Importantly, the findings support the idea of “freedom of speech, not freedom of reach”: platforms can keep content online but limit its visibility without deleting it.

From a research perspective, the study is methodologically strong and scalable. Future work could test platform-side interventions, investigate long-term impacts, or explore moderation in non-English contexts.