Check out this new article by Elyas Meguellati, Assaad Zeghina, Shazia Sadiq, and Gianluca Demartini, exploring how large language models (LLMs) can make social media moderation smarter—not by churning out more synthetic data, but by cleaning and contextualising what we already have.
The paper focuses on complex content like propaganda, hate speech, and toxic memes—tasks where LLMs in “zero-shot” settings often fall short. Instead of relying on traditional synthetic data generation, the authors propose a semantic augmentation approach: prompting LLMs to clean noisy inputs and add short, meaningful explanations. This enriches the training data without increasing its volume.
They tested their method on three datasets. The main one was SemEval 2024’s persuasive meme dataset, a tough multi-label task involving 22 different propaganda techniques. They also validated on two widely used benchmarks: Google Jigsaw’s toxic comments and Facebook’s hateful memes. Together, these datasets span text-only and multimodal content, pushing models to go beyond surface-level cues.
Here’s how the method works. First, image captions (generated using tools like BLIP and GIT) are cleaned by prompting LLMs such as GPT-4o, LLaMA 3.1, or Sonnet 3.5, and invalid or overly sanitised outputs are filtered out. Next, for each meme, an LLM generates a short explanation highlighting its rhetorical strategies or hateful triggers. Finally, the cleaned captions and explanations are combined with the original meme text to augment the classifier’s inputs.
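A minimal sketch of what such a pipeline might look like is below, using the OpenAI chat API as one possible backend (LLaMA 3.1 or Sonnet 3.5 could be swapped in). The prompts, the filtering rule for over-sanitised captions, and the helper names are illustrative assumptions, not the authors’ published code.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def llm(prompt: str) -> str:
    """One-shot call to the LLM used for cleaning and explaining."""
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return resp.choices[0].message.content.strip()

def clean_caption(raw_caption: str) -> str | None:
    """Rewrite a noisy BLIP/GIT caption; return None if the output is unusable."""
    cleaned = llm(
        "Rewrite this automatically generated image caption as a single fluent, "
        "factual sentence. Do not add information that is not in the caption.\n\n"
        f"Caption: {raw_caption}"
    )
    # Illustrative filter for refusals or over-sanitised outputs.
    if not cleaned or "cannot assist" in cleaned.lower():
        return None
    return cleaned

def explain_meme(meme_text: str, caption: str) -> str:
    """Generate a short explanation of the rhetorical or hateful cues in a meme."""
    return llm(
        "In one or two sentences, explain which rhetorical strategies or hateful "
        "triggers this meme relies on. Be specific and concise.\n\n"
        f"Meme text: {meme_text}\nImage caption: {caption}"
    )
```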
The results? Adding explanations to meme text consistently improved classification performance across all datasets, especially when using GPT-4o. Combining text, cleaned captions, and explanations (T+C+E) produced the best results—beating zero-shot LLMs and traditional multimodal models like CLIP. However, using explanations alone didn’t perform well, showing that context needs to be layered, not isolated.
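For concreteness, the input variants compared above (T, T+C, T+C+E) can be thought of as simple concatenations of whichever fields are available; the field labels and separator in this sketch are formatting assumptions, not the paper’s exact template.

```python
def build_input(text: str, caption: str | None = None, explanation: str | None = None) -> str:
    """Concatenate the available fields into one classifier input string."""
    parts = [f"TEXT: {text}"]
    if caption:
        parts.append(f"CAPTION: {caption}")
    if explanation:
        parts.append(f"EXPLANATION: {explanation}")
    return " [SEP] ".join(parts)

# The settings compared above:
#   T     -> build_input(text)
#   T+C   -> build_input(text, caption)
#   T+C+E -> build_input(text, caption, explanation)
```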
In domains with explicit hate (like racist memes or toxic comments), LLMs tend to censor the very signals a classifier needs. To work around this, the team developed a trigger-based strategy in which LLMs flag key offensive terms alongside their explanations, preserving those critical cues while still complying with safety filters.
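A hedged sketch of that trigger-flagging step is below, reusing the llm() helper from the earlier snippet; the JSON schema and prompt wording are assumptions about how such a prompt could be structured, not the authors’ exact implementation.

```python
import json

def explain_with_triggers(meme_text: str, caption: str) -> dict:
    """Ask the LLM to flag key offensive terms alongside a short explanation."""
    raw = llm(
        "You are assisting a content-moderation classifier. List the key offensive "
        "or hateful trigger terms in this content, then give a one-sentence "
        "explanation of why it may be hateful. Respond as JSON with keys "
        "'triggers' (list of strings) and 'explanation' (string).\n\n"
        f"Meme text: {meme_text}\nImage caption: {caption}"
    )
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        # Fall back gracefully if the model refuses or returns free text.
        return {"triggers": [], "explanation": raw}
```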
The study offers clear policy and research implications. First, semantic augmentation—done right—can reduce dependence on costly human annotations while maintaining accuracy. Second, LLMs are not ready to replace traditional classifiers in high-context tasks but can play a powerful supporting role. Finally, open-source LLMs like LLaMA offer a promising, lower-cost alternative for this kind of work.