Check out this recently published systematic review of hate speech detection approaches. The authors find that, despite numerous attempts by researchers, current natural language processing (NLP) approaches to detecting hate speech remain weak. Language on social media evolves rapidly, and the existing approaches are inconsistent, leaving many challenges open.
One critical issue that needs investigation is the significant increase in anti-female, or misogynous, language on social platforms. Methods for detecting misogyny in online environments are still in their infancy. Adopting a multi-label classification approach, in which one post can carry several labels at once, could help address the growing rate of such abuse. However, the effect of embedding biases on misogyny detection models remains to be examined.
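To make the multi-label idea concrete, here is a minimal sketch of how such labels might be encoded and scored. The label names are hypothetical and not taken from the review, and Hamming loss is only one of several metrics that could be used.

```python
# Hypothetical label set for illustration; the review does not prescribe one.
LABELS = ["stereotype", "harassment", "threat"]

def to_indicator(tags):
    """Encode a set of tags as a 0/1 vector ordered like LABELS."""
    return [1 if label in tags else 0 for label in LABELS]

def hamming_loss(y_true, y_pred):
    """Fraction of individual label slots that were predicted incorrectly."""
    total = sum(len(row) for row in y_true)
    wrong = sum(
        t != p
        for true_row, pred_row in zip(y_true, y_pred)
        for t, p in zip(true_row, pred_row)
    )
    return wrong / total

# One post may carry several labels at once, or none at all.
gold = [
    to_indicator({"stereotype", "harassment"}),
    to_indicator({"threat"}),
    to_indicator(set()),
]
pred = [
    to_indicator({"stereotype"}),
    to_indicator({"threat"}),
    to_indicator({"harassment"}),
]

print(hamming_loss(gold, pred))
```

Unlike a single-label setup, a partially correct prediction (the first post) is penalized only for the label it missed, not counted as entirely wrong.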
Distinguishing profanity from hate speech is challenging, because the presence of hate words in a text does not necessarily make it hate speech. Conversely, tweets without obvious hate words are often even harder to classify. Deep learning is becoming more common for text classification, and it has been suggested as a way to address the growing prevalence of hate speech across social media platforms.
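A small sketch shows why word lists alone fall short. The lexicon, the example sentences, and their labels are all invented for illustration and deliberately mild; they are not drawn from the review or from any dataset.

```python
import re

# Hypothetical, deliberately mild lexicon used only for illustration.
LEXICON = {"damn", "hell"}

def keyword_flag(text):
    """Return True if any lexicon word occurs in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return any(token in LEXICON for token in tokens)

# (text, assumed human label: is this hateful?)
examples = [
    ("This damn train is late again.", False),      # profane but not hateful
    ("People like them don't belong here.", True),  # hostile, no lexicon word
]

for text, is_hateful in examples:
    print(f"flagged={keyword_flag(text)} hateful={is_hateful} :: {text}")
```

The filter flags the first sentence and misses the second, which are exactly the two failure modes described above.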
The relatively high disagreement among human annotators when labelling hate speech suggests that the classification is likely to be difficult for machines as well. Further examination is needed to assess how class imbalance affects feature-engineering-intensive approaches and classification models. Character-level features could yield models that are more robust to attacks than models built on word-level features.
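The robustness argument for character-level features can be illustrated with a toy comparison. This is a sketch under simple assumptions: Jaccard overlap stands in for whatever similarity a real model would learn, and the obfuscated sentence is an invented example of a character-substitution attack.

```python
def char_ngrams(text, n=3):
    """Return the set of character n-grams of a space-padded, lowercased text."""
    padded = f" {text.lower()} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}

def jaccard(a, b):
    """Jaccard similarity between two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

clean = "you are an idiot"
obfuscated = "you are an id1ot"  # one character swapped to evade a word list

# Word level: the altered token no longer matches anything.
word_sim = jaccard(set(clean.split()), set(obfuscated.split()))

# Character level: most trigrams survive the substitution.
char_sim = jaccard(char_ngrams(clean), char_ngrams(obfuscated))

print(f"word-level overlap: {word_sim:.2f}")
print(f"char-level overlap: {char_sim:.2f}")
```

At the word level the offending token disappears entirely, while its character trigrams still partly overlap with the original, so a character-level model retains some signal.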
BERT can be used as a dynamic technique for identifying hate speech. It is also necessary to explore how augmenting the set of instances for each hate class affects detection performance. Some languages have significantly fewer resources, and cross-lingual transfer learning should be introduced to advance hate speech detection for those languages.
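As a rough illustration of enlarging the instance set per class, the sketch below uses random oversampling, which is the simplest stand-in for augmentation and not necessarily what the review has in mind. The class names, the placeholder texts, and the `oversample` helper are all hypothetical.

```python
import random
from collections import Counter

def oversample(samples, seed=0):
    """Duplicate minority-class items at random until every class
    has as many items as the largest class."""
    rng = random.Random(seed)
    by_label = {}
    for text, label in samples:
        by_label.setdefault(label, []).append((text, label))
    target = max(len(items) for items in by_label.values())
    balanced = []
    for items in by_label.values():
        balanced.extend(items)
        # Draw the missing items with replacement from the same class.
        balanced.extend(rng.choices(items, k=target - len(items)))
    return balanced

# Placeholder texts with an imbalanced, hypothetical label distribution.
data = [
    ("t1", "none"), ("t2", "none"), ("t3", "none"), ("t4", "none"),
    ("t5", "offensive"), ("t6", "offensive"),
    ("t7", "hate"),
]

balanced = oversample(data)
print(Counter(label for _, label in balanced))
```

Resampling of this kind would normally be applied to the training split only, so that evaluation still reflects the original class distribution.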
In conclusion, the article proposes that future work explore different machine-learning techniques and methods to characterize and track user-centered content on social media. Introducing linguistic features might be a promising path, and combining textual and image data in the detection models could further improve performance on this task.