The methodology involves three primary steps: (1) data collection from GDELT, (2) feature selection, and (3) prediction of terrorist incidents. GDELT provides a vast dataset of events and themes associated with specific countries, including features such as news article counts and sentiment measures. Using Random Forest models, the authors propose a novel feature selection approach that identifies the most predictive variables for terrorism risk and retains only those with strong predictive power. The aim is for the machine learning (ML) and deep learning (DL) models trained on this reduced set of features to deliver more accurate predictions.
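To make the feature selection step concrete, the following is a minimal sketch of one common way to rank features with a Random Forest, using scikit-learn's impurity-based importances. It is not the authors' novel selection procedure, which is not detailed here. The data is synthetic, and the column names (`article_count`, `avg_tone`, `noise_a` and so on), the `select_features` helper, and the `keep` parameter are all hypothetical, chosen only for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def select_features(X, y, names, keep=2, seed=0):
    """Rank features by Random Forest importance and return the top `keep` names.

    Hypothetical helper: a generic importance ranking, not the paper's method.
    """
    forest = RandomForestClassifier(n_estimators=200, random_state=seed)
    forest.fit(X, y)
    # Indices of features sorted from most to least important.
    order = np.argsort(forest.feature_importances_)[::-1][:keep]
    return [names[i] for i in order]


# Synthetic stand-in for country-level features (assumed, not real GDELT data).
rng = np.random.default_rng(0)
n_samples = 500
X = rng.normal(size=(n_samples, 6))

# The label depends only on the first two columns; the rest are pure noise.
signal = X[:, 0] + 0.8 * X[:, 1] + rng.normal(scale=0.3, size=n_samples)
y = (signal > 0).astype(int)

names = ["article_count", "avg_tone", "noise_a", "noise_b", "noise_c", "noise_d"]
selected = select_features(X, y, names, keep=2)
print(selected)
```

In this toy setting the two informative columns should rank well above the noise columns. A fixed `keep` is the simplest cut-off; an importance threshold would be an equally reasonable choice.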
Several predictive models were tested, including logistic regression, support vector machines (SVM), and gated recurrent units (GRU), and each was evaluated on four metrics: ROC-AUC, F1 score, precision, and recall. Predictive performance improved across all models when the feature selection process was applied. Specifically, the logistic regression model trained with selected features achieved the highest ROC-AUC and F1 scores, outperforming baseline models trained on all available data.
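For readers less familiar with the four metrics, the sketch below computes them from first principles in plain Python. The labels and scores are made-up toy values, not results from the paper, and the 0.5 decision threshold and the function names are assumptions made for illustration.

```python
def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall, and F1 for binary labels (1 = incident)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def roc_auc(y_true, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy ground truth and model scores (illustrative values only).
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.2, 0.6]

# Assumed decision threshold of 0.5 to turn scores into hard predictions.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(precision, recall, f1, auc)
```

Precision, recall, and F1 depend on the chosen threshold, whereas ROC-AUC is computed from the raw scores and summarises ranking quality across all thresholds. This is one reason to report them together.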
This framework has practical implications for counter-terrorism efforts, offering a data-driven approach to forecasting terrorism risk. By focusing on localised indicators of socio-political unrest, the framework provides a predictive risk assessment that could assist decision-makers in anticipating and mitigating potential threats. The authors suggest that future work could involve refining the framework with tools for filtering fake news and incorporating broader datasets, such as demographics, for more comprehensive risk assessments.