Identifying Relevant Messages from Citizens in a Social Media Platform for Natural Disasters in Indonesia Using Histogram Gradient Boosting and Self-Training Classifier

DOI:

https://doi.org/10.15294/sji.v11i2.2287

Keywords:

Text relevance, Histogram gradient boosting, Social media analysis, Natural disaster, Semi-supervised approach

Abstract

Purpose: This research develops a histogram-based gradient boosting classification model that identifies contextually relevant tweets about disasters. The model can then be used in subsequent data cleaning stages.

Methods: This study takes a semi-supervised approach to building a classification model with histogram-based gradient boosting. The model is trained to identify and remove irrelevant tweets from disaster-related data gathered from Twitter. Optimization techniques, including the AdaBoost classifier, calibrated classifier, and self-training classifier, are used to improve the model's performance. The goal is to accurately recognize and categorize relevant tweets for further data analysis and decision-making.

Result: The developed classification model achieved an F1-score of 93.07%, indicating that it effectively filters relevant disaster-related tweets. This highlights the model's potential to support more precise aid distribution and faster decision-making in disaster response. Its successful implementation also demonstrates the usefulness of social media data for improving disaster management practices.
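For readers unfamiliar with the metric, the F1-score is the harmonic mean of precision and recall. The sketch below computes it for a binary task, assuming "relevant" is the positive class (label 1). The abstract does not state which averaging scheme was used, so this is only one plausible reading, and the labels are invented.

```python
def f1_score_binary(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Invented labels: 1 = relevant tweet, 0 = irrelevant tweet.
y_true = [1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 1, 0]
precision, recall, f1 = f1_score_binary(y_true, y_pred)
```

Because it balances precision and recall, F1 penalizes a filter that either discards many relevant tweets or lets many irrelevant ones through.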

Novelty: This research contributes to the analysis of social media through machine learning algorithms. By treating social media, specifically Twitter, as a resource for disaster response efforts, this study addresses challenges in data collection and analysis for disaster management. Classifying relevant tweets by type of natural disaster offers opportunities to improve stakeholder decision-making in disaster scenarios.

Article ID

2287

Published

16-05-2024

Section

Articles

How to Cite

Identifying Relevant Messages from Citizens in a Social Media Platform for Natural Disasters in Indonesia Using Histogram Gradient Boosting and Self-Training Classifier. (2024). Scientific Journal of Informatics, 11(2), 315-324. https://doi.org/10.15294/sji.v11i2.2287