Comparative Analysis of BERT, RoBERTa and ALBERT Model Performance with Text Data Augmentation in Multilabel Toxic Comment Classification

Authors

  • Annisa Kunarji Sari, Universitas Negeri Semarang
  • Zaenal Abidin, Universitas Negeri Semarang

DOI:

https://doi.org/10.15294/rji.v4i1.25436

Keywords:

deep learning, natural language processing, toxic comment, BERT, RoBERTa, ALBERT

Abstract

Toxic comments on social media pose serious challenges to online safety and moderation efforts. These comments are often multilabel in nature and suffer from class imbalance, making them difficult to classify accurately using standard methods.

Purpose: This study investigates the use of three transformer-based language models, BERT, RoBERTa, and ALBERT, for multilabel toxic comment classification through fine-tuning. The main objective is to address class imbalance and evaluate model performance after data augmentation.

Methods/Study design/approach: The Toxic Comment Classification dataset, in which each comment can carry any of six overlapping labels, was used in this study. A data augmentation strategy based on WordNet synonym replacement and Easy Data Augmentation (EDA) was applied to increase the representation of minority classes. After balancing, the dataset was split into training, validation, and testing sets. Each transformer model was fine-tuned with the Hugging Face Transformers library under identical hyperparameter settings. Models were evaluated using accuracy, precision, recall, and both micro and macro F1-scores.
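The EDA-style synonym replacement used for minority-class oversampling can be sketched as follows. This is a minimal illustration, not the authors' code: the synonym table is a toy stand-in for the WordNet lookups the study uses, and the function names (`synonym_replace`, `augment_minority`) are illustrative.

```python
import random

# Toy synonym table standing in for WordNet lookups (the study draws
# synonyms from WordNet; this dictionary is only for illustration).
SYNONYMS = {
    "bad": ["awful", "terrible"],
    "stupid": ["foolish", "dumb"],
    "hate": ["despise", "loathe"],
}

def synonym_replace(sentence, n=1, rng=None):
    """EDA-style synonym replacement: swap up to n replaceable words."""
    rng = rng or random.Random(0)
    words = sentence.split()
    candidates = [i for i, w in enumerate(words) if w.lower() in SYNONYMS]
    rng.shuffle(candidates)
    for i in candidates[:n]:
        words[i] = rng.choice(SYNONYMS[words[i].lower()])
    return " ".join(words)

def augment_minority(comments, factor=2, rng=None):
    """Keep each minority-class comment and add `factor` augmented variants."""
    rng = rng or random.Random(0)
    out = []
    for c in comments:
        out.append(c)
        for _ in range(factor):
            out.append(synonym_replace(c, n=1, rng=rng))
    return out
```

In a multilabel setting, augmentation of this kind is typically applied only to comments carrying rare labels (e.g. threat, identity hate), so the label distribution becomes more balanced without discarding majority-class data.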

Result/Findings: The RoBERTa model achieved the best performance, with 86.73% accuracy and a micro F1-score of 92.35%, outperforming BERT and ALBERT. The macro F1-score also improved significantly compared to previous studies using imbalanced datasets, indicating better recognition of minority classes such as threat and identity hate. 
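As background on the two F1 variants reported above: micro F1 pools true positives, false positives, and false negatives across all six labels, while macro F1 averages per-label F1 scores with equal weight, which is why macro F1 is the more sensitive indicator for minority labels such as threat and identity hate. A minimal sketch of both computations for 0/1 multilabel matrices (the function name is illustrative):

```python
def f1_scores(y_true, y_pred):
    """Micro and macro F1 for multilabel 0/1 matrices (lists of rows)."""
    n_labels = len(y_true[0])
    tp = [0] * n_labels
    fp = [0] * n_labels
    fn = [0] * n_labels
    for t_row, p_row in zip(y_true, y_pred):
        for j, (t, p) in enumerate(zip(t_row, p_row)):
            if p and t:
                tp[j] += 1          # correctly predicted label j
            elif p and not t:
                fp[j] += 1          # predicted label j, not present
            elif t and not p:
                fn[j] += 1          # missed label j
    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0
    # Micro: pool counts over all labels before computing F1.
    micro = f1(sum(tp), sum(fp), sum(fn))
    # Macro: compute per-label F1, then average with equal weight.
    macro = sum(f1(tp[j], fp[j], fn[j]) for j in range(n_labels)) / n_labels
    return micro, macro
```

Because macro F1 weights every label equally, a model that ignores a rare label is penalized heavily there even if pooled micro F1 stays high, which matches the study's use of macro F1 to track minority-class recognition.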

Novelty/Originality/Value: This study highlights the effectiveness of combining text data augmentation with transformer-based models in handling multilabel classification tasks involving imbalanced data. The use of simple augmentation methods notably improves performance and fairness across labels, contributing to the development of more robust toxic comment detection systems.

Published

2026-03-31

Article ID

25436

Section

Articles

How to Cite

Comparative Analysis of BERT, RoBERTa and ALBERT Model Performance with Text Data Augmentation in Multilabel Toxic Comment Classification. (2026). Recursive Journal of Informatics, 4(1), 40-48. https://doi.org/10.15294/rji.v4i1.25436