Implementation of Stacking Ensemble Classifier for Multi-class Classification of COVID-19 Vaccines Topics on Twitter

Rama Jayapermana(1), Aradea Aradea(2), Neng Ika Kurniati(3)


(1) Universitas Siliwangi
(2) Universitas Siliwangi
(3) Universitas Siliwangi

Abstract

Purpose: Classification algorithms are used in a wide variety of applications, but accuracy remains a general concern, and accuracy in multi-class classification in particular still requires further research.

Methods: This study proposes a stacking ensemble classifier method to produce better accuracy by combining Logistic Regression, Random Forest, and Support Vector Machine (SVM) algorithms as first-level learners and using Logistic Regression as a meta-learner for the multi-class classification of COVID-19 vaccine topics on Twitter.
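The architecture described in the Methods can be sketched with scikit-learn's StackingClassifier. This is a minimal illustration, not the authors' implementation: the synthetic three-class data stands in for vectorized tweets, and every hyperparameter (max_iter, n_estimators, cv, the train/test split) is an assumption, since the abstract does not state them.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic 3-class data (assumption: a placeholder for the real tweet features).
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=10,
    n_classes=3, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# First-level learners named in the abstract; hyperparameters are illustrative.
first_level = [
    ("lr", LogisticRegression(max_iter=1000)),
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("svm", SVC(random_state=0)),
]

# Logistic Regression as the meta-learner, trained on cross-validated
# outputs of the first-level learners (cv=5 is an assumption).
stack = StackingClassifier(
    estimators=first_level,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stack.fit(X_train, y_train)
y_pred = stack.predict(X_test)
```

The meta-learner sees only the first-level outputs by default, which is the usual stacking setup; whether the study also passed the original features through is not stated in the abstract.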

Result: In the evaluation, the proposed stacking ensemble classifier achieved 86% accuracy, 85% precision, 86% recall, and an 85% F1-score.
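For a multi-class problem, precision, recall, and F1-score must be averaged across classes. The sketch below shows how the four reported metrics could be computed with scikit-learn. The toy labels are hypothetical, and weighted averaging is an assumption: the abstract does not say which averaging scheme was used.

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Hypothetical true and predicted topic labels for eight tweets.
y_true = [0, 1, 2, 2, 1, 0, 2, 1]
y_pred = [0, 1, 2, 1, 1, 0, 2, 2]

accuracy = accuracy_score(y_true, y_pred)

# average="weighted" weights each class by its support (an assumption;
# "macro" would weight all classes equally instead).
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="weighted", zero_division=0
)

print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} f1={f1:.2f}")
```

With weighted averaging, recall always equals accuracy, which would be consistent with the matching 86% figures reported for both.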

Novelty: The novelty lies in combining Logistic Regression, Random Forest, and Support Vector Machine (SVM) algorithms as first-level learners, with Logistic Regression as the meta-learner, to produce better accuracy.

Keywords

COVID-19 Vaccines, Ensemble Method, Multi-class Classification, Sentiment Analysis, Stacking Ensemble Classifier


Scientific Journal of Informatics (SJI)
p-ISSN 2407-7658 | e-ISSN 2460-0040
Published By Department of Computer Science Universitas Negeri Semarang
Website: https://journal.unnes.ac.id/nju/index.php/sji

This work is licensed under a Creative Commons Attribution 4.0 International License.