Analysis Of The Use Of Nazief-Adriani Stemming And Porter Stemming In Covid-19 Twitter Sentiment Analysis With Term Frequency-Inverse Document Frequency Weighting Based On K-Nearest Neighbor Algorithm

Muhammad Fikri; Zaenal Abidin

doi:10.15294/rji.v2i2.74267

Muhammad Fikri, Universitas Negeri Semarang
Zaenal Abidin, Universitas Negeri Semarang

Keywords: Text Mining, Nazief-Adriani, Porter, KNN, TF-IDF, Twitter

Abstract

This system was developed to measure the accuracy of sentiment analysis of COVID-19-related tweets that are preprocessed with the Nazief-Adriani and Porter stemmers, weighted with TF-IDF, and classified with K-Nearest Neighbor (KNN). The resulting accuracy was 50.98% for Nazief-Adriani and 48.24% for Porter.

Purpose: This research aims to determine the accuracy of the Nazief-Adriani and Porter stemmer algorithms in text preprocessing, using a dataset of Indonesian-language tweets. The research involves word weighting using TF-IDF and classification using the K-Nearest Neighbor (KNN) algorithm.

Methods/Study design/approach: The experiment applied the Nazief-Adriani and Porter stemming algorithms to COVID-19-related data collected from Twitter. The data underwent text preprocessing, stemming, and TF-IDF weighting, followed by classification of the training and testing data with the K-Nearest Neighbor (KNN) algorithm. The accuracy of each stemmer was then calculated from a confusion matrix.
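The weighting and classification steps described above can be sketched as follows. This is a minimal illustration only, not the authors' implementation: it assumes the tweets have already been preprocessed and stemmed into token lists, uses the plain tf × log(N/df) variant of TF-IDF, and ranks neighbors by cosine similarity with k = 3. The example tokens, labels, function names, and the choice of k are all hypothetical.

```python
import math
from collections import Counter


def fit_idf(train_docs):
    """Compute idf = log(N / df) for every term seen in the training documents."""
    n = len(train_docs)
    df = Counter(term for doc in train_docs for term in set(doc))
    return {term: math.log(n / count) for term, count in df.items()}


def transform(doc, idf):
    """Turn a token list into a sparse {term: tf * idf} vector.

    Terms that never occurred in the training data are ignored.
    """
    counts = Counter(term for term in doc if term in idf)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {term: (count / total) * idf[term] for term, count in counts.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors (0.0 if either is all zeros)."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_predict(query, train_vecs, train_labels, k=3):
    """Return the majority label among the k most similar training vectors."""
    ranked = sorted(
        zip(train_vecs, train_labels),
        key=lambda pair: cosine(query, pair[0]),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical, already-stemmed training tweets and their sentiment labels.
train_docs = [
    ["vaksin", "aman", "bagus"],
    ["vaksin", "bantu", "sehat"],
    ["covid", "takut", "buruk"],
    ["covid", "rugi", "buruk"],
]
train_labels = ["positive", "positive", "negative", "negative"]

idf = fit_idf(train_docs)
train_vecs = [transform(doc, idf) for doc in train_docs]

# Classify two unseen (hypothetical) tweets.
pred_a = knn_predict(transform(["vaksin", "bagus"], idf), train_vecs, train_labels)
pred_b = knn_predict(transform(["covid", "buruk"], idf), train_vecs, train_labels)
print(pred_a, pred_b)
```

In this sketch, swapping the stemmer only changes the token lists passed to fit_idf and transform, which is why the two stemmers can be compared with the weighting and classification steps held fixed.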

Result/Findings: In sentiment analysis of Indonesian-language COVID-19-related Twitter data, the Nazief-Adriani stemmer achieved an accuracy of 50.98%, while the Porter stemmer achieved an accuracy of 48.24%.
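As an illustration of how an accuracy figure of this kind is derived from a confusion matrix, the following minimal sketch counts (actual, predicted) label pairs and divides the diagonal by the total. The labels and predictions are hypothetical and are not the study's data; the function names are assumptions made for the example.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Count how often each (actual, predicted) label pair occurs."""
    return Counter(zip(actual, predicted))


def accuracy(matrix):
    """Share of predictions on the diagonal, i.e. where actual == predicted."""
    total = sum(matrix.values())
    correct = sum(n for (a, p), n in matrix.items() if a == p)
    return correct / total if total else 0.0


# Hypothetical test labels and classifier output.
actual = ["positive", "positive", "negative", "negative"]
predicted = ["positive", "negative", "negative", "negative"]

matrix = confusion_matrix(actual, predicted)
acc = accuracy(matrix)
print(f"accuracy = {acc:.2%}")
```

Computing the same quantity once per stemmer on an identical test split gives two directly comparable percentages.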

Novelty/Originality/Value: Feature selection is crucial in stemmer accuracy testing. In this study, feature selection is therefore carried out with the Nazief-Adriani and Porter stemmers, and classification for the accuracy measurement is conducted with the K-Nearest Neighbor (KNN) algorithm.
