Comparison of Naive Bayes Classifier and K-Nearest Neighbor Algorithms with Information Gain and Adaptive Boosting for Sentiment Analysis of Spotify App Reviews

  • Meidika Bagus Saputro Universitas Negeri Semarang
  • Alamsyah Alamsyah Universitas Negeri Semarang
Keywords: sentiment analysis, Spotify, naïve Bayes classifier, k-nearest neighbor, information gain, adaptive boosting

Abstract

Technology is developing rapidly, and the volume of data in the world is growing with it. This large volume of data can be put to use in many fields. Entertainment is one field that attracts many users. Spotify is an example of an entertainment app, available on the Google Play Store, that provides online music streaming to its users. Because the app is distributed through the Google Play Store, it has many user reviews, which can be classified as positive, negative, or neutral. One way to classify user reviews is sentiment analysis. In this paper, we classify the reviews with the naïve Bayes classifier and k-nearest neighbor, and compare the two when information gain is added as feature selection and adaptive boosting as the boosting algorithm for each classifier. The naïve Bayes classifier with information gain and adaptive boosting achieves an accuracy of 87.28%, and k-nearest neighbor with information gain and adaptive boosting achieves an accuracy of 80.35%.

Purpose: To determine the accuracy of the naïve Bayes classifier and the k-nearest neighbor algorithm when information gain and adaptive boosting are added, and to show step by step how sentiment analysis is carried out with the methods chosen in this study.

Methods/Study design/approach: This study applied data preprocessing, lexicon-based labelling with TextBlob, normalization, word vectorization using TF-IDF, and classification with the naïve Bayes classifier and k-nearest neighbor, with information gain as feature selection and adaptive boosting as the boosting algorithm to improve classification accuracy.
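
The TF-IDF vectorization step named above can be illustrated with a minimal sketch. The abstract does not state which TF-IDF variant the study used, so this example assumes the plain, unsmoothed form (term frequency times the natural log of inverse document frequency). The function name `tf_idf` and the three sample reviews are hypothetical and are not taken from the study's data.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: tf = count / document length,
    idf = ln(number of documents / documents containing the term).
    """
    n_docs = len(documents)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical, already preprocessed reviews (not from the study's dataset).
reviews = [
    "great app love the music".split(),
    "app keeps crashing".split(),
    "love the playlists".split(),
]
vectors = tf_idf(reviews)
```

In this sketch a term that occurs in fewer documents, such as "great", receives a higher weight than a term shared across documents, such as "app". Library implementations often add smoothing and normalization, so their values would differ from these.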

Result/Findings: The naïve Bayes classifier with information gain and adaptive boosting reaches an accuracy of 87.28%, while k-nearest neighbor with information gain and adaptive boosting reaches 80.35%. These results were obtained on a dataset of 60,000 entries, split into 80% training data and 20% testing data.
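
As a rough illustration of the 80/20 split described above, the following sketch partitions 60,000 row indices into training and testing sets. Whether the study shuffled the data before splitting, and with which seed, is not stated in the abstract; the shuffle, the seed value, and the function name `train_test_split_indices` are assumptions made for this example.

```python
import random


def train_test_split_indices(n_samples, test_fraction=0.2, seed=0):
    """Shuffle row indices and split them into (train, test) lists."""
    indices = list(range(n_samples))
    # Assumption: a seeded shuffle, so the split is reproducible.
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = train_test_split_indices(60_000)
print(len(train_idx), len(test_idx))  # 48000 12000
```

Under these assumptions, 48,000 entries are used for training and 12,000 for testing.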

Novelty/Originality/Value: Adding information gain as feature selection and adaptive boosting as the boosting algorithm is shown to increase the classification accuracy of the naïve Bayes classifier, but not that of k-nearest neighbor. Future research can apply other classification algorithms or feature selection methods to obtain better results.
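
Information gain as a feature selection criterion can be sketched as follows. This is the textbook entropy-based definition applied to the presence or absence of a single term, and it is not necessarily the exact procedure used in the study. The function names, the four toy documents, and their labels are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


def information_gain(documents, labels, term):
    """Entropy reduction from splitting documents on presence of `term`."""
    with_term = [lab for doc, lab in zip(documents, labels) if term in doc]
    without_term = [lab for doc, lab in zip(documents, labels) if term not in doc]
    n = len(labels)
    # Weighted entropy of the two subsets; empty subsets contribute nothing.
    conditional = sum(
        (len(subset) / n) * entropy(subset)
        for subset in (with_term, without_term)
        if subset
    )
    return entropy(labels) - conditional


# Hypothetical toy data: each document is a set of terms.
docs = [{"love", "app"}, {"love", "music"}, {"bad", "app"}, {"bad", "crash"}]
labels = ["positive", "positive", "negative", "negative"]

gain_love = information_gain(docs, labels, "love")
gain_app = information_gain(docs, labels, "app")
```

Here "love" separates the two classes perfectly and receives the maximum gain of 1 bit, while "app" appears equally in both classes and receives a gain of 0. A feature selection step would keep the highest-scoring terms and discard the rest.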

Published
2024-03-31
How to Cite
Saputro, M., & Alamsyah, A. (2024). Comparison of Naive Bayes Classifier and K-Nearest Neighbor Algorithms with Information Gain and Adaptive Boosting for Sentiment Analysis of Spotify App Reviews. Recursive Journal of Informatics, 2(1), 37-44. https://doi.org/10.15294/rji.v2i1.68551
Section
Articles
