Sentiment Analysis of Visitor Reviews on Baturaden Tourist Attraction Using Machine Learning Methods

Mahazam Afrad; Dany Candra Febrianto; Sena Wijayanto; M. Yoka Fathoni

doi:10.15294/edukom.v11i1.10561

Authors

Mahazam Afrad, Institut Teknologi Telkom Purwokerto
Dany Candra Febrianto, Institut Teknologi Telkom Purwokerto
Sena Wijayanto, Institut Teknologi Telkom Purwokerto
M. Yoka Fathoni, Institut Teknologi Telkom Purwokerto


Keywords:

K-Nearest Neighbors, Naive Bayes, Random Forest, Sentiment Analysis, SVM, Tourist Attraction, Visitor Reviews

Abstract

This study evaluates four machine learning models, namely Support Vector Machine (SVM), Random Forest, K-Nearest Neighbors (KNN), and Naive Bayes, for sentiment analysis of visitor reviews of the Lokawisata Baturaden tourist attraction. Using 5-fold cross-validation, it aims to determine which model is best suited to classifying the sentiment of the Baturaden review data. The study comprised three main stages: data preprocessing, feature extraction, and model training. Preprocessing consisted of case folding, text cleaning, tokenization, stopword removal, and stemming. Features were extracted with TF-IDF (Term Frequency-Inverse Document Frequency), and SMOTE (Synthetic Minority Over-sampling Technique) was applied to address class imbalance in the dataset by generating synthetic minority-class samples, which also increases data variation. The results show that SVM performs best, with an accuracy of 0.937, an F1-score of 0.937, a precision of 0.943, and a recall of 0.937. Random Forest also performs well, with an accuracy of 0.918 and an F1-score of 0.918, slightly below SVM. KNN performs worst, with an accuracy of 0.651 and an F1-score of 0.544, while Naive Bayes performs adequately, with an accuracy of 0.845 and an F1-score of 0.841. Based on this evaluation, SVM is recommended as the best model for sentiment analysis of these reviews, with Random Forest as a strong alternative. KNN is not recommended because of its lower performance, while Naive Bayes may be considered for its speed and simplicity, although its results fall short of SVM and Random Forest. These findings can guide the choice of model for analyzing visitor feedback, helping to better understand and improve the visitor experience at Baturaden.
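The five preprocessing stages listed in the abstract (case folding, text cleaning, tokenization, stopword removal, stemming) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the stopword set STOPWORDS is a small hypothetical list, the regular expressions are one plausible definition of "text cleaning", and stemming is a pluggable function (identity by default) into which a real Indonesian stemmer, for example one from the Sastrawi library, could be passed.

```python
import re
from typing import Callable, Iterable, List

# Hypothetical stopword list for illustration only; the study's actual list
# is not given in the abstract.
STOPWORDS = {"dan", "yang", "di", "ke", "the", "and", "is", "a"}


def preprocess(
    review: str,
    stopwords: Iterable[str] = STOPWORDS,
    stem: Callable[[str], str] = lambda token: token,
) -> List[str]:
    """Apply case folding, cleaning, tokenization, stopword removal, stemming."""
    # 1. Case folding: lowercase the whole review.
    text = review.lower()
    # 2. Text cleaning: drop URLs, then replace everything that is not an
    #    ASCII letter or whitespace (digits, punctuation, emoji) with a space.
    text = re.sub(r"https?://\S+", " ", text)
    text = re.sub(r"[^a-z\s]", " ", text)
    # 3. Tokenization: split on runs of whitespace.
    tokens = text.split()
    # 4. Stopword removal.
    stop = set(stopwords)
    tokens = [t for t in tokens if t not in stop]
    # 5. Stemming: identity by default (a placeholder, not a real stemmer).
    return [stem(t) for t in tokens]
```

For example, preprocess("Pemandangan di Baturaden INDAH sekali!!! 10/10") returns ["pemandangan", "baturaden", "indah", "sekali"]: the text is lowercased, punctuation and digits are stripped, and the stopword "di" is removed.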
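SMOTE balances classes by creating synthetic minority-class points on the line segment between a minority sample and one of its nearest minority-class neighbours. The standard-library sketch below shows only that core interpolation step on dense vectors; the function name smote and its parameters are hypothetical, and a real pipeline would more likely rely on an established implementation such as imbalanced-learn's imblearn.over_sampling.SMOTE.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def smote(minority: Sequence[Vector], n_new: int, k: int = 5, seed: int = 0) -> List[Vector]:
    """Generate n_new synthetic minority-class vectors (basic SMOTE idea)."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than exist
    synthetic: List[Vector] = []
    for _ in range(n_new):
        # Pick a random minority sample x.
        i = rng.randrange(len(minority))
        x = minority[i]
        # Rank the other minority samples by Euclidean distance to x
        # and choose one of the k nearest at random.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(x, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        # Interpolate: x + lam * (neighbour - x), with lam drawn from [0, 1).
        lam = rng.random()
        synthetic.append([a + lam * (b - a) for a, b in zip(x, neighbour)])
    return synthetic
```

Because every synthetic point is a convex combination of two existing minority samples, oversampling adds variation inside the minority region rather than duplicating reviews verbatim.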

