Implementation of Support Vector Machine Algorithm with Correlation-Based Feature Selection and Term Frequency Inverse Document Frequency for Sentiment Analysis Review Hotel

Novia Puji Ririanti(1), Aji Purwinarko(2)


(1) Universitas Negeri Semarang
(2) Universitas Negeri Semarang

Abstract

Purpose: This study aims to reduce the number of irrelevant features in sentiment analysis on data with a large feature set. Methods/Study design/approach: The Support Vector Machine (SVM) algorithm is used to classify the sentiment of hotel reviews because it handles large datasets well. Term Frequency-Inverse Document Frequency (TF-IDF) is used to assign weights to the features in the dataset. Result/Findings: SVM with TF-IDF achieves an accuracy of 93.14%. Combining SVM with TF-IDF and Correlation-Based Feature Selection (CFS) raises the accuracy by 1.18 percentage points, from 93.14% to 94.32%. Novelty/Originality/Value: CFS is used for the feature selection process; it reduces the number of irrelevant features by ranking feature subsets according to the strength of the correlation of each feature.
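The abstract does not spell out the weighting or selection formulas, so the following is only a minimal sketch under stated assumptions: TF-IDF is taken as raw term frequency times log(N/df), and CFS is taken as the commonly cited merit score k·r_cf / sqrt(k + k(k-1)·r_ff) with absolute Pearson correlation and a greedy forward search. The four toy reviews, their labels, and all function names are hypothetical and are not taken from the study's dataset or code. The SVM step is omitted; the selected columns would simply be passed on to the classifier.

```python
import math
from collections import Counter

def tfidf_matrix(docs):
    """Return (vocabulary, rows) where each weight is tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    # Document frequency: in how many reviews each term occurs.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    rows = []
    for toks in tokenized:
        tf = Counter(toks)
        rows.append([tf[t] * math.log(n_docs / df[t]) for t in vocab])
    return vocab, rows

def pearson(x, y):
    """Pearson correlation; defined as 0.0 when either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def cfs_merit(columns, labels, subset):
    """CFS merit of a feature subset (assumed form, see lead-in)."""
    k = len(subset)
    if k == 0:
        return 0.0
    # Mean feature-class correlation.
    r_cf = sum(abs(pearson(columns[j], labels)) for j in subset) / k
    # Mean feature-feature correlation over all pairs in the subset.
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    r_ff = (sum(abs(pearson(columns[a], columns[b])) for a, b in pairs)
            / len(pairs)) if pairs else 0.0
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)

def cfs_forward_select(columns, labels, tol=1e-12):
    """Greedily add the feature that most improves merit; stop when none does."""
    selected, best = [], 0.0
    remaining = list(range(len(columns)))
    while remaining:
        score, j = max((cfs_merit(columns, labels, selected + [j]), j)
                       for j in remaining)
        if score <= best + tol:
            break
        selected.append(j)
        remaining.remove(j)
        best = score
    return selected, best

# Hypothetical toy reviews: 1 = positive, 0 = negative.
docs = [
    "clean room friendly staff",
    "dirty room rude staff",
    "friendly staff great location",
    "rude staff dirty bathroom",
]
labels = [1, 0, 1, 0]

vocab, rows = tfidf_matrix(docs)
columns = [list(col) for col in zip(*rows)]  # one list per feature
selected, merit = cfs_forward_select(columns, labels)
print("selected terms:", [vocab[j] for j in selected], "merit:", merit)
```

In this toy setting a term that appears in every review (here "staff") gets a TF-IDF weight of zero and carries no correlation with the label, so it contributes nothing to the merit, and redundant terms are penalised through the feature-feature term in the denominator. This is the sense in which a correlation-based filter can shrink the feature set before classification.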

Keywords

Support Vector Machine; Correlation-Based Feature Selection; Term Frequency-Inverse Document Frequency; Hotel Reviews







Scientific Journal of Informatics (SJI)
p-ISSN 2407-7658 | e-ISSN 2460-0040
Published By Department of Computer Science Universitas Negeri Semarang
Website: https://journal.unnes.ac.id/nju/index.php/sji

This work is licensed under a Creative Commons Attribution 4.0 International License.