Application of the Naïve Bayes Classifier Algorithm using N-Gram and Information Gain to Improve the Accuracy of Restaurant Review Sentiment Analysis

  • Apriani Solikhatun Department of Computer Science, Faculty of Mathematics and Natural Sciences, Universitas Negeri Semarang, Semarang, Indonesia
  • Endang Sugiharti Department of Computer Science, Faculty of Mathematics and Natural Sciences, Universitas Negeri Semarang, Semarang, Indonesia
Keywords: Text Mining, Sentiment Analysis, Naïve Bayes Classifier, N-Gram, Information Gain

Abstract

Consumer reviews strongly influence other people's decisions. Whether a review is positive or negative can be identified through sentiment analysis. One of the popular techniques in sentiment analysis is the Naïve Bayes Classifier (NBC) algorithm, which is widely used because of its good performance. The purpose of this study was to improve the classifier's accuracy in restaurant review sentiment analysis by applying N-Gram for feature extraction and Information Gain for feature selection. N-Gram is used to produce new, more varied features, while Information Gain selects the relevant features with the highest weights. The dataset used in this study is the sentiment-labelled dataset from the UCI Machine Learning Repository. The NBC alone achieved an accuracy of 82.5%, whereas the NBC combined with N-Gram and Information Gain achieved an accuracy of 86%. It can therefore be concluded that applying N-Gram and Information Gain to the NBC algorithm improved the classification accuracy of restaurant review sentiment analysis by 3.5 percentage points.
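
The pipeline described in the abstract (N-Gram feature extraction, then Information Gain feature selection, then Naïve Bayes classification) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the n-gram range, the number of selected features, the tokenization, or the Naïve Bayes variant, so the sketch assumes word unigrams plus bigrams, Information Gain computed over binary feature presence, a hypothetical top-k cutoff (`k=5`), and a multinomial Naïve Bayes model with Laplace smoothing. The function and class names (`ngrams`, `information_gain`, `select_top_k`, `MultinomialNB`, `classify`) and the six toy reviews are hypothetical and are not drawn from the UCI dataset.

```python
import math
from collections import Counter

def ngrams(text, n_max=2):
    """Word n-grams of length 1..n_max (assumed: unigrams and bigrams)."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)]

def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(docs, labels):
    """IG(t) = H(C) - [P(t) H(C|t) + P(not t) H(C|not t)], t = 'feature present'."""
    n = len(docs)
    base = entropy(labels)
    doc_sets = [set(d) for d in docs]
    scores = {}
    for t in set().union(*doc_sets):
        with_t = [y for s, y in zip(doc_sets, labels) if t in s]
        without_t = [y for s, y in zip(doc_sets, labels) if t not in s]
        cond = sum(len(p) / n * entropy(p) for p in (with_t, without_t) if p)
        scores[t] = base - cond
    return scores

def select_top_k(scores, k):
    """Keep the k highest-IG features; ties are broken alphabetically."""
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return set(ranked[:k])

class MultinomialNB:
    """Multinomial Naive Bayes with Laplace (add-alpha) smoothing (assumed variant)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = set().union(*map(set, docs))
        self.log_prior = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        counts = {c: Counter() for c in self.classes}
        for d, y in zip(docs, labels):
            counts[y].update(d)
        v = len(self.vocab)
        self.log_lik = {}
        for c in self.classes:
            total = sum(counts[c].values())
            self.log_lik[c] = {f: math.log((counts[c][f] + self.alpha) / (total + self.alpha * v))
                               for f in self.vocab}
        return self

    def predict(self, doc):
        # Sum log prior and log likelihoods; features unseen in training are ignored.
        scores = {c: self.log_prior[c] + sum(self.log_lik[c][f] for f in doc if f in self.vocab)
                  for c in self.classes}
        return max(scores, key=scores.get)

# Hypothetical toy reviews (1 = positive, 0 = negative); NOT the UCI data.
train = [
    ("the food was great", 1),
    ("great service and tasty food", 1),
    ("not good at all", 0),
    ("the food was not good", 0),
    ("loved the tasty dessert", 1),
    ("service was slow and rude", 0),
]
texts, labels = zip(*train)
labels = list(labels)
docs = [ngrams(t) for t in texts]                               # N-gram feature extraction
selected = select_top_k(information_gain(docs, labels), k=5)    # Information Gain selection
model = MultinomialNB().fit([[f for f in d if f in selected] for d in docs], labels)

def classify(text):
    """Extract n-grams, keep only the selected features, then apply Naive Bayes."""
    return model.predict([f for f in ngrams(text) if f in selected])

print(sorted(selected))
print(classify("tasty food"), classify("service was not good"))
```

In this toy run, bigrams such as "not good" survive selection because they separate the classes, which illustrates the abstract's point that N-Gram adds more varied features and Information Gain keeps the informative ones. Accuracy on six sentences says nothing about the 82.5% and 86% figures reported for the real dataset.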

Published
2020-10-30
How to Cite
Solikhatun, A., & Sugiharti, E. (2020). Application of the Naïve Bayes Classifier Algorithm using N-Gram and Information Gain to Improve the Accuracy of Restaurant Review Sentiment Analysis. Journal of Advances in Information Systems and Technology, 2(2), 11-20. https://doi.org/10.15294/jaist.v2i2.44303