Topic Modeling on WhatsApp User Reviews Using Latent Dirichlet Allocation

Iqbal Kharisudin(1), Hera Masri'an(2),


(1) Universitas Negeri Semarang
(2) Universitas Negeri Semarang

Abstract

Purpose: Topic modeling is a practical technique for identifying topics in text data. This study aims to identify the topics discussed in WhatsApp user reviews using Latent Dirichlet Allocation (LDA) and to describe the characteristics of each topic.

Method: We used 1,710 WhatsApp user reviews posted on Google Play between 7 and 13 August 2020. The research followed a qualitative method consisting of five stages: problem identification, data retrieval, preprocessing, modeling, and analysis. The modeling stage consisted of constructing a Document-Term Matrix (DTM), determining the number of iterations and topics, and building the model. We used perplexity as the indicator for choosing the number of iterations and topics; a lower perplexity value indicates better model performance. The analysis stage consisted of examining the top terms and top documents in order to label each topic and describe its characteristics.
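
The two quantities named in the modeling stage can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state which software was used. The helper names `build_dtm` and `perplexity`, the whitespace tokenizer, and the three toy reviews are all assumptions made for illustration. Perplexity follows the standard definition: the exponential of the negative average per-token log-likelihood, where the probability of a word in a document is the document's topic mixture applied to the topic-word distributions.

```python
import math
from collections import Counter


def build_dtm(docs):
    """Return (vocab, dtm), where dtm[d][j] counts term vocab[j] in document d."""
    # Assumption: lowercase + whitespace split stands in for real preprocessing.
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    index = {term: j for j, term in enumerate(vocab)}
    dtm = []
    for toks in tokenized:
        row = [0] * len(vocab)
        for term, count in Counter(toks).items():
            row[index[term]] = count
        dtm.append(row)
    return vocab, dtm


def perplexity(dtm, theta, phi):
    """exp(-log-likelihood / token count), with p(w|d) = sum_k theta[d][k] * phi[k][w].

    theta[d][k]: probability of topic k in document d (document-topic matrix).
    phi[k][w]:   probability of term w under topic k (word-topic matrix).
    """
    log_lik = 0.0
    n_tokens = 0
    for d, row in enumerate(dtm):
        for w, count in enumerate(row):
            if count == 0:
                continue
            p = sum(theta[d][k] * phi[k][w] for k in range(len(phi)))
            log_lik += count * math.log(p)
            n_tokens += count
    return math.exp(-log_lik / n_tokens)


# Toy reviews invented for illustration; they are not from the study's data.
docs = ["video call drops", "voice message fails", "video call quality"]
vocab, dtm = build_dtm(docs)

# Placeholder one-topic model that spreads probability uniformly over the
# vocabulary. A fitted LDA model would supply theta and phi instead.
theta = [[1.0] for _ in docs]
phi = [[1.0 / len(vocab)] * len(vocab)]
uniform_ppl = perplexity(dtm, theta, phi)
```

For the uniform placeholder model the perplexity equals the vocabulary size, which gives a convenient baseline. To select the number of topics and iterations in the way the method describes, one would fit a model for each candidate setting, compute its perplexity, and prefer the setting with the lower value.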

Result: Topic modeling produces word-topic and document-topic assignments. The word-topic assignment identifies the words with the highest probability in each topic (top terms), and the document-topic assignment identifies the documents with the highest probability in each topic (top documents). The most frequently discussed topics were voice and video calls (104 reviews), call quality (86 reviews), photo and video quality (100 reviews), and voice messages (75 reviews).
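
As a rough illustration of how top terms, top documents, and per-topic review counts can be read off the two assignments, the sketch below ranks the entries of a word-topic matrix `phi` and a document-topic matrix `theta`. The matrices, the vocabulary, and the function names are hypothetical. Counting each review under its single most probable topic is also an assumption, since the abstract does not say how the review counts were obtained.

```python
from collections import Counter


def top_terms(phi, vocab, k, n=3):
    """Return the n highest-probability terms for topic k."""
    ranked = sorted(range(len(vocab)), key=lambda j: phi[k][j], reverse=True)
    return [vocab[j] for j in ranked[:n]]


def top_documents(theta, k, n=2):
    """Return the indices of the n documents with the highest probability of topic k."""
    ranked = sorted(range(len(theta)), key=lambda d: theta[d][k], reverse=True)
    return ranked[:n]


def reviews_per_topic(theta):
    """Count documents by dominant topic (assumed counting rule, see lead-in)."""
    dominant = (max(range(len(row)), key=row.__getitem__) for row in theta)
    return dict(Counter(dominant))


# Hypothetical two-topic model over a five-word vocabulary.
vocab = ["call", "video", "voice", "photo", "quality"]
phi = [
    [0.40, 0.30, 0.20, 0.02, 0.08],  # topic 0: call-related terms dominate
    [0.05, 0.20, 0.05, 0.40, 0.30],  # topic 1: media-quality terms dominate
]
theta = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3], [0.4, 0.6]]

terms_topic0 = top_terms(phi, vocab, 0)
terms_topic1 = top_terms(phi, vocab, 1)
docs_topic0 = top_documents(theta, 0)
counts = reviews_per_topic(theta)
```

Reading the top terms together with the top documents is what lets an analyst attach a label such as "voice and video calls" to a topic that the model itself only identifies by index.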

Novelty: This research produced a topic model of WhatsApp user reviews using Latent Dirichlet Allocation. The number of iterations used in modeling was chosen by observing the perplexity value rather than being set arbitrarily.

Keywords

Text Mining; Topic Modeling; Latent Dirichlet Allocation; WhatsApp User Review

Scientific Journal of Informatics (SJI)
p-ISSN 2407-7658 | e-ISSN 2460-0040
Published By Department of Computer Science Universitas Negeri Semarang
Website: https://journal.unnes.ac.id/nju/index.php/sji

This work is licensed under a Creative Commons Attribution 4.0 International License.