Indonesian News Text Summarization Using the mBART Algorithm

Rahma Hayuning Astuti(1), Muljono Muljono(2), Sutriawan Sutriawan(3)

(1) Department of Computer Science, Universitas Dian Nuswantoro, Indonesia
(2) Department of Computer Science, Universitas Dian Nuswantoro, Indonesia
(3) Department of Computer Science, Universitas Muhammadiyah Bima, Indonesia

Abstract

Purpose: Advances in technology have led to the production of vast amounts of textual data. Textual information can be found in many sources, including blogs, news portals, and websites; portals such as Kompas, BBC, Liputan 6, and CNN publish news in Indonesian. The purpose of this study was to explore the effectiveness of mBART for text summarization in Bahasa Indonesia.

Methods: This study fine-tunes mBART, a multilingual transformer-based sequence-to-sequence model, to generate summaries of news articles in Bahasa Indonesia. The quality of the generated summaries was evaluated using the ROUGE metric.
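
To make the setup concrete, the following is a minimal sketch of mBART fine-tuning and summary generation with the Hugging Face transformers library. The checkpoint name (facebook/mbart-large-50), the Indonesian language code id_ID, and all hyperparameters are illustrative assumptions; the paper does not state its exact configuration.

import torch
from transformers import MBartForConditionalGeneration, MBart50TokenizerFast

# Assumed checkpoint; the paper only states that mBART was fine-tuned.
checkpoint = "facebook/mbart-large-50"
tokenizer = MBart50TokenizerFast.from_pretrained(
    checkpoint, src_lang="id_ID", tgt_lang="id_ID"  # Indonesian source and target
)
model = MBartForConditionalGeneration.from_pretrained(checkpoint)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)  # assumed learning rate

article = "Teks artikel berita berbahasa Indonesia ..."  # placeholder article
reference = "Ringkasan referensi artikel tersebut ..."   # placeholder summary

# One fine-tuning step: supplying `labels` makes the model return the
# cross-entropy loss between its predictions and the reference summary.
batch = tokenizer(article, text_target=reference, max_length=512,
                  truncation=True, return_tensors="pt")
loss = model(**batch).loss
loss.backward()
optimizer.step()
optimizer.zero_grad()

# Inference with beam search; mBART-50 expects the decoder to be forced to
# start with the target-language token.
model.eval()
inputs = tokenizer(article, max_length=512, truncation=True, return_tensors="pt")
summary_ids = model.generate(
    **inputs,
    forced_bos_token_id=tokenizer.lang_code_to_id["id_ID"],
    num_beams=4,
    max_length=84,  # assumed summary length budget
)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))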

Results: Evaluation with ROUGE yielded scores of 35.94 for ROUGE-1, 16.43 for ROUGE-2, and 29.91 for ROUGE-L. However, the model's performance is still not optimal compared with existing text summarization models for other languages.
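
For reference, ROUGE scores of this kind can be computed with the rouge-score package; the sketch below is only illustrative, with made-up reference and candidate strings, and disables the English-specific stemmer since the texts are Indonesian.

from rouge_score import rouge_scorer

# ROUGE-1/2 count unigram/bigram overlap with the reference; ROUGE-L uses the
# longest common subsequence. The stemmer is English-specific, so it is off.
scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=False)

reference = "pemerintah mengumumkan kebijakan baru untuk sektor pendidikan"  # made-up
candidate = "pemerintah umumkan kebijakan pendidikan baru"                   # made-up

scores = scorer.score(reference, candidate)
for name, s in scores.items():
    # Each entry carries precision, recall, and F1 ("fmeasure").
    print(f"{name}: P={s.precision:.2f} R={s.recall:.2f} F1={s.fmeasure:.2f}")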

Novelty: The novelty of this research lies in the application of mBART to text summarization specifically adapted for Bahasa Indonesia. The findings also contribute to understanding the challenges and opportunities in improving text summarization techniques for the Indonesian context.

Keywords

Abstractive text summarization; mBART; ROUGE; News


