The Effect of Rescaling on the Performance of Recognition with Arabic Characters Using Tesseract OCR Based on Long Short Term Memory

  • Timur Gagah Prawiro Department of Computer Science, Faculty of Mathematics and Natural Sciences, Universitas Negeri Semarang, Semarang, Indonesia
  • Arifatul Khasanah Department of Computer Science, Faculty of Mathematics and Natural Sciences, Universitas Negeri Semarang, Semarang, Indonesia
Keywords: Tesseract OCR, Arabic Handwritten, Long Short-Term Memory

Abstract


Recognizing handwritten character images is a field that draws on pattern recognition and image processing through Optical Character Recognition (OCR) technology. Recognition performance on Arabic characters is still not optimal because the script is cursive and relatively difficult to recognize. The Tesseract OCR Engine is a popular open-source OCR framework known for accurate character recognition, and it works well with images at 300 dpi (dots per inch). This study analyzes the effect of rescaling on the recognition of handwritten Arabic characters using the Long Short-Term Memory (LSTM)-based Tesseract OCR Engine, with images scaled to 90%, 80%, 70%, and 60% of the source image size. The effect on recognition performance is measured by character accuracy. The study uses 70 images from the publicly available IFN/ENIT dataset.
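The two measurable steps in the abstract, rescaling an image to a fraction of its source size and scoring OCR output by character accuracy, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the helper names (`rescaled_size`, `resize_nearest`, `levenshtein`, `character_accuracy`) are hypothetical, nearest-neighbour resampling is a stand-in because the abstract does not say which interpolation was used, and character accuracy is assumed to follow the common edit-distance definition (n - errors) / n, clamped at zero. Tesseract itself is not called here; its output text would be passed in as the hypothesis.

```python
from typing import List, Tuple

# Scale factors used in the study, as fractions of the source image size.
SCALES = [0.9, 0.8, 0.7, 0.6]


def rescaled_size(width: int, height: int, scale: float) -> Tuple[int, int]:
    """Target (width, height) after rescaling, rounded to whole pixels (at least 1)."""
    return max(1, round(width * scale)), max(1, round(height * scale))


def resize_nearest(img: List[List[int]], scale: float) -> List[List[int]]:
    """Nearest-neighbour rescale of a grayscale image stored as rows of pixel values.

    Assumption: illustrative stand-in only; the study's interpolation is not stated.
    """
    h, w = len(img), len(img[0])
    new_w, new_h = rescaled_size(w, h, scale)
    return [
        [
            # Map the centre of each destination pixel back to the nearest source pixel.
            img[min(h - 1, int((y + 0.5) * h / new_h))][min(w - 1, int((x + 0.5) * w / new_w))]
            for x in range(new_w)
        ]
        for y in range(new_h)
    ]


def levenshtein(a: str, b: str) -> int:
    """Minimum number of character insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def character_accuracy(reference: str, hypothesis: str) -> float:
    """Assumed definition: (n - edit distance) / n over reference code points, clamped at 0."""
    if not reference:
        return 1.0 if not hypothesis else 0.0
    errors = levenshtein(reference, hypothesis)
    return max(0.0, (len(reference) - errors) / len(reference))


if __name__ == "__main__":
    for s in SCALES:
        print(s, rescaled_size(1000, 300, s))
    # One missing letter in a four-letter word gives 0.75.
    print(character_accuracy("كتاب", "كتب"))
```

Because Python strings are indexed by code point, the accuracy is computed per Unicode character; if the ground truth includes Arabic diacritics, each diacritic counts as its own character under this assumed definition.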

Published
2020-10-30
How to Cite
Prawiro, T., & Khasanah, A. (2020). The Effect of Rescaling on the Performance of Recognition with Arabic Characters Using Tesseract OCR Based on Long Short Term Memory. Journal of Advances in Information Systems and Technology, 2(2), 59-62. https://doi.org/10.15294/jaist.v2i2.44311