The Effect of Rescaling on the Performance of Recognition with Arabic Characters Using Tesseract OCR Based on Long Short Term Memory
Abstract
Recognizing handwritten character images is a research area within pattern recognition and image processing, addressed by Optical Character Recognition (OCR) technology. Recognition performance for Arabic characters is still not optimal because Arabic script is cursive and relatively difficult to recognize. The Tesseract OCR Engine is a popular open-source OCR framework known for accurate character recognition. It works well with images at 300 dpi (dots per inch). This study analyzes the effect of rescaling on the recognition of Arabic handwritten characters using the Long Short-Term Memory (LSTM)-based Tesseract OCR Engine. Images are scaled to 90%, 80%, 70%, and 60% of the source image size. The effect on recognition performance is measured by character accuracy. The study uses 70 images from the publicly available IFN/ENIT dataset.
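The abstract does not state the exact formulas used. The sketch below assumes two things. First, each rescaled image keeps its aspect ratio and is rounded to whole pixels. Second, character accuracy follows the common edit-distance definition (n − errors) / n, where n is the number of ground-truth characters and errors is the Levenshtein distance to the OCR output. The helper names (`scaled_size`, `levenshtein`, `character_accuracy`) are hypothetical. The code does not call Tesseract. Actual resizing would be done with an image library, for example Pillow's `Image.resize`.

```python
# Hypothetical evaluation helpers; not the authors' published code.

SCALES = (0.9, 0.8, 0.7, 0.6)  # scaling factors listed in the abstract


def scaled_size(width: int, height: int, factor: float) -> tuple[int, int]:
    """Return (width, height) after rescaling by `factor`, rounded to whole pixels."""
    return max(1, round(width * factor)), max(1, round(height * factor))


def levenshtein(a: str, b: str) -> int:
    """Minimum number of character insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def character_accuracy(ground_truth: str, ocr_output: str) -> float:
    """Assumed definition: (n - errors) / n with n = len(ground_truth).

    The value can be negative if the OCR output contains many extra characters.
    """
    n = len(ground_truth)
    if n == 0:
        raise ValueError("ground truth must be non-empty")
    return (n - levenshtein(ground_truth, ocr_output)) / n


if __name__ == "__main__":
    # A 1000x300 source image at each scale factor from the study.
    for s in SCALES:
        print(s, scaled_size(1000, 300, s))
    # One dropped letter in a four-letter Arabic word gives 3/4 accuracy.
    print(character_accuracy("محمد", "محد"))
```

Edit distance is used because OCR errors on cursive script include dropped and merged letters, not only substitutions. A position-by-position comparison would misalign after the first missing character.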
Copyright (c) 2020 Journal of Advances in Information Systems and Technology
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.