Performance Evaluation of Otsu and Sauvola Thresholding for Structured Document Binarization
DOI: https://doi.org/10.15294/sji.v13i1.40245
Keywords: Image thresholding, Otsu, Sauvola, CLAHE, Document scanned
Abstract
Purpose: Digitizing public administration records, particularly structured forms such as the Transport of Plants and Wildlife Abroad (Surat Angkut Tumbuhan dan Satwa Liar Luar Negeri / SATS-LN), requires careful preprocessing so that subsequent analysis is accurate. Most images in the SATS-LN archives are scans with inconsistent lighting, varying resolution, and background noise. These make it difficult to separate the text from the background and read it clearly. This work identifies the binarization approach that best preserves textual structure and suppresses background artifacts in SATS-LN documents.
Methods: A four-stage pipeline is used. First, Detectron2 localizes seven key SATS-LN fields. Second, the fields are binarized with global Otsu and adaptive Sauvola thresholding under three parameter configurations. Third, Contrast-Limited Adaptive Histogram Equalization (CLAHE) is applied after binarization to boost local contrast. Finally, Peak Signal-to-Noise Ratio (PSNR), Mean Squared Error (MSE), Structural Similarity Index Measure (SSIM), Distance Reciprocal Distortion (DRD), Precision, Recall, F1-score, and Foreground Ratio are computed on 200 annotated SATS-LN documents (150 scanner-acquired, DOC; 50 camera-captured, CAM).
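The two thresholding rules compared in the Methods can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions, not the authors' implementation. It assumes an 8-bit grayscale field crop given as a list of rows, with dark text on a light background. It also assumes a Sauvola dynamic range of R = 128 and a local window clipped at the image border.

Global Otsu picks the threshold that maximizes the between-class variance. Sauvola computes a per-pixel threshold T = m(1 + k(s/R − 1)) from the local mean m and standard deviation s.

The helper names are hypothetical. Reading the suffixes in Sauvola_k04 and Sauvola_k05 as k = 0.4 and k = 0.5 is also an assumption. The abstract does not explain what distinguishes the Otsu_T10 variant, so that variant is not modeled here.

```python
import math


def otsu_threshold(gray):
    """Return the global Otsu threshold t for an 8-bit image (list of rows).

    Pixels <= t form the dark (text) class, pixels > t the background class.
    """
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))

    w_b = 0    # pixel count of the dark class (values <= t)
    sum_b = 0  # intensity sum of the dark class
    best_t, best_score = 0, -1.0
    for t in range(256):
        w_b += hist[t]
        sum_b += t * hist[t]
        w_f = total - w_b
        if w_b == 0:
            continue
        if w_f == 0:
            break
        mean_b = sum_b / w_b
        mean_f = (sum_all - sum_b) / w_f
        # Between-class variance up to the constant factor 1 / total**2,
        # which does not change which t maximizes it.
        score = w_b * w_f * (mean_b - mean_f) ** 2
        if score > best_score:
            best_t, best_score = t, score
    return best_t


def binarize_global(gray, t):
    """Map pixels above t to white background (255), the rest to black text (0)."""
    return [[255 if v > t else 0 for v in row] for row in gray]


def sauvola_binarize(gray, window=25, k=0.5, r=128.0):
    """Sauvola thresholding, T = m * (1 + k * (s / r - 1)), window clipped at borders."""
    h, w = len(gray), len(gray[0])
    # Zero-padded integral images of pixel values and squared pixel values.
    s1 = [[0] * (w + 1) for _ in range(h + 1)]
    s2 = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = row_sq = 0
        for x in range(w):
            v = gray[y][x]
            row_sum += v
            row_sq += v * v
            s1[y + 1][x + 1] = s1[y][x + 1] + row_sum
            s2[y + 1][x + 1] = s2[y][x + 1] + row_sq

    def box(s, y0, y1, x0, x1):
        # Sum over rows y0..y1-1 and columns x0..x1-1 in O(1).
        return s[y1][x1] - s[y0][x1] - s[y1][x0] + s[y0][x0]

    half = window // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        y0, y1 = max(0, y - half), min(h, y + half + 1)
        for x in range(w):
            x0, x1 = max(0, x - half), min(w, x + half + 1)
            n = (y1 - y0) * (x1 - x0)
            mean = box(s1, y0, y1, x0, x1) / n
            var = max(box(s2, y0, y1, x0, x1) / n - mean * mean, 0.0)
            t = mean * (1.0 + k * (math.sqrt(var) / r - 1.0))
            out[y][x] = 255 if gray[y][x] > t else 0
    return out
```

For example, take a 9×9 patch of background value 200 crossed by a one-pixel vertical stroke of value 20. On that patch, `otsu_threshold` returns 20. Both `binarize_global` and `sauvola_binarize(..., window=5)` keep only the stroke column black.

Integral images make each local mean and variance an O(1) lookup. That cost matters at the 25×25 window size reported for the study.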
Result: Binarization performance on the 200 SATS-LN documents depends on both the acquisition domain and the evaluation metric. Global Otsu_T10 achieves the highest median PSNR (21.19 dB) and the lowest median MSE (494.69), indicating a visually cleaner background. However, segmentation-based metrics favor Sauvola for stroke preservation: in the DOC domain, Sauvola_k05 achieves the strongest text–background separation (F1 = 0.938). In the CAM domain, where illumination variability dominates, Sauvola performs better on both structural and segmentation metrics. There, Sauvola_k04 performs best overall (F1 = 0.980) and mitigates the over-segmentation tendency of strict global thresholds. With a 25×25 Sauvola window and a CLAHE clip limit of 1.0, these results suggest using Sauvola_k05 for DOC and Sauvola_k04 for CAM to preserve text integrity and reduce background artifacts.
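The reported figures can be cross-checked against standard metric definitions. This is a minimal sketch that assumes the following; the abstract does not state the study's exact formulas.

- Images are 8-bit, and PSNR = 10 log10(255² / MSE).
- Text pixels (value 0) are the positive class for Precision, Recall, and F1.
- Foreground Ratio is the fraction of pixels labeled as text.

The function names are hypothetical.

```python
import math


def mse(pred, ref):
    """Mean squared error between two equally sized 8-bit images (lists of rows)."""
    total, n = 0.0, 0
    for p_row, r_row in zip(pred, ref):
        for p, r in zip(p_row, r_row):
            total += (p - r) ** 2
            n += 1
    return total / n


def psnr_from_mse(m, peak=255.0):
    """PSNR in dB for a given MSE, assuming an 8-bit peak value of 255."""
    if m == 0:
        return math.inf
    return 10.0 * math.log10(peak * peak / m)


def segmentation_scores(pred, ref, text_value=0):
    """Precision, recall, F1, and foreground ratio with text pixels as positives."""
    tp = fp = fn = n = 0
    for p_row, r_row in zip(pred, ref):
        for p, r in zip(p_row, r_row):
            p_text, r_text = p == text_value, r == text_value
            if p_text and r_text:
                tp += 1
            elif p_text:
                fp += 1
            elif r_text:
                fn += 1
            n += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    foreground_ratio = (tp + fp) / n  # share of pixels predicted as text
    return {"precision": precision, "recall": recall, "f1": f1,
            "foreground_ratio": foreground_ratio}
```

Under this PSNR definition, `psnr_from_mse(494.69)` evaluates to about 21.19 dB. This is consistent with the median MSE and PSNR pair reported for Otsu_T10. It also illustrates why a low pixel-level error and a high F1 can favor different methods: MSE weighs every pixel equally, while F1 looks only at how text pixels are classified.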
Novelty: This study presents a novel field-level binarization assessment that combines automated cropping and ground-truth evaluation, providing practical guidance for robust preprocessing that supports scalable, reliable, and cross-device public document digitization.
