Ultra-Low-Cost Hybrid OCR–LLM Architecture for Production Grade E-KTP Extraction
DOI:
https://doi.org/10.15294/sji.v12i4.38200
Keywords:
e-KTP Extraction, Optical character recognition, Large language model, Microservices, Privacy-Preserving System
Abstract
Purpose: This study addresses the limitations of low-cost ID card data extraction services while preserving privacy. It aims for reliable operation on minimal infrastructure, in particular without any dependency on GPU-based servers.
Method: The proposed approach is a microservice pipeline with three stages: (1) lightweight local pre-processing on the device, (2) CPU-based OCR with Tesseract.js, and (3) fast text tokenization through a small pretrained external LLM. The system is implemented as a TypeScript backend using the Hono framework, with all image processing performed locally to keep user data private.
Result: Experimental evaluation with real ID card samples shows that the system runs stably on a low-performance VPS (1 vCPU, 1 GB RAM) at an operating cost of approximately IDR 2.5047 per extraction, with accuracy acceptable for production use. The results also indicate that system latency is dominated by cloud LLM inference.
Novelty: The main contribution and novelty of this study is that we demonstrate, for the first time, a cost-effective, privacy-preserving hybrid OCR-LLM pipeline that does not require expensive GPU-based models at scale. This makes the system suitable for on-premises or edge environments with limited storage and compute, such as small organizations and micro-SaaS services.
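
Illustrative sketch: the three-stage pipeline described under Method could be organized roughly as follows. This is a minimal, dependency-free sketch and not the authors' implementation; every name here (`extractIdCard`, `PipelineStages`, `StageTimings`) is hypothetical. The stages are injected as functions, so a real deployment might wrap Tesseract.js in `ocr` and an HTTP call to the external LLM in `structure`. Only the OCR text, never the image, is passed to the LLM stage, and per-stage timings are recorded so that a latency breakdown like the one reported under Result can be observed.

```typescript
// Hypothetical stage signatures; none of these names come from the paper.
interface PipelineStages {
  // Stage 1: local, lightweight image clean-up (e.g. grayscale, resize).
  preprocess: (image: Uint8Array) => Promise<Uint8Array>;
  // Stage 2: CPU-based OCR that returns raw text.
  ocr: (image: Uint8Array) => Promise<string>;
  // Stage 3: external LLM call that turns raw text into named fields.
  structure: (rawText: string) => Promise<Record<string, string>>;
}

interface StageTimings {
  preprocessMs: number;
  ocrMs: number;
  llmMs: number;
}

interface ExtractionResult {
  fields: Record<string, string>;
  timings: StageTimings;
}

// Runs an async function and returns its value with the elapsed wall time.
async function timed<T>(fn: () => Promise<T>): Promise<[T, number]> {
  const start = Date.now();
  const value = await fn();
  return [value, Date.now() - start];
}

// Chains the three stages in order. The image is handled only by the two
// local stages; the LLM stage receives text alone.
async function extractIdCard(
  image: Uint8Array,
  stages: PipelineStages
): Promise<ExtractionResult> {
  const [cleaned, preprocessMs] = await timed(() => stages.preprocess(image));
  const [rawText, ocrMs] = await timed(() => stages.ocr(cleaned));
  const [fields, llmMs] = await timed(() => stages.structure(rawText));
  return { fields, timings: { preprocessMs, ocrMs, llmMs } };
}
```

Injecting the stages keeps the sketch testable with stubs and makes the privacy boundary explicit in the types: `structure` accepts a `string`, so image bytes cannot reach the external service through it.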
