Automatic Validation of Student Grade Transcript Documents Using the Optical Character Recognition Method
Keywords:
Optical character recognition, Document image verification, Information retrieval
Abstract
At the Ambon State Polytechnic, students' semester grade reports are still typed manually. This frequently introduces typographical errors, such as incorrect grades, student identification numbers, and other label values, that can invalidate the document. A Java application has been implemented to detect these errors. It is intended primarily for the officials who sign and validate the reports, namely the Head of Study Program and the Head of Department. These officials benefit because tedious manual validation can be delegated to a computer. Validation relies on optical character recognition from the open source library Tesseract-OCR. The experimental results show that verification improves when OCR is applied to specific regions of interest (ROI) located beforehand with the template matching method from OpenCV. Taking the Levenshtein distance into account when comparing label values against the reference database also improves the success rate of the algorithm. The system was tested on a database of about 800 grade report documents, with a verification success rate above 90%.
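To illustrate how Levenshtein distance can tolerate small OCR misreads when a label value is compared against reference entries, here is a minimal Java sketch. It is not taken from the application itself: the class name `LabelMatcher`, the method names, the upper-case normalization, and the `maxDistance` threshold are all assumptions made for this example, and the abstract does not state how the real comparison is configured.

```java
import java.util.List;
import java.util.Locale;
import java.util.Optional;

/** Hypothetical helper: fuzzy lookup of an OCR-read label value in a reference list. */
class LabelMatcher {

    /** Levenshtein distance computed by dynamic programming with two rolling rows. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning an empty prefix of a into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into an empty string takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertOrDelete = Math.min(curr[j - 1] + 1, prev[j] + 1);
                curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            int[] tmp = prev; // swap rows so prev always holds the last finished row
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /**
     * Returns the reference entry closest to the OCR text, or empty when even
     * the best candidate is farther away than maxDistance (an assumed threshold).
     */
    static Optional<String> closestMatch(String ocrText, List<String> references, int maxDistance) {
        String normalized = ocrText.trim().toUpperCase(Locale.ROOT);
        String best = null;
        int bestDistance = Integer.MAX_VALUE;
        for (String candidate : references) {
            int d = levenshtein(normalized, candidate.trim().toUpperCase(Locale.ROOT));
            if (d < bestDistance) {
                bestDistance = d;
                best = candidate;
            }
        }
        return bestDistance <= maxDistance ? Optional.ofNullable(best) : Optional.empty();
    }

    public static void main(String[] args) {
        // Illustrative reference values only; not data from the study.
        List<String> courses = List.of("DIGITAL SIGNAL PROCESSING", "COMPUTER NETWORKS");
        System.out.println(closestMatch("D1GITAL SIGNAL PR0CESSING", courses, 3));
    }
}
```

In this sketch an OCR reading with two misread characters still resolves to the intended reference entry, while a reading that is far from every entry is rejected and could be flagged for manual review. The right threshold would depend on label length and OCR quality, which this example does not try to estimate.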
License
Copyright (c) 2023 KAKIFIKOM (Kumpulan Artikel Karya Ilmiah Fakultas Ilmu Komputer)
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.