Automatic Validation of Student Grade Transcript Documents Using the Optical Character Recognition Method

Zulkarnaen Hatala; Ahmad Thariq; Muhammad Hudzaly; Muhammad Ikhwan Burhan

Authors

Zulkarnaen Hatala, Politeknik Negeri Ambon
Ahmad Thariq, Politeknik Negeri Ambon
Muhammad Hudzaly, Politeknik Negeri Ambon
Muhammad Ikhwan Burhan, Politeknik Negeri Ambon

Keywords:

Optical character recognition, Document image verification, Information retrieval

Abstract

At the Ambon State Polytechnic, students' semester grade reports are still typed manually. This leads to frequent typing errors, which can invalidate the document, particularly when they affect grades, student identification numbers, or other label values. A Java application has been implemented to detect these errors. It is intended primarily for the officials who sign and validate the report, such as the Head of the Study Program and the Head of the Department. These officials benefit because the tedious validation work can be handed over to a computer. Validation relies on optical character recognition provided by the open-source Tesseract-OCR library. Experimental results show that verification improves when OCR is applied to specific regions of interest (ROI) that are first located with the template matching method from OpenCV. Taking the Levenshtein distance into account when comparing label values against the reference database also improves the success rate of the algorithm. The system was tested on about 800 grade report documents, with a verification success rate above 90%.
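The Levenshtein comparison mentioned in the abstract can be illustrated with a minimal Java sketch. This is not the authors' implementation: the class name LabelMatcher, the method closestMatch, the sample course names, and the distance threshold of 2 are all assumptions chosen for illustration. The sketch shows how an OCR result with a few misread characters can still be matched to a value in a reference list, while text that is too far from every reference value is rejected.

```java
import java.util.List;
import java.util.Locale;
import java.util.Optional;

/** Hypothetical helper: fuzzy comparison of OCR output against reference values. */
final class LabelMatcher {

    /** Dynamic-programming Levenshtein distance, keeping only two rows of the table. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // building the first j characters of b from "" takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // reducing the first i characters of a to "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1, prev[j] + 1), // insertion, deletion
                        prev[j - 1] + cost);                    // match or substitution
            }
            int[] tmp = prev; // swap rows so prev always holds the latest completed row
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /**
     * Returns the reference value closest to the OCR text, or empty when even the
     * closest one is more than maxDistance edits away (threshold is an assumption).
     */
    static Optional<String> closestMatch(String ocrText, List<String> references, int maxDistance) {
        String normalized = ocrText.trim().toUpperCase(Locale.ROOT);
        String best = null;
        int bestDistance = Integer.MAX_VALUE;
        for (String ref : references) {
            int d = levenshtein(normalized, ref.trim().toUpperCase(Locale.ROOT));
            if (d < bestDistance) {
                bestDistance = d;
                best = ref;
            }
        }
        return (best != null && bestDistance <= maxDistance) ? Optional.of(best) : Optional.empty();
    }

    public static void main(String[] args) {
        // Illustrative reference values; a real system would load them from its database.
        List<String> courses = List.of("ALGORITMA DAN PEMROGRAMAN", "BASIS DATA", "JARINGAN KOMPUTER");
        System.out.println(closestMatch("BAS1S DATA", courses, 2));   // OCR misread 'I' as '1'
        System.out.println(closestMatch("FISIKA DASAR", courses, 2)); // not in the reference list
    }
}
```

A small threshold tolerates typical single-character OCR confusions while still flagging values that are genuinely absent from the reference data. The right threshold would depend on label length and would need tuning on real documents.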

