Automatic Validation of Student Grade Transcript Documents Using the Optical Character Recognition Method

Authors

  • Zulkarnaen Hatala, Politeknik Negeri Ambon
  • Ahmad Thariq, Politeknik Negeri Ambon
  • Muhammad Hudzaly, Politeknik Negeri Ambon
  • Muhammad Ikhwan Burhan, Politeknik Negeri Ambon

Keywords:

Optical character recognition, Document image verification, Information retrieval

Abstract

At Ambon State Polytechnic, students' semester grade reports are still typed manually. This leads to frequent typographical errors in grades, student identification numbers, and many other label values, any of which can invalidate the document. A Java application has been implemented to detect these errors. It is intended primarily for the Head of Study Program and the Head of Department, who check each report before signing and validating it. These officials benefit because the tedious validation work can be handed over to a computer. Validation relies on optical character recognition using the open-source Tesseract-OCR library. Experimental results show that verification improves when OCR is applied only to specific regions of interest (ROI), located beforehand with the template matching method from OpenCV. Taking the Levenshtein distance into account when comparing label values against the reference database also improves the success rate of the algorithm. The approach, together with the reference database, has been tested on about 800 grade report documents, with a verification success rate above 90%.
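
The Levenshtein-based comparison mentioned in the abstract can be illustrated with a minimal Java sketch. This is not the authors' implementation: the class name `LabelMatcher`, the method `closestMatch`, the `maxDistance` threshold, and the sample course names are all hypothetical, chosen only to show how an OCR-read label with a few misrecognized characters might still be matched against a list of reference values.

```java
import java.util.List;
import java.util.Locale;
import java.util.Optional;

// Hypothetical helper, not taken from the published application.
class LabelMatcher {

    /** Levenshtein (edit) distance via dynamic programming with two rows. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // distance to the empty prefix of b
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertion = curr[j - 1] + 1;
                int deletion = prev[j] + 1;
                int substitution = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insertion, deletion), substitution);
            }
            int[] tmp = prev; // reuse the two rows instead of reallocating
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /**
     * Returns the reference value closest to the OCR text, or empty if even
     * the best candidate is farther away than maxDistance (an assumed threshold).
     */
    static Optional<String> closestMatch(String ocrText, List<String> references, int maxDistance) {
        String normalized = ocrText.trim().toUpperCase(Locale.ROOT);
        String best = null;
        int bestDistance = Integer.MAX_VALUE;
        for (String ref : references) {
            int d = levenshtein(normalized, ref.trim().toUpperCase(Locale.ROOT));
            if (d < bestDistance) {
                bestDistance = d;
                best = ref;
            }
        }
        return bestDistance <= maxDistance ? Optional.ofNullable(best) : Optional.empty();
    }

    public static void main(String[] args) {
        // Illustrative reference values; a real system would load these from its database.
        List<String> courses = List.of("Pemrograman Java", "Basis Data", "Jaringan Komputer");
        System.out.println(closestMatch("Pemr0graman Jawa", courses, 3));
        System.out.println(closestMatch("Kalkulus", courses, 3));
    }
}
```

In this sketch a label is accepted when its edit distance to some reference value is at most `maxDistance`, so small OCR mistakes such as `0` read for `O` do not cause a false rejection, while unrelated text is still flagged. A suitable threshold would have to be tuned on real documents.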

Published

2023-07-11

How to Cite

Hatala, Z., Thariq, A., Hudzaly, M., & Burhan, M. I. (2023). Validasi Otomatis Dokumen Transkrip Nilai Mahasiswa Menggunakan Metoda Optical Character Recognition. KAKIFIKOM (Kumpulan Artikel Karya Ilmiah Fakultas Ilmu Komputer), 5(1), 1–5. Retrieved from https://ejournal.ust.ac.id/index.php/KAKIFIKOM/article/view/2750

Section

Article