Comparative Performance Analysis of Data Classification Algorithms Using the K-NN, Naive Bayes, and Decision Tree Methods on the UCI Iris Dataset

Muhammad Dicky Azhary Octavianto; Tata Subtari

Authors

Muhammad Dicky Azhary Octavianto, Universitas Bina Darma, Palembang
Tata Subtari, Universitas Bina Darma, Palembang

Keywords:

Data Mining, Classification, K-NN, Naive Bayes, Decision Tree, Iris Dataset

Abstract

Data classification is a core technique in data mining and machine learning, widely used to assign data to predefined classes. This study analyzes and compares the performance of three classification algorithms, K-Nearest Neighbor (K-NN), Naive Bayes, and Decision Tree, on the Iris dataset from the UCI Machine Learning Repository. The dataset consists of 150 samples with four feature attributes and three target classes. The algorithms were tested using 10-fold cross-validation, and performance was measured with accuracy, precision, recall, and F1-score. K-NN achieved the highest accuracy at 96.67%, followed by Decision Tree at 95.33% and Naive Bayes at 94.00%. These findings indicate that the choice of classification algorithm affects how well the data are classified.
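The evaluation procedure described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' actual code: the use of scikit-learn, the hyperparameters (`n_neighbors=5`, a Gaussian Naive Bayes variant, a default decision tree), the stratified and shuffled folds, the macro averaging of precision, recall, and F1, and the random seeds are all assumptions, since the abstract does not state them. The scores it prints therefore need not match the reported figures.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Iris: 150 samples, 4 features, 3 classes (bundled copy of the UCI data).
X, y = load_iris(return_X_y=True)

# Hyperparameters below are assumptions; the abstract does not specify them.
models = {
    "K-NN": KNeighborsClassifier(n_neighbors=5),
    "Naive Bayes": GaussianNB(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
}

# 10-fold cross-validation; stratification and shuffling are assumed choices.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

# Macro averaging treats the three classes equally (also an assumption).
scoring = ["accuracy", "precision_macro", "recall_macro", "f1_macro"]

results = {}
for name, model in models.items():
    scores = cross_validate(model, X, y, cv=cv, scoring=scoring)
    # Average each metric over the 10 test folds.
    results[name] = {m: scores[f"test_{m}"].mean() for m in scoring}

for name, metrics in results.items():
    print(name, {m: round(float(v), 4) for m, v in metrics.items()})
```

Using the same `cv` object for all three models means each algorithm is scored on identical train/test splits, which keeps the comparison fair.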
