Imbalanced Data Classification Using the Random Forest Algorithm with SMOTE and SMOTE-ENN (A Case Study on Stunting Data)

Anju Fauziah; Julan Hernadi

doi:10.30787/restia.v3i2.1906

Authors

Anju Fauziah, Universitas Ahmad Dahlan
Julan Hernadi


Keywords:

Informatics Engineering, Information Systems, Distributed Computer Systems, Artificial Intelligence, artificial intelligence system

Abstract

Random forest is a widely used machine learning classification method, valued for reducing the risk of overfitting while improving overall predictive performance. On class-imbalanced data, however, the algorithm cannot reach its full performance, particularly when predicting minority-class samples. This article therefore applies two resampling methods to balance the data: the Synthetic Minority Oversampling Technique (SMOTE) and the Synthetic Minority Oversampling Technique with Edited Nearest Neighbors (SMOTE-ENN). For classification, the random forest algorithm is applied to the original data and to the data resampled with SMOTE and with SMOTE-ENN. The case study uses stunting data with 421 samples in the majority class and 79 in the minority class. Accuracy is 89% on the original data, 90% on the data resampled with SMOTE-ENN, and 91% on the data resampled with SMOTE. Although the differences are small, resampling with SMOTE gives the best accuracy.
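
The two resampling steps named in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the data are synthetic two-dimensional Gaussian points that only mimic the 421/79 class split, the function names (k_nearest, smote, edited_nearest_neighbours) and the parameters k=5 and k=3 are assumptions chosen for illustration, and the cleaning step uses a simple majority-vote variant of Edited Nearest Neighbours. The random forest classifier itself is left out.

```python
import math
import random
from collections import Counter


def k_nearest(point, candidates, k, skip=None):
    """Indices of the k candidates closest to `point`, ignoring index `skip`."""
    idx = [i for i in range(len(candidates)) if i != skip]
    idx.sort(key=lambda i: math.dist(point, candidates[i]))
    return idx[:k]


def smote(X, y, minority, k=5, seed=0):
    """Add interpolated minority points until both classes have equal size."""
    rng = random.Random(seed)
    minority_pts = [x for x, label in zip(X, y) if label == minority]
    n_needed = max(Counter(y).values()) - len(minority_pts)
    synthetic = []
    for _ in range(n_needed):
        b = rng.randrange(len(minority_pts))
        base = minority_pts[b]
        # Pick one of the k nearest minority neighbours of the base point.
        neighbours = k_nearest(base, minority_pts, k, skip=b)
        other = minority_pts[rng.choice(neighbours)]
        # New point lies on the segment between the base and that neighbour.
        gap = rng.random()
        synthetic.append(tuple(p + gap * (q - p) for p, q in zip(base, other)))
    return X + synthetic, y + [minority] * n_needed


def edited_nearest_neighbours(X, y, k=3):
    """Keep only samples whose label matches the majority of their k neighbours."""
    keep = []
    for i, point in enumerate(X):
        votes = Counter(y[j] for j in k_nearest(point, X, k, skip=i))
        if votes.most_common(1)[0][0] == y[i]:
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


# Toy data (NOT the stunting data): 421 majority and 79 minority points.
rng = random.Random(42)
X = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(421)] + \
    [(rng.gauss(2, 1), rng.gauss(2, 1)) for _ in range(79)]
y = [0] * 421 + [1] * 79

X_sm, y_sm = smote(X, y, minority=1)                # SMOTE only
X_se, y_se = edited_nearest_neighbours(X_sm, y_sm)  # SMOTE followed by ENN

print("original :", Counter(y))
print("SMOTE    :", Counter(y_sm))
print("SMOTE-ENN:", Counter(y_se))
```

SMOTE alone brings the minority class up to the size of the majority class, while the ENN pass afterwards removes samples from either class that sit among neighbours of the other class, so the SMOTE-ENN result is usually somewhat smaller and not exactly balanced. In practice one would more likely rely on library implementations such as SMOTE and SMOTEENN from imbalanced-learn together with RandomForestClassifier from scikit-learn, fitting the resampler on the training split only.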
