Artificial Neural Network-Based Diabetes Prediction Analysis Using CDC Diabetes Health Indicators Data

Analisis dan Prediksi Diabetes Menggunakan Artificial Neural Network dengan Dataset CDC Diabetes Health Indicators

Authors

  • Dodi Dwi Riskianto Dwi Universitas Nurul Jadid
  • Muhammad Afandi Universitas Nurul Jadid, Probolinggo
  • M. Raihan Ramadhan Universitas Nurul Jadid, Probolinggo
  • Sudriyanto Sudriyanto Universitas Nurul Jadid, Probolinggo

DOI:

https://doi.org/10.30787/restia.v4i1.2096

Keywords:

Diabetic, Artificial Neural Network, Prediction, Health Indicators, Machine Learning

Abstract

Diabetes mellitus is a chronic disease with rising prevalence, which makes effective early detection important. This study develops a diabetes risk prediction model using an Artificial Neural Network (ANN) trained on non-laboratory health indicators. The data come from the CDC Diabetes Health Indicators dataset, which is large and has a class distribution that is not fully balanced. Preprocessing consists of handling missing values, one-hot encoding of categorical features, normalization of numerical features, and analysis of the target class distribution. The ANN is a Multilayer Perceptron with dropout regularization and an L2 penalty, trained with the AdamW optimizer. The model achieved an accuracy of 86.45%, a precision of 85.2%, a recall of 82.7%, and an AUC-ROC of 0.89. Although the accuracy is moderate for a dataset of this size, the AUC value indicates that the model discriminates well between classes. Performance is limited by the small number of non-laboratory features and by the imbalanced class distribution. These findings suggest that an ANN based on simple health indicators could serve as a diabetes risk screening tool in primary healthcare. Future work should apply class balancing techniques, analyze model interpretability, and validate the model externally on the Indonesian population.
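The four metrics reported in the abstract can be illustrated with a minimal Python sketch. It is not the authors' evaluation code: the labels, scores, 0.5 decision threshold, and function names below are hypothetical and chosen only to show how accuracy, precision, recall, and AUC-ROC are computed and why they can diverge when classes are imbalanced.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def auc_roc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-based AUC: the fraction of (positive, negative) pairs in which
    the positive sample receives the higher score; ties count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (NOT from the study): 7 negatives, 3 positives.
y_true = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
scores = [0.1, 0.2, 0.3, 0.35, 0.4, 0.45, 0.7, 0.3, 0.6, 0.9]

# Assumed decision threshold of 0.5 for turning scores into class labels.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
precision = tp / (tp + fp) if (tp + fp) else 0.0
recall = tp / (tp + fn) if (tp + fn) else 0.0
auc = auc_roc(y_true, scores)

print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} auc={auc:.3f}")
```

Accuracy, precision, and recall depend on the chosen threshold, whereas AUC-ROC summarizes ranking quality across all thresholds. That is why a study on imbalanced data may report them together rather than relying on accuracy alone.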

Published

2026-02-02

How to Cite

Dwi, D. D. R., Afandi, M., Ramadhan, M. R., & Sudriyanto, S. (2026). Artificial Neural Network-Based Diabetes Prediction Analysis Using CDC Diabetes Health Indicators Data: Analisis dan Prediksi Diabetes Menggunakan Artificial Neural Network dengan Dataset CDC Diabetes Health Indicators. Jurnal Riset Sistem Dan Teknologi Informasi, 4(1), 22–29. https://doi.org/10.30787/restia.v4i1.2096
