Depression Prediction Among University Students Using a Random Forest Algorithm Based on Psychosocial Data
Predicting Student Depression: A Psychosocial Data-Based Approach Using the Random Forest Algorithm
DOI:
https://doi.org/10.30787/restia.v4i1.2100
Keywords:
Student Depression, Machine Learning, Prediction, Random Forest
Abstract
Mental health among college students is receiving increasing attention, particularly depression, which significantly affects quality of life and academic achievement. This study develops a predictive model for depression in college students from psychosocial data using the Random Forest algorithm. The data are a public secondary dataset from Kaggle with 1,000 samples covering demographic, lifestyle, and psychological variables. The analysis comprised data preprocessing, class balancing, model training, and evaluation using accuracy, precision, recall, F1-score, and the confusion matrix. In testing, the Random Forest model predicted depression with 87.0% accuracy, 86.1% precision, 87.4% recall, and an 86.7% F1-score, indicating good and consistent performance across metrics. A word cloud visualization highlighted academic pressure, stress, and anxiety as the dominant factors. Compared with previous research using the SVM algorithm, Random Forest performed better, particularly on complex and imbalanced data. These results support the effectiveness of a Random Forest-based machine learning approach for the early detection of depression in college students and provide a foundation for developing mental health monitoring systems in higher education settings.
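The evaluation metrics named in the abstract can all be derived from the four cells of a binary confusion matrix. The following is a minimal sketch, not the authors' code: the function names `classification_metrics` and `f1_from` are hypothetical, and the counts passed in the example are invented for illustration and are not the study's confusion matrix. The last lines check only that the reported F1-score agrees with the reported precision and recall.

```python
def f1_from(precision, recall):
    """F1-score: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 from binary confusion-matrix counts.

    tp/fn count depressed students predicted correctly/incorrectly;
    tn/fp count non-depressed students predicted correctly/incorrectly.
    """
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1_from(precision, recall),
    }


# Hypothetical counts for illustration only (NOT taken from the study).
example = classification_metrics(tp=80, fp=20, fn=10, tn=90)
print(example)

# Consistency check on the figures reported in the abstract:
# precision 86.1% and recall 87.4% should give an F1-score of about 86.7%.
reported_f1 = f1_from(0.861, 0.874)
print(round(100 * reported_f1, 1))  # 86.7
```

Because F1 is the harmonic mean of precision and recall, it always lies between the two, which is why the reported 86.7% falls between 86.1% and 87.4%.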
License
Copyright (c) 2026 Abiyya Alfahrizi Putra Arifiansyah Abiyya, Muhammad Afandi, Dodi Dwi Riskianto, Sudriyanto Sudriyanto

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.










