Background and Aim: Neonatal hypoxic-ischemic encephalopathy (HIE) is a clinical syndrome characterized by impaired brain function resulting from oxygen deprivation and reduced cerebral blood flow. Predictive models can help physicians forecast disease outcomes and intervene early. This study aimed to construct a predictive model for neonatal HIE using data mining algorithms.
Materials and Methods: This applied study used a descriptive design. In the first stage, factors relevant to predicting neonatal HIE were identified through an expert survey. In the second stage, data on 4,000 neonates covering the years 2020–2021 were collected from the Iman system database of the Ministry of Health and Medical Education. After preprocessing, the final dataset comprised 3,962 records with 13 features. Predictive models were then developed using artificial neural networks, several decision tree variants, random forest, support vector machines, logistic regression, and Bayesian networks, implemented in Python within the Anaconda environment. Model performance was evaluated and compared using accuracy, precision, specificity, F1-score, and the area under the curve (AUC).
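The modelling and comparison step described above could look roughly like the following minimal sketch, assuming scikit-learn (the abstract names Python and Anaconda but not the libraries used). The data are synthetic, generated with make_classification to match only the shape of the final dataset (3,962 records, 13 features); the class balance, hyperparameters, 70/30 stratified hold-out split, and model names are illustrative assumptions, not the study's settings. GaussianNB, a naïve Bayes classifier, stands in for the Bayesian network, since scikit-learn does not provide general Bayesian network classifiers.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the 3,962 x 13 dataset (NOT the study's data).
# Assumption: HIE is the minority class (about 20% of records).
X, y = make_classification(
    n_samples=3962, n_features=13, n_informative=6,
    weights=[0.8, 0.2], random_state=0,
)
# Assumed evaluation design: a single stratified 70/30 hold-out split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0,
)

# Illustrative hyperparameters; scale-sensitive models get a StandardScaler.
models = {
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "neural_network": make_pipeline(
        StandardScaler(), MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000, random_state=0)
    ),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "naive_bayes": GaussianNB(),  # stand-in for a Bayesian network
    "svm": make_pipeline(StandardScaler(), SVC(probability=True, random_state=0)),
    "decision_tree": DecisionTreeClassifier(max_depth=5, random_state=0),
}


def compare_models(models, X_train, y_train, X_test, y_test):
    """Fit each model and return its test-set AUROC, sorted best first."""
    aucs = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores = model.predict_proba(X_test)[:, 1]  # predicted P(HIE)
        aucs[name] = roc_auc_score(y_test, scores)
    return dict(sorted(aucs.items(), key=lambda kv: kv[1], reverse=True))


results = compare_models(models, X_train, y_train, X_test, y_test)
for name, auc in results.items():
    print(f"{name}: AUROC = {auc:.2f}")
```

The printed AUROC values reflect only the synthetic data and say nothing about the models' performance on the study's records.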
Results: The area under the receiver operating characteristic curve (AUROC) was 86% for logistic regression, 86% for artificial neural networks, 84% for random forest, 82% for Bayesian networks, 76% for support vector machines, and 74% for decision trees. Logistic regression performed best, with an accuracy of 81%, sensitivity of 85%, and specificity of 96%. Sensitivity was highest for logistic regression, artificial neural networks, and support vector machines, whereas the naïve Bayes algorithm showed the lowest performance metrics. In the HIE prediction model, the most influential feature was the 1-minute Apgar score and the least influential was out-of-hospital delivery.
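The metrics reported above can be computed from a binary confusion matrix and from the models' predicted scores. The following pure-Python sketch uses hypothetical counts and scores (not the study's data); the names ConfusionMatrix, classification_metrics, and auroc are illustrative. AUROC is computed with the rank-based (Mann–Whitney) formulation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts for a binary classifier (positive class = HIE)."""
    tp: int  # HIE cases correctly flagged
    fn: int  # HIE cases missed
    tn: int  # non-HIE neonates correctly cleared
    fp: int  # non-HIE neonates incorrectly flagged


def classification_metrics(cm: ConfusionMatrix) -> dict[str, float]:
    total = cm.tp + cm.fn + cm.tn + cm.fp
    sensitivity = cm.tp / (cm.tp + cm.fn)  # recall, true-positive rate
    specificity = cm.tn / (cm.tn + cm.fp)  # true-negative rate
    precision = cm.tp / (cm.tp + cm.fp)    # positive predictive value
    accuracy = (cm.tp + cm.tn) / total
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


def auroc(labels: list[int], scores: list[float]) -> float:
    """Rank-based AUROC: share of (positive, negative) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


# Hypothetical counts, chosen only to illustrate the formulas.
cm = ConfusionMatrix(tp=40, fn=10, tn=190, fp=10)
m = classification_metrics(cm)

# Accuracy is the prevalence-weighted mean of sensitivity and specificity,
# so it always lies between those two values.
prevalence = (cm.tp + cm.fn) / (cm.tp + cm.fn + cm.tn + cm.fp)
weighted = prevalence * m["sensitivity"] + (1 - prevalence) * m["specificity"]
assert abs(m["accuracy"] - weighted) < 1e-12

print({k: round(v, 3) for k, v in m.items()})
print(auroc([0, 0, 1, 1], [0.10, 0.40, 0.35, 0.80]))  # → 0.75
```

The quadratic pairwise loop keeps the AUROC definition explicit; for large test sets, a sort-based rank computation gives the same value more efficiently.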
Conclusion: In this study, the logistic regression-based model showed the best performance for predicting neonatal HIE. Applying practical data-driven algorithms to neonates with HIE is expected to play an important role in identifying the condition rapidly and providing appropriate treatment. Such approaches can help healthcare professionals act within the critical therapeutic window, thereby improving the quality of care, preventing disease progression, and reducing the severity of adverse outcomes.