Risk factors identification and prediction of anemia among women in Bangladesh using machine learning techniques
Background: Anemia is a major public health problem whose prevalence is rising worldwide, including in Bangladesh. Objectives: To identify the risk factors for anemia among women in Bangladesh and to predict anemia using machine learning (ML)-based techniques. Methods: The anemia dataset, comprising 3,020 respondents, was extracted from the Bangladesh Demographic and Health Survey (BDHS). Two feature selection techniques, logistic regression (LR) and random forest (RF), were used to determine the risk factors for anemia. Additionally, eight ML-based techniques, namely LR, linear discriminant analysis (LDA), K-nearest neighbors (KNN), support vector machine (SVM), quadratic discriminant analysis (QDA), neural network (NN), classification and regression tree (CART), and RF, were used to predict anemia among women in Bangladesh. Classification accuracy and area under the curve (AUC) were used to evaluate the performance of these classifiers. Results: Of the 15 factors considered, LR-based feature selection identifies 13 and RF-based feature selection identifies 14 as significant risk factors for anemia among women. All predictive models achieve their highest classification accuracy (74.10-81.29%) and AUC (0.744-0.819) with the RF-selected features. Among them, the combination of RF-based feature selection with the RF classifier gives the highest classification accuracy (81.29%) and AUC (0.819). Conclusion: Of the eight predictive models, the RF-RF combination shows the best performance for the prediction of anemia. This study suggests that policymakers could use this combination to inform decisions on controlling anemia among Bangladeshi women, saving time and reducing cost.
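The two evaluation measures named in the Methods, classification accuracy and AUC, can be illustrated with a minimal Python sketch. The abstract does not say how these were computed, so the function names, the 0.5 decision threshold, and the rank-based (Mann-Whitney) formulation of AUC are assumptions made for illustration. The labels and scores are made-up toy values, not BDHS data.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predicted labels that match the true labels."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive case (label 1)
    receives a higher score than a randomly chosen negative case (label 0),
    with ties counted as one half.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical values): 1 = anemic, 0 = not anemic.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.7, 0.4, 0.5, 0.1, 0.8, 0.3]  # predicted probabilities
y_pred = [1 if s >= 0.5 else 0 for s in scores]    # assumed 0.5 threshold

print(f"accuracy = {accuracy(y_true, y_pred):.4f}")  # accuracy = 0.7500
print(f"AUC      = {auc(y_true, scores):.4f}")       # AUC      = 0.9375
```

Accuracy depends on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is one reason the two measures are commonly reported together.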