Machine learning approach for predicting under-five mortality determinants in Ethiopia: evidence from the 2016 Ethiopian Demographic and Health Survey

Abstract There is a dearth of literature on the use of machine learning models to predict important under-five mortality risks in Ethiopia. In this study, we showed spatial variations of under-five mortality and used machine learning models to predict its important sociodemographic determinants in Ethiopia. The study data were drawn from the 2016 Ethiopian Demographic and Health Survey. We used three machine learning models such as random forests, logistic regression, and K-nearest neighbors as well as one traditional logistic regression model to predict under-five mortality determinants. For each machine learning model, measures of model accuracy and receiver operating characteristic curves were used to evaluate the predictive power of each model. The descriptive results show that there are considerable regional variations in under-five mortality rates in Ethiopia. The under-five mortality prediction ability was found to be between 46.3 and 67.2% for the models considered, with the random forest model (67.2%) showing the best performance. The best predictive model shows that household size, time to the source of water, breastfeeding status, number of births in the preceding 5 years, sex of a child, birth intervals, antenatal care, birth order, type of water source, and mother’s body mass index play an important role in under-five mortality levels in Ethiopia. The random forest machine learning model produces a better predictive power for estimating under-five mortality risk factors and may help to improve policy decision-making in this regard. Childhood survival chances can be improved considerably by using these important factors to inform relevant policies.
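The abstract above compares models by accuracy and by receiver operating characteristic curves. As a rough illustration of what those two measures compute, here is a minimal standard-library sketch. The labels and scores are hypothetical toy values, not data from the study, and the function names `accuracy` and `roc_auc` are chosen for this example only.

```python
def accuracy(labels, scores, threshold=0.5):
    """Share of cases where the thresholded score matches the label."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive case gets a
    higher score than a randomly chosen negative case (ties count 0.5).
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: 1 = death before age five, 0 = survival.
toy_labels = [1, 0, 1, 0, 0, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.35, 0.2, 0.6, 0.7, 0.1, 0.3]

print("accuracy:", accuracy(toy_labels, toy_scores))
print("ROC AUC:", roc_auc(toy_labels, toy_scores))
```

Accuracy depends on the chosen threshold, while the area under the ROC curve summarises ranking quality across all thresholds, which is one reason the two are often reported together.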

Predictive models and under-five mortality determinants in Ethiopia: evidence from the 2016 Ethiopian Demographic and Health Survey

10.21203/rs.2.13113/v1 ◽

2019 ◽

Author(s):

Fikrewold Bitew ◽

Samuel H. Nyarko ◽

Lloyd Potter ◽

Corey S. Sparks

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Random Forest ◽

Health Survey ◽

Mortality Risk ◽

Predictive Models ◽

Predictive Power ◽

Proportional Hazard ◽

Under Five ◽

Cox Proportional Hazard

Abstract Background: There is a dearth of literature on predictive models estimating under-five mortality risk in Ethiopia. In this study, we develop a spatial map and predictive models to predict the sociodemographic determinants of under-five mortality in Ethiopia. Methods: The study data were drawn from the 2016 Ethiopian Demographic and Health Survey. We used machine learning algorithms such as random forest, logistic regression, and Cox-proportional hazard models to predict the sociodemographic risks for under-five mortality in Ethiopia. The Receiver Operating Characteristic curve was used to evaluate the predictive power of the models. Results: There are considerable regional variations in under-five mortality rates in Ethiopia. The under-five mortality prediction ability was found to be 88.7% for the random forest model, 68.3% for the logistic regression model, and 68.0% for the Cox-Proportional Hazard model. Maternal age at birth, sex of a child, previous birth interval, water source, contraceptive use, health facility delivery services, antenatal and post-natal care checkups have been found to be significantly associated with under-five mortality in Ethiopia. Conclusions: The random forest machine learning algorithm produces a higher predictive power for under-five mortality risk factors for the study sample. There is a need to improve the quality and access to health care services to enhance childhood survival chances in the country.

A comparison of regularized logistic regression and random forest machine learning models for daytime diagnosis of obstructive sleep apnea

Medical & Biological Engineering & Computing ◽

10.1007/s11517-020-02206-9 ◽

2020 ◽

Vol 58 (10) ◽

pp. 2517-2529

Author(s):

Farahnaz Hajipour ◽

Mohammad Jafari Jozani ◽

Zahra Moussavi

Keyword(s):

Machine Learning ◽

Obstructive Sleep Apnea ◽

Logistic Regression ◽

Sleep Apnea ◽

Random Forest ◽

Learning Models ◽

Obstructive Sleep ◽

Machine Learning Models

Drug Classification using Black-box models and Interpretability

International Journal for Research in Applied Science and Engineering Technology ◽

10.22214/ijraset.2021.38203 ◽

2021 ◽

Vol 9 (9) ◽

pp. 1518-1529

Author(s):

Pooja Thakkar

Keyword(s):

Machine Learning ◽

Random Forest ◽

Decision Tree ◽

Learning Models ◽

Drug Classification ◽

Box Models ◽

Machine Learning Model ◽

Black Box Models ◽

Insight Into ◽

Machine Learning Models

Abstract: This study classifies drugs using machine learning models and applies LIME and SHAP to interpret those models. Random forest, decision tree, and logistic regression models were used for the classification. LIME and SHAP were then applied to examine how each model arrived at its results. The paper concludes that LIME and SHAP can be used to gain insight into a machine learning model and to determine which attribute accounts for differences in its outcomes. According to the LIME and SHAP results, random forest and decision tree are the best models for drug classification, with Na to K and BP being the most significant features. Keywords: Machine Learning, Black-box models, LIME, SHAP, Decision Tree
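SHAP attributions are built on Shapley values from cooperative game theory. The sketch below computes exact Shapley values for a tiny hand-written scoring function by averaging over every feature ordering. It does not use the SHAP library or reproduce the paper's models: `toy_model`, its coefficients, the `age` feature, and the baseline values are all hypothetical, chosen only to make the idea concrete.

```python
from itertools import permutations


def shapley_values(model, instance, baseline):
    """Exact Shapley attributions by averaging over all feature orderings.

    Features not yet "revealed" in an ordering keep their baseline value.
    The cost grows factorially with the number of features, so this is
    only practical for a handful of features.
    """
    n = len(instance)
    totals = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous_output = model(current)
        for feature in order:
            current[feature] = instance[feature]
            output = model(current)
            totals[feature] += output - previous_output
            previous_output = output
    return [t / len(orderings) for t in totals]


def toy_model(x):
    """Hypothetical score over (Na-to-K, BP, age); age is deliberately unused."""
    na_to_k, bp, age = x
    return 2.0 * na_to_k + 1.0 * bp + 0.5 * na_to_k * bp


instance = [3.0, 1.0, 40.0]   # case being explained (made-up values)
baseline = [1.0, 0.0, 30.0]   # reference case (made-up values)
attributions = shapley_values(toy_model, instance, baseline)
print(attributions)
```

The attributions sum to the difference between the model output for the instance and for the baseline, and a feature the model ignores receives zero. Practical tools approximate these values because exact enumeration does not scale.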

Statistical and machine learning models for classification of human wear and delivery days in accelerometry data

10.1101/2020.12.31.424867 ◽

2021 ◽

Author(s):

Ryan Moore ◽

Kristin R. Archer ◽

Leena Choi

Keyword(s):

Neural Network ◽

Machine Learning ◽

Logistic Regression ◽

Random Forest ◽

Human Activity ◽

Recurrent Neural Network ◽

Learning Models ◽

Learning Context ◽

Machine Learning Models

Abstract Purpose: Accelerometers are increasingly utilized in healthcare research to assess human activity. Accelerometry data are often collected by mailing accelerometers to participants, who wear the accelerometers to collect data on their activity. The devices are then mailed back to the laboratory for analysis. We develop models to classify days in accelerometry data as activity from actual human wear or the delivery process. These models can be used to automate the cleaning of accelerometry datasets that are adulterated with activity from delivery. Methods: For the classification of delivery days in accelerometry data, we developed statistical and machine learning models in a supervised learning context using a large human activity and delivery labeled accelerometry dataset. We extracted several features, which were included to develop random forest, logistic regression, mixed effects regression, and multilayer perceptron models, while convolutional neural network, recurrent neural network, and hybrid convolutional recurrent neural network models were developed without feature extraction. Model performances were assessed using Monte Carlo cross-validation. Results: We found that a hybrid convolutional recurrent neural network performed best in the classification task with an F1 score of 0.960 but simpler models such as logistic regression and random forest also had excellent performance with F1 scores of 0.951 and 0.957, respectively. Conclusion: The models developed in this study can be used to classify days in accelerometry data as either human or delivery activity. An analyst can weigh the larger computational cost and greater performance of the convolutional recurrent neural network against the faster but slightly less powerful random forest or logistic regression. The best performing models for classification of delivery data are publicly available on the open source R package, PhysicalActivity.
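The abstract above assesses models with Monte Carlo cross-validation and F1 scores. A minimal sketch of that evaluation loop follows, using repeated random train/test splits. The data are synthetic, the single feature is a made-up "irregularity score", and `fit_threshold` is a deliberately simple stand-in classifier, not one of the paper's models.

```python
import random


def f1_score(labels, predictions):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def fit_threshold(features, labels):
    """Stand-in model: midpoint between the two class means of one feature."""
    ones = [x for x, y in zip(features, labels) if y == 1]
    zeros = [x for x, y in zip(features, labels) if y == 0]
    return (sum(ones) / len(ones) + sum(zeros) / len(zeros)) / 2


def monte_carlo_cv(features, labels, n_splits=50, test_fraction=0.3, seed=0):
    """Average F1 over repeated random train/test splits."""
    rng = random.Random(seed)
    indices = list(range(len(labels)))
    n_test = max(1, int(len(indices) * test_fraction))
    scores = []
    for _ in range(n_splits):
        rng.shuffle(indices)
        test_idx, train_idx = indices[:n_test], indices[n_test:]
        train_y = [labels[i] for i in train_idx]
        if len(set(train_y)) < 2:
            continue  # cannot fit the stand-in model without both classes
        threshold = fit_threshold([features[i] for i in train_idx], train_y)
        predictions = [1 if features[i] >= threshold else 0 for i in test_idx]
        scores.append(f1_score([labels[i] for i in test_idx], predictions))
    if not scores:
        raise ValueError("no usable splits")
    return sum(scores) / len(scores)


# Synthetic data: label 1 = delivery day, 0 = human-wear day (hypothetical).
data_rng = random.Random(42)
features = ([data_rng.gauss(7.0, 1.0) for _ in range(100)]
            + [data_rng.gauss(3.0, 1.0) for _ in range(100)])
labels = [1] * 100 + [0] * 100
mean_f1 = monte_carlo_cv(features, labels)
print(f"mean F1 over random splits: {mean_f1:.3f}")
```

Unlike k-fold cross-validation, the test sets here are drawn independently on each repetition, so a given day may appear in several test sets or in none.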

Predictive models and under-five mortality determinants in Ethiopia: evidence from the 2016 Ethiopian Demographic and Health Survey

10.21203/rs.2.13113/v3 ◽

2020 ◽

Author(s):

Fikrewold Bitew ◽

Samuel H. Nyarko ◽

Lloyd Potter ◽

Corey S. Sparks

Keyword(s):

Random Forest ◽

Health Survey ◽

Mortality Risk ◽

Predictive Models ◽

Access To Health Care ◽

Predictive Power ◽

Learning Algorithm ◽

Demographic And Health Survey ◽

Prediction Ability ◽

Under Five

Abstract Background: There is a dearth of literature on predictive models estimating under-five mortality risk in Ethiopia. In this study, we develop a spatial map and predictive models to predict the sociodemographic determinants of under-five mortality in Ethiopia. Methods: The study data were drawn from the 2016 Ethiopian Demographic and Health Survey. We used three predictive models to predict under-five mortality within this sample. The three techniques are random forests, logistic regression, and k-nearest neighbors. For each model, measures of model accuracy and Receiver Operating Characteristic curves are used to evaluate the predictive power of each model. Results: There are considerable regional variations in under-five mortality rates in Ethiopia. The under-five mortality prediction ability was found to be moderate to low for the models considered, with the random forest model showing the best performance. Maternal age at birth, sex of a child, previous birth interval, water source, health facility delivery services, antenatal and post-natal care checkups, breastfeeding behavior and household size have been found to be significantly associated with under-five mortality in Ethiopia. Conclusions: The random forest machine learning algorithm produces a higher predictive power for under-five mortality risk factors for the study sample. There is a need to improve the quality and access to health care services to enhance childhood survival chances in the country.

Fake News Data Exploration and Analytics

Electronics ◽

10.3390/electronics10192326 ◽

2021 ◽

Vol 10 (19) ◽

pp. 2326

Author(s):

Mazhar Javed Awan ◽

Awais Yasin ◽

Haitham Nobanee ◽

Ahmed Abid Ali ◽

Zain Shahzad ◽

...

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Random Forest ◽

Random Forest Classifier ◽

The Internet ◽

Fake News ◽

Learning Models ◽

Decision Tree Classifier ◽

Tree Classifier ◽

Machine Learning Models

Before the internet, people acquired their news from the radio, television, and newspapers. With the internet, the news moved online, and suddenly, anyone could post information on websites such as Facebook and Twitter. The spread of fake news has also increased with social media. It has become one of the most significant issues of this century. Fake news is used to damage the reputation of well-regarded organizations for the benefit of those who spread it. The main aim of this project is to build a tool that uses machine learning to examine the language patterns that distinguish fake news from real news. This paper proposes machine learning models that can successfully detect fake news. These models identify which news is real or fake and specify the accuracy of said news, even in a complex environment. After data preprocessing and exploration, we applied three machine learning models: random forest classifier, logistic regression, and term frequency-inverse document frequency (TF-IDF) vectorizer. The accuracy of the TF-IDF vectorizer, logistic regression, random forest classifier, and decision tree classifier models was approximately 99.52%, 98.63%, 99.63%, and 99.68%, respectively. Machine learning models can be considered a great choice to find reality-based results and can be applied to other unstructured data for various sentiment analysis applications.
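The abstract above relies on term frequency-inverse document frequency (TF-IDF) to turn text into numeric features. The sketch below shows one plain form of that weighting using only the standard library. The headlines are invented for illustration, and the formula used here (length-normalised term frequency times log(N / df)) is an assumption; libraries typically apply smoothed variants, and the paper's exact settings are not stated.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    tf  = term count / document length
    idf = log(number of documents / number of documents containing the term)
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    document_frequency = Counter()
    for tokens in tokenized:
        document_frequency.update(set(tokens))  # count each term once per doc
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens))
            * math.log(n_docs / document_frequency[term])
            for term, count in counts.items()
        })
    return weights


# Invented headlines, for illustration only.
headlines = [
    "the shocking miracle cure",
    "the council approves the budget",
    "the council debates the miracle",
]
weights = tf_idf(headlines)
for headline, row in zip(headlines, weights):
    top_term = max(row, key=row.get)
    print(f"{headline!r}: highest-weighted term = {top_term!r}")
```

A term that occurs in every document gets weight zero under this form, while a term unique to one document gets the largest inverse document frequency. The resulting weight vectors are what a classifier such as logistic regression or a random forest would consume.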

Predictive models and under-five mortality determinants in Ethiopia: evidence from the 2016 Ethiopian Demographic and Health Survey

10.21203/rs.2.13113/v2 ◽

2020 ◽

Author(s):

Fikrewold Bitew ◽

Samuel H. Nyarko ◽

Lloyd Potter ◽

Corey S. Sparks

Keyword(s):

Random Forest ◽

Health Survey ◽

Mortality Risk ◽

Predictive Models ◽

Access To Health Care ◽

Predictive Power ◽

Learning Algorithm ◽

Demographic And Health Survey ◽

Prediction Ability ◽

Under Five

Document Preprocessing with TF-IDF to Improve the Polarity Classification Performance of Unstructured Sentiment Analysis

Kinetik Game Technology Information System Computer Network Computing Electronics and Control ◽

10.22219/kinetik.v5i3.1066 ◽

2020 ◽

pp. 235-242

Author(s):

Farrikh Alzami ◽

Erika Devi Udayanti ◽

Dwi Puji Prabowo ◽

Rama Aria Megantara

Keyword(s):

Machine Learning ◽

Feature Extraction ◽

Random Forest ◽

Sentiment Analysis ◽

Classification Performance ◽

Document Preparation ◽

Learning Models ◽

Polarity Classification ◽

Negative Sentiment ◽

Machine Learning Models

Sentiment analysis in terms of polarity classification is very important in everyday life. With polarity, people can find out whether a given document has positive or negative sentiment, which can help them choose and make decisions. Sentiment analysis is usually done manually, so an automatic sentiment analysis classification process is needed. However, it is rare to find studies that discuss feature extraction and which learning models are suitable for unstructured sentiment analysis with the Amazon food review case. This research explores feature extraction methods such as bag of words, TF-IDF, Word2Vector, and a combination of TF-IDF and Word2Vector, together with several machine learning models such as Random Forest, SVM, KNN and Naïve Bayes, to find a combination of feature extraction and learning model that can help add variety to the analysis of polarity sentiments. With document preparation such as removing HTML tags, punctuation and special characters, and using snowball stemming, TF-IDF combined with SVM is suitable for polarity classification in unstructured sentiment analysis for the Amazon food review case, with a performance result of 87.3 percent.

High performance logistic regression for privacy-preserving genome analysis

BMC Medical Genomics ◽

10.1186/s12920-020-00869-9 ◽

2021 ◽

Vol 14 (1) ◽

Author(s):

Martine De Cock ◽

Rafael Dowsley ◽

Anderson C. A. Nascimento ◽

Davis Railsback ◽

Jianwei Shen ◽

...

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Genome Analysis ◽

Local Area Network ◽

Local Area ◽

Activation Function ◽

Area Network ◽

Learning Models ◽

Data Set ◽

Machine Learning Models

Abstract Background: In biomedical applications, valuable data is often split between owners who cannot openly share the data because of privacy regulations and concerns. Training machine learning models on the joint data without violating privacy is a major technology challenge that can be addressed by combining techniques from machine learning and cryptography. When collaboratively training machine learning models with the cryptographic technique named secure multi-party computation, the price paid for keeping the data of the owners private is an increase in computational cost and runtime. A careful choice of machine learning techniques, together with algorithmic and implementation optimizations, is necessary to enable practical secure machine learning over distributed data sets. Such optimizations can be tailored to the kind of data and machine learning problem at hand. Methods: Our setup involves secure two-party computation protocols, along with a trusted initializer that distributes correlated randomness to the two computing parties. We use a gradient descent based algorithm for training a logistic regression-like model with a clipped ReLU activation function, and we break down the algorithm into corresponding cryptographic protocols. Our main contributions are a new protocol for computing the activation function that requires neither secure comparison protocols nor Yao’s garbled circuits, and a series of cryptographic engineering optimizations to improve the performance. Results: For our largest gene expression data set, we train a model that requires over 7 billion secure multiplications; the training completes in about 26.90 s in a local area network. The implementation in this work is a further optimized version of the implementation with which we won first place in Track 4 of the iDASH 2019 secure genome analysis competition. Conclusions: In this paper, we present a secure logistic regression training protocol and its implementation, with a new subprotocol to securely compute the activation function. To the best of our knowledge, we present the fastest existing secure multi-party computation implementation for training logistic regression models on high dimensional genome data distributed across a local area network.
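The abstract above trains a logistic regression-like model in which the sigmoid is replaced by a clipped ReLU. The sketch below shows that substitution in ordinary plaintext gradient descent, with no cryptography or secure computation involved. The clipping thresholds of -0.5 and 0.5, the learning rate, the epoch count, and the toy data are all assumptions made for this example and are not taken from the abstract.

```python
def clipped_relu(z):
    """Piecewise-linear stand-in for the sigmoid.

    Returns 0 below -0.5, 1 above 0.5, and z + 0.5 in between
    (thresholds assumed for this sketch).
    """
    return min(1.0, max(0.0, z + 0.5))


def train(features, labels, learning_rate=0.5, epochs=500):
    """Batch gradient descent using clipped_relu in place of the sigmoid."""
    n_features = len(features[0])
    n_samples = len(labels)
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(features, labels):
            z = bias + sum(w * xi for w, xi in zip(weights, x))
            error = clipped_relu(z) - y  # same form as the logistic update
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - learning_rate * g / n_samples
                   for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n_samples
    return weights, bias


def predict(weights, bias, x):
    """Predict class 1 when the activation reaches 0.5."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1 if clipped_relu(z) >= 0.5 else 0


# Made-up, linearly separable toy data with two features.
toy_features = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3],
                [0.9, 1.0], [1.0, 0.8], [0.8, 0.9]]
toy_labels = [0, 0, 0, 1, 1, 1]
weights, bias = train(toy_features, toy_labels)
print([predict(weights, bias, x) for x in toy_features])
```

A piecewise-linear activation needs only additions, multiplications, and range checks, which is the kind of operation that tends to be cheaper than an exponential inside a secure computation. How the paper evaluates the activation securely is described only at a high level in the abstract and is not reproduced here.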
