Reliable Prediction Models Based on Enriched Data for Identifying the Mode of Childbirth by Using Machine Learning Methods: Development Study

Background The use of artificial intelligence has revolutionized every area of life such as business and trade, social and electronic media, education and learning, manufacturing industries, medicine and sciences, and every other sector. The new reforms and advanced technologies of artificial intelligence have enabled data analysts to transmute raw data generated by these sectors into meaningful insights for an effective decision-making process. Health care is one of the integral sectors where a large amount of data is generated daily, and making effective decisions based on these data is therefore a challenge. In this study, cases related to childbirth either by the traditional method of vaginal delivery or cesarean delivery were investigated. Cesarean delivery is performed to save both the mother and the fetus when complications related to vaginal birth arise. Objective The aim of this study was to develop reliable prediction models for a maternity care decision support system to predict the mode of delivery before childbirth. Methods This study was conducted in 2 parts for identifying the mode of childbirth: first, the existing data set was enriched and second, previous medical records about the mode of delivery were investigated using machine learning algorithms and by extracting meaningful insights from unseen cases. Several prediction models were trained to achieve this objective, such as decision tree, random forest, AdaBoostM1, bagging, and k-nearest neighbor, based on original and enriched data sets. Results The prediction models based on enriched data performed well in terms of accuracy, sensitivity, specificity, F-measure, and receiver operating characteristic curves in the outcomes. Specifically, the accuracy of k-nearest neighbor was 84.38%, that of bagging was 83.75%, that of random forest was 83.13%, that of decision tree was 81.25%, and that of AdaBoostM1 was 80.63%. Enrichment of the data set had a good impact on improving the accuracy of the prediction process, which supports maternity care practitioners in making decisions in critical cases. Conclusions Our study shows that enriching the data set improves the accuracy of the prediction process, thereby supporting maternity care practitioners in making informed decisions in critical cases. The enriched data set used in this study yields good results, but this data set can become even better if the records are increased with real clinical data.

Download Full-text

Data Enrichment and Developing Reliable Prediction Models for Identifying Mode of Delivery in Healthcare Practice Using Machine Learning Methods (Preprint)

10.2196/preprints.28856 ◽

2021 ◽

Author(s):

Zahid Ullah ◽

Farrukh Saleem ◽

Mona Jamjoom

Keyword(s):

Machine Learning ◽

Health Care ◽

Random Forest ◽

Decision Tree ◽

Maternity Care ◽

Nearest Neighbor ◽

Prediction Models ◽

Mode Of Delivery ◽

K Nearest Neighbor ◽

Reliable Prediction

BACKGROUND The use of artificial intelligence (AI) has revolutionized every area of life such as business and trade, social and electronic media, education and learning, manufacturing industries, medical and sciences, and every other sector. The new reforms and advanced technologies of AI have enabled data analysts to transmute raw data generated by these sectors into meaningful insights for an effective decision-making process. Health care is one of the integral sectors where a large amount of data is generated daily, and making effective decisions based on this data is therefore a challenge. In health care, cases related to childbirth either by the traditional method of vaginal delivery or cesarean delivery have been investigated in this study. Cesarean delivery is performed to save both mother and fetal lives when complications arise related to vaginal birth. OBJECTIVE To develop reliable prediction models for a maternity care decision support system to predict mode of delivery before birth. METHODS This study is conducted in two folds for identifying the mode of delivery: firstly, to enrich the existing dataset; secondly, to investigate previous medical records about the mode of delivery using machine learning algorithms and extract meaningful insight into the unseen cases. To achieve this objective, several prediction models were trained such as Decision Tree (DT), Random Forest (RF), AdaBoostM1 (AB), Bagging, and k-Nearest Neighbor (k-NN), based on original and enriched datasets. RESULTS To achieve the objective, several prediction models were trained such as Decision Tree (DT), Random Forest (RF), AdaBoostM1 (AB), Bagging, and k-Nearest Neighbor (k-NN) based on original and enriched datasets. As an outcome, the prediction models based on enriched data performed well in terms of accuracy, sensitivity, specificity, F-measure, and ROC. Specifically, k-NN outperformed with an accuracy of 84.38%, Bagging (83.75%), RF (83.13%), DT (81.25%), and AB (80.63%). In the end, enriching the dataset improves the accuracy of the prediction process, which supports maternity care practitioners in making decisions for critical cases. CONCLUSIONS Enriching the dataset improves the accuracy of the prediction process, which supports maternity care practitioners in making decisions for critical cases. The enriched dataset in its current stage used in this study yields better results, but this could be even better if its records were increased with real clinical data.

Download Full-text

Machine Learning Application for Classification Prediction of Household’s Welfare Status

Journal on Information Technology and Computer Engineering ◽

10.25077/jitce.4.02.72-82.2020 ◽

2020 ◽

Vol 4 (02) ◽

pp. 72-82

Author(s):

Nofriani Nofriani

Keyword(s):

Machine Learning ◽

Random Forest ◽

Social Welfare ◽

Nearest Neighbor ◽

Machine Learning Techniques ◽

Support Vector ◽

K Nearest Neighbor ◽

Data Set ◽

The Government ◽

Classification Prediction

Various approaches have been attempted by the Government of Indonesia to eradicate poverty throughout the country, one of which is equitable distribution of social assistance for target households according to their classification of social welfare status. This research aims to re-evaluate the prior evaluation of five well-known machine learning techniques; Naïve Bayes, Random Forest, Support Vector Machines, K-Nearest Neighbor, and C4.5 Algorithm; on how well they predict the classifications of social welfare statuses. Afterwards, the best-performing one is implemented into an executable machine learning application that may predict the user’s social welfare status. Other objectives are to analyze the reliability of the chosen algorithm in predicting new data set, and generate a simple classification-prediction application. This research uses Python Programming Language, Scikit-Learn Library, Jupyter Notebook, and PyInstaller to perform all the methodology processes. The results shows that Random Forest Algorithm is the best machine learning technique for predicting household’s social welfare status with classification accuracy of 74.20% and the resulted application based on it could correctly predict 60.00% of user’s social welfare status out of 40 entries.

Download Full-text

Evaluation of Three Different Machine Learning Methods for Object-Based Artificial Terrace Mapping—A Case Study of the Loess Plateau, China

Remote Sensing ◽

10.3390/rs13051021 ◽

2021 ◽

Vol 13 (5) ◽

pp. 1021

Author(s):

Hu Ding ◽

Jiaming Na ◽

Shangjing Jiang ◽

Jie Zhu ◽

Kai Liu ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Loess Plateau ◽

Water Conservation ◽

Nearest Neighbor ◽

Gradient Boosting ◽

K Nearest Neighbor ◽

The Loess Plateau ◽

Object Based ◽

Extreme Gradient Boosting

Artificial terraces are of great importance for agricultural production and soil and water conservation. Automatic high-accuracy mapping of artificial terraces is the basis of monitoring and related studies. Previous research achieved artificial terrace mapping based on high-resolution digital elevation models (DEMs) or imagery. As a result of the importance of the contextual information for terrace mapping, object-based image analysis (OBIA) combined with machine learning (ML) technologies are widely used. However, the selection of an appropriate classifier is of great importance for the terrace mapping task. In this study, the performance of an integrated framework using OBIA and ML for terrace mapping was tested. A catchment, Zhifanggou, in the Loess Plateau, China, was used as the study area. First, optimized image segmentation was conducted. Then, features from the DEMs and imagery were extracted, and the correlations between the features were analyzed and ranked for classification. Finally, three different commonly-used ML classifiers, namely, extreme gradient boosting (XGBoost), random forest (RF), and k-nearest neighbor (KNN), were used for terrace mapping. The comparison with the ground truth, as delineated by field survey, indicated that random forest performed best, with a 95.60% overall accuracy (followed by 94.16% and 92.33% for XGBoost and KNN, respectively). The influence of class imbalance and feature selection is discussed. This work provides a credible framework for mapping artificial terraces.

Download Full-text

Modeling Barrier Island Habitats Using Landscape Position Information

Remote Sensing ◽

10.3390/rs11080976 ◽

2019 ◽

Vol 11 (8) ◽

pp. 976

Author(s):

Nicholas M. Enwright ◽

Lei Wang ◽

Hongqing Wang ◽

Michael J. Osland ◽

Laura C. Feher ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Nearest Neighbor ◽

Learning Algorithms ◽

Barrier Island ◽

Barrier Islands ◽

Machine Learning Algorithms ◽

Landscape Position ◽

K Nearest Neighbor ◽

Island Habitats

Barrier islands are dynamic environments because of their position along the marine–estuarine interface. Geomorphology influences habitat distribution on barrier islands by regulating exposure to harsh abiotic conditions. Researchers have identified linkages between habitat and landscape position, such as elevation and distance from shore, yet these linkages have not been fully leveraged to develop predictive models. Our aim was to evaluate the performance of commonly used machine learning algorithms, including K-nearest neighbor, support vector machine, and random forest, for predicting barrier island habitats using landscape position for Dauphin Island, Alabama, USA. Landscape position predictors were extracted from topobathymetric data. Models were developed for three tidal zones: subtidal, intertidal, and supratidal/upland. We used a contemporary habitat map to identify landscape position linkages for habitats, such as beach, dune, woody vegetation, and marsh. Deterministic accuracy, fuzzy accuracy, and hindcasting were used for validation. The random forest algorithm performed best for intertidal and supratidal/upland habitats, while the K-nearest neighbor algorithm performed best for subtidal habitats. A posteriori application of expert rules based on theoretical understanding of barrier island habitats enhanced model results. For the contemporary model, deterministic overall accuracy was nearly 70%, and fuzzy overall accuracy was over 80%. For the hindcast model, deterministic overall accuracy was nearly 80%, and fuzzy overall accuracy was over 90%. We found machine learning algorithms were well-suited for predicting barrier island habitats using landscape position. Our model framework could be coupled with hydrodynamic geomorphologic models for forecasting habitats with accelerated sea-level rise, simulated storms, and restoration actions.

Download Full-text

Identification of Leukemia Subtypes from Microscopic Images Using Convolutional Neural Network

Diagnostics ◽

10.3390/diagnostics9030104 ◽

2019 ◽

Vol 9 (3) ◽

pp. 104 ◽

Cited By ~ 11

Author(s):

Ahmed ◽

Yigit ◽

Isik ◽

Alpkocak

Keyword(s):

Machine Learning ◽

Data Augmentation ◽

Nearest Neighbor ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Training Data ◽

Support Vector ◽

K Nearest Neighbor ◽

Data Set ◽

Leukemia Data

Leukemia is a fatal cancer and has two main types: Acute and chronic. Each type has two more subtypes: Lymphoid and myeloid. Hence, in total, there are four subtypes of leukemia. This study proposes a new approach for diagnosis of all subtypes of leukemia from microscopic blood cell images using convolutional neural networks (CNN), which requires a large training data set. Therefore, we also investigated the effects of data augmentation for an increasing number of training samples synthetically. We used two publicly available leukemia data sources: ALL-IDB and ASH Image Bank. Next, we applied seven different image transformation techniques as data augmentation. We designed a CNN architecture capable of recognizing all subtypes of leukemia. Besides, we also explored other well-known machine learning algorithms such as naive Bayes, support vector machine, k-nearest neighbor, and decision tree. To evaluate our approach, we set up a set of experiments and used 5-fold cross-validation. The results we obtained from experiments showed that our CNN model performance has 88.25% and 81.74% accuracy, in leukemia versus healthy and multiclass classification of all subtypes, respectively. Finally, we also showed that the CNN model has a better performance than other wellknown machine learning algorithms.

Download Full-text

Execution Assessment of Machine Learning Algorithms for Spam Profile Detection on Instagram

International Journal of Advanced Trends in Computer Science and Engineering ◽

10.30534/ijatcse/2021/561032021 ◽

2021 ◽

Vol 10 (3) ◽

pp. 1889-1894

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Random Forest ◽

Nearest Neighbor ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Support Vector ◽

Learning Tools ◽

Learning Models ◽

K Nearest Neighbor

Witheverypassingsecondsocialnetworkcommunityisgrowingrapidly,becauseofthat,attackershaveshownkeeninterestinthesekindsofplatformsandwanttodistributemischievouscontentsontheseplatforms.Withthefocus on introducing new set of characteristics and features forcounteractivemeasures,agreatdealofstudieshasresearchedthe possibility of lessening the malicious activities on social medianetworks. This research was to highlight features for identifyingspammers on Instagram and additional features were presentedto improve the performance of different machine learning algorithms. Performance of different machine learning algorithmsnamely, Multilayer Perceptron (MLP), Random Forest (RF), K-Nearest Neighbor (KNN) and Support Vector Machine (SVM)were evaluated on machine learning tools named, RapidMinerand WEKA. The results from this research tells us that RandomForest (RF) outperformed all other selected machine learningalgorithmsonbothselectedmachinelearningtools.OverallRandom Forest (RF) provided best results on RapidMiner. Theseresultsareusefulfortheresearcherswhoarekeentobuildmachine learning models to find out the spamming activities onsocialnetworkcommunities.

Download Full-text

A Research Travelogue on Classification Algorithms using R Programming

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.d9014.118419 ◽

2019 ◽

Vol 8 (4) ◽

pp. 9155-9158

Keyword(s):

Machine Learning ◽

Nearest Neighbor ◽

Statistical Tests ◽

Learning Task ◽

Data Sets ◽

K Nearest Neighbor ◽

Data Set ◽

Domain Experts ◽

R Programming ◽

Training Examples

Classification is a machine learning task which consists in predicting the set association of unclassified examples, whose label is not known, by the properties of examples in a representation learned earlier as of training examples, that label was known. Classification tasks contain a huge assortment of domains and real world purpose: disciplines such as medical diagnosis, bioinformatics, financial engineering and image recognition between others, where domain experts can use the model erudite to sustain their decisions. All the Classification Approaches proposed in this paper were evaluate in an appropriate experimental framework in R Programming Language and the major emphasis is on k-nearest neighbor method which supports vector machines and decision trees over large number of data sets with varied dimensionality and by comparing their performance against other state-of-the-art methods. In this process the experimental results obtained have been verified by statistical tests which support the better performance of the methods. In this paper we have survey various classification techniques of Data Mining and then compared them by using diverse datasets from “University of California: Irvine (UCI) Machine Learning Repository” for acquiring the accurate calculations on Iris Data set.

Download Full-text

Ability of a Machine Learning Algorithm to Predict the Need for Perioperative Red Blood Cells Transfusion in Pelvic Fracture Patients: A Multicenter Cohort Study in China

Frontiers in Medicine ◽

10.3389/fmed.2021.694733 ◽

2021 ◽

Vol 8 ◽

Author(s):

Xueyuan Huang ◽

Yongjun Wang ◽

Bingyu Chen ◽

Yuanshuai Huang ◽

Xinhua Wang ◽

...

Keyword(s):

Machine Learning ◽

Cohort Study ◽

Random Forest ◽

Blood Cells ◽

Nearest Neighbor ◽

Learning Algorithm ◽

Kappa Coefficient ◽

Gradient Boosting ◽

Machine Learning Algorithm ◽

K Nearest Neighbor

Background: Predicting the perioperative requirement for red blood cells (RBCs) transfusion in patients with the pelvic fracture may be challenging. In this study, we constructed a perioperative RBCs transfusion predictive model (ternary classifications) based on a machine learning algorithm.Materials and Methods: This study included perioperative adult patients with pelvic trauma hospitalized across six Chinese centers between September 2012 and June 2019. An extreme gradient boosting (XGBoost) algorithm was used to predict the need for perioperative RBCs transfusion, with data being split into training test (80%), which was subjected to 5-fold cross-validation, and test set (20%). The ability of the predictive transfusion model was compared with blood preparation based on surgeons' experience and other predictive models, including random forest, gradient boosting decision tree, K-nearest neighbor, logistic regression, and Gaussian naïve Bayes classifier models. Data of 33 patients from one of the hospitals were prospectively collected for model validation.Results: Among 510 patients, 192 (37.65%) have not received any perioperative RBCs transfusion, 127 (24.90%) received less-transfusion (RBCs < 4U), and 191 (37.45%) received more-transfusion (RBCs ≥ 4U). Machine learning-based transfusion predictive model produced the best performance with the accuracy of 83.34%, and Kappa coefficient of 0.7967 compared with other methods (blood preparation based on surgeons' experience with the accuracy of 65.94%, and Kappa coefficient of 0.5704; the random forest method with an accuracy of 82.35%, and Kappa coefficient of 0.7858; the gradient boosting decision tree with an accuracy of 79.41%, and Kappa coefficient of 0.7742; the K-nearest neighbor with an accuracy of 53.92%, and Kappa coefficient of 0.3341). In the prospective dataset, it also had a food performance with accuracy 81.82%.Conclusion: This multicenter retrospective cohort study described the construction of an accurate model that could predict perioperative RBCs transfusion in patients with pelvic fractures.

Download Full-text

Machine Learning Based Indoor Localization using Wi-Fi Fingerprinting

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.a2133.098319 ◽

2019 ◽

Vol 8 (3) ◽

pp. 502-506

Keyword(s):

Machine Learning ◽

Random Forest ◽

Nearest Neighbor ◽

Indoor Localization ◽

The Internet ◽

Location Prediction ◽

K Nearest Neighbor ◽

Data Types ◽

Specific Location ◽

Web Access

The aim of indoor localization is to locate the objects inside a location wirelessly. This paper reports the models that predict the location along with floor and coordinates from the WAPs (Web Access Points) signal strengths of a user who connects to the internet at a specific location which had three locations. Starting with the cleaning of data, then assigning attributes into proper data types, making subset of dataset for each location, examining each column, and normalizing WAPs rows in order to build models. Different algorithms have been used to predict the location, floor, and coordinates of a logged in user. The models that have been used in this paper are k-Nearest Neighbor (k-NN) for location prediction, random forest for floor prediction and regression with k-NN for coordinate prediction.

Download Full-text

Booking Prediction Models for Peer-to-peer Accommodation Listings using Logistics Regression, Decision Tree, K-Nearest Neighbor, and Random Forest Classifiers

Journal of Information Systems Engineering and Business Intelligence ◽

10.20473/jisebi.6.2.123-132 ◽

2020 ◽

Vol 6 (2) ◽

pp. 123

Author(s):

Mochammad Agus Afrianto ◽

Meditya Wasesa

Keyword(s):

Random Forest ◽

Decision Tree ◽

Revenue Management ◽

Nearest Neighbor ◽

Prediction Models ◽

Model Development ◽

Peer To Peer ◽

K Nearest Neighbor ◽

Logistics Regression ◽

Roc Score

Background: Literature in the peer-to-peer accommodation has put a substantial focus on accommodation listings' price determinants. Developing prediction models related to the demand for accommodation listings is vital in revenue management because accurate price and demand forecasts will help determine the best revenue management responses.Objective: This study aims to develop prediction models to determine the booking likelihood of accommodation listings.Methods: Using an Airbnb dataset, we developed four machine learning models, namely Logistics Regression, Decision Tree, K-Nearest Neighbor (KNN), and Random Forest Classifiers. We assessed the models using the AUC-ROC score and the model development time by using the ten-fold three-way split and the ten-fold cross-validation procedures.Results: In terms of average AUC-ROC score, the Random Forest Classifiers outperformed other evaluated models. In three-ways split procedure, it had a 15.03% higher AUC-ROC score than Decision Tree, 2.93 % higher than KNN, and 2.38% higher than Logistics Regression. In the cross-validation procedure, it has a 26,99% higher AUC-ROC score than Decision Tree, 4.41 % higher than KNN, and 3.31% higher than Logistics Regression. It should be noted that the Decision Tree model has the lowest AUC-ROC score, but it has the smallest model development time.Conclusion: The performance of random forest models in predicting booking likelihood of accommodation listings is the most superior. The model can be used by peer-to-peer accommodation owners to improve their revenue management responses.

Download Full-text