scholarly journals A Comparative Analysis of Machine Learning Models for the Prediction of Insurance Uptake in Kenya

Data ◽  
2021 ◽  
Vol 6 (11) ◽  
pp. 116
Author(s):  
Nelson Kemboi Yego ◽  
Juma Kasozi ◽  
Joseph Nkurunziza

The role of insurance in financial inclusion and economic growth, in general, is immense and is increasingly being recognized. However, low uptake impedes the growth of the sector, hence the need for a model that robustly predicts insurance uptake among potential clients. This study undertook a two phase comparison of machine learning classifiers. Phase I had eight machine learning models compared for their performance in predicting the insurance uptake using 2016 Kenya FinAccessHousehold Survey data. Taking Phase I as a base in Phase II, random forest and XGBoost were compared with four deep learning classifiers using 2019 Kenya FinAccess Household Survey data. The random forest model trained on oversampled data showed the highest F1-score, accuracy, and precision. The area under the receiver operating characteristic curve was furthermore highest for random forest; hence, it could be construed as the most robust model for predicting the insurance uptake. Finally, the most important features in predicting insurance uptake as extracted from the random forest model were income, bank usage, and ability and willingness to support others. Hence, there is a need for a design and distribution of low income based products, and bancassurance could be said to be a plausible channel for the distribution of insurance products.

Cancers ◽  
2021 ◽  
Vol 13 (23) ◽  
pp. 6013
Author(s):  
Hyun-Soo Park ◽  
Kwang-sig Lee ◽  
Bo-Kyoung Seo ◽  
Eun-Sil Kim ◽  
Kyu-Ran Cho ◽  
...  

This prospective study enrolled 147 women with invasive breast cancer who underwent low-dose breast CT (80 kVp, 25 mAs, 1.01–1.38 mSv) before treatment. From each tumor, we extracted eight perfusion parameters using the maximum slope algorithm and 36 texture parameters using the filtered histogram technique. Relationships between CT parameters and histological factors were analyzed using five machine learning algorithms. Performance was compared using the area under the receiver-operating characteristic curve (AUC) with the DeLong test. The AUCs of the machine learning models increased when using both features instead of the perfusion or texture features alone. The random forest model that integrated texture and perfusion features was the best model for prediction (AUC = 0.76). In the integrated random forest model, the AUCs for predicting human epidermal growth factor receptor 2 positivity, estrogen receptor positivity, progesterone receptor positivity, ki67 positivity, high tumor grade, and molecular subtype were 0.86, 0.76, 0.69, 0.65, 0.75, and 0.79, respectively. Entropy of pre- and postcontrast images and perfusion, time to peak, and peak enhancement intensity of hot spots are the five most important CT parameters for prediction. In conclusion, machine learning using texture and perfusion characteristics of breast cancer with low-dose CT has potential value for predicting prognostic factors and risk stratification in breast cancer patients.


2021 ◽  
Author(s):  
M.D.S. Sudaraka ◽  
I. Abeyagunawardena ◽  
E. S. De Silva ◽  
S Abeyagunawardena

Abstract BackgroundElectrocardiogram (ECG) is a key diagnostic test in cardiac investigation. Interpretation of ECG is based on the understanding of normal electrical patterns produced by the heart and alterations of those patterns in specific disease conditions. With machine learning techniques, it is possible to interpret ECGs with increased accuracy. However, there is a lacuna in machine learning models to detect myocardial infarction (MI) coupled with the affected territories of the heart. MethodsThe dataset was obtained from the University of California, Irvine, Machine Learning Repository. It was filtered to obtain observations categorized as Normal, Ischemic changes, Old Anterior MI and Old Inferior MI. The dataset was randomly split into a training set (70%) and a test set (30%). 73 out of the 270 ECG features were selected based on the changes observed following MI, after excluding predictors that had near zero variance across the observations. Three machine learning classification models (Bootstrap Aggregation Decision Trees, Random Forest, Multi-layer Perceptron) were trained using the training dataset, optimizing for the Kappa statistic and the parameter tuning was achieved with repeated 10-fold cross validation. Accuracy and Kappa of the samples were used to evaluate performance between the models. ResultsThe Random Forest model identified old anterior and old inferior MIs with 100% sensitivity and specificity and all 4 categorized observations with an overall accuracy of 0.9167 (95% CI 0.8424 - 0.9633). Both the Bootstrap Aggregation Decision Trees and the Multi-layer Perceptron models identified old anterior MIs with 100% sensitivity and specificity and their overall accuracies for all 4 observations were 0.8958 (95% CI 0.8168 - 0.9489) and 0.8542 (95% CI 0.7674 - 0.9179) respectively.Conclusion With a medically informed feature selection we were able to identify old anterior MI with 100% sensitivity and specificity by all three models in this study, and old inferior MI with 100% sensitivity and specificity by Random Forest Model. If the data set can be improved it is possible to utilize these machine learning models in hospital setting to identify cardiac emergencies by incorporating them into cardiac monitors, until trained personnel become available.


Environments ◽  
2021 ◽  
Vol 8 (10) ◽  
pp. 99
Author(s):  
Shun-Yuan Wang ◽  
Wen-Bin Lin ◽  
Yu-Chieh Shu

In this study, a mobile air pollution sensing unit based on the Internet of Things framework was designed for monitoring the concentration of fine particulate matter in three urban areas. This unit was developed using the NodeMCU-32S microcontroller, PMS5003-G5 (particulate matter sensing module), and Ublox NEO-6M V2 (GPS positioning module). The sensing unit transmits data of the particulate matter concentration and coordinates of a polluted location to the backend server through 3G and 4G telecommunication networks for data collection. This system will complement the government’s PM2.5 data acquisition system. Mobile monitoring stations meet the air pollution monitoring needs of some areas that require special observation. For example, an AIoT development system will be installed. At intersections with intensive traffic, it can be used as a reference for government transportation departments or environmental inspection departments for environmental quality monitoring or evacuation of traffic flow. Furthermore, the particulate matter distributions in three areas, namely Xinzhuang, Sanchong, and Luzhou Districts, which are all in New Taipei City of Taiwan, were estimated using machine learning models, the data of stationary monitoring stations, and the measurements of the mobile sensing system proposed in this study. Four types of learning models were trained, namely the decision tree, random forest, multilayer perceptron, and radial basis function neural network, and their prediction results were evaluated. The root mean square error was used as the performance indicator, and the learning results indicate that the random forest model outperforms the other models for both the training and testing sets. To examine the generalizability of the learning models, the models were verified in relation to data measured on three days: 15 February, 28 February, and 1 March 2019. A comparison between the model predicted and the measured data indicates that the random forest model provides the most stable and accurate prediction values and could clearly present the distribution of highly polluted areas. The results of these models are visualized in the form of maps by using a web application. The maps allow users to understand the distribution of polluted areas intuitively.


Author(s):  
Farrikh Alzami ◽  
Erika Devi Udayanti ◽  
Dwi Puji Prabowo ◽  
Rama Aria Megantara

Sentiment analysis in terms of polarity classification is very important in everyday life, with the existence of polarity, many people can find out whether the respected document has positive or negative sentiment so that it can help in choosing and making decisions. Sentiment analysis usually done manually. Therefore, an automatic sentiment analysis classification process is needed. However, it is rare to find studies that discuss extraction features and which learning models are suitable for unstructured sentiment analysis types with the Amazon food review case. This research explores some extraction features such as Word Bags, TF-IDF, Word2Vector, as well as a combination of TF-IDF and Word2Vector with several machine learning models such as Random Forest, SVM, KNN and Naïve Bayes to find out a combination of feature extraction and learning models that can help add variety to the analysis of polarity sentiments. By assisting with document preparation such as html tags and punctuation and special characters, using snowball stemming, TF-IDF results obtained with SVM are suitable for obtaining a polarity classification in unstructured sentiment analysis for the case of Amazon food review with a performance result of 87,3 percent.


2021 ◽  
Vol 9 ◽  
Author(s):  
Daniel Lowell Weller ◽  
Tanzy M. T. Love ◽  
Martin Wiedmann

Recent studies have shown that predictive models can supplement or provide alternatives to E. coli-testing for assessing the potential presence of food safety hazards in water used for produce production. However, these studies used balanced training data and focused on enteric pathogens. As such, research is needed to determine 1) if predictive models can be used to assess Listeria contamination of agricultural water, and 2) how resampling (to deal with imbalanced data) affects performance of these models. To address these knowledge gaps, this study developed models that predict nonpathogenic Listeria spp. (excluding L. monocytogenes) and L. monocytogenes presence in agricultural water using various combinations of learner (e.g., random forest, regression), feature type, and resampling method (none, oversampling, SMOTE). Four feature types were used in model training: microbial, physicochemical, spatial, and weather. “Full models” were trained using all four feature types, while “nested models” used between one and three types. In total, 45 full (15 learners*3 resampling approaches) and 108 nested (5 learners*9 feature sets*3 resampling approaches) models were trained per outcome. Model performance was compared against baseline models where E. coli concentration was the sole predictor. Overall, the machine learning models outperformed the baseline E. coli models, with random forests outperforming models built using other learners (e.g., rule-based learners). Resampling produced more accurate models than not resampling, with SMOTE models outperforming, on average, oversampling models. Regardless of resampling method, spatial and physicochemical water quality features drove accurate predictions for the nonpathogenic Listeria spp. and L. monocytogenes models, respectively. Overall, these findings 1) illustrate the need for alternatives to existing E. coli-based monitoring programs for assessing agricultural water for the presence of potential food safety hazards, and 2) suggest that predictive models may be one such alternative. Moreover, these findings provide a conceptual framework for how such models can be developed in the future with the ultimate aim of developing models that can be integrated into on-farm risk management programs. For example, future studies should consider using random forest learners, SMOTE resampling, and spatial features to develop models to predict the presence of foodborne pathogens, such as L. monocytogenes, in agricultural water when the training data is imbalanced.


ADMET & DMPK ◽  
2020 ◽  
Author(s):  
John Mitchell

<p class="ADMETabstracttext">We describe three machine learning models submitted to the 2019 Solubility Challenge. All are founded on tree-like classifiers, with one model being based on Random Forest and another on the related Extra Trees algorithm. The third model is a consensus predictor combining the former two with a Bagging classifier. We call this consensus classifier Vox Machinarum, and here discuss how it benefits from the Wisdom of Crowds. On the first 2019 Solubility Challenge test set of 100 low-variance intrinsic aqueous solubilities, Extra Trees is our best classifier. One the other, a high-variance set of 32 molecules, we find that Vox Machinarum and Random Forest both perform a little better than Extra Trees, and almost equally to one another. We also compare the gold standard solubilities from the 2019 Solubility Challenge with a set of literature-based solubilities for most of the same compounds.</p>


Sign in / Sign up

Export Citation Format

Share Document