Validation of Machine Learning Models for Health Insurance Risks Assessment

The success of a universal healthcare policy is impossible without the use of insurance instruments. The healthcare and insurance industries are on the verge of integrating seamlessly with the help of sensors and algorithms. This research work focuses on validating algorithms that can help to model and classify health insurance risk data. Several algorithms, namely Logistic Regression (LR), K-Nearest Neighbors (KNN), Decision Tree (DT), Naive Bayes (NB), and Support Vector Machine (SVM), were evaluated, and an objective validation of these algorithms is demonstrated. To maintain the replicability of the study, the data and code are available in a public repository. From the study, it is clear that the KNN algorithm is best suited as a risk classifier. This is evident from the values of R², the error metrics, completeness score, explained variance, normalized mutual information score, V-measure score, precision, recall, F1 score, and accuracy metrics. Secondly, the algorithms were validated using 10-fold cross-validation with five types of performance metrics. In almost all cases, it was found that the KNN algorithm performs consistently and is numerically the most suitable. This can be attributed to the tight standard deviations of its performance metrics across the evaluation. From all the validation tests, it can be claimed that, on the current dataset, the KNN algorithm with the Accuracy, Homogeneity Score, Explained Variance, and Normalized Mutual Information Score hyper-parameter configuration is the best performer.
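The validation protocol summarized above can be reproduced with a standard scikit-learn workflow. The sketch below is a minimal illustration, assuming the risk data are available as a CSV file with a hypothetical name and label column; it is not the authors' exact code.

```python
# Minimal sketch: 10-fold cross-validation of a KNN risk classifier
# with several of the metrics mentioned in the abstract.
import pandas as pd
from sklearn.model_selection import cross_validate
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical dataset layout: feature columns plus a "risk" label column.
df = pd.read_csv("insurance_risk.csv")
X, y = df.drop(columns=["risk"]), df["risk"]

model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
scores = cross_validate(
    model, X, y, cv=10,
    scoring=["accuracy", "precision_macro", "recall_macro", "f1_macro"],
)

# Consistency check: a small standard deviation across folds indicates
# the stable behaviour attributed to KNN in the study.
for name, values in scores.items():
    if name.startswith("test_"):
        print(f"{name}: mean={values.mean():.3f}, std={values.std():.3f}")
```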

Author(s): Abdullahi Adeleke, Noor Azah Samsudin, Mohd Hisyam Abdul Rahim, Shamsul Kamal Ahmad Khalid, Riswan Efendi

Machine learning involves the task of training systems to make decisions without being explicitly programmed. Important among machine learning tasks is classification, the process of training machines to make predictions from predefined labels. Classification is broadly categorized into three distinct groups: single-label (SL), multi-class, and multi-label (ML) classification. This research work presents an application of a multi-label classification (MLC) technique to automating the labeling of Quranic verses. MLC has been gaining attention in recent years owing to the increasing number of works addressing real-world classification problems with multi-label data. In traditional classification problems, a pattern is associated with a single label from a set of disjoint labels. In MLC, however, an instance of data is associated with a set of labels. In this paper, three standard MLC methods, binary relevance (BR), classifier chain (CC), and label powerset (LP), are implemented with four baseline classifiers: support vector machine (SVM), naïve Bayes (NB), k-nearest neighbors (k-NN), and J48. The research methodology adopts the multi-label problem transformation (PT) approach. The results are validated using six conventional performance metrics: hamming loss, accuracy, one-error, micro-F1, macro-F1, and average precision. From the results, the classifiers effectively achieved above the 70% accuracy mark. Overall, SVM achieved the best results with the CC and LP algorithms.
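The problem transformation methods named above can be sketched directly in scikit-learn, which provides binary relevance (one classifier per label) and classifier chains; label powerset is available in the separate scikit-multilearn package. In this illustrative sketch a synthetic multi-label dataset stands in for the Quranic verse data.

```python
# Sketch of two problem transformation methods (binary relevance and
# classifier chain) evaluated with hamming loss and micro/macro F1.
from sklearn.datasets import make_multilabel_classification
from sklearn.model_selection import train_test_split
from sklearn.multioutput import MultiOutputClassifier, ClassifierChain
from sklearn.svm import LinearSVC
from sklearn.metrics import hamming_loss, f1_score

X, Y = make_multilabel_classification(n_samples=500, n_classes=5, random_state=0)
X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=0.3, random_state=0)

methods = {
    "binary relevance": MultiOutputClassifier(LinearSVC()),
    "classifier chain": ClassifierChain(LinearSVC(), random_state=0),
}
for name, clf in methods.items():
    Y_pred = clf.fit(X_tr, Y_tr).predict(X_te)
    print(name,
          "hamming loss:", round(hamming_loss(Y_te, Y_pred), 3),
          "micro-F1:", round(f1_score(Y_te, Y_pred, average="micro"), 3),
          "macro-F1:", round(f1_score(Y_te, Y_pred, average="macro"), 3))
```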


2019, Vol. 9 (18), pp. 3723
Author(s): Sharif, Mumtaz, Shafiq, Riaz, Ali, ...

The rise of social media has led to an increasing online cyber-war via hateful and violent comments or speeches, and even slick videos, that promotes extremism and radicalization. Sensing cyber-extreme content on microblogging sites, specifically Twitter, is a challenging and evolving research area, since such content is short, noisy, context-dependent, and dynamic in nature. The related tweets were crawled using query words and then carefully labelled into two classes: Extreme (having two sub-classes: pro-Afghanistan government and pro-Taliban) and Neutral. An Exploratory Data Analysis (EDA) using Principal Component Analysis (PCA) was performed on the tweet data (represented by Term Frequency-Inverse Document Frequency (TF-IDF) features) to reduce the high-dimensional feature space into a low-dimensional (usually 2-D or 3-D) space. The PCA-based visualization showed better cluster separation between the two classes (extreme and neutral), whereas cluster separation within the sub-classes of the extreme class was not clear. The paper also discusses the pros and cons of applying PCA as an EDA tool for textual data, which is usually represented by a high-dimensional feature set. Furthermore, classification algorithms such as naïve Bayes, K-Nearest Neighbors (KNN), random forest, Support Vector Machine (SVM), and ensemble classification methods (with bagging and boosting) were applied both with PCA-based reduced features and with the complete feature set (TF-IDF features extracted from n-gram terms in the tweets). The analysis showed that SVM achieved an average accuracy of 84%, higher than the other classification models. It is pertinent to mention that this is a novel research contribution in the context of the Afghanistan war zone for Twitter content analysis using machine learning methods.
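The TF-IDF plus PCA pipeline described above can be sketched as follows. This is a minimal illustration only: the document list is a dummy stand-in for the crawled, labelled tweets, and scikit-learn's PCA needs a dense matrix, so the sparse TF-IDF output is densified (TruncatedSVD is the usual alternative for large corpora).

```python
# Sketch: TF-IDF features, PCA for a 2-D exploratory projection, and an
# SVM fitted on the reduced features.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import PCA
from sklearn.svm import SVC

# Dummy documents and labels standing in for the labelled tweets
# (0 = neutral, 1 = extreme); the real corpus is not reproduced here.
docs = ["tweet text one", "tweet text two", "tweet text three",
        "tweet text four", "tweet text five", "tweet text six"]
labels = [0, 0, 0, 1, 1, 1]

X = TfidfVectorizer(ngram_range=(1, 2)).fit_transform(docs)
X_2d = PCA(n_components=2).fit_transform(X.toarray())  # PCA requires dense input

clf = SVC(kernel="linear").fit(X_2d, labels)           # baseline on reduced space
print(clf.predict(X_2d))
```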


Recommendation systems are a subdivision of information filtering that seeks to predict the rating or preference a user would give to an item. Recommendation systems produce user-customized suggestions for products or services. They are used in services such as the Google search engine, YouTube, Gmail, and product recommendation on e-commerce websites. These systems usually depend on a content-based approach. In this paper, we develop this type of recommendation system using several algorithms: K-Nearest Neighbors (KNN), Support Vector Machine (SVM), Logistic Regression (LR), Multinomial Naive Bayes (MNB), and Multi-Layer Perceptron (MLP). These predict the nearest categories from the News Category Dataset; among these categories, we recommend the most common sentence to a user and analyze the performance metrics. The approach is tested on the News Category Dataset, which contains roughly 200k news headlines in 41 classes, collected from HuffPost over the years 2012-2018.
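The headline classification step described here has the general shape sketched below; the file path and the "headline"/"category" field names are assumptions about the dataset layout, and Multinomial Naive Bayes is shown as one of the interchangeable baselines.

```python
# Sketch: predict a news category from a headline with TF-IDF features
# and a Multinomial Naive Bayes baseline.
import json
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Hypothetical path and field names for the JSON-lines news category file.
with open("News_Category_Dataset.json") as f:
    records = [json.loads(line) for line in f]
headlines = [r["headline"] for r in records]
categories = [r["category"] for r in records]

X_tr, X_te, y_tr, y_te = train_test_split(headlines, categories,
                                          test_size=0.2, random_state=42)
model = Pipeline([("tfidf", TfidfVectorizer(stop_words="english")),
                  ("clf", MultinomialNB())])   # swap in KNN/SVM/LR/MLP here
model.fit(X_tr, y_tr)
print(classification_report(y_te, model.predict(X_te)))
```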


Author(s): Umar Sidiq, Syed Mutahar Aaqib, Rafi Ahmad Khan

Classification is one of the most important supervised learning data mining techniques, used to classify predefined data sets. Classification is mainly used in the healthcare sector for making decisions, building diagnosis systems, and giving better treatment to patients. In this work, the data set used is taken from a recognized laboratory in Kashmir. The entire research work was carried out with Anaconda3 5.2.0, an open-source platform, under a Windows 10 environment. An experimental study was carried out using classification techniques such as k-nearest neighbors, support vector machine, decision tree, and naïve Bayes. The decision tree obtained the highest accuracy of 98.89% over the other classification techniques.
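The experimental comparison described here follows a common scikit-learn pattern; the sketch below shows its general shape, with a bundled toy dataset standing in for the (non-public) laboratory data.

```python
# Sketch: compare the accuracy of the four classifiers on a held-out split.
from sklearn.datasets import load_breast_cancer   # stand-in for the lab data
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)

classifiers = {
    "KNN": KNeighborsClassifier(),
    "SVM": SVC(),
    "Decision Tree": DecisionTreeClassifier(random_state=1),
    "Naive Bayes": GaussianNB(),
}
for name, clf in classifiers.items():
    acc = accuracy_score(y_te, clf.fit(X_tr, y_tr).predict(X_te))
    print(f"{name}: {acc:.4f}")
```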


Author(s): Dorian Ruiz Alonso, Claudia Zepeda Cortés, Hilda Castillo Zacatelco, José Luis Carballido Carranza

In this work, we propose the extension of a methodology for the multi-label classification of feedback according to the Hattie and Timperley feedback model, incorporating a hyperparameter tuning stage. We analyze whether adding a hyperparameter tuning stage before running the support vector machine, random forest, and multi-label k-nearest neighbors algorithms improves the performance metrics of the multi-label classifiers. These classifiers automatically assign the feedback generated by a teacher on the activities submitted by students in online courses on the Blackboard platform to the task, process, regulation, praise, and other levels proposed in the feedback model by Hattie and Timperley. The grid search strategy is used to refine the hyperparameters of each algorithm. The results show that tuning the hyperparameters improves the performance metrics for the data set used.
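Grid-search hyperparameter tuning of a multi-label classifier, as described above, has the general shape sketched below; the parameter grid and the binary-relevance wrapper around a random forest are illustrative assumptions, and a synthetic multi-label dataset stands in for the feedback data.

```python
# Sketch: grid-search hyperparameter tuning for a multi-label classifier
# (binary relevance around a random forest), scored with micro-F1.
from sklearn.datasets import make_multilabel_classification
from sklearn.model_selection import GridSearchCV
from sklearn.multioutput import MultiOutputClassifier
from sklearn.ensemble import RandomForestClassifier

X, Y = make_multilabel_classification(n_samples=300, n_classes=5, random_state=0)

param_grid = {                      # illustrative grid only
    "estimator__n_estimators": [100, 300],
    "estimator__max_depth": [None, 10],
}
search = GridSearchCV(
    MultiOutputClassifier(RandomForestClassifier(random_state=0)),
    param_grid, scoring="f1_micro", cv=5,
)
search.fit(X, Y)
print(search.best_params_, round(search.best_score_, 3))
```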


Author(s): Hani Bani-Salameh, Shadi M. Alkhatib, Moawyiah Abdalla, Mo’taz Al-Hami, Ruaa Banat, ...

Background: Diabetes and hypertension are two of the most common diseases in the world. As they unfavorably affect people of different age groups, they have become a cause of concern and must be predicted and diagnosed well in advance. Objective: This research aims to determine the effectiveness of artificial neural networks (ANNs) in predicting diabetes and blood pressure diseases and to point out the factors which have a high impact on these diseases. Sample: This work used two online datasets consisting of data collected from 768 individuals. We applied neural network algorithms to predict whether the individuals have those two diseases based on a set of factors. Diabetes prediction is based on five factors: age, weight, fat ratio, glucose, and insulin, while blood pressure prediction is based on six factors: age, weight, fat ratio, blood pressure, alcohol, and smoking. Method: A model based on the Multi-Layer Perceptron (MLP) neural network was implemented. The inputs of the network were the factors for each disease, while the output was the prediction of the disease’s occurrence. The model performance was compared with other classifiers such as Support Vector Machine (SVM) and K-Nearest Neighbors (KNN). We used performance metrics to assess the accuracy and performance of the MLP. Also, a tool was implemented to help diagnose the diseases and to interpret the results. Result: The model predicted the two diseases with a correct classification rate (CCR) of 77.6% for diabetes and 68.7% for hypertension. The results indicate that the MLP correctly predicts the probability of being diseased or not, and that its performance is significantly better than that of both SVM and KNN. This shows the MLP's effectiveness in early disease prediction.
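A minimal sketch of the comparison described above follows, assuming the five diabetes factors are columns of a CSV file with hypothetical names; the MLP architecture shown is an illustrative choice, not the authors' exact configuration.

```python
# Sketch: multi-layer perceptron predicting diabetes from five factors
# (age, weight, fat ratio, glucose, insulin), compared with SVM and KNN.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier

df = pd.read_csv("diabetes.csv")            # hypothetical file and column names
X = df[["age", "weight", "fat_ratio", "glucose", "insulin"]]
y = df["diabetic"]                          # 0 = healthy, 1 = diabetic

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
models = {
    "MLP": MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000, random_state=0),
    "SVM": SVC(),
    "KNN": KNeighborsClassifier(),
}
for name, clf in models.items():
    pipe = make_pipeline(StandardScaler(), clf)
    print(name, "CCR:", round(pipe.fit(X_tr, y_tr).score(X_te, y_te), 3))
```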


Author(s): Ahmet Elbir, Hamza Osman Ilhan, Mehmet Furkan Aydin, Yunus Emre Demirbulut

One of the most important problems facing telecommunication companies is the potential transfer of customers between firms. To avoid this problem, it is very important to identify customers who are likely to leave. In this study, the performance of classification and clustering algorithms from machine learning has been evaluated and compared on the analysis of potential customer attrition, commonly reported as churn analysis. K-nearest neighbors, decision trees, random forests, support vector machines, and naive Bayes methods were tested within the scope of classification. Additionally, K-Means and hierarchical clustering methods were tested. The performance of the methods was evaluated according to the accuracy, precision, sensitivity, and F-measure performance metrics.
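The evaluation described here follows a familiar scikit-learn pattern; the sketch below compares one classifier and one clustering method on a hypothetical churn dataset (assumed to have numeric features and a 0/1 "churn" label), reporting the four metrics listed.

```python
# Sketch: churn analysis with one classifier (random forest) and one
# clustering method (k-means), scored with accuracy, precision,
# recall (sensitivity) and F-measure.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.cluster import KMeans
from sklearn.metrics import (accuracy_score, precision_recall_fscore_support,
                             adjusted_rand_score)

# Hypothetical dataset layout: numeric feature columns plus a 0/1 "churn" label.
df = pd.read_csv("telecom_churn.csv")
X, y = df.drop(columns=["churn"]), df["churn"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
y_pred = clf.predict(X_te)
prec, rec, f1, _ = precision_recall_fscore_support(y_te, y_pred, average="binary")
print("accuracy:", accuracy_score(y_te, y_pred),
      "precision:", prec, "sensitivity:", rec, "F-measure:", f1)

# Clustering view: group customers into two clusters and measure how well
# the clusters agree with the churn labels.
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print("cluster/label agreement (ARI):", adjusted_rand_score(y, clusters))
```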


Author(s): Yasin Kaya

Bundle Branch Block (BBB) beats are the most common Electrocardiogram (ECG) arrhythmias and can be indicators of significant heart disease. This study aimed to provide an effective machine-learning method for the detection of BBB beats. To this end, statistical and temporal features were calculated and the most valuable ones were identified using feature selection algorithms. Forward search, backward elimination, and genetic algorithms were used for feature selection. Three different classifiers, K-Nearest Neighbors (KNN), neural networks, and support vector machines, were used comparatively in this study. Accuracy, specificity, and sensitivity performance metrics were calculated in order to compare the results. Normal sinus rhythm (N), Right Bundle Branch Block (RBBB), and Left Bundle Branch Block (LBBB) ECG beat types were used in the study. All beats of these three types in the MIT-BIH arrhythmia database were used in the experiments. All of the feature sets yielded promising classification accuracy for BBB classification. The KNN classifier using backward-elimination-selected features achieved the highest classification accuracy in the study, at 99.82%. The results showed the proposed approach to be successful in the detection of BBB beats.
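Backward elimination wrapped around a KNN classifier, the best-performing configuration reported above, can be sketched with scikit-learn's SequentialFeatureSelector. The sketch assumes a synthetic binary-labelled feature matrix standing in for the statistical and temporal ECG-beat features (the actual task has three beat classes).

```python
# Sketch: backward-elimination feature selection wrapped around KNN,
# then evaluation of accuracy, sensitivity and specificity (binary case).
from sklearn.datasets import make_classification   # stand-in for ECG features
from sklearn.model_selection import train_test_split
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import confusion_matrix

X, y = make_classification(n_samples=600, n_features=20, n_informative=8,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

knn = KNeighborsClassifier(n_neighbors=5)
selector = SequentialFeatureSelector(knn, n_features_to_select=8,
                                     direction="backward", cv=5)
X_tr_sel = selector.fit_transform(X_tr, y_tr)
X_te_sel = selector.transform(X_te)

knn.fit(X_tr_sel, y_tr)
tn, fp, fn, tp = confusion_matrix(y_te, knn.predict(X_te_sel)).ravel()
print("accuracy:", (tp + tn) / (tp + tn + fp + fn),
      "sensitivity:", tp / (tp + fn), "specificity:", tn / (tn + fp))
```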


Data, 2019, Vol. 5 (1), pp. 2
Author(s): Shiny Abraham, Chau Huynh, Huy Vu

Hydrologic soil groups play an important role in the determination of surface runoff, which, in turn, is crucial for soil and water conservation efforts. Traditionally, placement of soil into appropriate hydrologic groups is based on the judgement of soil scientists, primarily relying on their interpretation of guidelines published by regional or national agencies. As a result, large-scale mapping of hydrologic soil groups results in widespread inconsistencies and inaccuracies. This paper presents an application of machine learning for classification of soil into hydrologic groups. Based on features such as percentages of sand, silt and clay, and the value of saturated hydraulic conductivity, machine learning models were trained to classify soil into four hydrologic groups. The results of the classification obtained using algorithms such as k-Nearest Neighbors, Support Vector Machine with Gaussian Kernel, Decision Trees, Classification Bagged Ensembles and TreeBagger (Random Forest) were compared to those obtained using estimation based on soil texture. The performance of these models was compared and evaluated using per-class metrics and micro- and macro-averages. Overall, performance metrics related to kNN, Decision Tree and TreeBagger exceeded those for SVM-Gaussian Kernel and Classification Bagged Ensemble. Among the four hydrologic groups, it was noticed that group B had the highest rate of false positives.
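The per-class and micro-/macro-averaged evaluation described above corresponds to a standard multi-class report; a minimal sketch follows, assuming hypothetical column names for the soil properties and a random forest standing in for the TreeBagger model.

```python
# Sketch: classify soil into hydrologic groups A-D from sand/silt/clay
# percentages and saturated hydraulic conductivity, then report per-class
# metrics plus micro and macro averages.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier   # "TreeBagger" analogue
from sklearn.metrics import classification_report, f1_score

df = pd.read_csv("soil_properties.csv")   # hypothetical column names
X = df[["sand_pct", "silt_pct", "clay_pct", "ksat"]]
y = df["hydrologic_group"]                # values A, B, C, D

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
y_pred = RandomForestClassifier(random_state=0).fit(X_tr, y_tr).predict(X_te)

print(classification_report(y_te, y_pred))                 # per-class metrics
print("micro-F1:", f1_score(y_te, y_pred, average="micro"),
      "macro-F1:", f1_score(y_te, y_pred, average="macro"))
```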


Entropy, 2021, Vol. 23 (2), pp. 257
Author(s): Anaahat Dhindsa, Sanjay Bhatia, Sunil Agrawal, Balwinder Singh Sohi

The accurate classification of microbes is critical in today’s context for monitoring the ecological balance of a habitat. Hence, in this research work, a novel method to automate the process of identifying microorganisms has been implemented. To extract the bodies of microorganisms accurately, a generalized segmentation mechanism is proposed which combines a convolution filter (Kirsch) with a variance-based pixel clustering algorithm (Otsu). With exhaustive corroboration, a set of twenty-five features was identified to map the characteristics and morphology of all kinds of microbes. Multiple techniques for feature selection were tested, and it was found that mutual information (MI)-based models gave the best performance. Exhaustive hyperparameter tuning of the multilayer perceptron (MLP), k-nearest neighbors (KNN), quadratic discriminant analysis (QDA), logistic regression (LR), and support vector machine (SVM) was done. It was found that the radial SVM required further improvisation to attain the maximum possible level of accuracy. A comparative analysis between SVM and the improvised SVM (ISVM) through a 10-fold cross-validation method ultimately showed that ISVM resulted in 2% higher performance in terms of accuracy (98.2%), precision (98.2%), recall (98.1%), and F1 score (98.1%).
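The segmentation step that combines a Kirsch convolution filter with Otsu's variance-based thresholding can be sketched roughly as below. This is a simplified illustration under stated assumptions (a grayscale image at a hypothetical path, and only four of the eight Kirsch compass rotations), not the authors' exact pipeline.

```python
# Sketch: Kirsch edge response followed by Otsu thresholding to extract
# microorganism bodies from a grayscale micrograph.
import numpy as np
from scipy.ndimage import convolve
from skimage import io, filters

image = io.imread("microbe_sample.png", as_gray=True)   # hypothetical path

# One Kirsch compass kernel; the full operator uses eight 45-degree rotations
# and keeps the maximum response. Only the four 90-degree rotations are shown
# here for brevity.
base = np.array([[5, 5, 5], [-3, 0, -3], [-3, -3, -3]], dtype=float)
responses = [convolve(image, np.rot90(base, k)) for k in range(4)]
edges = np.max(np.abs(responses), axis=0)

# Otsu's method picks a global threshold from the edge-response histogram.
mask = edges > filters.threshold_otsu(edges)
print("foreground fraction:", mask.mean())
```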

