Machine Learning Techniques Applied to Profile Mobile Banking Users in India

This paper profiles mobile banking users using machine learning techniques, namely Decision Tree, Logistic Regression, Multilayer Perceptron, and SVM, to test a research model with fourteen independent variables and one dependent variable (adoption). A survey was conducted and the results were analysed using these techniques. Decision Trees were used to identify the profile of the mobile banking adopter. A comparison of the techniques found that Decision Trees outperformed Logistic Regression, Multilayer Perceptron, and SVM. Of all the techniques, Decision Tree is recommended for profiling studies because, apart from producing highly accurate results, it also yields ‘if–then’ classification rules. The classification rules provided here can be used to target potential customers and encourage them to adopt mobile banking by offering them appropriate incentives.
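To illustrate how a decision tree yields 'if-then' classification rules, the following is a minimal sketch that walks every root-to-leaf path of a small hand-built tree. The feature names (perceived_risk, age), thresholds, and labels are hypothetical and are not taken from the paper's survey or its fourteen variables.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    label: str  # predicted class at this leaf


@dataclass
class Node:
    feature: str                 # hypothetical feature name
    threshold: float             # split point
    left: Union["Node", Leaf]    # branch taken when feature <= threshold
    right: Union["Node", Leaf]   # branch taken when feature > threshold


def extract_rules(tree, conditions=None):
    """Render each root-to-leaf path as one 'if ... then ...' rule."""
    conditions = conditions or []
    if isinstance(tree, Leaf):
        premise = " and ".join(conditions) if conditions else "true"
        return [f"if {premise} then {tree.label}"]
    left_rules = extract_rules(
        tree.left, conditions + [f"{tree.feature} <= {tree.threshold}"])
    right_rules = extract_rules(
        tree.right, conditions + [f"{tree.feature} > {tree.threshold}"])
    return left_rules + right_rules


# Invented toy tree, for illustration only.
toy_tree = Node(
    "perceived_risk", 3.0,
    Node("age", 35.0, Leaf("adopter"), Leaf("non-adopter")),
    Leaf("non-adopter"),
)

rules = extract_rules(toy_tree)
for rule in rules:
    print(rule)
```

Each printed rule corresponds to one leaf, which is what makes a tree directly usable for describing a customer segment.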

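Machine Learning Applied to the Prediction of Cardiometabolic Diseases Using Metabolic and Behavioral Health Risk Indicators (original title: Aprendizado de Máquina Aplicado à Predição de Doenças Cardiometabólicas com Utilização de Indicadores Metabólicos e Comportamentais de Risco à Saúde)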

10.14210/cotb.v12.p301-308 ◽

2021 ◽

Author(s):

Alan Lopes de Sousa Freitas ◽

Ana Silvia Degasperi Ieker ◽

Josiane Melchiori Pinheiro ◽

Wilson Rinaldi ◽

Heloise Manica Paris Teixeira

Keyword(s):

Machine Learning ◽

Risk Factors ◽

Logistic Regression ◽

Decision Tree ◽

Causes Of Death ◽

Supervised Machine Learning ◽

Machine Learning Techniques ◽

Cardiometabolic Diseases ◽

Learning Techniques ◽

Good Classification

Cardiometabolic diseases that develop over a worker's life, such as hypertension, diabetes, dyslipidemia and obesity, are among the main causes of death and are associated with modifiable and controllable risk factors. The general objective of this study was to apply supervised Machine Learning techniques and to compare their performance in predicting the risk of developing cardiometabolic disease among staff working at a teaching hospital in southern Brazil. We sought to map the characteristics of individuals who are more likely to develop cardiometabolic diseases. The machine learning models evaluated were Naive Bayes, Decision Tree, Random Forest, KNN, Logistic Regression and SVM. The results obtained in the experiments showed that some supervised machine learning models produce a good classification, depending on the attributes and hyperparameters used.

Analyzing injury severity of motorcycle at-fault crashes using machine learning techniques, decision tree and logistic regression models

International Journal of Transportation Science and Technology ◽

10.1016/j.ijtst.2019.10.002 ◽

2020 ◽

Vol 9 (2) ◽

pp. 89-99 ◽

Cited By ~ 7

Author(s):

Mahdi Rezapour ◽

Amirarsalan Mehrara Molan ◽

Khaled Ksaibati

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Decision Tree ◽

Regression Models ◽

Injury Severity ◽

Machine Learning Techniques ◽

Logistic Regression Models ◽

Learning Techniques

Breast cancer classification using machine learning techniques: a comparative study

Medical Technologies Journal ◽

10.26415/2572-004x-vol4iss2p535-544 ◽

2020 ◽

Vol 4 (2) ◽

pp. 535-544

Author(s):

Djihane HOUFANI ◽

Sihem SLATNIA ◽

Okba KAZAR ◽

Noureddine ZERHOUNI ◽

...

Keyword(s):

Breast Cancer ◽

Machine Learning ◽

Logistic Regression ◽

Multilayer Perceptron ◽

Learning Algorithms ◽

Cancer Classification ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Breast Cancer Classification ◽

Learning Techniques

Background: Breast cancer is the second deadliest disease affecting women worldwide, after lung cancer. Traditional approaches to breast cancer diagnosis are time-consuming and prone to human error in classification. To deal with these problems, many research works based on machine learning techniques have been proposed. These approaches have shown their effectiveness in data classification in many fields, especially in healthcare. Methods: In this cross-sectional study, we conducted a practical comparison of the machine learning algorithms most used in the literature. We applied kernel and linear support vector machines, random forest, decision tree, multilayer perceptron, logistic regression, and k-nearest neighbors to breast cancer tumor classification. The dataset used is the Wisconsin Diagnostic Breast Cancer dataset. Results: After comparing the efficiency of the machine learning algorithms, we found that multilayer perceptron and logistic regression gave the best results, with an accuracy of 98% for breast cancer classification. Conclusion: Machine learning approaches are extensively used in medical prediction and decision support systems. This study showed that the multilayer perceptron and logistic regression algorithms perform well (good accuracy, specificity, and sensitivity) compared to the other evaluated algorithms.
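The three measures named in the conclusion (accuracy, sensitivity, specificity) all derive from the confusion matrix of a binary classifier. The sketch below shows the arithmetic on a small invented label set; the labels are illustrative only and are not drawn from the Wisconsin dataset or from the study's results.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, sensitivity and specificity for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return {
        # share of all cases classified correctly
        "accuracy": (tp + tn) / len(pairs),
        # share of true positives detected (recall on the positive class)
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        # share of true negatives correctly rejected
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


# Invented toy labels: 1 = malignant, 0 = benign.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Reporting sensitivity and specificity alongside accuracy matters in diagnosis, because a missed malignant case and a false alarm carry different costs.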

A Practical Tutorial for Decision Tree Induction

ACM Computing Surveys ◽

10.1145/3429739 ◽

2021 ◽

Vol 54 (1) ◽

pp. 1-38

Author(s):

Víctor Adrián Sosa Hernández ◽

Raúl Monroy ◽

Miguel Angel Medina-Pérez ◽

Octavio Loyola-González ◽

Francisco Herrera

Keyword(s):

Decision Tree ◽

Decision Trees ◽

Machine Learning Techniques ◽

Evaluation Measures ◽

Decision Tree Induction ◽

Learning Techniques ◽

Tree Models ◽

Evaluation Measure ◽

Main Components ◽

Support Decision Making

Experts from different domains have resorted to machine learning techniques to produce explainable models that support decision-making. Among existing techniques, decision trees have been useful in many application domains for classification. Decision trees can make decisions in a language that is closer to that of the experts. Many researchers have attempted to create better decision tree models by improving the components of the induction algorithm. One of the main components that have been studied and improved is the evaluation measure for candidate splits. In this article, we introduce a tutorial that explains decision tree induction. Then, we present an experimental framework to assess the performance of 21 evaluation measures that produce different C4.5 variants considering 110 databases, two performance measures, and 10 × 10-fold cross-validation. Furthermore, we compare and rank the evaluation measures by using a Bayesian statistical analysis. From our experimental results, we present the first two performance rankings in the literature of C4.5 variants. Moreover, we organize the evaluation measures into two groups according to their performance. Finally, we introduce meta-models that automatically determine the group of evaluation measures to produce a C4.5 variant for a new database and some further opportunities for decision tree models.
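As a concrete instance of an evaluation measure for candidate splits, the sketch below computes the gain ratio, the measure used by standard C4.5. It is offered only as an illustration of what such a measure does; the abstract does not list the 21 measures studied, and the toy labels are invented.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(parent, partitions):
    """Information gain of a split divided by its split information."""
    n = len(parent)
    weighted_child_entropy = sum(len(p) / n * entropy(p) for p in partitions)
    info_gain = entropy(parent) - weighted_child_entropy
    # Split information penalises splits that produce many small partitions.
    split_info = -sum(
        (len(p) / n) * math.log2(len(p) / n) for p in partitions if p)
    return info_gain / split_info if split_info > 0 else 0.0


# Invented toy data: eight examples, two classes.
parent = ["yes"] * 4 + ["no"] * 4
perfect_split = [["yes"] * 4, ["no"] * 4]
useless_split = [["yes", "yes", "no", "no"], ["yes", "yes", "no", "no"]]

perfect_score = gain_ratio(parent, perfect_split)
useless_score = gain_ratio(parent, useless_split)
print(perfect_score, useless_score)
```

During induction, the candidate split with the highest score at a node is chosen, so swapping the measure changes which tree is grown.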

Evolutionary Algorithm for Improving Decision Tree with Global Discretization in Manufacturing

Sensors ◽

10.3390/s21082849 ◽

2021 ◽

Vol 21 (8) ◽

pp. 2849

Author(s):

Sungbum Jun

Keyword(s):

Decision Tree ◽

Evolutionary Algorithm ◽

Decision Trees ◽

Manufacturing Systems ◽

Ensemble Methods ◽

Machine Learning Techniques ◽

Learning Techniques ◽

Industrial Internet ◽

Tree Models ◽

Real World Datasets

Due to recent advances in the industrial Internet of Things (IoT) in manufacturing, the vast amount of data from sensors has triggered the need to leverage such big data for fault detection. In particular, interpretable machine learning techniques, such as tree-based algorithms, have drawn attention as a way to implement reliable manufacturing systems and identify the root causes of faults. However, despite the high interpretability of decision trees, tree-based models make a trade-off between accuracy and interpretability. In order to improve the tree’s performance while maintaining its interpretability, an evolutionary algorithm for discretization of multiple attributes, called Decision tree Improved by Multiple sPLits with Evolutionary algorithm for Discretization (DIMPLED), is proposed. The experimental results with two real-world datasets from sensors showed that the decision tree improved by DIMPLED outperformed the single-decision-tree models (C4.5 and CART) that are widely used in practice, and it proved competitive with the ensemble methods, which use multiple decision trees. Even though the ensemble methods could produce slightly better performance, the proposed DIMPLED has a more interpretable structure, while maintaining an appropriate performance level.

Malicious URL Detection using Logistic Regression

10.36227/techrxiv.14790381 ◽

2021 ◽

Author(s):

Rohit Rayala ◽

Sashank Pasumarthi ◽

Rohith Kuppa ◽

S R KARTHIK

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Machine Learning Techniques ◽

Learning Techniques

This paper presents a model built to detect malicious URLs using machine learning techniques.

Predicting Metabolic Syndrome With Machine Learning Models Using a Decision Tree Algorithm: Retrospective Cohort Study (Preprint)

10.2196/preprints.17110 ◽

2019 ◽

Author(s):

Cheng-Sheng Yu ◽

Yu-Jiun Lin ◽

Chang-Hsien Lin ◽

Sen-Te Wang ◽

Shiyng-Yu Lin ◽

...

Keyword(s):

Machine Learning ◽

Metabolic Syndrome ◽

Logistic Regression ◽

Decision Tree ◽

Characteristic Curve ◽

Significant Risk ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Health Examination ◽

Multivariate Logistic Regression

BACKGROUND: Metabolic syndrome is a cluster of disorders that significantly influence the development and deterioration of numerous diseases. FibroScan is an ultrasound device that was recently shown to predict metabolic syndrome with moderate accuracy. However, previous research regarding prediction of metabolic syndrome in subjects examined with FibroScan has been mainly based on conventional statistical models. Alternatively, machine learning, whereby a computer algorithm learns from prior experience, has better predictive performance than conventional statistical modeling. OBJECTIVE: We aimed to evaluate the accuracy of different decision tree machine learning algorithms to predict the state of metabolic syndrome in self-paid health examination subjects who were examined with FibroScan. METHODS: Multivariate logistic regression was conducted for every known risk factor of metabolic syndrome. Principal components analysis was used to visualize the distribution of metabolic syndrome patients. We further applied various statistical machine learning techniques to visualize and investigate the pattern and relationship between metabolic syndrome and several risk variables. RESULTS: Obesity, serum glutamic-oxaloacetic transaminase, serum glutamic pyruvic transaminase, controlled attenuation parameter score, and glycated hemoglobin emerged as significant risk factors in multivariate logistic regression. The area under the receiver operating characteristic curve values for classification and regression trees and for the random forest were 0.831 and 0.904, respectively. CONCLUSIONS: Machine learning technology facilitates the identification of metabolic syndrome in self-paid health examination subjects with high accuracy.
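The area under the receiver operating characteristic curve reported in the results can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. The sketch below computes it that way, by pairwise comparison; the labels and scores are invented for illustration and do not reproduce the study's data or its 0.831 and 0.904 figures.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Invented toy data: 1 = metabolic syndrome, 0 = no metabolic syndrome.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]

auc = roc_auc(y_true, scores)
print(round(auc, 3))
```

The pairwise form is quadratic in the number of cases, which is fine for a sketch; rank-based formulas give the same value more efficiently on large samples.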

Improved argumentative paragraphs detection in academic theses supported with unit segmentation

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-219237 ◽

2021 ◽

pp. 1-11

Author(s):

Jesús Miguel García-Gorrostieta ◽

Aurelio López-López ◽

Samuel González-López ◽

Adrián Pastor López-Monroy

Keyword(s):

Machine Learning ◽

Decision Tree ◽

Automatic Detection ◽

Machine Learning Techniques ◽

Svm Classifier ◽

Complex Task ◽

Decision Tree Classifier ◽

Learning Techniques ◽

Tree Classifier ◽

Academic Author

Academic thesis writing is a complex task that requires the author to be skilled in argumentation. The goal of the academic author is to communicate clear ideas and to convince the reader of the presented claims. However, few students are good arguers, and this is a skill that takes time to master. In this paper, we present an exploration of lexical features used to model automatic detection of argumentative paragraphs using machine learning techniques. We present a novel proposal, which combines the information in the complete paragraph with the detection of argumentative segments in order to achieve improved results for the detection of argumentative paragraphs. We propose two approaches: a more descriptive one, which uses a decision tree classifier with indicators and lexical features, and a more efficient one, which uses an SVM classifier with lexical features and a Document Occurrence Representation (DOR). Both approaches consider the detection of argumentative segments to ensure that a paragraph detected as argumentative does indeed contain segments with argumentation. We achieved encouraging results for both approaches.

Analysis of Kinase Inhibitors and Druggability of Kinase-Targets Using Machine Learning Techniques

Pattern Discovery Using Sequence Data Mining ◽

10.4018/978-1-61350-056-9.ch009 ◽

2012 ◽

pp. 155-165

Author(s):

S. Prasanthi ◽

S.Durga Bhavani ◽

T. Sobha Rani ◽

Raju S. Bapi

Keyword(s):

Machine Learning ◽

Decision Tree ◽

Kinase Inhibitors ◽

Kinase Inhibitor ◽

Classification Problem ◽

Machine Learning Techniques ◽

Learning Approaches ◽

Decision Tree Classifier ◽

Data Set ◽

Learning Techniques

The vast majority of successful drugs or inhibitors achieve their activity by binding to, and modifying the activity of, a protein, leading to the concept of druggability. A target protein is druggable if it has the potential to bind drug-like molecules. Hence kinase inhibitors need to be studied to understand the specificity of a kinase inhibitor in choosing a particular kinase target. In this paper we focus on human kinase drug target sequences, since kinases are known to be potential drug targets. We also carry out a preliminary analysis of kinase inhibitors in order to study the problem in the protein-ligand space in future. The identification of druggable kinases is treated as a classification problem in which druggable kinases are taken as the positive data set and non-druggable kinases are chosen as the negative data set. The classification problem is addressed using machine learning techniques such as support vector machine (SVM) and decision tree (DT), with sequence-specific features. One of the challenges of this classification problem is the unbalanced data, with only 48 druggable kinases available against 509 non-druggable kinases present in UniProt. The accuracy obtained by the decision tree classifier is 57.65, which is not satisfactory. A two-tier architecture of decision trees is carefully designed such that recognition on the non-druggable dataset also improves. The overall model is thus shown to achieve a final performance accuracy of 88.37. To the best of our knowledge, kinase druggability prediction using machine learning approaches has not been reported in the literature.
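The class imbalance described above (48 druggable against 509 non-druggable kinases) is why a single accuracy figure needs careful reading. The sketch below uses only those two counts from the abstract and a deliberately trivial classifier that always predicts the majority class; it is an illustration of the imbalance problem, not a reconstruction of how the paper computed its accuracy values.

```python
# Class counts taken from the abstract; everything else is illustrative.
n_druggable, n_non_druggable = 48, 509

y_true = [1] * n_druggable + [0] * n_non_druggable
y_pred = [0] * len(y_true)  # trivial baseline: always predict "non-druggable"

correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
accuracy = correct / len(y_true)

# Recall on each class, then their mean (balanced accuracy).
recall_pos = sum(
    1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1) / n_druggable
recall_neg = sum(
    1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0) / n_non_druggable
balanced_accuracy = (recall_pos + recall_neg) / 2

print(f"accuracy={accuracy:.4f} balanced_accuracy={balanced_accuracy:.2f}")
```

The baseline scores high on plain accuracy while finding no druggable kinase at all, so per-class recall or balanced accuracy gives a fairer picture on data like this.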
