PREDICTION OF GEAR PITTING DEFECT BY USING DECISION TREE CLASSIFIER MACHINE LEARNING ALGORITHM

Heart disease is a common problem that can be very severe in old age and in people who do not lead a healthy lifestyle. Regular check-ups and diagnosis, together with decent eating habits, can prevent it to some extent. In this paper we implement one of the most sought-after and important machine learning algorithms to predict heart disease in a patient. The decision tree classifier is built on the symptoms, which serve as the attributes required for prediction. Using the decision tree algorithm, we can identify the attributes that lead to the best prediction on the datasets. The decision tree algorithm solves the problem by means of a tree representation: each internal node of the tree represents an attribute, and each leaf node corresponds to a class label. The support vector machine algorithm classifies the datasets on the basis of a kernel and separates the data using a hyperplane. The main objective of this project is to help reduce the number of occurrences of heart disease in patients.
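The tree representation described in this abstract can be sketched in a few lines. This is a minimal illustration, not the paper's model: the attribute names (`age`, `cholesterol`), the thresholds, and the class labels are hypothetical, chosen only to show how internal nodes test an attribute and leaves carry a class label.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    """A leaf node: holds the class label returned for samples that reach it."""
    label: str


@dataclass
class Node:
    """An internal node: tests one attribute against a threshold."""
    attribute: str
    threshold: float
    left: Union["Node", Leaf]   # followed when sample[attribute] <= threshold
    right: Union["Node", Leaf]  # followed when sample[attribute] > threshold


def predict(tree, sample):
    """Walk from the root to a leaf and return that leaf's class label."""
    while isinstance(tree, Node):
        if sample[tree.attribute] <= tree.threshold:
            tree = tree.left
        else:
            tree = tree.right
    return tree.label


# Hypothetical hand-built tree; attributes and thresholds are assumptions.
toy_tree = Node(
    "age", 50.0,
    Leaf("no disease"),
    Node("cholesterol", 240.0, Leaf("no disease"), Leaf("disease")),
)

print(predict(toy_tree, {"age": 62, "cholesterol": 260}))
```

In a trained classifier the attributes and thresholds would be chosen by the learning algorithm from data rather than written by hand.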


Successful Case Study of Machine Learning Application to Streamline and Improve History Matching Process for Complex Gas-Condensate Reservoirs in Hai Thach Field, Offshore Vietnam

10.2118/204835-ms ◽

2021 ◽

Author(s):

Son Hoang ◽

Tung Tran ◽

Tan Nguyen ◽

Tu Truong ◽

Duy Pham ◽

...

Keyword(s):

Machine Learning ◽

Decision Tree ◽

History Matching ◽

Dynamic Models ◽

Naive Bayes ◽

Gas Condensate ◽

Decision Tree Classifier ◽

Matching Process ◽

Tree Classifier

This paper reports a successful case study of applying machine learning to improve the history matching process, making it easier, less time-consuming, and more accurate, by determining whether Local Grid Refinement (LGR) with transmissibility multiplier is needed to history match gas-condensate wells producing from geologically complex reservoirs as well as determining the required LGR setup to history match those gas-condensate producers. History matching Hai Thach gas-condensate production wells is extremely challenging due to the combined effect of condensate banking, sub-seismic fault network, complex reservoir distribution and connectivity, uncertain HIIP, and lack of PVT data for most reservoirs. In fact, for some wells, many trial simulation runs were conducted before it became clear that LGR with transmissibility multiplier was required to obtain good history matching. In order to minimize this time-consuming trial-and-error process, machine learning was applied in this study to analyze production data using synthetic samples generated by a very large number of compositional sector models so that the need for LGR could be identified before the history matching process begins. Furthermore, machine learning application could also determine the required LGR setup. The method helped provide better models in a much shorter time, and greatly improved the efficiency and reliability of the dynamic modeling process. More than 500 synthetic samples were generated using compositional sector models and divided into separate training and test sets. Multiple classification algorithms such as logistic regression, Gaussian Naive Bayes, Bernoulli Naive Bayes, multinomial Naive Bayes, linear discriminant analysis, support vector machine, K-nearest neighbors, and Decision Tree as well as artificial neural networks were applied to predict whether LGR was used in the sector models.
The best algorithm was found to be the Decision Tree classifier, with 100% accuracy on the training set and 99% accuracy on the test set. The LGR setup (size of LGR area and range of transmissibility multiplier) was also predicted best by the Decision Tree classifier with 91% accuracy on the training set and 88% accuracy on the test set. The machine learning model was validated using actual production data and the dynamic models of history-matched wells. Finally, using the machine learning prediction on wells with poor history matching results, their dynamic models were updated and significantly improved.


Improved argumentative paragraphs detection in academic theses supported with unit segmentation

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-219237 ◽

2021 ◽

pp. 1-11

Author(s):

Jesús Miguel García-Gorrostieta ◽

Aurelio López-López ◽

Samuel González-López ◽

Adrián Pastor López-Monroy

Keyword(s):

Machine Learning ◽

Decision Tree ◽

Automatic Detection ◽

Machine Learning Techniques ◽

Svm Classifier ◽

Complex Task ◽

Decision Tree Classifier ◽

Learning Techniques ◽

Tree Classifier ◽

Academic Author

Academic thesis writing is a complex task that requires the author to be skilled in argumentation. The goal of the academic author is to communicate clear ideas and to convince the reader of the presented claims. However, few students are good arguers, and this is a skill that takes time to master. In this paper, we present an exploration of lexical features used to model automatic detection of argumentative paragraphs using machine learning techniques. We present a novel proposal that combines the information in the complete paragraph with the detection of argumentative segments in order to improve the detection of argumentative paragraphs. We propose two approaches: a more descriptive one, which uses a decision tree classifier with indicators and lexical features, and a more efficient one, which uses an SVM classifier with lexical features and a Document Occurrence Representation (DOR). Both approaches consider the detection of argumentative segments to ensure that a paragraph detected as argumentative does indeed contain segments with argumentation. We achieved encouraging results for both approaches.


An Adaptive Multi-Layer Botnet Detection Technique Using Machine Learning Classifiers

Applied Sciences ◽

10.3390/app9112375 ◽

2019 ◽

Vol 9 (11) ◽

pp. 2375 ◽

Cited By ~ 7

Author(s):

Riaz Ullah Khan ◽

Xiaosong Zhang ◽

Rajesh Kumar ◽

Abubakar Sharif ◽

Noorbakhsh Amiri Golilarz ◽

...

Keyword(s):

Machine Learning ◽

Decision Tree ◽

Network Traffic ◽

Traffic Classification ◽

Decision Tree Classifier ◽

Machine Learning Classifiers ◽

Learning Classifiers ◽

Average Accuracy ◽

Final Layer ◽

Tree Classifier

In recent years, botnets have been the most common threats to network security because they exploit multiple kinds of malicious code such as worms, Trojans, and rootkits. Botnets have been used to carry phishing links, to perform attacks, and to provide malicious services on the internet. Peer-to-peer (P2P) botnets are harder to identify than Internet Relay Chat (IRC), Hypertext Transfer Protocol (HTTP), and other types of botnets because P2P traffic has typical features of both centralization and distribution. To resolve the issues of P2P botnet identification, we propose an effective multi-layer traffic classification method that applies machine learning classifiers to features of network traffic. Our work presents a framework based on decision trees which effectively detects P2P botnets. A decision tree algorithm is applied for feature selection to extract the most relevant features and ignore the irrelevant ones. At the first layer, we filter non-P2P packets to reduce the amount of network traffic, using well-known ports, Domain Name System (DNS) queries, and flow counting. The second layer further characterizes the captured network traffic as non-P2P or P2P. At the third layer of our model, we remove the features that only marginally affect the classification. At the final layer, we successfully detect P2P botnets using a decision tree classifier by extracting network communication features. Furthermore, our experimental evaluations show the significance of the proposed method in P2P botnet detection and demonstrate an average accuracy of 98.7%.


Predicting Bank Operational Efficiency Using Machine Learning Algorithm: Comparative Study of Decision Tree, Random Forest, and Neural Networks

Advances in Fuzzy Systems ◽

10.1155/2020/8581202 ◽

2020 ◽

Vol 2020 ◽

pp. 1-12

Author(s):

Peter Appiahene ◽

Yaw Marfo Missah ◽

Ussiph Najim

Keyword(s):

Machine Learning ◽

Random Forest ◽

Decision Tree ◽

Banking Sector ◽

Banking Industry ◽

Predictive Accuracy ◽

Learning Algorithm ◽

Machine Learning Algorithms ◽

Machine Learning Algorithm ◽

And Performance

The financial crisis that hit Ghana from 2015 to 2018 has raised various issues with respect to the efficiency of banks and the safety of depositors in the banking industry. As part of measures to improve the banking sector and restore customers’ confidence, efficiency and performance analysis in the banking industry has become a hot issue, because stakeholders have to detect the underlying causes of inefficiencies within the industry. Nonparametric methods such as Data Envelopment Analysis (DEA) have been suggested in the literature as a good measure of banks’ efficiency and performance. Machine learning algorithms have also been viewed as a good tool for estimating various nonparametric and nonlinear problems. This paper combines DEA with three machine learning approaches to evaluate bank efficiency and performance using 444 Ghanaian bank branches as Decision Making Units (DMUs). The results were compared with the corresponding efficiency ratings obtained from the DEA. Finally, the prediction accuracies of the three machine learning models were compared. The results suggested that the decision tree (DT) with its C5.0 algorithm provided the best predictive model. It had 100% accuracy in predicting the holdout sample of 134 (30% of banks) and a P value of 0.00. The DT was followed closely by the random forest algorithm, with a predictive accuracy of 98.5% and a P value of 0.00, and finally the neural network (86.6% accuracy), with a P value of 0.66. The study concluded that banks in Ghana can use the result of this study to predict their respective efficiencies. All experiments were performed within a simulation environment and conducted in RStudio using R code.


A Preliminary Look at Heuristic Analysis for Assessing Artificial Intelligence Explainability

WSEAS TRANSACTIONS ON COMPUTER RESEARCH ◽

10.37394/232018.2020.8.9 ◽

2020 ◽

Vol 8 ◽

pp. 61-72

Author(s):

Kara Combs ◽

Mary Fendley ◽

Trevor Bihl

Keyword(s):

Artificial Intelligence ◽

Machine Learning ◽

Decision Tree ◽

Human Factors ◽

Black Box ◽

Decision Processes ◽

Decision Tree Classifier ◽

Tree Classifier ◽

Explainable Ai ◽

Heuristic Analysis

Artificial Intelligence and Machine Learning (AI/ML) models are increasingly criticized for their “black-box” nature. Therefore, eXplainable AI (XAI) approaches to extract human-interpretable decision processes from algorithms have been explored. However, XAI research lacks understanding of algorithmic explainability from a human factors perspective. This paper presents a repeatable human factors heuristic analysis for XAI with a demonstration on four decision tree classifier algorithms.


Automatic Classification of Hypertension Types Based on Personal Features by Machine Learning Algorithms

Mathematical Problems in Engineering ◽

10.1155/2020/2742781 ◽

2020 ◽

Vol 2020 ◽

pp. 1-13 ◽

Cited By ~ 4

Author(s):

Majid Nour ◽

Kemal Polat

Keyword(s):

Machine Learning ◽

Blood Pressure ◽

Random Forest ◽

Decision Tree ◽

Systolic Blood Pressure ◽

Diastolic Blood Pressure ◽

Decision Tree Classifier ◽

Tree Classifier ◽

C4.5 Decision Tree

Hypertension (high blood pressure) is an important disease seen among the public, and early detection of hypertension is significant for early treatment. Hypertension is defined as systolic blood pressure higher than 140 mmHg or diastolic blood pressure higher than 90 mmHg. In this paper, in order to detect hypertension types based on personal information and features, four machine learning (ML) methods, namely the C4.5 decision tree classifier (DTC), random forest, linear discriminant analysis (LDA), and linear support vector machine (LSVM), have been used and compared with each other. To our knowledge, this is the first study in the literature to classify hypertension types using classification algorithms based on personal data. To examine how the results vary with classifier type, four different classifier algorithms were selected for solving this problem. The hypertension dataset has eight features, including sex, age, height (cm), weight (kg), systolic blood pressure (mmHg), diastolic blood pressure (mmHg), heart rate (bpm), and BMI (kg/m²), to explain the hypertension status, and four classes: normal (healthy), prehypertension, stage-1 hypertension, and stage-2 hypertension. On the hypertension dataset, the obtained classification accuracies are 99.5%, 99.5%, 96.3%, and 92.7% using the C4.5 decision tree classifier, random forest, LDA, and LSVM, respectively. The obtained results have shown that ML methods could be confidently used in the automatic determination of hypertension types.
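The threshold definition quoted in this abstract amounts to a one-line rule. The sketch below encodes only that stated definition (systolic above 140 mmHg or diastolic above 90 mmHg); the function name is hypothetical, and the abstract does not give cut-offs for the four classes, so none are assumed here.

```python
def is_hypertensive(systolic_mmhg, diastolic_mmhg):
    """Apply the abstract's definition: systolic > 140 mmHg or diastolic > 90 mmHg."""
    return systolic_mmhg > 140 or diastolic_mmhg > 90


# Illustrative readings only; these are not values from the paper's dataset.
for reading in [(150, 80), (130, 95), (120, 80)]:
    print(reading, is_hypertensive(*reading))
```

Because both comparisons are strict, a reading of exactly 140/90 mmHg is not flagged by this rule.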


On Internet Traffic Classification: A Two-Phased Machine Learning Approach

Journal of Computer Networks and Communications ◽

10.1155/2016/2048302 ◽

2016 ◽

Vol 2016 ◽

pp. 1-21 ◽

Cited By ~ 21

Author(s):

Taimur Bakhshi ◽

Bogdan Ghita

Keyword(s):

Machine Learning ◽

Decision Tree ◽

Machine Learning Techniques ◽

Traffic Classification ◽

Decision Tree Classifier ◽

Internet Applications ◽

Adaptive Boosting ◽

Machine Learning Classification ◽

Computational Performance ◽

Tree Classifier

Traffic classification utilizing flow measurement enables operators to perform essential network management. Flow accounting methods such as NetFlow are, however, considered inadequate for classification requiring additional packet-level information, host behaviour analysis, and specialized hardware limiting their practical adoption. This paper aims to overcome these challenges by proposing a two-phased machine learning classification mechanism with NetFlow as input. The individual flow classes are derived per application through k-means and are further used to train a C5.0 decision tree classifier. As part of validation, the initial unsupervised phase used flow records of fifteen popular Internet applications that were collected and independently subjected to k-means clustering to determine unique flow classes generated per application. The derived flow classes were afterwards used to train and test a supervised C5.0 based decision tree. The resulting classifier reported an average accuracy of 92.37% on approximately 3.4 million test cases increasing to 96.67% with adaptive boosting. The classifier specificity factor which accounted for differentiating content specific from supplementary flows ranged between 98.37% and 99.57%. Furthermore, the computational performance and accuracy of the proposed methodology in comparison with similar machine learning techniques lead us to recommend its extension to other applications in achieving highly granular real-time traffic classification.


Enhanced Decision Tree-J48 With SMOTE Machine Learning Algorithm for Effective Botnet Detection in Imbalance Dataset

2019 15th International Conference on Electronics, Computer and Computation (ICECCO) ◽

10.1109/icecco48375.2019.9043233 ◽

2019 ◽

Author(s):

Ilyas Adeleke Jimoh ◽

Idris Ismaila ◽

Morufu Olalere

Keyword(s):

Machine Learning ◽

Decision Tree ◽

Learning Algorithm ◽

Machine Learning Algorithm ◽

Botnet Detection ◽

Imbalance Dataset


Decision Tree With Only Two Musculoskeletal Sites to Diagnose Polymyalgia Rheumatica Using [18F]FDG PET-CT

Frontiers in Medicine ◽

10.3389/fmed.2021.646974 ◽

2021 ◽

Vol 8 ◽

Author(s):

Anthime Flaus ◽

Julie Amat ◽

Nathalie Prevot ◽

Louis Olagne ◽

Lucie Descamps ◽

...

Keyword(s):

Machine Learning ◽

Decision Tree ◽

Polymyalgia Rheumatica ◽

Fdg Pet ◽

Validation Cohort ◽

Learning Algorithm ◽

Final Diagnosis ◽

Machine Learning Algorithm ◽

Pet Ct ◽

Fdg Pet Ct

Introduction: The aim of this study was to find the best ordered combination of two FDG positive musculoskeletal sites with a machine learning algorithm to diagnose polymyalgia rheumatica (PMR) vs. other rheumatisms in a cohort of patients with inflammatory rheumatisms. Methods: This retrospective study included 140 patients who underwent [18F]FDG PET-CT and whose final diagnosis was inflammatory rheumatism. The cohort was randomized, stratified on the final diagnosis, into a training and a validation cohort. FDG uptake of 17 musculoskeletal sites was evaluated visually and set positive if uptake was at least equal to that of the liver. A decision tree classifier was trained and validated to find the best combination of two positive sites to diagnose PMR. Diagnostic performance was measured first for each musculoskeletal site, second for combinations of two positive sites, and third using the decision tree created with machine learning. Results: 55 patients with PMR and 85 patients with other inflammatory rheumatisms were included. Musculoskeletal sites, used either individually or in combination of two, were highly imbalanced to diagnose PMR, with a high specificity and a low sensitivity. The machine learning algorithm identified an optimal ordered combination of two sites to diagnose PMR. This required a positive interspinous bursa or, if negative, a positive trochanteric bursa. Following the decision tree, sensitivity and specificity to diagnose PMR were 73.2% and 87.5%, respectively, in the training cohort and 78.6% and 80.1% in the validation cohort. Conclusion: An ordered combination of two visually positive sites leads to PMR diagnosis with an accurate sensitivity and specificity vs. other rheumatisms in a large cohort of patients with inflammatory rheumatisms.
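The ordered two-site rule reported in the Results can be written as a two-level tree, and sensitivity and specificity follow from counting outcomes. This is a sketch under stated assumptions: the function names are hypothetical and the six patient records are invented for illustration, so the printed figures do not reproduce the study's results.

```python
def predict_pmr(interspinous_positive, trochanteric_positive):
    """Ordered rule from the abstract: test the interspinous bursa first,
    and only if it is negative fall back to the trochanteric bursa."""
    if interspinous_positive:
        return True
    return trochanteric_positive


def sensitivity_specificity(truths, predictions):
    """Return (sensitivity, specificity) from paired boolean labels."""
    pairs = list(zip(truths, predictions))
    tp = sum(1 for t, p in pairs if t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical records: (interspinous positive, trochanteric positive, has PMR).
patients = [
    (True, False, True),
    (False, True, True),
    (False, False, True),
    (False, False, False),
    (True, False, False),
    (False, False, False),
]

truths = [has_pmr for _, _, has_pmr in patients]
predictions = [predict_pmr(i, t) for i, t, _ in patients]
sensitivity, specificity = sensitivity_specificity(truths, predictions)
print(f"sensitivity={sensitivity:.3f} specificity={specificity:.3f}")
```

The rule is logically equivalent to "either site positive"; writing it as nested tests simply mirrors the order in which the tree evaluates the two sites.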
