Heart Disease Prediction Using Machine Learning

This paper addresses a classification use case of machine learning: predicting the likelihood of heart disease in an individual from a given set of parameters. Machine learning is used extensively across the world, and the healthcare industry has begun to leverage these data-driven techniques. Because machine learning is well suited to classifying, categorizing and predicting, it can play a vital role in predicting the likelihood of locomotor disorders, heart ailments and other such diseases. Such predictions, if accurate, can give doctors key foresight, allowing them to tailor diagnosis and course of treatment to each patient. The main advantage of machine learning in healthcare is its ability to parse and process datasets too large for humans to handle, and then accurately convert the resulting analysis into clinical insights that help medical practitioners around the globe plan strategies for patient care, ultimately leading to more promising results, reduced costs of care and increased patient satisfaction and response/recovery. To address this problem, solutions were built using multiple supervised learning algorithms: logistic regression, Naïve Bayes, random forests, decision trees, support vector machines and K-nearest neighbours. The best accuracy was obtained with random forests.

Heart disease prediction using machine learning techniques : a survey

International Journal of Engineering & Technology ◽

10.14419/ijet.v7i2.8.10557 ◽

2018 ◽

Vol 7 (2.8) ◽

pp. 684 ◽

Cited By ~ 12

Author(s):

V V. Ramalingam ◽

Ayantan Dandapath ◽

M Karthik Raja

Keyword(s):

Machine Learning ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Support Vector ◽

Complex Data ◽

Learning Techniques ◽

Vector Machines ◽

Supervised Learning Algorithms ◽

Life Threatening

Heart-related diseases, or cardiovascular diseases (CVDs), have been the main cause of a huge number of deaths worldwide over the last few decades and have emerged as the most life-threatening diseases, not only in India but across the whole world. There is therefore a need for a reliable, accurate and feasible system to diagnose such diseases in time for proper treatment. Machine learning algorithms and techniques have been applied to various medical datasets to automate the analysis of large and complex data. In recent times, many researchers have used machine learning techniques to help the healthcare industry and its professionals diagnose heart-related diseases. This paper presents a survey of various models based on such algorithms and techniques and analyzes their performance. Models based on supervised learning algorithms such as Support Vector Machines (SVM), K-Nearest Neighbour (KNN), Naïve Bayes, Decision Trees (DT), Random Forest (RF) and ensemble models are found to be very popular among researchers.

Prediction of Heart Disease using Machine Learning

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.b1081.0982s1019 ◽

2019 ◽

Vol 8 (2S10) ◽

pp. 474-477

Keyword(s):

Machine Learning ◽

Heart Disease ◽

Support Vector Machines ◽

Naive Bayes ◽

Naïve Bayes ◽

Support Vector ◽

Data Set ◽

Vector Machines ◽

Naive Bayes Classification ◽

Naïve Bayes Classification

Machine learning is one of the fastest-growing fields in the world today. Machine learning (ML) and Artificial Neural Networks (ANN) are helpful in the detection and diagnosis of various heart diseases. Naïve Bayes classification is an important classification approach in machine learning. Heart disease comprises a range of disorders affecting the heart, including blood vessel problems, irregular heartbeat, weak heart muscles, congenital heart defects, cardiovascular disease and coronary artery disease. Coronary heart disease is a common type of heart disease; it reduces blood flow to the heart, which can lead to a heart attack. In this paper, a UCI Machine Learning Repository data set of patients suffering from heart disease is analyzed using Naïve Bayes classification and support vector machines, and the classification accuracy of both methods on these patients is evaluated. The implementation is done in R.
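
To make the Naïve Bayes approach mentioned in this abstract concrete, the following is a minimal Gaussian Naïve Bayes sketch in plain Python. It is not the authors' code (the paper reports an R implementation on UCI data); the class name `GaussianNaiveBayes`, the two made-up features, and the toy values are all assumptions for illustration.

```python
import math
from collections import defaultdict
from statistics import mean, pvariance
from typing import Dict, Hashable, List, Sequence, Tuple


class GaussianNaiveBayes:
    """Toy Gaussian Naive Bayes: features are treated as independent normals per class."""

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[Hashable]) -> "GaussianNaiveBayes":
        by_class: Dict[Hashable, List[Sequence[float]]] = defaultdict(list)
        for row, label in zip(X, y):
            by_class[label].append(row)
        n = len(y)
        # Class priors P(c) estimated from label frequencies.
        self.priors_ = {c: len(rows) / n for c, rows in by_class.items()}
        # Per-class, per-feature (mean, variance); a small constant avoids zero variance.
        self.stats_: Dict[Hashable, List[Tuple[float, float]]] = {
            c: [(mean(col), pvariance(col) + 1e-9) for col in zip(*rows)]
            for c, rows in by_class.items()
        }
        return self

    def predict_one(self, row: Sequence[float]) -> Hashable:
        def log_posterior(c: Hashable) -> float:
            # log P(c) + sum of log N(x | mu, var) over features (up to a shared constant).
            total = math.log(self.priors_[c])
            for x, (mu, var) in zip(row, self.stats_[c]):
                total += -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)
            return total

        return max(self.priors_, key=log_posterior)


# Made-up toy data: two numeric features per patient, label 1 = disease, 0 = no disease.
X = [[45, 120], [50, 130], [48, 125], [62, 160], [65, 170], [60, 165]]
y = [0, 0, 0, 1, 1, 1]
model = GaussianNaiveBayes().fit(X, y)
pred_low = model.predict_one([47, 122])
pred_high = model.predict_one([63, 168])
```

Working in log space keeps the product of many small likelihoods from underflowing, which is the usual design choice for this classifier.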

A decision-theoretic approach to the evaluation of machine learning algorithms in computational drug discovery

Bioinformatics ◽

10.1093/bioinformatics/btz293 ◽

2019 ◽

Vol 35 (22) ◽

pp. 4656-4663 ◽

Cited By ~ 3

Author(s):

Oliver P Watson ◽

Isidro Cortes-Ciriano ◽

Aimee R Taylor ◽

James A Watson

Keyword(s):

Machine Learning ◽

Random Forests ◽

Ridge Regression ◽

Machine Learning Algorithms ◽

Neural Nets ◽

Loss Functions ◽

Support Vector ◽

Activity Distribution ◽

Structure Activity ◽

Vector Machines

Motivation: Artificial intelligence, trained via machine learning (e.g. neural nets, random forests) or computational statistical algorithms (e.g. support vector machines, ridge regression), holds much promise for the improvement of small-molecule drug discovery. However, small-molecule structure-activity data are high dimensional with low signal-to-noise ratios and proper validation of predictive methods is difficult. It is poorly understood which, if any, of the currently available machine learning algorithms will best predict new candidate drugs. Results: The quantile-activity bootstrap is proposed as a new model validation framework using quantile splits on the activity distribution function to construct training and testing sets. In addition, we propose two novel rank-based loss functions which penalize only the out-of-sample predicted ranks of high-activity molecules. The combination of these methods was used to assess the performance of neural nets, random forests, support vector machines (regression) and ridge regression applied to 25 diverse high-quality structure-activity datasets publicly available on ChEMBL. Model validation based on random partitioning of available data favours models that overfit and ‘memorize’ the training set, namely random forests and deep neural nets. Partitioning based on quantiles of the activity distribution correctly penalizes extrapolation of models onto structurally different molecules outside of the training data. Simpler, traditional statistical methods such as ridge regression can outperform state-of-the-art machine learning methods in this setting. In addition, our new rank-based loss functions give considerably different results from mean squared error, highlighting the necessity to define model optimality with respect to the decision task at hand. Availability and implementation: All software and data are available as Jupyter notebooks found at https://github.com/owatson/QuantileBootstrap. Supplementary information: Supplementary data are available at Bioinformatics online.
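
As a rough illustration of partitioning on quantiles of the activity distribution, the sketch below trains on the lower-ranked portion of the activities and holds out the high-activity tail for testing. The helper name `quantile_split`, the 0.8 cutoff, and the toy activity values are assumptions; the authors' quantile-activity bootstrap (including its resampling step) lives in their repository and may differ in detail.

```python
from typing import List, Sequence, Tuple


def quantile_split(
    activities: Sequence[float], q: float = 0.8
) -> Tuple[List[int], List[int]]:
    """Return (train_idx, test_idx), split by rank at fraction q of the activities.

    Hypothetical helper: the lowest-ranked fraction q of samples is used for
    training, and the remaining high-activity tail is held out for testing.
    """
    if not 0.0 < q < 1.0:
        raise ValueError("q must be strictly between 0 and 1")
    # Indices sorted from lowest to highest activity.
    order = sorted(range(len(activities)), key=lambda i: activities[i])
    n_train = int(len(order) * q)
    return order[:n_train], order[n_train:]


# Made-up activity values for ten molecules.
activities = [5.1, 7.8, 6.0, 9.2, 4.3, 8.5, 6.7, 5.9, 7.1, 9.9]
train_idx, test_idx = quantile_split(activities, q=0.8)
```

Unlike a random split, every test molecule here is more active than every training molecule, so a model is scored on extrapolation rather than interpolation.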

Detection of Loss Zones while Drilling Using Different Machine Learning Techniques

Journal of Energy Resources Technology ◽

10.1115/1.4051553 ◽

2021 ◽

pp. 1-29

Author(s):

Ahmed Alsaihati ◽

Mahmoud Abughaban ◽

Salaheldin Elkatatny ◽

Abdulazeez Abdulraheem

Keyword(s):

Machine Learning ◽

Support Vector Machines ◽

Random Forests ◽

Nearest Neighbors ◽

Machine Learning Techniques ◽

Support Vector ◽

K Nearest Neighbors ◽

Learning Techniques ◽

Vector Machines ◽

Testing Set

Fluid loss into formations is a common operational issue that is frequently encountered when drilling across naturally or induced fractured formations. This could pose significant operational risks, such as well-control, stuck pipe, and wellbore instability, which, in turn, lead to an increase of well time and cost. This research aims to use and evaluate different machine learning techniques, namely support vector machines, random forests, and K-nearest neighbors, in detecting loss circulation occurrences while drilling using solely drilling surface parameters. Actual field data of seven wells, which had suffered partial or severe loss circulation, were used to build predictive models, while Well-8 was used to compare the performance of the developed models. Recall, precision, and F1-score measures were used to evaluate the ability of the developed models to detect loss circulation occurrences. The results showed that the K-nearest neighbors classifier achieved a high F1-score of 0.912 in detecting loss circulation occurrence in the testing set, while random forests was the second-best classifier with almost the same F1-score of 0.910. The support vector machines achieved an F1-score of 0.83 in predicting the loss circulation occurrence in the testing set. The K-nearest neighbors classifier outperformed the other models in detecting the loss circulation occurrences in Well-8 with an F1-score of 0.80. The main contribution of this research as compared to previous studies is that it identifies loss events based on real-time measurements of the active pit volume.
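
The precision, recall, and F1-score measures named in this abstract can be computed from binary predictions as sketched below. The function name `precision_recall_f1` and the toy labels are assumptions for illustration and are unrelated to the paper's field data.

```python
from typing import Sequence, Tuple


def precision_recall_f1(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Tuple[float, float, float]:
    """Compute precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Made-up labels: 1 = loss event, 0 = normal drilling.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
```

F1 is preferred over plain accuracy in settings like this because loss events are typically rare, so a classifier that never predicts a loss could still score high accuracy.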

BioLearner: A Machine Learning-Powered Smart Heart Disease Risk Prediction System Utilizing Biomedical Markers

Journal of Interconnection Networks ◽

10.1142/s0219265921450031 ◽

2021 ◽

Author(s):

Syed Saad Amer ◽

Gurleen Wander ◽

Manmeet Singh ◽

Rami Bahsoon ◽

Nicholas R. Jennings ◽

...

Keyword(s):

Machine Learning ◽

Heart Disease ◽

Disease Risk ◽

Support Vector ◽

Prediction System ◽

Developing Heart ◽

Nearest Neighbours ◽

Different Types ◽

History Of ◽

The Uk

Heart disease kills more people around the world than any other disease, and it is one of the leading causes of death in the UK, triggering up to 74,000 deaths per year. An essential part of preventing deaths from heart disease, and thus heart disease itself, is the analysis of biomedical markers to determine a person's risk of developing heart disease. Much research has been conducted to assess the accuracy of detecting heart disease by analyzing biomedical markers. However, no previous study has attempted to identify the biomedical markers which are most important in this identification. To solve this problem, we propose a machine learning-based intelligent heart disease prediction system called BioLearner for the determination of vital biomedical markers. This study aims to improve upon the accuracy of predicting heart disease and to identify the most essential biological markers, with the intention of composing the set of markers that most affects the development of heart disease. Multiple factors determine whether or not a person develops heart disease. These factors are thought to include age, history of chest pain (of different types), fasting blood sugar of different types, heart rate, smoking, and other essential factors. The dataset is analyzed, and the different aspects are compared. Various machine learning models such as K-Nearest Neighbours, Neural Networks, and Support Vector Machine (SVM) are trained and used to determine the accuracy of our prediction for future heart disease development. BioLearner is able to predict the risk of heart disease with an accuracy of 95%, much higher than the baseline methods.

Human Papillomavirus Targeted Immunotherapy Outcome Prediction Using Machine Learning

International Journal for Research in Applied Science and Engineering Technology ◽

10.22214/ijraset.2021.37197 ◽

2021 ◽

Vol 9 (VII) ◽

pp. 3598-3611

Author(s):

Vidya Moni

Keyword(s):

Machine Learning ◽

Human Papillomavirus ◽

Outcome Prediction ◽

Performance Comparison ◽

Gradient Boosting ◽

Support Vector ◽

Machine Learning Classification ◽

Nearest Neighbours ◽

Vector Machines ◽

Modern Machine

Warts caused by the Human Papillomavirus (HPV) are highly contagious and affect several million people across the globe every year, appearing as small lesions on the skin. Warts can be treated effectively with several methods, the most effective being Immunotherapy and Cryotherapy. Our research is focused on the performance comparison of modern Machine Learning classification techniques to predict the outcome (positive or negative) of Immunotherapy treatment given to a patient, using patient data as input features to our classifiers. Precision, recall, f-measure and accuracy were used to compare the performance of the various classifiers considered in this study. We considered Logistic Regression, ZeroR, AdaBoost, K-Nearest Neighbours (KNN), Support Vector Machines (SVM), Gradient Boosting, Repeated Incremental Pruning to Produce Error Reduction (RIPPER), Decision Trees and Random Forests. The ZeroR classifier was used as a baseline to give insight into the skewed nature of the data, so that the performance of the other classifiers could be better understood.
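
The role of ZeroR as a baseline can be seen in a few lines: it ignores all features and always predicts the most frequent class, so its accuracy equals the majority-class share of the data. The class below is a minimal sketch with made-up labels, not the implementation used in the study.

```python
from collections import Counter
from typing import Hashable, List, Sequence


class ZeroR:
    """Baseline that ignores features and always predicts the majority class."""

    def fit(self, y: Sequence[Hashable]) -> "ZeroR":
        if not y:
            raise ValueError("y must not be empty")
        # Most frequent label seen in training.
        self.majority_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, n: int) -> List[Hashable]:
        # Same prediction for every one of the n samples.
        return [self.majority_] * n


# Made-up, skewed toy labels: 7 positive outcomes, 3 negative.
labels = ["positive"] * 7 + ["negative"] * 3
baseline = ZeroR().fit(labels)
predictions = baseline.predict(len(labels))
accuracy = sum(p == t for p, t in zip(predictions, labels)) / len(labels)
```

On these toy labels the baseline is right 7 times out of 10 without learning anything, which is why any real classifier should be judged against this figure rather than against 50%.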

Comparative study of support vector machines and random forests machine learning algorithms on credit operation

Software Practice and Experience ◽

10.1002/spe.2842 ◽

2020 ◽

Cited By ~ 1

Author(s):

Germanno Teles ◽

Joel J. P. C. Rodrigues ◽

Ricardo A. L. Rabêlo ◽

Sergei A. Kozlov

Keyword(s):

Machine Learning ◽

Support Vector Machines ◽

Comparative Study ◽

Random Forests ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Support Vector ◽

Vector Machines

Modeling and Trading the EUR/USD Exchange Rate Using Machine Learning Techniques

Engineering, Technology & Applied Science Research ◽

10.48084/etasr.200 ◽

2012 ◽

Vol 2 (5) ◽

pp. 269-272 ◽

Cited By ~ 2

Author(s):

K. Theofilatos ◽

S. Likothanassis ◽

A. Karathanasopoulos

Keyword(s):

Machine Learning ◽

Exchange Rate ◽

Support Vector Machines ◽

Random Forests ◽

Moving Average ◽

Machine Learning Techniques ◽

Support Vector ◽

Learning Techniques ◽

Vector Machines ◽

Sharpe Ratio

The present paper aims to investigate the performance of state-of-the-art machine learning techniques in trading the EUR/USD exchange rate at the ECB fixing. For this purpose, five supervised learning classification techniques (K-Nearest Neighbors algorithm, Naïve Bayesian Classifier, Artificial Neural Networks, Support Vector Machines and Random Forests) were applied to the problem of predicting the one-day-ahead movement of the EUR/USD exchange rate with only autoregressive terms as inputs. For comparison, the performance of all machine learning techniques was benchmarked against two traditional techniques (Naïve Strategy and moving average convergence/divergence model). Trading strategies produced by Support Vector Machines and Random Forests clearly outperformed all other strategies in terms of annualized return and Sharpe ratio. To the best of our knowledge, this is the first application of Random Forests to the problem of trading the EUR/USD exchange rate, and it provides extremely satisfactory results.
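
For readers unfamiliar with the Sharpe ratio used to rank the strategies, the sketch below computes an annualized version from daily returns. The 252 trading days per year, the zero risk-free rate, the use of the sample standard deviation, and the toy returns are all assumptions; the paper's exact definition may differ.

```python
import math
import statistics
from typing import Sequence


def annualized_sharpe(
    daily_returns: Sequence[float],
    risk_free_daily: float = 0.0,
    periods_per_year: int = 252,
) -> float:
    """Mean excess return divided by its sample standard deviation, annualized.

    Needs at least two returns, because statistics.stdev requires two data points.
    """
    excess = [r - risk_free_daily for r in daily_returns]
    sd = statistics.stdev(excess)
    if sd == 0:
        raise ValueError("returns have zero variance; Sharpe ratio is undefined")
    # Scale by sqrt(periods) under the assumption of independent daily returns.
    return statistics.mean(excess) / sd * math.sqrt(periods_per_year)


# Made-up daily strategy returns.
daily_returns = [0.01, -0.005, 0.007, 0.002, -0.003]
sharpe = annualized_sharpe(daily_returns)
```

Because the ratio divides return by volatility, doubling the position size doubles both and leaves the Sharpe ratio unchanged, which makes it a fairer comparison across strategies than raw return.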

A Study on the Office Rent Estimation by the Machine Learning Methods -Focusing on the Use of Random Forests, Artificial Neural Networks, Support Vector Machines-

The Journal of Korea Real Estate Analysists Association ◽

10.19172/kreaa.26.2.2 ◽

2020 ◽

Vol 26 (2) ◽

pp. 23-53

Author(s):

Sung-Hoon 정성훈 ◽

Changha Jin

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Artificial Neural Networks ◽

Support Vector Machines ◽

Random Forests ◽

Support Vector ◽

Learning Methods ◽

Machine Learning Methods ◽

Vector Machines ◽

Office Rent

The upside of uncertainty: Identification of lithology contact zones from airborne geophysics and satellite data using random forests and support vector machines

Geophysics ◽

10.1190/geo2012-0411.1 ◽

2013 ◽

Vol 78 (3) ◽

pp. WB113-WB126 ◽

Cited By ~ 43

Author(s):

Matthew J. Cracknell ◽

Anya M. Reading

Keyword(s):

Machine Learning ◽

Support Vector Machines ◽

Random Forests ◽

Supervised Classification ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Geophysical Data ◽

Support Vector ◽

Vector Machines ◽

Spatially Varying

Inductive machine learning algorithms attempt to recognize patterns in, and generalize from, empirical data. They provide a practical means of predicting lithology, or other spatially varying physical features, from multidimensional geophysical data sets. It is for this reason machine learning approaches are increasing in popularity for geophysical data inference. A key motivation for their use is the ease with which uncertainty measures can be estimated for nonprobabilistic algorithms. We have compared and evaluated the abilities of two nonprobabilistic machine learning algorithms, random forests (RF) and support vector machines (SVM), to recognize ambiguous supervised classification predictions using uncertainty calculated from estimates of class membership probabilities. We formulated a method to establish optimal uncertainty threshold values to identify and isolate the maximum number of incorrect predictions while preserving most of the correct classifications. This is illustrated using a case example of the supervised classification of surface lithologies in a folded, structurally complex, metamorphic terrain. We found that (1) the use of optimal uncertainty thresholds significantly improves overall classification accuracy of RF predictions, but not those of SVM, by eliminating the maximum number of incorrectly classified samples while preserving the maximum number of correctly classified samples; (2) RF, unlike SVM, was able to exploit dependencies and structures contained within spatially varying input data; and (3) high RF prediction uncertainty is spatially coincident with transitions in lithology and associated contact zones, and regions of intense deformation. Uncertainty has its upside in the identification of areas of key geologic interest and has wide application across the geosciences, where transition zones are important classes in their own right. The techniques used in this study are of practical value in prioritizing subsequent geologic field activities, which, with the aid of this analysis, may be focused on key lithology contacts and problematic localities.
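
One way to turn class membership probabilities into an uncertainty score and apply a threshold is sketched below. Normalized Shannon entropy is used here as an assumed uncertainty measure, and the 0.8 threshold and probability rows are made up; the abstract does not specify the authors' exact measure or how their optimal thresholds were chosen.

```python
import math
from typing import List, Sequence


def normalized_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy of a class-probability vector, scaled to the range [0, 1]."""
    k = len(probs)
    if k < 2:
        raise ValueError("need at least two classes")
    # Zero-probability classes contribute nothing to the entropy.
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return h / math.log(k)


def flag_uncertain(
    prob_rows: Sequence[Sequence[float]], threshold: float
) -> List[bool]:
    """Mark predictions whose uncertainty exceeds the (assumed) threshold."""
    return [normalized_entropy(row) > threshold for row in prob_rows]


# Made-up class membership probabilities for three samples and three lithology classes.
prob_rows = [
    [0.90, 0.05, 0.05],  # confident prediction
    [0.40, 0.35, 0.25],  # ambiguous prediction
    [0.34, 0.33, 0.33],  # nearly uniform, most uncertain
]
flags = flag_uncertain(prob_rows, threshold=0.8)
```

Flagged samples can then be set aside for review, which mirrors the abstract's idea of isolating likely-incorrect predictions while keeping most correct ones.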
