Diabetes Prediction Using Machine Learning

Author(s):  
KM Jyoti Rani

Diabetes is a chronic disease with the potential to cause a worldwide health care crisis. According to the International Diabetes Federation, 382 million people worldwide are living with diabetes, and this number is projected to rise to 592 million by 2035. Diabetes is caused by an elevated level of blood glucose. High blood glucose produces the symptoms of frequent urination, increased thirst, and increased hunger. Diabetes is one of the leading causes of blindness, kidney failure, amputations, heart failure, and stroke. When we eat, the body turns food into sugars, or glucose. At that point, the pancreas is supposed to release insulin. Insulin serves as a key that opens our cells, allowing glucose to enter and be used for energy. With diabetes, this system does not work. Type 1 and type 2 diabetes are the most common forms of the disease, but there are also other kinds, such as gestational diabetes, which occurs during pregnancy. Machine learning is an emerging field within data science that deals with the ways in which machines learn from experience. The aim of this project is to develop a system that can predict diabetes in a patient early and with higher accuracy by combining the results of different machine learning techniques. The algorithms used are K-nearest neighbour, logistic regression, random forest, support vector machine, and decision tree. The accuracy of the model built with each algorithm is calculated, and the one with the best accuracy is taken as the model for predicting diabetes.
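The selection step described in this abstract (score each algorithm, then keep the most accurate one) can be sketched in a few lines. This is only an illustration, not the author's implementation: the labels and per-model predictions are made-up toy values, and the helper names `accuracy` and `pick_best_model` are hypothetical.

```python
from typing import Dict, List, Tuple


def accuracy(y_true: List[int], y_pred: List[int]) -> float:
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def pick_best_model(
    y_true: List[int], predictions: Dict[str, List[int]]
) -> Tuple[str, float]:
    """Score every model's predictions and return the most accurate one."""
    scores = {name: accuracy(y_true, pred) for name, pred in predictions.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Toy held-out labels (1 = diabetic, 0 = not diabetic); assumed values.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]

# Toy predictions from three already-trained models; assumed values.
predictions = {
    "knn": [1, 0, 0, 1, 0, 1, 1, 0],
    "logistic_regression": [1, 0, 1, 1, 0, 0, 0, 0],
    "decision_tree": [0, 0, 1, 0, 0, 1, 1, 0],
}

best_name, best_score = pick_best_model(y_true, predictions)
print(best_name, best_score)
```

Accuracy alone can be misleading when one class is much rarer than the other, so a real comparison would probably also look at measures such as recall or a confusion matrix.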

2020 ◽  
Vol 17 (8) ◽  
pp. 3449-3452
Author(s):  
M. S. Roobini ◽  
Y. Sai Satwick ◽  
A. Anil Kumar Reddy ◽  
M. Lakshmi ◽  
D. Deepa ◽  
...  

In today’s world, diabetes is one of the major health challenges in India. It is a group of syndromes that result in too much sugar in the blood. It is a chronic condition that affects the way the body processes blood sugar. The prevention and prediction of diabetes mellitus are gaining increasing interest in the medical sciences. The aim is to predict diabetes at an early stage using different machine learning techniques. In this paper, we use four well-known classification techniques: decision tree, K-nearest neighbors, support vector machine, and random forest. These techniques are applied to the Pima Indians diabetes dataset. We predict diabetes at different stages and analyze the performance of the different classification techniques. We also propose a conceptual model for the prediction of diabetes mellitus using different machine learning techniques, and we compare the accuracy of these techniques in detecting diabetes mellitus at an early stage.
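As an illustration of one of the classifiers named above, here is a minimal K-nearest neighbors sketch in plain Python. It is not the authors' code: the six training points are invented (glucose, BMI) pairs rather than rows from the Pima Indians dataset, and the function name `knn_predict` is hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]  # (feature vector, class label)


def knn_predict(train: List[Sample], query: Sequence[float], k: int = 3) -> int:
    """Predict the label of `query` by majority vote among its k nearest points."""
    if k < 1 or k > len(train):
        raise ValueError("k must be between 1 and the number of training samples")
    # Sort training samples by Euclidean distance to the query point.
    by_distance = sorted(train, key=lambda s: math.dist(s[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Invented (glucose, BMI) pairs; 1 = diabetic, 0 = not diabetic.
train = [
    ((85.0, 22.0), 0),
    ((90.0, 24.5), 0),
    ((100.0, 26.0), 0),
    ((150.0, 33.0), 1),
    ((165.0, 35.5), 1),
    ((180.0, 31.0), 1),
]

low_risk = knn_predict(train, (95.0, 25.0), k=3)
high_risk = knn_predict(train, (160.0, 34.0), k=3)
print(low_risk, high_risk)
```

The sketch uses raw feature values. In practice, features are usually rescaled first so that the feature with the largest numeric range does not dominate the distance.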


2020 ◽  
Vol 24 (106) ◽  
pp. 79-87
Author(s):  
Fredy Humberto Troncoso Espinosa ◽  
Javiera Valentina Ruiz Tapia

Customer churn is a significant problem for service companies and can cause them substantial economic losses. Identifying the factors that lead a customer to stop consuming a service is a complex task; however, customers' behavior makes it possible to estimate a churn probability for each of them. This research applies data mining to predict customer churn at a natural gas distribution company using two machine learning techniques: neural networks and support vector machines. The results show that these techniques make it possible to identify the customers most likely to churn, so that timely and targeted retention actions can be taken, minimizing the costs associated with misidentifying these customers. Keywords: customer churn, data mining, machine learning, natural gas distribution.


2022 ◽  
Vol 19 ◽  
pp. 1-9
Author(s):  
Nikhil Bora ◽  
Sreedevi Gutta ◽  
Ahmad Hadaegh

Heart disease has become one of the leading causes of death worldwide and one of the most life-threatening diseases. Early prediction of heart disease will help reduce the death rate. Predicting heart disease has become one of the most difficult challenges in the medical sector in recent years. According to recent statistics, about one person dies from heart disease every minute. Healthcare generates a massive amount of data, and data science is critical for analyzing it. This paper proposes heart disease prediction using different machine learning algorithms such as logistic regression, naïve Bayes, support vector machine, k-nearest neighbor (KNN), random forest, and extreme gradient boosting. We used these algorithms to predict the likelihood of a person developing heart disease on the basis of features (such as cholesterol, blood pressure, age, and sex) extracted from the datasets. In our research we used two separate datasets. The first heart disease dataset was collected from the well-known UCI Machine Learning Repository and has 303 record instances with 14 attributes (13 features and one target). The second dataset was collected from Kaggle and contains 1,190 patient record instances with 11 features and one target; it is a combination of five popular heart disease datasets. This study compares the accuracy of various machine learning techniques. For the first dataset, the highest accuracy, 92%, was obtained with a support vector machine (SVM). For the second dataset, random forest gave the highest accuracy, 94.12%. We then combined both datasets and obtained the highest accuracy, 93.31%, using random forest.


2021 ◽  
Author(s):  
Leonardo Deiss ◽  
Shameema Oottikkal ◽  
Karen Tomko ◽  
Wanyu Huang ◽  
Steve Culman ◽  
...  

Soil infrared spectroscopy has great potential for estimating soil properties, but reference soil measurements are typically required in combination with multivariate statistical models to estimate soil properties. The lack of user-friendly predictive tools based on open-source statistical environments remains one of the main limitations to technology diffusion among non-specialist users. Our aim is to build capacity for an automated machine learning routine for rapid and robust prediction of soil health indicators using lab-acquired soil infrared spectra. This intelligent system runs in the R statistical environment and includes (1) a diverse soil spectral library comprising the main physiographic regions of the US Midwest under diverse land uses and various sampling depths, (2) a classification process to detect potential outliers in newly acquired spectra using supervised machine learning techniques, and (3) a multi-model optimized prediction process based on linear and non-linear statistical procedures (partial least squares, support vector machines, and neural networks). This prediction system works at the intersection of soil science, data science, and high-performance computing to enable efficient parallel processing of spectral data on multi-core coprocessors. Using artificial intelligence to automate soil infrared spectroscopy is a fundamental demand that will make this technique an effective routine in soil laboratories to estimate soil health.


Author(s):  
Anton Ovchinnikov ◽  
Scotiabank Scholar

This case, along with its B case (UVA-QA-0865), is an effective vehicle for introducing students to the use of machine-learning techniques for classification. The specific context is predicting customer retention based on a wide range of customer attributes/features. The specific techniques could include (but are not limited to): regressions (linear and logistic), variable selection (forward/backward and stepwise), regularizations (e.g., LASSO), classification and regression trees (CART), random forests, gradient boosted trees (xgboost), neural networks, and support vector machines (SVM). The case is suitable for an advanced data analysis (data science, machine learning, and artificial intelligence) class at all levels: upper-level business undergraduate, MBA, EMBA, as well as specialized graduate or undergraduate programs in analytics (e.g., masters of science in business analytics [MSBA] and masters of management analytics [MMA]) and/or in management (e.g., masters of science in management [MScM] and masters in management [MiM, MM]). The teaching note for the case contains the pedagogy and the analyses, alongside detailed explanations of the various techniques and their implementations in R (code provided in Exhibits and supplementary files). Python code, as well as the spreadsheet implementation in XLMiner, is available upon request.


2019 ◽  
Vol 8 (3) ◽  
pp. 1268-1271

On the 15th of April, 1912, the Titanic sank in the North Atlantic on her maiden voyage. Although a long time has passed since this maritime disaster, the question of what affected each individual's survival still attracts researchers' attention. The approach taken in this paper is to use the publicly available data set from the website Kaggle. Kaggle is a popular data science website that compiled information about the people on the Titanic into a data set for the data mining competition “Titanic: Machine Learning from Disaster”. The research and comparisons in this paper use several machine learning techniques and algorithms to analyse the data for the classification and prediction of survivors. The predictive performance and efficiency of these algorithms depend greatly on the data analysis and the model. The techniques used are Random Forest, Support Vector Machine, and Gradient Boosting Machine.


2020 ◽  
Vol 12 (2) ◽  
pp. 84-99
Author(s):  
Li-Pang Chen

In this paper, we investigate the analysis and prediction of time-dependent data. We focus on four different stocks selected from the Yahoo Finance historical database. To build models and predict future stock prices, we consider three machine learning techniques: Long Short-Term Memory (LSTM), Convolutional Neural Networks (CNN), and Support Vector Regression (SVR). We show that treating the close price, open price, daily low, daily high, adjusted close price, and trading volume as predictors improves prediction accuracy.
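One common way to feed such predictors to a model is to turn the daily records into (features, target) pairs. The sketch below assumes a sliding window over previous days and the next day's close as the target; the abstract does not say how the author framed the problem, so the window scheme, the function name `make_supervised`, and the prices are all hypothetical.

```python
from typing import Dict, List, Tuple

# The six predictors named in the abstract, as hypothetical dictionary keys.
FIELDS = ("open", "high", "low", "close", "adj_close", "volume")


def make_supervised(
    rows: List[Dict[str, float]], window: int = 2
) -> Tuple[List[List[float]], List[float]]:
    """Turn daily rows into (features, target) pairs.

    Features: the six predictors from the previous `window` days, oldest first.
    Target: the close price of the day that follows the window.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    X: List[List[float]] = []
    y: List[float] = []
    for t in range(window, len(rows)):
        features = [rows[d][f] for d in range(t - window, t) for f in FIELDS]
        X.append(features)
        y.append(rows[t]["close"])
    return X, y


# Made-up daily records, oldest first (not real market data).
rows = [
    {"open": 10.0, "high": 11.0, "low": 9.5, "close": 10.5, "adj_close": 10.4, "volume": 1000.0},
    {"open": 10.5, "high": 11.5, "low": 10.0, "close": 11.0, "adj_close": 10.9, "volume": 1200.0},
    {"open": 11.0, "high": 12.0, "low": 10.8, "close": 11.8, "adj_close": 11.7, "volume": 900.0},
    {"open": 11.8, "high": 12.2, "low": 11.1, "close": 11.4, "adj_close": 11.3, "volume": 1100.0},
]

X, y = make_supervised(rows, window=2)
print(len(X), len(X[0]), y)
```

Each feature row could then be passed to a regressor such as SVR, or reshaped into a (window, 6) sequence for an LSTM or CNN.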


Author(s):  
Anantvir Singh Romana

Accurate diagnostic detection of disease in a patient is critical; it may alter subsequent treatment and increase the chances of survival. Machine learning techniques have been instrumental in disease detection and are currently being used in various classification problems due to their accurate prediction performance. Different techniques may achieve different accuracies, and it is therefore imperative to use the most suitable method, the one that provides the best results. This research provides a comparative analysis of support vector machine, naïve Bayes, J48 decision tree, and neural network classifiers on breast cancer and diabetes datasets.


2020 ◽  
Author(s):  
Azhagiya Singam Ettayapuram Ramaprasad ◽  
Phum Tachachartvanich ◽  
Denis Fourches ◽  
Anatoly Soshilov ◽  
Jennifer C.Y. Hsieh ◽  
...  

Perfluoroalkyl and polyfluoroalkyl substances (PFASs) pose a substantial threat as endocrine disruptors, and thus early identification of those that may interact with steroid hormone receptors, such as the androgen receptor (AR), is critical. In this study we screened 5,206 PFASs from the CompTox database against the different binding sites on the AR using both molecular docking and machine learning techniques. We developed support vector machine models trained on Tox21 data to classify the active and inactive PFASs for AR using different chemical fingerprints as features. The maximum accuracy was 95.01% and the Matthews correlation coefficient (MCC) was 0.76, based on MACCS fingerprints (MACCSFP). The combination of docking-based screening and machine learning models identified 29 PFASs that have strong potential for activity against the AR and should be considered priority chemicals for biological toxicity testing.
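Because the abstract reports both accuracy and MCC, it may help to see how the two are computed from the same confusion matrix. The sketch below uses the standard MCC definition. The counts are hypothetical and are not taken from the study; they are chosen only to show how accuracy can stay high on imbalanced data while MCC is noticeably lower.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Share of all samples that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).

    Returns 0.0 when the denominator is zero, a common convention.
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0


# Hypothetical counts: 80 active compounds, 920 inactive ones.
tp, tn, fp, fn = 40, 900, 20, 40

acc = accuracy(tp, tn, fp, fn)
mcc = matthews_corrcoef(tp, tn, fp, fn)
print(f"accuracy={acc:.2f}, MCC={mcc:.2f}")
```

MCC ranges from -1 to 1 and uses all four cells of the confusion matrix, which is why it is often reported alongside accuracy when active compounds are much rarer than inactive ones.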


2020 ◽  
Author(s):  
Nalika Ulapane ◽  
Karthick Thiyagarajan ◽  
sarath kodagoda

Classification has become a vital task in modern machine learning and artificial intelligence applications, including smart sensing. Numerous machine learning techniques are available to perform classification. Similarly, numerous practices, such as feature selection (i.e., selection of a subset of descriptor variables that optimally describe the output), are available to improve classifier performance. In this paper, we consider the case of a given supervised learning classification task that has to be performed using continuous-valued features. It is assumed that an optimal subset of features has already been selected. Therefore, no further feature reduction, or feature addition, is to be carried out. Then, we attempt to improve the classification performance by passing the given feature set through a transformation that produces a new feature set which we have named the “Binary Spectrum”. Via a case study on Pulsed Eddy Current sensor data captured from an infrastructure monitoring task, we demonstrate how the classification accuracy of a Support Vector Machine (SVM) classifier increases through the use of this Binary Spectrum feature, indicating the feature transformation’s potential for broader usage.

