Early Prediction of Diabetes Using Feature Transformation and Hybrid Random Forest Algorithm

Diabetes is among the most common chronic diseases in the world. Early prediction helps physicians provide better treatment. Machine learning approaches are widely used to predict the disease at an early stage. However, selecting the significant features and a suitable classifier remains difficult, and poor choices reduce diagnostic accuracy. In this paper, PCA-based feature transformation and a hybrid random forest classifier are used for diabetes prediction. PCA attempts to identify the best subset of transformed components, which greatly improves the classification result. The system is compared with prior machine learning approaches to evaluate its efficiency. The experimental results show that the present study improves prediction accuracy.
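The PCA feature-transformation step can be illustrated with a minimal, dependency-free sketch. It centres the feature matrix, builds the sample covariance matrix, and extracts the leading principal components by power iteration with deflation; the projected scores are what a downstream classifier would consume. The function names, the fixed iteration count, and the seeded random start are illustrative assumptions, and the paper's hybrid random forest is not reproduced here.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def center(X: Matrix) -> Matrix:
    """Subtract each column mean so every feature has zero mean."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in X]


def covariance(Xc: Matrix) -> Matrix:
    """Sample covariance matrix (d x d) of already-centred data."""
    n, d = len(Xc), len(Xc[0])
    return [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_components(C: Matrix, k: int, iters: int = 500) -> Matrix:
    """Leading k eigenvectors of a symmetric matrix via power iteration with deflation."""
    d = len(C)
    C = [row[:] for row in C]   # copy, because deflation modifies C
    rng = random.Random(0)      # fixed seed keeps the sketch deterministic
    comps: Matrix = []
    for _ in range(k):
        # Random start makes it very unlikely to be orthogonal to the target eigenvector
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for _ in range(iters):
            w = [sum(C[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient: the variance explained by this component
        lam = sum(v[i] * sum(C[i][j] * v[j] for j in range(d)) for i in range(d))
        comps.append(v)
        # Deflate so the next pass converges to the next-largest component
        C = [[C[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return comps


def pca_transform(X: Matrix, k: int) -> Matrix:
    """Project the centred rows of X onto the top-k principal components."""
    Xc = center(X)
    comps = top_components(covariance(Xc), k)
    return [[sum(r[j] * c[j] for j in range(len(r))) for c in comps] for r in Xc]
```

Each principal component is determined only up to sign, so the projected scores may come out negated; a classifier trained on them is unaffected.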

This paper introduces a new decision tree algorithm, the Diabetes Prediction Algorithm (DPA), for the early prediction of diabetes. The dataset, collected using Internet of Things (IoT) diabetes sensors, comprises 15,000 records, of which 11,250 are used for training and 3,750 for testing. The proposed DPA yielded an accuracy of 90.02%, a specificity of 92.60%, a precision of 89.17%, and an error rate of 9.98%. Further, the proposed algorithm is compared with existing approaches. Many currently available algorithms are not completely accurate, and DPA helps to address this.
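The reported metrics all derive from a binary confusion matrix, and the error rate is simply the misclassified fraction, so it always equals 1 − accuracy (90.02% + 9.98% = 100%). The sketch below defines these metrics and the train/test split arithmetic; the class name, helper names, and any counts you plug in are hypothetical, not the paper's data.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Confusion:
    """Counts from a binary diabetes classifier (positive = diabetic)."""
    tp: int  # diabetic, predicted diabetic
    tn: int  # non-diabetic, predicted non-diabetic
    fp: int  # non-diabetic, predicted diabetic
    fn: int  # diabetic, predicted non-diabetic

    @property
    def total(self) -> int:
        return self.tp + self.tn + self.fp + self.fn

    def accuracy(self) -> float:
        return (self.tp + self.tn) / self.total

    def error_rate(self) -> float:
        # Fraction misclassified; by construction equals 1 - accuracy
        return (self.fp + self.fn) / self.total

    def specificity(self) -> float:
        # True negative rate: share of non-diabetic cases correctly identified
        return self.tn / (self.tn + self.fp)

    def precision(self) -> float:
        # Share of "diabetic" predictions that are actually diabetic
        return self.tp / (self.tp + self.fp)


def split_sizes(n_records: int, train_fraction: float) -> Tuple[int, int]:
    """Number of training and testing records for a given training fraction."""
    n_train = round(n_records * train_fraction)
    return n_train, n_records - n_train
```

For example, `split_sizes(15000, 0.75)` returns `(11250, 3750)`, which is the 75/25 split described in the abstract.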


Author(s):  
KM Jyoti Rani

Diabetes is a chronic disease with the potential to cause a worldwide health care crisis. According to the International Diabetes Federation, 382 million people are living with diabetes worldwide, and this number is projected to rise to 592 million by 2035. Diabetes is a disease characterized by elevated blood glucose levels. High blood glucose produces symptoms such as frequent urination, increased thirst, and increased hunger. Diabetes is one of the leading causes of blindness, kidney failure, amputations, heart failure, and stroke. When we eat, our body turns food into sugars, or glucose. At that point, the pancreas is supposed to release insulin. Insulin serves as a key that opens our cells, allowing glucose to enter so that it can be used for energy. With diabetes, however, this system does not work properly. Type 1 and type 2 diabetes are the most common forms of the disease, but there are other kinds as well, such as gestational diabetes, which occurs during pregnancy.

Machine learning is an emerging scientific field in data science concerned with how machines learn from experience. The aim of this project is to develop a system that can perform early prediction of diabetes for a patient with higher accuracy by combining the results of different machine learning techniques. Algorithms such as k-nearest neighbour, logistic regression, random forest, support vector machine, and decision tree are used. The accuracy of the model is calculated for each algorithm, and the one with the best accuracy is selected as the model for predicting diabetes.
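The select-the-most-accurate-model step can be sketched as follows: every candidate classifier is scored on the same held-out set and the highest-accuracy one is kept. To stay dependency-free, only k-nearest neighbour from the listed algorithms is implemented, alongside a majority-class baseline that is not in the original; logistic regression, random forest, SVM, or a decision tree would plug into the same `Classifier` interface. All names here are hypothetical.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

Point = Sequence[float]
Classifier = Callable[[List[Point], List[int], Point], int]


def knn(k: int) -> Classifier:
    """Return a k-nearest-neighbour classifier using Euclidean distance."""
    def predict(X_train: List[Point], y_train: List[int], x: Point) -> int:
        nearest = sorted(range(len(X_train)),
                         key=lambda i: math.dist(X_train[i], x))[:k]
        votes = Counter(y_train[i] for i in nearest)
        return votes.most_common(1)[0][0]
    return predict


def majority_class(X_train: List[Point], y_train: List[int], x: Point) -> int:
    """Baseline: always predict the most frequent training label."""
    return Counter(y_train).most_common(1)[0][0]


def accuracy(clf: Classifier, X_train, y_train, X_test, y_test) -> float:
    """Fraction of held-out points the classifier labels correctly."""
    hits = sum(clf(X_train, y_train, x) == y for x, y in zip(X_test, y_test))
    return hits / len(y_test)


def select_best(models: Dict[str, Classifier], X_train, y_train,
                X_test, y_test) -> Tuple[str, Dict[str, float]]:
    """Score every model on the same test set and return the most accurate one."""
    scores = {name: accuracy(clf, X_train, y_train, X_test, y_test)
              for name, clf in models.items()}
    best = max(scores, key=scores.get)  # first model wins ties
    return best, scores
```

Selecting on a single held-out split is the simplest reading of the abstract; with small datasets, cross-validated accuracy would give a less noisy comparison.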


Author(s):  
Mohammed Faim Uddin Bhuiyan
Md. Tanzim Rahman
Mehfuj Ahmed Anik
Musharrat Khan

Author(s):  
Sunhae Kim
Hye-Kyung Lee
Kounseok Lee

(1) Background: The Patient Health Questionnaire-9 (PHQ-9) is a tool that screens patients for depression in primary care settings. In this study, we evaluated the efficacy of the PHQ-9 in assessing suicidal ideation. (2) Methods: A total of 8760 completed questionnaires collected from college students were analyzed. The PHQ-9 items were scored in four combinations (PHQ-2, PHQ-8, PHQ-9, and PHQ-10), and each combination was evaluated. Suicidal ideation was assessed using the Mini-International Neuropsychiatric Interview suicidality module. Analyses used suicidal ideation as the dependent variable and applied three machine learning (ML) algorithms: k-nearest neighbors, linear discriminant analysis (LDA), and random forest. (3) Results: Random forest applied to the nine items of the PHQ-9 achieved an excellent area under the curve (AUC) of 0.841, with 94.3% accuracy. The positive and negative predictive values were 84.95% (95% CI = 76.03–91.52) and 95.54% (95% CI = 94.42–96.48), respectively. (4) Conclusion: This study confirmed that ML algorithms using the PHQ-9 are reliably accurate in screening individuals with suicidal ideation in primary care settings.
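The reported AUC, positive predictive value (PPV), and negative predictive value (NPV) can be computed from a model's outputs as sketched below. The AUC uses its rank interpretation: the probability that a randomly chosen positive case scores higher than a randomly chosen negative one. The abstract does not state how its 95% confidence intervals were obtained; the Wilson score interval here is an assumption, and an exact (Clopper-Pearson) interval would give somewhat different bounds. For a PPV interval, `successes` is the true-positive count and `n` is all positive predictions.

```python
import math
from typing import List, Tuple


def ppv_npv(y_true: List[int], y_pred: List[int]) -> Tuple[float, float]:
    """Positive and negative predictive values (1 = suicidal ideation present)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp / (tp + fp), tn / (tn + fn)


def wilson_ci(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a proportion (assumed CI method)."""
    p = successes / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


def auc(y_true: List[int], scores: List[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))
```

The pairwise AUC loop is quadratic in the number of cases; for thousands of questionnaires a rank-sum formulation computes the same value faster.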


2020
Author(s):
Aaron Cardenas-Martinez
Victor Rodriguez-Galiano
Juan Antonio Luque-Espinar
Maria Paula Mendes

The establishment of the sources and driving forces of groundwater nitrate pollution is of paramount importance, contributing to the implementation and evaluation of agro-environmental measures. High concentrations of nitrates in groundwater occur all around the world, in both rich and less developed countries.

In the case of Spain, 21.5% of the wells in the groundwater quality monitoring network showed mean concentrations above the quality standard (QS) of 50 mg/l. The objectives of this work were: i) to predict the current probability of nitrate concentrations exceeding the QS in Andalusian groundwater bodies (Spain) using features from past years, some of them obtained from satellite observations; ii) to assess the importance of the features in the prediction; and iii) to evaluate different machine learning (ML) approaches and feature selection (FS) techniques.

Several predictive models based on the Random Forest ML algorithm were built, together with FS techniques. A total of 321 nitrate samples and their respective predictive features were obtained from different groundwater bodies. The predictive features were divided into three groups according to their focus: agricultural production (phenology); livestock pressure (excretion rates); and environmental settings (soil characteristics and texture, geomorphology, and local climate conditions). Models were trained with the features of one year [YEAR(t0)] and then applied to new features obtained for the following year [YEAR(t0+1)], using k-fold cross-validation. Additionally, a further prediction was carried out for the present time [YEAR(t0+n)] and validated with an independent test set. This methodology examined the use of a model trained with previous nitrate concentrations and predictive features to predict current nitrate concentrations from present features.

Our findings showed an improvement in predictive performance when using a wrapper with sequential search for FS, compared with using the Random Forest algorithm alone. Phenology features, derived from remotely sensed variables, were the most explanatory features, performing better than static land-use maps or vegetation index images (e.g., NDVI). They also provided much more comprehensive information and, more importantly, relied only on features extrinsic to the groundwater bodies.
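A wrapper with sequential search can be sketched as greedy forward selection: starting from no features, repeatedly add the feature whose inclusion most improves k-fold cross-validated accuracy, and stop when no remaining feature helps. This is a minimal sketch under stated assumptions: a nearest-centroid classifier stands in for Random Forest to keep it dependency-free, folds are contiguous rather than shuffled, and the function names and stopping rule are hypothetical rather than the authors' configuration. The year-to-year validation described above (train on YEAR(t0), apply to YEAR(t0+1)) would replace the CV folds with a split by year and is not shown.

```python
from typing import Callable, List, Optional, Tuple

Rows = List[List[float]]
Model = Callable[[List[float]], int]


def nearest_centroid(X: Rows, y: List[int]) -> Model:
    """Stand-in learner (NOT Random Forest): predict the class whose mean is closest."""
    centroids = {}
    for label in sorted(set(y)):
        members = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(members) for col in zip(*members)]

    def predict(x: List[float]) -> int:
        return min(centroids,
                   key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))
    return predict


def cv_accuracy(fit: Callable[[Rows, List[int]], Model],
                X: Rows, y: List[int], k: int) -> float:
    """k-fold cross-validated accuracy with contiguous (unshuffled) folds."""
    n = len(X)
    correct = 0
    for f in range(k):
        test_idx = range(f * n // k, (f + 1) * n // k)
        held_out = set(test_idx)
        model = fit([X[i] for i in range(n) if i not in held_out],
                    [y[i] for i in range(n) if i not in held_out])
        correct += sum(model(X[i]) == y[i] for i in test_idx)
    return correct / n


def forward_selection(fit, X: Rows, y: List[int], k: int = 5) -> Tuple[List[int], float]:
    """Wrapper FS: greedily add the feature that most improves CV accuracy."""
    d = len(X[0])
    selected: List[int] = []
    best_score = 0.0
    while len(selected) < d:
        best_j: Optional[int] = None
        best_candidate = best_score
        for j in range(d):
            if j in selected:
                continue
            cols = selected + [j]
            score = cv_accuracy(fit, [[row[c] for c in cols] for row in X], y, k)
            if score > best_candidate:  # strict: ties keep the earlier feature
                best_j, best_candidate = j, score
        if best_j is None:  # no remaining feature improves the score
            break
        selected.append(best_j)
        best_score = best_candidate
    return selected, best_score
```

Because the wrapper retrains the model for every candidate feature at every step, its cost grows roughly with the square of the number of features, which is the usual trade-off against cheaper filter methods.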


2020
pp. 1-24
Author(s):
TINGBIN BIAN
JIN CHEN
QU FENG
JINGYI LI

We aim to compare econometric analyses with machine learning approaches in the context of the Singapore private property market, using transaction data covering the period 1995–2018. A hedonic model is employed to quantify the premiums of important attributes and amenities, with a focus on the premium associated with distance to the nearest Mass Rapid Transit (MRT) station. In parallel, machine learning algorithms from three categories (LASSO, random forest, and artificial neural networks) are applied in the same context to give deeper insight into the importance of the determinants of property prices. The results suggest that the MRT distance premium is significant: moving 100 m closer to the nearest MRT station from the mean distance point would increase the overall transacted price by about 15,000 Singapore dollars (SGD). Machine learning approaches generally achieve higher prediction accuracy, and LASSO suggests a heterogeneous property age premium. Using the random forest algorithm, we find that property prices are mostly affected by key macroeconomic factors, such as the time of sale, as well as by the size and floor level of the property. Finally, an appraisal of the different approaches is provided to help researchers use additional data sources and data-driven approaches to exploit potential causal effects in economic studies.
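A distance premium of this kind is typically read off a hedonic regression. The abstract does not give the model's functional form, so the sketch assumes a common log-linear specification, ln P = a + b·d, with MRT distance d (in metres) as the only regressor, whereas the study controls for many other attributes. It fits b by ordinary least squares and converts it into the price change from moving a given distance closer at a given price level. Function names and any data passed in are hypothetical.

```python
import math
from typing import List, Tuple


def ols_simple(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Intercept a and slope b of y = a + b*x by ordinary least squares."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def distance_premium(price: float, beta: float, metres_closer: float) -> float:
    """Price change from moving `metres_closer` nearer, under ln P = a + beta*d.

    Distance falls by `metres_closer`, so ln P changes by -beta*metres_closer;
    with beta < 0 (prices fall with distance) the premium is positive.
    """
    return price * (math.exp(-beta * metres_closer) - 1.0)
```

Usage: regress `[math.log(p) for p in prices]` on the distances with `ols_simple`, then call `distance_premium(price_at_mean, b, 100)` to express the effect of moving 100 m closer in currency units.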


Author(s):  
Krishna Kumar Mohbey

In any industry, attrition is a major problem, whether it concerns employee attrition in an organization or customer attrition on an e-commerce site. If we can accurately predict which customers or employees will leave, the employer can save considerable time, effort, and cost, hire or acquire substitutes in advance, and avoid disrupting the organization's ongoing progress. In this chapter, a comparative analysis of various machine learning approaches, namely Naïve Bayes, SVM, decision tree, random forest, and logistic regression, is presented. The results help identify the behavior of employees who are likely to leave in the near future. Experimental results reveal that the logistic regression approach reaches up to 86% accuracy, outperforming the other machine learning approaches.
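Logistic regression, the best performer reported here, can be sketched without libraries as batch gradient descent on the mean log-loss, with a 0.5 probability threshold turning the output into a stay/leave prediction. The learning rate, epoch count, and toy feature values are hypothetical; the chapter's actual features and preprocessing are not given in the abstract.

```python
import math
from typing import List, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X: List[List[float]], y: List[int],
                 lr: float = 0.1, epochs: int = 2000) -> Tuple[List[float], float]:
    """Batch gradient descent on the mean log-loss; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        gw = [0.0] * d
        gb = 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log-loss w.r.t. the logit is (predicted prob - label)
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def predict(w: List[float], b: float, x: List[float], threshold: float = 0.5) -> int:
    """1 = predicted to leave, 0 = predicted to stay."""
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= threshold)
```

Plain gradient descent keeps the sketch short; for real attrition data, standardizing the features first and adding L2 regularization would make training faster and less prone to overfitting.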

