Predicting Chronic Kidney Disease Using KNN Algorithm

2021 ◽  
Vol 1 (2) ◽  
pp. 16-24
Author(s):  
V Mareeswari ◽  
Sunita Chalageri ◽  
Kavita K Patil

Chronic kidney disease (CKD) is a global health issue in which the kidneys are damaged and cannot filter blood the way they should. Because the early stages of CKD are difficult to detect, patients often fail to recognise the disease. Early detection of CKD allows patients to receive timely care that can slow the progression of the disease. Machine learning models can effectively aid clinicians toward this goal through early and accurate recognition. The CKD data set is collected from the University of California Irvine (UCI) Machine Learning Repository, and multiple machine learning and deep learning algorithms are used to predict chronic kidney disease.
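As a rough illustration of the approach in the title, the sketch below applies a k-nearest-neighbours classifier to the UCI CKD data, assuming the data set has been exported to a local CSV file; the file name, the "class" label column, and the preprocessing choices are illustrative, not taken from the paper.

```python
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("ckd.csv")  # hypothetical export of the UCI CKD data set
# Coerce everything to numeric; categorical fields become NaN here,
# a simplification that a real pipeline would handle by encoding them.
X = df.drop(columns=["class"]).apply(pd.to_numeric, errors="coerce")
y = df["class"]

X = SimpleImputer(strategy="mean").fit_transform(X)  # the data set has many missing values
X = StandardScaler().fit_transform(X)  # KNN is distance-based, so scale features

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
print("test accuracy:", knn.score(X_test, y_test))
```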

PLoS ONE ◽  
2020 ◽  
Vol 15 (6) ◽  
pp. e0233976 ◽  
Author(s):  
Erik Dovgan ◽  
Anton Gradišek ◽  
Mitja Luštrek ◽  
Mohy Uddin ◽  
Aldilas Achmad Nursetyo ◽  
...  

2016 ◽  
Vol 5 (2) ◽  
pp. 64-72 ◽  
Author(s):  
Alexander Arman Serpen

This research study employed a machine learning algorithm on actual patient data to extract decision-making rules that can be used to diagnose chronic kidney disease. The patient data set comprises a number of health-related attributes or indicators and contains 250 patients positive for chronic kidney disease. The C4.5 decision tree algorithm was applied to the patient data to formulate a set of diagnosis rules for chronic kidney disease. The C4.5 algorithm utilizing 3-fold cross-validation achieved 98.25% prediction accuracy, correctly classifying 393 instances and incorrectly classifying 7 instances out of a total patient count of 400. The extracted rule set highlighted the need to monitor serum creatinine levels in patients as the primary indicator for the presence of disease. Secondary indicators were pedal edema, hemoglobin, diabetes mellitus and specific gravity. The set of rules provides a preliminary screening tool toward conclusive diagnosis of chronic kidney disease by nephrologists following timely referral by primary care providers or decision-making algorithms.
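A hedged sketch of this kind of evaluation is given below. scikit-learn implements CART rather than C4.5, so the entropy splitting criterion is used here as a common stand-in; the CSV export, the "class" column, and the assumption of numeric features are all illustrative.

```python
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier, export_text

df = pd.read_csv("ckd.csv")  # hypothetical export of the 400-patient data set
X = df.drop(columns=["class"])  # assumes features are already numeric
y = df["class"]

tree = DecisionTreeClassifier(criterion="entropy", random_state=0)
scores = cross_val_score(tree, X, y, cv=3)  # 3-fold cross-validation
print("mean accuracy:", scores.mean())

# Fitting on the full data yields human-readable if/then rules,
# analogous to the diagnosis rule set extracted in the study.
tree.fit(X, y)
print(export_text(tree, feature_names=list(X.columns)))
```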


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Ye Rang Park ◽  
Young Jae Kim ◽  
Woong Ju ◽  
Kyehyun Nam ◽  
Soonyung Kim ◽  
...  

Abstract Cervical cancer is the second most common cancer in women worldwide, with a mortality rate of 60%. Cervical cancer begins with no overt signs and has a long latent period, making early detection through regular checkups vitally important. In this study, we compare the performance of two different model families, machine learning and deep learning, for the purpose of identifying signs of cervical cancer using cervicography images. Using the deep learning model ResNet-50 and the machine learning models XGB, SVM, and RF, we classified 4119 cervicography images as positive or negative for cervical cancer using square images in which the vaginal wall regions were removed. The machine learning models extracted 10 major features from a total of 300 features. All tests were validated by fivefold cross-validation, and receiver operating characteristic (ROC) analysis yielded the following AUCs: ResNet-50 0.97 (95% CI 0.949–0.976), XGB 0.82 (95% CI 0.797–0.851), SVM 0.84 (95% CI 0.801–0.854), RF 0.79 (95% CI 0.804–0.856). The ResNet-50 model showed a 0.15-point improvement (p < 0.05) over the average (0.82) of the three machine learning methods. Our data suggest that the ResNet-50 deep learning algorithm could offer greater performance than current machine learning models for the purpose of identifying cervical cancer using cervicography images.
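The machine-learning side of such a comparison might be sketched as follows; synthetic data stands in for the 300 extracted image features, and all model settings are illustrative rather than the study's actual configuration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from xgboost import XGBClassifier  # assumes the xgboost package is installed

# Synthetic stand-in for the 300 image features and binary labels.
X, y = make_classification(n_samples=500, n_features=300, random_state=0)

models = {
    "XGB": XGBClassifier(eval_metric="logloss"),
    "SVM": make_pipeline(StandardScaler(), SVC(probability=True)),
    "RF": RandomForestClassifier(n_estimators=300, random_state=0),
}
for name, model in models.items():
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc")  # fivefold CV
    print(f"{name}: AUC = {auc.mean():.3f} ± {auc.std():.3f}")
```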


2021 ◽  
Vol 14 (3) ◽  
pp. 119
Author(s):  
Fabian Waldow ◽  
Matthias Schnaubelt ◽  
Christopher Krauss ◽  
Thomas Günter Fischer

In this paper, we demonstrate how a well-established machine learning-based statistical arbitrage strategy can be successfully transferred from equity to futures markets. First, we preprocess futures time series composed of front-month contracts to render them suitable for our returns-based trading framework, and compile a data set of 60 futures covering nearly 10 trading years. Next, we train several machine learning models to predict whether the h-day-ahead return of each future out- or underperforms the corresponding cross-sectional median return. Finally, we enter long/short positions for the top/flop-k futures for a duration of h days and assess the financial performance of the resulting portfolio in an out-of-sample testing period. We find the machine learning models to yield statistically significant out-of-sample break-even transaction costs of 6.3 bp, a clear challenge to the semi-strong form of market efficiency. We close by discussing sources of profitability and the robustness of our findings.
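A minimal sketch of the labeling step described above, under the assumption that front-month prices have been compiled into a dates-by-futures CSV; the file name and the values of h and k are illustrative.

```python
import pandas as pd

h, k = 5, 5  # holding period in days and number of top/flop futures
prices = pd.read_csv("futures_prices.csv", index_col=0, parse_dates=True)

fwd_ret = prices.pct_change(h).shift(-h)         # h-day-ahead return per future
median = fwd_ret.median(axis=1)                  # cross-sectional median per day
labels = fwd_ret.gt(median, axis=0).astype(int)  # 1 = outperforms the median

# A trained classifier's daily scores would then be ranked to go long
# the top-k and short the flop-k futures for h days.
print(labels.tail())
```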


2021 ◽  
Vol 14 (1) ◽  
Author(s):  
Martine De Cock ◽  
Rafael Dowsley ◽  
Anderson C. A. Nascimento ◽  
Davis Railsback ◽  
Jianwei Shen ◽  
...  

Abstract Background In biomedical applications, valuable data is often split between owners who cannot openly share the data because of privacy regulations and concerns. Training machine learning models on the joint data without violating privacy is a major technology challenge that can be addressed by combining techniques from machine learning and cryptography. When collaboratively training machine learning models with the cryptographic technique named secure multi-party computation, the price paid for keeping the data of the owners private is an increase in computational cost and runtime. A careful choice of machine learning techniques, together with algorithmic and implementation optimizations, is necessary to enable practical secure machine learning over distributed data sets. Such optimizations can be tailored to the kind of data and machine learning problem at hand. Methods Our setup involves secure two-party computation protocols, along with a trusted initializer that distributes correlated randomness to the two computing parties. We use a gradient-descent-based algorithm for training a logistic-regression-like model with a clipped ReLU activation function, and we break down the algorithm into corresponding cryptographic protocols. Our main contributions are a new protocol for computing the activation function that requires neither secure comparison protocols nor Yao’s garbled circuits, and a series of cryptographic engineering optimizations to improve the performance. Results For our largest gene expression data set, we train a model that requires over 7 billion secure multiplications; the training completes in about 26.90 s in a local area network. The implementation in this work is a further optimized version of the implementation with which we won first place in Track 4 of the iDASH 2019 secure genome analysis competition. Conclusions In this paper, we present a secure logistic regression training protocol and its implementation, with a new subprotocol to securely compute the activation function. To the best of our knowledge, we present the fastest existing secure multi-party computation implementation for training logistic regression models on high dimensional genome data distributed across a local area network.
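In plaintext, a clipped ReLU of the kind commonly used as a piecewise-linear stand-in for the sigmoid, together with a single gradient-descent step, might look like this; the clipping bounds and learning rate are assumptions for illustration, and the paper's contribution is evaluating such a function on secret-shared values rather than in the clear.

```python
import numpy as np

def clipped_relu(z):
    # Piecewise-linear sigmoid substitute (an assumed form):
    # 0 for z <= -0.5, z + 0.5 on (-0.5, 0.5), 1 for z >= 0.5.
    return np.clip(z + 0.5, 0.0, 1.0)

def gd_step(w, X, y, lr=0.1):
    # One gradient-descent step for a logistic-regression-like model.
    pred = clipped_relu(X @ w)
    grad = X.T @ (pred - y) / len(y)
    return w - lr * grad

# Tiny demonstration on toy data.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([1.0, 0.0, 1.0])
w = np.zeros(2)
for _ in range(100):
    w = gd_step(w, X, y)
print("weights:", w)
```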


Author(s):  
Maicon Herverton Lino Ferreira da Silva Barros ◽  
Geovanne Oliveira Alves ◽  
Lubnnia Morais Florêncio Souza ◽  
Élisson da Silva Rocha ◽  
João Fausto Lorenzato de Oliveira ◽  
...  

Tuberculosis (TB) is an airborne infectious disease caused by organisms in the Mycobacterium tuberculosis (Mtb) complex. In many low- and middle-income countries, TB remains a major cause of morbidity and mortality. Once a patient has been diagnosed with TB, it is critical that healthcare workers make the most appropriate treatment decision given the individual conditions of the patient and the likely course of the disease based on medical experience. Depending on the prognosis, delayed or inappropriate treatment can result in unsatisfactory outcomes, including the exacerbation of clinical symptoms, poor quality of life, and increased risk of death. This work benchmarks machine learning models to aid TB prognosis using a Brazilian health database of confirmed cases and deaths related to TB in the State of Amazonas. The goal is to predict the probability of death by TB, thus aiding TB prognosis and the associated treatment decision-making process. In its original form, the data set comprised 36,228 records and 130 fields but suffered from missing, incomplete, or incorrect data. Following data cleaning and preprocessing, a revised data set was generated comprising 24,015 records and 38 fields, including 22,876 reported cured TB patients and 1,139 deaths by TB. To explore how the data imbalance impacts model performance, two controlled experiments were designed using (1) imbalanced and (2) balanced data sets. The best result is achieved by the Gradient Boosting (GB) model using the balanced data set to predict TB mortality, and the ensemble model composed of the Random Forest (RF), GB, and Multi-layer Perceptron (MLP) models is the best model to predict the cure class.
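A sketch of the class-balance experiment described above, training the same Gradient Boosting model on the imbalanced data and on a randomly undersampled balanced set; the CSV export, the "outcome" column, and the undersampling choice are illustrative assumptions, not the paper's exact procedure.

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("tb_amazonas.csv")  # hypothetical cleaned 38-field data set
X = df.drop(columns=["outcome"])     # assumes features are already numeric
y = df["outcome"]                    # 1 = death by TB, 0 = cured
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Balanced variant: undersample the majority (cure) class.
train = pd.concat([X_tr, y_tr], axis=1)
minority = train[train["outcome"] == 1]
majority = train[train["outcome"] == 0].sample(len(minority), random_state=0)
balanced = pd.concat([minority, majority])

for name, data in [("imbalanced", train), ("balanced", balanced)]:
    gb = GradientBoostingClassifier(random_state=0)
    gb.fit(data.drop(columns=["outcome"]), data["outcome"])
    print(name, "F1 (death):", f1_score(y_te, gb.predict(X_te)))
```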


2016 ◽  
Vol 23 (2) ◽  
pp. 124 ◽  
Author(s):  
Douglas Detoni ◽  
Cristian Cechinel ◽  
Ricardo Araujo Matsumura ◽  
Daniela Francisco Brauner

Student dropout is one of the main problems faced by distance learning courses. One of the major challenges for researchers is to develop methods to predict the behavior of students so that teachers and tutors are able to identify at-risk students as early as possible and provide assistance before they drop out or fail their courses. Machine learning models have been used to predict or classify students in these settings. However, while these models have shown promising results in several settings, they usually attain these results using attributes that are not immediately transferable to other courses or platforms. In this paper, we provide a methodology to classify students using only interaction counts from each student. We evaluate this methodology on a data set from two majors based on the Moodle platform. We run experiments consisting of training and evaluating three machine learning models (Support Vector Machines, Naive Bayes, and AdaBoost decision trees) under different scenarios. We provide evidence that patterns from interaction counts can yield useful information for classifying at-risk students. This classification allows the customization of the activities presented to at-risk students (automatically or through tutors) in an attempt to prevent them from dropping out.
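A minimal sketch of that setup is shown below; the interaction-count columns and file name are illustrative placeholders for whatever the Moodle logs provide, not the paper's exact features.

```python
import pandas as pd
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

df = pd.read_csv("moodle_counts.csv")  # hypothetical: one row per student
X = df[["forum_posts", "resource_views", "quiz_attempts", "logins"]]
y = df["at_risk"]  # 1 = dropped out or failed

models = {
    "SVM": SVC(),
    "Naive Bayes": GaussianNB(),
    "AdaBoost trees": AdaBoostClassifier(),  # default base learner is a decision stump
}
for name, model in models.items():
    print(name, cross_val_score(model, X, y, cv=5).mean())
```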


CrystEngComm ◽  
2017 ◽  
Vol 19 (27) ◽  
pp. 3737-3745 ◽  
Author(s):  
Max Pillong ◽  
Corinne Marx ◽  
Philippe Piechon ◽  
Jerome G. P. Wicker ◽  
Richard I. Cooper ◽  
...  

A publicly available crystallisation database for clusters of highly similar compounds is used to build machine learning models.


2017 ◽  
Vol 11 (04) ◽  
pp. 497-511
Author(s):  
Elnaz Davoodi ◽  
Leila Kosseim ◽  
Matthew Mongrain

This paper evaluates the effect of the context of a target word on the identification of complex words in natural language texts. The approach automatically tags words as either complex or not, based on two sets of features: base features that only pertain to the target word, and contextual features that take the context of the target word into account. We experimented with several supervised machine learning models, and trained and tested the approach with the 2016 SemEval Word Complexity Data Set. Results show that when discriminating base features are used, the words around the target word can supplement those features and improve the recognition of complex words.
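The two feature sets might be combined along the following lines; the specific features here are invented for illustration and are not the paper's actual feature set.

```python
def base_features(word):
    # Features of the target word only.
    return {"length": len(word), "vowels": sum(c in "aeiou" for c in word)}

def contextual_features(tokens, i, window=2):
    # Features of the words around the target word.
    ctx = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
    return {
        "ctx_avg_len": sum(len(t) for t in ctx) / max(len(ctx), 1),
        "ctx_size": len(ctx),
    }

def features(tokens, i):
    # Merge both sets; a supervised classifier (e.g. via scikit-learn's
    # DictVectorizer) would be trained on these dictionaries.
    return {**base_features(tokens[i]), **contextual_features(tokens, i)}

print(features("the quick brown fox jumps".split(), 2))
```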

