Comparison of different Artificial Intelligence techniques to predict Diabetic Kidney Disease (Preprint)

Mapping Intimacies ◽

10.2196/preprints.22636 ◽

2020 ◽

Author(s):

Satish Kumar ◽

Mohamed Rafiullah ◽

Khalid Siddiqui

Keyword(s):

Artificial Intelligence ◽

Machine Learning ◽

High Risk ◽

Random Forest ◽

Diabetic Kidney Disease ◽

Mean Squared Error ◽

Confusion Matrix ◽

Classification Techniques ◽

Squared Error ◽

Classification Technique

BACKGROUND Diabetic kidney disease (DKD) is a progressive disease that leads to loss of kidney function. As early intervention improves patient outcomes, it is essential to identify the patients who are at high risk of developing DKD. Artificial Intelligence methods apply different machine learning classification techniques to identify high-risk patients by building a predictive model from a given dataset. OBJECTIVE This study aims to find an accurate classification technique for predicting DKD by comparing different classification techniques applied to a DKD dataset using WEKA machine learning software. METHODS We analyzed the performance of nine different classification techniques on a DKD dataset with 410 instances and 18 attributes. 66% of the dataset was used to build a model, and 33% of the data was used for evaluating the model. The performance of the classification techniques was assessed based on their execution time, accuracy, correctly and incorrectly classified instances, kappa statistic (K), mean absolute error, root mean squared error and true values of the confusion matrix. RESULTS The Random Forest classifier was found to be the best performing technique, with an accuracy of 76.5854% and a higher K value (0.5306) than the other classifiers. It also showed the lowest root mean squared error (0.4007). The confusion matrix for the Random Forest technique contained 46 false-positive instances and 50 false-negative instances. CONCLUSIONS This study identified the Random Forest classification technique as the best performing classifier and an accurate prediction method for DKD. CLINICALTRIAL NA
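The reported accuracy can be cross-checked against the quoted confusion-matrix counts. The sketch below is only a consistency check, not the authors' WEKA workflow. It assumes the confusion matrix covers all 410 instances, which the abstract does not state explicitly; the variable names are ours.

```python
# Figures quoted in the abstract.
total_instances = 410
false_positives = 46
false_negatives = 50

# Assumption: the confusion matrix counts every one of the 410 instances,
# so everything that is not a false positive or false negative is correct.
correct = total_instances - false_positives - false_negatives
accuracy_pct = 100 * correct / total_instances

print(correct)                 # 314
print(round(accuracy_pct, 4))  # 76.5854
```

Under that assumption the counts reproduce the reported 76.5854% exactly. The kappa statistic cannot be recomputed the same way, because the abstract does not give the split between true positives and true negatives.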


Linear Regression Algorithm in Machine Learning through MATLAB

International Journal for Research in Applied Science and Engineering Technology ◽

10.22214/ijraset.2021.39410 ◽

2021 ◽

Vol 9 (12) ◽

pp. 989-995

Author(s):

Kalva Sindhu Priya

Keyword(s):

Artificial Intelligence ◽

Machine Learning ◽

Deep Learning ◽

Linear Regression ◽

Mean Squared Error ◽

Critical Part ◽

Squared Error ◽

Master Level ◽

Predicted Model ◽

The Given

Abstract: It is well known that almost every field is moving toward machine-based automation, from fundamental to master-level systems. Machine Learning (ML) is one of the important tools in this shift and is closely related to Artificial Intelligence (AI): it uses known data or past experience to improve automatically or to estimate the behavior or status of given data through various algorithms. Modeling a system or data through Machine Learning is important and advantageous because it helps in the development of later and newer versions. Today, most information technology giants, such as Facebook, Uber and Google Maps, have made Machine Learning a critical part of their ongoing operations to improve the user experience. In this paper, the various algorithms available in ML are described briefly, and of these, the Linear Regression algorithm is used to predict a new set of values by taking older data as reference. A detailed predictive model is then discussed by building code with the help of the Machine Learning and Deep Learning tools in MATLAB/SIMULINK. Keywords: Machine Learning (ML), Linear Regression algorithm, Curve fitting, Root Mean Squared Error
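To make the idea of predicting new values from older data concrete, here is a minimal sketch of simple (one-variable) linear regression fitted by ordinary least squares, with the root mean squared error of the fit. It is written in Python rather than the MATLAB/SIMULINK tooling the paper uses, and the data points and function names are hypothetical.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical "older data" used as the reference for the fit.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(xs, ys)
fitted = [slope * x + intercept for x in xs]
print("slope:", slope, "intercept:", intercept)
print("RMSE of the fit:", rmse(ys, fitted))

# Predict a new value outside the reference data.
print("prediction at x = 6:", slope * 6.0 + intercept)
```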


Protein Classification using Machine Learning and Statistical Techniques

Recent Advances in Computer Science and Communications ◽

10.2174/2666255813666190925163758 ◽

2019 ◽

Vol 13 ◽

Author(s):

Chhote Lal Prasad Gupta ◽

Anand Bihari ◽

Sudhakar Tripathi

Keyword(s):

Machine Learning ◽

Random Forest ◽

Protein Classification ◽

Classification Techniques ◽

Random Forest Classification ◽

Machine Learning Classification ◽

Classification Technique ◽

Human Enzyme ◽

Clinical Verification ◽

Enzyme Class

Background: Predicting the enzyme class of an unknown protein is one of the challenging tasks in bioinformatics. The number of known proteins increases daily, which makes clinical verification and classification difficult; as a result, enzyme class prediction offers a new opportunity to bioinformatics scholars. Machine learning classification techniques help in protein classification and prediction, but it is imperative to know which classification technique is best suited to protein classification. This study used human protein data extracted from the UniProtKB databank. A total of 4368 protein records with 45 identified features were used for experimental analysis. Objective: The prime objective of this article is to find an appropriate classification technique for classifying the enzyme class of reviewed as well as unreviewed human protein data, and to determine the significance of different features in protein classification and prediction. Method: In this article, ten significant classification techniques (CRT, QUEST, CHAID, C5.0, ANN, SVM, Bayesian, Random Forest, XgBoost and CatBoost) were used to classify the data and determine the importance of features. To validate the results of the different classification techniques, accuracy, precision, recall, F-measure, sensitivity, specificity, MCC, ROC and AUROC were used. All experiments were done with the help of SPSS Clementine and Python. Result: The classification techniques discussed above give different results, and the data were found to be imbalanced for classes C4, C5 and C6. As a result, all of the classification techniques give acceptable accuracy (above 60%) for these classes, but their precision is very low or negligible. The experimental results show that Random Forest gives the highest accuracy and AUROC of all the techniques, 96.84% and 0.945 respectively, and also has high precision and recall.
Conclusion: The experiments conducted and analyzed in this article show that the Random Forest classification technique can be used for human enzyme protein classification and prediction.
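The observation that accuracy can look acceptable on imbalanced classes while precision is negligible is easier to see with numbers. The following sketch computes several of the listed metrics from one-vs-rest confusion counts. The counts are hypothetical and are not taken from the paper; the function name is ours.

```python
import math

def binary_metrics(tp, fp, fn, tn):
    """Metrics for one class treated as 'positive' against all others."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0  # also called sensitivity
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if (precision + recall) else 0.0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f_measure": f_measure,
        "mcc": mcc,
    }

# Hypothetical rare class: only 20 of 1000 samples are positive.
rare_class = binary_metrics(tp=2, fp=8, fn=18, tn=972)
for name, value in rare_class.items():
    print(f"{name}: {value:.3f}")
```

With these made-up counts the accuracy is 0.974 even though precision is 0.2 and recall is 0.1, which is why metrics such as precision, recall and MCC are reported alongside accuracy for imbalanced classes.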


Random Forest Regression-Based Machine Learning Model for Accurate Estimation of Fluid Flow in Curved Pipes

Processes ◽

10.3390/pr9112095 ◽

2021 ◽

Vol 9 (11) ◽

pp. 2095

Author(s):

Ganesh N. ◽

Paras Jain ◽

Amitava Choudhury ◽

Prasun Dutta ◽

Kanak Kalita ◽

...

Keyword(s):

Machine Learning ◽

Fluid Flow ◽

Random Forest ◽

Mean Squared Error ◽

Flow Characteristics ◽

Accurate Estimation ◽

Random Forest Regression ◽

Curved Pipes ◽

Squared Error ◽

Computationally Expensive

In industrial piping systems, turbomachinery, heat exchangers etc., pipe bends are essential components. Computational fluid dynamics (CFD), which is frequently used to analyse the flow behaviour in such systems, provides extremely precise estimates but is computationally expensive. As a result, a computationally efficient method is developed in this paper by leveraging machine learning for such computationally expensive CFD problems. Random forest regression (RFR) is used as the machine learning algorithm in this work. Four different fluid flow characteristics (i.e., axial velocity, x-velocity, y-velocity and z-velocity) are studied. The accuracy of the RFR models is assessed using a number of statistical metrics, such as mean absolute error (MAE), mean squared error (MSE), root mean squared error (RMSE), maximum error (Max.Error) and median error (Med.Error). It is observed that the RFR models can produce considerable reductions in computing cost by acting as a surrogate for the CFD model. A minor loss in estimation accuracy compared to the CFD models is observed. While the magnitude of intricate flow characteristics such as the additional vortices is correctly predicted, some error in their location is observed.
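The error metrics named above can be sketched as follows. This assumes that "median error" means the median of the absolute errors, which the abstract does not define, and the velocity values are hypothetical stand-ins for CFD reference values and surrogate predictions.

```python
import math
import statistics

def regression_errors(y_true, y_pred):
    """MAE, MSE, RMSE, maximum error and median absolute error."""
    abs_errors = [abs(t - p) for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in abs_errors) / len(abs_errors)
    return {
        "MAE": sum(abs_errors) / len(abs_errors),
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "Max.Error": max(abs_errors),
        # Assumption: median error is taken over absolute errors.
        "Med.Error": statistics.median(abs_errors),
    }

# Hypothetical axial velocities: CFD reference vs. surrogate prediction.
cfd_velocity = [1.0, 2.0, 3.0, 4.0]
rfr_velocity = [1.5, 2.0, 2.0, 4.5]

for name, value in regression_errors(cfd_velocity, rfr_velocity).items():
    print(f"{name}: {value:.4f}")
```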


Predicting Future Products Rate using Machine Learning Algorithms

International Journal of Intelligent Systems and Applications ◽

10.5815/ijisa.2020.05.04 ◽

2020 ◽

Vol 12 (5) ◽

pp. 41-51

Author(s):

Shaimaa Mahmoud ◽

Mahmoud Hussein ◽

Arabi Keshk

Keyword(s):

Machine Learning ◽

Random Forest ◽

Linear Regression ◽

Mean Squared Error ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Support Vector ◽

Random Forest Regression ◽

Data Set ◽

Squared Error

Opinion mining in social network data is considered one of the most important research areas because a large number of users interact with different topics on these networks. This paper discusses the problem of predicting future product ratings from users’ comments. Researchers have approached this problem using machine learning algorithms (e.g. Logistic Regression, Random Forest Regression, Support Vector Regression, Simple Linear Regression, Multiple Linear Regression, Polynomial Regression and Decision Tree). However, the accuracy of these techniques still needs to be improved. In this study, we introduce an approach for predicting future product ratings using LR, RFR, and SVR. Our data set consists of tweets and their ratings from 1 to 5. The main goal of our approach is to improve prediction accuracy over existing techniques. SVR predicts future product ratings with a Mean Squared Error (MSE) of 0.4122, the Linear Regression model with an MSE of 0.4986, and Random Forest Regression with an MSE of 0.4770. This is better than the accuracy of existing approaches.
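Because MSE is in squared rating units, it can help to convert the three reported values to RMSE, which is on the same 1 to 5 scale as the ratings. The short sketch below uses only the figures quoted in the abstract; the dictionary and variable names are ours.

```python
import math

# MSE values as reported in the abstract.
reported_mse = {
    "SVR": 0.4122,
    "Linear Regression": 0.4986,
    "Random Forest Regression": 0.4770,
}

# Lower MSE is better.
best_model = min(reported_mse, key=reported_mse.get)

# RMSE is the square root of MSE, expressed in rating points.
reported_rmse = {name: math.sqrt(mse) for name, mse in reported_mse.items()}

print("lowest MSE:", best_model)
for name, value in reported_rmse.items():
    print(f"{name}: RMSE ~ {value:.3f} rating points")
```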


Predictors of Newborn’s Weight for Height: A Machine Learning Study Using Nationwide Multicenter Ultrasound Data

Diagnostics ◽

10.3390/diagnostics11071280 ◽

2021 ◽

Vol 11 (7) ◽

pp. 1280

Author(s):

Ki Ahn ◽

Kwang-Sig Lee ◽

Se Lee ◽

Sung Kwon ◽

Sunghun Na ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Performance Measures ◽

Gestational Age ◽

Mean Squared Error ◽

Variable Importance ◽

Learning Study ◽

Delivery Time ◽

Fetal Biometry ◽

Squared Error

There has been no machine learning study with a rich collection of clinical and sonographic markers that compares the performance measures for a variety of newborns’ weight-for-height indicators. This study compared the performance measures for a variety of newborns’ weight-for-height indicators based on machine learning, ultrasonographic data and maternal/delivery information. The source of data for this study was a multi-center retrospective study with 2949 mother–newborn pairs. The mean-squared-error-over-variance measures of five machine learning approaches were compared for newborn’s weight, newborn’s weight/height, newborn’s weight/height² and newborn’s weight/height³. Random forest variable importance, the influence of a variable over average node impurity, was used to identify major predictors of these newborns’ weight-for-height indicators among ultrasonographic data and maternal/delivery information. Regarding ultrasonographic fetal biometry, newborn’s weight, newborn’s weight/height and newborn’s weight/height² were better indicators, with smaller mean-squared-error-over-variance measures, than newborn’s weight/height³. Based on random forest variable importance, the top six predictors of newborn’s weight were the same as those of newborn’s weight/height and those of newborn’s weight/height²: gestational age at delivery time, the first estimated fetal weight and abdominal circumference in week 36 or later, maternal weight and body mass index at delivery time, and the first biparietal diameter in week 36 or later. These six predictors also ranked within the top seven for large-for-gestational-age and the top eight for small-for-gestational-age. In conclusion, newborn’s weight, newborn’s weight/height and newborn’s weight/height² are more suitable for ultrasonographic fetal biometry, with smaller mean-squared-error-over-variance measures, than newborn’s weight/height³.
Machine learning with ultrasonographic data would be an effective noninvasive approach for predicting newborn’s weight, weight/height and weight/height².
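A mean-squared-error-over-variance measure makes indicators with different units comparable, because the error is scaled by the spread of the target itself. The sketch below assumes the measure is the MSE divided by the population variance of the observed values; the abstract does not give the exact definition, and the weights shown are hypothetical.

```python
import statistics

def mse_over_variance(y_true, y_pred):
    """MSE of the predictions divided by the variance of the observations.

    0 means perfect prediction; 1 means no better than always predicting
    the mean. With population variance, 1 minus this ratio equals R^2.
    """
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return mse / statistics.pvariance(y_true)

# Hypothetical newborn weights in grams and model predictions.
observed = [2900.0, 3100.0, 3300.0, 3500.0, 3700.0]
predicted = [3000.0, 3050.0, 3350.0, 3400.0, 3650.0]

ratio = mse_over_variance(observed, predicted)
print("MSE / variance:", ratio)
print("implied R^2:", 1 - ratio)
```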


Um sistema baseado em machine learning para apoio à decisão no gerenciamento de produção apícola

REVISTA BRASILEIRA DE AGROTECNOLOGIA ◽

10.18378/rebagro.v11i1.8679 ◽

2021 ◽

Vol 11 (1) ◽

pp. 08-19

Author(s):

Weskley Damasceno Silva ◽

Silas Santiago Lopes Pereira ◽

Daniel Santiago Pereira ◽

Michell Olívio Xavier da Costa

Keyword(s):

Machine Learning ◽

Random Forest ◽

Decision Tree ◽

Support Vector Regression ◽

Multilayer Perceptron ◽

Mean Squared Error ◽

Support Vector ◽

Root Mean Squared Error ◽

Squared Error

The beekeeping sector has grown considerably in recent times in terms of the production and commercialization of products such as honey and its derivatives. Brazil, although it has kept pace with this growth and has good conditions for developing beekeeping, still suffers from limited use of technological tools, which directly affects production levels. This article proposes the development of a technological tool to help beekeepers manage apiculture production efficiently and make decisions based on predictive models built with Machine Learning (ML) and integrated into a web system. To this end, different ML algorithms were used to predict honey production, such as Multiple Linear Regression, Decision Tree, Random Forest, Multilayer Perceptron (MLP) and Support Vector Regression (SVR). The generated models were evaluated based on the coefficient of determination (R² or Score) and on the prediction error computed with the Root Mean Squared Error (RMSE). The results of this research comprise a web system under development and the results of the experiments performed, which show the best performance for the MLP technique, with a Score of 0.98 and an RMSE of 711196 pounds.


Vorhersage der Fließgewässertemperaturen in österreichischen Einzugsgebieten mittels Machine Learning-Verfahren

Österreichische Wasser- und Abfallwirtschaft ◽

10.1007/s00506-021-00771-3 ◽

2021 ◽

Author(s):

Moritz Feigl ◽

Katharina Lebiedzinski ◽

Mathew Herrnegger ◽

Karsten Schulz

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Random Forest ◽

Recurrent Neural Networks ◽

Mean Squared Error ◽

Feedforward Neural Networks ◽

Gradient Boosting ◽

Squared Error ◽

Extreme Gradient Boosting ◽

Linear Regression

Abstract: Stream water temperature is an essential environmental factor that has the potential to change both ecological and socio-economic conditions in the surroundings of a water body. Adequate modelling concepts are needed to compute stream water temperatures as a basis for effective adaptation strategies for future changes (e.g. due to climate change). To this end, the present study examines 6 machine learning models: stepwise linear regression, Random Forest, eXtreme Gradient Boosting, Feedforward Neural Networks and two types of Recurrent Neural Networks. The models were tested on 10 Austrian catchments with different physiographic characteristics and input data combinations. The hyperparameters of the applied models were optimized using Bayesian hyperparameter optimization. To make the results comparable with other studies, the predictions of the 6 machine learning models were compared with the results of linear regression and of the widely used and well-known water temperature model air2stream. Of the 6 models tested, the Feedforward Neural Networks and eXtreme Gradient Boosting produced the best predictions, in 4 of 10 catchments each. With an average RMSE (root mean squared error) of 0.55 °C, the tested models predicted stream water temperatures considerably better than linear regression (1.55 °C) and air2stream (0.98 °C). In general, the results of the 6 models showed very comparable performance, with a mean deviation around the median of only 0.08 °C between the individual models.
In the largest catchment studied, the Danube at Kienstock, Recurrent Neural Networks showed the highest model performance, which indicates that they are best suited when processes with long-term dependencies are decisive in the catchment. The choice of hyperparameters strongly influenced the predictive ability of the models, which underlines the importance of hyperparameter optimization. The results of this study summarize the importance of different input data, models and training characteristics for modelling mean daily stream water temperatures. At the same time, this study serves as a basis for developing future models for regional stream water temperature prediction. The tested models are available to all users in the research community and in practice in the open-source R package wateRtemp.


Experimental Evaluation of Computer Vision and Machine Learning-Based UAV Detection and Ranging

Drones ◽

10.3390/drones5020037 ◽

2021 ◽

Vol 5 (2) ◽

pp. 37

Author(s):

Bingsheng Wei ◽

Martin Barczyk

Keyword(s):

Machine Learning ◽

Mean Squared Error ◽

Tracking System ◽

Ground Truth ◽

White Background ◽

Cascade Classifier ◽

Detection Algorithms ◽

Squared Error ◽

Test Conditions ◽

Video Feed

We consider the problem of vision-based detection and ranging of a target UAV using the video feed from a monocular camera onboard a pursuer UAV. Our previously published work in this area employed a cascade classifier algorithm to locate the target UAV, which was found to perform poorly in complex background scenes. We thus study the replacement of the cascade classifier algorithm with newer machine learning-based object detection algorithms. Five candidate algorithms are implemented and quantitatively tested in terms of their efficiency (measured as frames per second processing rate), accuracy (measured as the root mean squared error between ground truth and detected location), and consistency (measured as mean average precision) in a variety of flight patterns, backgrounds, and test conditions. Assigning relative weights of 20%, 40% and 40% to these three criteria, we find that when flying over a white background, the top three performers are YOLO v2 (76.73 out of 100), Faster RCNN v2 (63.65 out of 100), and Tiny YOLO (59.50 out of 100), while over a realistic background, the top three performers are Faster RCNN v2 (54.35 out of 100), SSD MobileNet v1 (51.68 out of 100) and SSD Inception v2 (50.72 out of 100), leading us to recommend Faster RCNN v2 as the preferred solution. We then provide a roadmap for further work in integrating the object detector into our vision-based UAV tracking system.
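The 20%/40%/40% weighting can be sketched as a simple weighted sum. The abstract does not say how each criterion is normalized before weighting, so this example assumes every criterion has already been mapped to a 0 to 100 score; the per-criterion scores below are hypothetical and do not correspond to any detector in the study.

```python
# Relative weights stated in the abstract.
WEIGHTS = {"efficiency": 0.20, "accuracy": 0.40, "consistency": 0.40}

def overall_score(criterion_scores):
    """Weighted sum of per-criterion scores, each assumed to be on 0-100."""
    return sum(WEIGHTS[name] * criterion_scores[name] for name in WEIGHTS)

# Hypothetical normalized scores for two made-up detectors.
candidates = {
    "detector_a": {"efficiency": 90.0, "accuracy": 70.0, "consistency": 75.0},
    "detector_b": {"efficiency": 40.0, "accuracy": 85.0, "consistency": 80.0},
}

ranking = sorted(candidates, key=lambda d: overall_score(candidates[d]),
                 reverse=True)
for name in ranking:
    print(name, round(overall_score(candidates[name]), 2))
```

Because efficiency carries only a fifth of the weight, a slower detector with better localization and precision can still rank first, which matches the emphasis the weights place on accuracy and consistency.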


A new generic method to improve machine learning applications in official statistics

Statistical Journal of the IAOS ◽

10.3233/sji-210885 ◽

2021 ◽

pp. 1-16

Author(s):

Kevin Kloos

Keyword(s):

Machine Learning ◽

Mean Squared Error ◽

Statistical Properties ◽

Machine Learning Algorithms ◽

Official Statistics ◽

Academic Literature ◽

Misclassification Bias ◽

Squared Error ◽

Machine Learning Applications ◽

Applications Of Machine Learning

The use of machine learning algorithms at national statistical institutes has increased significantly over the past few years. Applications range from new imputation schemes to new statistical output based entirely on machine learning. The results are promising, but recent studies have shown that the use of machine learning in official statistics always introduces a bias, known as misclassification bias. Misclassification bias does not occur in traditional applications of machine learning and has therefore received little attention in the academic literature. In earlier work, we collected existing methods that are able to correct misclassification bias and compared their statistical properties, including bias, variance and mean squared error. In this paper, we present a new generic method to correct misclassification bias for time series and we derive its statistical properties. Moreover, we show numerically that it has a lower mean squared error than the existing alternatives in a wide variety of settings. We believe that our new method may improve machine learning applications in official statistics, and we hope that our work will stimulate further methodological research in this area.
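To illustrate what misclassification bias means, the sketch below uses the standard expression for the expected share of items labelled positive by an imperfect binary classifier. It shows the problem only; it is not the correction method proposed in the paper, and the base rate, sensitivity and specificity values are hypothetical.

```python
def expected_classified_share(true_share, sensitivity, specificity):
    """Expected fraction labelled positive when simply counting predictions.

    True positives contribute true_share * sensitivity; false positives
    contribute (1 - true_share) * (1 - specificity).
    """
    return (true_share * sensitivity
            + (1 - true_share) * (1 - specificity))

# Hypothetical setting: 10% of units truly belong to the class, and the
# classifier is right 90% of the time on both classes.
true_share = 0.10
estimate = expected_classified_share(true_share, sensitivity=0.90,
                                     specificity=0.90)
bias = estimate - true_share

print("expected estimate:", round(estimate, 4))
print("misclassification bias:", round(bias, 4))
```

In this made-up case a classifier with 90% accuracy nearly doubles the estimated share (0.18 instead of 0.10), because false positives from the large negative group outnumber the missed positives.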


Machine learning and Grad-Cam based vascular aging assessment using photoplethysmogram (Preprint)

10.2196/preprints.31709 ◽

2021 ◽

Author(s):

Hangsik Shin

Keyword(s):

Machine Learning ◽

Correlation Coefficient ◽

Age Estimation ◽

Mean Squared Error ◽

Mean Absolute Error ◽

Absolute Error ◽

Coefficient Of Determination ◽

Vascular Aging ◽

Squared Error ◽

Vascular Age

BACKGROUND Arterial stiffness due to vascular aging is a major indicator for evaluating cardiovascular risk. OBJECTIVE In this study, we propose a method of estimating age by applying machine learning to the photoplethysmogram for non-invasive vascular age assessment. METHODS A machine learning-based age estimation model, consisting of three convolutional layers and two fully connected layers, was developed using photoplethysmograms segmented by pulse from a total of 752 adults aged 19–87 years. The performance of the developed model was quantitatively evaluated using mean absolute error, root mean squared error, Pearson’s correlation coefficient and the coefficient of determination. Grad-Cam was used to explain the contribution of photoplethysmogram waveform characteristics to vascular age estimation. RESULTS Ten-fold cross-validation gave a mean absolute error of 8.03, a root mean squared error of 9.96, a correlation coefficient of 0.62 and a coefficient of determination of 0.38. Grad-Cam, used to determine the weight that the input signal contributes to the result, confirmed that the contribution of the photoplethysmogram segment to the age estimation was high around the systolic peak. CONCLUSIONS The machine learning-based vascular aging analysis method using the PPG waveform showed comparable or superior performance compared to previous studies, without complex feature detection, in evaluating vascular aging. CLINICALTRIAL 2015-0104
