A Comparison of the Performance of Supervised Learning Algorithms for Solar Power Prediction

Science seeks strategies to mitigate global warming and reduce the negative impacts of the long-term use of fossil fuels for power generation. In this sense, implementing and promoting renewable energy in different ways becomes one of the most effective solutions. The inaccuracy in the prediction of power generation from photovoltaic (PV) systems is a significant concern for the planning and operational stages of interconnected electric networks and the promotion of large-scale PV installations. This study proposes the use of Machine Learning techniques to model the photovoltaic power production for a system in Medellín, Colombia. Four forecasting models were built with Machine Learning and Artificial Intelligence techniques: K-Nearest Neighbors (KNN), Linear Regression (LR), Artificial Neural Networks (ANN) and Support Vector Machines (SVM). The results indicate that all four methods produced adequate estimates of photovoltaic energy generation. However, the ANN forecasting model gave the best estimate according to both the root mean square error (RMSE) and the mean absolute error (MAE). The proposed Machine Learning-based models proved to be practical and effective solutions for forecasting PV power generation in Medellín.
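As a rough illustration of the two error metrics named in the abstract above, the following sketch computes RMSE and MAE for two sets of forecasts. The function names, the model labels, and all numbers are hypothetical and are not taken from the study; the sketch only shows how the two metrics can rank models differently.

```python
import math


def rmse(actual, predicted):
    """Root mean square error: square root of the mean squared difference."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


def mae(actual, predicted):
    """Mean absolute error: mean of the absolute differences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Invented PV power readings and forecasts (arbitrary units), for illustration only.
measured = [2.0, 4.0, 6.0, 8.0]
forecasts = {
    "model_a": [2.5, 3.5, 6.5, 7.5],   # small error on every point
    "model_b": [2.0, 4.0, 6.0, 10.0],  # exact except for one large miss
}

scores = {
    name: {"rmse": rmse(measured, values), "mae": mae(measured, values)}
    for name, values in forecasts.items()
}

for name, score in scores.items():
    print(name, score)
```

In this invented case both models have the same MAE, but RMSE penalizes the single large miss of `model_b` more heavily, which is one reason to report both metrics when comparing forecasting models.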

Feature Selection from Lyme Disease Patient Survey Using Machine Learning

Algorithms ◽

10.3390/a13120334 ◽

2020 ◽

Vol 13 (12) ◽

pp. 334

Author(s):

Joshua Vendrow ◽

Jamie Haddock ◽

Deanna Needell ◽

Lorraine Johnson

Keyword(s):

Machine Learning ◽

Lyme Disease ◽

Large Scale ◽

Disease Patient ◽

Patient Survey ◽

Machine Learning Techniques ◽

Medical Community ◽

Support Vector ◽

Global Rating ◽

K Nearest Neighbors

Lyme disease is a rapidly growing illness that remains poorly understood within the medical community. Critical questions about when and why patients respond to treatment or stay ill, what kinds of treatments are effective, and even how to properly diagnose the disease remain largely unanswered. We investigate these questions by applying machine learning techniques to a large scale Lyme disease patient registry, MyLymeData, developed by the nonprofit LymeDisease.org. We apply various machine learning methods in order to measure the effect of individual features in predicting participants’ answers to the Global Rating of Change (GROC) survey questions that assess the self-reported degree to which their condition improved, worsened, or remained unchanged following antibiotic treatment. We use basic linear regression, support vector machines, neural networks, entropy-based decision tree models, and k-nearest neighbors approaches. We first analyze the general performance of the model and then identify the most important features for predicting participant answers to GROC. After we identify the “key” features, we separate them from the dataset and demonstrate the effectiveness of these features at identifying GROC. In doing so, we highlight possible directions for future study both mathematically and clinically.

Heart disease prediction using machine learning techniques : a survey

International Journal of Engineering & Technology ◽

10.14419/ijet.v7i2.8.10557 ◽

2018 ◽

Vol 7 (2.8) ◽

pp. 684 ◽

Cited By ~ 12

Author(s):

V V. Ramalingam ◽

Ayantan Dandapath ◽

M Karthik Raja

Keyword(s):

Machine Learning ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Support Vector ◽

Complex Data ◽

Learning Techniques ◽

Vector Machines ◽

Supervised Learning Algorithms ◽

Life Threatening

Heart-related diseases, or cardiovascular diseases (CVDs), have been the main cause of a huge number of deaths worldwide over the last few decades and have emerged as the most life-threatening diseases, not only in India but in the whole world. There is therefore a need for a reliable, accurate and feasible system to diagnose such diseases in time for proper treatment. Machine learning algorithms and techniques have been applied to various medical datasets to automate the analysis of large and complex data. In recent times, many researchers have used machine learning techniques to help the health care industry and its professionals diagnose heart-related diseases. This paper presents a survey of various models based on such algorithms and techniques and analyzes their performance. Models based on supervised learning algorithms such as Support Vector Machines (SVM), K-Nearest Neighbour (KNN), Naïve Bayes, Decision Trees (DT), Random Forest (RF) and ensemble models are found to be very popular among researchers.

Prediction of Liver Diseases by Using Few Machine Learning Based Approaches

Australian Journal of Engineering and Innovative Technology ◽

10.34104/ajeit.020.085090 ◽

2020 ◽

pp. 85-90

Keyword(s):

Machine Learning ◽

Comparative Analysis ◽

Liver Diseases ◽

Model Building ◽

Medical Science ◽

Machine Learning Techniques ◽

Support Vector ◽

Classification Algorithms ◽

K Nearest Neighbors ◽

Learning Techniques

Advancement in medical science has always been one of the most vital concerns of the human race. As technology progresses, modern techniques and equipment are increasingly adopted for treatment. Machine learning techniques are now widely used in medical science to improve accuracy. In this work, we built computational models to predict liver disease accurately. We used several classification algorithms: Random Forest, Perceptron, Decision Tree, K-Nearest Neighbors (KNN), and Support Vector Machine (SVM). Our work provides an implementation of hybrid model construction and a comparative analysis aimed at improving prediction performance. First, the classification algorithms are applied to the original liver patient datasets collected from the UCI repository. Then we analyzed and tuned the features to improve the performance of our predictor and compared the classifiers. We found that the KNN algorithm with feature selection outperformed all other techniques.

Reliable photometric membership (RPM) of galaxies in clusters – I. A machine learning method and its performance in the local universe

Monthly Notices of the Royal Astronomical Society ◽

10.1093/mnras/staa486 ◽

2020 ◽

Vol 493 (3) ◽

pp. 3429-3441

Author(s):

Paulo A A Lopes ◽

André L B Ribeiro

Keyword(s):

Machine Learning ◽

Galaxy Evolution ◽

Large Scale ◽

Machine Learning Techniques ◽

Gradient Boosting ◽

Support Vector ◽

Validation Data ◽

Membership Probability ◽

Cluster Membership ◽

Stochastic Gradient Boosting

We introduce a new method to determine galaxy cluster membership based solely on photometric properties. We adopt a machine learning approach to recover a cluster membership probability from galaxy photometric parameters and finally derive a membership classification. After testing several machine learning techniques (such as stochastic gradient boosting, model averaged neural network and k-nearest neighbours), we found the support vector machine algorithm to perform better when applied to our data. Our training and validation data are from the Sloan Digital Sky Survey main sample. Hence, to be complete to $M_r^* + 3$, we limit our work to 30 clusters with $z_{\text{phot-cl}} \le 0.045$. Masses ($M_{200}$) are larger than $\sim 0.6\times 10^{14} \, \mathrm{M}_{\odot}$ (most above $3\times 10^{14} \, \mathrm{M}_{\odot}$). Our results are derived taking into account all galaxies in the line of sight of each cluster, with no photometric redshift cuts or background corrections. Our method is non-parametric, making no assumptions on the number density or luminosity profiles of galaxies in clusters. Our approach delivers extremely accurate results (completeness, $C \sim 92$ per cent and purity, $P \sim 87$ per cent) within $R_{200}$, so that we named our code reliable photometric membership. We discuss possible dependencies on magnitude, colour, and cluster mass. Finally, we present some applications of our method, stressing its impact on galaxy evolution and cosmological studies based on future large-scale surveys, such as eROSITA, EUCLID, and LSST.
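The completeness and purity figures quoted above can be illustrated with a small sketch. It assumes the commonly used definitions (completeness: the fraction of true members that are selected; purity: the fraction of selected galaxies that are true members) and a hypothetical probability threshold; the abstract states neither the threshold nor these formulas, and the data below are invented.

```python
def completeness_purity(probabilities, is_member, threshold=0.5):
    """Classify objects as members when probability >= threshold (assumed rule),
    then return (completeness, purity) against the known membership flags."""
    selected = [p >= threshold for p in probabilities]
    n_true = sum(is_member)          # number of true members
    n_selected = sum(selected)       # number of objects classified as members
    n_both = sum(1 for s, m in zip(selected, is_member) if s and m)
    completeness = n_both / n_true if n_true else 0.0
    purity = n_both / n_selected if n_selected else 0.0
    return completeness, purity


# Invented membership probabilities and ground-truth flags, for illustration only.
probs = [0.9, 0.8, 0.3, 0.6, 0.1, 0.7]
truth = [True, True, True, False, False, True]

strict = completeness_purity(probs, truth, threshold=0.5)
loose = completeness_purity(probs, truth, threshold=0.2)
print("threshold 0.5:", strict)
print("threshold 0.2:", loose)
```

With these invented numbers, lowering the threshold raises completeness while lowering purity, which is the trade-off a membership classification has to balance.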

Computational Identification of Chemical Compounds with Potential Activity against Leishmania amazonensis using Nonlinear Machine Learning Techniques

Current Topics in Medicinal Chemistry ◽

10.2174/1568026619666181130121558 ◽

2019 ◽

Vol 18 (27) ◽

pp. 2347-2354 ◽

Cited By ~ 3

Author(s):

Juan Alberto Castillo-Garit ◽

Naivi Flores-Balmaseda ◽

Orlando Álvarez ◽

Hai Pham-The ◽

Virginia Pérez-Doñate ◽

...

Keyword(s):

Machine Learning ◽

Computational Models ◽

Neglected Tropical Diseases ◽

Chemical Compounds ◽

Theoretical Models ◽

Leishmania Amazonensis ◽

Machine Learning Techniques ◽

Support Vector ◽

K Nearest Neighbors ◽

Data Set

Leishmaniasis is a poverty-related disease endemic in 98 countries worldwide, with morbidity and mortality increasing daily. All currently used first-line and second-line drugs for the treatment of leishmaniasis exhibit several drawbacks, including toxicity, high costs and route of administration. Consequently, the development of new treatments for leishmaniasis is a priority in the field of neglected tropical diseases. The aim of this work is to develop computational models that allow the identification of new chemical compounds with potential anti-leishmanial activity. A data set of 116 organic chemicals, assayed against promastigotes of Leishmania amazonensis, is used to develop the theoretical models. The cutoff value to consider a compound active was IC50 ≤ 1.5 μM. For this study, we employed Dragon software to calculate the molecular descriptors and WEKA to obtain machine learning (ML) models. All ML models showed accuracy values between 82% and 91% for the training set. The models developed with k-nearest neighbors and classification trees showed sensitivity values of 97% and 100%, respectively, while the models developed with artificial neural networks and support vector machine showed specificity values of 94% and 92%, respectively. In order to validate our models, an external test set was evaluated, with good behavior for all models. A virtual screening was performed and 156 compounds were identified as potentially anti-leishmanial by all the ML models. This investigation highlights the merits of ML-based techniques as an alternative to other more traditional methods to find new chemical compounds with anti-leishmanial activity.

Detection of Loss Zones while Drilling Using Different Machine Learning Techniques

Journal of Energy Resources Technology ◽

10.1115/1.4051553 ◽

2021 ◽

pp. 1-29

Author(s):

Ahmed Alsaihati ◽

Mahmoud Abughaban ◽

Salaheldin Elkatatny ◽

Abdulazeez Abdulraheem

Keyword(s):

Machine Learning ◽

Support Vector Machines ◽

Random Forests ◽

Nearest Neighbors ◽

Machine Learning Techniques ◽

Support Vector ◽

K Nearest Neighbors ◽

Learning Techniques ◽

Vector Machines ◽

Testing Set

Fluid loss into formations is a common operational issue that is frequently encountered when drilling across naturally fractured or induced fractured formations. This could pose significant operational risks, such as well-control, stuck pipe, and wellbore instability, which, in turn, lead to an increase of well time and cost. This research aims to use and evaluate different machine learning techniques, namely support vector machines, random forests, and K-nearest neighbors, in detecting loss circulation occurrences while drilling using solely drilling surface parameters. Actual field data of seven wells, which had suffered partial or severe loss circulation, were used to build predictive models, while Well-8 was used to compare the performance of the developed models. Recall, precision, and F1-score measures were used to evaluate the ability of the developed models to detect loss circulation occurrences. The results showed the K-nearest neighbors classifier achieved a high F1-score of 0.912 in detecting loss circulation occurrence in the testing set, while the random forests classifier was the second-best with almost the same F1-score of 0.910. The support vector machines achieved an F1-score of 0.83 in predicting the loss circulation occurrence in the testing set. The K-nearest neighbors classifier outperformed the other models in detecting the loss circulation occurrences in Well-8 with an F1-score of 0.80. The main contribution of this research as compared to previous studies is that it identifies loss events based on real-time measurements of the active pit volume.
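The F1-scores reported above combine precision and recall into a single number. A minimal sketch follows, using invented labels in which 1 marks a hypothetical loss event and 0 marks normal drilling; the function name and the data are illustrative and are not from the paper.

```python
def precision_recall_f1(y_true, y_pred):
    """Return (precision, recall, F1) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # missed events
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Invented ground truth and predictions, for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1, 0, 0, 0]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Because F1 is a harmonic mean, the value here (6/7) falls below the arithmetic mean of precision and recall (0.875): a classifier that misses events is penalized even when it raises no false alarms.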

Automated Cobble Mapping of a Mixed Sand-Cobble Beach Using a Mobile LiDAR System

Remote Sensing ◽

10.3390/rs10081253 ◽

2018 ◽

Vol 10 (8) ◽

pp. 1253 ◽

Cited By ~ 4

Author(s):

Hironori Matsumoto ◽

Adam Young

Keyword(s):

Maximum Likelihood ◽

Large Scale ◽

Spatial Location ◽

Machine Learning Techniques ◽

Support Vector ◽

Lidar System ◽

K Nearest Neighbors ◽

Terrestrial Lidar ◽

Topographic Roughness ◽

Cobble Beach

Cobbles (64–256 mm) are found on beaches throughout the world, influence beach morphology, and can provide shoreline stability. Detailed, frequent, and spatially large-scale quantitative cobble observations at beaches are vital toward a better understanding of sand-cobble beach systems. This study used a truck-mounted mobile terrestrial LiDAR system and a raster-based classification approach to map cobbles automatically. Rasters of LiDAR intensity, intensity deviation, topographic roughness, and slope were utilized for cobble classification. Four machine learning techniques including maximum likelihood, decision tree, support vector machine, and k-nearest neighbors were tested on five raster resolutions ranging from 5–50 cm. The cobble mapping capability varied depending on pixel size, classification technique, surface cobble density, and beach setting. The best performer was a maximum likelihood classification using 20 cm raster resolution. Compared to manual mapping at 15 control sites (size ranging from a few to several hundred square meters), automated mapping errors were <12% (best fit line). This method mapped the spatial location of dense cobble regions more accurately compared to sparse and moderate density cobble areas. The method was applied to a ~40 km section of coast in southern California, and successfully generated temporal and spatial cobble distributions consistent with previous observations.

On the Analysis of Machine Learning Classifiers to Detect Traffic Congestion in Vehicular Networks

10.5753/eniac.2019.9290 ◽

2019 ◽

Author(s):

Lucas Carvalho ◽

Maycon Silva ◽

Edimilson Santos ◽

Daniel Guidoni

Keyword(s):

Machine Learning ◽

Traffic Congestion ◽

Vehicular Networks ◽

Naive Bayes ◽

Machine Learning Techniques ◽

Support Vector ◽

K Nearest Neighbors ◽

Applied Machine Learning ◽

Routing Methods

Problems related to traffic congestion and management have become common in many cities. Thus, vehicle re-routing methods have been proposed to minimize congestion. Some of these methods have applied machine learning techniques, more specifically classifiers, to verify road conditions and detect congestion. However, better results may be obtained by applying a classifier better suited to the domain. In this sense, this paper presents an evaluation of different classifiers applied to the identification of the level of road congestion. Our main goal is to analyze the characteristics of each classifier in this task. The classifiers involved in the experiments are: Multiple Layer Neural Network (MLP), K-Nearest Neighbors (KNN), Decision Trees (J48), Support Vector Machines (SVM), Naive Bayes and Tree Augmented Naive Bayes.

Analysis of Educational Robotics Activities Using a Machine Learning Approach

Makers at School, Educational Robotics and Innovative Learning Environments - Lecture Notes in Networks and Systems ◽

10.1007/978-3-030-77040-2_27 ◽

2021 ◽

pp. 203-211

Author(s):

Lorenzo Cesaretti ◽

Laura Screpanti ◽

David Scaradozzi ◽

Eleni Mangina

Keyword(s):

Machine Learning ◽

Learning Styles ◽

Machine Learning Techniques ◽

Support Vector ◽

Educational Robotics ◽

School Students ◽

K Nearest Neighbors ◽

Log Files ◽

Learning Techniques ◽

Mixed Approach

This paper presents the preliminary results of using machine learning techniques to analyze educational robotics activities. An experiment was conducted with 197 secondary school students in Italy: the authors updated Lego Mindstorms EV3 programming blocks to record log files with coding sequences students had designed in teams. The activities were part of a preliminary robotics exercise. We used four machine learning techniques, namely logistic regression, support-vector machine (SVM), K-nearest neighbors and random forests, to predict the students’ performance, comparing a supervised approach (using twelve indicators extracted from the log files as input for the algorithms) and a mixed approach (applying a k-means algorithm to calculate the machine learning features). The results showed that the mixed approach with SVM outperformed the other techniques, and that three predominant learning styles emerged from the data mining analysis.

Machine Learning Techniques for Tree Species Classification Using Co-Registered LiDAR and Hyperspectral Data

Remote Sensing ◽

10.3390/rs11070819 ◽

2019 ◽

Vol 11 (7) ◽

pp. 819 ◽

Cited By ~ 10

Author(s):

Julia Marrs ◽

Wenge Ni-Meister

Keyword(s):

Machine Learning ◽

Tree Species ◽

Large Scale ◽

Deciduous Forest ◽

Explanatory Power ◽

The United States ◽

Hyperspectral Data ◽

Machine Learning Techniques ◽

Support Vector ◽

Thermal Imager

The use of light detection and ranging (LiDAR) techniques for recording and analyzing tree and forest structural variables shows strong promise for improving established hyperspectral-based tree species classifications; however, previous multi-sensor projects were often limited by error resulting from seasonal or flight path differences. The National Aeronautics and Space Administration (NASA) Goddard’s LiDAR, hyperspectral, and thermal imager (G-LiHT) is now providing co-registered data on experimental forests in the United States, which are associated with established ground truths from existing forest plots. Free, user-friendly machine learning applications like the Orange Data Mining Extension for Python recently simplified the process of combining datasets, handling variable redundancy and noise, and reducing dimensionality in remotely sensed datasets. Neural networks, CN2 rules, and support vector machine methods are used here to achieve a final classification accuracy of 67% for dominant tree species in experimental plots of Howland Experimental Forest, a mixed coniferous–deciduous forest with ten dominant tree species, and 59% for plots in Penobscot Experimental Forest, a mixed coniferous–deciduous forest with 15 dominant tree species. These accuracies are higher than those produced using LiDAR or hyperspectral datasets separately, suggesting that combined spectral and structural data have a greater richness of complementary information than either dataset alone. Using greatly simplified datasets created by our dimensionality reduction methodology, machine learner performance remains comparable to or higher than that using the full dataset. Across forests, the identification of shared structural and spectral variables suggests that this methodology can successfully identify parameters with high explanatory power for differentiating among tree species, and opens the possibility of addressing large-scale forestry questions using optimized remote sensing workflows.
