Principal Component Analysis and Machine Learning Approaches for Photovoltaic Power Prediction: A Comparative Study

In the context of the fourth industrial revolution (Industry 4.0), intelligent sensors and connected objects continuously generate considerable volumes of data. Understanding and using these data properly are crucial levers of performance and innovation. Machine learning is a technology that allows the potential of large datasets to be exploited. As a branch of artificial intelligence, it enables us to discover patterns and make predictions from data, drawing on statistics, data mining, and predictive analysis. The key goal of this study was to use machine learning approaches to forecast the hourly power produced by photovoltaic panels. A comparative analysis of several predictive models, including elastic net, support vector regression, random forest, and Bayesian regularized neural networks, was carried out to identify the models providing the best predictions. Principal component analysis, used to reduce the dimensionality of the input data, revealed six principal components that together explain up to 91.95% of the variation in all variables. Finally, performance metrics showed that Bayesian regularized neural networks achieved the best results, with R2 = 99.99% and RMSE = 0.002 kW.
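
The step of choosing how many principal components to keep can be illustrated with a minimal sketch. It assumes NumPy and uses synthetic data, not the study's photovoltaic dataset; the function names `explained_variance_ratio` and `n_components_for` are hypothetical, and the 0.9195 threshold is borrowed from the abstract only as an example value.

```python
import numpy as np

def explained_variance_ratio(X):
    """Fraction of total variance captured by each principal component."""
    # Standardize each column so variables on different scales contribute equally.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # The squared singular values of the standardized data are proportional
    # to the variances along the principal components (largest first).
    s = np.linalg.svd(Z, compute_uv=False)
    var = s**2 / (len(X) - 1)
    return var / var.sum()

def n_components_for(ratio, threshold):
    """Smallest number of leading components whose cumulative ratio reaches threshold."""
    cumulative = np.cumsum(ratio)
    return int(np.searchsorted(cumulative, threshold) + 1)

# Synthetic stand-in data (an assumption for illustration): 200 samples of
# 10 variables driven by 3 latent factors plus a little noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 3))
X = latent @ rng.normal(size=(3, 10)) + 0.05 * rng.normal(size=(200, 10))

ratio = explained_variance_ratio(X)
k = n_components_for(ratio, 0.9195)
print(f"{k} components reach the 91.95% threshold")
```

On real data the same cumulative-ratio rule would be applied to the measured input variables; the abstract reports that six components were retained.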

Machine Learning Methods Applied to the Prediction of Pseudo-nitzschia spp. Blooms in the Galician Rias Baixas (NW Spain)

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi10040199 ◽

2021 ◽

Vol 10 (4) ◽

pp. 199

Author(s):

Francisco M. Bellas Aláez ◽

Jesus M. Torres Palenzuela ◽

Evangelos Spyrakos ◽

Luis González Vilas

Keyword(s):

Machine Learning ◽

Performance Metrics ◽

Prediction Models ◽

Support Vector ◽

False Alarms ◽

Learning Approaches ◽

Learning Methods ◽

Machine Learning Methods ◽

Rías Baixas ◽

New Algorithms

This work presents new prediction models based on recent developments in machine learning methods, such as Random Forest (RF) and AdaBoost, and compares them with more classical approaches, i.e., support vector machines (SVMs) and neural networks (NNs). The models predict Pseudo-nitzschia spp. blooms in the Galician Rias Baixas. This work builds on a previous study by the authors (doi.org/10.1016/j.pocean.2014.03.003) but uses an extended database (from 2002 to 2012) and new algorithms. Our results show that RF and AdaBoost provide better prediction results compared to SVMs and NNs, as they show improved performance metrics and a better balance between sensitivity and specificity. Classical machine learning approaches show higher sensitivities, but at a cost of lower specificity and higher percentages of false alarms (lower precision). These results seem to indicate a greater adaptation of new algorithms (RF and AdaBoost) to unbalanced datasets. Our models could be operationally implemented to establish a short-term prediction system.
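
The trade-off between sensitivity, specificity and false alarms described above can be made concrete with a short sketch. The counts are invented for illustration (10 bloom events among 100 samples) and are not results from the study; `confusion_counts` and `bloom_metrics` are hypothetical helper names.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = bloom)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn

def bloom_metrics(y_true, y_pred):
    """Sensitivity, specificity and precision; NaN when a denominator is zero."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    safe = lambda num, den: num / den if den else float("nan")
    return {
        "sensitivity": safe(tp, tp + fn),  # share of real blooms detected
        "specificity": safe(tn, tn + fp),  # share of non-blooms correctly rejected
        "precision": safe(tp, tp + fp),    # share of alarms that were real blooms
    }

# Hypothetical unbalanced dataset: 10 blooms, 90 non-blooms.
y_true = [1] * 10 + [0] * 90
# Model A: detects 9 of 10 blooms but raises 30 false alarms.
pred_a = [1] * 9 + [0] * 1 + [1] * 30 + [0] * 60
# Model B: detects 7 of 10 blooms with only 5 false alarms.
pred_b = [1] * 7 + [0] * 3 + [1] * 5 + [0] * 85

metrics_a = bloom_metrics(y_true, pred_a)
metrics_b = bloom_metrics(y_true, pred_b)
print(metrics_a)
print(metrics_b)
```

In this made-up example, model A has the higher sensitivity but lower specificity and precision, which is the pattern the abstract attributes to the classical approaches.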

Monitoring Mixing Processes Using Ultrasonic Sensors and Machine Learning

Sensors ◽

10.3390/s20071813 ◽

2020 ◽

Vol 20 (7) ◽

pp. 1813 ◽

Cited By ~ 3

Author(s):

Alexander L. Bowler ◽

Serafim Bakalis ◽

Nicholas J. Watson

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Real Time ◽

Short Term Memory ◽

Support Vector ◽

Learning Approaches ◽

Learning Models ◽

Ultrasonic Sensors ◽

Multisensor Data Fusion ◽

Machine Learning Models

Mixing is one of the most common processes across food, chemical, and pharmaceutical manufacturing. Real-time, in-line sensors are required for monitoring, and subsequently optimising, essential processes such as mixing. Ultrasonic sensors are low-cost, operate in real time and in-line, and can characterise opaque systems. In this study, a non-invasive, reflection-mode ultrasonic measurement technique was used to monitor two model mixing systems. The two systems studied were honey-water blending and flour-water batter mixing. Classification machine learning models were developed to predict whether materials were mixed or not mixed. Regression machine learning models were developed to predict the time remaining until mixing completion. Artificial neural networks, support vector machines, long short-term memory neural networks, and convolutional neural networks were tested, along with different methods for engineering features from ultrasonic waveforms in both the time and frequency domains. Comparisons between using a single sensor and performing multisensor data fusion between two sensors were made. Classification accuracies of up to 96.3% for honey-water blending and 92.5% for flour-water batter mixing were achieved, along with R2 values for the regression models of up to 0.977 for honey-water blending and 0.968 for flour-water batter mixing. Each prediction task produced optimal performance with different algorithms and feature engineering methods, vindicating the extensive comparison between different machine learning approaches.
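
As a rough illustration of engineering features from a waveform in both the time and frequency domains, the sketch below computes a few simple quantities with NumPy. The abstract does not specify which features were used, so the feature choice, the function name `waveform_features`, and the 5 MHz tone sampled at 100 MHz are all assumptions made for the example.

```python
import numpy as np

def waveform_features(signal, sample_rate_hz):
    """Return simple time-domain and frequency-domain features of one waveform."""
    signal = np.asarray(signal, dtype=float)
    # Time domain: total energy and peak absolute amplitude.
    energy = float(np.sum(signal**2))
    peak = float(np.max(np.abs(signal)))
    # Frequency domain: magnitude spectrum of the real-valued signal.
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / sample_rate_hz)
    dominant = float(freqs[np.argmax(spectrum)])
    return {"energy": energy, "peak": peak, "dominant_frequency_hz": dominant}

# Synthetic test waveform (illustrative values only): a 5 MHz sine tone,
# 1000 samples at a 100 MHz sampling rate.
sample_rate_hz = 100e6
t = np.arange(1000) / sample_rate_hz
waveform = np.sin(2 * np.pi * 5e6 * t)

features = waveform_features(waveform, sample_rate_hz)
print(features)
```

Feature dictionaries like this one, computed per waveform, could then be stacked into a table and passed to a classifier or regressor.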

Multitask fMRI and machine learning approach improve prediction of differential brain activity pattern in patients with insomnia disorder

Scientific Reports ◽

10.1038/s41598-021-88845-w ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Mi Hyun Lee ◽

Nambeom Kim ◽

Jaeeun Yoo ◽

Hang-Keun Kim ◽

Young-Don Son ◽

...

Keyword(s):

Machine Learning ◽

Performance Metrics ◽

Brain Activity ◽

Inferior Frontal Gyrus ◽

Principal Component ◽

Classification Performance ◽

Support Vector ◽

Spatial Covariance ◽

Single Task ◽

Bold Responses

Abstract: We investigated the differential spatial covariance pattern of blood oxygen level-dependent (BOLD) responses to single-task and multitask functional magnetic resonance imaging (fMRI) between patients with psychophysiological insomnia (PI) and healthy controls (HCs), and evaluated features generated by principal component analysis (PCA) for discrimination of PI from HC, compared to features generated from BOLD responses to single-task fMRI using machine learning methods. In 19 patients with PI and 21 HCs, the mean beta value for each region of interest (ROIbval) was calculated with three contrast images (i.e., sleep-related picture, sleep-related sound, and Stroop stimuli). We performed discrimination analysis and compared with features generated from BOLD responses to single-task fMRI. We applied support vector machine analysis with a least absolute shrinkage and selection operator to evaluate five performance metrics: accuracy, recall, precision, specificity, and F2. Principal component features showed the best classification performance in all aspects of metrics compared to BOLD response to single-task fMRI. Bilateral inferior frontal gyrus (orbital), right calcarine cortex, right lingual gyrus, left inferior occipital gyrus, and left inferior temporal gyrus were identified as the most salient areas by feature selection. Our approach showed better performance in discriminating patients with PI from HCs, compared to single-task fMRI.
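
Assuming the "F2" metric in the abstract denotes the F-beta score with β = 2 (the abstract does not define it), the relationship between precision, recall and F-beta can be sketched as follows. The function name `f_beta` and the input values are hypothetical and are not taken from the study.

```python
def f_beta(precision, recall, beta=1.0):
    """F-beta score: weighted harmonic mean of precision and recall.

    beta > 1 weights recall more heavily; beta = 2 gives the F2 score.
    Returns 0.0 when both precision and recall are zero.
    """
    denominator = beta**2 * precision + recall
    if denominator == 0:
        return 0.0
    return (1 + beta**2) * precision * recall / denominator

# Illustrative values: a classifier that finds every patient (recall 1.0)
# but is right only half the time when it flags one (precision 0.5).
f1 = f_beta(0.5, 1.0, beta=1.0)
f2 = f_beta(0.5, 1.0, beta=2.0)
print(round(f1, 4), round(f2, 4))
```

Because F2 weights recall more heavily than precision, this example classifier scores higher on F2 than on F1.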

An Empirical Comparison of Machine-Learning Methods on Bank Client Credit Assessments

Sustainability ◽

10.3390/su11030699 ◽

2019 ◽

Vol 11 (3) ◽

pp. 699 ◽

Cited By ~ 13

Author(s):

Lkhagvadorj Munkhdalai ◽

Tsendsuren Munkhdalai ◽

Oyun-Erdene Namsrai ◽

Jong Lee ◽

Keun Ryu

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Random Forest ◽

Deep Neural Networks ◽

Credit Scoring ◽

Support Vector ◽

Learning Approaches ◽

Learning Methods ◽

Human Expert ◽

Machine Learning Methods

Machine learning and artificial intelligence have achieved human-level performance in many application domains, including image classification, speech recognition, and machine translation. However, in the financial domain, expert-based credit risk models still dominate. Establishing meaningful benchmarks and comparisons between machine-learning approaches and human expert-based models is a prerequisite for introducing novel methods. Therefore, our main goal in this study is to establish a new benchmark using real consumer data and to provide machine-learning approaches that can serve as baselines on this benchmark. We performed an extensive comparison between the machine-learning approaches and a human expert-based model (the FICO credit scoring system) using Survey of Consumer Finances (SCF) data. As the SCF data are non-synthetic and consist of a large number of real variables, we applied two variable-selection methods to select the most representative features for effective modelling and compared them: the first used hypothesis tests, correlation, and random forest-based feature importance measures, and the second was a new approach (NAP) based only on random forests. We then built regression models based on various machine-learning algorithms, ranging from logistic regression and support vector machines to an ensemble of gradient boosted trees and deep neural networks. Our results demonstrated that if lending institutions in the 2001s had used their own credit scoring model constructed with the machine-learning methods explored in this study, their expected credit losses would have been lower and they would have been more sustainable. In addition, the deep neural network and XGBoost algorithms trained on the subset selected by NAP achieved the highest area under the curve (AUC) and accuracy, respectively.

Detection of Ovarian Tumor Using Machine Learning Approaches A Review

10.46532/978-81-950008-1-4_103 ◽

2020 ◽

pp. 471-476

Author(s):

Gitanjali Wadhwa ◽

Mansi Mathur

Keyword(s):

Machine Learning ◽

Ovarian Cancer ◽

Deep Learning ◽

Nearest Neighbor ◽

Performance Metrics ◽

Machine Learning Algorithms ◽

Support Vector ◽

Learning Approaches ◽

K Nearest Neighbor ◽

Female Sex Hormones

The ovaries are an important part of the female reproductive system. The importance of these small glands derives from their production of female sex hormones and female gametes. These ductless, almond-shaped glandular organs lie on opposite sides of the uterus, attached to it by the ovarian ligament. Ovarian cancer can arise for several reasons, and it can be classified using a number of different techniques. Early prediction of ovarian cancer can reduce its rate of progression and may save many lives. Computer-aided diagnosis (CAD) systems offer a noninvasive routine for finding ovarian cancer in its initial stages, which can spare patients anxiety and unnecessary biopsies. This review paper describes how different techniques can be used to classify ovarian tumors. In this survey we also discuss the comparison of different machine learning algorithms, such as K-Nearest Neighbor and Support Vector Machine, and deep learning techniques used in the classification of ovarian cancer. After comparing the different techniques for detecting this type of cancer, it appears that deep learning techniques have provided good results, with good accuracy and other performance metrics.

Nanosecond Photodynamics Simulations of a Cis-Trans Isomerization Are Enabled by Machine Learning

10.26434/chemrxiv.13047863 ◽

2020 ◽

Author(s):

Jingbai Li ◽

Patrick Reiser ◽

André Eberhard ◽

Pascal Friederich ◽

Steven Lopez

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Excited State ◽

Adaptive Sampling ◽

Computational Cost ◽

Ground Truth ◽

Absolute Error ◽

Photochemical Reactions ◽

Computational Techniques ◽

Full Potential

Photochemical reactions are being increasingly used to construct complex molecular architectures with mild and straightforward reaction conditions. Computational techniques are increasingly important to understand the reactivities and chemoselectivities of photochemical isomerization reactions because they offer molecular bonding information along the excited-state(s) of photodynamics. These photodynamics simulations are resource-intensive and are typically limited to 1–10 picoseconds and 1,000 trajectories due to high computational cost. Most organic photochemical reactions have excited-state lifetimes exceeding 1 picosecond, which places them outside possible computational studies. Westermayr et al. demonstrated that a machine learning approach could significantly lengthen photodynamics simulation times for a model system, methylenimmonium cation (CH2NH2+). We have developed a Python-based code, Python Rapid Artificial Intelligence Ab Initio Molecular Dynamics (PyRAI2MD), to accomplish the unprecedented 10 ns cis-trans photodynamics of trans-hexafluoro-2-butene (CF3–CH=CH–CF3) in 3.5 days. The same simulation would take approximately 58 years with ground-truth multiconfigurational dynamics. We proposed an innovative scheme combining Wigner sampling, geometrical interpolations, and short-time quantum chemical trajectories to effectively sample the initial data, facilitating the adaptive sampling to generate an informative and data-efficient training set with 6,232 data points. Our neural networks achieved chemical accuracy (mean absolute error of 0.032 eV). Our 4,814 trajectories reproduced the S1 half-life (60.5 fs) and the photochemical product ratio (trans:cis = 2.3:1), and autonomously discovered a pathway towards a carbene. The neural networks have also shown the capability of generalizing the full potential energy surface with chemically incomplete data (trans → cis but not cis → trans pathways) that may offer future automated photochemical reaction discoveries.
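
The size of the reported speedup follows from simple arithmetic on the two wall-clock figures quoted in the abstract (3.5 days versus approximately 58 years). The sketch below assumes a year of 365.25 days; because the 58-year figure is itself approximate, the result is only an order-of-magnitude estimate.

```python
# Wall-clock figures quoted in the abstract.
ML_DAYS = 3.5              # machine-learning-driven 10 ns simulation
REFERENCE_YEARS = 58       # approximate estimate for ground-truth dynamics
DAYS_PER_YEAR = 365.25     # assumption used for the conversion

reference_days = REFERENCE_YEARS * DAYS_PER_YEAR
speedup = reference_days / ML_DAYS
print(f"Estimated speedup: about {speedup:,.0f}x")
```

Under these assumptions the estimate is on the order of 6,000 times faster.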

Application of Machine Learning Approaches for the Design and Study of Anticancer Drugs

Current Drug Targets ◽

10.2174/1389450119666180809122244 ◽

2019 ◽

Vol 20 (5) ◽

pp. 488-500 ◽

Cited By ~ 6

Author(s):

Yan Hu ◽

Yi Lu ◽

Shuo Wang ◽

Mengying Zhang ◽

Xiaosheng Qu ◽

...

Keyword(s):

Machine Learning ◽

Drug Design ◽

Anticancer Drugs ◽

Nearest Neighbor ◽

Cost Effective ◽

Support Vector ◽

Learning Approaches ◽

K Nearest Neighbor ◽

Activity Prediction ◽

Linear Discriminant

Background: Globally, the numbers of cancer patients and cancer deaths continue to increase every year, and cancer has therefore become one of the world's leading causes of morbidity and mortality. In recent years, the study of anticancer drugs has become one of the most popular medical topics. Objective: To study the application of machine learning in predicting anticancer drug activity, this review selects several machine learning approaches, namely Linear Discriminant Analysis (LDA), Principal Component Analysis (PCA), Support Vector Machine (SVM), Random Forest (RF), k-Nearest Neighbor (kNN), and Naïve Bayes (NB), and lists examples of their application in anticancer drug design. Results: Machine learning contributes substantially to anticancer drug design, saving researchers time and reducing costs. However, it can only be an assisting tool for drug design. Conclusion: This paper introduces the application of machine learning approaches in anticancer drug design. Many successful examples of identification and prediction in anticancer drug activity prediction are discussed, and anticancer drug research is still actively progressing. Moreover, the merits of some web servers related to anticancer drugs are mentioned.

Application of Machine Learning in Animal Disease Analysis and Prediction

Current Bioinformatics ◽

10.2174/1574893615999200728195613 ◽

2020 ◽

Vol 15 ◽

Author(s):

Shuwen Zhang ◽

Qiang Su ◽

Qin Chen

Keyword(s):

Machine Learning ◽

Unsupervised Learning ◽

Supervised Learning ◽

Clustering Algorithm ◽

Principal Component ◽

Support Vector ◽

Animal Disease ◽

Human Beings ◽

Animal Diseases ◽

Disease Analysis

Abstract: Major animal diseases pose a great threat to animal husbandry and to human beings. With deepening globalization and increasingly abundant data resources, the prediction and analysis of animal diseases using big data are becoming more and more important. The focus of machine learning is to make computers learn from data and use what they have learned to analyze and predict. This paper first introduces the animal epidemic situation and machine learning. It then briefly introduces the application of machine learning in animal disease analysis and prediction. Machine learning is mainly divided into supervised learning and unsupervised learning. Supervised learning includes support vector machines, naive Bayes, decision trees, random forests, logistic regression, artificial neural networks, deep learning, and AdaBoost. Unsupervised learning includes the expectation-maximization algorithm, principal component analysis, hierarchical clustering, and MaxEnt. Through this discussion, readers can gain a clearer concept of machine learning and understand its application prospects in animal diseases.

Practical CO2—WAG Field Operational Designs Using Hybrid Numerical-Machine-Learning Approaches

Energies ◽

10.3390/en14041055 ◽

2021 ◽

Vol 14 (4) ◽

pp. 1055

Author(s):

Qian Sun ◽

William Ampomah ◽

Junyu You ◽

Martha Cather ◽

Robert Balch

Keyword(s):

Machine Learning ◽

Oil Recovery ◽

History Matching ◽

Optimization Problems ◽

Learning Technologies ◽

Petroleum Engineering ◽

Support Vector ◽

Learning Approaches ◽

Field Development ◽

Proxy Models

Machine-learning technologies have proven capable of solving many petroleum engineering problems. Their accurate predictions and fast computational speed make feasible a large volume of time-consuming engineering processes, such as history matching and field development optimization. The Southwest Regional Partnership on Carbon Sequestration (SWP) project requires rigorous history-matching and multi-objective optimization processes, which suit the strengths of machine-learning approaches. Although machine-learning proxy models are trained and validated before being applied to practical problems, their error margin would inherently introduce uncertainties into the results. In this paper, a hybrid numerical machine-learning workflow for solving various optimization problems is presented. By coupling expert machine-learning proxies with a global optimizer, the workflow successfully solves the history-matching and CO2 water-alternating-gas (WAG) design problems with low computational overhead. The history-matching work considers the heterogeneities of multiphase relative characteristics, and the CO2-WAG injection design takes multiple techno-economic objective functions into account. This work trained an expert response surface, a support vector machine, and a multi-layer neural network as proxy models to effectively learn the high-dimensional nonlinear data structure. The proposed workflow suggests revisiting the high-fidelity numerical simulator for validation purposes. The experience gained from this work should provide valuable guidance for similar CO2 enhanced oil recovery (EOR) projects.

414 Deep Neural Networks: A Survey Tool for Obstructive Sleep Apnea Prediction

SLEEP ◽

10.1093/sleep/zsab072.413 ◽

2021 ◽

Vol 44 (Supplement_2) ◽

pp. A164-A164

Author(s):

Pahnwat Taweesedt ◽

JungYoon Kim ◽

Jaehyun Park ◽

Jangwoon Park ◽

Munish Sharma ◽

...

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Obstructive Sleep Apnea ◽

Sleep Apnea ◽

Deep Neural Networks ◽

Support Vector ◽

Learning Models ◽

Obstructive Sleep ◽

Screening Questionnaires ◽

Machine Learning Models

Introduction: Obstructive sleep apnea (OSA) is a common sleep-related breathing disorder estimated to affect one billion people. Full-night polysomnography is considered the gold standard for OSA diagnosis. However, it is time-consuming and expensive, and it is not readily available in many parts of the world. Many screening questionnaires and scores have been proposed for OSA prediction, with high sensitivity and low specificity. The present study is intended to develop models with various machine learning techniques to predict the severity of OSA by incorporating features from multiple questionnaires. Methods: Subjects who underwent full-night polysomnography in Torr sleep center, Texas and completed 5 OSA screening questionnaires/scores were included. OSA was diagnosed by using Apnea-Hypopnea Index ≥ 5. We trained five different machine learning models: Deep Neural Networks with scaled principal component analysis (DNN-PCA), Random Forest (RF), Adaptive Boosting classifier (ABC), K-Nearest Neighbors classifier (KNC), and Support Vector Machine Classifier (SVMC). A training:testing subject ratio of 65:35 was used. All features, including demographic data, body measurements, and snoring and sleepiness history, were obtained from 5 OSA screening questionnaires/scores (STOP-BANG questionnaire, Berlin questionnaire, NoSAS score, NAMES score and No-Apnea score). Performance metrics were used to compare the machine learning models. Results: Of 180 subjects, 51.5% were male, with a mean (SD) age of 53.6 (15.1) years. One hundred and nineteen subjects were diagnosed with OSA. The Area Under the Receiver Operating Characteristic Curve (AUROC) values of DNN-PCA, RF, ABC, KNC, SVMC, STOP-BANG questionnaire, Berlin questionnaire, NoSAS score, NAMES score, and No-Apnea score were 0.85, 0.68, 0.52, 0.74, 0.75, 0.61, 0.63, 0.61, 0.58 and 0.58, respectively. DNN-PCA showed the highest AUROC, with sensitivity of 0.79, specificity of 0.67, positive predictive value of 0.93, F1 score of 0.86, and accuracy of 0.77. Conclusion: Our results showed that DNN-PCA outperforms OSA screening questionnaires, scores and other machine learning models.
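
The AUROC used to compare the models and questionnaires can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. The sketch below computes it directly from that definition, counting ties as half; the function name `auroc` and the scores are made up for illustration, and this is not necessarily how the study computed its values.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    labels: 1 for positive (e.g. OSA diagnosed), 0 for negative.
    scores: model outputs where higher means "more likely positive".
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))

# Hypothetical scores for four subjects (two with OSA, two without).
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
area = auroc(labels, scores)
print(area)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives a scale for reading the reported values of 0.52 to 0.85.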
