A Data-Driven Approach for Winter Precipitation Classification Using Weather Radar and NWP Data

Bong-Chul Seo

doi:10.3390/atmos11070701

A Data-Driven Approach for Winter Precipitation Classification Using Weather Radar and NWP Data

Atmosphere ◽

10.3390/atmos11070701 ◽

2020 ◽

Vol 11 (7) ◽

pp. 701

Author(s):

Bong-Chul Seo

Keyword(s):

Random Forest ◽

Binary Classification ◽

Weather Prediction ◽

Model Development ◽

Winter Precipitation ◽

Ensemble Classification ◽

Supervised Machine Learning ◽

Data Driven ◽

Support Vector ◽

Data Driven Approach

This study describes a framework that provides qualitative weather information on winter precipitation types using a data-driven approach. The framework incorporates the data retrieved from weather radars and the numerical weather prediction (NWP) model to account for relevant precipitation microphysics. To enable multimodel-based ensemble classification, we selected six supervised machine learning models: k-nearest neighbors, logistic regression, support vector machine, decision tree, random forest, and multi-layer perceptron. Our model training and cross-validation results based on Monte Carlo Simulation (MCS) showed that all the models performed better than our baseline method, which applies two thresholds (surface temperature and atmospheric layer thickness) for binary classification (i.e., rain/snow). Among all six models, random forest presented the best classification results for the basic classes (rain, freezing rain, and snow) and the further refinement of the snow classes (light, moderate, and heavy). Our model evaluation, which uses an independent dataset not associated with model development and learning, led to classification performance consistent with that from the MCS analysis. Based on the visual inspection of the classification maps generated for an individual radar domain, we confirmed the improved classification capability of the developed models (e.g., random forest) compared to the baseline one in representing both spatial variability and continuity.

Get full-text (via PubEx)

Phybrata Sensors and Machine Learning for Enhanced Neurophysiological Diagnosis and Treatment

Sensors ◽

10.3390/s21217417 ◽

2021 ◽

Vol 21 (21) ◽

pp. 7417

Author(s):

Alex J. Hope ◽

Utkarsh Vashisth ◽

Matthew J. Parker ◽

Andreas B. Ralston ◽

Joshua M. Roper ◽

...

Keyword(s):

Machine Learning ◽

Time Series ◽

Random Forest ◽

Binary Classification ◽

Classification Performance ◽

Support Vector ◽

Use Case ◽

Signal Features ◽

Test Population

Concussion injuries remain a significant public health challenge. A significant unmet clinical need remains for tools that allow related physiological impairments and longer-term health risks to be identified earlier, better quantified, and more easily monitored over time. We address this challenge by combining a head-mounted wearable inertial motion unit (IMU)-based physiological vibration acceleration (“phybrata”) sensor and several candidate machine learning (ML) models. The performance of this solution is assessed for both binary classification of concussion patients and multiclass predictions of specific concussion-related neurophysiological impairments. Results are compared with previously reported approaches to ML-based concussion diagnostics. Using phybrata data from a previously reported concussion study population, four different machine learning models (Support Vector Machine, Random Forest Classifier, Extreme Gradient Boost, and Convolutional Neural Network) are first investigated for binary classification of the test population as healthy vs. concussion (Use Case 1). Results are compared for two different data preprocessing pipelines, Time-Series Averaging (TSA) and Non-Time-Series Feature Extraction (NTS). Next, the three best-performing NTS models are compared in terms of their multiclass prediction performance for specific concussion-related impairments: vestibular, neurological, both (Use Case 2). For Use Case 1, the NTS model approach outperformed the TSA approach, with the two best algorithms achieving an F1 score of 0.94. For Use Case 2, the NTS Random Forest model achieved the best performance in the testing set, with an F1 score of 0.90, and identified a wider range of relevant phybrata signal features that contributed to impairment classification compared with manual feature inspection and statistical data analysis. The overall classification performance achieved in the present work exceeds previously reported approaches to ML-based concussion diagnostics using other data sources and ML models. This study also demonstrates the first combination of a wearable IMU-based sensor and ML model that enables both binary classification of concussion patients and multiclass predictions of specific concussion-related neurophysiological impairments.

Get full-text (via PubEx)

Random Forest Regressor-Based Approach for Detecting Fault Location and Duration in Power Systems

Sensors ◽

10.3390/s22020458 ◽

2022 ◽

Vol 22 (2) ◽

pp. 458

Author(s):

Zakaria El Mrabet ◽

Niroop Sugunaraj ◽

Prakash Ranganathan ◽

Shrirang Abhyankar

Keyword(s):

Neural Network ◽

Random Forest ◽

Power Systems ◽

Real Time ◽

Fault Location ◽

State Of The Art ◽

Economic Consequences ◽

Support Vector ◽

Detection Accuracy ◽

Data Driven Approach

Power system failures or outages due to short-circuits or “faults” can result in long service interruptions leading to significant socio-economic consequences. It is critical for electrical utilities to quickly ascertain fault characteristics, including location, type, and duration, to reduce the service time of an outage. Existing fault detection mechanisms (relays and digital fault recorders) are slow to communicate the fault characteristics upstream to the substations and control centers for action to be taken quickly. Fortunately, due to availability of high-resolution phasor measurement units (PMUs), more event-driven solutions can be captured in real time. In this paper, we propose a data-driven approach for determining fault characteristics using samples of fault trajectories. A random forest regressor (RFR)-based model is used to detect real-time fault location and its duration simultaneously. This model is based on combining multiple uncorrelated trees with state-of-the-art boosting and aggregating techniques in order to obtain robust generalizations and greater accuracy without overfitting or underfitting. Four cases were studied to evaluate the performance of RFR: 1. Detecting fault location (case 1), 2. Predicting fault duration (case 2), 3. Handling missing data (case 3), and 4. Identifying fault location and length in a real-time streaming environment (case 4). A comparative analysis was conducted between the RFR algorithm and state-of-the-art models, including deep neural network, Hoeffding tree, neural network, support vector machine, decision tree, naive Bayesian, and K-nearest neighborhood. Experiments revealed that RFR consistently outperformed the other models in detection accuracy, prediction error, and processing time.

Get full-text (via PubEx)

Data-Driven Machine-Learning Model in District Heating System for Heat Load Prediction: A Comparison Study

Applied Computational Intelligence and Soft Computing ◽

10.1155/2016/3403150 ◽

2016 ◽

Vol 2016 ◽

pp. 1-11 ◽

Cited By ~ 10

Author(s):

Fisnik Dalipi ◽

Sule Yildirim Yayilgan ◽

Alemayehu Gebremedhin

Keyword(s):

Machine Learning ◽

Heat Load ◽

District Heating ◽

Heating System ◽

Partial Least Square ◽

Supervised Machine Learning ◽

Data Driven ◽

Support Vector ◽

Load Prediction ◽

District Heating System

We present our data-driven supervised machine-learning (ML) model to predict heat load for buildings in a district heating system (DHS). Even though ML has been used as an approach to heat load prediction in literature, it is hard to select an approach that will qualify as a solution for our case as existing solutions are quite problem specific. For that reason, we compared and evaluated three ML algorithms within a framework on operational data from a DH system in order to generate the required prediction model. The algorithms examined are Support Vector Regression (SVR), Partial Least Square (PLS), and random forest (RF). We use the data collected from buildings at several locations for a period of 29 weeks. Concerning the accuracy of predicting the heat load, we evaluate the performance of the proposed algorithms using mean absolute error (MAE), mean absolute percentage error (MAPE), and correlation coefficient. In order to determine which algorithm had the best accuracy, we conducted performance comparison among these ML algorithms. The comparison of the algorithms indicates that, for DH heat load prediction, SVR method presented in this paper is the most efficient one out of the three also compared to other methods found in the literature.

Get full-text (via PubEx)

Classification models using circulating neutrophil transcripts can detect unruptured intracranial aneurysm

Journal of Translational Medicine ◽

10.1186/s12967-020-02550-2 ◽

2020 ◽

Vol 18 (1) ◽

Author(s):

Kerry E. Poppenberg ◽

Vincent M. Tutino ◽

Lu Li ◽

Muhammad Waqas ◽

Armond June ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Prediction Models ◽

Model Performance ◽

Supervised Machine Learning ◽

Support Vector ◽

Learning Methods ◽

Training Cohort ◽

Network Analyses ◽

Machine Learning Methods

Abstract Background Intracranial aneurysms (IAs) are dangerous because of their potential to rupture. We previously found significant RNA expression differences in circulating neutrophils between patients with and without unruptured IAs and trained machine learning models to predict presence of IA using 40 neutrophil transcriptomes. Here, we aim to develop a predictive model for unruptured IA using neutrophil transcriptomes from a larger population and more robust machine learning methods. Methods Neutrophil RNA extracted from the blood of 134 patients (55 with IA, 79 IA-free controls) was subjected to next-generation RNA sequencing. In a randomly-selected training cohort (n = 94), the Least Absolute Shrinkage and Selection Operator (LASSO) selected transcripts, from which we constructed prediction models via 4 well-established supervised machine-learning algorithms (K-Nearest Neighbors, Random Forest, and Support Vector Machines with Gaussian and cubic kernels). We tested the models in the remaining samples (n = 40) and assessed model performance by receiver-operating-characteristic (ROC) curves. Real-time quantitative polymerase chain reaction (RT-qPCR) of 9 IA-associated genes was used to verify gene expression in a subset of 49 neutrophil RNA samples. We also examined the potential influence of demographics and comorbidities on model prediction. Results Feature selection using LASSO in the training cohort identified 37 IA-associated transcripts. Models trained using these transcripts had a maximum accuracy of 90% in the testing cohort. The testing performance across all methods had an average area under ROC curve (AUC) = 0.97, an improvement over our previous models. The Random Forest model performed best across both training and testing cohorts. RT-qPCR confirmed expression differences in 7 of 9 genes tested. Gene ontology and IPA network analyses performed on the 37 model genes reflected dysregulated inflammation, cell signaling, and apoptosis processes. In our data, demographics and comorbidities did not affect model performance. Conclusions We improved upon our previous IA prediction models based on circulating neutrophil transcriptomes by increasing sample size and by implementing LASSO and more robust machine learning methods. Future studies are needed to validate these models in larger cohorts and further investigate effect of covariates.

Get full-text (via PubEx)

A data-driven approach of load monitoring on laminated composite plates using support vector machine

Smart Structures and NDE for Industry 4.0 ◽

10.1117/12.2305840 ◽

2018 ◽

Author(s):

Hadi Fekrmandi ◽

Yun Seok Gwon

Keyword(s):

Support Vector Machine ◽

Laminated Composite ◽

Composite Plates ◽

Data Driven ◽

Support Vector ◽

Laminated Composite Plates ◽

Load Monitoring ◽

Data Driven Approach

Get full-text (via PubEx)

Water cut/salt content forecasting in oil wells using a novel data-driven approach

Oil & Gas Science and Technology – Revue d’IFP Energies nouvelles ◽

10.2516/ogst/2019040 ◽

2019 ◽

Vol 74 ◽

pp. 68

Author(s):

Rouhollah Ahmadi ◽

Jamal Shahrabi ◽

Babak Aminshahidy

Keyword(s):

Classification Tree ◽

Salt Content ◽

Oil Wells ◽

Data Driven ◽

Var Model ◽

Support Vector ◽

Water Production ◽

Linear Discriminant ◽

Data Driven Approach ◽

Water Cut

Water cut is an important parameter in reservoir management and surveillance. Unlike traditional approaches, including numerical simulation and analytical techniques, which were developed for predicting water production in oil wells based on some assumptions and limitations, a new data-driven approach is proposed for forecasting water cut in two different types of oil wells in this article. First, a classification approach is presented for water cut prediction in sweet oil wells with discontinuous salt production patterns. Different classification algorithms including Support Vector Machine (SVM), Classification Tree (CT), Random Forest (RF), Multi-Layer Perceptron (MLP), Linear Discriminant Analysis (LDA) and Naïve Bayes (NB) are investigated in this regard. According to the results of a case study on a real Iranian sweet oil well, RF, CT, MLP and SVM can provide the best performance measures, respectively. Next, a Vector Autoregressive (VAR) model is proposed for forecasting water cut in salty oil wells with continuous water production during the life of the well. The proposed VAR model is verified using data of two real salty oil wells. The results confirm that the well-tuned proposed VAR model could provide reliable and acceptable results with very good accuracy in forecasting water production for the near future days.

Get full-text (via PubEx)

Automatic recognition of self-acknowledged limitations in clinical research literature

Journal of the American Medical Informatics Association ◽

10.1093/jamia/ocy038 ◽

2018 ◽

Vol 25 (7) ◽

pp. 855-861 ◽

Cited By ~ 4

Author(s):

Halil Kilicoglu ◽

Graciela Rosemblat ◽

Mario Malički ◽

Gerben ter Riet

Keyword(s):

Machine Learning ◽

Clinical Research ◽

Binary Classification ◽

Classification Performance ◽

Research Literature ◽

Machine Learning Algorithms ◽

Supervised Machine Learning ◽

Support Vector ◽

Rule Based ◽

Research Transparency

Abstract Objective To automatically recognize self-acknowledged limitations in clinical research publications to support efforts in improving research transparency. Methods To develop our recognition methods, we used a set of 8431 sentences from 1197 PubMed Central articles. A subset of these sentences was manually annotated for training/testing, and inter-annotator agreement was calculated. We cast the recognition problem as a binary classification task, in which we determine whether a given sentence from a publication discusses self-acknowledged limitations or not. We experimented with three methods: a rule-based approach based on document structure, supervised machine learning, and a semi-supervised method that uses self-training to expand the training set in order to improve classification performance. The machine learning algorithms used were logistic regression (LR) and support vector machines (SVM). Results Annotators had good agreement in labeling limitation sentences (Krippendorff’s α = 0.781). Of the three methods used, the rule-based method yielded the best performance with 91.5% accuracy (95% CI [90.1-92.9]), while self-training with SVM led to a small improvement over fully supervised learning (89.9%, 95% CI [88.4-91.4] vs 89.6%, 95% CI [88.1-91.1]). Conclusions The approach presented can be incorporated into the workflows of stakeholders focusing on research transparency to improve reporting of limitations in clinical studies.

Get full-text (via PubEx)

Application of Natural Language Processing with Supervised Machine Learning Techniques to Predict the Overall Drugs Performance

AJIT-e Online Academic Journal of Information Technology ◽

10.5824/ajite.2020.01.001.x ◽

2020 ◽

Vol 11 (40) ◽

pp. 8-23

Author(s):

Pius MARTHIN ◽

Duygu İÇEN

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Random Forest ◽

Semantic Analysis ◽

Classification Tree ◽

Supervised Machine Learning ◽

Training Dataset ◽

Support Vector ◽

Learning Models ◽

Machine Learning Models

Online product reviews have become a valuable source of information which facilitate customer decision with respect to a particular product. With the wealthy information regarding user's satisfaction and experiences about a particular drug, pharmaceutical companies make the use of online drug reviews to improve the quality of their products. Machine learning has enabled scientists to train more efficient models which facilitate decision making in various fields. In this manuscript we applied a drug review dataset used by (Gräβer, Kallumadi, Malberg,& Zaunseder, 2018), available freely from machine learning repository website of the University of California Irvine (UCI) to identify best machine learning model which provide a better prediction of the overall drug performance with respect to users' reviews. Apart from several manipulations done to improve model accuracy, all necessary procedures required for text analysis were followed including text cleaning and transformation of texts to numeric format for easy training machine learning models. Prior to modeling, we obtained overall sentiment scores for the reviews. Customer's reviews were summarized and visualized using a bar plot and word cloud to explore the most frequent terms. Due to scalability issues, we were able to use only the sample of the dataset. We randomly sampled 15000 observations from the 161297 training dataset and 10000 observations were randomly sampled from the 53766 testing dataset. Several machine learning models were trained using 10 folds cross-validation performed under stratified random sampling. The trained models include Classification and Regression Trees (CART), classification tree by C5.0, logistic regression (GLM), Multivariate Adaptive Regression Spline (MARS), Support vector machine (SVM) with both radial and linear kernels and a classification tree using random forest (Random Forest). Model selection was done through a comparison of accuracies and computational efficiency. Support vector machine (SVM) with linear kernel was significantly best with an accuracy of 83% compared to the rest. Using only a small portion of the dataset, we managed to attain reasonable accuracy in our models by applying the TF-IDF transformation and Latent Semantic Analysis (LSA) technique to our TDM.

Get full-text (via PubEx)

Enhanced Changeover Detection in Industry 4.0 Environments with Machine Learning

Sensors ◽

10.3390/s21175896 ◽

2021 ◽

Vol 21 (17) ◽

pp. 5896

Author(s):

Eddi Miller ◽

Vladyslav Borysenko ◽

Moritz Heusinger ◽

Niklas Niedner ◽

Bastian Engelmann ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Binary Classification ◽

Model Performance ◽

Support Vector ◽

Milling Machine ◽

Vector Machines ◽

Changeover Times ◽

Flow Power

Changeover times are an important element when evaluating the Overall Equipment Effectiveness (OEE) of a production machine. The article presents a machine learning (ML) approach that is based on an external sensor setup to automatically detect changeovers in a shopfloor environment. The door statuses, coolant flow, power consumption, and operator indoor GPS data of a milling machine were used in the ML approach. As ML methods, Decision Trees, Support Vector Machines, (Balanced) Random Forest algorithms, and Neural Networks were chosen, and their performance was compared. The best results were achieved with the Random Forest ML model (97% F1 score, 99.72% AUC score). It was also carried out that model performance is optimal when only a binary classification of a changeover phase and a production phase is considered and less subphases of the changeover process are applied.

Get full-text (via PubEx)

Data-Driven-Based Forecasting of Two-Phase Flow Parameters in Rectangular Channel

Frontiers in Energy Research ◽

10.3389/fenrg.2021.641661 ◽

2021 ◽

Vol 9 ◽

Author(s):

Qingyu Huang ◽

Yang Yu ◽

Yaoyi Zhang ◽

Bo Pang ◽

Yafeng Wang ◽

...

Keyword(s):

Random Forest ◽

Two Phase Flow ◽

Interfacial Area ◽

Machine Learning Algorithms ◽

Data Driven ◽

Support Vector ◽

Phase Flow ◽

Random Forest Regression ◽

Two Phase ◽

Flow Parameters

In the current nuclear reactor system analysis codes, the interfacial area concentration and void fraction are mainly obtained through empirical relations based on different flow regime maps. In the present research, the data-driven method has been proposed, using four machine learning algorithms (lasso regression, support vector regression, random forest regression and back propagation neural network) in the field of artificial intelligence to predict some important two-phase flow parameters in rectangular channels, and evaluate the performance of different models through multiple metrics. The random forest regression algorithm was found to have the strongest ability to learn from the experimental data in this study. Test results show that the prediction errors of the random forest regression model for interfacial area concentrations and void fractions are all less than 20%, which means the target parameters have been forecasted with good accuracy.

Get full-text (via PubEx)