Predicting arrival time for CMEs: Machine learning and ensemble methods

Author(s):  
Ajay Tiwari ◽  
Enrico Camporeale ◽  
Jannis Teunissen ◽  
Raffaello Foldes ◽  
Gianluca Napoletano ◽  
...  

<p>Coronal mass ejections (CMEs) are among the most violent explosions in our solar system and one of the most important drivers of space weather. They can have direct adverse effects on several human activities, so reliable and fast prediction of CME arrival time is crucial to minimizing the damage. We present a new pipeline that combines machine learning (ML) with a physical drag-based model of CME propagation to predict CME arrival time. We evaluate both standard ML approaches and a combination of ML with the probabilistic drag-based model (P-DBM; Napoletano et al. 2018). The database for this study comprises more than 200 previously observed geo-effective partial- and full-halo CMEs, with information extracted from the Richardson & Cane (2010) catalogue, the CDAW data centre CME list, the LASCO coronagraphic images, and the HEK database (Hurlburt et al. 2010). The P-DBM reduces computation time, which is promising for space weather forecasting. We analyzed and compared various ML algorithms to identify the best-performing one for this CME database. We also examine the relative importance of features such as mass, CME propagation speed, and height above the solar limb in predicting the arrival time. The model predicts CME arrival times with a mean square error of about 9 hours. We also explore the differences between predictions from the ML models and from the ensemble prediction method, namely the P-DBM.</p>
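
As a rough illustration of what a drag-based propagation model computes, the sketch below uses the commonly quoted analytic solution of the drag equation dv/dt = -γ (v - w)|v - w| and finds the arrival time by bisection. It is a deterministic toy, not the authors' pipeline: the P-DBM draws its inputs from probability distributions, which is not reproduced here. The function names, the 20-solar-radii starting height, and the numerical values of `gamma` and `w` are assumptions chosen for illustration only.

```python
import math

AU_KM = 1.495978707e8   # 1 astronomical unit in km
RSUN_KM = 6.957e5       # nominal solar radius in km


def dbm_distance(t, r0, v0, w, gamma):
    """Heliocentric distance (km) after t seconds for the analytic drag-based model.

    r0: initial distance (km), v0: initial CME speed (km/s),
    w: ambient solar wind speed (km/s), gamma: drag parameter (1/km).
    """
    dv = v0 - w
    if dv == 0.0:
        # No relative speed means no drag: the CME moves with the wind.
        return r0 + w * t
    s = 1.0 if dv > 0 else -1.0
    # s * dv = |dv|, so the log1p argument is always non-negative.
    return r0 + w * t + (s / gamma) * math.log1p(s * gamma * dv * t)


def dbm_arrival_time(r0, v0, w, gamma, r_target=AU_KM,
                     t_max=10 * 86400.0, tol=1.0):
    """Travel time (s) to r_target, found by bisection to within tol seconds."""
    if dbm_distance(t_max, r0, v0, w, gamma) < r_target:
        raise ValueError("CME does not reach r_target within t_max")
    lo, hi = 0.0, t_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if dbm_distance(mid, r0, v0, w, gamma) < r_target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Illustrative (assumed) inputs: a 1000 km/s CME launched at 20 solar radii
# into a 400 km/s wind with gamma = 0.2e-7 per km.
travel_hours = dbm_arrival_time(20 * RSUN_KM, 1000.0, 400.0, 0.2e-7) / 3600.0
```

Because the closed-form distance is cheap to evaluate, many input samples can be propagated quickly, which is consistent with the reduced computation time the abstract attributes to the P-DBM.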

2021 ◽  
Vol 22 (S3) ◽  
Author(s):  
Junyi Li ◽  
Huinian Li ◽  
Xiao Ye ◽  
Li Zhang ◽  
Qingzhe Xu ◽  
...  

Background: The prediction of long non-coding RNA (lncRNA) has attracted great attention from researchers, as more and more evidence indicates that various complex human diseases are closely related to lncRNAs. In the era of biomedical big data, in addition to the prediction of lncRNAs by biological experimental methods, many computational methods based on machine learning have been proposed to make better use of lncRNA sequence resources. Results: We developed a lncRNA prediction method that integrates information-entropy-based features with machine learning algorithms. We calculate generalized topological entropy and generate 6 novel features for lncRNA sequences. Using these 6 features together with other features such as open reading frame, we apply support vector machine, XGBoost, and random forest algorithms to distinguish human lncRNAs. We compare our method with one that uses more k-mer features, and the results show that our method achieves a higher area under the curve, up to 99.7905%. Conclusions: We developed an accurate and efficient method that uses novel information entropy features to analyze and classify lncRNAs. Our method can also be extended to research on other functional elements in DNA sequences.


2021 ◽  
Vol 10 (1) ◽  
pp. 42
Author(s):  
Kieu Anh Nguyen ◽  
Walter Chen ◽  
Bor-Shiun Lin ◽  
Uma Seeboonruang

Although machine learning has been extensively used in various fields, it has only recently been applied to soil erosion pin modeling. To improve upon previous methods of quantifying soil erosion based on erosion pin measurements, this study explored the application of ensemble machine learning algorithms to the Shihmen Reservoir watershed in northern Taiwan. Three categories of ensemble methods were considered: (a) bagging, (b) boosting, and (c) stacking. Bagging here refers to bagged multivariate adaptive regression splines (bagged MARS) and random forest (RF), and boosting includes Cubist and gradient boosting machine (GBM). Stacking is an ensemble method that uses a meta-model to combine the predictions of base models; this study used RF and GBM as the meta-models, with decision tree, linear regression, artificial neural network, and support vector machine as the base models. The dataset was split 70/30 into training and test data using stratified random sampling, and the process was repeated three times. The performance of the six ensemble methods in the three categories was analyzed based on the average of the three runs. GBM performed best among the ensemble models, with the lowest root-mean-square error (RMSE = 1.72 mm/year), the highest Nash-Sutcliffe efficiency (NSE = 0.54), and the highest index of agreement (d = 0.81). This result was confirmed by a spatial comparison of the absolute differences (errors) between model predictions and observations using GBM and RF in the study area. In summary, the results show that, as groups, the bagging and boosting methods performed equally well, and the stacking method ranked third for the erosion pin dataset considered in this study.
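
The three scores quoted above can be computed from paired observations and predictions. The sketch below uses the standard textbook definitions of RMSE, Nash-Sutcliffe efficiency, and Willmott's index of agreement; the abstract does not state which variant of the index of agreement was used, so the original (squared) form is an assumption, as are the function names and the sample values.

```python
import math


def rmse(obs, pred):
    """Root-mean-square error, in the units of the observations."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def nse(obs, pred):
    """Nash-Sutcliffe efficiency: 1 is perfect, 0 matches the observed mean."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - p) ** 2 for o, p in zip(obs, pred))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den


def index_of_agreement(obs, pred):
    """Willmott's index of agreement d (original squared form, assumed here)."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - p) ** 2 for o, p in zip(obs, pred))
    den = sum((abs(p - mean_obs) + abs(o - mean_obs)) ** 2
              for o, p in zip(obs, pred))
    return 1.0 - num / den


# Hypothetical erosion depths in mm/year, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 1.5, 3.5, 3.5]
scores = (rmse(observed, predicted),
          nse(observed, predicted),
          index_of_agreement(observed, predicted))
```

A lower RMSE is better, while NSE and d are better when closer to 1, which is why the abstract reports the lowest RMSE alongside the highest NSE and d for the best model.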


2017 ◽  
Vol 7 (5) ◽  
pp. 2073-2082 ◽  
Author(s):  
A. G. Armaki ◽  
M. F. Fallah ◽  
M. Alborzi ◽  
A. Mohammadzadeh

Financial institutions are exposed to credit risk through the issuance of consumer loans, so developing reliable credit scoring systems is crucial for them. Since machine learning techniques have demonstrated their applicability and merit, they have been used extensively in the credit scoring literature. Recent studies on hybrid models that merge various machine learning algorithms have reported compelling results. There are two types of hybridization methods, namely traditional and ensemble methods. This study combines both and proposes a hybrid meta-learner model. The structure of the model is based on the traditional hybrid model of ‘classification + clustering’, in which the stacking ensemble method is employed in the classification part. Moreover, this paper compares several versions of the proposed hybrid model using various combinations of classification and clustering algorithms, which helps to identify the hybrid model that achieves the best performance for credit scoring. Using four real-life credit datasets, the experimental results show that the (KNN-NN-SVMPSO)-(DL)-(DBSCAN) model delivers the highest prediction accuracy and the lowest error rates.


With the rapid development of artificial intelligence, various machine learning algorithms have been widely used for football match result prediction, with some success. However, traditional machine learning methods usually upload the results of previous matches to a cloud server in a centralized manner, which causes problems such as network congestion, server computing pressure, and computing delay. This paper proposes a football match result prediction method based on edge computing and machine learning. Specifically, we first extract game data from the results of previous matches to construct the common features and the characteristic features, respectively. Then, the feature extraction and classification tasks are deployed to multiple edge nodes. Finally, the results from all the edge nodes are uploaded to the cloud server and fused to make a decision. Experimental results demonstrate the effectiveness of the proposed method.


2013 ◽  
Vol 2013 ◽  
pp. 1-15 ◽  
Author(s):  
Wentao Mao ◽  
Guirong Yan ◽  
Longlei Dong

In practical engineering, structures are often excited by different kinds of loads at the same time. How to effectively analyze and simulate this kind of dynamic environment, called a combined dynamic environment, is a key issue. In this paper, a novel method for predicting the combined dynamic environment is proposed from the perspective of data analysis. First, the existence of dynamic similarity between the vibration responses of the same structure under different boundary conditions is theoretically proven. It is further proven that this similarity can be established by a multiple-input multiple-output regression model. Second, two machine learning algorithms, multiple-dimensional support vector machine and extreme learning machine, are introduced to establish this model. To test the effectiveness of the method, shock and stochastic white noise excitations are applied to a cylindrical shell with two clamps to simulate different dynamic environments. The prediction errors at the various measuring points are all within ±3 dB, which shows that the proposed method can predict the structural vibration response under one boundary condition from the response under another, with good precision and numerical stability.


Materials ◽  
2021 ◽  
Vol 14 (3) ◽  
pp. 542
Author(s):  
José P. S. Aniceto ◽  
Bruno Zêzere ◽  
Carlos M. Silva

Experimental diffusivities are scarce, though they are essential for modeling rate-controlled processes. In this work, various machine learning models were developed to estimate diffusivities in polar and nonpolar solvents (except water and supercritical CO2). The models were trained on a database of 90 polar systems (1431 points) and 154 nonpolar systems (1129 points) with data on 20 properties. Five machine learning algorithms were evaluated: multilinear regression, k-nearest neighbors, decision tree, and two ensemble methods (random forest and gradient boosted). For both polar and nonpolar data, the best results were obtained with the gradient boosted algorithm. The model for polar systems uses six variables/parameters (temperature, solvent viscosity, solute molar mass, solute critical pressure, solvent molar mass, and solvent Lennard-Jones energy constant) and showed an average absolute relative deviation (AARD) of 5.07%. The nonpolar model requires five variables/parameters (the same as for polar systems except the Lennard-Jones constant) and gives AARD = 5.86%. These results were compared with four classic models, including the 2-parameter correlation of Magalhães et al. (AARD = 5.19/6.19% for polar/nonpolar) and the predictive Wilke-Chang equation (AARD = 40.92/29.19%). However, the Magalhães et al. correlation requires two parameters per system that must be fitted to data beforehand. The developed models are coded and provided as a command-line program.
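
The AARD figures above are percentages of the form (100/N) Σ |D_calc − D_exp| / D_exp. A minimal sketch of that calculation follows, assuming this usual definition of average absolute relative deviation; the function name and the sample diffusivities are hypothetical and not taken from the paper's database.

```python
def aard(experimental, calculated):
    """Average absolute relative deviation, in percent.

    Each deviation is taken relative to the experimental value,
    so experimental values must be non-zero.
    """
    if len(experimental) != len(calculated):
        raise ValueError("inputs must have the same length")
    total = sum(abs(c - e) / e for e, c in zip(experimental, calculated))
    return 100.0 * total / len(experimental)


# Hypothetical diffusivities (arbitrary consistent units), illustration only.
d_exp = [1.0, 2.0]
d_calc = [1.1, 1.8]
deviation_percent = aard(d_exp, d_calc)  # both points are off by 10 %
```

Because each error is scaled by the experimental value, AARD lets models be compared across systems whose diffusivities differ by orders of magnitude.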


2021 ◽  
Author(s):  
Cong Cao

In this paper, we explore the impact of changes in traffic flow on local air pollution under specific meteorological conditions by integrating hourly traffic flow data, air pollution data, and meteorological data, using generalized linear regression models and advanced machine learning algorithms: support vector machines and decision trees. The study area is Oslo, the capital of Norway, and the study period is February 2020 to September 2020. We also selected 24-hour data for May 11 and 16 of the same year, representing weekday and holiday traffic flow, respectively, as a subset for further exploration. Finally, we selected data from July 2020 for robustness testing and verification of algorithm performance. We found that the maximum traffic flow on holidays is significantly higher than on weekdays, but holidays produce lower NOx concentrations over the month, and that the peaks of NOx, NO2, and NO concentrations arrive later than the peak of traffic flow. Among these, NOx shows very significant variation, so we chose NOx concentration as the air pollution indicator for measuring the effect of traffic flow variation on air pollution. We also find that NOx concentration is negatively correlated with hourly precipitation, and that its trend resembles that of minimum air temperature. We used multiple imputation methods to fill in the missing values. The decision tree results show that when traffic volumes are high (>81%), low temperatures generate higher NOx concentrations than high temperatures (an increase of 3.1%). Higher NOx concentrations (2.4%) are also generated when traffic volumes are low (no less than 22%) but there is some precipitation (≥ 0.27%). In the evaluation of prediction accuracy, the support vector machine has the best performance, with high R-squared and small MAE, MSE, and RMSE, indicating that it better explains the air pollution caused by traffic flow; the decision tree is second best, and the generalized linear regression model is the worst. The July 2020 data gave results consistent with the overall dataset.


2021 ◽  
Vol 5 (2) ◽  
pp. 20-25
Author(s):  
Azhi Abdalmohammed Faraj ◽  
Didam Ahmed Mahmud ◽  
Bilal Najmaddin Rashid

Credit card defaults pose a business-critical threat in banking systems; thus prompt detection of defaulters is a crucial and challenging research problem. Machine learning algorithms must deal with a heavily skewed dataset, since the ratio of defaulters to non-defaulters is very small. The purpose of this research is to apply different ensemble methods and compare their performance in predicting the probability of default on customers' credit card payments in Taiwan, using a dataset from the UCI Machine Learning Repository. This is done on both the original skewed dataset and a balanced dataset. Although several studies have shown the superiority of neural networks over traditional machine learning algorithms, the results of our study show that ensemble methods consistently outperform neural networks and other machine learning algorithms in terms of F1 score and area under the receiver operating characteristic curve, regardless of whether the dataset is balanced or the imbalance is ignored.
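
To see why F1 score is a more informative metric than plain accuracy on a skewed dataset like the one described above, the sketch below computes both from label lists. The 5% defaulter rate and the function names are assumptions for illustration and do not come from the Taiwan dataset.

```python
def accuracy(y_true, y_pred):
    """Fraction of labels predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """F1 for the positive class: harmonic mean of precision and recall."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Assumed toy data: 5 defaulters (label 1) among 100 customers.
y_true = [1] * 5 + [0] * 95

# A classifier that never predicts default looks accurate but finds nobody.
always_negative = [0] * 100
acc_naive = accuracy(y_true, always_negative)
f1_naive = f1_score(y_true, always_negative)

# A classifier that finds 4 of 5 defaulters with 2 false alarms.
useful = [1, 1, 1, 1, 0] + [1, 1] + [0] * 93
acc_useful = accuracy(y_true, useful)
f1_useful = f1_score(y_true, useful)
```

The trivial classifier scores high on accuracy while its F1 is zero, which illustrates why studies on imbalanced default data report F1 and ROC-based measures rather than accuracy alone.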


2018 ◽  
Vol 855 (2) ◽  
pp. 109 ◽  
Author(s):  
Jiajia Liu ◽  
Yudong Ye ◽  
Chenglong Shen ◽  
Yuming Wang ◽  
Robert Erdélyi
