A Novel Integration of Hodrick-Prescott Filter and Harmonic Analysis with Machine Learning Methods to Enhance Time Series Prediction Accuracy of Daily and Monthly Wind Speeds

Author(s):  
Chigbogu Godwin Ozoegwu

Abstract In this work, a new hybrid algorithm for modelling time series of daily and monthly wind speed is proposed. The method uses the Hodrick-Prescott filter (HPF) to decompose raw wind speed data into trend and cyclic components; harmonic analysis (HA) is then used to decompose the cyclic component into periodic and stochastic sub-components. Machine learning (ML) methods are then used to model the time series of both the trend and stochastic components. The predicted wind speeds are finally obtained by summing the individual predictions of the ML methods and the harmonic analyses. To highlight the considerably higher predictive accuracy that results from the introduced data pre-treatments with HPF and HA, the proposed hybrid algorithm is compared against the traditional ML methods that are not subjected to the pre-treatments. The proposed hybrid algorithms are highly accurate relative to the traditional ML methods, showing much higher coefficients of determination and correlation coefficients, and much lower error indices. Artificial neural networks (ANNs), linear regression with interactions (LRI), support vector machine (SVM), rational quadratic Gaussian process regression (RQGPR), fine regression trees (FRTs) and boosted ensembles of trees (BETs) are used as the illustrative machine learning methods. To demonstrate both versatility and robustness, the methods are tested on example data drawn from both temperate and tropical conditions.
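The two decomposition steps described in this abstract can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions, not the author's implementation: the function names `hp_filter` and `harmonic_fit`, the default smoothing parameter `lam=1600.0`, the dense linear solver, and the single known period are all hypothetical choices made here. The HP trend is obtained by solving (I + lam * D'D) trend = y, where D is the second-difference operator, and the harmonic step estimates the cosine and sine amplitudes of one sinusoid, which is exact only when the series spans a whole number of periods.

```python
import math


def hp_filter(y, lam=1600.0):
    """Split a series into (trend, cycle) with a Hodrick-Prescott filter.

    Solves (I + lam * D'D) trend = y, where D is the second-difference
    operator. The dense solver is adequate for short series only.
    """
    n = len(y)
    if n < 3:
        return [float(v) for v in y], [0.0] * n
    # Build A = I + lam * D'D.
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        a[i][i] = 1.0
    coeffs = (1.0, -2.0, 1.0)  # row r of D acts on y[r], y[r+1], y[r+2]
    for r in range(n - 2):
        for p in range(3):
            for q in range(3):
                a[r + p][r + q] += lam * coeffs[p] * coeffs[q]
    # Gaussian elimination with partial pivoting.
    b = [float(v) for v in y]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            if f != 0.0:
                for c in range(col, n):
                    a[r][c] -= f * a[col][c]
                b[r] -= f * b[col]
    # Back substitution.
    trend = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = b[r] - sum(a[r][c] * trend[c] for c in range(r + 1, n))
        trend[r] = s / a[r][r]
    cycle = [yi - ti for yi, ti in zip(y, trend)]
    return trend, cycle


def harmonic_fit(cycle, period):
    """Estimate (a, b) in a*cos(w*t) + b*sin(w*t) for one known period.

    Assumes the series covers a whole number of periods (period > 2),
    so the cosine and sine terms are orthogonal over the sample.
    """
    n = len(cycle)
    w = 2.0 * math.pi / period
    a = 2.0 / n * sum(c * math.cos(w * t) for t, c in enumerate(cycle))
    b = 2.0 / n * sum(c * math.sin(w * t) for t, c in enumerate(cycle))
    return a, b
```

In a pipeline of the kind the abstract describes, the residual left after subtracting the fitted harmonic from the cyclic component would play the role of the stochastic sub-component that is passed, together with the trend, to the ML models.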

Author(s):  
Yusuf S. Türkan ◽  
Hacer Yumurtacı Aydoğmuş ◽  
Hamit Erdal

In Turkey, many entrepreneurs began investing in renewable energy systems after new legal regulations and stimulus packages for renewable energy production were introduced. Among the many alternatives, electricity production from wind farms is one of the leading options. For these systems, the wind speed values measured prior to the establishment of the farms are extremely important both in decision making and in projecting the investment. However, measuring wind speed at different heights is a time-consuming and expensive process. For this reason, the success of wind speed prediction techniques is important for fast and reliable decision-making on wind farm investments. In this study, annual wind speed values measured at two different heights in Kutahya, one of the regions in Turkey with potential for wind energy, were used: wind speed values at 30 m were predicted from the values at 10 m using seven different machine learning methods. The results of the analyses were compared with each other. The results show that support vector machines are a successful technique for predicting wind speed at different heights.


2019 ◽  
Vol 252 ◽  
pp. 06006
Author(s):  
Andrzej Puchalski ◽  
Iwona Komorska

Data-driven diagnostic methods make it possible to obtain a statistical model of time series and to identify deviations of recorded data from the pattern of the monitored system. Statistical analysis of time series of mechanical vibrations adds a new capability to the monitoring of rotating machines. Most real vibration signals exhibit nonlinear properties well described by scaling exponents. Multifractal analysis, which relies mainly on assessing local singularity exponents, has become a popular tool for statistical analysis of empirical data. There are many methods to study time series in terms of their fractality. Based on a comparison of computational complexity, a wavelet leaders algorithm was chosen. Using the Wavelet Leaders Multifractal Formalism, multifractal parameters were estimated and used as diagnostic features in a pattern recognition procedure based on machine learning methods. The classification was performed using a neural network, the k-nearest neighbours algorithm and a support vector machine. The article presents the results of vibration acceleration tests in a demonstration transmission system that allows simulations of assembly errors and teeth wear.


2021 ◽  
Author(s):  
Laleh Parviz ◽  
Kabir Rasouli ◽  
Ali Torabi

Abstract Precipitation forecasting, especially on monthly and annual scales, is key for optimal water resources management and planning, particularly in semiarid climates with scarce water. The traditional hybrid models, in which two statistical models are used to separate and simulate linear and nonlinear components of precipitation time series, are still unable to provide accurate precipitation forecasts. This research aims to improve hybrid forecast models by combining one linear model and three nonlinear models with two preprocessing configurations: 1) using residuals of a linear model, representing the nonlinear component with different time steps and 2) using original time series of observations with different time steps, linear model simulations and residuals. Gene Expression Programming (GEP), Support Vector Regression (SVR) and Group Method of Data Handling (GMDH) models were used individually, as in the traditional hybrid models, and in combination, as in the hybrid models proposed in this study. The performance of the hybrid models was improved by different combination methods such as inverse variance (Iv) as an error-based method, least squares regression, genetic algorithm (GA) and SVR. Two weather stations in Iran, Tabriz (annual) and Rasht (monthly), were selected to test the developed models. The results showed that Theil’s coefficient, UII, decreased in configuration one for the Tabriz station by 9% and 15% for SVR and GMDH relative to GEP, suggesting that these two models performed better than GEP in the precipitation forecast. According to the error criteria, the proposed hybrid models with all forecast combination methods represent observations better than the traditional hybrid model. MSE decreased by 67% and the Nash-Sutcliffe coefficient increased by 5% at the Rasht station in configuration two when the three models were combined using GA to obtain the improved hybrid model, relative to the hybrid model combined with SVR. Generally, the hybrid models in which SVR, the error-based methods and GA were incorporated showed better performance than traditional hybrid models. The developed models have implications for modeling highly nonlinear systems using the full advantages of machine learning methods.
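The inverse-variance combination mentioned in this abstract can be illustrated with a short sketch. The abstract does not give the exact formulation, so the version below is an assumption: each model's error variance is approximated by its mean squared error on past forecasts, and the weights are the normalized reciprocals of those values. The function names are hypothetical.

```python
def inverse_variance_weights(errors_by_model):
    """Weights proportional to 1 / MSE of each model's past errors.

    errors_by_model: one list of forecast errors per model. Using the MSE
    as the variance estimate is an assumption of this sketch.
    """
    mse = [sum(e * e for e in errs) / len(errs) for errs in errors_by_model]
    inv = [1.0 / m for m in mse]
    total = sum(inv)
    return [v / total for v in inv]


def combine_forecasts(forecasts, weights):
    """Weighted sum of the individual model forecasts for one time step."""
    return sum(w * f for w, f in zip(weights, forecasts))
```

With past errors of `[1, -1]` for one model and `[2, -2]` for another, the first model receives four times the weight of the second, so the combined forecast leans toward the model with the smaller historical error.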


2021 ◽  
Author(s):  
Pak Wai Chan ◽  
Wu Wen ◽  
Lei Li

Haze pollution, mainly characterized by low visibility, is one of the main environmental problems currently faced by China. Accurate haze forecasts facilitate the implementation of preventive measures to control the emission of air pollutants and thereby mitigate haze pollution. However, it is not easy to accurately predict the low visibility events induced by haze, which requires not only accurate prediction of weather elements, but also a refined and real-time updated source emission inventory. In order to obtain reliable forecasting tools, this paper studies the usability of several popular machine learning methods, such as support vector machine, k-nearest neighbor and random forest, as well as several deep learning methods, for visibility forecasting. Starting from the main factors related to visibility, the relationships between wind speed, wind direction, temperature, humidity, and visibility are discussed. Training and forecasting were performed using the machine learning methods. The accuracy of these methods in visibility forecasting was assessed through several parameters (i.e., root-mean-square error, mean absolute error, and mean absolute percentage error). The results show that: (1) Among all meteorological parameters, wind speed was the best at reflecting the visibility change patterns; (2) RNN, LSTM, and GRU methods perform almost equally well on short-term visibility forecasts (i.e., 1 h, 3 h, and 6 h); (3) A classical machine learning method (i.e., the SVM) performs well in mid- and long-term visibility forecasts; (4) The machine learning methods also have a certain degree of forecast accuracy even for long time periods (e.g., 72 h).
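The three accuracy measures named in this abstract have standard definitions, sketched below for reference. The function names are hypothetical, and the MAPE helper assumes that no observed value is zero.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def mae(observed, predicted):
    """Mean absolute error."""
    n = len(observed)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / n


def mape(observed, predicted):
    """Mean absolute percentage error, in percent (observed values must be nonzero)."""
    n = len(observed)
    return 100.0 * sum(abs((o - p) / o) for o, p in zip(observed, predicted)) / n
```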


2019 ◽  
Vol 19 (25) ◽  
pp. 2301-2317 ◽  
Author(s):  
Ruirui Liang ◽  
Jiayang Xie ◽  
Chi Zhang ◽  
Mengying Zhang ◽  
Hai Huang ◽  
...  

In recent years, the successful implementation of the Human Genome Project has made people realize that genetic, environmental and lifestyle factors should be studied together in cancer research because of the complexity and various forms of the disease. The increasing availability and growth rate of ‘big data’ derived from various omics open a new window for the study and therapy of cancer. In this paper, we introduce the application of machine learning methods in handling cancer big data, including the use of artificial neural networks, support vector machines, ensemble learning and naïve Bayes classifiers.


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Jing Xu ◽  
Xiangdong Liu ◽  
Qiming Dai

Abstract Background Hypertrophic cardiomyopathy (HCM) represents one of the most common inherited heart diseases. To identify key molecules involved in the development of HCM, gene expression patterns of heart tissue samples from HCM patients, obtained from multiple microarray and RNA-seq platforms, were investigated. Methods The significant genes were obtained through the intersection of two gene sets, corresponding to the differentially expressed genes (DEGs) identified within the microarray data and within the RNA-seq data. Those genes were further ranked using the minimum-Redundancy Maximum-Relevance feature selection algorithm. Moreover, the genes were assessed by three different machine learning methods for classification: support vector machines, random forest and k-nearest neighbor. Results Outstanding results were achieved by taking exclusively the top eight genes of the ranking into consideration. Since the eight genes were identified as candidate HCM hallmark genes, the interactions between them and known HCM disease genes were explored through the protein–protein interaction (PPI) network. Most candidate HCM hallmark genes were found to have direct or indirect interactions with known HCM disease genes in the PPI network, particularly the hub genes JAK2 and GADD45A. Conclusions This study highlights the value of transcriptomic data integration, in combination with machine learning methods, in providing insight into the key hallmark genes in the genetic etiology of HCM.


2021 ◽  
Vol 10 (4) ◽  
pp. 199
Author(s):  
Francisco M. Bellas Aláez ◽  
Jesus M. Torres Palenzuela ◽  
Evangelos Spyrakos ◽  
Luis González Vilas

This work presents new prediction models based on recent developments in machine learning methods, such as Random Forest (RF) and AdaBoost, and compares them with more classical approaches, i.e., support vector machines (SVMs) and neural networks (NNs). The models predict Pseudo-nitzschia spp. blooms in the Galician Rias Baixas. This work builds on a previous study by the authors (doi.org/10.1016/j.pocean.2014.03.003) but uses an extended database (from 2002 to 2012) and new algorithms. Our results show that RF and AdaBoost provide better prediction results compared to SVMs and NNs, as they show improved performance metrics and a better balance between sensitivity and specificity. Classical machine learning approaches show higher sensitivities, but at a cost of lower specificity and higher percentages of false alarms (lower precision). These results seem to indicate a greater adaptation of new algorithms (RF and AdaBoost) to unbalanced datasets. Our models could be operationally implemented to establish a short-term prediction system.
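The trade-off this abstract describes between sensitivity, specificity and precision (false alarms) follows directly from the confusion-matrix counts. The sketch below uses the standard definitions; the function name and the example counts are hypothetical and are not taken from the study.

```python
def classification_rates(tp, fp, tn, fn):
    """Return (sensitivity, specificity, precision) from confusion counts.

    tp/fn: positive events (e.g. blooms) predicted / missed;
    tn/fp: negative events correctly rejected / falsely flagged.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    return sensitivity, specificity, precision
```

On an unbalanced dataset with few positive events, a model can reach high sensitivity while still raising many false alarms: with 8 detected blooms, 2 missed, 70 correct rejections and 20 false alarms, sensitivity is 0.8 but precision is only 8/28.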


2021 ◽  
Vol 13 (5) ◽  
pp. 974
Author(s):  
Lorena Alves Santos ◽  
Karine Ferreira ◽  
Michelle Picoli ◽  
Gilberto Camara ◽  
Raul Zurita-Milla ◽  
...  

The use of satellite image time series analysis and machine learning methods brings new opportunities and challenges for land use and cover change (LUCC) mapping over large areas. One of these challenges is the need for samples that properly represent the high variability of land use and cover classes over large areas to train supervised machine learning methods and to produce accurate LUCC maps. This paper addresses this challenge and presents a method to identify spatiotemporal patterns in land use and cover samples to infer subclasses through the phenological and spectral information provided by satellite image time series. The proposed method uses self-organizing maps (SOMs) to reduce the data dimensionality, creating primary clusters. From these primary clusters, it uses hierarchical clustering to create subclusters that recognize intra-class variability intrinsic to different regions and periods, mainly in large areas and multiple years. To show how the method works, we use MODIS image time series associated with samples of cropland and pasture classes over the Cerrado biome in Brazil. The results show that the proposed method is suitable for identifying spatiotemporal patterns in land use and cover samples that can be used to infer subclasses, mainly for crop types.


2021 ◽  
Vol 20 (1) ◽  
Author(s):  
Xiaoya Guo ◽  
Akiko Maehara ◽  
Mitsuaki Matsumura ◽  
Liang Wang ◽  
Jie Zheng ◽  
...  

Abstract Background Coronary plaque vulnerability prediction is difficult because plaque vulnerability is non-trivial to quantify, clinically available medical imaging modalities are not sufficient to quantify thin cap thickness, prediction methods with high accuracies still need to be developed, and gold-standard data to validate vulnerability prediction are often not available. Patient follow-up intravascular ultrasound (IVUS), optical coherence tomography (OCT) and angiography data were acquired to construct 3D fluid–structure interaction (FSI) coronary models, and four machine learning methods were compared to identify the optimal method for predicting future plaque vulnerability. Methods Baseline and 10-month follow-up in vivo IVUS and OCT coronary plaque data were acquired from two arteries of one patient using IRB-approved protocols with informed consent obtained. IVUS- and OCT-based FSI models were constructed to obtain plaque wall stress/strain and wall shear stress. Forty-five slices were selected as the machine learning sample database for the vulnerability prediction study. Thirteen key morphological factors from IVUS and OCT images and biomechanical factors from the FSI models were extracted from the 45 slices at baseline for analysis. Lipid percentage index (LPI), cap thickness index (CTI) and morphological plaque vulnerability index (MPVI) were quantified to measure plaque vulnerability. Four machine learning methods (least square support vector machine, discriminant analysis, random forest and ensemble learning) were employed to predict the changes of the three indices using all combinations of the 13 factors. A standard fivefold cross-validation procedure was used to evaluate prediction results. Results For LPI change prediction using support vector machine, wall thickness was the optimal single-factor predictor with an area under curve (AUC) of 0.883, and the optimal combinational-factor predictor achieved an AUC of 0.963. For CTI change prediction using discriminant analysis, minimum cap thickness was the optimal single-factor predictor with an AUC of 0.818, while the optimal combinational-factor predictor achieved an AUC of 0.836. Using random forest for predicting MPVI change, minimum cap thickness was the optimal single-factor predictor with an AUC of 0.785, and the optimal combinational-factor predictor achieved an AUC of 0.847. Conclusion This feasibility study demonstrated that machine learning methods could be used to accurately predict plaque vulnerability change based on morphological and biomechanical factors from multi-modality image-based FSI models. Large-scale studies are needed to verify our findings.
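The evaluation procedure in this abstract (fivefold cross-validation scored by AUC) can be sketched as follows. The abstract does not describe how the 45 slices were assigned to folds, so the shuffled split with a fixed seed is an assumption, and both function names are hypothetical. The AUC is computed from its rank interpretation: the fraction of positive/negative pairs in which the positive sample receives the higher score, with ties counted as one half.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and deal them into k roughly equal folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)  # fixed seed: an assumption, for repeatability
    return [idx[i::k] for i in range(k)]


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores.

    labels are 1 (positive) or 0 (negative); both classes must be present.
    """
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in pos
        for q in neg
    )
    return wins / (len(pos) * len(neg))
```

With 45 samples and five folds, each fold holds nine samples; each fold serves once as the test set while the model is trained on the remaining 36.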

