DEVELOPMENT OF RAINFALL FORECASTING MODEL USING MACHINE LEARNING WITH SINGULAR SPECTRUM ANALYSIS

2022 ◽  
Vol 23 (1) ◽  
pp. 172-186
Author(s):  
Pundru Chandra Shaker Reddy ◽  
Sucharitha Yadala ◽  
Surya Narayana Goddumarri

Agriculture is key to survival in developing nations such as India, and rainfall is critical to farming. Rainfall information helps in evaluating water resources, agriculture, ecosystems, and hydrology, so rainfall forecasting has become a major concern. Forecasts alert people in advance so that they can take precautions to protect their crop yields from severe rainfall. This study investigates the reliability of integrating a data pre-processing technique, singular spectrum analysis (SSA), with two supervised learning models, least-squares support vector regression (LS-SVR) and random forest (RF), for rainfall prediction. The combined frameworks, which integrate SSA with LS-SVR and RF, are compared with the conventional approaches (LS-SVR and RF alone). The frameworks were trained and tested on a monthly climate dataset split 80:20 into training and testing sets. Model performance was assessed using root mean square error (RMSE) and Nash–Sutcliffe efficiency (NSE), for which the proposed model yields 71.6% and 90.2%, respectively. Experimental results indicate that the proposed model can predict rainfall effectively.
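The evaluation described in the abstract above (an 80:20 split scored with RMSE and NSE) can be sketched as follows. This is a minimal illustration, not the authors' code: the chronological split, the function names, and the sample rainfall values are all assumptions.

```python
import math

def chronological_split(series, train_fraction=0.8):
    """Split a time-ordered series into a leading train part and a trailing test part."""
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]

def rmse(observed, predicted):
    """Root mean square error between paired observations and predictions."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def nse(observed, predicted):
    """Nash-Sutcliffe efficiency: 1 minus squared error over squared deviation from the observed mean."""
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    spread = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - residual / spread

# Hypothetical monthly rainfall values (mm), for illustration only.
rainfall = [12.0, 30.0, 85.0, 140.0, 210.0, 180.0, 95.0, 40.0, 15.0, 8.0]
train, test = chronological_split(rainfall)
```

An NSE of 1 indicates a perfect fit, and an NSE of 0 means the model does no better than always predicting the observed mean.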

2020 ◽  
Vol 12 (11) ◽  
pp. 1814
Author(s):  
Phamchimai Phan ◽  
Nengcheng Chen ◽  
Lei Xu ◽  
Zeqiang Chen

Tea is a cash crop that improves the quality of life for people in the Tanuyen District of Laichau Province, Vietnam. Tea yield, however, has stagnated in recent years due to changes in temperature and precipitation, the age of the tea bushes, and diseases. Monitoring tea bushes with remote sensing and Geographic Information Systems (GIS) might help alleviate this problem. Using multi-temporal remote sensing data, the paper investigates changes in tea health and yield forecasting through the normalized difference vegetation index (NDVI). In this study, we used NDVI as a support tool to demonstrate temporal and spatial changes by extracting the tea NDVI values and calculating the mean NDVI value. The results showed that the minimum NDVI value was 0.42, during January 2013 and February 2015 and 2016. The maximum NDVI value occurred in August 2015 and June 2017. The linear relationship between NDVI value and mean temperature was strong, with R² = 0.79. Our results confirm that combining meteorological data and NDVI data can achieve high yield-prediction performance. Three models were used to predict tea yield: support vector machine (SVM), random forest (RF), and the traditional linear regression model (TLRM). For the period 2009 to 2018, the RF model gave the best tea yield prediction with R² = 0.73, compared with 0.66 for SVM and 0.57 for the TLRM. Three evaluation indicators were used to assess accuracy: the coefficient of determination (R²), root-mean-square error (RMSE), and percentage error of tea yield (PETY). The highest accuracy for the three models was in 2015, with R² ≥ 0.87, RMSE < 50 kg/ha, and PETY below 3%. In the other years, the prediction accuracy was higher in the SVM and RF models. Meanwhile, the RF algorithm performed better on PETY (≤10%), and its root mean square error was significantly lower (≤80 kg/ha). The TLRM showed relatively good RMSE and PETY values, with an RMSE from 80 to 100 kg/ha and a PETY from 8 to 15%.
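For readers unfamiliar with NDVI, the index is conventionally computed per pixel from near-infrared and red reflectance as (NIR − Red) / (NIR + Red). The sketch below shows that formula and a plot-level mean; the function names, the zero-signal handling, and the reflectance values are hypothetical and do not come from the study.

```python
def ndvi(nir, red):
    """NDVI for one pixel from near-infrared and red reflectance."""
    total = nir + red
    if total == 0:
        return 0.0  # assumption: treat a zero-signal pixel as NDVI 0
    return (nir - red) / total

def mean_ndvi(pixels):
    """Mean NDVI over an iterable of (nir, red) reflectance pairs."""
    values = [ndvi(nir, red) for nir, red in pixels]
    return sum(values) / len(values)

# Hypothetical reflectance pairs for a tea plot; not data from the study.
tea_pixels = [(0.50, 0.10), (0.45, 0.15), (0.60, 0.20)]
plot_mean = mean_ndvi(tea_pixels)
```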


2017 ◽  
Vol 71 (11) ◽  
pp. 2427-2436 ◽  
Author(s):  
Mi Lei ◽  
Long Chen ◽  
Bisheng Huang ◽  
Keli Chen

In this research paper, a fast, quantitative, analytical model for magnesium oxide (MgO) content in the medicinal mineral talcum was explored based on near-infrared (NIR) spectroscopy. The MgO content of each sample was determined by ethylenediaminetetraacetic acid (EDTA) titration and taken as the reference value for NIR spectroscopy, and a variety of spectral data processing methods were then compared to establish a good NIR spectroscopy model. To start, 50 batches of talcum samples were divided into a training set and a test set using the Kennard–Stone (K-S) algorithm. In a partial least squares regression (PLSR) model, both leave-one-out cross-validation (LOOCV) and training set validation (TSV) were used to screen spectrum preprocessing methods, including multiplicative scatter correction (MSC), and the standard normal variate transformation (SNV) was finally chosen as the optimal pretreatment method. The modeling spectrum bands and ranks were optimized using the PLSR method, and the characteristic spectrum ranges were determined as 11995–10664, 7991–6661, and 4326–3999 cm⁻¹, with four optimal ranks. In the support vector machine (SVM) model, the radial basis function (RBF) kernel was used. Three kinds of data were taken as SVM input variables: the full spectrum data of samples pretreated with SNV, the characteristic spectrum data screened using synergy interval partial least squares (SiPLS), and the scoring data of the first four ranks obtained by a partial least squares (PLS) dimension reduction of the characteristic spectrum. The MgO content reference values of the samples were taken as output values. In addition, the internal parameters of the SVM model were optimized using the grid optimization method (GRID), particle swarm optimization (PSO), and a genetic algorithm (GA), so that the optimal C and g values were determined and the validation model was established. A comprehensive comparison of the validation results of the different models shows that the optimal NIR spectroscopy quantitative model for talc was the PLS-SVM regression model established using GRID, which takes as SVM input the scoring data of the first four ranks obtained by PLS dimension reduction of the characteristic spectrum. This PLS-SVM regression model (rank = 4) measured the MgO content of talcum in the range of 17.42–33.22%, with a root mean square error of cross validation (RMSECV) of 2.2127%, a root mean square error of calibration (RMSEC) of 0.6057%, and a root mean square error of prediction (RMSEP) of 1.2901%. The model showed high accuracy and strong prediction capacity and can be used for rapid prediction of MgO content in talcum.
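The SNV pretreatment named in the abstract above standardizes each spectrum by its own mean and standard deviation, which removes multiplicative intensity differences between samples. A minimal sketch follows; the use of the sample standard deviation and the absorbance values are assumptions, not details taken from the paper.

```python
import statistics

def snv(spectrum):
    """Standard normal variate: center and scale one spectrum by its own mean and std."""
    mean = statistics.fmean(spectrum)
    std = statistics.stdev(spectrum)  # sample standard deviation (n - 1), an assumption
    return [(value - mean) / std for value in spectrum]

# Hypothetical absorbance readings; not data from the study.
raw_spectra = [
    [0.20, 0.25, 0.40, 0.35],
    [0.40, 0.50, 0.80, 0.70],  # same shape as the first, doubled intensity
]
corrected = [snv(s) for s in raw_spectra]
```

After correction the two example spectra coincide, because the second differs from the first only by a constant scale factor.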


2020 ◽  
Vol 81 (5) ◽  
pp. 1090-1098
Author(s):  
Chen Xin ◽  
Xueqing Shi ◽  
Dongsheng Wang ◽  
Chong Yang ◽  
Qian Li ◽  
...  

The real-time estimation of effluent indices of papermaking wastewater is vital to environmental conservation. Ensemble methods have significant advantages over conventional single models in terms of prediction accuracy. As an ensemble method, multi-grained cascade forest (gcForest) is implemented for the prediction of wastewater indices. Compared with conventional modeling methods, including partial least squares, support vector regression, and artificial neural networks, the gcForest model shows superior prediction of effluent suspended solids (SSeff) and effluent chemical oxygen demand (CODeff). For SSeff, gcForest achieves the highest correlation coefficient, 0.86, and the lowest root-mean-square error (RMSE), 0.41. Compared with the conventional models, gcForest reduces the RMSE by approximately 46.05% to 50.60%. For CODeff, gcForest achieves the highest correlation coefficient, 0.83, and the lowest RMSE, 4.05. Compared with the conventional models, gcForest reduces the RMSE by approximately 10.60% to 18.51%.


2020 ◽  
Vol 9 (8) ◽  
pp. 479
Author(s):  
Viet-Ha Nhu ◽  
Himan Shahabi ◽  
Ebrahim Nohani ◽  
Ataollah Shirzadi ◽  
Nadhir Al-Ansari ◽  
...  

Zrebar Lake is one of the largest freshwater lakes in Iran and plays an important role in the local ecosystem, so its desiccation has a negative impact on the surrounding ecosystem. The lake also provides an attractive recreational setting for ecotourism. Predicting and forecasting the water level of the lake with simple but practical methods can provide a reliable tool for future lake water resource management. In the present study, we predict the daily water level of Zrebar Lake through well-known decision tree-based algorithms, including the M5 pruned (M5P), random forest (RF), random tree (RT), and reduced error pruning tree (REPT). We used five different water input combinations to find the most effective one. For our modeling, we chose 70% of the dataset for training (from 2011 to 2015) and 30% for model evaluation (from 2015 to 2017). We evaluated the models' performances using quantitative measures (root mean square error (RMSE), mean absolute error (MAE), coefficient of determination (R²), percent bias (PBIAS), and the ratio of the root mean square error to the standard deviation of measured data (RSR)) and visual frameworks (Taylor diagram and box plot). Our results showed that the water level with a one-day lag time had the greatest effect on the result and that this effect decreased as the lag time increased. All the developed models had good prediction capability, but the M5P model outperformed the others, followed by RF and RT equally and then REPT. Our results showed that these algorithms can predict water level accurately with only the one-day-lagged water level as an input and that they are cost-effective tools for future predictions.
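To make the lagged-input idea and two of the less common metrics concrete, here is a small sketch. The function names are hypothetical, and the PBIAS sign convention used here (positive means the model underestimates) is an assumption, since conventions differ between references.

```python
import math

def lagged_pairs(levels, lag=1):
    """Pair each water level with the level observed `lag` days earlier: (input, target)."""
    return [(levels[i - lag], levels[i]) for i in range(lag, len(levels))]

def rsr(observed, predicted):
    """RMSE divided by the standard deviation of the observations."""
    mean_obs = sum(observed) / len(observed)
    sq_err = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sq_dev = sum((o - mean_obs) ** 2 for o in observed)
    return math.sqrt(sq_err) / math.sqrt(sq_dev)

def pbias(observed, predicted):
    """Percent bias; with this sign convention, positive means underestimation."""
    return 100.0 * sum(o - p for o, p in zip(observed, predicted)) / sum(observed)

# Hypothetical daily water levels (m); not data from the study.
levels = [10.0, 10.2, 10.1, 10.4, 10.3]
pairs = lagged_pairs(levels, lag=1)
```

Each pair can be fed to any regressor as one input feature and one target; adding further lags would simply widen the input side of each pair.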


2020 ◽  
Vol 20 (3) ◽  
pp. 1016-1034
Author(s):  
Zhongda Tian

The accurate prediction of crop water requirement is of great significance for the development of regional agriculture. A combined prediction approach for crop water requirement based on the wavelet transform is proposed. First, the Mallat wavelet transform algorithm is used to decompose and reconstruct the crop water requirement series, yielding the approximate and detail components of the original series. The characteristics of the approximate and detail components are analyzed with the Hurst index. Then, according to the different characteristics of the components, a support vector machine optimized by the particle swarm optimization algorithm is used to predict the approximate component, and the autoregressive moving average model is used to predict the detail components. Three-fold cross-validation is used to improve the generalization ability of the forecasting model. Finally, the predictions of the individual models are combined to obtain the final predicted crop water requirement. Crop water requirement data from 1983 to 2018 in Liaoning Province, China, are used as the research object. The simulation results indicate that the proposed combined approach predicts crop water requirement with high accuracy. The comparison of performance indicators shows that the proposed approach reduced the root mean square error by 45.40% to 57.16%, the mean absolute error by 32.96% to 52.07%, the mean absolute percentage error by 33.02% to 52.37%, the relative root mean square error by 45.26% to 57.38%, the sum of squared errors by 70.18% to 80.42%, and the Theil inequality coefficient by 59.02% to 80.77%. R squared increased by 16.46% to 54.77%, and the index of agreement increased by 3.82% to 23.37%. The results of Pearson's test and the DM test show that the association between the actual and predicted values of crop water requirement is stronger. Moreover, the proposed approach has higher reliability at the same confidence level. The effectiveness of the proposed prediction approach for crop water requirement is thus verified. The approach is of great significance for the rational use, planning, and management of water resources and for promoting sustainable social and economic development.
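The decompose, predict, and recombine structure described above can be illustrated with the simplest possible wavelet. The sketch below uses a single-level Haar decomposition, which is an assumption made for brevity: the abstract names the Mallat algorithm but not the wavelet or the number of levels, and the per-component predictors (PSO-optimized SVM and ARMA) are omitted. All names and values are hypothetical.

```python
def haar_components(series):
    """Split an even-length series into approximate and detail component series.

    Each pair (x[2k], x[2k+1]) contributes its mean to the approximate
    component and its signed half-difference to the detail component.
    """
    if len(series) % 2:
        raise ValueError("series length must be even")
    approx, detail = [], []
    for k in range(0, len(series), 2):
        mean = (series[k] + series[k + 1]) / 2.0
        half_diff = (series[k] - series[k + 1]) / 2.0
        approx += [mean, mean]
        detail += [half_diff, -half_diff]
    return approx, detail

def combine(approx_forecast, detail_forecast):
    """Final forecast is the element-wise sum of the component forecasts."""
    return [a + d for a, d in zip(approx_forecast, detail_forecast)]

# Hypothetical crop water requirement values; not data from the study.
series = [4.0, 6.0, 5.0, 9.0]
approx, detail = haar_components(series)
```

Because the two components sum back to the original series, forecasting each one separately and adding the forecasts gives a forecast of the original quantity.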


Many factors have led to the increase of suicide-proneness in the present era. As a consequence, many novel methods have been proposed in recent times for prediction of the probability of suicides, using different metrics. The current work reviews a number of models and techniques proposed recently, and offers a novel Bayesian machine learning (ML) model for prediction of suicides, involving classification of the data into separate categories. The proposed model is contrasted against similar computationally-inexpensive techniques such as spline regression. The model is found to generate appreciably accurate results for the dataset considered in this work. The application of Bayesian estimation allows the prediction of causation to a greater degree than the standard spline regression models, which is reflected by the comparatively low root mean square error (RMSE) for all estimates obtained by the proposed model.


2018 ◽  
Vol 4 (1) ◽  
Author(s):  
Agustian Noor

Earthquakes are a periodic natural phenomenon that occurs throughout the world, caused by tide-generating forces that originate mainly from the sun and the moon. The aim of this study is to analyze earthquake results in North Sumatra. The proposed method compares SVM and SVM-PSO using data from the relevant agencies, specifically for the North Sumatra region. Each algorithm is implemented using RapidMiner 5.1. Performance is measured by computing the average error via the root mean square error (RMSE). The smaller the value of each performance parameter, the closer the predicted values are to the actual values. In this way, the more accurate algorithm can be identified.


Author(s):  
Parveen Bhola ◽  
Saurabh Bhardwaj

Many applications, including power trading and planning, require accurate real-time estimation of solar power. Because the power output of solar panels degrades over time, real-time estimation is difficult without a degradation parameter. In the proposed method, the effect of degradation, expressed as a performance ratio, is incorporated along with other meteorological parameters. The degradation is calculated in real time using a clustering-based technique, without physical inspection on site. Initially, the power is estimated using a Support Vector Regression (SVR) model with the meteorological parameters. The estimate is then fine-tuned in sync with the degradation rate. The model is validated on real data (meteorological parameters and solar power) procured from the solar plant. After refinement, the estimation results show significant improvement in terms of statistical measures: the estimation accuracy in terms of the coefficient of determination (R²) is 92%, and the error metrics normalized root mean square error (NRMSE), mean absolute percentage error (MAPE), and root mean square error (RMSE) are 7.13, 5.92, and 14.54, respectively.
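Two of the error metrics reported above can be sketched as follows. The abstract does not say how the normalized RMSE is normalized, so dividing by the observed range is an assumption here (the mean is another common choice), and the function names are hypothetical.

```python
import math

def mape(observed, predicted):
    """Mean absolute percentage error, in percent; observed values must be nonzero."""
    terms = [abs((o - p) / o) for o, p in zip(observed, predicted)]
    return 100.0 * sum(terms) / len(terms)

def nrmse(observed, predicted):
    """RMSE as a percentage of the observed range (the normalization is an assumption)."""
    n = len(observed)
    rmse = math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)
    return 100.0 * rmse / (max(observed) - min(observed))
```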


2018 ◽  
Vol 14 (2) ◽  
pp. 225
Author(s):  
Indriyanti Indriyanti ◽  
Agus Subekti

Rising building energy consumption has prompted researchers to build prediction models using machine learning methods, but the most accurate model is not yet known. Predictive models for commercial building energy consumption are important for energy conservation. With the right model, we can design buildings that use energy more efficiently. In this paper, we propose a predictive model based on machine learning methods to obtain the best model for predicting total energy consumption. The algorithms used are SMOreg and LibSVM, both from the support vector machine class, and the models are evaluated on mean absolute error (MAE) and root mean square error (RMSE). Using a publicly available dataset, we develop models based on support vector machines for regression. Tests of the two algorithms show that SMOreg is more accurate, with MAE and RMSE of 4.70 and 10.15, whereas the LibSVM model has MAE and RMSE of 9.37 and 14.45. We propose the method based on the SMOreg algorithm because it performs better.


2021 ◽  
Vol 2108 (1) ◽  
pp. 012067
Author(s):  
Ke Chen ◽  
Hongkai Wang ◽  
Zhangchi Ying ◽  
Chengxin Zhang ◽  
Jiaqi Wang

To address the high root mean square error of traditional online cleaning of anomalous power grid energy data, an online cleaning method based on an improved random forest is designed. First, an isolation forest outlier recognition model is designed to identify outliers in the data. Second, an improved random forest regression model is established to improve the adaptability of random forest to mixed abnormal data, and the data trend is fitted and predicted. Finally, the improved random forest data cleaning method is used to fill in the data missing after the mixed abnormal data are removed, thereby cleaning the abnormal energy data of the power grid. The experimental results show that as the amount of anomalous power grid energy data increases, the cleaning root mean square error of the experimental group is significantly lower than that of the control group. The method in this paper addresses the high root mean square error of traditional online cleaning of anomalous power grid energy data.

