A novel prediction algorithm for multivariate data sets

2021 ◽  
Vol 4 (2) ◽  
pp. 225-240
Author(s):  
Pinki Sagar ◽  
Prinima Gupta ◽  
Rohit Tanwar ◽  
...  

Regression analysis is a statistical technique that is most commonly used for forecasting. Data sets are becoming very large due to continuous transactions in today's fast-paced world, which makes the data difficult to manage and interpret. Not all independent variables can be considered for prediction, because maintaining the full data set is costly. A novel prediction algorithm is implemented in this paper. It emphasizes extracting the most effective independent variables from the many variables of the data set. Variables are selected on the basis of mean square error (MSE) and the coefficient of determination r2p; the final prediction equation is then framed on the basis of deviation of actual mean. This statistically based prediction algorithm is evaluated on four parameters: root mean square error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE) and residuals. The algorithm has been implemented for a multivariate data set with low maintenance and preprocessing costs and with lower root mean square error and residuals. The proposed algorithm can also be used for one-dimensional, two-dimensional, frequent stream, time series and continuous data. The algorithm is intended to enhance forecasting accuracy and minimize the average error rate.
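The four evaluation parameters named in this abstract have standard textbook definitions, which can be sketched as follows. This is a minimal illustration only: the function name `error_metrics` and the sample values are hypothetical, and the paper's own variable-selection procedure is not reproduced here.

```python
import math

def error_metrics(actual, predicted):
    """Return residuals, RMSE, MAE and MAPE (in percent) for paired values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    # Residual = actual value minus predicted value
    residuals = [a - p for a, p in zip(actual, predicted)]
    n = len(residuals)
    rmse = math.sqrt(sum(r * r for r in residuals) / n)
    mae = sum(abs(r) for r in residuals) / n
    # MAPE divides by the actual values, so it is undefined if any of them is zero
    mape = 100.0 * sum(abs(r / a) for r, a in zip(residuals, actual)) / n
    return {"residuals": residuals, "rmse": rmse, "mae": mae, "mape": mape}

# Hypothetical data, for illustration only
actual = [10.0, 20.0, 40.0]
predicted = [12.0, 18.0, 40.0]
metrics = error_metrics(actual, predicted)
print(metrics)
```

RMSE penalizes large residuals more heavily than MAE, while MAPE expresses the error relative to the size of each actual value.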

2017 ◽  
Vol 48 (1) ◽  
Author(s):  
Josana Andreia Langner ◽  
Nereu Augusto Streck ◽  
Angelica Durigon ◽  
Stefanía Dalmolin da Silva ◽  
Isabel Lago ◽  
...  

ABSTRACT: The objective of this study was to compare the simulations of leaf appearance of landrace and improved maize cultivars using the CSM-CERES-Maize (linear) and the Wang and Engel (nonlinear) models. The coefficients of the models were calibrated using a data set of total leaf number collected from the 11/04/2013 sowing date for the landrace varieties ‘Cinquentinha’ and ‘Bico de Ouro’ and the simple hybrid ‘AS 1573PRO’. For the ‘BRS Planalto’ variety, model coefficients were estimated with data from the 12/13/2014 sowing date. The models were evaluated with independent data sets collected during the growing seasons of 2013/2014 (Experiment 1) and 2014/2015 (Experiment 2) in Santa Maria, RS, Brazil. Total number of leaves for both landrace and improved maize varieties was better estimated with the Wang and Engel model, with a root mean square error of 1.0 leaf, while estimations with the CSM-CERES-Maize model had a root mean square error of 1.5 leaves.


Author(s):  
OCTAVIANUS BUDI SANTOSA ◽  
MICHAEL RAHARJA GANI ◽  
SRI HARTATI YULIANI

Objective: The objective of this study was to develop a UV spectroscopy method in combination with multivariate analysis for determining vitexin in binahong (Anredera cordifolia (Ten.) Steenis) leaves extract. Methods: Partial least squares (PLS) regression and principal component regression (PCR) were performed in this study to evaluate several statistical performance measures: coefficient of determination (R2), root mean square error of calibration (RMSEC), root mean square error of cross-validation (RMSECV), root mean square error of prediction (RMSEP) and relative error of prediction (REP). Cross-validation was performed using the leave-one-out technique. Results: The R2 values of the calibration data sets from the PLS and PCR methods were 0.9675 and 0.9648, respectively. The low values of RMSEC and RMSECV for both the PLS and PCR methods indicated minimal error in the calibration models. The R2 values of the validation data sets from the PLS and PCR methods were 0.9778 and 0.9820, respectively. The low values of RMSEP for both the PLS and PCR methods indicated minimal prediction error from the calibration data sets. Multivariate calibration techniques were applied to determine the content of vitexin in binahong leaves extract. Predicted values from the multivariate calibration models were compared to the actual values determined with a validated HPLC method. The PLS models gave lower REP values than the PCR models. Conclusion: The chemometrics technique can be applied as an alternative method for determining vitexin levels in the ethanol solution of binahong leaves extract.
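Leave-one-out cross-validation, as used above to obtain RMSECV, can be sketched as follows. To stay self-contained, this example swaps the PLS and PCR models of the study for a single-predictor ordinary least-squares line; the function names and the absorbance/concentration values are hypothetical and are not taken from the paper.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def rmsecv_leave_one_out(xs, ys):
    """Hold out each sample once, refit on the rest, and pool the squared errors."""
    squared_errors = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        prediction = slope * xs[i] + intercept
        squared_errors.append((ys[i] - prediction) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical calibration data, for illustration only
absorbance = [0.10, 0.20, 0.30, 0.40, 0.50]
concentration = [1.0, 2.1, 2.9, 4.2, 4.8]
rmsecv = rmsecv_leave_one_out(absorbance, concentration)
print(rmsecv)
```

Because every prediction is made for a sample the model did not see during fitting, RMSECV is usually somewhat larger than RMSEC computed on the same data.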


2005 ◽  
Vol 68 (11) ◽  
pp. 2301-2309 ◽  
Author(s):  
DANILO T. CAMPOS ◽  
BRADLEY P. MARKS ◽  
MARK R. POWELL ◽  
MARK L. TAMPLIN

The robustness of a microbial growth model must be assessed before the model can be applied to new food matrices; therefore, a methodology for quantifying robustness was developed. A robustness index (RI) was computed as the ratio of the standard error of prediction to the standard error of calibration for a given model, where the standard error of calibration was defined as the root mean square error of the growth model against the data (log CFU per gram versus time) used to parameterize the model and the standard error of prediction was defined as the root mean square error of the model against an independent data set. This technique was used to evaluate the robustness of a broth-based model for aerobic growth of Escherichia coli O157:H7 (in the U.S. Department of Agriculture Agricultural Research Service Pathogen Modeling Program) in predicting growth in ground beef under different conditions. Comparison against previously published data (132 data sets with 1,178 total data points) from experiments in ground beef under various experimental conditions (4.8 to 45°C and pH 5.5 to 5.9) yielded RI values ranging from 0.11 to 2.99. The estimated overall RI was 1.13. At temperatures between 15 and 40°C, the RI was close to and smaller than 1, indicating that the growth model is relatively robust in that temperature range. However, the RI also was related (P < 0.05) to temperature. By quantifying the predictive accuracy relative to the expected accuracy, the RI could be a useful tool for comparing various models under different conditions.
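The robustness index defined above is a ratio of two root mean square errors, which can be sketched as follows. The function names and the log CFU per gram values are hypothetical; only the definition of RI comes from the abstract.

```python
import math

def rmse(observed, predicted):
    """Root mean square error between paired observations and predictions."""
    return math.sqrt(
        sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    )

def robustness_index(cal_obs, cal_pred, ind_obs, ind_pred):
    """RI = standard error of prediction / standard error of calibration."""
    sec = rmse(cal_obs, cal_pred)  # error against the data used to fit the model
    sep = rmse(ind_obs, ind_pred)  # error against an independent data set
    return sep / sec

# Hypothetical log CFU/g values, for illustration only
cal_obs, cal_pred = [3.0, 4.0, 5.0], [3.2, 3.8, 5.2]
ind_obs, ind_pred = [3.0, 4.5, 6.0], [3.3, 4.2, 6.3]
ri = robustness_index(cal_obs, cal_pred, ind_obs, ind_pred)
print(ri)
```

An RI near 1 means the model predicts the independent data about as well as it fits its own calibration data; values well above 1 indicate a loss of accuracy in the new conditions.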


2019 ◽  
Vol 50 (3) ◽  
pp. 120-126
Author(s):  
Homayoon Ganji ◽  
Takamitsu Kajisa

Estimation of reference evapotranspiration (ET0) with the Food and Agricultural Organisation (FAO) Penman-Monteith model requires temperature, relative humidity, solar radiation, and wind speed data. The lack of availability of the complete data set at some meteorological stations is a severe restriction for the application of this model. To overcome this problem, ET0 can be calculated using alternative data, which can be obtained via procedures proposed in FAO paper No.56. To confirm the validity of reference evapotranspiration calculated using alternative data (ET0(Alt)), the root mean square error (RMSE) needs to be estimated; lower values of RMSE indicate better validity. However, RMSE does not explain the mechanism of error formation in a model equation; explaining the mechanism of error formation is useful for future model improvement. Furthermore, for calculating RMSE, ET0 calculations based on both complete and alternative data are necessary. An error propagation approach was introduced in this study both for estimating RMSE and for explaining the mechanism of error formation by using data from a 30-year period from 48 different locations in Japan. From the results, RMSE was confirmed to be proportional to the value produced by the error propagation approach (ΔET0). Therefore, the error propagation approach is applicable to estimating the RMSE of ET0(Alt) in the range of 12%. Furthermore, the error of ET0(Alt) is not only related to the variables’ uncertainty but also to the combination of the variables in the equation.
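The abstract does not state the propagation formula it uses. As an illustration only, the sketch below applies the textbook first-order formula for independent input errors, Δf ≈ sqrt(Σ (∂f/∂x_i · Δx_i)²), with numerical derivatives. The function `propagate_error` and the two-variable `toy_model` are hypothetical stand-ins; this is not the FAO Penman-Monteith equation and may differ from the authors' formulation.

```python
import math

def propagate_error(func, values, uncertainties, rel_step=1e-6):
    """First-order error propagation, assuming independent input errors."""
    total = 0.0
    for i, (value, uncertainty) in enumerate(zip(values, uncertainties)):
        step = rel_step * max(abs(value), 1.0)
        upper = list(values)
        lower = list(values)
        upper[i] = value + step
        lower[i] = value - step
        # Central-difference estimate of the partial derivative for input i
        derivative = (func(*upper) - func(*lower)) / (2.0 * step)
        total += (derivative * uncertainty) ** 2
    return math.sqrt(total)

def toy_model(a, b):
    # Hypothetical model in which the inputs are combined by multiplication
    return a * b

delta = propagate_error(toy_model, [2.0, 3.0], [0.1, 0.2])
print(delta)
```

In the toy model the sensitivity to `a` equals the value of `b` and vice versa, which illustrates the abstract's closing point: the propagated error depends on how the variables are combined in the equation, not only on their individual uncertainties.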


2003 ◽  
Vol 57 (3) ◽  
pp. 309-316 ◽  
Author(s):  
Kelly J. Anderson ◽  
John H. Kalivas

Recent work has shown that ridge regression (RR) is Pareto to partial least squares (PLS) and principal component regression (PCR) when the variance indicator Euclidean norm of the regression coefficients, ‖p̂‖, is plotted against the bias indicator root mean square error of calibration (RMSEC). Simplex optimization demonstrates that RR is Pareto for several other spectral data sets when ‖p̂‖ is used with RMSEC and the root mean square error of evaluation (RMSEE) as optimization criteria. From this investigation, it was observed that while RR is Pareto optimal, PLS and PCR harmonious models are nearly equivalent to harmonious RR models. Additionally, it was found that RR is Pareto robust, i.e., models formed at one temperature were then used to predict samples at another temperature. Wavelength selection is commonly performed to improve analysis results such that bias indicators RMSEC, RMSEE, root mean square error of validation, or root mean square error of cross-validation decrease using a subset of wavelengths. Just as critical to an analysis of selected wavelengths is an assessment of variance. Using wavelengths deemed optimal in a previous study, this paper reports on the variance/bias tradeoff. An approach that forms the Pareto model with a Pareto wavelength subset is suggested.
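Pareto optimality with respect to a variance indicator and a bias indicator can be sketched as follows: a model is on the Pareto front if no other model is at least as good on both criteria and strictly better on one. The function name, model labels and numbers are hypothetical; the simplex optimization used in the paper is not reproduced.

```python
def pareto_front(models):
    """Return names of models not dominated on (coefficient norm, RMSEC).

    `models` is a list of (name, coef_norm, rmsec) tuples; both criteria
    are treated as quantities to minimize.
    """
    front = []
    for name, norm, rmsec in models:
        dominated = any(
            (other_norm <= norm and other_rmsec <= rmsec)
            and (other_norm < norm or other_rmsec < rmsec)
            for _, other_norm, other_rmsec in models
        )
        if not dominated:
            front.append(name)
    return front

# Hypothetical candidate models, for illustration only
candidates = [
    ("A", 1.0, 0.50),
    ("B", 2.0, 0.30),
    ("C", 2.5, 0.35),  # worse than B on both criteria
    ("D", 4.0, 0.20),
]
front = pareto_front(candidates)
print(front)
```

Models on the front represent different variance/bias tradeoffs; choosing among them requires an additional preference, such as the "harmonious" compromise mentioned in the abstract.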


Author(s):  
DJR Laurence ◽  
Susan C Kirkland ◽  
Morag L Ellison

A method of estimating doses from multiple dilutions of an unknown sample in radioimmunoassay is described. It uses a computer program to minimise the root mean square error about a standard curve. The confidence limits of the estimates were evaluated from the sum of squares error as a function of dose. An attached subroutine reported and suppressed points in the data set that showed unexpectedly large errors of fit. The program agreed well with an alternative point-by-point evaluation for most practical data but also responded robustly to errors at the limits of the assay curve. By studying patterns of point rejection, it is possible to build up information about the relation between the standard curve parameters and those in the unknown sets. The method is illustrated by studies of laboratory-based materials and by an assay of alphafetoprotein in samples of body fluids.


2021 ◽  
Vol 9 (1) ◽  
pp. 1-21
Author(s):  
Kayode Oshinubi ◽  
Augustina Amakor ◽  
Olumuyiwa James Peter ◽  
Mustapha Rachdi ◽  
...  

This article focuses on the application of deep learning and spectral analysis to epidemiology time series data, which has recently piqued the interest of some researchers. The COVID-19 virus is still mutating, particularly the delta and omicron variants, which are known for their high level of contagiousness, but policymakers and governments are resolute in combating the pandemic's spread through a recent massive vaccination campaign of their population. We used extreme learning machine (ELM), multilayer perceptron (MLP), long short-term memory (LSTM), gated recurrent unit (GRU), convolutional neural network (CNN) and deep neural network (DNN) methods on time series data from the start of the pandemic in France, Russia, Turkey, India, the United States of America (USA), Brazil and the United Kingdom (UK) until September 3, 2021 to predict the daily new cases and daily deaths at different waves of the pandemic in the countries considered, using root mean square error (RMSE) and relative root mean square error (rRMSE) to measure the performance of these methods. We used the spectral analysis method to convert time (days) to frequency in order to analyze the peaks of frequency and periodicity of the time series data. We also forecasted the future pandemic evolution by using ELM, MLP, and spectral analysis. MLP achieved the best performance for both daily new cases and deaths based on the evaluation metrics used. Furthermore, we discovered that errors for daily deaths are much lower than those for daily new cases. While the performance of models varies, prediction and forecasting during the period of vaccination and recent cases confirm the pandemic's prevalence level in the countries under consideration. Finally, some of the peaks observed in the time series data correspond with the proven pattern of weekly peaks that is unique to the COVID-19 time series data.
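Converting a daily series from time to frequency in order to find its periodicity can be sketched with a plain discrete Fourier transform. The example below is an assumption-laden illustration: the function name `dominant_period` is hypothetical, the series is synthetic with a built-in 7-day cycle, and the authors' actual spectral method and data are not reproduced.

```python
import cmath
import math

def dominant_period(series):
    """Return the period (in samples) of the strongest non-zero DFT frequency."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]  # remove the constant component
    best_k, best_power = None, -1.0
    for k in range(1, n // 2 + 1):
        coefficient = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        power = abs(coefficient) ** 2
        if power > best_power:
            best_k, best_power = k, power
    return n / best_k

# Synthetic daily counts with a weekly cycle, for illustration only
days = 56
cases = [100 + 20 * math.sin(2 * math.pi * t / 7) for t in range(days)]
period = dominant_period(cases)
print(period)
```

On real case counts the spectrum would contain several peaks and noise, so the strongest peak alone may not tell the whole story.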


2018 ◽  
Vol 0 (0) ◽  
Author(s):  
Abdelhalim Rabehi ◽  
Ali Djebbari ◽  
Ahmed Hafaifa ◽  
Abdelkerim Souahlia ◽  
Abdelmalik Taleb-Ahmed

In this paper, artificial neural network-based adaptive optimal threshold estimation for a two-dimensional optical code division multiple access conventional correlation receiver is proposed. A multilayer perceptron neural network with back-propagation learning algorithm is considered. This estimator uses the weight (w) and the length (F) of the code word, the number of active users (Ν) and the signal to noise ratio as inputs to estimate the required optimal threshold. We have evaluated the proposed approach on a data set of 46,200 samples. We have found that it gives accurate results: 0.029 for the root mean square error, 0.37% for the relative root mean square error and 99.984% for the correlation coefficient (R), which reflects the efficiency of the proposed optimal threshold estimator.


2021 ◽  
Vol 13 (9) ◽  
pp. 1630
Author(s):  
Yaohui Zhu ◽  
Guijun Yang ◽  
Hao Yang ◽  
Fa Zhao ◽  
Shaoyu Han ◽  
...  

With the increase in the frequency of extreme weather events in recent years, apple growing areas in the Loess Plateau frequently encounter frost during flowering. Accurately assessing the frost loss in orchards during the flowering period is of great significance for optimizing disaster prevention measures, market apple price regulation, agricultural insurance, and government subsidy programs. The previous research on orchard frost disasters is mainly focused on early risk warning. Therefore, to effectively quantify orchard frost loss, this paper proposes a frost loss assessment model constructed using meteorological and remote sensing information and applies this model to the regional-scale assessment of orchard fruit loss after frost. As an example, this article examines a frost event that occurred during the apple flowering period in Luochuan County, Northwestern China, on 17 April 2020. A multivariable linear regression (MLR) model was constructed based on the orchard planting years, the number of flowering days, and the chill accumulation before frost, as well as the minimum temperature and daily temperature difference on the day of frost. Then, the model simulation accuracy was verified using the leave-one-out cross-validation (LOOCV) method, and the coefficient of determination (R2), the root mean square error (RMSE), and the normalized root mean square error (NRMSE) were 0.69, 18.76%, and 18.76%, respectively. Additionally, the extended Fourier amplitude sensitivity test (EFAST) method was used for the sensitivity analysis of the model parameters. The results show that the simulated apple orchard fruit number reduction ratio is highly sensitive to the minimum temperature on the day of frost, and the chill accumulation and planting years before the frost, with sensitivity values of ≥0.74, ≥0.25, and ≥0.15, respectively. 
This research can not only assist governments in optimizing traditional orchard frost prevention measures and market price regulation but can also provide a reference for agricultural insurance companies to formulate plans for compensation after frost.


Forests ◽  
2021 ◽  
Vol 12 (8) ◽  
pp. 1020
Author(s):  
Yanqi Dong ◽  
Guangpeng Fan ◽  
Zhiwu Zhou ◽  
Jincheng Liu ◽  
Yongguo Wang ◽  
...  

The quantitative structure model (QSM) contains the branch geometry and attributes of the tree. AdQSM is a new, accurate, and detailed tree QSM. In this paper, an automatic modeling method based on AdQSM is developed, and a low-cost technical scheme of tree structure modeling is provided, so that AdQSM can be freely used by more people. First, we used two digital cameras to collect two-dimensional (2D) photos of trees, generated three-dimensional (3D) point clouds of the plot, and segmented individual trees from the plot point clouds. Then a new QSM, AdQSM, was used to construct tree models from the point clouds of 44 trees. Finally, to verify the effectiveness of our method, the diameter at breast height (DBH), tree height, and trunk volume were derived from the reconstructed tree models. These parameters extracted from AdQSM were compared with the reference values from forest inventory. For the DBH, the relative bias (rBias), root mean square error (RMSE), and coefficient of variation of root mean square error (rRMSE) were 4.26%, 1.93 cm, and 6.60%. For the tree height, the rBias, RMSE, and rRMSE were −10.86%, 1.67 m, and 12.34%. The determination coefficients (R2) between the DBH and tree height estimated by AdQSM and the reference values were 0.94 and 0.86. We used the trunk volume calculated by the allometric equation as a reference value to test the accuracy of AdQSM. For the trunk volume estimated with AdQSM, the bias was 0.07066 m³, the rBias was 18.73%, the RMSE was 0.12369 m³, and the rRMSE was 32.78%. To better evaluate the accuracy of QSM reconstruction of the trunk volume, we compared AdQSM and TreeQSM on the same dataset. For the trunk volume estimated with TreeQSM, the bias was −0.05071 m³, the rBias was −13.44%, the RMSE was 0.13267 m³, and the rRMSE was 35.16%. At the 95% confidence interval level, the concordance correlation coefficient (CCC = 0.77) of the agreement between the tree trunk volume estimated by AdQSM and the reference value was greater than that of TreeQSM (CCC = 0.60).
The significance of this research is as follows: (1) an automatic modeling method based on AdQSM is developed, which expands the application scope of AdQSM; (2) a low-cost photogrammetric point cloud is provided as the input data of AdQSM; (3) the potential of AdQSM to reconstruct forest terrestrial photogrammetric point clouds is explored.
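The agreement statistics reported in this abstract can be sketched as follows. The abstract does not spell out its formulas, so this example assumes common definitions: bias as the mean of estimate minus reference, rBias and rRMSE as bias and RMSE divided by the reference mean (in percent), and Lin's concordance correlation coefficient with population (1/n) variances. The function name and the sample values are hypothetical.

```python
import math

def agreement_metrics(estimated, reference):
    """Bias, rBias (%), RMSE, rRMSE (%) and Lin's CCC for paired values."""
    n = len(reference)
    mean_est = sum(estimated) / n
    mean_ref = sum(reference) / n
    bias = sum(e - r for e, r in zip(estimated, reference)) / n
    rmse = math.sqrt(sum((e - r) ** 2 for e, r in zip(estimated, reference)) / n)
    var_est = sum((e - mean_est) ** 2 for e in estimated) / n
    var_ref = sum((r - mean_ref) ** 2 for r in reference) / n
    cov = sum(
        (e - mean_est) * (r - mean_ref) for e, r in zip(estimated, reference)
    ) / n
    # CCC penalizes both scatter and systematic offset from the 1:1 line
    ccc = 2.0 * cov / (var_est + var_ref + (mean_est - mean_ref) ** 2)
    return {
        "bias": bias,
        "rbias": 100.0 * bias / mean_ref,
        "rmse": rmse,
        "rrmse": 100.0 * rmse / mean_ref,
        "ccc": ccc,
    }

# Hypothetical DBH values in cm, for illustration only
reference = [10.0, 20.0, 30.0]
estimated = [11.0, 21.0, 31.0]
result = agreement_metrics(estimated, reference)
print(result)
```

In this toy case the estimates are perfectly correlated with the reference but offset by 1 cm, so the CCC falls slightly below 1 even though a plain correlation coefficient would equal 1.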

