mean squared error
Recently Published Documents



Author(s):  
Özerk Yavuz

Epidemic diseases can be extremely dangerous because of their hazardous effects. They may harm economies, businesses, the environment, humans, and the workforce. In this paper, some of the factors that are interrelated with the COVID-19 pandemic are examined using data mining methodologies and approaches. The analysis uncovered a number of rules and insights, and the performance of the data mining algorithms was evaluated. According to the results, the JRip algorithm had the highest correct classification rate and the lowest root mean squared error (RMSE). On both measures, JRip can be considered an effective method for understanding the factors related to deaths caused by the coronavirus.


2022 ◽  
Vol 34 (2) ◽  
pp. 1-17
Author(s):  
Rahman A. B. M. Salman ◽  
Lee Myeongbae ◽  
Lim Jonghyun ◽  
Yongyun Cho ◽  
Shin Changsun

Energy is regarded as one of the key inputs for a country's economic growth and social development. Analysis and modeling of industrial energy are currently a time-intensive process because more and more energy is consumed for economic growth in a smart factory. This study aims to present and analyse the predictive models of the data-driven system to be used by appliances and to find the most significant product item. With repeated cross-validation, three statistical models were trained and then evaluated on a test set: 1) General Linear Regression Model (GLM), 2) Support Vector Machine (SVM), and 3) Boosting Tree (BT). The performance of the prediction models was measured by R2, Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and Coefficient of Variation (CV). The best model in the study is the Support Vector Machine (SVM), which achieved an R2 of 0.86 on the training data set and 0.85 on the testing data set with a low coefficient of variation; the most significant product of this smart factory is Skelp.
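
The four measures named in the abstract above can be computed in a few lines. The following is a minimal sketch, not the authors' code: it assumes that "CV" means the coefficient of variation of the RMSE (RMSE divided by the mean of the observed values, in percent), and the function name `regression_metrics` and the sample numbers are made up for illustration.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return R^2, RMSE, MAE and CV(RMSE) in percent for two equal-length sequences."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)               # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(r) for r in residuals) / n
    r2 = 1.0 - ss_res / ss_tot
    cv_percent = 100.0 * rmse / mean_true                # assumed definition: CV of the RMSE
    return {"r2": r2, "rmse": rmse, "mae": mae, "cv_percent": cv_percent}

# Made-up values for illustration only (not data from the study).
observed = [10.0, 12.0, 14.0, 16.0]
predicted = [11.0, 11.0, 15.0, 15.0]
metrics = regression_metrics(observed, predicted)
print(metrics)
```

Comparing the values on the training set with those on the test set, as the abstract does for R2, is a quick check for overfitting.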


2024 ◽  
Vol 84 ◽  
Author(s):  
A. Yousafzai ◽  
W. Manzoor ◽  
G. Raza ◽  
T. Mahmood ◽  
F. Rehman ◽  
...  

Abstract This study aimed to develop and evaluate data-driven models for predicting forest yield under different climate change scenarios in the Gallies forest division of district Abbottabad, Pakistan. Random Forest (RF) and Kernel Ridge Regression (KRR) models were developed and evaluated using yield data of two species (Blue pine and Silver fir) as the objective variable and climate data (temperature, humidity, rainfall and wind speed) as predictive variables. The prediction accuracy of both models was assessed by means of root mean squared error (RMSE), mean absolute error (MAE), correlation coefficient (r), relative root mean squared error (RRMSE), Legates-McCabe’s (LM), Willmott’s index (WI) and Nash-Sutcliffe (NSE) metrics. Overall, the RF model outperformed the KRR model, forecasting forest yield with higher accuracy. The study strongly recommends that the RF model be applied in other regions of the country for prediction of forest growth and yield, which may help in the management and future planning of forest productivity in Pakistan.
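
The less common agreement measures listed in the abstract above (RRMSE, NSE, WI, LM) are simple ratios of error sums. The sketch below uses their commonly cited textbook forms; the abstract does not give the formulas, so the exact variants used in the study are an assumption, and the function name and sample values are illustrative only.

```python
import math

def agreement_metrics(obs, pred):
    """Return RRMSE (percent), Nash-Sutcliffe, Willmott's index and Legates-McCabe."""
    n = len(obs)
    mean_obs = sum(obs) / n
    sq_err = sum((o - p) ** 2 for o, p in zip(obs, pred))
    abs_err = sum(abs(o - p) for o, p in zip(obs, pred))
    rmse = math.sqrt(sq_err / n)
    rrmse = 100.0 * rmse / mean_obs
    # Nash-Sutcliffe efficiency: 1 minus squared error over variance of observations.
    nse = 1.0 - sq_err / sum((o - mean_obs) ** 2 for o in obs)
    # Willmott's index of agreement: squared error over "potential error".
    potential = sum((abs(p - mean_obs) + abs(o - mean_obs)) ** 2
                    for o, p in zip(obs, pred))
    wi = 1.0 - sq_err / potential
    # Legates-McCabe index: the absolute-value analogue of NSE.
    lm = 1.0 - abs_err / sum(abs(o - mean_obs) for o in obs)
    return {"rrmse": rrmse, "nse": nse, "wi": wi, "lm": lm}

# Made-up values for illustration only (not data from the study).
obs = [2.0, 4.0, 6.0, 8.0]
pred = [3.0, 4.0, 5.0, 8.0]
scores = agreement_metrics(obs, pred)
print(scores)
```

NSE, WI and LM all equal 1 for a perfect fit; LM is less sensitive to outliers than NSE because it uses absolute rather than squared errors.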


2022 ◽  
Vol 2022 ◽  
pp. 1-10
Author(s):  
He Ma ◽  
Yi Zuo ◽  
Tieshan Li

With the increasing application and utility of automatic identification systems (AISs), large volumes of AIS data are collected to record vessel navigation. In recent years, the prediction of vessel trajectories has become one of the hottest research issues. Most existing studies have focused on single-trajectory prediction of vessels. In contrast, this article proposes a multiple-trajectory prediction model and makes two main contributions. First, we propose a novel method of trajectory feature representation that uses a hierarchical clustering algorithm to analyze and extract the vessel navigation behavior for multiple trajectories. The mean loss of trajectory features extracted by our method is approximately 0.005, which is 50% and 30% lower than that of the classic Douglas–Peucker (DP) and least-squares cubic spline curve approximation (LCSCA) algorithms, respectively. Second, we design an integrated model for simultaneous prediction of multiple trajectories using the proposed features and employ a long short-term memory (LSTM)-based neural network and a recurrent neural network (RNN) for this time series task. Comparative experiments show that the mean value and standard deviation of the root mean squared error (RMSE) using the LSTM are 4% and 14% lower than those using the RNN, respectively.
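
For readers unfamiliar with the Douglas–Peucker baseline mentioned above, here is a minimal sketch of the classic recursive algorithm. It is not the feature extraction method proposed in the article. It treats positions as planar (x, y) pairs, which is an assumption; real AIS positions are latitude/longitude and would need a projection or a geodesic distance. The track and the tolerance `epsilon` are made up.

```python
import math

def perpendicular_distance(point, start, end):
    """Distance from point to the straight line through start and end."""
    (x, y), (x1, y1), (x2, y2) = point, start, end
    dx, dy = x2 - x1, y2 - y1
    if dx == 0 and dy == 0:              # degenerate segment: fall back to point distance
        return math.hypot(x - x1, y - y1)
    return abs(dy * x - dx * y + x2 * y1 - y2 * x1) / math.hypot(dx, dy)

def douglas_peucker(points, epsilon):
    """Simplify a polyline, keeping points farther than epsilon from the chord."""
    if len(points) < 3:
        return list(points)
    start, end = points[0], points[-1]
    max_dist, index = 0.0, 0
    for i in range(1, len(points) - 1):
        d = perpendicular_distance(points[i], start, end)
        if d > max_dist:
            max_dist, index = d, i
    if max_dist > epsilon:
        # Split at the farthest point and simplify both halves recursively.
        left = douglas_peucker(points[: index + 1], epsilon)
        right = douglas_peucker(points[index:], epsilon)
        return left[:-1] + right         # drop the duplicated split point
    return [start, end]

# Hypothetical planar track for illustration only.
track = [(0.0, 0.0), (1.0, 0.1), (2.0, 0.0), (3.0, 3.0), (4.0, 0.0)]
simplified = douglas_peucker(track, epsilon=0.5)
print(simplified)
```

The small wobble at (1.0, 0.1) is dropped because it lies within the tolerance of the chord, while the sharp turn at (3.0, 3.0) is kept.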


2022 ◽  
Author(s):  
Chen Wei ◽  
Kui Xu ◽  
Zhexian Shen ◽  
Xiaochen Xia ◽  
Wei Xie ◽  
...  

Abstract In this paper, we investigate the uplink transmission for user-centric cell-free massive multiple-input multiple-output (MIMO) systems. The largest-large-scale-fading-based access point (AP) selection method is adopted to achieve a user-centric operation. Under this user-centric framework, we propose a novel inter-cluster interference-based (IC-IB) pilot assignment scheme to alleviate pilot contamination. Considering the local characteristics of channel estimates and statistics, we propose a location-aided distributed uplink combining scheme based on a newly proposed metric representing inter-user interference to balance the relationship among the spectral efficiency (SE), user equipment (UE) fairness and complexity, in which the normalized local partial minimum mean-squared error (LP-MMSE) combining is adopted for some APs, while the normalized maximum ratio (MR) combining is adopted for the remaining APs. A new closed-form SE expression using the normalized MR combining is derived, and a novel metric to indicate the UE fairness is also proposed. Moreover, the max-min fairness (MMF) power control algorithm is utilized to further ensure uniformly good service to the UEs. Simulation results demonstrate that the proposed IC-IB pilot assignment scheme achieves higher channel estimation accuracy than conventional pilot assignment schemes. Furthermore, although the proposed location-aided uplink combining scheme is not always the best in terms of the per-UE SE, it provides greater fairness among UEs and achieves a good trade-off between the average SE and computational complexity.


Symmetry ◽  
2022 ◽  
Vol 14 (1) ◽  
pp. 160
Author(s):  
Pyae-Pyae Phyo ◽  
Yung-Cheol Byun ◽  
Namje Park

Balancing energy supply and demand is indispensable for energy manufacturers. Accordingly, electric industries have paid attention to short-term energy forecasting to assist their management systems. This paper first compares multiple machine learning (ML) regressors during the training process. The five best ML algorithms, namely extra trees regressor (ETR), random forest regressor (RFR), light gradient boosting machine (LGBM), gradient boosting regressor (GBR), and K neighbors regressor (KNN), are trained to build our proposed voting regressor (VR) model. Final predictions are made using the proposed ensemble VR and compared with the five selected ML benchmark models. The statistical autoregressive integrated moving average (ARIMA) model is also compared with the proposed model. For the experiments, energy usage and weather data are gathered from four regions of Jeju Island. Error measurements, including mean absolute percentage error (MAPE), mean absolute error (MAE), and mean squared error (MSE), are computed to evaluate the forecasting performance. Our proposed model outperforms the six baseline models, giving a minimum MAPE of 0.845% on the whole test set. This improved performance shows that our approach is promising for symmetrical forecasting using time series energy data in the power system sector.
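
The core idea of a voting regressor can be shown without any ML library: the ensemble prediction is the average of the base models' predictions (this is also what scikit-learn's `VotingRegressor` does by default). The sketch below assumes uniform weights, which the abstract does not confirm, and uses made-up predictions from three hypothetical base models rather than the five regressors of the paper.

```python
def voting_average(predictions_by_model):
    """Average the predictions of several base regressors, point by point."""
    n_models = len(predictions_by_model)
    return [sum(values) / n_models
            for values in zip(*predictions_by_model.values())]

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (y_true must be non-zero)."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

# Made-up values for illustration only (not data from the study).
actual = [100.0, 200.0, 400.0]
base_predictions = {
    "model_a": [90.0, 210.0, 380.0],
    "model_b": [110.0, 210.0, 400.0],
    "model_c": [106.0, 180.0, 420.0],
}
ensemble = voting_average(base_predictions)
ensemble_mape = mape(actual, ensemble)
model_a_mape = mape(actual, base_predictions["model_a"])
print(ensemble, ensemble_mape, model_a_mape)
```

In this toy case the individual errors partly cancel, so the ensemble MAPE is lower than that of `model_a`; averaging helps most when the base models make uncorrelated errors.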


PLoS ONE ◽  
2022 ◽  
Vol 17 (1) ◽  
pp. e0262131
Author(s):  
Adil Aslam Mir ◽  
Kimberlee Jane Kearfott ◽  
Fatih Vehbi Çelebi ◽  
Muhammad Rafique

A new methodology, imputation by feature importance (IBFI), is studied that can be applied to any machine learning method to efficiently fill in any missing or irregularly sampled data. It applies to data missing completely at random (MCAR), missing not at random (MNAR), and missing at random (MAR). IBFI utilizes the feature importance and iteratively imputes missing values using any base learning algorithm. For this work, IBFI is tested on soil radon gas concentration (SRGC) data. XGBoost is used as the learning algorithm and missing data are simulated using R for different missingness scenarios. IBFI is based on the physically meaningful assumption that SRGC depends upon environmental parameters such as temperature and relative humidity. This assumption leads to a model obtained from the complete multivariate series where the controls are available by taking the attribute of interest as a response variable. IBFI is tested against other frequently used imputation methods, namely mean, median, mode, predictive mean matching (PMM), and hot-deck procedures. The performance of the different imputation methods was assessed using root mean squared error (RMSE), mean squared log error (MSLE), mean absolute percentage error (MAPE), percent bias (PB), and mean squared error (MSE) statistics. The imputation process requires more attention when multiple variables are missing in different samples, resulting in challenges to machine learning methods because some controls are missing. IBFI appears to have an advantage in such circumstances. For testing IBFI, Radon Time Series Data (RTS) has been used and data was collected from 1st March 2017 to the 11th of May 2018, including 4 seismic activities that have taken place during the data collection time.
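
The evaluation protocol described above (hide known values, impute them, score the imputed values against the truth) can be sketched with the simplest baseline from the abstract, mean imputation. This is not IBFI itself. The percent bias formula used here, 100 × Σ(imputed − true) / Σ(true), is one common convention and is an assumption; the series and the masked positions are made up.

```python
import math

def mean_impute(series):
    """Replace None entries with the mean of the observed entries."""
    observed = [v for v in series if v is not None]
    fill = sum(observed) / len(observed)
    return [fill if v is None else v for v in series]

def imputation_errors(truth, imputed, missing_idx):
    """Score imputed values against the held-out truth at the masked positions."""
    t = [truth[i] for i in missing_idx]
    p = [imputed[i] for i in missing_idx]
    n = len(t)
    rmse = math.sqrt(sum((a - b) ** 2 for a, b in zip(t, p)) / n)
    msle = sum((math.log1p(a) - math.log1p(b)) ** 2 for a, b in zip(t, p)) / n
    pb = 100.0 * sum(b - a for a, b in zip(t, p)) / sum(t)  # assumed sign convention
    return {"rmse": rmse, "msle": msle, "pb": pb}

# Made-up series for illustration only (not the radon data from the study).
truth = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]
missing_idx = [1, 4]                       # positions hidden to simulate missingness
masked = [None if i in missing_idx else v for i, v in enumerate(truth)]
imputed = mean_impute(masked)
errors = imputation_errors(truth, imputed, missing_idx)
print(imputed, errors)
```

Here the percent bias is zero even though the RMSE is 3, which illustrates why the abstract reports several error statistics rather than a single one.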


2022 ◽  
Vol 12 (2) ◽  
pp. 717
Author(s):  
Sahar Alwadei ◽  
Ashraf Farahat ◽  
Moataz Ahmed ◽  
Harry D. Kambezidis

Data from a moderate resolution imaging spectroradiometer instrument onboard the Terra satellite along with a radiative transfer model and a machine learning technique were integrated to predict direct solar irradiance on a horizontal surface over the Arabian Peninsula (AP). In preparation for building appropriate residual network (ResNet) prediction models, we conducted some exploratory data analysis (EDA) and came to some conclusions. We noted that aerosols in the atmosphere correlate with solar irradiance in the eastern region of the AP, especially near the coastlines of the Arabian Gulf and the Sea of Oman. We also found low solar irradiance during March 2016 and March 2017 in the central (~20% less) and eastern regions (~15% less) of the AP, which could be attributed to the high frequency of dust events during those months. Compared to other locations in the AP, high solar irradiance was recorded in the Rub Al Khali desert during winter and spring. The effect of major dust outbreaks over the AP during March 2009 and March 2012 was also noted. The EDA indicated a correlation between high aerosol loading and a decrease in solar irradiance. The analysis showed that the Rub Al Khali desert is one of the best locations in the AP to harvest solar radiation. The analysis also showed the ResNet prediction model achieves high test accuracy scores, indicated by a mean absolute error of ~0.02, a mean squared error of ~0.005, and an R2 of 0.99.


2022 ◽  
Vol 12 (2) ◽  
pp. 749
Author(s):  
Yunfei Gao ◽  
Albert No

Finding a biomarker that indicates the subject’s age is one of the most important topics in biology. Several recent studies tried to extract a biomarker from brain imaging data including fMRI data. However, most of them focused on MRI data, which do not capture dynamics, and few attempted to apply recently proposed deep learning models. We propose a deep neural network model that estimates the age of a subject from fMRI images using a recurrent neural network (RNN), more precisely, a gated recurrent unit (GRU). However, applying neural networks is not trivial due to the high-dimensional nature of fMRI data. In this work, we propose a novel preprocessing technique using the Automated Anatomical Labeling (AAL) atlas, which significantly reduces the input dimension. The proposed dimension reduction technique allows us to train our model with 640 training and validation samples from different projects under a mean squared error (MSE) loss. Finally, we obtain a correlation of 0.905 between the predicted age and the actual age on 155 test samples. The proposed model estimates the age within the range of ±12 on most of the test samples. Our model is written in Python and is freely available for download.


2022 ◽  
Author(s):  
Ye Zhao ◽  
Xiang Zhang ◽  
Feng Xiong ◽  
Shuying Liu ◽  
Yao Wang ◽  
...  

Abstract High-density precipitation data is always desired to capture the heterogeneity of precipitation to accurately describe the components of the hydrological cycle. However, equipping and maintaining a high-density rain gauge network involves high costs, and the existing rain gauges are often unable to meet the density requirements. The objective of this study is to provide a new method to analyze the spatiotemporal variability of the precipitation field and to solve the problem of insufficient site density. To this end, the Proper Orthogonal Decomposition (POD) method is proposed, which can analyze the spatial distribution characteristics of rainfall fields to solve data shortages. To demonstrate the feasibility and advantages of the proposed methodology, four districts and counties (Hongshan District, Jianli County, Sui County, and Xuanen County) in Hubei province in China were selected as case studies. The principal results are as follows. (1) The proposed method is effective in analyzing the spatiotemporal variability of the rainfall field to reconstruct rainfall data in ungauged basins. (2) Compared with the commonly used Thiessen Polygon method, the Inverse Distance Weighting method, and the Kriging method, POD is more accurate and convenient, and the root mean squared error is reduced from 3.22, 1.83, 2.19 to 2.09; the correlation coefficients are improved from 0.60, 0.85, 0.79 to 0.89, respectively. (3) The POD method performs particularly well in simulating the peak value and the peak time and can offer a meaningful reference for analyzing the spatial distribution of rainfall.
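
One of the baselines named above, Inverse Distance Weighting, is compact enough to sketch directly. This is the comparison method, not the POD method proposed in the study. The sketch assumes planar gauge coordinates and the common power parameter of 2; neither is stated in the abstract, and the gauge positions and rainfall values are made up.

```python
import math

def idw_estimate(target, gauges, power=2.0):
    """Estimate a value at target from (position, value) pairs by inverse distance weighting."""
    weighted_sum = 0.0
    weight_total = 0.0
    for (gx, gy), value in gauges:
        dist = math.hypot(target[0] - gx, target[1] - gy)
        if dist == 0.0:
            return value                  # target coincides with a gauge
        weight = 1.0 / dist ** power      # nearer gauges get larger weights
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total

# Hypothetical gauges: ((x, y), rainfall). Illustration only.
gauges = [((0.0, 0.0), 10.0), ((2.0, 0.0), 20.0), ((0.0, 2.0), 30.0)]
estimate = idw_estimate((1.0, 0.0), gauges)
print(estimate)
```

Scoring such estimates at held-out gauges with RMSE and the correlation coefficient gives the kind of comparison reported in result (2) of the abstract.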

