Information in Missing Patterns: Enhancing Prediction Accuracy in Weighted Linear Regression with Missing Data Using Soft Clustering

Author(s):  
Ashkan Esmaeili ◽  
Mohammadamin Fakharian ◽  
Yasaman Amiri Abyaneh

The linear system with missing information is investigated in this paper. New methods are introduced to improve the Mean Squared Error (MSE) on the test set in comparison to state-of-the-art methods, through appropriate tuning of the Bias-Variance trade-off. The concept is to cluster the data and adapt the learning model to each cluster. Hence, we introduce a controlled bias into the problem and exploit it to enhance learning capability on the instances within a specific neighborhood. To deal with missing information, we propose a novel algorithm, "Missing-SCOP", based on the SCOP-KMEANS algorithm introduced by Wagstaff et al., which uses the missing pattern of the dataset to construct a soft-constraint matrix and to cluster in the missing-data scenario. It is shown that the controlled over-fitting suggested by our algorithm improves prediction accuracy in various cases. Numerical experiments confirm the efficacy of the proposed algorithm in enhancing prediction accuracy.

2020 ◽  
Author(s):  
Ashkan Esmaeili ◽  
Mohammadamin Fakharian ◽  
Yasaman Amiri Abyaneh



Symmetry ◽  
2019 ◽  
Vol 11 (3) ◽  
pp. 338 ◽  
Author(s):  
Wei Tang ◽  
Yang Yang ◽  
Lanling Zeng ◽  
Yongzhao Zhan

Clustering groups data so that the observations in the same group are more similar to each other than to those in other groups. k-means is a popular clustering algorithm in data mining. Its objective is to minimize the mean squared error (MSE). The traditional k-means algorithm is not suitable for applications where the sizes of clusters need to be balanced. Given n observations, our objective is to minimize the MSE under the constraint that the observations are evenly divided into k clusters. In this paper, we propose an iterative method for the task of clustering with balanced size constraints. Each iteration can be split into two steps, namely an assignment step and an update step. In the assignment step, the data are evenly assigned to each cluster. The balanced assignment task here is formulated as an integer linear program (ILP), and we prove that the constraint matrix of this ILP is totally unimodular. Thus the ILP is relaxed to a linear program (LP), which can be efficiently solved with the simplex algorithm. In the update step, the new centers are updated as the centroids of the observations in the clusters. Assuming that there are n observations and the algorithm needs m iterations to converge, we show that the average time complexity of the proposed algorithm is between $O(mn^{1.65})$ and $O(mn^{1.70})$. Experimental results indicate that, compared with state-of-the-art methods, the proposed algorithm is efficient and yields more accurate clustering.
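
The balanced assignment step described above can be written out as a small optimization problem. The following is only a sketch of one natural formulation, not necessarily the exact program used by the authors: the symbols $p_i$ (observation), $c_j$ (current center) and $x_{ij}$ (assignment indicator) are notation introduced here for illustration, and $k$ is assumed to divide $n$.

```latex
\begin{aligned}
\min_{x}\quad & \sum_{i=1}^{n}\sum_{j=1}^{k} \lVert p_i - c_j \rVert^{2}\, x_{ij} \\
\text{s.t.}\quad & \sum_{j=1}^{k} x_{ij} = 1, \qquad i = 1,\dots,n, \\
& \sum_{i=1}^{n} x_{ij} = \frac{n}{k}, \qquad j = 1,\dots,k, \\
& x_{ij} \in \{0, 1\}.
\end{aligned}
```

Replacing $x_{ij} \in \{0,1\}$ with $0 \le x_{ij} \le 1$ gives the LP relaxation. When the constraint matrix is totally unimodular and the right-hand sides are integers, every vertex of the relaxed feasible region is integral, so a vertex solution returned by the simplex method is already a valid balanced assignment.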


2015 ◽  
Vol 5 (1) ◽  
pp. 79
Author(s):  
Raid B. Salha ◽  
Hazem I. El Shekh Ahmed ◽  
Hossam O. EL-Sayed

In this paper, we define the adaptive kernel estimation of the conditional distribution function (cdf) for independent and identically distributed (iid) data using varying bandwidth. The bias, variance and the mean squared error of the proposed estimator are investigated. Moreover, the asymptotic normality of the proposed estimator is investigated. The results of the simulation study show that the adaptive kernel estimation of the conditional quantiles with varying bandwidth performs better than the kernel estimation with fixed bandwidth.


2005 ◽  
Vol 4 (1) ◽  
pp. 51
Author(s):  
I W. MANGKU ◽  
I. WIDIYASTUTI ◽  
I G. P. PURNABA

An estimator of the intensity in the form of a power function of an inhomogeneous Poisson process is constructed and investigated. It is assumed that only a single realization of the Poisson process is observed in a bounded window. We prove that the proposed estimator is consistent when the size of the window expands indefinitely. The asymptotic bias, variance and the mean squared error of the proposed estimator are computed. Asymptotic normality of the estimator is also established.


2011 ◽  
Vol 60 (2) ◽  
pp. 248-255 ◽  
Author(s):  
Sangmun Shin ◽  
Funda Samanlioglu ◽  
Byung Rae Cho ◽  
Margaret M. Wiecek

2021 ◽  
Author(s):  
Ma Te ◽  
Tetsuya Inagaki ◽  
Masato Yoshida ◽  
Mayumi Ichino ◽  
Satoru Tsuchikawa

Wood has various mechanical properties, so stiffness evaluation is critical for quality management. Routine use of conventional strain gauges is costly, and they are difficult to apply to precious wood materials because they require strong adhesive. This study demonstrates the correlation between light scattering changes inside the wood cell walls and tensile strain. A multifiber-based visible-near-infrared (Vis–NIR) spatially resolved spectroscopy (SRS) system was designed to rapidly and conveniently acquire such light scattering changes. For the preliminary experiment, samples with different thicknesses were measured to evaluate the influence of thickness. The differences in Vis–NIR SRS spectral data diminish with an increase in sample thickness, which suggests that the SRS method can successfully measure the whole strain (i.e., surface and inside) of wood samples. Then, for the primary experiment, 18 wood samples with the same thickness (2 mm) were tested to construct a strain calibration model. The prediction accuracy was characterized by a determination coefficient (R²) of 0.86 with a root mean squared error (RMSE) of 297.89 με for five-fold cross-validation; for test validation, it was characterized by an R² of 0.82 and an RMSE of 345.44 με.


2018 ◽  
Vol 10 (12) ◽  
pp. 4863 ◽  
Author(s):  
Chao Huang ◽  
Longpeng Cao ◽  
Nanxin Peng ◽  
Sijia Li ◽  
Jing Zhang ◽  
...  

Photovoltaic (PV) modules convert renewable and sustainable solar energy into electricity. However, the uncertainty of PV power production brings challenges for grid operation. To facilitate the management and scheduling of PV power plants, forecasting is an essential technique. In this paper, a robust multilayer perceptron (MLP) neural network was developed for day-ahead forecasting of hourly PV power. A generic MLP is usually trained by minimizing the mean squared loss. The mean squared error is sensitive to a few particularly large errors, which can lead to a poor estimator. To tackle this problem, the pseudo-Huber loss function, which combines the best properties of squared loss and absolute loss, was adopted in this paper. The effectiveness and efficiency of the proposed method were verified by benchmarking against a generic MLP network with real PV data. Numerical experiments illustrated that the proposed method performed better than the generic MLP network in terms of root mean squared error (RMSE) and mean absolute error (MAE).
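
To illustrate why the pseudo-Huber loss is less sensitive to large errors than the squared loss, here is a minimal sketch in plain Python. It uses the standard pseudo-Huber form, delta^2 * (sqrt(1 + (r/delta)^2) - 1); the value of `delta`, the function names and the sample residuals are assumptions made for illustration and are not taken from the paper.

```python
import math


def pseudo_huber(residual, delta=1.0):
    """Standard pseudo-Huber loss; delta is an illustrative choice, not the paper's."""
    return delta ** 2 * (math.sqrt(1.0 + (residual / delta) ** 2) - 1.0)


def squared_loss(residual):
    """Half squared error, scaled to match pseudo-Huber near zero."""
    return 0.5 * residual ** 2


def mean_loss(loss, residuals):
    """Average a per-residual loss over a list of residuals."""
    return sum(loss(r) for r in residuals) / len(residuals)


# Hypothetical residuals: three small errors and one large outlier.
residuals = [0.1, -0.2, 0.05, 10.0]

if __name__ == "__main__":
    print("mean squared loss     :", mean_loss(squared_loss, residuals))
    print("mean pseudo-Huber loss:", mean_loss(pseudo_huber, residuals))
```

For small residuals the two losses nearly coincide, while for the outlier the pseudo-Huber loss grows roughly linearly rather than quadratically, so a single large error contributes far less to the average.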


2016 ◽  
Vol 5 (1) ◽  
pp. 39 ◽  
Author(s):  
Abbas Najim Salman ◽  
Maymona Ameen

This paper is concerned with a minimax shrinkage estimator that uses a double stage shrinkage technique to lower the mean squared error. It is intended for estimating the shape parameter (α) of the Generalized Rayleigh distribution in a region (R) around available prior knowledge (α₀) about the actual value (α), used as an initial estimate, in the case when the scale parameter (λ) is known. In situations where experimentation is time consuming or very costly, a double stage procedure can be used to reduce the expected sample size needed to obtain the estimator. The proposed estimator is shown to have smaller mean squared error for certain choices of the shrinkage weight factor ψ(·) and a suitable region R. Expressions for the Bias, Mean squared error (MSE), Expected sample size [E(n/α, R)], Expected sample size proportion [E(n/α, R)/n], probability of avoiding the second sample and percentage of overall sample saved for the proposed estimator are derived. Numerical results and conclusions for the expressions mentioned above are presented for the case where the considered estimator is a testimator of level of significance Δ. Comparisons with the minimax estimator and with the most recent studies were made to show the effectiveness of the proposed estimator.


2020 ◽  
Vol 2020 ◽  
pp. 1-22
Author(s):  
Byung-Kwon Son ◽  
Do-Jin An ◽  
Joon-Ho Lee

In this paper, a passive localization of the emitter using noisy angle-of-arrival (AOA) measurements, called Brown DWLS (Distance Weighted Least Squares) algorithm, is considered. The accuracy of AOA-based localization is quantified by the mean-squared error. Various estimates of the AOA-localization algorithm have been derived (Doğançay and Hmam, 2008). Explicit expression of the location estimate of the previous study is used to get an analytic expression of the mean-squared error (MSE) of one of the various estimates. To validate the derived expression, we compare the MSE from the Monte Carlo simulation with the analytically derived MSE.

