Correcting the bias of the Root Mean Squared Error of Approximation under missing data

Methodology ◽  
2021 ◽  
Vol 17 (3) ◽  
pp. 189-204
Author(s):  
Cailey E. Fitzgerald ◽  
Ryne Estabrook ◽  
Daniel P. Martin ◽  
Andreas M. Brandmaier ◽  
Timo von Oertzen

Missing data are ubiquitous in psychological research. They may come about as an unwanted result of coding or computer error, participants' non-response or absence, or missing values may be intentional, as in planned missing designs. We discuss the effects of missing data on χ²-based goodness-of-fit indices in Structural Equation Modeling (SEM), specifically on the Root Mean Squared Error of Approximation (RMSEA). We use simulations to show that naive implementations of the RMSEA have a downward bias in the presence of missing data and, thus, overestimate model goodness-of-fit. Unfortunately, many state-of-the-art software packages report the biased form of RMSEA. As a consequence, the scientific community may have been accepting a much larger fraction of models with non-acceptable model fit. We propose a bias-correction for the RMSEA based on information-theoretic considerations that take into account the expected misfit of a person with fully observed data. The corrected RMSEA is asymptotically independent of the proportion of missing data for misspecified models. Importantly, results of the corrected RMSEA computation are identical to naive RMSEA if there are no missing data.
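
For orientation, the uncorrected index that the abstract calls the naive RMSEA is commonly computed from the model χ² statistic, its degrees of freedom, and the sample size. The sketch below shows only that textbook form. The function name `naive_rmsea` and the use of N − 1 in the denominator (some software uses N) are assumptions made for illustration. The bias correction proposed in the article is not reproduced, because the abstract does not state its formula.

```python
import math


def naive_rmsea(chi2: float, df: int, n: int) -> float:
    """Textbook (uncorrected) RMSEA point estimate.

    Assumption: the denominator uses n - 1; some packages use n instead.
    """
    if df <= 0 or n <= 1:
        raise ValueError("df must be positive and n must exceed 1")
    # Estimated noncentrality, truncated at zero when chi2 < df.
    noncentrality = max(chi2 - df, 0.0)
    return math.sqrt(noncentrality / (df * (n - 1)))


# Hypothetical values, not taken from the article.
print(round(naive_rmsea(chi2=100.0, df=50, n=201), 4))
```

A fit statistic below its degrees of freedom yields an RMSEA of exactly zero because of the truncation step.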

2018 ◽  
Author(s):  
Cailey Elizabeth Fitzgerald ◽  
Ryne Estabrook ◽  
Daniel Patrick Martin ◽  
Andreas Markus Brandmaier ◽  
Timo von Oertzen

Missing data are ubiquitous in both small and large datasets. Missing data may come about as a result of coding or computer error or participant absences, or they may be intentional, as in planned missing designs. We discuss missing data as they relate to goodness-of-fit indices in Structural Equation Modeling (SEM), specifically the effects of missing data on the Root Mean Squared Error of Approximation (RMSEA). We use simulations to show that naive implementations of the RMSEA have a downward bias in the presence of missing data and, thus, overestimate model goodness-of-fit. Unfortunately, many state-of-the-art software packages report the biased form of RMSEA. As a consequence, the community may have been accepting a much larger fraction of models with non-acceptable model fit. We propose a bias-correction for the RMSEA based on information-theoretic considerations that take into account the expected misfit of a person with fully observed data. This results in an RMSEA which is asymptotically independent of the proportion of missing data for misspecified models. Importantly, results of the corrected RMSEA computation are identical to naive RMSEA if there are no missing data.


1995 ◽  
Vol 20 (1) ◽  
pp. 69-82 ◽  
Author(s):  
David Kaplan

This article considers the impact of missing data arising from balanced incomplete block (BIB) spiraled designs on the chi-square goodness-of-fit test in factor analysis. Specifically, data arising from BIB designs possess a unique pattern of missing data that can be characterized as missing completely at random (MCAR). Standard approaches to factor analyzing such data rest on forming pairwise available case (PAC) covariance matrices. Developments in statistical theory for missing data show that PAC covariance matrices may not satisfy Wishart distribution assumptions underlying factor analysis, thus impacting tests of model fit. One approach, advocated by Muthén, Kaplan, and Hollis (1987) for handling missing data in structural equation modeling, is proposed as a possible solution to these problems. This study compares the new approach to the standard PAC approach in a Monte Carlo framework. Results show that tests of goodness-of-fit are very sensitive to PAC approaches even when data are MCAR, as is the case for BIB designs. The new approach is shown to outperform the PAC approach for continuous variables and is comparatively better for dichotomous variables.


2021 ◽  
Vol 13 (3) ◽  
pp. 478
Author(s):  
Víctor García-Gutiérrez ◽  
Claudio Stöckle ◽  
Pilar Macarena Gil ◽  
Francisco Javier Meza

Water scarcity is one of the most important problems of agroecosystems in Mediterranean and semiarid areas, especially for crops such as grapevines, which largely depend on irrigation. Actual evapotranspiration (ET) is a variable that represents the water consumption of a crop, integrating climate and biophysical variables. Actual evapotranspiration models based on remote sensing data from visible bands of Sentinel-2, including Penman-Monteith–Stewart (RS-PMS) and Penman-Monteith–Leuning (RS-PML), were evaluated at different temporal scales in a Cabernet Sauvignon vineyard (Vitis vinifera L.) located in central Chile. Their performance was compared with independent ET measurements from an eddy covariance system (EC) and with outputs from models based on thermal infrared data from Landsat 7 and Landsat 8, such as Mapping EvapoTranspiration with high Resolution and Internalized Calibration (METRIC) and the Priestley–Taylor Two-Source Model (TSEB-PT). The RS-PMS model showed the best goodness of fit for all temporal scales evaluated, especially for instantaneous and daily ET, with root mean squared error (RMSE) of 28.9 W m−2 and 0.52 mm day−1, respectively, and Willmott agreement index (d1) values of 0.77 at the instantaneous scale and 0.7 at the daily scale. Additionally, both RS-PM models were evaluated with a soil evaporation estimation method incorporated, one based on the soil water content (fSWC) and the other on the ratio of accumulated precipitation to equivalent evaporation (fZhang). The best fit at the instantaneous scale was achieved by the RS-PMS model with the fSWC method, with a relative root mean squared error (%RMSE) of 15.2% compared with 58.8% for fZhang. Finally, the relevance of the RS-PMS model was highlighted in the assessment and monitoring of vineyard drip irrigation in terms of crop coefficient (Kc) estimation, which is one of the methods commonly used in irrigation planning, yielding a Kc comparable to the one obtained by the EC tower, with a bias of around 9%.
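
The Willmott agreement index d1 reported above is usually defined from absolute differences between predictions and observations, scaled by their absolute deviations from the observed mean. The following is a minimal sketch of that commonly cited form. The function name `willmott_d1` and the sample values are hypothetical, and the abstract does not confirm that the authors used exactly this variant.

```python
from typing import Sequence


def willmott_d1(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Modified index of agreement d1 (absolute-difference form)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    # Total absolute error between predictions and observations.
    numerator = sum(abs(p - o) for o, p in zip(observed, predicted))
    # "Potential error": deviations of both series from the observed mean.
    denominator = sum(
        abs(p - mean_obs) + abs(o - mean_obs)
        for o, p in zip(observed, predicted)
    )
    if denominator == 0.0:
        raise ValueError("index is undefined when all values equal the mean")
    return 1.0 - numerator / denominator


# Hypothetical ET values in mm per day, for illustration only.
print(willmott_d1([3.1, 4.0, 5.2, 4.4], [2.9, 4.3, 4.8, 4.6]))
```

A value of 1 indicates perfect agreement, and values near 0 indicate that the model does no better than the observed mean.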


Author(s):  
Chisimkwuo John ◽  
Emmanuel J. Ekpenyong ◽  
Charles C. Nworu

This study assessed five approaches for imputing missing values. The evaluated methods are Singular Value Decomposition imputation (svdPCA), Bayesian imputation (bPCA), Probabilistic imputation (pPCA), Non-Linear Iterative Partial Least Squares imputation (nipalsPCA) and Local Least Squares imputation (llsPCA). Missing data at rates of 5%, 10%, 15% and 20% were created in R under a missing completely at random (MCAR) assumption using five variables (Net Foreign Assets (NFA), Credit to Core Private Sector (CCP), Reserve Money (RM), Narrow Money (M1) and Private Sector Demand Deposits (PSDD)) from the Nigerian quarterly monetary aggregate dataset for 1981 to 2019. The data were collected from the Central Bank of Nigeria statistical bulletin. The five imputation methods were used to estimate the artificially generated missing values. The performance of the PCA imputation approaches was evaluated using the Mean Forecast Error (MFE), Root Mean Squared Error (RMSE) and Normalized Root Mean Squared Error (NRMSE) criteria. The results suggest that the bPCA, llsPCA and pPCA methods performed better than the other imputation methods, with bPCA being the more appropriate method and llsPCA the best method, as it appears to be more stable than the others across the proportions of missingness.
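
The evaluation design described above (delete values completely at random, impute them, then score the imputations against the known truth) can be sketched in a few lines. The study itself used R and PCA-based imputers. The code below is a Python illustration only, with hypothetical helper names (`make_mcar`, `rmse`, `nrmse`), and it assumes NRMSE is the RMSE divided by the range of the true values, which is one of several conventions in use.

```python
import math
import random
from typing import List, Optional, Sequence


def make_mcar(values: Sequence[float], rate: float,
              seed: int = 0) -> List[Optional[float]]:
    """Return a copy of values with round(rate * n) entries set to None.

    Positions are drawn uniformly at random, independently of the data,
    which is what the MCAR assumption requires.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must lie in [0, 1]")
    rng = random.Random(seed)
    n_missing = round(rate * len(values))
    missing = set(rng.sample(range(len(values)), n_missing))
    return [None if i in missing else v for i, v in enumerate(values)]


def rmse(true: Sequence[float], imputed: Sequence[float]) -> float:
    """Root mean squared error between true and imputed values."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(true, imputed)) / len(true)
    )


def nrmse(true: Sequence[float], imputed: Sequence[float]) -> float:
    """RMSE divided by the range of the true values (assumed convention)."""
    return rmse(true, imputed) / (max(true) - min(true))


# Hypothetical series; mean imputation stands in for a real imputer.
series = [float(x) for x in range(1, 21)]
masked = make_mcar(series, rate=0.20, seed=1)
observed = [v for v in masked if v is not None]
fill = sum(observed) / len(observed)
holes = [i for i, v in enumerate(masked) if v is None]
print(nrmse([series[i] for i in holes], [fill] * len(holes)))
```

Scoring only the artificially deleted positions keeps the error measure focused on the imputer rather than on values that were never missing.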


2018 ◽  
Vol 28 (5) ◽  
pp. 1311-1327 ◽  
Author(s):  
Faisal M Zahid ◽  
Christian Heumann

Missing data is a common issue that can cause problems in estimation and inference in biomedical, epidemiological and social research. Multiple imputation is an increasingly popular approach for handling missing data. In case of a large number of covariates with missing data, existing multiple imputation software packages may not work properly and often produce errors. We propose a multiple imputation algorithm called mispr based on sequential penalized regression models. Each variable with missing values is assumed to have a different distributional form and is imputed with its own imputation model using the ridge penalty. In the case of a large number of predictors with respect to the sample size, the use of a quadratic penalty guarantees unique estimates for the parameters and leads to better predictions than the usual Maximum Likelihood Estimation (MLE), with a good compromise between bias and variance. As a result, the proposed algorithm performs well and provides imputed values that are better even for a large number of covariates with small samples. The results are compared with the existing software packages mice, VIM and Amelia in simulation studies. The missing at random mechanism was the main assumption in the simulation study. The imputation performance of the proposed algorithm is evaluated with mean squared imputation error and mean absolute imputation error. The mean squared error ([Formula: see text]), parameter estimates with their standard errors and confidence intervals are also computed to compare the performance in the regression context. The proposed algorithm is observed to be a good competitor to the existing algorithms, with smaller mean squared imputation error, mean absolute imputation error and mean squared error. The algorithm’s performance becomes considerably better than that of the existing algorithms with increasing number of covariates, especially when the number of predictors is close to or even greater than the sample size. 
Two real-life datasets are also used to examine the performance of the proposed algorithm using simulations.


2013 ◽  
Vol 594-595 ◽  
pp. 889-895 ◽  
Author(s):  
M.N. Noor ◽  
A.S. Yahaya ◽  
N.A. Ramli ◽  
Abdullah Mohd Mustafa Al Bakri

The presence of missing values in statistical survey data is an important issue to deal with. Such data usually contain missing values due to many factors, such as machine failures, changes in the siting of monitors, routine maintenance and human error. Incomplete data sets usually cause bias due to differences between observed and unobserved data. Therefore, it is important to ensure that the data analyzed are of high quality. A straightforward approach to this problem is to ignore the missing data and discard the incomplete cases from the data set. This approach is generally not valid for time-series prediction, in which the value of a system typically depends on the historical data of the system. One approach that is commonly used to treat missing items is imputation. This paper discusses three interpolation methods: linear, quadratic and cubic. A total of 8577 observations of PM10 data for a year were used to compare the three methods when fitting the Gamma distribution. Goodness-of-fit was assessed using three performance indicators: mean absolute error (MAE), root mean squared error (RMSE) and coefficient of determination (R2). The results show that the linear interpolation method provides a very good fit to the data.
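
A minimal sketch of the simplest of the three methods, linear interpolation across gaps in a time series, together with the three indicators named above, is given here. The helper names are hypothetical and the values are made up. R2 is assumed to mean 1 − SS_res/SS_tot, and the paper may define it differently (for example as a squared correlation).

```python
import math
from typing import Dict, List, Optional, Sequence


def interpolate_linear(series: Sequence[Optional[float]]) -> List[Optional[float]]:
    """Fill interior runs of None on a straight line between the nearest
    observed neighbours. Leading and trailing gaps are left as None."""
    filled = list(series)
    observed = [i for i, v in enumerate(filled) if v is not None]
    for left, right in zip(observed, observed[1:]):
        step = (filled[right] - filled[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = filled[left] + step * (i - left)
    return filled


def fit_indicators(obs: Sequence[float], pred: Sequence[float]) -> Dict[str, float]:
    """MAE, RMSE and R2 (assumed here to be 1 - SS_res / SS_tot)."""
    n = len(obs)
    mean_obs = sum(obs) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return {
        "mae": sum(abs(o - p) for o, p in zip(obs, pred)) / n,
        "rmse": math.sqrt(ss_res / n),
        "r2": 1.0 - ss_res / ss_tot,
    }


# Hypothetical hourly PM10 readings with a two-step gap.
print(interpolate_linear([10.0, None, None, 16.0, 18.0]))
print(fit_indicators([12.5, 13.0], [12.0, 14.0]))
```

Quadratic and cubic variants would replace the straight-line step with a higher-order polynomial through more neighbouring points.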


2012 ◽  
Vol 61 (2) ◽  
pp. 277-290 ◽  
Author(s):  
Ádám Csorba ◽  
Vince Láng ◽  
László Fenyvesi ◽  
Erika Michéli

Nowadays there is a growing demand for the development and application of technologies and methods that enable fast, cost-effective and environmentally friendly soil data collection and evaluation. Reflectance spectroscopy, which is based on reflectance measurements in the visible (VIS) and near-infrared (NIR) range of the electromagnetic spectrum (350–2500 nm), meets these demands. Because the reflectance spectrum recorded from soils is very rich in information, and many soil constituents have a characteristic spectral "fingerprint" in the range studied, a large number of key soil parameters can be determined simultaneously from a single curve. In this paper we present the first steps of a methodological development, built on the foundations of reflectance spectroscopy, that aims to determine the composition of soils. In our work we created and tested predictive models, based on multivariate mathematical-statistical methods (partial least squares regression, PLSR), that allow the organic carbon and CaCO3 content of soils to be estimated. Testing the models showed that the procedure gave high R2 values for both soil parameters [R2 (organic carbon) = 0.815; R2 (CaCO3) = 0.907]. The root mean squared error (RMSE), which indicates the accuracy of the estimate, can be considered moderate for both parameters [RMSE (organic carbon) = 0.467; RMSE (CaCO3) = 3.508], and could be improved considerably by standardizing the reflectance measurement protocols. Based on our investigations, we conclude that the combined use of reflectance spectroscopy and multivariate chemometric procedures can provide a fast and cost-effective method of data collection and evaluation.


2021 ◽  
pp. 1-21
Author(s):  
Elsa Arrua-Duarte ◽  
Marta Migoya-Borja ◽  
Igor Barahona ◽  
Lena C. Quilty ◽  
Sakina J. Rizvi ◽  
...  

Objective: The Dimensional Anhedonia Rating Scale (DARS) is a recently validated questionnaire for assessing anhedonia. In this work we study the equivalence between the traditional paper-and-pencil format and the digital format of the DARS. Methods: Sixty-nine patients completed the DARS in both paper-based and digital versions. We assessed differences between formats (Wilcoxon test), validity of the scales (Kappa and Intraclass Correlation Coefficients), and reliability (Cronbach’s alpha and Guttman’s coefficient). We calculated the Comparative Fit Index and the Root Mean Squared Error associated with the proposed one-factor structure. Results: Total scores were higher for the paper-based format. Significant differences between the two formats were found for three items. The weighted Kappa coefficient was approximately 0.40 for most of the items. Internal consistency was greater than 0.94, and the Intraclass Correlation Coefficient was 0.95 for the digital version and 0.94 for the paper-and-pencil version (F = 16.7, p < 0.001). The Comparative Fit Index was 0.97 for the digital DARS and 0.97 for the paper-and-pencil DARS, and the Root Mean Squared Error was 0.11 for the digital DARS and 0.10 for the paper-and-pencil DARS. Conclusion: The digital DARS is consistent in many respects with the paper-and-pencil questionnaire, but equivalence between the formats cannot be assumed without caution.
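
Of the reliability statistics listed above, Cronbach’s alpha has a compact closed form: with k items, alpha = k/(k − 1) × (1 − sum of item variances / variance of total scores). The sketch below implements that standard formula with the Python standard library. The function name `cronbach_alpha` and the score table are hypothetical and are not DARS data.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(responses: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items score table."""
    n_items = len(responses[0])
    if n_items < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item (column) across respondents.
    item_vars = [
        variance([row[j] for row in responses]) for j in range(n_items)
    ]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in responses])
    return n_items / (n_items - 1) * (1.0 - sum(item_vars) / total_var)


# Hypothetical scores: four respondents (rows) by three items (columns).
scores = [
    [3.0, 4.0, 3.0],
    [2.0, 2.0, 1.0],
    [4.0, 4.0, 4.0],
    [1.0, 2.0, 2.0],
]
print(cronbach_alpha(scores))
```

Computing the statistic separately for each administration format would allow the two internal-consistency values to be compared side by side.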


2016 ◽  
Vol 9 (2) ◽  
pp. 166
Author(s):  
Majid Golzarpour ◽  
Meroe Vameghi ◽  
Homeira Sajjadi ◽  
Gholamreza Ghaedamini Harouni

BACKGROUND: Worldwide, much evidence exists on the influence of parents’ socioeconomic conditions, including employment, on children’s health. However, the mechanisms for this effect are still being investigated. Few studies have been conducted in Iran to investigate this issue. This study investigated working conditions, job satisfaction, and mental health of employed people and the association between these variables and their children’s health. MATERIALS & METHODS: In this correlational study, 200 male and female staff of the official part of the Educational Organization and the schools of Mashhad with children aged 5-18 years were randomly selected. The data were gathered using a demographic questionnaire, the 20-item Minnesota Job Satisfaction Questionnaire, the 28-item General Health Questionnaire, and the 28-item Child Health Questionnaire. The data were then analyzed using SPSS. The associations under study were investigated by structural equation modeling in AMOS. RESULTS: Approximately 17% of the variation in the parents’ job satisfaction could be explained by the parents’ insurance, income, and work hours; 6% of the variation in their mental health was explained by job satisfaction, and 26% of the variation in children’s health was directly explained by the parents’ job satisfaction and mental health. However, approximately 32.2% of the variation in children’s health could be explained in the light of the direct effect of the parents’ mental health and the direct and indirect effects of the parents’ job satisfaction. The goodness of fit index was 0.94. CONCLUSION: Parents’ job satisfaction was associated with and considerably explained children’s health. Although this finding may be partially related to the effect of job satisfaction on mental health, the reasons for the effect of job satisfaction on children’s health and the potential mechanisms of this association require further study.


2018 ◽  
Vol 4 (1) ◽  
pp. 24
Author(s):  
Imam Halimi ◽  
Wahyu Andhyka Kusuma

Stock investment is a familiar activity, both to hear about and to take part in. There are various kinds of stocks in Indonesia, one of which is the Indeks Harga Saham Gabungan (IHSG), known in English as the Indonesia Composite Index, ICI, or IDX Composite. The IHSG is an important parameter to consider when investing, given that it is a composite of stocks. This study aims to predict IHSG movements with data mining techniques using the neural network algorithm, compared against the linear regression algorithm, so that the results can serve as a reference for investors. The results of this study are Root Mean Squared Error (RMSE) values, together with an additional label holding the predicted figures, obtained after validation using sliding windows validation. The best results were 37.786 for the neural network algorithm with windowing and 13.597 without windowing, and 35.026 for the linear regression algorithm with windowing and 12.657 without windowing. A T-Test showed that the difference between the neural network and linear regression was not significant, with the same T-Test value of 1.000 for the tests with and without windowing.

