Exploratory Graph Analysis for Factor Retention: Simulation Results for Continuous and Binary Data

2021 ◽  
pp. 001316442110590
Author(s):  
Tim Cosemans ◽  
Yves Rosseel ◽  
Sarah Gelper

Exploratory graph analysis (EGA) is a commonly applied technique intended to help social scientists discover latent variables. Yet the results can be influenced by the methodological decisions the researcher makes along the way. In this article, we focus on the choice of how many factors to retain: we compare the performance of the recently developed EGA with various traditional factor retention criteria. We use both continuous and binary data, as evidence regarding the accuracy of such criteria in the latter case is scarce. Simulation results, based on scenarios that vary sample size, communalities from major factors, interfactor correlations, skewness, and the correlation measure, show that EGA outperforms the traditional factor retention criteria considered in most cases in terms of bias and accuracy. In addition, we show that factor retention decisions for binary data are preferably made using Pearson rather than tetrachoric correlations, contrary to popular belief.
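
EGA estimates a network over the items (typically with a graphical lasso) and treats the communities found in that network as factors; it is usually run via the EGAnet R package and is not reproduced here. As a minimal, hypothetical illustration of the kind of retention decision being compared, the sketch below applies Horn's parallel analysis, one of the traditional criteria, to Pearson correlations of simulated binary items; all data, loadings, and settings are invented for the example.

```python
import numpy as np

def parallel_analysis(data, n_iter=100, quantile=0.95, seed=0):
    """Horn's parallel analysis: retain leading eigenvalues of the Pearson
    correlation matrix that exceed the chosen quantile of eigenvalues
    obtained from random data of the same shape."""
    rng = np.random.default_rng(seed)
    n, p = data.shape
    obs_eig = np.linalg.eigvalsh(np.corrcoef(data, rowvar=False))[::-1]
    rand_eig = np.empty((n_iter, p))
    for i in range(n_iter):
        rand = rng.normal(size=(n, p))
        rand_eig[i] = np.linalg.eigvalsh(np.corrcoef(rand, rowvar=False))[::-1]
    threshold = np.quantile(rand_eig, quantile, axis=0)
    n_factors = 0
    for exceeds in obs_eig > threshold:  # count leading exceedances only
        if not exceeds:
            break
        n_factors += 1
    return n_factors

# Hypothetical binary items: two correlated factors, dichotomized at zero.
rng = np.random.default_rng(1)
n = 500
factors = rng.multivariate_normal([0, 0], [[1.0, 0.3], [0.3, 1.0]], size=n)
loadings = np.zeros((6, 2))
loadings[:3, 0] = 0.7
loadings[3:, 1] = 0.7
items = ((factors @ loadings.T + rng.normal(scale=0.5, size=(n, 6))) > 0).astype(int)
print(parallel_analysis(items))  # should typically suggest 2 factors
```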

2019 ◽  
Author(s):  
Hudson Golino ◽  
Robert Glenn Moulder ◽  
Dingjing Shi ◽  
Alexander P. Christensen ◽  
Luis E. Garrido ◽  
...  

The accurate identification of the content and number of latent factors underlying multivariate data is an important endeavor in many areas of psychology and related fields. Recently, a new dimensionality assessment technique based on network psychometrics was proposed (Exploratory Graph Analysis, EGA), but a measure to check the fit of the dimensionality structure estimated via EGA to the data is still lacking. Although traditional factor-analytic fit measures are widespread, recent research has identified limitations in their effectiveness for categorical variables. Here, we propose three new fit measures (termed entropy fit indices) that combine information theory, quantum information theory, and structural analysis: the Entropy Fit Index (EFI), EFI with Von Neumann entropy (EFI.vn), and Total EFI.vn (TEFI.vn). The first can be estimated from complete datasets using Shannon entropy, while EFI.vn and TEFI.vn can be estimated from correlation matrices using quantum information metrics. We show, through several simulations, that TEFI.vn, EFI.vn, and EFI are as accurate as or more accurate than traditional fit measures when identifying the number of simulated latent factors. However, in conditions where more factors are extracted than were simulated, only TEFI.vn maintains very high accuracy. In addition, we provide an applied example that demonstrates how the new fit measures can be used with a real-world dataset analyzed via exploratory graph analysis.
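
The exact EFI, EFI.vn, and TEFI.vn formulas combine entropies computed on the full item set and within the estimated factors and are not reproduced here; as a minimal sketch of the quantum-information ingredient alone, the function below computes the Von Neumann entropy of a correlation matrix by treating its trace-normalized form as a density matrix (the example matrix is invented).

```python
import numpy as np

def von_neumann_entropy(corr):
    """Von Neumann entropy of a correlation matrix: trace-normalize it so
    its eigenvalues sum to one, then compute -sum(lambda * log(lambda))."""
    rho = np.asarray(corr) / np.trace(corr)
    eigvals = np.linalg.eigvalsh(rho)
    eigvals = eigvals[eigvals > 1e-12]  # drop numerical zeros
    return float(-np.sum(eigvals * np.log(eigvals)))

# Hypothetical 4-item correlation matrix with one dominant factor.
corr = np.array([[1.0, 0.6, 0.6, 0.6],
                 [0.6, 1.0, 0.6, 0.6],
                 [0.6, 0.6, 1.0, 0.6],
                 [0.6, 0.6, 0.6, 1.0]])
print(von_neumann_entropy(corr))  # lower than for an identity (uncorrelated) matrix
```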


Author(s):  
Hongtao Yu ◽  
Reza Langari

This paper presents a data-driven method to detect vehicle problems related to unintended acceleration (UA). A diagnostic system is formulated by analyzing several specific vehicle events, such as acceleration peaks, and generating corresponding mathematical models. The diagnostic algorithm was implemented in the Simulink/dSpace environment for validation. Major factors that affect a vehicle's acceleration (e.g., changes in road grade and gear shifting) were included in the simulation. UA errors were injected at random while human drivers drove virtual cars. The simulation results show that the algorithm succeeds in detecting abnormal acceleration.
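
The event models themselves are built in Simulink/dSpace and are not given in the abstract; the following is a purely hypothetical Python sketch of the underlying idea of residual-based detection: flag acceleration that a crude throttle-proportional prediction cannot explain over several consecutive samples. The gain, margin, and run-length values are invented placeholders, not parameters from the paper.

```python
import numpy as np

def flag_unintended_acceleration(accel, throttle, gain=3.0, margin=1.5,
                                 min_consecutive=5):
    """Hypothetical residual check: flag samples where the measured
    longitudinal acceleration (m/s^2) exceeds a crude prediction
    (gain * throttle, with throttle in [0, 1]) by `margin` for at least
    `min_consecutive` consecutive samples."""
    residual = np.asarray(accel, dtype=float) - gain * np.asarray(throttle, dtype=float)
    suspicious = residual > margin
    flags = np.zeros(suspicious.shape, dtype=bool)
    run = 0
    for i, s in enumerate(suspicious):
        run = run + 1 if s else 0
        if run >= min_consecutive:
            flags[i - min_consecutive + 1 : i + 1] = True
    return flags
```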


2021 ◽  
Author(s):  
Herdiantri Sufriyana ◽  
Yu Wei Wu ◽  
Emily Chia-Yu Su

Abstract We aimed to provide a resampling protocol for dimensional reduction that results in a small number of latent variables. The protocol is intended primarily, but not exclusively, for developing machine learning prediction models, in order to improve the ratio of sample size to the number of candidate predictors. With this feature representation technique, one can improve generalization by preventing the latent variables from overfitting the data used to conduct the dimensional reduction. However, the technique may require more computational capacity and time. The key stages are the derivation of latent variables from multiple resampled subsets, the estimation of the latent variables' population parameters, and the selection of latent variables transformed by the estimated parameters.
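
The abstract describes the stages only at a high level, so the following is a hypothetical sketch rather than the authors' protocol: PCA loadings are estimated on many bootstrap-resampled subsets, sign-aligned, and averaged as a rough "population" estimate, and data are then projected with the averaged loadings. All function names and settings are invented.

```python
import numpy as np
from sklearn.decomposition import PCA

def resampled_pca_loadings(X, n_components=3, n_resamples=50, seed=0):
    """Fit PCA on many bootstrap resamples and average the sign-aligned
    loadings, as a crude stand-in for population parameter estimation."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    reference = PCA(n_components=n_components).fit(X).components_
    collected = []
    for _ in range(n_resamples):
        idx = rng.choice(n, size=n, replace=True)
        comp = PCA(n_components=n_components).fit(X[idx]).components_
        # Align component signs with the full-data reference before averaging.
        signs = np.sign(np.sum(comp * reference, axis=1, keepdims=True))
        collected.append(comp * signs)
    return np.mean(collected, axis=0)

# Hypothetical predictor matrix; latent variables for modeling are then
# obtained by projecting centered data onto the averaged loadings.
X = np.random.default_rng(1).normal(size=(200, 10))
W = resampled_pca_loadings(X)
latent = (X - X.mean(axis=0)) @ W.T
```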


2018 ◽  
Vol 7 (6) ◽  
pp. 68
Author(s):  
Karl Schweizer ◽  
Siegbert Reiß ◽  
Stefan Troche

An investigation of the suitability of threshold-based and threshold-free approaches for structural investigations of binary data is reported. Both approaches implicitly establish a relationship between binary data following the binomial distribution on the one hand and continuous random variables assumed to follow a normal distribution on the other. In two simulation studies, we investigated whether the fit results confirm the establishment of such a relationship, whether the differences between correct and incorrect models are retained, and to what degree sample size influences the results. Both approaches proved to establish the relationship: in the threshold-free approach this was achieved by customary ML estimation, whereas robust ML estimation was necessary in the threshold-based approach. Discrimination between correct and incorrect models was observed for both approaches, with larger CFI differences for the threshold-free approach than for the threshold-based approach. Dependency on sample size characterized the threshold-based approach but not the threshold-free approach. The threshold-based approach tended to perform better in large samples, while the threshold-free approach performed better in smaller samples.
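
The threshold-based approach is the idea behind tetrachoric correlations: each binary item is viewed as a dichotomized latent normal variable cut at a threshold. As a minimal sketch of that idea only (not of the CFA model comparison reported above), the function below estimates the tetrachoric correlation of two binary variables by maximizing the bivariate-normal likelihood of their 2x2 table; it assumes non-degenerate marginals and is for illustration.

```python
import numpy as np
from scipy.stats import norm, multivariate_normal
from scipy.optimize import minimize_scalar

def tetrachoric(x, y):
    """Estimate the latent bivariate-normal correlation behind two binary
    variables by maximizing the likelihood of the observed 2x2 table."""
    x, y = np.asarray(x), np.asarray(y)
    # Thresholds implied by the marginal proportions of zeros.
    tau_x = norm.ppf(np.mean(x == 0))
    tau_y = norm.ppf(np.mean(y == 0))
    counts = np.array([np.sum((x == 0) & (y == 0)), np.sum((x == 0) & (y == 1)),
                       np.sum((x == 1) & (y == 0)), np.sum((x == 1) & (y == 1))])

    def neg_loglik(rho):
        p00 = multivariate_normal.cdf([tau_x, tau_y], mean=[0, 0],
                                      cov=[[1.0, rho], [rho, 1.0]])
        p0x, p0y = norm.cdf(tau_x), norm.cdf(tau_y)
        probs = np.clip([p00, p0x - p00, p0y - p00, 1 - p0x - p0y + p00],
                        1e-12, 1.0)
        return -np.dot(counts, np.log(probs))

    return minimize_scalar(neg_loglik, bounds=(-0.999, 0.999),
                           method="bounded").x

# Usage with hypothetical binary vectors: rho_hat = tetrachoric(item_a, item_b)
```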


2014 ◽  
Vol 9 (2) ◽  
pp. 1-28 ◽  
Author(s):  
Lian Duan ◽  
W. Nick Street ◽  
Yanchi Liu ◽  
Songhua Xu ◽  
Brook Wu

Author(s):  
John H. Perkins

In the years after the end of World War II, farmers, agricultural scientists, and policy makers in many countries all knew, or learned, that higher yields of wheat were what they wanted, and they were successful in achieving them. Their specific motivations were different, but their objectives were not. Not only were the objectives clear, but a central method by which the higher yields were to be achieved was plant breeding. Plant breeding itself was an applied science that had to be nested within organizations that supported it and its allies in the agricultural, biological, and engineering sciences. By 1950 wheat breeders believed that the number of factors governing yield was small, which meant that the research avenues likely to be fruitful were also few in number. The amount of water available and the responsiveness to soil fertility, especially nitrogen, were in most cases the key ingredients for higher yields. For wheat, the ability of the plant to resist invasion by fungal pathogens was almost as important as water and soil fertility. Water and fertility were needed in every crop year, but damage from fungal pathogens varied with weather. Thus plant disease was not necessarily a destructive factor every year. Control of water, soil fertility, and plant disease was therefore at the center of research programs in wheat breeding. A wheat breeder would find success if his or her program produced new varieties that gave higher yields within the context of water, soil fertility, and plant disease existing in the area. Ancillary questions also existed and in some cases matched the major factors in importance. Weed control was always a problem, so high-yielding wheat had to have some capacity to resist competition from weeds. Similarly, in some areas and some years, insects could cause damage. Wheat varieties therefore had to be able to withstand them somehow. Other factors of importance to wheat breeders were habit of growth and the color and quality of the grain. Winter wheats were useful in climates that had winters mild enough to allow planting in the fall and thus higher yields the next summer.


2020 ◽  
Vol 4 (3) ◽  
Author(s):  
Helen C Kline ◽  
Zachary D Weller ◽  
Temple Grandin ◽  
Ryan J Algino ◽  
Lily N Edwards-Callaway

Abstract Livestock bruising is both an animal welfare concern and a detriment to the economic value of carcasses. Understanding the causes of bruising is challenging due to the numerous factors that have been shown to be related to bruise prevalence. While most cattle bruising studies collect and analyze data on truckload lots of cattle, this study followed a large number (n = 585) of individual animals from unloading through postmortem processing at five different slaughter plants. Both visual bruise presence and location were recorded postmortem prior to carcass trimming. By linking postmortem data to animal sex, breed, trailer compartment, and traumatic events at unloading, a rich analysis of the factors related to bruise prevalence was developed. Results showed varying levels of agreement with other published bruising studies, underscoring the complexity of assessing the factors that affect bruising. Bruising prevalence varied across sex class types (P < 0.001); 36.5% of steers [95% confidence interval (CI): 31.7, 41.6; n = 378], 52.8% of cows (45.6, 60.0; 193), and 64.3% of bulls (no CI calculated due to sample size; 14) were bruised. There was a difference in bruise prevalence by trailer compartment (P = 0.035) in potbelly trailers, indicating that cattle transported in the top deck were less likely to be bruised (95% CI: 26.6, 40.4; n = 63) than cattle transported in the bottom deck (95% CI: 39.6, 54.2; n = 89). Results indicated that visual assessment underestimated carcass bruise trimming: while 42.6% of the carcasses were visibly bruised, 57.9% of carcasses were trimmed due to bruising, suggesting that visual assessment does not capture all of the carcass loss associated with bruising. Furthermore, bruises that appeared small visually were often indicators of larger, subsurface bruising, creating an "iceberg effect" of trim loss due to bruising.
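
As a minimal check of how intervals like those reported above can be computed (the abstract does not state which interval method the authors used), the snippet below computes a Clopper-Pearson 95% CI for the steer counts implied by the abstract, roughly 138 bruised of 378; this is an illustration, not a reproduction of the published analysis.

```python
from statsmodels.stats.proportion import proportion_confint

count, nobs = 138, 378  # ~36.5% of steers bruised, per the abstract
low, high = proportion_confint(count, nobs, alpha=0.05, method="beta")
print(f"95% CI: ({100 * low:.1f}%, {100 * high:.1f}%)")
```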

