Building Latent Class Trees, With an Application to a Study of Social Capital

Methodology ◽  
2017 ◽  
Vol 13 (Supplement 1) ◽  
pp. 13-22 ◽  
Author(s):  
Mattis van den Bergh ◽  
Verena D. Schmittmann ◽  
Jeroen K. Vermunt

Abstract. Researchers use latent class (LC) analysis to derive meaningful clusters from sets of categorical variables. However, especially when the number of classes required to obtain a good fit is large, interpretation of the latent classes may not be straightforward. To overcome this problem, we propose an alternative way of performing LC analysis, Latent Class Tree (LCT) modeling. For this purpose, a recursive partitioning procedure similar to divisive hierarchical cluster analysis is used: classes are split until a certain criterion indicates that the fit does not improve. The advantage of the LCT approach compared to the standard LC approach is that it gives a clear insight into how the latent classes are formed and how solutions with different numbers of classes relate. We also propose measures to evaluate the relative importance of the splits. The practical use of the approach is illustrated by the analysis of a data set on social capital.
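
The recursive partitioning described in this abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes binary items, uses BIC as the stopping criterion (the abstract only says "a certain criterion"), fits each child node on the full data weighted by the parent's posterior class probabilities, and starts EM from a deterministic split on the row sum. All function names, `max_depth`, and `min_size` are hypothetical.

```python
import math

EPS = 1e-6  # keeps estimated probabilities away from exactly 0 and 1


def _clip(p):
    return min(max(p, EPS), 1.0 - EPS)


def _pattern_prob(x, p):
    """P(x | class) for binary items under local independence."""
    out = 1.0
    for xj, pj in zip(x, p):
        out *= pj if xj else 1.0 - pj
    return out


def _item_probs(data, w):
    """Weighted proportion of 1s for each item."""
    n = sum(w) + 1e-300  # guard against an empty (all-zero-weight) class
    return [_clip(sum(wi * x[j] for x, wi in zip(data, w)) / n)
            for j in range(len(data[0]))]


def loglik_one_class(data, w):
    """Weighted log-likelihood of the 1-class (independence) model."""
    p = _item_probs(data, w)
    return sum(wi * math.log(_pattern_prob(x, p)) for x, wi in zip(data, w))


def fit_two_class(data, w, iters=200):
    """Weighted EM for a 2-class model; returns (loglik, P(class 1 | x))."""
    n = sum(w)
    mean_sum = sum(wi * sum(x) for x, wi in zip(data, w)) / n
    # Deterministic start (an assumption): rows above the mean sum lean to class 1.
    post = [0.9 if sum(x) > mean_sum else 0.1 for x in data]
    ll = -math.inf
    for _ in range(iters):
        # M-step: class size and item probabilities from posterior-weighted rows.
        w1 = [wi * r for wi, r in zip(w, post)]
        w0 = [wi * (1.0 - r) for wi, r in zip(w, post)]
        pi1 = _clip(sum(w1) / n)
        p1, p0 = _item_probs(data, w1), _item_probs(data, w0)
        # E-step: update posteriors and accumulate the weighted log-likelihood.
        ll, post = 0.0, []
        for x, wi in zip(data, w):
            a = pi1 * _pattern_prob(x, p1)
            b = (1.0 - pi1) * _pattern_prob(x, p0)
            post.append(a / (a + b))
            ll += wi * math.log(a + b)
    return ll, post


def build_tree(data, w=None, name="1", depth=0, max_depth=4, min_size=10.0):
    """Split a node into two child classes while BIC prefers 2 classes over 1."""
    w = [1.0] * len(data) if w is None else w
    n, n_items = sum(w), len(data[0])
    node = {"name": name, "size": n, "children": []}
    if depth >= max_depth or n < min_size:  # safeguards, not part of the abstract
        return node
    ll2, post = fit_two_class(data, w)
    bic1 = -2.0 * loglik_one_class(data, w) + n_items * math.log(n)
    bic2 = -2.0 * ll2 + (2 * n_items + 1) * math.log(n)
    if bic2 < bic1:
        child_weights = ([wi * r for wi, r in zip(w, post)],
                         [wi * (1.0 - r) for wi, r in zip(w, post)])
        for k, wk in enumerate(child_weights, start=1):
            node["children"].append(
                build_tree(data, wk, name + str(k), depth + 1, max_depth, min_size))
    return node


def leaves(node):
    """Names of the terminal classes, left to right."""
    if not node["children"]:
        return [node["name"]]
    return [leaf for child in node["children"] for leaf in leaves(child)]
```

Calling `build_tree(data)` on a list of 0/1 tuples returns a nested dictionary; `leaves(tree)` lists the final classes, and each node name records the path of splits that produced it, which is the kind of insight into class formation that the abstract emphasizes.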

2018 ◽  
Vol 48 (1) ◽  
pp. 303-336 ◽  
Author(s):  
Mattis van den Bergh ◽  
Geert H. van Kollenburg ◽  
Jeroen K. Vermunt

In recent studies, latent class tree (LCT) modeling has been proposed as a convenient alternative to standard latent class (LC) analysis. Instead of using an estimation method in which all classes are formed simultaneously given the specified number of classes, in LCT analysis a hierarchical structure of mutually linked classes is obtained by sequentially splitting classes into two subclasses. The resulting tree structure gives a clear insight into how the classes are formed and how solutions with different numbers of classes are substantively linked to one another. A limitation of the current LCT modeling approach is that it allows only for binary splits, which in certain situations may be too restrictive. Especially at the root node of the tree, where an initial set of classes is created based on the most dominant associations present in the data, it may make sense to use a model with more than two classes. In this article, we propose a modification of the LCT approach that allows for a nonbinary split at the root node, and we provide methods to determine the appropriate number of classes in this first split, based either on theoretical grounds or on a relative improvement of fit measure. This novel approach also can be seen as a hybrid of a standard LC model and a binary LCT model, in which an initial, oversimplified but interpretable model is refined using an LCT approach. Furthermore, we show how to apply an LCT model when a nonstandard LC model is required. These new approaches are illustrated using two empirical applications: one on social capital and the other on (post)materialism.


2021 ◽  
Vol 11 (8) ◽  
pp. 385
Author(s):  
Carmen Gloria Burgos-Videla ◽  
Wilson Andrés Castillo Rojas ◽  
Eloy López Meneses ◽  
Javiera Martínez

The objective of this study is to characterize Latent Classes emerging from the analysis of the level of digital competences, use and consumption of applications and/or services through the Internet. For this purpose, the results of the survey Basic Digital Competences (Competencias Básicas Digitales-COBADI®) applied to university students, with more than 60 categorical variables, were considered. A total of 4762 undergraduate and graduate students from five Spanish universities participated in this survey: Complutense University of Madrid (UCM), Pablo de Olavide University (UPO), Almeria University (UAL), National University of Distance Education (UNED) and Rey Juan Carlos University (URJC). The application of the questionnaire was done through the Internet, from the Institute for Research in Social Sciences and Education of University of Atacama—Chile. The methodology used is mixed, because the questions of the questionnaire provide qualitative information that can be interpreted and elaborated from the results. It is also quantitative because basic statistical techniques are used for the exploratory analysis of the data, and later Latent Class Analysis (LCA), to complement the description of the data set and the variables considered in the study, thus allowing us to group the classes of variables that do not appear explicitly in the set of observed variables, but which nevertheless affect them. The results of the study show that regardless of the gender and age range of the participants, there are four clearly differentiated groups or classes in the use and consumption of ICTs in different ways for their activities, both personal and academic, which allows for identifying different developments of digital competences. This study allows establishing a baseline in order to be able to elaborate later, in the development of the digital competences currently needed, which should be developed by university students.


2020 ◽  
Vol 29 (11) ◽  
pp. 3294-3307
Author(s):  
Eleni-Rosalina Andrinopoulou ◽  
Kazem Nasserinejad ◽  
Rhonda Szczesniak ◽  
Dimitris Rizopoulos

Cystic fibrosis is a chronic lung disease requiring frequent lung-function monitoring to track acute respiratory events (pulmonary exacerbations). The association between lung-function trajectory and time-to-first exacerbation can be characterized using joint longitudinal-survival modeling. Joint models specified through the shared parameter framework quantify the strength of association between such outcomes but do not incorporate latent sub-populations reflective of heterogeneous disease progression. Conversely, latent class joint models explicitly postulate the existence of sub-populations but do not directly quantify the strength of association. Furthermore, choosing the optimal number of classes using established metrics like deviance information criterion is computationally intensive in complex models. To overcome these limitations, we integrate latent classes in the shared parameter joint model through a fully Bayesian approach. To choose the optimal number of classes, we construct a mixture model assuming more latent classes than present in the data, thereby asymptotically “emptying” superfluous latent classes, provided the Dirichlet prior on class proportions is sufficiently uninformative. Model properties are evaluated in simulation studies. Application to data from the US Cystic Fibrosis Registry supports the existence of three sub-populations corresponding to lung-function trajectories with high initial forced expiratory volume in 1 s (FEV1), rapid FEV1 decline, and low but steady FEV1 progression. The association between FEV1 and hazard of exacerbation was negative in each class, but magnitude varied.
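
The "emptying" idea in this abstract can be written out as a generic overfitted mixture. The notation below ($K$, $K_{\text{true}}$, $\pi_k$, $\alpha$, $f$, $\theta_k$) is assumed for illustration and is not taken from the article; $f$ stands in for whatever class-specific joint density the model uses.

```latex
\begin{aligned}
y_i \mid \boldsymbol{\pi}, \boldsymbol{\theta}
  &\sim \sum_{k=1}^{K} \pi_k \, f(y_i \mid \theta_k),
  \qquad K > K_{\text{true}}, \\
(\pi_1, \dots, \pi_K)
  &\sim \operatorname{Dirichlet}(\alpha, \dots, \alpha),
  \qquad \alpha \text{ small}.
\end{aligned}
```

Under this reading, a small concentration $\alpha$ plays the role of the "sufficiently uninformative" prior: as the sample grows, classes that the data do not support should receive posterior weight $\pi_k \approx 0$, and the number of classes with non-negligible weight is taken as the estimate. The abstract does not state the threshold on $\alpha$, so none is given here.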


2021 ◽  
Vol 16 (4) ◽  
pp. 485-499
Author(s):  
M. Nowakowska ◽  
M. Pajecki

The objective of the analysis is to identify profiles of occupational accident casualties in production companies, providing the knowledge needed to prepare and manage a safe work environment. The research used qualitative data characterizing employees injured in accidents registered in Polish wood processing plants over a period of 10 years. The latent class analysis (LCA) method was employed in the investigation. This statistical modelling technique, based on the values of selected indicators (observed variables), divides the data set into separate groups, called latent classes, which enable the definition of patterns. A procedure that supports the decision on the number of classes is presented. The procedure considers the quality of the LCA model and the distinguishability of the classes. Moreover, a method of assessing the importance of indicators in describing the patterns is proposed. Seven latent classes were obtained and illustrated by a heat map, which enabled the profiles to be identified. They were labelled as follows: very serious, serious, moderate, minor (three latent classes), slight. Some recommendations were made regarding the circumstances of occupational accidents with the most severe consequences for the casualties.


2021 ◽  
Vol 6 ◽  
Author(s):  
Grant B. Morgan ◽  
R. Noah Padgett

Person-centered methodologies generally refer to those that take unobserved heterogeneity of populations into account. The use of person-centered methodologies has proliferated, which is likely due to a number of factors, such as methodological advances coupled with increased personal computing power and ease of software use. Using latent class analysis and its extension for longitudinal data, latent transition analysis (LTA), multiple underlying, homogeneous subgroups can be inferred from a set of categorical and/or continuous observed variables within a large heterogeneous data set. Such analyses allow researchers to statistically treat members of different subgroups separately, which may provide researchers with more power to detect effects of interest and closer alignment between statistical modeling and one’s guiding theory. For many educational and psychological settings, the hierarchical structure of organizational data must also be taken into account; for example, students (i.e., level-1 units) are nested within teachers/schools (i.e., level-2 units). Finally, multilevel LTA can be used to estimate the number of latent classes in each structured unit and the potential movement, or transitions, participants make between latent classes across time. The transitions/stability between latent classes across time can be treated as the outcome in and of itself, or the transitions/stability can be used as a correlate or predictor of some other, distal outcome. The purpose of the paper is to discuss multilevel LTA, provide considerations for its use, and demonstrate variance decomposition, which requires numerous steps. The variance decomposition steps are presented didactically along with a worked example based on analysis from the Social Rating Scale of ECLS-K.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Gregoire Preud’homme ◽  
Kevin Duarte ◽  
Kevin Dalleau ◽  
Claire Lacomblez ◽  
Emmanuel Bresso ◽  
...  

Abstract. The choice of the most appropriate unsupervised machine-learning method for “heterogeneous” or “mixed” data, i.e. with both continuous and categorical variables, can be challenging. Our aim was to examine the performance of various clustering strategies for mixed data using both simulated and real-life data. We conducted a benchmark analysis of “ready-to-use” tools in R comparing 4 model-based (Kamila algorithm, Latent Class Analysis, Latent Class Model [LCM] and Clustering by Mixture Modeling) and 5 distance/dissimilarity-based (Gower distance or Unsupervised Extra Trees dissimilarity followed by hierarchical clustering or Partitioning Around Medoids, K-prototypes) clustering methods. Clustering performances were assessed by Adjusted Rand Index (ARI) on 1000 generated virtual populations consisting of mixed variables using 7 scenarios with varying population sizes, number of clusters, number of continuous and categorical variables, proportions of relevant (non-noisy) variables and degree of variable relevance (low, mild, high). Clustering methods were then applied to the EPHESUS randomized clinical trial data (a heart failure trial evaluating the effect of eplerenone) to illustrate the differences between clustering techniques. The simulations revealed the dominance of K-prototypes, Kamila and LCM models over all other methods. Overall, methods using dissimilarity matrices in classical algorithms such as Partitioning Around Medoids and Hierarchical Clustering had a lower ARI compared to model-based methods in all scenarios. When applying clustering methods to a real-life clinical dataset, LCM showed promising results with regard to differences in (1) clinical profiles across clusters, (2) prognostic performance (highest C-index) and (3) identification of patient subgroups with substantial treatment benefit.
The present findings suggest key differences in clustering performance between the tested algorithms (limited to tools readily available in R). In most of the tested scenarios, model-based methods (in particular the Kamila and LCM packages) and K-prototypes typically performed best in the setting of heterogeneous data.
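
The Adjusted Rand Index used as the benchmark metric in this abstract can be computed directly from the contingency table of two labelings. The sketch below is a plain-Python version of the standard pair-counting formula, written for illustration; the study itself used R tooling, and the function name here is hypothetical.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index between two labelings of the same observations."""
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must have the same length")
    n = len(labels_a)
    if n < 2:
        return 1.0  # no pairs to disagree on

    # Pairs placed together in both partitions, in A only, and in B only.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = together_a * together_b / comb(n, 2)  # agreement expected by chance
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        return 1.0  # degenerate case: both partitions are trivial
    return (together_both - expected) / (maximum - expected)
```

The index is 1 for identical partitions regardless of how the clusters are named, is close to 0 for agreement at chance level, and can be negative when two partitions agree less than chance would predict.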


2021 ◽  
pp. 088626052199912
Author(s):  
Valdemir Ferreira-Junior ◽  
Juliana Y. Valente ◽  
Zila M. Sanchez

Although many studies have addressed bullying occurrence and its associations, they often use individual variables constructed from few items, which are probably inadequate to evaluate bullying severity and type. We aimed to identify involvement patterns in bullying victimization and perpetration, and their association with alcohol use, school performance, and sociodemographic variables. Baseline assessment data from a randomized controlled trial were used, and a latent class analysis was conducted to identify bullying patterns among 1,742 fifth-grade and 2,316 seventh-grade students from 30 public schools in São Paulo, Brazil. Data were collected using an anonymous self-reported, audio-guided questionnaire completed by the participants on smartphones. Multinomial logistic regressions were performed to verify how covariates affected bullying latent classes. Both grades presented the same four latent classes: low bullying, moderate bullying victimization, high bullying victimization, and high bullying victimization and perpetration. Alcohol use was associated with all bullying classes in both grades, with odds ratio up to 5.36 (95% CI 3.05; 10.38) among fifth graders from the high bullying victimization and perpetration class. Poor school performance was also strongly associated with this class (aOR = 10.12, 95% CI = 4.19; 24.41). Black/brown fifth graders were 3.35 times more likely to fit into the high bullying victimization class (95% CI 1.34; 8.37). No evidence was found for an association between sociodemographic variables and bullying latent class among seventh-grade students. Bullying and alcohol use are highly harmful behaviors that must be prevented. However, prevention programs should consider how racial and gender issues are influencing the way students experience violence.


Sensors ◽  
2021 ◽  
Vol 21 (10) ◽  
pp. 3410
Author(s):  
Claudia Malzer ◽  
Marcus Baum

High-resolution automotive radar sensors play an increasing role in detection, classification and tracking of moving objects in traffic scenes. Clustering is frequently used to group detection points in this context. However, this is a particularly challenging task due to variations in number and density of available data points across different scans. Modified versions of the density-based clustering method DBSCAN have mostly been used so far, while hierarchical approaches are rarely considered. In this article, we explore the applicability of HDBSCAN, a hierarchical DBSCAN variant, for clustering radar measurements. To improve results achieved by its unsupervised version, we propose the use of cluster-level constraints based on aggregated background information from cluster candidates. Further, we propose the application of a distance threshold to avoid selection of small clusters at low hierarchy levels. Based on exemplary traffic scenes from nuScenes, a publicly available autonomous driving data set, we test our constraint-based approach along with other methods, including label-based semi-supervised HDBSCAN. Our experiments demonstrate that cluster-level constraints help to adjust HDBSCAN to the given application context and can therefore achieve considerably better results than the unsupervised method. However, the approach requires carefully selected constraint criteria that can be difficult to choose in constantly changing environments.


2018 ◽  
pp. 130-155
Author(s):  
Fozia Munir ◽  
Mirajul Haq ◽  
Syed Nisar Hussain Hamadani

Maximization of wellbeing is a central objective of conventional economics. Given its central place, economists have developed well-structured models and tools to measure and investigate wellbeing. The existing literature on the subject has investigated various factors that affect wellbeing. However, wellbeing, which is viewed from different approaches and takes different forms, is not shaped equally by different types of factors. In this context, this study attempts to investigate how subjective wellbeing is affected by social capital. The basic hypothesis is that “individual wellbeing moves parallel with its social capital”. The hypothesis is empirically tested using a primary data set of 848 individuals collected from Azad Jammu and Kashmir (Pakistan). The empirical estimates indicate that, keeping other factors constant, individuals who embody more social capital enjoy more wellbeing in their lives. JEL Classification: B24, I30, C43


2018 ◽  
Vol 69 (1) ◽  
pp. 3-23 ◽  
Author(s):  
Calonie M. K. Gray

With the U.S. adult education system providing education services to millions of immigrants annually, understanding the unique skills and assets among adult immigrant learners is important. Using data on immigrants (n = 1,873) from the U.S. Program for the International Assessment of Adult Competencies, this study identified latent classes along dimensions of human and social capital. Latent class analysis indicated five discrete profiles: High Opportunity, Upskill Ready, Satisfactorily Skilled, Motivated and Engaged, and Highly Skilled. The results provide support for using customized education approaches to capitalize on the collection of assets adult learners have while concurrently increasing education service providers’ capacity to serve.

