Detection of Representative Variables in Complex Systems with Interpretable Rules Using Core-Clusters

Algorithms ◽  
2021 ◽  
Vol 14 (2) ◽  
pp. 66
Author(s):  
Camille Champion ◽  
Anne-Claire Brunet ◽  
Rémy Burcelin ◽  
Jean-Michel Loubes ◽  
Laurent Risser

In this paper, we present a new framework dedicated to the robust detection of representative variables in high-dimensional spaces with a potentially limited number of observations. Representative variables are selected by using an original regularization strategy: they are the centers of specific variable clusters, denoted CORE-clusters, which respect fully interpretable constraints. Each CORE-cluster indeed contains more than a predefined number of variables, and each pair of its variables has a coherent behaviour in the observed data. The key advantage of our regularization strategy is therefore that it only requires tuning two intuitive parameters: the minimal dimension of the CORE-clusters and the minimum level of similarity which gathers their variables. Interpreting the role played by a selected representative variable is additionally straightforward, as its observed behaviour is similar to that of a controlled number of other variables. After introducing and justifying this variable selection formalism, we propose two algorithmic strategies to detect the CORE-clusters, one of them scaling particularly well to high-dimensional data. Results obtained on synthetic as well as real data are finally presented.
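The two tuning parameters described in the abstract (minimal cluster size and minimal within-cluster similarity) translate directly into code. Below is a minimal greedy sketch of the CORE-cluster idea, not the authors' algorithm: the function name, the greedy growth rule, and the "worst-case similarity" choice of center are all illustrative assumptions.

```python
import numpy as np

def core_cluster_representatives(similarity, k, tau):
    """similarity: (p, p) symmetric similarity matrix; k: minimal
    cluster size (k >= 2); tau: minimal pairwise similarity."""
    p = similarity.shape[0]
    unassigned = set(range(p))
    representatives = []
    for seed in range(p):
        if seed not in unassigned:
            continue
        cluster = [seed]
        # Greedily add variables that stay tau-similar to every member.
        for j in sorted(unassigned - {seed}):
            if all(similarity[j, m] >= tau for m in cluster):
                cluster.append(j)
        if len(cluster) >= k:  # interpretable minimal-size constraint
            unassigned -= set(cluster)
            # Representative = member with the best worst-case
            # similarity to the rest of its cluster.
            center = max(cluster, key=lambda c: min(
                similarity[c, m] for m in cluster if m != c))
            representatives.append((center, cluster))
    return representatives

# Toy data: 12 variables built from 3 latent signals plus noise.
rng = np.random.default_rng(0)
X = np.repeat(rng.normal(size=(100, 3)), 4, axis=1)
X += 0.1 * rng.normal(size=(100, 12))
S = np.abs(np.corrcoef(X, rowvar=False))
print(core_cluster_representatives(S, k=3, tau=0.8))
```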

2021 ◽  
Author(s):  
Lajos Horváth ◽  
Zhenya Liu ◽  
Gregory Rice ◽  
Yuqian Zhao

Abstract The problem of detecting change points in the mean of high-dimensional panel data with potentially strong cross-sectional dependence is considered. Under the assumption that the cross-sectional dependence is captured by an unknown number of common factors, a new CUSUM-type statistic is proposed. We derive its asymptotic properties under three scenarios, depending on the extent to which the common factors are asymptotically dominant. With panel data consisting of N cross-sectional time series of length T, the asymptotic results hold under the mild assumption that min{N, T} → ∞, with an otherwise arbitrary relationship between N and T, allowing the results to apply to most panel data examples. Bootstrap procedures are proposed to approximate the sampling distribution of the test statistics. A Monte Carlo simulation study shows that our test outperforms several other existing tests in finite samples in a number of cases, particularly when N is much larger than T. The practical application of the proposed results is demonstrated by detecting and estimating change points in the high-dimensional FRED-MD macroeconomic data set.
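For orientation, a textbook-style CUSUM scan for a mean change in an N × T panel is sketched below. This is a hedged simplification: it pools raw per-series CUSUM processes and omits the paper's factor adjustment and bootstrap calibration; `panel_cusum` and its pooling rule are illustrative assumptions.

```python
import numpy as np

def panel_cusum(X):
    """X: (N, T) panel. Returns the pooled CUSUM curve and the
    argmax estimate of the change point."""
    N, T = X.shape
    csum = np.cumsum(X, axis=1)
    frac = np.arange(1, T + 1) / T
    # Per-series CUSUM: partial sums minus their expected share
    # of the series total under a constant mean.
    cusum = csum - frac * csum[:, -1:]
    pooled = (cusum**2).sum(axis=0) / (N * T)  # aggregate over series
    return pooled, int(np.argmax(pooled)) + 1

# Toy panel: a mean shift of 0.5 in every series after t = 120.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 300))
X[:, 120:] += 0.5
curve, tau_hat = panel_cusum(X)
print("estimated change point:", tau_hat)
```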


2018 ◽  
Vol 30 (12) ◽  
pp. 3281-3308
Author(s):  
Hong Zhu ◽  
Li-Zhi Liao ◽  
Michael K. Ng

We study a dimensionality-reduction algorithm for multi-instance (MI) learning based on sparsity and orthogonality, which is especially useful for high-dimensional MI data sets. We develop a novel algorithm to handle both sparsity and orthogonality constraints, which existing methods do not handle well simultaneously. Our main idea is to formulate an optimization problem in which the sparse term appears in the objective function and the orthogonality term is formed as a constraint. The resulting optimization problem can be solved by using approximate augmented Lagrangian iterations as the outer loop and inertial proximal alternating linearized minimization (iPALM) iterations as the inner loop. The main advantage of this method is that both sparsity and orthogonality can be satisfied by the proposed algorithm. We show the global convergence of the proposed iterative algorithm. We also demonstrate that the proposed algorithm can achieve the high sparsity and orthogonality requirements that are very important for dimensionality reduction. Experimental results on both synthetic and real data sets show that the proposed algorithm obtains learning performance comparable to that of other tested MI learning algorithms.
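The split described above (sparsity in the objective, orthogonality as a constraint) can be mimicked by a much cruder heuristic, shown below purely for intuition: a proximal gradient step with soft-thresholding, followed by a polar-decomposition projection back onto the Stiefel manifold. This is not the paper's augmented-Lagrangian/iPALM scheme, and every name and parameter here is an assumption.

```python
import numpy as np

def soft_threshold(A, lam):
    return np.sign(A) * np.maximum(np.abs(A) - lam, 0.0)

def stiefel_project(A):
    # Polar decomposition: the nearest matrix with orthonormal columns.
    U, _, Vt = np.linalg.svd(A, full_matrices=False)
    return U @ Vt

def sparse_orthogonal_directions(X, r, lam=0.5, lr=0.01, steps=200):
    """Heuristically seek a sparse (d, r) basis W with W^T W = I that
    captures variance in X (a PCA-like surrogate objective)."""
    n, d = X.shape
    C = X.T @ X / n
    rng = np.random.default_rng(0)
    W = stiefel_project(rng.normal(size=(d, r)))
    for _ in range(steps):
        grad = -2.0 * C @ W                          # gradient of -tr(W^T C W)
        W = soft_threshold(W - lr * grad, lr * lam)  # sparsity step
        W = stiefel_project(W)                       # orthogonality step
    return W

X = np.random.default_rng(2).normal(size=(500, 50))
W = sparse_orthogonal_directions(X, r=5)
print("orthogonality error:", np.linalg.norm(W.T @ W - np.eye(5)))
# Exact zeros are generally lost at the final projection, which is the
# tension the paper resolves by treating the two terms separately.
print("near-zero entries:", float(np.mean(np.abs(W) < 1e-3)))
```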


2018 ◽  
Vol 8 (2) ◽  
pp. 377-406
Author(s):  
Almog Lahav ◽  
Ronen Talmon ◽  
Yuval Kluger

Abstract A fundamental question in data analysis, machine learning and signal processing is how to compare data points. The choice of distance metric is especially challenging for high-dimensional data sets, where the problem of meaningfulness is more prominent (e.g. the Euclidean distance between images). In this paper, we propose to exploit a property of high-dimensional data that is usually ignored: the structure stemming from the relationships between the coordinates. Specifically, we show that organizing similar coordinates in clusters can be exploited for the construction of the Mahalanobis distance between samples. When the observable samples are generated by a nonlinear transformation of hidden variables, the Mahalanobis distance allows the recovery of the Euclidean distances in the hidden space. We illustrate the advantage of our approach on a synthetic example where the discovery of clusters of correlated coordinates improves the estimation of the principal directions of the samples. Our method was applied to real gene-expression data for lung adenocarcinomas (lung cancer). Using the proposed metric, we found a partition of subjects into risk groups with a good separation between their Kaplan–Meier survival plots.
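A loose sketch of the coordinate-clustering idea follows, under stated assumptions rather than as the authors' construction: coordinates are clustered by the similarity of their correlation profiles, the sample covariance is reduced to its within-cluster blocks, and the pseudo-inverse of that block matrix defines a Mahalanobis distance between samples.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

def block_mahalanobis(X, n_clusters=5):
    """X: (n, d) samples. Returns a distance function between rows."""
    R = np.corrcoef(X, rowvar=False)
    # Cluster coordinates: rows of R are correlation profiles, so
    # similar rows indicate coordinates with similar behaviour.
    labels = fcluster(linkage(R, method="average"),
                      t=n_clusters, criterion="maxclust")
    # Keep only within-cluster covariance entries (a block structure).
    C = np.cov(X, rowvar=False)
    mask = labels[:, None] == labels[None, :]
    P = np.linalg.pinv(np.where(mask, C, 0.0))
    def dist(i, j):
        v = X[i] - X[j]
        return float(np.sqrt(v @ P @ v))
    return dist

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 30))
dist = block_mahalanobis(X)
print(dist(0, 1))
```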


2020 ◽  
Author(s):  
André L. Samson ◽  
Cheree Fitzgibbon ◽  
Komal M. Patel ◽  
Joanne M. Hildebrand ◽  
Lachlan W. Whitehead ◽  
...  

Abstract Necroptosis is a lytic, inflammatory cell death pathway that is dysregulated in many human pathologies. The pathway is executed by a core machinery comprising the RIPK1 and RIPK3 kinases, which assemble into necrosomes in the cytoplasm, and the terminal effector pseudokinase, MLKL. RIPK3-mediated phosphorylation of MLKL induces oligomerization and translocation to the plasma membrane, where MLKL accumulates as hotspots and perturbs the lipid bilayer to cause death. The precise choreography of events in the pathway, where they occur within cells, and pathway differences between species are of immense interest. However, they have been poorly characterized due to a dearth of validated antibodies for microscopy studies. Here, we describe a toolbox of antibodies for immunofluorescent detection of the core necroptosis effectors, RIPK1, RIPK3 and MLKL, and their phosphorylated forms, in human and mouse cells. By comparing reactivity with endogenous proteins in wild-type cells and knockout controls under basal and necroptosis-inducing conditions, we characterize the specificity of frequently used commercial and recently developed antibodies for the detection of necroptosis signaling events. Importantly, our findings demonstrate that not all frequently used antibodies are suitable for monitoring necroptosis by immunofluorescence microscopy, and that methanol fixation is preferable to paraformaldehyde fixation for robust detection of specific RIPK1, RIPK3 and MLKL signals.


Biometrika ◽  
2021 ◽  
Author(s):  
Pixu Shi ◽  
Yuchen Zhou ◽  
Anru R Zhang

Abstract In microbiome and genomic studies, the regression of compositional data has been a crucial tool for identifying microbial taxa or genes that are associated with clinical phenotypes. To account for the variation in sequencing depth, the classic log-contrast model is often used, where read counts are normalized into compositions. However, zero read counts and the randomness in covariates remain critical issues. In this article, we introduce a surprisingly simple, interpretable, and efficient method for the estimation of compositional data regression through the lens of a novel high-dimensional log-error-in-variable regression model. The proposed method corrects for possible overdispersion in the sequencing data while simultaneously avoiding any subjective imputation of zero read counts. We provide theoretical justifications with matching upper and lower bounds for the estimation error. The merit of the procedure is illustrated through real data analysis and simulation studies.
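For context, the classic log-contrast baseline mentioned above can be written in a few lines; the pseudo-count below is exactly the ad-hoc zero handling that the proposed estimator avoids, and `log_contrast_fit` with its settings is an illustrative assumption, not the paper's method.

```python
import numpy as np

def log_contrast_fit(counts, y, pseudo=0.5):
    """counts: (n, p) read counts; y: (n,) phenotype.
    Returns beta with sum(beta) = 0 (a scale-invariant fit)."""
    comp = (counts + pseudo) / (counts + pseudo).sum(axis=1, keepdims=True)
    Z = np.log(comp)
    Zc = Z - Z.mean(axis=1, keepdims=True)  # centered log-ratio transform
    b, *_ = np.linalg.lstsq(Zc, y, rcond=None)
    return b - b.mean()                     # project onto the sum-zero space

# Toy check: recover a sparse zero-sum coefficient vector.
rng = np.random.default_rng(4)
counts = rng.poisson(lam=50, size=(100, 10))
beta_true = np.array([1.0, -1.0] + [0.0] * 8)
comp = counts / counts.sum(axis=1, keepdims=True)
y = np.log(comp + 1e-9) @ beta_true + 0.1 * rng.normal(size=100)
print(np.round(log_contrast_fit(counts, y), 2))
```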


2021 ◽  
Author(s):  
Todd Guth ◽  
Yoon Soo Park ◽  
Janice Hanson ◽  
Rachel Yudkowsky

Abstract
Background: The Core Physical Exam (CPE) has been proposed as a set of key physical exam (PE) items for teaching and assessing PE skills in medical students, and as the basis of a Core + Cluster curriculum. Beyond the initial development of the CPE and the proposal of the CPE and the Core + Cluster curriculum, no additional validity evidence has been presented for use of the CPE to teach or assess the PE skills of medical students. As a result, a modified version of the CPE was developed by faculty at the University of Colorado School of Medicine (UCSOM) and implemented in the school's clinical skills course in the context of an evolving Core + Cluster curriculum.
Methods: Validity evidence for the 25-item UCSOM CPE was analyzed using longitudinal assessment data from 366 medical students (Classes of 2019 and 2020), obtained from September 2015 through December 2019. Using Messick's unified validity framework, validity evidence specific to content, response process, internal structure, relationship to other variables, and consequences was gathered.
Results: Content and response process validity evidence included expert content review and rater training. For internal structure, a generalizability study phi coefficient of 0.258 suggests low reliability for a single assessment, due to variability in learner performance by occasion and across CPE items. Correlations of performance on the UCSOM CPE with other PE assessments were low, ranging from 0.00 to 0.34. Consequences were explored through the determination of a pass-fail cut score. Following a modified Angoff process, clinical skills course directors selected a consensus pass-fail cut score of 80% as a defensible and practical threshold for entry into precepted clinical experiences.
Conclusions: Validity evidence supports the use of the UCSOM CPE as an instructional strategy for teaching PE skills and as a formative assessment of readiness for precepted clinical experiences. The low generalizability coefficient suggests that inferences about PE skills based on the UCSOM CPE alone should be made with caution, and that the UCSOM CPE in isolation should be used primarily as a formative assessment.


2021 ◽  
Vol 108 (1) ◽  
pp. 25-33
Author(s):  
Matthew Clauhs ◽  
Bryan Powell

The National Coalition for Core Arts Standards released standards for music education in 2014. These standards are guided by artistic processes and measured by performance standards specific to content areas and grade levels. As school districts in the United States adopt the Core Arts Standards for their music programs, it is imperative that modern band teachers demonstrate how their curriculum aligns with this new framework. Modern band is one approach to popular music education that is particularly well suited to address this new framework; the emphases of songwriting, improvising, critical listening, and group work in a learner-centered modern band class/ensemble are associated with a wide variety of standards. This article explores connections between popular music pedagogies and each of the processes in the Core Arts Standards and examines which standards may be most appropriate for modern band contexts.


Biometrika ◽  
2020 ◽  
Author(s):  
X Guo ◽  
C Y Tang

Summary We consider testing the covariance structure in statistical models. We focus on developing such tests when the random vectors of interest are not directly observable and have to be derived via estimated models. Additionally, the covariance specification may involve extra nuisance parameters which also need to be estimated. In a generic additive model setting, we develop and investigate test statistics based on the maximum discrepancy measure calculated from the residuals. To approximate the distributions of the test statistics under the null hypothesis, new multiplier bootstrap procedures with dedicated adjustments that incorporate the model and nuisance parameter estimation errors are proposed. Our theoretical development elucidates the impact of the estimation errors with high-dimensional data and demonstrates the validity of our tests. Simulations and real data examples confirm our theory and demonstrate the performance of the proposed tests.
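A generic multiplier-bootstrap sketch of the maximum-discrepancy test is given below, under simplifying assumptions the paper does not make: the residuals are taken as directly observed, and no adjustment for model or nuisance-parameter estimation error is included.

```python
import numpy as np

def max_discrepancy_test(E, Sigma0, B=500, seed=0):
    """E: (n, p) residuals; Sigma0: (p, p) null covariance.
    Returns the test statistic and a bootstrap p-value."""
    n, p = E.shape
    # Per-observation contributions to the covariance discrepancy.
    D = np.einsum("ni,nj->nij", E, E) - Sigma0       # (n, p, p)
    T = np.sqrt(n) * np.abs(D.mean(axis=0)).max()    # max discrepancy
    rng = np.random.default_rng(seed)
    Dc = D - D.mean(axis=0)                          # center contributions
    stats = np.empty(B)
    for b in range(B):
        w = rng.normal(size=n)                       # multiplier weights
        stats[b] = np.sqrt(n) * np.abs(
            (w[:, None, None] * Dc).mean(axis=0)).max()
    return T, float(np.mean(stats >= T))

rng = np.random.default_rng(5)
E = rng.normal(size=(200, 10))
T, pval = max_discrepancy_test(E, np.eye(10))
print(f"stat = {T:.2f}, p-value = {pval:.3f}")
```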


2011 ◽  
Vol 23 (6) ◽  
pp. 1605-1622 ◽  
Author(s):  
Lingyan Ruan ◽  
Ming Yuan ◽  
Hui Zou

Finite Gaussian mixture models are widely used in statistics thanks to their great flexibility. However, parameter estimation for Gaussian mixture models with high dimensionality can be challenging because of the large number of parameters that need to be estimated. In this letter, we propose a penalized likelihood estimator to address this difficulty. The ℓ1-type penalty we impose on the inverse covariance matrices encourages sparsity in their entries and therefore helps to reduce the effective dimensionality of the problem. We show that the proposed estimate can be efficiently computed using an expectation-maximization algorithm. To illustrate the practical merits of the proposed method, we consider its applications in model-based clustering and mixture discriminant analysis. Numerical experiments with both simulated and real data show that the new method is a valuable tool for high-dimensional data analysis.
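As a hedged illustration of the penalized EM described above (not the authors' code), each M-step covariance update below is replaced by a graphical-lasso estimate, which imposes the ℓ1 penalty on the inverse covariance; scikit-learn's `graphical_lasso` is assumed to be available.

```python
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.covariance import graphical_lasso

def penalized_gmm(X, K=2, lam=0.1, iters=25, seed=0):
    """EM for a K-component Gaussian mixture with an l1 penalty on
    each component's inverse covariance matrix."""
    n, d = X.shape
    rng = np.random.default_rng(seed)
    pi = np.full(K, 1.0 / K)
    mu = X[rng.choice(n, K, replace=False)].astype(float)
    cov = np.stack([np.cov(X, rowvar=False)] * K)
    for _ in range(iters):
        # E-step: posterior responsibilities (log-space for stability).
        logp = np.stack([np.log(pi[k]) +
                         multivariate_normal.logpdf(X, mu[k], cov[k])
                         for k in range(K)], axis=1)
        logp -= logp.max(axis=1, keepdims=True)
        R = np.exp(logp)
        R /= R.sum(axis=1, keepdims=True)
        # M-step: weighted updates; the graphical lasso yields a
        # sparse precision (inverse covariance) per component.
        Nk = R.sum(axis=0)
        pi = Nk / n
        for k in range(K):
            mu[k] = R[:, k] @ X / Nk[k]
            S = (R[:, k, None] * (X - mu[k])).T @ (X - mu[k]) / Nk[k]
            cov[k], _ = graphical_lasso(S, alpha=lam)
    return pi, mu, cov

rng = np.random.default_rng(6)
X = np.vstack([rng.normal(-2, 1, size=(150, 5)),
               rng.normal(2, 1, size=(150, 5))])
pi, mu, cov = penalized_gmm(X)
print(np.round(mu, 1))
```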


2012 ◽  
Vol 49 (11) ◽  
pp. 1227-1243 ◽  
Author(s):  
Jayantha Kodikara

Volumetric behaviour is a fundamental consideration in unsaturated soil constitutive modelling. It is more complex than when the soil is saturated, as unsaturated soils exhibit a range of responses such as swelling and collapse under wetting, and shrinkage and cracking during drying. While significant advances have been made, it is still difficult to explain all patterns of behaviour in a general way. This paper presents a new framework for modelling the volumetric response of unsaturated soils, with emphasis on compacted soils. The framework uses void ratio (e), moisture ratio (e_w), and net stress (p) as the main constitutive variables, and suction as a dependent variable. The choice of e_w as a main constitutive variable is theoretically sound and is more attractive than the use of suction, which is relatively difficult to measure and displays significant hysteresis during drying and wetting. The framework incorporates the well-known compaction curve, making it easily applicable to practical situations. Within the overall e–e_w–p space, the operative space is constrained by three main surfaces: namely, the loading–wetting state boundary surface, the tensile failure surface, and the saturated plane. The conceptual basis for these state surfaces is described, and the framework is qualitatively validated against observed behaviour of compacted soils.

