Generalized penalty for circular coordinate representation

2021, Vol. 0(0), pp. 0
Author(s): Hengrui Luo, Alice Patania, Jisu Kim, Mikael Vejdemo-Johansson

Topological Data Analysis (TDA) provides novel approaches that allow us to analyze the geometrical shapes and topological structures of a dataset. As one important application, TDA can be used for data visualization and dimension reduction. We follow the framework of circular coordinate representation, which allows us to perform dimension reduction and visualization of high-dimensional datasets on a torus using persistent cohomology. In this paper, we propose a method that adapts the circular coordinate framework to take into account the roughness of circular coordinates in change-point and high-dimensional applications. To do so, we use a generalized penalty function instead of an $L_{2}$ penalty in the traditional circular coordinate algorithm. We provide simulation experiments and real data analyses to support our claim that circular coordinates with a generalized penalty detect changes in high-dimensional datasets under different sampling schemes while preserving the topological structures.
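The smoothing step at the heart of this framework is easy to sketch. Below is a minimal illustration, assuming an integer 1-cocycle `z` and the coboundary matrix `D` of the underlying complex have already been extracted from a persistent cohomology computation (the names `D`, `z`, and `smooth_cocycle` are ours, not the paper's): the traditional algorithm minimizes $\|z - Df\|_2$, and the generalized penalty replaces the exponent 2 with a general $p$.

```python
import numpy as np
from scipy.optimize import minimize

def smooth_cocycle(D, z, p=1.0):
    """Smooth an integer 1-cocycle into circular coordinates.

    D : (n_edges, n_vertices) coboundary matrix of the complex.
    z : (n_edges,) integer cocycle representing a class in H^1(.; Z).
    p : penalty exponent; p = 2 recovers the classical
        least-squares smoothing, p = 1 gives an L1 penalty.
    """
    n_vertices = D.shape[1]

    def penalty(f):
        return np.sum(np.abs(z - D @ f) ** p)

    # Derivative-free solver, since the penalty is non-smooth for p <= 1.
    res = minimize(penalty, np.zeros(n_vertices), method="Powell")
    return res.x % 1.0  # vertex-level circular coordinates in [0, 1)
```

For realistically sized complexes one would solve the p = 2 case by least squares and the p = 1 case by linear programming rather than a generic solver; the sketch only shows the objective being swapped.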

2016, Vol. 29(8), pp. 3049-3056
Author(s): Daniel S. Wilks

Principal component analysis (PCA), also known as empirical orthogonal function (EOF) analysis, is widely used for compression of high-dimensional datasets in such applications as climate diagnostics and seasonal forecasting. A critical question when using this method is the number of modes, representing meaningful signal, to retain. The resampling-based “Rule N” method attempts to address the question of PCA truncation in a statistically principled manner. However, it is only valid for the leading (largest) eigenvalue, because it fails to condition the hypothesis tests for subsequent (smaller) eigenvalues on the results of previous tests. This paper draws on several relatively recent statistical results to construct a hypothesis-test-based truncation rule that accounts at each stage for the magnitudes of the larger eigenvalues. The performance of the method is demonstrated in an artificial data setting and illustrated with a real-data example.
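For context, the classical Rule N baseline that the paper refines can be sketched in a few lines: compare each sample eigenvalue against the corresponding quantile of eigenvalues obtained from uncorrelated Gaussian noise of the same shape. The function name and the 95% level below are our illustrative choices.

```python
import numpy as np

def rule_n(data, n_sim=1000, level=0.95, seed=0):
    """Classical resampling-based 'Rule N' for PCA truncation.

    Retains leading modes whose correlation-matrix eigenvalues
    exceed the `level` quantile of eigenvalues from uncorrelated
    Gaussian noise.  As the abstract notes, this test is only
    well calibrated for the leading eigenvalue.
    """
    rng = np.random.default_rng(seed)
    n, p = data.shape
    obs = np.linalg.eigvalsh(np.corrcoef(data, rowvar=False))[::-1]

    sims = np.empty((n_sim, p))
    for i in range(n_sim):
        noise = rng.standard_normal((n, p))
        sims[i] = np.linalg.eigvalsh(np.corrcoef(noise, rowvar=False))[::-1]
    thresh = np.quantile(sims, level, axis=0)

    keep = 0  # retain modes until the first failed test
    while keep < p and obs[keep] > thresh[keep]:
        keep += 1
    return keep
```

Wilks's contribution is to condition each subsequent test on the magnitudes of the larger eigenvalues, which this uncorrected baseline does not do.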


2021, Vol. 15
Author(s): Louis Kang, Boyan Xu, Dmitriy Morozov

Persistent cohomology is a powerful technique for discovering topological structure in data. Strategies for its use in neuroscience are still undergoing development. We comprehensively and rigorously assess its performance in simulated neural recordings of the brain's spatial representation system. Grid, head direction, and conjunctive cell populations each span low-dimensional topological structures embedded in high-dimensional neural activity space. We evaluate the ability of persistent cohomology to discover these structures for different dataset dimensions, variations in spatial tuning, and forms of noise. We quantify its ability to decode simulated animal trajectories contained within these topological structures. We also identify regimes under which mixtures of populations form product topologies that can be detected. Our results reveal how dataset parameters affect the success of topological discovery and suggest principles for applying persistent cohomology, as well as persistent homology, to experimental neural recordings.
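The abstract does not name a software stack; as one way to reproduce the flavor of this analysis, the ripser package can compute persistent cohomology of a simulated population whose activity lies on a ring. The tuning-curve model below is our own toy stand-in for the paper's head direction simulations, not its actual code.

```python
import numpy as np
from ripser import ripser  # pip install ripser

rng = np.random.default_rng(1)
n_cells, n_samples = 50, 400
pref = rng.uniform(0, 2 * np.pi, n_cells)        # preferred directions
theta = rng.uniform(0, 2 * np.pi, n_samples)     # sampled headings
rates = np.exp(2.0 * np.cos(theta[:, None] - pref[None, :]))
rates += 0.1 * rng.standard_normal(rates.shape)  # measurement noise

# Persistent cohomology over a prime field; one long H^1 bar
# indicates the circle underlying head direction activity.
res = ripser(rates, maxdim=1, coeff=47, do_cocycles=True)
h1 = res["dgms"][1]
print("longest H1 bar:", (h1[:, 1] - h1[:, 0]).max())
```

Varying `n_cells`, the noise level, and the tuning width is a direct way to probe the dataset-parameter regimes the paper maps out.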


Author(s): Haoyang Cheng, Wenquan Cui

Heteroscedasticity often appears in high-dimensional data analysis. In order to achieve a sparse dimension reduction direction for high-dimensional data with heteroscedasticity, we propose a new sparse sufficient dimension reduction method, called Lasso-PQR. From the candidate matrix derived from the principal quantile regression (PQR) method, we construct a new artificial response variable made up from the top eigenvectors of the candidate matrix. Then we apply a Lasso regression to obtain sparse dimension reduction directions. For the "large $p$, small $n$" case where $p > n$, we use principal projection to solve the dimension reduction problem in a lower-dimensional subspace and then project back to the original dimension reduction problem. Theoretical properties of the methodology are established. Compared with several existing methods in simulations and real data analysis, we demonstrate the advantages of our method for high-dimensional data with heteroscedasticity.
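The Lasso step is simple to sketch once the candidate matrix is available. The artificial-response construction below (regressing $X\eta$ on $X$ for each top eigenvector $\eta$) is one common choice in sparse sufficient dimension reduction and may differ in detail from the paper's; all names are ours.

```python
import numpy as np
from sklearn.linear_model import Lasso

def sparse_directions(X, M, n_dirs=1, alpha=0.1):
    """Sparse dimension reduction directions from a PQR candidate matrix.

    X : (n, p) predictors; M : (p, p) candidate matrix from PQR.
    For each top eigenvector eta of M, form the artificial response
    y_art = X @ eta and run a Lasso of y_art on X.
    """
    eigvals, eigvecs = np.linalg.eigh(M)
    order = np.argsort(eigvals)[::-1]          # largest eigenvalues first
    dirs = []
    for k in range(n_dirs):
        eta = eigvecs[:, order[k]]
        y_art = X @ eta                        # artificial response
        fit = Lasso(alpha=alpha, fit_intercept=False).fit(X, y_art)
        dirs.append(fit.coef_)
    return np.column_stack(dirs)
```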


Stats, 2021, Vol. 4(1), pp. 138-145
Author(s): Stephen Babos, Andreas Artemiou

In this paper, we present the Cumulative Median Estimation (CUMed) algorithm for robust sufficient dimension reduction. Compared with non-robust competitors, this algorithm performs better when there are outliers present in the data and comparably when outliers are not present. This is demonstrated in simulated and real data experiments.
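The abstract does not spell out the estimator, so the following is only a stand-in illustrating the general idea of median-based slicing for robust sufficient dimension reduction (cumulative slices with coordinate-wise medians in place of means); it is not the CUMed algorithm itself.

```python
import numpy as np

def cumulative_median_matrix(X, y, n_cuts=10):
    """Candidate matrix from cumulative slices of y, using
    coordinate-wise medians for robustness to outliers.
    Illustrative stand-in only, not the CUMed estimator.
    """
    p = X.shape[1]
    Xc = X - np.median(X, axis=0)              # robust centering
    cuts = np.quantile(y, np.linspace(0.1, 0.9, n_cuts))
    M = np.zeros((p, p))
    for c in cuts:
        m = np.median(Xc[y <= c], axis=0)      # cumulative-slice median
        M += np.outer(m, m)
    return M / n_cuts  # top eigenvectors estimate the directions
```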


2021
Author(s): Lajos Horváth, Zhenya Liu, Gregory Rice, Yuqian Zhao

The problem of detecting change points in the mean of high-dimensional panel data with potentially strong cross-sectional dependence is considered. Under the assumption that the cross-sectional dependence is captured by an unknown number of common factors, a new CUSUM-type statistic is proposed. We derive its asymptotic properties under three scenarios, depending on the extent to which the common factors are asymptotically dominant. With panel data consisting of N cross-sectional time series of length T, the asymptotic results hold under the mild assumption that min{N, T} → ∞, with an otherwise arbitrary relationship between N and T, allowing the results to apply to most panel data examples. Bootstrap procedures are proposed to approximate the sampling distribution of the test statistics. A Monte Carlo simulation study shows that our test outperforms several other existing tests in finite samples in a number of cases, particularly when N is much larger than T. The practical application of the proposed results is demonstrated with real data applications to detecting and estimating change points in the high-dimensional FRED-MD macroeconomic data set.
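To fix ideas, a bare-bones CUSUM scan for a common mean change in an N × T panel can be written as below; the paper's statistic additionally removes the estimated common factors and is calibrated by bootstrap, both of which this sketch omits.

```python
import numpy as np

def panel_cusum(X):
    """CUSUM scan for a single mean change point in an (N, T) panel.

    For each series i, C_i(t) = sum_{s<=t} x_is - (t/T) sum_s x_is;
    the squared processes are aggregated across series and maximized
    over t.  No factor adjustment or critical values here.
    """
    N, T = X.shape
    csum = np.cumsum(X, axis=1)
    t = np.arange(1, T + 1)
    cusum = csum - csum[:, [-1]] * (t / T)     # centered partial sums
    stat = (cusum ** 2).sum(axis=0) / (N * T)
    k_hat = int(np.argmax(stat))
    return k_hat + 1, stat[k_hat]  # estimated change time, statistic value
```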


Author(s): Jun Sun, Lingchen Kong, Mei Li

With the development of modern science and technology, it is easy to obtain a large number of high-dimensional datasets that are related but different. Classical single-model analysis is unlikely to capture potential links between the different datasets. Recently, a collaborative regression model based on the least squares (LS) method has been proposed for this problem. In this paper, we propose a robust collaborative regression based on the least absolute deviation (LAD). We give statistical interpretations of LS-collaborative regression and LAD-collaborative regression. Then we design an efficient symmetric Gauss–Seidel-based alternating direction method of multipliers algorithm to solve the two models, which enjoys global convergence and a Q-linear rate of convergence. Finally, we report numerical experiments to illustrate the efficiency of the proposed methods.
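The full symmetric Gauss–Seidel ADMM for the collaborative model is beyond a short sketch, but the LAD core shows why ADMM fits: splitting the residual turns the non-smooth $\ell_1$ loss into a soft-thresholding update. A minimal sketch for a single dataset, with all names ours:

```python
import numpy as np

def lad_admm(X, y, rho=1.0, n_iter=500):
    """ADMM for least absolute deviation regression: min_b ||Xb - y||_1.

    Splitting r = X b - y gives a least-squares b-update and a
    soft-thresholding r-update; u is the scaled dual variable.
    """
    n, p = X.shape
    b, r, u = np.zeros(p), np.zeros(n), np.zeros(n)
    A = X.T @ X + 1e-8 * np.eye(p)             # small ridge for stability
    for _ in range(n_iter):
        b = np.linalg.solve(A, X.T @ (y + r - u))                 # b-update
        v = X @ b - y + u
        r = np.sign(v) * np.maximum(np.abs(v) - 1.0 / rho, 0.0)   # shrink
        u += X @ b - y - r                                        # dual update
    return b
```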

