Normality Testing of High-Dimensional Data Based on Principle Component and Jarque-Bera Statistics

Author(s):  
Ya-nan Song ◽  
Xuejing Zhao

The testing of high-dimensional normality is an important issue and has been intensively studied in the literature. Since it depends on the variance-covariance matrix of the sample, numerous methods have been proposed to reduce the complexity of this matrix. Principle component analysis (PCA) is widely used because it projects high-dimensional data into a lower-dimensional orthogonal space, where the normality of the reduced data can be evaluated by the Jarque-Bera (JB) statistic in each principle direction. We propose two combined statistics, the summation and the maximum of the one-way JB statistics, based on the independence of the principle directions, to test the multivariate normality of data in high dimensions. The performance of the proposed methods is illustrated by the empirical power on simulated normal and non-normal data. Two real examples show the validity of our proposed methods.



Stats ◽  
2021 ◽  
Vol 4 (1) ◽  
pp. 216-227
Author(s):  
Yanan Song ◽  
Xuejing Zhao

The testing of high-dimensional normality is an important issue and has been intensively studied in the literature; it depends on the variance–covariance matrix of the sample, and numerous methods have been proposed to reduce its complexity. Principle component analysis (PCA) has been widely used in high dimensions, since it can project high-dimensional data into a lower-dimensional orthogonal space. The normality of the reduced data can then be evaluated by Jarque–Bera (JB) statistics in each principle direction. We propose a combined test statistic, the summation of one-way JB statistics upon the independence of the principle directions, to test the multivariate normality of data in high dimensions. The performance of the proposed method is illustrated by the empirical power of the simulated normal and non-normal data. Two real data examples show the validity of our proposed method.
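The idea in the abstract above can be sketched in a few lines: project the centered data onto the eigenvectors of its sample covariance, compute the one-dimensional JB statistic along each direction, and sum them. This is a minimal illustration, not the authors' exact implementation; it assumes the textbook JB formula and uses the asymptotic chi-squared(2p) null distribution for the sum of p asymptotically independent JB statistics.

```python
import numpy as np
from scipy import stats

def jb_stat(x):
    """One-dimensional Jarque-Bera statistic: n/6 * (S^2 + (K - 3)^2 / 4)."""
    n = len(x)
    z = (x - x.mean()) / x.std()
    s = np.mean(z**3)            # sample skewness
    k = np.mean(z**4)            # sample kurtosis
    return n / 6.0 * (s**2 + (k - 3.0)**2 / 4.0)

def sum_jb_test(X):
    """Sum of JB statistics over the principal directions of X.

    Under H0 (multivariate normality) each per-direction JB statistic is
    asymptotically chi2(2); treating the directions as independent, the sum
    is approximately chi2(2p).
    """
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / (n - 1)
    _, vecs = np.linalg.eigh(cov)       # principal directions
    scores = Xc @ vecs                  # projected (decorrelated) data
    T = sum(jb_stat(scores[:, j]) for j in range(p))
    pval = stats.chi2.sf(T, df=2 * p)
    return T, pval

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 5))       # multivariate normal sample
T, p = sum_jb_test(X)
```

Replacing the sum with the maximum of the per-direction JB statistics gives the other combined statistic mentioned in the earlier version of this abstract; its null distribution would then follow from the maximum of p chi2(2) variables.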


2021 ◽  
pp. 1471082X2110410
Author(s):  
Elena Tuzhilina ◽  
Leonardo Tozzi ◽  
Trevor Hastie

Canonical correlation analysis (CCA) is a technique for measuring the association between two multivariate data matrices. A regularized modification of canonical correlation analysis (RCCA) which imposes an ℓ2 penalty on the CCA coefficients is widely used in applications with high-dimensional data. One limitation of such regularization is that it ignores any data structure, treating all the features equally, which can be ill-suited for some applications. In this article we introduce several approaches to regularizing CCA that take the underlying data structure into account. In particular, the proposed group regularized canonical correlation analysis (GRCCA) is useful when the variables are correlated in groups. We illustrate some computational strategies to avoid excessive computations with regularized CCA in high dimensions. We demonstrate the application of these methods in our motivating application from neuroscience, as well as in a small simulation example.
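A minimal sketch of the ridge-regularized CCA the abstract refers to: the ℓ2 penalty enters by shrinking each within-set covariance toward the identity before solving the usual CCA eigenproblem. This is a generic textbook formulation, not the authors' GRCCA code, and the function name and data are illustrative.

```python
import numpy as np

def rcca_first_pair(X, Y, lam=0.1):
    """Leading canonical pair of ridge-regularized CCA (RCCA sketch).

    The l2 penalty adds lam * I to each within-set covariance, which keeps
    the problem well-posed when the number of features exceeds n.
    """
    n = X.shape[0]
    Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
    Sxx = Xc.T @ Xc / n + lam * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / n + lam * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / n
    # eigenproblem for the leading X-side direction:
    # Sxx^{-1} Sxy Syy^{-1} Syx a = rho^2 a
    M = np.linalg.solve(Sxx, Sxy) @ np.linalg.solve(Syy, Sxy.T)
    vals, vecs = np.linalg.eig(M)
    i = int(np.argmax(vals.real))
    a = vecs[:, i].real
    b = np.linalg.solve(Syy, Sxy.T @ a)   # matching Y-side direction
    u, v = Xc @ a, Yc @ b                 # canonical scores
    rho = float(np.corrcoef(u, v)[0, 1])
    return a, b, rho

# two views sharing one latent signal z
rng = np.random.default_rng(1)
z = rng.standard_normal(300)
X = z[:, None] + 0.5 * rng.standard_normal((300, 6))
Y = z[:, None] + 0.5 * rng.standard_normal((300, 4))
a, b, rho = rcca_first_pair(X, Y, lam=0.1)
```

GRCCA would replace the single `lam * np.eye(...)` term with a penalty that also shrinks coefficients within each predefined feature group toward their group mean; the sketch above only shows the unstructured ℓ2 baseline it builds on.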


Author(s):  
Guangzhu Guangzhu Yu ◽  
Shihuang Shao ◽  
Bin Luo ◽  
Xianhui Zeng

Existing algorithms for high-utility itemset mining are column-enumeration based, adopting an Apriori-like candidate generation-and-test approach, and are thus inadequate for datasets with high dimensions or long patterns. To solve this problem, this paper proposes a hybrid model and a row-enumeration-based algorithm, Inter-transaction, to discover high-utility itemsets from two directions: an existing algorithm can be used to seek short high-utility itemsets from the bottom, while Inter-transaction seeks long high-utility itemsets from the top. Inter-transaction makes full use of the characteristic that long transactions share few common items. By intersecting relevant transactions, the new algorithm can identify long high-utility itemsets without extending short itemsets step by step. In addition, we also developed new pruning strategies and an optimization technique to improve the performance of Inter-transaction.
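The intersect-and-test idea can be illustrated with a toy sketch: candidate long itemsets come from intersecting pairs of transactions rather than from extending short itemsets, and each candidate's utility is summed over the transactions that contain it. This is only an illustration of the row-enumeration principle under assumed inputs (a list of dicts mapping items to their per-transaction utilities); it omits the paper's pruning strategies and optimization technique, and the function name is hypothetical.

```python
from itertools import combinations

def intersection_hui(transactions, min_util):
    """Toy row-enumeration pass: intersect transaction pairs to propose
    long candidate itemsets, then keep those whose total utility (summed
    over all supporting transactions) reaches min_util.

    `transactions` is a list of dicts: item -> utility of that item in
    that transaction.
    """
    candidates = set()
    for t1, t2 in combinations(transactions, 2):
        shared = frozenset(t1) & frozenset(t2)
        if len(shared) >= 2:          # only multi-item (longer) patterns
            candidates.add(shared)
    result = {}
    for itemset in candidates:
        util = sum(sum(t[i] for i in itemset)
                   for t in transactions if itemset.issubset(t))
        if util >= min_util:
            result[itemset] = util
    return result

transactions = [
    {'a': 5, 'b': 3, 'c': 2},
    {'a': 4, 'b': 6, 'd': 1},
    {'a': 2, 'c': 3},
]
huis = intersection_hui(transactions, min_util=15)
```

Here {a, b} is found directly by intersecting the first two transactions (utility 8 + 10 = 18), with no step-by-step growth from single items; a real implementation would intersect only "relevant" transactions and prune candidates early, as the abstract describes.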

