Penalized orthogonal-components regression for large p small n data

2009 ◽  
Vol 3 (0) ◽  
pp. 781-796 ◽  
Author(s):  
Dabao Zhang ◽  
Yanzhu Lin ◽  
Min Zhang

2013 ◽  
Vol 7 (0) ◽  
pp. 2005-2031 ◽  
Author(s):  
Piercesare Secchi ◽  
Aymeric Stamm ◽  
Simone Vantini

2009 ◽  
Vol 20 (3) ◽  
pp. 381-392 ◽  
Author(s):  
Eun-Kyung Lee ◽  
Dianne Cook

Genes ◽  
2020 ◽  
Vol 11 (12) ◽  
pp. 1493
Author(s):  
Y-h. Taguchi ◽  
Turki Turki

The large p small n problem remains a challenge for which no de facto standard method is available. In this study, we propose a tensor-decomposition (TD)-based unsupervised feature extraction (FE) formalism and apply it to multiomics datasets in which the number of features exceeds 100,000 whereas the number of samples is only about 100, a typical large p small n problem. When applied to synthetic datasets, the proposed TD-based unsupervised FE outperformed conventional supervised feature selection methods (random forest, categorical regression, also known as analysis of variance or ANOVA, and penalized linear discriminant analysis) as well as two unsupervised methods (multiple non-negative matrix factorization and principal component analysis (PCA)-based unsupervised FE). When applied to multiomics datasets, it outperformed the four methods other than PCA-based unsupervised FE. The genes selected by TD-based unsupervised FE were enriched in genes known to be related to the tissues and transcription factors measured. TD-based unsupervised FE was thus shown to be not only the superior feature selection method but also one that can select biologically reliable genes. To our knowledge, this is the first study in which TD-based unsupervised FE has been successfully applied to the integration of this variety of multiomics measurements.


2020 ◽  
Author(s):  
Y-h. Taguchi ◽  
Turki Turki

In this work, we extended the recently developed tensor decomposition (TD)-based unsupervised feature extraction (FE) to a kernel-based method through a mathematical formulation. The resulting kernel TD (KTD)-based unsupervised FE was then applied to two synthetic examples as well as real data sets, and the findings were compared with those obtained previously using the TD-based unsupervised FE approaches. The KTD-based unsupervised FE outperformed or performed comparably with the TD-based unsupervised FE in large p small n situations, that is, situations involving a limited number of samples with many variables (observations). Nevertheless, the KTD-based unsupervised FE outperformed the TD-based unsupervised FE in non large p small n situations. In general, although the kernel trick can help the TD-based unsupervised FE gain more variations, a wider range of problems may also be encountered. Given that the KTD-based unsupervised FE outperforms or performs comparably with the TD-based unsupervised FE on large p small n problems, it is expected that the KTD-based unsupervised FE can be applied in the genomic science domain, which involves many large p small n problems and in which the TD-based unsupervised FE approach has been effectively applied.

