A Multi-Linear Statistical Method for Discriminant Analysis of 2D Frontal Face Images

Author(s):  
Carlos Eduardo Thomaz ◽  
Vagner do Amaral ◽  
Gilson Antonio Giraldi ◽  
Edson Caoru Kitani ◽  
João Ricardo Sato ◽  
...  

This chapter describes a multi-linear discriminant method for constructing and quantifying statistically significant changes in human identity photographs. The approach is based on a general multivariate two-stage linear framework that addresses the small sample size problem in high-dimensional spaces. Starting with a 2D data set of frontal face images, the authors determine the most characteristic direction of change by organizing the data according to the patterns of interest. Experiments on publicly available face image sets show that the multi-linear approach produces visually plausible results for gender, facial expression and aging facial changes in a simple and efficient way. The authors believe that such an approach could be widely applied for modeling and reconstruction in face recognition, and possibly for identifying subjects after a lapse of time.

2012 ◽  
Vol 2012 ◽  
pp. 1-18
Author(s):  
Jiajuan Liang

High-dimensional data with a small sample size, such as microarray data and image data, are commonly encountered in practical problems for which many variables must be measured but it is too costly or time-consuming to repeat the measurements many times. Analysis of this kind of data poses a great challenge for statisticians. In this paper, we develop a new graphical method for testing spherical symmetry that is especially suitable for high-dimensional data with a small sample size. The new graphical method, together with its local acceptance regions, provides a quick visual check on the assumption of spherical symmetry. Its performance is demonstrated by a Monte Carlo study and illustrated with a real data set.


Complexity ◽  
2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Jing Zhang ◽  
Guang Lu ◽  
Jiaquan Li ◽  
Chuanwen Li

Mining useful knowledge from high-dimensional data is a hot research topic. Efficient and effective sample classification and feature selection are challenging tasks due to the high dimensionality and small sample size of microarray data. Feature selection is necessary when constructing the model in order to reduce time and space consumption. Therefore, a feature selection model based on prior knowledge and rough sets is proposed. Pathway knowledge is used to select feature subsets, and a rough set based on intersection neighborhood is then used to select the important features in each subset, since it selects features without redundancy and handles numerical features directly. In order to improve the diversity among base classifiers and the efficiency of classification, it is necessary to select a subset of the base classifiers. Classifiers are grouped into several clusters by k-means clustering using the proposed combination distance of Kappa-based diversity and accuracy. The base classifier with the best classification performance in each cluster is selected to generate the final ensemble model. Experimental results on three Arabidopsis thaliana stress response datasets showed that the proposed method achieved better classification performance than existing ensemble models.
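The classifier-pruning step described in the abstract can be sketched as follows. This is a minimal illustration with toy predictions; treating the feature vector of each classifier as the pair (mean pairwise Kappa-based diversity, accuracy) is an assumption for illustration, not the paper's exact combination distance.

```python
import random

def cohen_kappa(a, b):
    """Cohen's kappa agreement between two prediction sequences."""
    n = len(a)
    po = sum(x == y for x, y in zip(a, b)) / n                     # observed agreement
    labels = set(a) | set(b)
    pe = sum((a.count(l) / n) * (b.count(l) / n) for l in labels)  # chance agreement
    return 1.0 if pe == 1 else (po - pe) / (1 - pe)

def kmeans(points, k, iters=20, seed=0):
    """Tiny k-means on low-dimensional tuples; returns a cluster index per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    d2 = lambda p, q: sum((pi - qi) ** 2 for pi, qi in zip(p, q))
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: d2(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centers[c] = tuple(sum(m[i] for m in members) / len(members)
                                   for i in range(len(members[0])))
    return assign

# Toy setup: predictions of 4 base classifiers on 8 samples, plus true labels.
labels = [0, 1, 0, 1, 0, 1, 0, 1]
preds = [
    [0, 1, 0, 1, 0, 1, 0, 0],   # accurate
    [0, 1, 0, 1, 0, 1, 1, 0],   # similar to the first
    [1, 0, 0, 1, 1, 1, 0, 1],   # more diverse
    [1, 0, 1, 0, 1, 1, 0, 1],   # similar to the third
]
acc = [sum(p == y for p, y in zip(pr, labels)) / len(labels) for pr in preds]
# Mean pairwise (1 - kappa) as a diversity score for each classifier.
div = [sum(1 - cohen_kappa(preds[i], preds[j])
           for j in range(len(preds)) if j != i) / (len(preds) - 1)
       for i in range(len(preds))]
clusters = kmeans(list(zip(div, acc)), k=2)
# Keep the most accurate classifier from each cluster as the pruned ensemble.
selected = [max((i for i in range(len(preds)) if clusters[i] == c),
                key=lambda i: acc[i])
            for c in sorted(set(clusters))]
```

Selecting one representative per cluster keeps the ensemble small while preserving the diversity that the Kappa-based distance encodes.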


2021 ◽  
Author(s):  
Xin Chen ◽  
Qingrun Zhang ◽  
Thierry Chekouo

Abstract Background: DNA methylations in critical regions are highly involved in cancer pathogenesis and drug response. However, identifying causal methylations out of a large number of potential polymorphic DNA methylation sites is challenging. Such high-dimensional data bring two obstacles: first, many established statistical models do not scale to so many features; second, multiple testing and overfitting become serious concerns. To this end, a method to quickly filter candidate sites and narrow down targets for downstream analyses is urgently needed. Methods: BACkPAy is a pre-screening Bayesian approach to detect biologically meaningful clusters of potential differential methylation levels with small sample sizes. BACkPAy prioritizes potentially important biomarkers by the Bayesian false discovery rate (FDR) approach. It filters out non-informative (i.e. non-differential) sites with flat methylation patterns across experimental conditions. In this work, we applied BACkPAy to a genome-wide methylation dataset with 3 tissue types, each containing 3 gastric cancer samples. We also applied LIMMA (Linear Models for Microarray and RNA-Seq Data) to compare its results with those achieved by BACkPAy. Then, Cox proportional hazards regression models were used to visualize prognostically significant markers with The Cancer Genome Atlas (TCGA) data for survival analysis. Results: Using BACkPAy, we identified 8 biologically meaningful clusters/groups of differential probes from the DNA methylation dataset. Using TCGA data, we also identified five prognostic genes (i.e. predictive of the progression of gastric cancer) that contain differential methylation probes, whereas no significant results were identified using the Benjamini-Hochberg FDR in LIMMA. Conclusions: We showed the importance of using BACkPAy for the analysis of DNA methylation data with extremely small sample sizes in gastric cancer.
We revealed that RDH13, CLDN11, TMTC1, UCHL1 and FOXP2 can serve as predictive biomarkers for gastric cancer treatment, and the promoter methylation levels of these five genes in serum could have prognostic and diagnostic functions in gastric cancer patients.
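For context, the Benjamini-Hochberg step-up procedure that the LIMMA comparison relies on can be sketched in a few lines; the p-values below are illustrative, not values from the study.

```python
def benjamini_hochberg(pvals, alpha=0.05):
    """Benjamini-Hochberg step-up FDR procedure.

    Returns a boolean list: True where the null hypothesis is rejected
    at FDR level alpha.
    """
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])   # indices by ascending p
    k_max = 0                                          # largest k with p_(k) <= (k/m) * alpha
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / m * alpha:
            k_max = rank
    reject = [False] * m
    for rank, i in enumerate(order, start=1):          # reject all hypotheses up to k_max
        if rank <= k_max:
            reject[i] = True
    return reject

# Illustrative p-values: the three smallest clear the step-up thresholds
# (0.01, 0.02, 0.03 against thresholds 0.01, 0.02, 0.03), the rest do not.
flags = benjamini_hochberg([0.01, 0.02, 0.03, 0.50, 0.60], alpha=0.05)
```

With very small sample sizes, few probes clear these thresholds, which is the abstract's motivation for a Bayesian pre-screening step instead.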


Author(s):  
Tinen L. Iles ◽  
Timothy G. Laske ◽  
Paul A. Iaizzo ◽  
Elishai Ezra Tsur

Abstract Brain-inspired (neuromorphic) systems realize biological neural principles with Spiking Neural Networks (SNN) to provide high-performing, energy-efficient frameworks for robotics, artificial intelligence, and adaptive control. The Neural Engineering Framework (NEF) provides a theoretical approach for representing high-dimensional mathematical constructs with spiking neurons, enabling the implementation of functional large-scale neural networks. Here, we explore the utilization of neuromorphic adaptive control for circadian-modulated cardiac pacing by examining the neuromorphic representation of high-dimensional cardiac data. For this study, we utilized a model built from a data set acquired from an American black bear during hibernation. Black bears in Minnesota hibernate for 4-6 months without eating or drinking, losing little muscle mass and remaining relatively normothermic throughout the winter [10]. In the current study, we obtained EEG and ECG data from one black bear throughout the winter months in Grand Rapids, MN, and represented them with the NEF. Our results demonstrated opposing requirements for neuromorphic representation: while high synaptic time constants provided desirable low-pass filtering for the ECG data, representation of the EEG data required fast synapses and a large number of neurons. Although this is only an analysis of a small sample of the available data, these guidelines provided a robust pilot dataset for observing SNN patterns during prolonged hibernation, pairing them with the cardiac responses, and thus supporting research questions related to autonomic tone during hibernation. This preliminary research will help further develop our neuromorphic adaptive controller to better adapt cardiac pacing to circadian rhythms. This unique dataset may pave the way toward deciphering the underlying neural mechanisms of hibernation, providing translational insights for humans.
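The trade-off noted in the abstract reflects a basic NEF fact: a synapse with time constant tau acts as a first-order low-pass filter, so long time constants smooth slow signals well but blur fast ones. A minimal discrete-time sketch (the step input and the tau values are illustrative, not the study's parameters):

```python
import math

def synapse_filter(signal, dt, tau):
    """First-order exponential synapse: dy/dt = (x - y) / tau,
    discretized with an exact zero-order-hold update."""
    a = math.exp(-dt / tau)
    y, out = 0.0, []
    for x in signal:
        y = a * y + (1 - a) * x   # decay toward the current input
        out.append(y)
    return out

dt, step = 0.001, [1.0] * 100                 # 100 ms unit step at 1 kHz
fast = synapse_filter(step, dt, tau=0.005)    # short tau: tracks quickly (EEG-like need)
slow = synapse_filter(step, dt, tau=0.100)    # long tau: heavy smoothing (ECG-like filtering)
```

After 100 ms the short-tau synapse has essentially reached the step value, while the long-tau synapse is still rising, which is why fast EEG dynamics demand fast synapses (and more neurons to compensate for the noisier representation).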


Author(s):  
Xiaoyu Lu ◽  
Szu-Wei Tu ◽  
Wennan Chang ◽  
Changlin Wan ◽  
Jiashi Wang ◽  
...  

Abstract Deconvolution of mouse transcriptomic data is challenged by the fact that mouse models carry various genetic and physiological perturbations, making it questionable to assume fixed cell types and cell type marker genes across data set scenarios. We developed a Semi-Supervised Mouse data Deconvolution (SSMD) method to study the mouse tissue microenvironment. SSMD features (i) a novel nonparametric method to discover data set-specific cell type signature genes; (ii) a community detection approach for fixing cell types and their marker genes; and (iii) a constrained matrix decomposition method to solve for cell type relative proportions that is robust to diverse experimental platforms. In summary, SSMD addresses several key challenges in the deconvolution of mouse tissue data, including: (i) varied cell types and marker genes caused by highly divergent genotypic and phenotypic conditions of mouse experiments; (ii) diverse experimental platforms of mouse transcriptomics data; (iii) small sample sizes and limited training data sources; and (iv) the capability to estimate the proportions of 35 cell types in blood, inflammatory, central nervous or hematopoietic systems. In silico and experimental validation of SSMD demonstrated its high sensitivity and accuracy in identifying (sub) cell types and predicting cell proportions compared with state-of-the-art methods. A user-friendly R package and a web server for SSMD are released via https://github.com/xiaoyulu95/SSMD.
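The constrained-decomposition idea, solving for non-negative, sum-to-one cell proportions given a signature matrix, can be sketched with projected gradient descent. This is not SSMD's actual algorithm: the signature matrix and true proportions are toy values, and the clip-and-renormalize step is a simplification of an exact simplex projection.

```python
def estimate_proportions(S, b, lr=0.01, iters=2000):
    """Least-squares fit of b ~ S @ p subject to p >= 0 and sum(p) == 1,
    via gradient descent plus a clip-and-renormalize projection."""
    genes, types = len(S), len(S[0])
    p = [1.0 / types] * types                     # start from uniform proportions
    for _ in range(iters):
        # residual r = S p - b and gradient g = 2 S^T r
        r = [sum(S[i][j] * p[j] for j in range(types)) - b[i] for i in range(genes)]
        g = [2 * sum(S[i][j] * r[i] for i in range(genes)) for j in range(types)]
        p = [max(0.0, pj - lr * gj) for pj, gj in zip(p, g)]   # non-negativity
        total = sum(p) or 1.0
        p = [pj / total for pj in p]                           # sum-to-one
    return p

# Toy signature matrix (3 marker genes x 2 cell types) and a synthetic bulk sample
# mixed with known proportions, so recovery can be checked.
S = [[5.0, 1.0], [1.0, 4.0], [2.0, 2.0]]
true_p = [0.7, 0.3]
b = [sum(S[i][j] * true_p[j] for j in range(2)) for i in range(3)]
p_hat = estimate_proportions(S, b)
```

On noise-free data the estimate recovers the mixing proportions closely; real deconvolution additionally contends with platform effects and imperfect marker genes, which is what SSMD's data set-specific signatures address.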


Author(s):  
David Zhang ◽  
Fengxi Song ◽  
Yong Xu ◽  
Zhizhen Liang

This chapter is a brief introduction to the biometric discriminant analysis technologies covered in Section I of the book. Section 2.1 describes two kinds of linear discriminant analysis (LDA) approaches: classification-oriented LDA and feature extraction-oriented LDA. Section 2.2 discusses LDA for solving the small sample size (SSS) problem in pattern recognition. Section 2.3 shows the organization of Section I.
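A minimal two-class Fisher LDA sketch in two dimensions illustrates the ideas above; the ridge term added to the within-class scatter is the simplest remedy when the SSS problem makes the scatter matrix singular. The data points are toy values for illustration.

```python
def fisher_lda_2d(X1, X2, eps=1e-6):
    """Two-class Fisher LDA in 2-D: w = (Sw + eps*I)^-1 (m1 - m2).

    The eps ridge keeps the within-class scatter invertible, a classic
    fix when the sample size is small relative to the dimension (SSS).
    """
    def mean(X):
        return [sum(x[0] for x in X) / len(X), sum(x[1] for x in X) / len(X)]
    m1, m2 = mean(X1), mean(X2)
    # within-class scatter Sw = sum over both classes of (x - m)(x - m)^T, plus ridge
    Sw = [[eps, 0.0], [0.0, eps]]
    for X, m in ((X1, m1), (X2, m2)):
        for x in X:
            d = [x[0] - m[0], x[1] - m[1]]
            Sw[0][0] += d[0] * d[0]; Sw[0][1] += d[0] * d[1]
            Sw[1][0] += d[1] * d[0]; Sw[1][1] += d[1] * d[1]
    # invert the 2x2 scatter analytically and form w = Sw^-1 (m1 - m2)
    det = Sw[0][0] * Sw[1][1] - Sw[0][1] * Sw[1][0]
    dm = [m1[0] - m2[0], m1[1] - m2[1]]
    w = [(Sw[1][1] * dm[0] - Sw[0][1] * dm[1]) / det,
         (-Sw[1][0] * dm[0] + Sw[0][0] * dm[1]) / det]
    # decision threshold: midpoint of the projected class means
    proj = lambda x: w[0] * x[0] + w[1] * x[1]
    thresh = (proj(m1) + proj(m2)) / 2
    return w, proj, thresh

X1 = [(1, 2), (2, 3), (3, 3)]       # toy class 1
X2 = [(6, 5), (7, 8), (8, 7)]       # toy class 2
w, proj, thresh = fisher_lda_2d(X1, X2)
```

Projecting a new sample with `proj` and comparing against `thresh` gives a classification-oriented use of the discriminant; taking `w` itself as a feature axis is the feature extraction-oriented use.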

