A Practical Guide to Sparse k-Means Clustering for Studying Molecular Development of the Human Brain

2021
Vol 15
Author(s):  
Justin L. Balsor ◽  
Keon Arbabi ◽  
Desmond Singh ◽  
Rachel Kwan ◽  
Jonathan Zaslavsky ◽  
...  

Studying the molecular development of the human brain presents unique challenges for selecting a data analysis approach. The rare and valuable nature of human postmortem brain tissue, especially for developmental studies, means that sample sizes are small (n), while high-throughput genomic and proteomic methods measure expression levels for hundreds or thousands of variables [e.g., genes or proteins (p)] in each sample. This produces a high-dimensional data structure (p ≫ n) and introduces the curse of dimensionality, which poses a challenge for traditional statistical approaches. In contrast, high-dimensional analyses, especially cluster analyses developed for sparse data, have worked well for genomic datasets where p ≫ n. Here we explore a lasso-based clustering method developed for high-dimensional genomic data with small sample sizes. Using protein and gene data from the developing human visual cortex, we compared clustering methods and identified an application of sparse k-means clustering [robust sparse k-means clustering (RSKC)] that partitioned samples into age-related clusters reflecting lifespan stages from birth to aging. RSKC adaptively selects the subset of genes or proteins that contributes to partitioning samples into age-related clusters that progress across the lifespan. This approach addresses a limitation of previous studies, which could not identify multiple postnatal clusters. Moreover, the clusters encompassed overlapping ranges of ages, like a series of waves, illustrating that chronological age and brain age have a complex relationship. In addition, a recently developed workflow for creating plasticity phenotypes (Balsor et al., 2020) was applied to the clusters and revealed neurobiologically relevant features showing how the human visual cortex changes across the lifespan. These methods can help address the growing demand for multimodal integration, from molecular machinery to brain imaging signals, to understand the human brain's development.
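For reference, RSKC builds on the sparse k-means criterion of Witten and Tibshirani (2010), which assigns a nonnegative weight to each gene or protein and maximizes the weighted between-cluster dissimilarity under a lasso-type constraint. The sketch below states that standard criterion; the RSKC variant additionally trims outlying samples, and the exact formulation used in the paper may differ in detail:

$$
\max_{C_1,\dots,C_K,\;\mathbf{w}} \;\sum_{j=1}^{p} w_j\!\left(\frac{1}{n}\sum_{i=1}^{n}\sum_{i'=1}^{n} d_{i,i',j} \;-\; \sum_{k=1}^{K}\frac{1}{n_k}\sum_{i,i'\in C_k} d_{i,i',j}\right)
\quad\text{subject to}\quad \|\mathbf{w}\|_2 \le 1,\;\; \|\mathbf{w}\|_1 \le s,\;\; w_j \ge 0,
$$

where $d_{i,i',j}$ is the dissimilarity between samples $i$ and $i'$ on feature $j$ (typically the squared difference), $C_1,\dots,C_K$ are the clusters, and the tuning parameter $s$ controls how many features receive nonzero weight.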

2021
Author(s):  
Justin L. Balsor ◽  
Keon Arbabi ◽  
Dezi Ahuja ◽  
Ewalina Jeyanesan ◽  
Kathryn M. Murphy

Studying the molecular development of the human brain presents unique challenges for selecting the best data analysis approach. The rare and valuable nature of human postmortem brain samples, especially for studies examining development, means that those studies have small sample sizes (n) but often include measurements (p) of a large number of genes or proteins for every sample. Thus, most of these datasets have a structure in which p ≫ n, which introduces the problem of sparsity. Here we present a guide to analyzing human brain development data by focusing on sparsity-based clustering methods developed for small sample sizes. We test different methods and identify an application of sparse k-means clustering called robust sparse k-means clustering (RSKC) that reveals clusters of samples reflecting lifespan stages from birth to aging. The algorithm adaptively selects the subset of genes or proteins that contributes to generating clusters of samples spread across the lifespan. This approach addresses a limitation of previous studies, which were unable to identify postnatal clusters. The guide illustrates that careful selection of the clustering method is essential for revealing meaningful aspects of human brain development.
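As a concrete illustration of the approach described above, here is a minimal Python sketch of the sparse k-means idea that RSKC builds on: alternate between clustering samples on weighted features and updating the feature weights from the per-feature between-cluster sum of squares under an L1 bound. The study itself used the RSKC R package, which additionally trims outlying samples for robustness; the function names, parameter values, and toy data below are illustrative only.

```python
# Minimal sketch of the sparse k-means idea (Witten & Tibshirani, 2010) that
# RSKC builds on; the published analyses used the RSKC R package, which adds
# trimming of outlying samples. Names and parameters here are illustrative.
import numpy as np
from sklearn.cluster import KMeans


def soft_threshold(x, delta):
    """Soft-thresholding operator implementing the lasso (L1) constraint."""
    return np.sign(x) * np.maximum(np.abs(x) - delta, 0.0)


def update_weights(bcss, s):
    """Given per-feature between-cluster sums of squares, return nonnegative
    weights with ||w||_2 <= 1 and ||w||_1 <= s via binary search on delta."""
    bcss = np.maximum(bcss, 0.0)
    w = bcss / np.linalg.norm(bcss)
    if np.sum(w) <= s:
        return w
    lo, hi = 0.0, np.max(bcss)
    for _ in range(50):
        delta = (lo + hi) / 2
        w = soft_threshold(bcss, delta)
        w = w / np.linalg.norm(w)
        if np.sum(w) > s:
            lo = delta
        else:
            hi = delta
    return w


def sparse_kmeans(X, k, s, n_iter=10, seed=0):
    """Alternate between (1) k-means on weighted features and (2) updating
    the feature weights from the per-feature between-cluster sum of squares."""
    n, p = X.shape
    w = np.ones(p) / np.sqrt(p)          # start with equal feature weights
    for _ in range(n_iter):
        Xw = X * np.sqrt(w)              # scale features by sqrt(weight)
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(Xw)
        # per-feature BCSS = total SS - within-cluster SS (on unweighted data)
        total_ss = ((X - X.mean(axis=0)) ** 2).sum(axis=0)
        within_ss = np.zeros(p)
        for c in range(k):
            Xc = X[labels == c]
            if len(Xc) > 0:
                within_ss += ((Xc - Xc.mean(axis=0)) ** 2).sum(axis=0)
        w = update_weights(total_ss - within_ss, s)
    return labels, w


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # toy p >> n data: 30 samples, 200 features, only 10 features informative
    X = rng.normal(size=(30, 200))
    X[:15, :10] += 2.0
    labels, w = sparse_kmeans(X, k=2, s=4.0)
    print("features with nonzero weight:", np.flatnonzero(w > 1e-8))
```

The L1 bound s plays the role of the lasso tuning parameter: small values zero out most feature weights, which is what allows the method to pick a subset of genes or proteins even when p ≫ n.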


2020
Author(s):  
Yuichi Okinaga ◽  
Daisuke Kyogoku ◽  
Satoshi Kondo ◽  
Atsushi J. Nagano ◽  
Kei Hirose

Motivation: The least absolute shrinkage and selection operator (lasso) and principal component regression (PCR) are popular methods for estimating traits from high-dimensional omics data, such as transcriptomes. The prediction accuracy of these estimation methods depends strongly on the covariance structure, which is characterized by gene regulation networks. However, the manner in which the structure of a gene regulation network, together with the sample size, affects prediction accuracy has not yet been sufficiently investigated. In this study, Monte Carlo simulations are conducted to investigate the prediction accuracy for several network structures under various sample sizes.

Results: When the gene regulation network was a random graph, the simulations indicated that models with high estimation accuracy could be achieved with small sample sizes. However, a real gene regulation network is likely to exhibit a scale-free structure; in that case, the simulations indicated that a relatively large number of observations is required to accurately predict traits from a transcriptome.

Availability and implementation: Source code at https://github.com/keihirose/. Contact: [email protected]
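To make the simulation design concrete, the following is a hedged Python sketch, not the authors' code (their implementation is at the repository above): it induces a predictor covariance from either a random (Erdős–Rényi) or scale-free (Barabási–Albert) gene network, generates a trait from a sparse linear model, and tracks lasso test-set accuracy as the sample size grows. All graph sizes, signal strengths, and sample sizes below are arbitrary choices for illustration.

```python
# Illustrative Monte Carlo sketch: lasso prediction accuracy under covariance
# structures induced by random vs scale-free gene networks. A toy stand-in for
# the authors' simulations; all parameter choices are arbitrary.
import numpy as np
import networkx as nx
from sklearn.linear_model import LassoCV


def network_covariance(graph):
    """Build a positive-definite covariance from a graph via its Laplacian."""
    L = nx.laplacian_matrix(graph).toarray().astype(float)
    return np.linalg.inv(L + np.eye(L.shape[0]))  # (L + I) is positive definite


def simulate(cov, n_train, n_test=200, n_signal=10, noise=1.0, rng=None):
    """Draw training/test data with a sparse linear trait and return test R^2."""
    rng = rng or np.random.default_rng(0)
    p = cov.shape[0]
    beta = np.zeros(p)
    beta[rng.choice(p, n_signal, replace=False)] = rng.normal(size=n_signal)
    X = rng.multivariate_normal(np.zeros(p), cov, size=n_train + n_test)
    y = X @ beta + rng.normal(scale=noise, size=n_train + n_test)
    model = LassoCV(cv=5).fit(X[:n_train], y[:n_train])
    return model.score(X[n_train:], y[n_train:])


if __name__ == "__main__":
    p = 100
    rng = np.random.default_rng(1)
    covs = {
        "random graph": network_covariance(nx.gnm_random_graph(p, 3 * p, seed=1)),
        "scale-free": network_covariance(nx.barabasi_albert_graph(p, 3, seed=1)),
    }
    for name, cov in covs.items():
        for n in (25, 50, 100, 200):
            r2 = simulate(cov, n_train=n, rng=rng)
            print(f"{name:12s}  n={n:4d}  test R^2 = {r2:.2f}")
```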


2021
Author(s):  
Narges Doostani ◽  
Gholam-Ali Hossein-Zadeh ◽  
Maryam Vaziri-Pashkam

Here, we report that the normalization model can capture the effects of object-based attention across the visual hierarchy in the human brain. We used superimposed pairs of objects and asked participants to attend to different targets. Modeling voxel responses, we demonstrated that the normalization model outperforms other models in predicting voxel responses in the presence of attention. Our results support normalization as a canonical computation operating in the primate brain.
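For context, the canonical form of the normalization model of attention (as in Reynolds and Heeger, 2009) expresses a neural or voxel response as a stimulus drive divided by a suppressive drive; the exact parameterization fit in this study may differ, but a standard statement is:

$$
R(x,\theta) \;=\; \frac{A(x,\theta)\,E(x,\theta)}{S(x,\theta) + \sigma},
$$

where $E$ is the excitatory stimulus drive at position $x$ and feature $\theta$, $S$ is the suppressive drive obtained by pooling the stimulus drive over space and features, $A$ is the attention field that multiplicatively scales the drive for the attended object, and $\sigma$ is a semi-saturation constant.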


2020
pp. 096228022097022
Author(s):  
Frank Konietschke ◽  
Karima Schwab ◽  
Markus Pauly

In many experiments, and especially in translational and preclinical research, sample sizes are (very) small. In addition, the data designs are often high dimensional, i.e., more dependent replications (e.g., repeated measures) than independent replications of the trial are observed. The present paper discusses the applicability of max t-test-type statistics (multiple contrast tests) in high-dimensional designs (repeated measures or multivariate) with small sample sizes. A randomization-based approach is developed to approximate the distribution of the maximum statistic. Extensive simulation studies confirm that the new method is particularly suitable for analyzing data sets with small sample sizes. A real data set illustrates the application of the methods.
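To illustrate the randomization idea in its simplest form, the sketch below implements a generic sign-flipping approximation to the null distribution of a max-|t| statistic for a one-sample, high-dimensional repeated-measures design. It is a toy stand-in, not the specific multiple-contrast-test procedure developed in the paper, and all names and parameters are illustrative.

```python
# Generic sign-flip randomization for a max-|t| statistic: for n subjects and
# d outcomes, the null distribution of max_j |t_j| is approximated by randomly
# flipping the sign of each subject's outcome vector. Illustrative only.
import numpy as np


def t_stats(X):
    """Coordinate-wise one-sample t-statistics for an (n, d) data matrix."""
    n = X.shape[0]
    return np.sqrt(n) * X.mean(axis=0) / X.std(axis=0, ddof=1)


def max_t_randomization(X, n_rand=5000, seed=0):
    """Return observed max |t| and its randomization p-value via sign flips."""
    rng = np.random.default_rng(seed)
    t_obs = np.max(np.abs(t_stats(X)))
    null = np.empty(n_rand)
    for b in range(n_rand):
        signs = rng.choice([-1.0, 1.0], size=(X.shape[0], 1))
        null[b] = np.max(np.abs(t_stats(signs * X)))
    p_value = (1 + np.sum(null >= t_obs)) / (1 + n_rand)
    return t_obs, p_value


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # toy repeated-measures data: n = 8 subjects, d = 50 outcomes,
    # with a true shift in the first outcome only
    X = rng.normal(size=(8, 50))
    X[:, 0] += 1.5
    t_obs, p = max_t_randomization(X)
    print(f"max |t| = {t_obs:.2f}, randomization p = {p:.3f}")
```

Because the whole subject vector is flipped at once, the dependence among the d outcomes is preserved under the randomization, which is what makes max-type statistics usable when d exceeds n.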

