Application of t-SNE to human genetic data

t-distributed stochastic neighbor embedding (t-SNE) is a new dimension reduction and visualization technique for high-dimensional data. t-SNE is rarely applied to human genetic data, even though it is commonly used in other data-intensive biological fields, such as single-cell genomics. We explore the applicability of t-SNE to human genetic data and make these observations: (i) like previously used dimension reduction techniques such as principal component analysis (PCA), t-SNE is able to separate samples from different continents; (ii) t-SNE is more robust than PCA to the presence of outliers; (iii) t-SNE is able to display both continental and sub-continental patterns in a single plot. We conclude that the ability of t-SNE to reveal population stratification at different scales could be useful for human genetic association studies.
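
As a rough illustration of the comparison described in this abstract, the following minimal sketch runs PCA and t-SNE on the same genotype matrix. It assumes scikit-learn and NumPy are available; the simulated populations, the helper name `simulate_genotypes`, and all parameter values (number of SNPs, `spread`, `perplexity=30`) are hypothetical choices for illustration and are not taken from the paper.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)


def simulate_genotypes(n_per_pop=50, n_snps=500, n_pops=3, spread=0.15):
    """Simulate 0/1/2 genotype counts for a few hypothetical populations.

    Each population perturbs a shared set of ancestral allele frequencies.
    This is a toy model, not real genetic data.
    """
    ancestral = rng.uniform(0.1, 0.9, size=n_snps)
    genotypes, labels = [], []
    for pop in range(n_pops):
        # Population-specific allele frequencies, kept inside (0, 1).
        freqs = np.clip(ancestral + rng.normal(0.0, spread, size=n_snps), 0.01, 0.99)
        # Each genotype is the number of copies (0, 1 or 2) of one allele.
        genotypes.append(rng.binomial(2, freqs, size=(n_per_pop, n_snps)))
        labels.extend([pop] * n_per_pop)
    return np.vstack(genotypes).astype(float), np.array(labels)


X, labels = simulate_genotypes()
X -= X.mean(axis=0)  # centre each SNP column before dimension reduction

# Linear projection onto the two leading principal components.
pca_coords = PCA(n_components=2).fit_transform(X)

# Non-linear two-dimensional embedding; perplexity is an assumed setting.
tsne_coords = TSNE(
    n_components=2, perplexity=30, init="pca", random_state=0
).fit_transform(X)
```

Plotting `pca_coords` and `tsne_coords` coloured by `labels` gives the kind of side-by-side view the abstract refers to. With real data the genotype matrix would come from a genotyping panel rather than a simulation.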

10.1101/114884 ◽ 2017

Author(s): Wentian Li ◽ Jane E Cerise ◽ Yaning Yang ◽ Henry Han

Keyword(s): Dimension Reduction ◽ Population Stratification ◽ Association Studies ◽ Genetic Association Studies ◽ Principal Component ◽ Genetic Data ◽ Visualization Technique ◽ Data Intensive ◽ Reduction Techniques ◽ Dimension Reduction Techniques

Principal component of explained variance: An efficient and optimal data dimension reduction framework for association studies

Statistical Methods in Medical Research ◽ 10.1177/0962280216660128 ◽ 2016 ◽ Vol 27 (5) ◽ pp. 1331-1350 ◽ Cited By ~ 4

Author(s): Maxime Turgeon ◽ Karim Oualkacha ◽ Antonio Ciampi ◽ Hanane Miftah ◽ Golsa Dehghan ◽ ...

Keyword(s): Dimension Reduction ◽ Association Studies ◽ Computational Cost ◽ Principal Component ◽ Original Method ◽ High Dimensional ◽ Testing Procedures ◽ Simple Strategy ◽ Reduction Techniques ◽ Explained Variance

The genomics era has led to an increase in the dimensionality of data collected in the investigation of biological questions. In this context, dimension-reduction techniques can be used to summarise high-dimensional signals into low-dimensional ones, which can then be tested for association with one or more covariates of interest. This paper revisits one such approach, previously known as principal component of heritability and renamed here as principal component of explained variance (PCEV). As its name suggests, the PCEV seeks an optimal linear combination of outcomes, one that maximises the proportion of variance explained by one or several covariates of interest. By construction, this method optimises power; however, due to its computational complexity, it has unfortunately received little attention in the past. Here, we propose a general analytical PCEV framework that builds on the assets of the original method, namely its conceptual simplicity and freedom from tuning parameters. Moreover, our framework extends the range of applications of the original procedure by providing a computationally simple strategy for high-dimensional outcomes, along with exact and asymptotic testing procedures that drastically reduce its computational cost. We investigate the merits of the PCEV using an extensive set of simulations. Furthermore, the use of the PCEV approach is illustrated using three examples taken from the fields of epigenetics and brain imaging.
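
The optimisation described in this abstract can be written out as a short worked formulation. The notation below is an assumption introduced for illustration and does not appear in the abstract: $Y$ is the vector of outcomes, $V_M$ is the part of its covariance explained by the covariates of interest, $V_R$ is the residual covariance, and $w$ is the vector of weights defining the linear combination.

```latex
% Assumed decomposition of the outcome covariance into explained and residual parts
\operatorname{Var}(Y) = V_M + V_R

% Proportion of variance of the linear combination w^\top Y explained by the covariates
h^2(w) = \frac{w^\top V_M\, w}{w^\top (V_M + V_R)\, w}

% The PCEV weights maximise this proportion
w_{\mathrm{PCEV}} = \arg\max_{w \neq 0} \; h^2(w)

% Writing \lambda(w) = \frac{w^\top V_M w}{w^\top V_R w}, we have
% h^2(w) = \frac{\lambda(w)}{1 + \lambda(w)}, which is increasing in \lambda(w),
% so the maximiser solves the generalised eigenvalue problem
V_M\, w = \lambda\, V_R\, w
% with \lambda taken as the largest eigenvalue.
```

Under this formulation the maximisation reduces to a single eigen-decomposition once $V_M$ and $V_R$ have been estimated, provided $V_R$ is invertible. That condition generally fails when the number of outcomes exceeds the sample size, which is the high-dimensional setting the abstract says the framework addresses.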

Principal component of explained variance: an efficient and optimal data dimension reduction framework for association studies

10.1101/036566 ◽ 2016 ◽ Cited By ~ 1

Author(s): Maxime Turgeon ◽ Karim Oualkacha ◽ Antonio Ciampi ◽ Golsa Dehghan ◽ Brent W. Zanke ◽ ...

Keyword(s): Dimension Reduction ◽ Association Studies ◽ Computational Cost ◽ Principal Component ◽ Original Method ◽ High Dimensional ◽ Testing Procedures ◽ Simple Strategy ◽ Reduction Techniques ◽ Explained Variance

The genomics era has led to an increase in the dimensionality of the data collected to investigate biological questions. In this context, dimension-reduction techniques can be used to summarise high-dimensional signals into low-dimensional ones, which can then be tested for association with one or more covariates of interest. This paper revisits one such approach, previously known as Principal Component of Heritability and renamed here as Principal Component of Explained Variance (PCEV). As its name suggests, the PCEV seeks an optimal linear combination of outcomes, one that maximises the proportion of variance explained by one or several covariates of interest. By construction, this method optimises power; however, limited by its computational complexity, it has unfortunately received little attention in the past. Here, we propose a general analytical PCEV framework that builds on the assets of the original method, namely its conceptual simplicity and freedom from tuning parameters. Moreover, our framework extends the range of applications of the original procedure by providing a computationally simple strategy for high-dimensional outcomes, along with exact and asymptotic testing procedures that drastically reduce its computational cost. We investigate the merits of the PCEV using an extensive set of simulations. Furthermore, the use of the PCEV approach will be illustrated using three examples taken from the areas of epigenetics and brain imaging.
