A Fully Automated Parallel-Processing R Package for High-Dimensional Multiple-Phenotype Analysis Considering Population Structure

Given the ever-increasing amount of high-dimensional and complex omics data becoming available, it is increasingly important to discover simple but effective methods of analysis. Divergence analysis transforms each entry of a high-dimensional omics profile into a digitized (binary or ternary) code based on the deviation of the entry from a given baseline population. This is a novel framework that is significantly different from existing omics data analysis methods: it allows digitization of continuous omics data at the univariate or multivariate level, facilitates sample level analysis, and is applicable on many different omics platforms. The divergence package, available on the R platform through the Bioconductor repository collection, provides easy-to-use functions for carrying out this transformation. Here we demonstrate how to use the package with sample high throughput sequencing data from the Cancer Genome Atlas.

R/qtlcharts: interactive graphics for quantitative trait locus mapping

10.1101/011437 ◽

2014 ◽

Cited By ~ 1

Author(s):

Karl W Broman

Keyword(s):

Quantitative Trait Locus ◽

Quantitative Trait Locus Mapping ◽

Quantitative Trait ◽

Quantitative Traits ◽

R Package ◽

High Dimensional ◽

Interactive Graphics ◽

Phenotype Data ◽

Trait Locus ◽

Locus Mapping

Every data visualization can be improved with some level of interactivity. Interactive graphics hold particular promise for the exploration of high-dimensional data. R/qtlcharts is an R package to create interactive graphics for experiments to map quantitative trait loci (QTL; genetic loci that influence quantitative traits). R/qtlcharts serves as a companion to the R/qtl package, providing interactive versions of R/qtl's static graphs, as well as additional interactive graphs for the exploration of high-dimensional genotype and phenotype data.

An R package for divergence analysis of omics data

PLoS ONE ◽

10.1371/journal.pone.0249002 ◽

2021 ◽

Vol 16 (4) ◽

pp. e0249002

Author(s):

Wikum Dinalankara ◽

Qian Ke ◽

Donald Geman ◽

Luigi Marchionni

Keyword(s):

R Package ◽

The Cancer Genome Atlas ◽

High Dimensional ◽

Omics Data ◽

Ternary Code ◽

Cancer Genome Atlas ◽

Level Analysis ◽

Data Analysis Methods ◽

Genome Atlas ◽

Omics Data Analysis

Given the ever-increasing amount of high-dimensional and complex omics data becoming available, it is increasingly important to discover simple but effective methods of analysis. Divergence analysis transforms each entry of a high-dimensional omics profile into a digitized (binary or ternary) code based on the deviation of the entry from a given baseline population. This is a novel framework that is significantly different from existing omics data analysis methods: it allows digitization of continuous omics data at the univariate or multivariate level, facilitates sample level analysis, and is applicable on many different omics platforms. The divergence package, available on the R platform through the Bioconductor repository collection, provides easy-to-use functions for carrying out this transformation. Here we demonstrate how to use the package with data from the Cancer Genome Atlas.
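
As a rough illustration of the digitization idea in this abstract, the Python sketch below codes each feature as -1, 0, or +1 depending on whether a sample's value falls below, inside, or above a range derived from a baseline population. The percentile-based range, the function names, and the feature labels are assumptions made for illustration only; the abstract does not say how the divergence package estimates its baseline, and this is not the package's API.

```python
from statistics import quantiles


def baseline_interval(values, lower_pct=5, upper_pct=95):
    """Return an assumed (low, high) baseline range from percentile cut points.

    Using percentiles here is an illustrative assumption, not the
    estimator used by the divergence package.
    """
    cuts = quantiles(values, n=100, method="inclusive")  # 99 cut points
    return cuts[lower_pct - 1], cuts[upper_pct - 1]


def ternary_code(value, interval):
    """Code a value as -1 (below), 0 (within), or +1 (above) the interval."""
    low, high = interval
    if value < low:
        return -1
    if value > high:
        return 1
    return 0


def digitize_profile(profile, baseline):
    """Digitize one sample.

    profile:  dict mapping feature name -> value for a single sample
    baseline: dict mapping feature name -> list of baseline values
    """
    return {
        feature: ternary_code(value, baseline_interval(baseline[feature]))
        for feature, value in profile.items()
    }
```

Calling `digitize_profile` on each sample in turn yields one ternary vector per sample, which is what makes sample-level comparisons straightforward in this simplified setting.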

A Tutorial on : R Package for the Linearized Bregman Algorithm in High-Dimensional Statistics

Handbook of Big Data Analytics - Springer Handbooks of Computational Statistics ◽

10.1007/978-3-319-18284-1_17 ◽

2018 ◽

pp. 425-453

Author(s):

Jiechao Xiong ◽

Feng Ruan ◽

Yuan Yao

Keyword(s):

R Package ◽

High Dimensional ◽

High Dimensional Statistics

Seagull: lasso, group lasso and sparse-group lasso regularization for linear regression models via proximal gradient descent

BMC Bioinformatics ◽

10.1186/s12859-020-03725-w ◽

2020 ◽

Vol 21 (1) ◽

Author(s):

Jan Klosa ◽

Noah Simon ◽

Pål Olof Westermark ◽

Volkmar Liebscher ◽

Dörte Wittenburg

Keyword(s):

Linear Regression ◽

Regression Models ◽

Gradient Descent ◽

Methylation Status ◽

R Package ◽

Group Lasso ◽

High Dimensional ◽

Linear Regression Models ◽

Sparse Group Lasso ◽

Proximal Gradient Descent

Background: Statistical analyses of biological problems in life sciences often lead to high-dimensional linear models. To solve the corresponding system of equations, penalization approaches are often the methods of choice. They are especially useful in case of multicollinearity, which appears if the number of explanatory variables exceeds the number of observations or for some biological reason. Then, the model goodness of fit is penalized by some suitable function of interest. Prominent examples are the lasso, group lasso and sparse-group lasso. Here, we offer a fast and numerically cheap implementation of these operators via proximal gradient descent. The grid search for the penalty parameter is realized by warm starts. The step size between consecutive iterations is determined with backtracking line search. Finally, seagull, the R package presented here, produces complete regularization paths.

Results: Publicly available high-dimensional methylation data are used to compare seagull to the established R package SGL. The results of both packages enabled a precise prediction of biological age from DNA methylation status. But even though the results of seagull and SGL were very similar (R² > 0.99), seagull computed the solution in a fraction of the time needed by SGL. Additionally, seagull enables the incorporation of weights for each penalized feature.

Conclusions: The following operators for linear regression models are available in seagull: lasso, group lasso, sparse-group lasso and Integrative LASSO with Penalty Factors (IPF-lasso). Thus, seagull is a convenient envelope of lasso variants.
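
The ingredients named in this abstract (proximal gradient descent, backtracking line search, warm starts along a penalty grid) can be sketched for the plain lasso as follows. This is a minimal Python illustration under stated assumptions, not seagull's implementation: the objective scaling (1/(2n) times the residual sum of squares plus λ times the L1 norm), the function names, and the default step and tolerance values are all choices made here for the example.

```python
def soft_threshold(z, t):
    """Proximal operator of t * |.| (soft-thresholding)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def smooth_loss(X, y, beta):
    """Least-squares part of the objective: ||y - X beta||^2 / (2n)."""
    n = len(y)
    return sum(
        (yi - sum(x * b for x, b in zip(row, beta))) ** 2
        for row, yi in zip(X, y)
    ) / (2 * n)


def gradient(X, y, beta):
    """Gradient of smooth_loss with respect to beta."""
    n = len(y)
    resid = [sum(x * b for x, b in zip(row, beta)) - yi for row, yi in zip(X, y)]
    return [sum(row[j] * r for row, r in zip(X, resid)) / n for j in range(len(beta))]


def lasso_pgd(X, y, lam, beta0, step=1.0, shrink=0.5, max_iter=500, tol=1e-8):
    """Proximal gradient descent for the lasso, started from beta0."""
    beta = list(beta0)
    for _ in range(max_iter):
        g = gradient(X, y, beta)
        f_old = smooth_loss(X, y, beta)
        t = step
        while True:  # backtracking: shrink t until the quadratic bound holds
            cand = [soft_threshold(b - t * gj, t * lam) for b, gj in zip(beta, g)]
            diff = [c - b for c, b in zip(cand, beta)]
            bound = (f_old + sum(gj * d for gj, d in zip(g, diff))
                     + sum(d * d for d in diff) / (2 * t))
            if smooth_loss(X, y, cand) <= bound + 1e-12:
                break
            t *= shrink
        beta = cand
        if max(abs(d) for d in diff) < tol:
            break
    return beta


def lasso_path(X, y, lambdas):
    """Regularization path: each solve is warm-started from the previous one."""
    beta = [0.0] * len(X[0])
    path = []
    for lam in sorted(lambdas, reverse=True):
        beta = lasso_pgd(X, y, lam, beta)
        path.append((lam, beta))
    return path
```

Sorting the penalties from largest to smallest means each solve starts close to its solution, which is the usual motivation for warm starts. The group and sparse-group variants would replace `soft_threshold` with their own proximal operators.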

MLRMPA: An R package of multiple linear regression model population analysis based on a cluster sampling technique for variable selection of high dimensional data

Chemometrics and Intelligent Laboratory Systems ◽

10.1016/j.chemolab.2014.01.010 ◽

2014 ◽

Vol 132 ◽

pp. 124-132 ◽

Cited By ~ 4

Author(s):

Meihong Xie ◽

Fangfang Deng ◽

Xiaoyun Zhang ◽

Yueli Tian ◽

Peizhen Li ◽

...

Keyword(s):

Linear Regression ◽

Regression Model ◽

Linear Regression Model ◽

Population Analysis ◽

Multiple Linear Regression Model ◽

Sampling Technique ◽

R Package ◽

High Dimensional ◽

Cluster Sampling ◽

Selection Of

Constructing plasticity phenotypes to classify experience-dependent development of the visual cortex

10.1101/2020.01.07.896191 ◽

2020 ◽

Cited By ~ 1

Author(s):

Justin L. Balsor ◽

David G. Jones ◽

Kathryn M. Murphy

Keyword(s):

Visual Cortex ◽

Neural Development ◽

Large Data ◽

R Package ◽

Synaptic Proteins ◽

High Dimensional ◽

Data Sets ◽

Visual Plasticity ◽

Dependent Plasticity ◽

Dimensional Changes

Many neural mechanisms regulate experience-dependent plasticity in the visual cortex (V1), and new techniques for quantifying large numbers of proteins or genes are moving the study of plasticity into the era of big data. With those large data sets comes the challenge of extracting biologically meaningful results about visual plasticity from data-driven analytical methods designed for high-dimensional data. In other areas of neuroscience, high-information content methodologies are revealing more subtle aspects of neural development and individual variations that give rise to a richer picture of brain disorders. We have developed an approach for studying V1 plasticity that takes advantage of the known functions of many synaptic proteins in regulating visual plasticity and uses them to rebrand the results of high-dimensional analyses into a plasticity phenotype. Here we provide a primer for analyzing experience-dependent plasticity in V1 using example R code to identify high-dimensional changes in a group of proteins. We describe using PCA to classify high-dimensional plasticity features and use them to construct a plasticity phenotype. In the examples, we show how the plasticity phenotype can be visualized and used to identify neurobiological features in V1 that change during development or after different visual rearing conditions. We include an R package “v1hdexplorer” that aggregates the various coding packages and custom visualization scripts written in RStudio.
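
For readers unfamiliar with the PCA step mentioned in this abstract, the sketch below extracts the first principal component of a small samples-by-features table using power iteration and projects the samples onto it. It is a generic Python illustration, not the authors' R code or anything from v1hdexplorer; the function names and the choice of power iteration are assumptions made to keep the example dependency-free.

```python
from math import sqrt


def center(rows):
    """Subtract each column mean (rows are samples, columns are features)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[r[j] - means[j] for j in range(p)] for r in rows]


def covariance(centered):
    """Sample covariance matrix of already-centered data."""
    n, p = len(centered), len(centered[0])
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
        for i in range(p)
    ]


def first_component(cov, iterations=200):
    """Leading eigenvector of cov via power iteration.

    Assumes the starting vector is not orthogonal to the leading
    eigenvector; a random start would be more robust in general.
    """
    p = len(cov)
    v = [1.0 / sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v


def scores(centered, v):
    """Project each centered sample onto the component v."""
    return [sum(x * vj for x, vj in zip(r, v)) for r in centered]
```

The entries of the component vector indicate how strongly each measured feature contributes to that axis of variation, and the scores place each sample along it.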

PhenoExam: an R package and Web application for the examination of phenotypes linked to genes and gene sets

10.1101/2021.06.29.450324 ◽

2021 ◽

Author(s):

Alejandro Cisterna García ◽

Aurora González-Vidal ◽

Daniel Ruiz Villa ◽

Jordi Ortiz Murillo ◽

Alicia Gómez-Pascual ◽

...

Keyword(s):

Web Application ◽

Enrichment Analysis ◽

R Package ◽

Web Interface ◽

Gene Set ◽

New Genes ◽

Gene Sets ◽

Phenotype Analysis ◽

New Gene ◽

Early Onset Parkinson’s Disease

Gene set based phenotype enrichment analysis (detecting phenotypic terms that emerge as significant in a set of genes) can improve the rate of genetic diagnoses, among other research purposes. To facilitate diverse phenotype analysis, we developed PhenoExam, a freely available R package for tool developers and a web interface for users, which (1) performs phenotype and disease enrichment analysis on a gene set; (2) measures statistically significant phenotype similarities between gene sets; and (3) detects significant differential phenotypes or disease terms across different databases. PhenoExam achieves these tasks by integrating databases or resources such as the HPO, MGD, CRISPRbrain, CTD, ClinGen, CGI, OrphaNET, UniProt, PsyGeNET, and Genomics England Panel App. PhenoExam accepts both human and mouse genes as input. We developed PhenoExam to assist a variety of users, including clinicians, computational biologists and geneticists. It can be used to support the validation of new gene-to-disease discoveries, and in the detection of differential phenotypes between two gene sets (a phenotype linked to one of the gene sets but not to the other) that are useful for differential diagnosis and to improve genetic panels. We validated PhenoExam performance through simulations and its application to real cases. We demonstrate that PhenoExam is effective in distinguishing gene sets or Mendelian diseases with very similar phenotypes through projecting the disease-causing genes into their annotation-based phenotypic spaces. We also tested the tool with early onset Parkinson's disease and dystonia genes, to show phenotype-level similarities but also potentially interesting differences. More specifically, we used PhenoExam to validate computationally predicted new genes potentially associated with epilepsy. Therefore, PhenoExam effectively discovers links between phenotypic terms across annotation databases through effective integration.
The R package is available at https://github.com/alexcis95/PhenoExam and the Web tool is accessible at https://snca.atica.um.es/PhenoExamWeb/.
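
To make "phenotype enrichment analysis on a gene set" concrete, the sketch below scores each phenotype term with a one-sided hypergeometric test, a common choice for over-representation analysis. The abstract does not state which test PhenoExam uses, so the test itself, the function names, and the data layout are assumptions for illustration, not PhenoExam's API.

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """P(X >= k) when drawing n genes from N, of which K carry the term."""
    total = comb(N, n)
    upper = min(n, K)
    # math.comb returns 0 for impossible draws, so no extra bounds check is needed
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favorable / total


def term_enrichment(gene_set, annotations, background):
    """Return a p-value per phenotype term.

    gene_set:    iterable of query genes
    annotations: dict mapping term -> set of genes annotated with it
    background:  iterable of all genes considered
    """
    background = set(background)
    genes = set(gene_set) & background
    results = {}
    for term, annotated in annotations.items():
        annotated = set(annotated) & background
        overlap = len(genes & annotated)
        results[term] = hypergeom_pvalue(
            overlap, len(genes), len(annotated), len(background)
        )
    return results
```

In practice the per-term p-values would also need a multiple-testing correction, since many terms are tested against the same gene set.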

ImmunoCluster: A computational framework for the non-specialist to profile cellular heterogeneity in cytometry datasets

10.1101/2020.09.09.289033 ◽

2020 ◽

Cited By ~ 1

Author(s):

James W. Opzoomer ◽

Jessica Timms ◽

Kevin Blighe ◽

Thanos P. Mourikis ◽

Nicolas Chapuis ◽

...

Keyword(s):

R Package ◽

Immune Monitoring ◽

Cellular Heterogeneity ◽

High Dimensional ◽

Computational Framework ◽

Flow Cytometry Data ◽

Innovative Tool ◽

Health And Disease ◽

Analytical Approaches ◽

Immune Profiling

High dimensional cytometry is an innovative tool for immune monitoring in health and disease; it has provided novel insight into the underlying biology as well as biomarkers for a variety of diseases. However, the analysis of multiparametric “big data” usually requires specialist computational knowledge. Here we describe ImmunoCluster (https://github.com/kordastilab/ImmunoCluster), an R package for immune profiling of cellular heterogeneity in high dimensional liquid and imaging mass cytometry, and flow cytometry data, designed to facilitate computational analysis by a non-specialist. The analysis framework implemented within ImmunoCluster is readily scalable to millions of cells and provides a variety of visualization and analytical approaches, as well as a rich array of plotting tools that can be tailored to users’ needs. The protocol consists of three core computational stages: (1) data import and quality control; (2) dimensionality reduction and unsupervised clustering; and (3) annotation and differential testing, all contained within an R-based open-source framework.
