ACTOR: a latent Dirichlet model to compare expressed isoform proportions to a reference panel

Mapping Intimacies ◽

10.1101/856401 ◽

2019 ◽

Author(s):

Sean D. McCabe ◽

Andrew B. Nobel ◽

Michael I. Love

Keyword(s):

Relative Proportion ◽

Disease Status ◽

R Package ◽

Tissue Expression ◽

Reference Panel ◽

Tissue Type ◽

Disease States ◽

Dirichlet Model ◽

Public Datasets ◽

Splicing Patterns

Abstract The relative proportion of RNA isoforms expressed for a given gene has been associated with disease states in cancer, retinal diseases, and neurological disorders. Examination of relative isoform proportions can help determine biological mechanisms, but such analyses often require a per-gene investigation of splicing patterns. Leveraging large public datasets produced by genomic consortia as a reference, one can compare splicing patterns in a dataset of interest with those of a reference panel in which samples are divided into distinct groups (tissue of origin, disease status, etc.). We propose ACTOR, a latent Dirichlet model with Dirichlet-Multinomial observations, to compare expressed isoform proportions in a dataset to an independent reference panel. We use a variational Bayes procedure to estimate posterior distributions for the group membership of one or more samples. Using the Genotype-Tissue Expression (GTEx) project as a reference dataset, we evaluate ACTOR on simulated and real RNA-seq datasets to determine tissue-type classifications of genes. ACTOR is publicly available as an R package at https://github.com/mccabes292/actor.
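To illustrate the observation model named in the ACTOR abstract, the sketch below scores one sample's isoform counts for a single gene under a Dirichlet-Multinomial distribution for each reference group, then converts those scores into group-membership probabilities. It is a simplification and not the ACTOR package. It assumes the per-group Dirichlet parameters (`group_alphas`) are already known, and it computes an exact posterior for one sample under a uniform prior instead of using the variational Bayes procedure the authors describe. All function names and parameter values are hypothetical.

```python
import math


def dirichlet_multinomial_logpmf(counts, alpha):
    """Log-probability of isoform counts under a Dirichlet-Multinomial(alpha)."""
    if len(counts) != len(alpha):
        raise ValueError("counts and alpha must have one entry per isoform")
    n = sum(counts)
    a0 = sum(alpha)
    # Multinomial coefficient: log n! - sum_k log x_k!
    log_coef = math.lgamma(n + 1) - sum(math.lgamma(x + 1) for x in counts)
    # Ratio of Dirichlet normalising constants: Gamma(A) / Gamma(n + A)
    log_norm = math.lgamma(a0) - math.lgamma(n + a0)
    # Per-isoform terms: Gamma(x_k + alpha_k) / Gamma(alpha_k)
    log_terms = sum(math.lgamma(x + a) - math.lgamma(a) for x, a in zip(counts, alpha))
    return log_coef + log_norm + log_terms


def group_posterior(counts, group_alphas, prior=None):
    """Posterior membership probabilities over reference groups for one sample.

    Hypothetical helper: exact Bayes rule with known alphas, not variational Bayes.
    """
    groups = list(group_alphas)
    if prior is None:
        prior = {g: 1.0 / len(groups) for g in groups}  # uniform prior over groups
    log_post = {
        g: math.log(prior[g]) + dirichlet_multinomial_logpmf(counts, group_alphas[g])
        for g in groups
    }
    # Normalise with the log-sum-exp trick to avoid numerical underflow
    m = max(log_post.values())
    total = sum(math.exp(v - m) for v in log_post.values())
    return {g: math.exp(v - m) / total for g, v in log_post.items()}


# Hypothetical reference parameters for a three-isoform gene in two tissues
group_alphas = {"liver": [40.0, 5.0, 5.0], "brain": [5.0, 40.0, 5.0]}
counts = [80, 12, 8]  # observed isoform read counts for one sample
print(group_posterior(counts, group_alphas))
```

Because the sample is dominated by the first isoform, almost all posterior mass falls on the hypothetical "liver" group. Working on the log scale and normalising with log-sum-exp keeps the computation stable even when read counts are large and the raw likelihoods underflow.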

Transcriptome-wide Mendelian randomization study prioritising novel tissue-dependent genes for glioma susceptibility

Scientific Reports ◽

10.1038/s41598-021-82169-5 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Jamie W. Robinson ◽

Richard M. Martin ◽

Spiridon Tsavachidis ◽

Amy E. Howell ◽

Caroline L. Relton ◽

...

Keyword(s):

Gene Expression ◽

Association Studies ◽

Tissue Expression ◽

Tissue Type ◽

Mendelian Randomisation ◽

Genome Wide Association Studies ◽

Causal Pathways ◽

Genome Wide ◽

Glioma Risk ◽

Brain Tissues

Abstract Genome-wide association studies (GWAS) have discovered 27 loci associated with glioma risk. Whether these loci are causally implicated in glioma risk, and how risk differs across tissues, has yet to be systematically explored. We integrated multi-tissue expression quantitative trait loci (eQTLs) and glioma GWAS data using a combined Mendelian randomisation (MR) and colocalisation approach. We investigated how genetically predicted gene expression affects risk across tissue types (brain, estimated effective n = 1194; whole blood, n = 31,684) and glioma subtypes (all glioma, 7400 cases and 8257 controls; glioblastoma (GBM), 3112 cases; non-GBM gliomas, 2411 cases). We also leveraged tissue-specific eQTLs collected from 13 brain tissues (n = 114 to 209). The MR and colocalisation results suggested that genetically predicted increased expression of 12 genes was associated with glioma, GBM and/or non-GBM risk; three of these are novel glioma susceptibility genes (RETREG2/FAM134A, FAM178B and MVB12B/FAM125B). The effect of gene expression appears to be relatively consistent across glioma subtype diagnoses. Examining how risk differed across 13 brain tissues highlighted five candidate tissues (cerebellum, cortex, and the putamen, nucleus accumbens and caudate of the basal ganglia) and four previously implicated genes (JAK1, STMN3, PICK1 and EGFR). These analyses identified robust causal evidence linking 12 genes to glioma risk, three of which are novel. Correlations between MR estimates in brain and blood were consistently low, which suggests that tissue specificity needs to be carefully considered for glioma. Our results implicate genes not previously associated with glioma susceptibility and provide insight into putatively causal pathways for glioma risk.
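For readers unfamiliar with how MR turns eQTL and GWAS summary statistics into a causal estimate, the sketch below shows two textbook estimators: the Wald ratio for a single instrument and the fixed-effect inverse-variance weighted (IVW) combination for several instruments. The abstract does not give the authors' exact estimator or settings, and colocalisation is not shown, so treat this as a generic illustration. Function names and effect sizes are hypothetical.

```python
import math


def wald_ratio(beta_exposure, beta_outcome, se_outcome):
    """Single-instrument causal estimate and its first-order standard error."""
    estimate = beta_outcome / beta_exposure
    se = se_outcome / abs(beta_exposure)
    return estimate, se


def ivw_estimate(betas_exposure, betas_outcome, ses_outcome):
    """Fixed-effect inverse-variance weighted combination of Wald ratios."""
    if not (len(betas_exposure) == len(betas_outcome) == len(ses_outcome)):
        raise ValueError("inputs must describe the same instruments")
    # Weight of instrument i is beta_X,i^2 / se_Y,i^2, the inverse variance of its Wald ratio
    num = sum(bx * by / se ** 2 for bx, by, se in zip(betas_exposure, betas_outcome, ses_outcome))
    den = sum(bx ** 2 / se ** 2 for bx, se in zip(betas_exposure, ses_outcome))
    return num / den, math.sqrt(1.0 / den)


# Hypothetical effects: SNP -> gene expression (eQTL) and SNP -> glioma log-odds (GWAS)
print(wald_ratio(0.5, 0.1, 0.02))
print(ivw_estimate([0.5, 0.25], [0.1, 0.05], [0.02, 0.02]))
```

With a single instrument, the IVW estimate reduces to the Wald ratio, which is a quick sanity check on any implementation.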

Saliva cell type DNA methylation reference panel for epidemiology studies in children

10.1101/2020.09.14.20191361 ◽

2020 ◽

Author(s):

Lauren Y M Middleton ◽

John F Dou ◽

Jonah Fisher ◽

Jonathan A Heiss ◽

Vy Nguyen ◽

...

Keyword(s):

DNA Methylation ◽

Epithelial Cells ◽

Immune Cell ◽

R Package ◽

Magnetic Bead ◽

Reference Panel ◽

Size Exclusion ◽

Cell Type ◽

Whole Saliva ◽

Epidemiology Studies

Saliva is a widely used biological sample, especially in pediatric research, containing a heterogeneous mixture of immune and epithelial cells. Associations of exposure or disease with saliva DNA methylation can be influenced by cell-type proportions. Here, we developed a saliva cell-type DNA methylation reference panel to estimate interindividual cell-type heterogeneity in whole saliva studies. Saliva was collected from 22 children (7-16 years) and sorted into immune and epithelial cells using size exclusion filtration and magnetic bead sorting. DNA methylation was measured using the Illumina MethylationEPIC BeadChip. We assessed cell-type differences in DNA methylation profiles and tested for enriched biological pathways. Immune and epithelial cells differed at 164,793 (20.7%) DNA methylation sites (t-test p < 10⁻⁸). Sites hypomethylated in immune cells mapped to genes enriched for immune pathways (p < 3.2 × 10⁻⁵). Sites hypomethylated in epithelial cells were enriched for cornification (p = 5.2 × 10⁻⁴), a key process in hard palate formation. Saliva immune and epithelial cells have distinct DNA methylation profiles that can drive whole-saliva DNA methylation measures. A primary saliva DNA methylation reference panel, easily implemented with an R package, will allow estimation of cell proportions in whole saliva samples and improve epigenetic epidemiology studies by accounting for measurement heterogeneity due to cell-type proportions.
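The abstract describes using a reference panel to estimate cell proportions from whole-saliva methylation. A minimal two-component version of reference-based deconvolution is sketched below, assuming only two cell types (immune and epithelial), per-CpG mean methylation beta values from the reference panel, and an unweighted least-squares fit with a sum-to-one constraint. The authors' R package may use a different estimator (for example, a Houseman-style constrained projection over selected CpGs); the function name and all values here are hypothetical.

```python
def two_cell_proportions(sample, immune_ref, epithelial_ref):
    """Least-squares immune/epithelial fractions for one whole-saliva sample.

    Model: sample_j ~ p * immune_j + (1 - p) * epithelial_j at each CpG j.
    """
    if not (len(sample) == len(immune_ref) == len(epithelial_ref)):
        raise ValueError("all inputs must cover the same CpG sites")
    diffs = [a - b for a, b in zip(immune_ref, epithelial_ref)]
    den = sum(d * d for d in diffs)
    if den == 0:
        raise ValueError("reference profiles are identical; proportions are not identifiable")
    # Closed-form least-squares solution for p once the sum-to-one constraint is substituted
    num = sum((y - b) * d for y, b, d in zip(sample, epithelial_ref, diffs))
    # The objective is a convex quadratic in p, so clipping gives the constrained optimum on [0, 1]
    p = min(1.0, max(0.0, num / den))
    return p, 1.0 - p


# Hypothetical reference beta values at four CpGs and one mixed saliva sample
immune = [0.9, 0.1, 0.8, 0.2]
epithelial = [0.1, 0.9, 0.3, 0.7]
saliva = [0.34, 0.66, 0.45, 0.55]
print(two_cell_proportions(saliva, immune, epithelial))
```

In practice, real panels use many cell-type-discriminating CpGs, which is why the large number of differentially methylated sites reported above matters for estimation precision.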

The disease status of catatonia

Behavioral and Brain Sciences ◽

10.1017/s0140525x02350106 ◽

2002 ◽

Vol 25 (5) ◽

pp. 590-591 ◽

Cited By ~ 1

Author(s):

Irwin Savodnik

Keyword(s):

Parkinson's Disease ◽

Disease Status ◽

Logical Status ◽

Disease States

Georg Northoff encounters a problem regarding the logical status of “catatonia.” Whereas Parkinson's disease (PD) is a disease on the basis of Virchowian criteria, catatonia is not. PD is associated with pathognomonic neurological lesions. Catatonia does not require any such association. The diagnosis is rendered using social criteria rather than neuropathological ones. Therefore, Northoff is not comparing two disease states at all.

Endosialin and Associated Protein Expression in Soft Tissue Sarcomas: A Potential Target for Anti-Endosialin Therapeutic Strategies

Sarcoma ◽

10.1155/2016/5213628 ◽

2016 ◽

Vol 2016 ◽

pp. 1-13 ◽

Cited By ~ 4

Author(s):

Daniel J. O’Shannessy ◽

Hongyue Dai ◽

Melissa Mitchell ◽

Shane Huntsman ◽

Stephen Brantley ◽

...

Keyword(s):

Gene Expression ◽

Soft Tissue ◽

Protein Expression ◽

Soft Tissue Sarcomas ◽

Tissue Expression ◽

Tissue Type ◽

Expression Data ◽

Interacting Protein ◽

Undifferentiated Sarcoma ◽

Tumor Types

Endosialin (CD248, TEM-1) is expressed in pericytes, tumor vasculature, tumor fibroblasts, and some tumor cells, including sarcomas, with limited normal tissue expression, and appears to play a key role in tumor-stromal interactions, including angiogenesis. Monoclonal antibodies targeting endosialin have entered clinical trials, including trials in soft tissue sarcomas. We evaluated a cohort of 94 soft tissue sarcoma samples to assess the correlation between gene expression and protein expression (by immunohistochemistry) for endosialin and PDGFR-β, a reported interacting protein, across available diagnoses. Correlations between the expression of endosialin and 13 other genes of interest were also examined. Within cohorts of soft tissue diagnoses assembled by tissue type (liposarcoma, leiomyosarcoma, undifferentiated sarcoma, and other), endosialin expression was significantly correlated with a better outcome. Endosialin expression was highest in liposarcomas and lowest in leiomyosarcomas. A robust correlation between protein and gene expression data was observed for both endosialin and PDGFR-β. Endosialin expression correlated positively with PDGFR-β and heparan sulphate proteoglycan 2 and negatively with carbonic anhydrase IX. Endosialin likely interacts with a network of extracellular and hypoxia-activated proteins in sarcomas and other tumor types. Because expression varies across histologic groups, endosialin may represent a selective target in soft tissue sarcomas.

UCSCXenaShiny: An R Package for Exploring and Analyzing UCSC Xena Public Datasets in Web Browser

10.20944/preprints202007.0179.v1 ◽

2020 ◽

Author(s):

Shixiang Wang ◽

Yi Xiong ◽

Kai Gu ◽

Longfei Zhao ◽

Yin Li ◽

...

Keyword(s):

R Package ◽

Data Availability ◽

Analysis Tool ◽

Omics Data ◽

Analysis Framework ◽

Web Browser ◽

Research Opportunities ◽

Public Projects ◽

R Shiny ◽

Public Datasets

Motivation: The UCSC Xena platform provides large amounts of processed cancer omics data from big public projects such as TCGA and from individual research groups, enabling unprecedented research opportunities. In 2019, we developed UCSCXenaTools, an R package for retrieving UCSC Xena data. However, an easier tool for dataset exploration and analysis is still lacking, especially for researchers without programming experience. Results: We developed UCSCXenaShiny, an R Shiny package to quickly explore and download all datasets from UCSC Xena data hubs. In addition, a module-based analysis framework is provided to analyze and visualize the data. Availability: https://github.com/openbiox/UCSCXenaShiny or https://cran.r-project.org/package=UCSCXenaShiny.

rPanglaoDB: an R package to download and merge labeled single-cell RNA-seq data from the PanglaoDB database

10.1101/2021.05.28.446161 ◽

2021 ◽

Author(s):

Daniel Osorio ◽

Marieke Lydia Kuijjer ◽

James J. Cai

Keyword(s):

Single Cell ◽

Cell Types ◽

R Package ◽

RNA-Seq ◽

Cell Type ◽

Sequencing Data ◽

Single Experiment ◽

Tissue Samples ◽

Molecular Phenotypes ◽

Public Datasets

Motivation: Characterizing cells with rare molecular phenotypes is one of the promises of high-throughput single-cell RNA sequencing (scRNA-seq) techniques. However, collecting enough cells with the desired molecular phenotype in a single experiment is challenging, requiring several sample preprocessing steps to filter and collect the desired cells experimentally before sequencing. Integrating data from multiple public single-cell experiments offers a solution to this problem, allowing the collection of enough cells exhibiting the desired molecular signatures. By increasing the sample size of the desired cell type, this approach enables robust characterization of that cell type's transcriptome. Results: Here, we introduce rPanglaoDB, an R package to download and merge the uniformly processed and annotated scRNA-seq data provided by the PanglaoDB database. To show the potential of rPanglaoDB for collecting rare cell types by integrating multiple public datasets, we present a biological application collecting and characterizing a set of 157 fibrocytes. Fibrocytes are a rare monocyte-derived cell type that exhibits both the inflammatory features of macrophages and the tissue-remodeling properties of fibroblasts. This constitutes the first report of an unbiased transcriptome profile of fibrocytes. We compared the transcriptomic profile of the fibrocytes against that of fibroblasts collected from the same tissue samples and confirmed their association with healing processes in tissue damage and infection through activation of the prostaglandin biosynthesis and regulation pathway. Availability and Implementation: rPanglaoDB is implemented as an R package available through CRAN at https://CRAN.R-project.org/package=rPanglaoDB.

Meta-Analysis of Transcriptome-Wide Association Studies across 13 Brain Tissues Identified Novel Clusters of Genes Associated with Nicotine Addiction

Genes ◽

10.3390/genes13010037 ◽

2021 ◽

Vol 13 (1) ◽

pp. 37

Author(s):

Zhenyao Ye ◽

Chen Mo ◽

Hongjie Ke ◽

Qi Yan ◽

Chixiang Chen ◽

...

Keyword(s):

Association Studies ◽

Meta Analysis ◽

Tissue Expression ◽

Nicotine Addiction ◽

Reference Panel ◽

Strong Linkage Disequilibrium ◽

Genome Wide Association Studies ◽

Tissue Specific ◽

eQTL Data ◽

Brain Tissues

Genome-wide association studies (GWAS) have identified and replicated thousands of disease-associated loci, but many of them are not directly interpretable due to strong linkage disequilibrium among variants. Transcriptome-wide association studies (TWAS) incorporate expression quantitative trait loci (eQTL) cohorts as a reference panel to detect gene-level associations with a phenotype, and have been gaining popularity in recent years. For nicotine addiction, several important susceptibility variants have been identified by GWAS, but TWAS that detect genes associated with nicotine addiction and unveil the underlying molecular mechanisms have been lacking. In this study, we used eQTL data from the Genotype-Tissue Expression (GTEx) consortium as a reference panel to conduct tissue-specific TWAS on cigarettes per day (CPD) over thirteen brain tissues in two large cohorts, the UK Biobank (UKBB; number of participants (N) = 142,202) and the GWAS & Sequencing Consortium of Alcohol and Nicotine use (GSCAN; N = 143,210), and then meta-analyzed the results across tissues while accounting for heterogeneity across tissues. We identified three major clusters of genes with different meta-patterns across tissues that were consistent in both cohorts: homogeneous genes associated with CPD in all brain tissues; partially homogeneous genes associated with CPD in cortex, cerebellum, and hippocampus tissues; and tissue-specific genes associated with CPD in only a few specific brain tissues. Downstream enrichment analyses on each gene cluster identified unique biological pathways associated with CPD and provided important biological insights into the regulatory mechanism of nicotine dependence in the brain.
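The abstract describes using GTEx eQTL data as a reference panel to test genes rather than individual variants. One widely used way to do this from GWAS summary statistics is the FUSION-style weighted z-score, sketched below: SNP-level GWAS z-scores are combined with expression-prediction weights and normalised by the linkage disequilibrium (LD) among the SNPs. The abstract does not say which TWAS software the authors used, and the cross-tissue meta-analysis step is not shown, so this is a generic illustration. The function name, weights, z-scores, and LD values are hypothetical.

```python
import math


def twas_z(weights, gwas_z, ld):
    """Gene-level TWAS z-score from SNP expression weights and GWAS z-scores.

    z_gene = sum_i w_i z_i / sqrt(w^T R w), where R is the SNP LD correlation matrix.
    """
    k = len(weights)
    if len(gwas_z) != k or len(ld) != k or any(len(row) != k for row in ld):
        raise ValueError("weights, gwas_z and ld must describe the same SNPs")
    numerator = sum(w * z for w, z in zip(weights, gwas_z))
    # Quadratic form w^T R w: null variance of the weighted sum of correlated z-scores
    variance = sum(weights[i] * ld[i][j] * weights[j] for i in range(k) for j in range(k))
    if variance <= 0:
        raise ValueError("w^T R w must be positive")
    return numerator / math.sqrt(variance)


# Hypothetical cis-SNP weights for one gene in one brain tissue, with GWAS z-scores and LD
weights = [0.3, -0.1, 0.2]
gwas_z = [2.5, -1.0, 1.8]
ld = [[1.0, 0.4, 0.1], [0.4, 1.0, 0.2], [0.1, 0.2, 1.0]]
print(twas_z(weights, gwas_z, ld))
```

The LD term is what distinguishes this from a naive weighted sum: correlated SNPs carry overlapping signal, and ignoring R would inflate the gene-level statistic.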

VarGen: an R package for disease-associated variant discovery and annotation

Bioinformatics ◽

10.1093/bioinformatics/btz930 ◽

2019 ◽

Vol 36 (8) ◽

pp. 2626-2627

Author(s):

Corentin Molitor ◽

Matt Brember ◽

Fady Mohareb

Keyword(s):

Association Studies ◽

Genetic Disorders ◽

R Package ◽

Tissue Expression ◽

Mendelian Inheritance ◽

Supplementary Information ◽

Genome Wide Association Studies ◽

Variant Discovery ◽

Genome Wide ◽

High Quality Information

Abstract Summary Over the past decade, there has been an exponential increase in the amount of disease-related genomic data available in public databases. However, this high-quality information is spread across independent sources, and researchers often need to access each one separately. Hence, there is a growing need for tools that gather and compile this information in an easy and automated manner. Here, we present ‘VarGen’, an easy-to-use, customizable R package that fetches, annotates and ranks variants related to diseases and genetic disorders, using a collection of public databases (viz. Online Mendelian Inheritance in Man, the Functional Annotation of the Mammalian Genome 5, the Genotype-Tissue Expression project and the Genome-Wide Association Studies Catalog). This package is also capable of annotating these variants to identify the most impactful ones. We expect that this tool will benefit research into variant-disease relationships. Availability and implementation VarGen is open-source and freely available via GitHub: https://github.com/MCorentin/VarGen. The software is implemented as an R package and is supported on Linux, MacOS and Windows. Supplementary information Supplementary data are available at Bioinformatics online.

Integrating biological knowledge and gene expression data using pathway-guided random forests: a benchmarking study

Bioinformatics ◽

10.1093/bioinformatics/btaa483 ◽

2020 ◽

Vol 36 (15) ◽

pp. 4301-4308

Author(s):

Stephan Seifert ◽

Sven Gundlach ◽

Olaf Junge ◽

Silke Szymczak

Keyword(s):

Gene Expression ◽

Computational Models ◽

Hybrid Approach ◽

Disease Status ◽

R Package ◽

Gene Expression Omnibus ◽

Functional Enrichment ◽

Supplementary Information ◽

Biological Knowledge ◽

Functional Relationships

Abstract Motivation High-throughput technologies allow comprehensive characterization of individuals on many molecular levels. However, training computational models to predict disease status based on omics data is challenging. A promising solution is the integration of external knowledge about structural and functional relationships into the modeling process. We compared four published random forest-based approaches using two simulation studies and nine experimental datasets. Results The ‘self-sufficient prediction error’ approach should be applied when large numbers of relevant pathways are expected. The competing methods ‘hunting’ and ‘learner of functional enrichment’ should be used when few relevant pathways are expected or when the most strongly associated pathways are of interest. The hybrid approach ‘synthetic features’ is not recommended because of its high false discovery rate. Availability and implementation An R package providing functions for data analysis and simulation is available at GitHub (https://github.com/szymczak-lab/PathwayGuidedRF). An accompanying R data package (https://github.com/szymczak-lab/DataPathwayGuidedRF) stores the processed and quality-controlled experimental datasets downloaded from Gene Expression Omnibus (GEO). Supplementary information Supplementary data are available at Bioinformatics online.

A disease risk index for patients undergoing allogeneic stem cell transplantation

Blood ◽

10.1182/blood-2012-03-418202 ◽

2012 ◽

Vol 120 (4) ◽

pp. 905-913 ◽

Cited By ~ 204

Author(s):

Philippe Armand ◽

Christopher J. Gibson ◽

Corey Cutler ◽

Vincent T. Ho ◽

John Koreth ◽

...

Keyword(s):

Overall Survival ◽

Disease Risk ◽

Risk Index ◽

Progression Free Survival ◽

Disease Status ◽

Risk Groups ◽

Disease States ◽

Comorbidity Index ◽

Available Information ◽

Conditioning Intensity

Abstract The outcome of allogeneic hematopoietic stem cell transplantation (HSCT) varies considerably by disease and remission status at the time of transplantation. Any retrospective or prospective HSCT study that enrolls patients across disease types must account for this heterogeneity; yet current methods are neither standardized nor validated. We conducted a retrospective study of 1539 patients who underwent transplantation at Dana-Farber Cancer Institute/Brigham and Women's Hospital from 2000 to 2009. Using multivariable models for overall survival, we created a disease risk index. This tool uses readily available information about disease and disease status to categorize patients into 4 risk groups with significantly different overall survival and progression-free survival, primarily on the basis of differences in relapse risk. This scheme applies regardless of conditioning intensity, is independent of comorbidity index, and was validated in an independent cohort of 672 patients from the Fred Hutchinson Cancer Research Center. This simple and validated scheme could be used to risk-stratify patients in both retrospective and prospective HSCT studies, to calibrate HSCT outcomes across studies and centers, and to promote the design of HSCT clinical trials that enroll patients across diseases and disease states, increasing our ability to study nondisease-specific outcomes in HSCT.