LncGSEA: a versatile tool to infer lncRNA associated pathways from large-scale cancer transcriptome sequencing data

Abstract. Background: Long non-coding RNAs (lncRNAs) are a growing focus in cancer research. Deciphering pathways influenced by lncRNAs is important to understand their role in cancer. Although knock-down or overexpression of lncRNAs followed by gene expression profiling in cancer cell lines are established approaches to address this problem, these experimental data are not available for a majority of the annotated lncRNAs. Results: As a surrogate, we present lncGSEA, a convenient tool to predict the lncRNA associated pathways through Gene Set Enrichment Analysis of gene expression profiles from large-scale cancer patient samples. We demonstrate that lncGSEA is able to recapitulate lncRNA associated pathways supported by literature and experimental validations in multiple cancer types. Conclusions: LncGSEA allows researchers to infer lncRNA regulatory pathways directly from clinical samples in oncology. LncGSEA is written in R, and is freely accessible at https://github.com/ylab-hi/lncGSEA.
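
The idea in the abstract above (rank genes by their association with one lncRNA across patient samples, then test a pathway gene set for enrichment near the top or bottom of that ranking) can be sketched in a few lines. This is only an illustration, not lncGSEA's R interface: the function names are hypothetical, and the use of Pearson correlation as the ranking metric is an assumption made for the sketch.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def rank_genes(lnc_expr, gene_expr):
    """Rank genes by correlation with the lncRNA (assumed metric), descending."""
    scores = {gene: pearson(lnc_expr, values) for gene, values in gene_expr.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def enrichment_score(ranked, gene_set, p=1.0):
    """GSEA-style weighted running sum; returns the largest deviation from zero."""
    hits = [gene in gene_set for gene, _ in ranked]
    n_miss = len(ranked) - sum(hits)
    hit_weight = sum(abs(s) ** p for (_, s), h in zip(ranked, hits) if h)
    if hit_weight == 0 or n_miss == 0:
        return 0.0
    running, best = 0.0, 0.0
    for (_, score), is_hit in zip(ranked, hits):
        # Step up at gene-set members (weighted by score), step down otherwise.
        running += abs(score) ** p / hit_weight if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Toy data: one lncRNA and four genes measured in five samples.
lnc = [1, 2, 3, 4, 5]
genes = {"A": [2, 4, 6, 8, 10], "B": [1, 2, 3, 5, 4],
         "C": [5, 4, 3, 2, 1], "D": [3, 1, 4, 1, 5]}
ranked = rank_genes(lnc, genes)
print(enrichment_score(ranked, {"A", "B"}))
```

A positive score suggests the gene set is concentrated among genes positively associated with the lncRNA, a negative score the opposite. A real analysis would also need permutation-based significance and normalisation, which the sketch omits.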

Discovering novel driver mutations from pan-cancer analysis of mutational and gene expression profiles

PLoS ONE ◽

10.1371/journal.pone.0242780 ◽

2020 ◽

Vol 15 (11) ◽

pp. e0242780

Author(s):

Houriiyah Tegally ◽

Kevin H. Kensler ◽

Zahra Mungloo-Dilmohamud ◽

Anisah W. Ghoorah ◽

Timothy R. Rebbeck ◽

...

Keyword(s):

Gene Expression ◽

Large Scale ◽

Cancer Genomics ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Driver Mutations ◽

Literature Mining ◽

Cancer Genes ◽

Multiple Cancer ◽

Driver Genes

As the genomic profile across cancers varies from person to person, patient prognosis and treatment may differ based on the mutational signature of each tumour. Thus, it is critical to understand genomic drivers of cancer and identify potential mutational commonalities across tumours originating at diverse anatomical sites. Large-scale cancer genomics initiatives, such as TCGA, ICGC and GENIE, have enabled the analysis of thousands of tumour genomes. Our goal was to identify new cancer-causing mutations that may be common across tumour sites using mutational and gene expression profiles. Genomic and transcriptomic data from breast, ovarian, and prostate cancers were aggregated and analysed using differential gene expression methods to identify the effect of specific mutations on the expression of multiple genes. Mutated genes associated with the most differentially expressed genes were considered to be novel candidates for driver mutations, and were validated through literature mining, pathway analysis and clinical data investigation. Our driver selection method successfully identified 116 probable novel cancer-causing genes, with 4 discovered in patients having no alterations in any known driver genes: MXRA5, OBSCN, RYR1, and TG. The candidate genes previously not officially classified as cancer-causing showed enrichment in cancer pathways and in cancer diseases. They also matched expectations pertaining to properties of cancer genes, for instance, showing larger gene and protein lengths, and having mutation patterns suggesting oncogenic or tumour suppressor properties. Our approach identifies novel putative driver genes that are common across cancer sites without any a priori knowledge of pathways or gene interactions, and is therefore agnostic with respect to such prior knowledge.

G2S3: A gene graph-based imputation method for single-cell RNA sequencing data

PLoS Computational Biology ◽

10.1371/journal.pcbi.1009029 ◽

2021 ◽

Vol 17 (5) ◽

pp. e1009029

Author(s):

Weimiao Wu ◽

Yunqing Liu ◽

Qile Dai ◽

Xiting Yan ◽

Zuoheng Wang

Keyword(s):

Gene Expression ◽

Single Cell ◽

RNA Sequencing ◽

Large Scale ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Sequencing Data ◽

High Data ◽

Study Gene Expression ◽

Single Cell RNA Sequencing

Single-cell RNA sequencing technology provides an opportunity to study gene expression at single-cell resolution. However, prevalent dropout events result in high data sparsity and noise that may obscure downstream analyses in single-cell transcriptomic studies. We propose a new method, G2S3, that imputes dropouts by borrowing information from adjacent genes in a sparse gene graph learned from gene expression profiles across cells. We applied G2S3 and ten existing imputation methods to eight single-cell transcriptomic datasets and compared their performance. Our results demonstrated that G2S3 has superior overall performance in recovering gene expression, identifying cell subtypes, reconstructing cell trajectories, identifying differentially expressed genes, and recovering gene regulatory and correlation relationships. Moreover, G2S3 is computationally efficient for imputation in large-scale single-cell transcriptomic datasets.
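
To make "borrowing information from adjacent genes" concrete, here is a deliberately simplified sketch. It is not the G2S3 algorithm: G2S3 learns a sparse gene graph from the data, whereas this stand-in simply links each gene to its `k` most positively correlated genes and applies one smoothing step. All function names and the `self_weight` parameter are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def knn_gene_graph(expr, k=1):
    """Stand-in graph: each gene keeps its k most positively correlated genes."""
    graph = {}
    for gene, values in expr.items():
        sims = [(pearson(values, other), name)
                for name, other in expr.items() if name != gene]
        sims.sort(reverse=True)
        graph[gene] = [(name, s) for s, name in sims[:k] if s > 0]
    return graph


def impute(expr, graph, self_weight=0.5):
    """One smoothing step: mix each gene with the weighted mean of its neighbours."""
    out = {}
    for gene, values in expr.items():
        neighbours = graph[gene]
        total = sum(s for _, s in neighbours)
        if total == 0:
            out[gene] = list(values)  # no neighbours: leave the gene unchanged
            continue
        out[gene] = [
            self_weight * values[c]
            + (1 - self_weight) * sum(s * expr[n][c] for n, s in neighbours) / total
            for c in range(len(values))
        ]
    return out


# Toy data: g1 has a likely dropout (0) in the second cell; g2 is its close neighbour.
expr = {"g1": [4, 0, 2, 6], "g2": [4, 2, 2, 6], "g3": [1, 5, 3, 0]}
imputed = impute(expr, knn_gene_graph(expr, k=1))
print(imputed["g1"])
```

The design choice to illustrate is that smoothing runs over genes rather than cells: a zero in one gene is pulled toward the values its graph neighbours take in the same cell.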

Post-modified non-negative matrix factorization for deconvoluting the gene expression profiles of specific cell types from heterogeneous clinical samples based on RNA-sequencing data

Journal of Chemometrics ◽

10.1002/cem.2929 ◽

2017 ◽

Vol 32 (11) ◽

pp. e2929 ◽

Cited By ~ 1

Author(s):

Yuan Liu ◽

Yu Liang ◽

Qifan Kuang ◽

Fanfan Xie ◽

Yingyi Hao ◽

...

Keyword(s):

Gene Expression ◽

RNA Sequencing ◽

Matrix Factorization ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Cell Types ◽

Clinical Samples ◽

Specific Cell ◽

Sequencing Data ◽

Non Negative Matrix Factorization

scMatch: a single-cell gene expression profile annotation tool using reference datasets

Bioinformatics ◽

10.1093/bioinformatics/btz292 ◽

2019 ◽

Vol 35 (22) ◽

pp. 4688-4695 ◽

Cited By ~ 22

Author(s):

Rui Hou ◽

Elena Denisenko ◽

Alistair R R Forrest

Keyword(s):

Gene Expression ◽

Single Cell ◽

Large Scale ◽

Expression Profiles ◽

Single Cells ◽

Gene Expression Profiles ◽

Supplementary Information ◽

Annotation Tool ◽

Sequencing Data ◽

Multiple Sources

Abstract. Motivation: Single-cell RNA sequencing (scRNA-seq) measures gene expression at the resolution of individual cells. Massively multiplexed single-cell profiling has enabled large-scale transcriptional analyses of thousands of cells in complex tissues. In most cases, the true identity of individual cells is unknown and needs to be inferred from the transcriptomic data. Existing methods typically cluster (group) cells based on similarities of their gene expression profiles and assign the same identity to all cells within each cluster using the averaged expression levels. However, scRNA-seq experiments typically produce low-coverage sequencing data for each cell, which hinders the clustering process. Results: We introduce scMatch, which directly annotates single cells by identifying their closest match in large reference datasets. We used this strategy to annotate various single-cell datasets and evaluated the impacts of sequencing depth, similarity metric and reference datasets. We found that scMatch can rapidly and robustly annotate single cells with comparable accuracy to another recent cell annotation tool (SingleR), but that it is quicker and can handle larger reference datasets. We demonstrate how scMatch can handle large customized reference gene expression profiles that combine data from multiple sources, thus empowering researchers to identify cell populations in any complex tissue with the desired precision. Availability and implementation: scMatch (Python code) and the FANTOM5 reference dataset are freely available to the research community at https://github.com/forrest-lab/scMatch. Supplementary information: Supplementary data are available at Bioinformatics online.
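
Annotating a cell by its "closest match" in a reference can be illustrated with a small nearest-profile search. The sketch below is not scMatch's code or command-line interface; the function names are hypothetical, and Spearman correlation is used here only as one plausible similarity metric (the abstract notes that the choice of metric was itself evaluated).

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def annotate(cell, references):
    """Return (label, rho) for the reference profile most similar to the cell."""
    scored = ((label, spearman(cell, profile)) for label, profile in references.items())
    return max(scored, key=lambda pair: pair[1])


# Toy reference with two cell types measured over the same five genes.
references = {"T cell": [9, 8, 1, 0, 2], "hepatocyte": [0, 1, 9, 8, 3]}
print(annotate([5, 3, 0, 0, 1], references))
```

A rank-based metric is a natural fit for low-coverage cells, since it depends on the ordering of genes rather than on absolute counts.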

Modulation of the Immune Response by Deferasirox in Myelodysplastic Syndrome Patients

Pharmaceuticals ◽

10.3390/ph14010041 ◽

2021 ◽

Vol 14 (1) ◽

pp. 41

Author(s):

Hana Votavova ◽

Zuzana Urbanova ◽

David Kundrat ◽

Michaela Dostalova Merkerova ◽

Martin Vostry ◽

...

Keyword(s):

Gene Expression ◽

Immune Response ◽

Blood Cell ◽

Myelodysplastic Syndrome ◽

Molecular Mechanisms ◽

Expression Profiles ◽

Adaptive Immune Response ◽

Gene Expression Profiles ◽

Iron Chelator ◽

Multiple Cancer

Deferasirox (DFX) is an oral iron chelator used to reduce iron overload (IO) caused by frequent blood cell transfusions in anemic myelodysplastic syndrome (MDS) patients. To study the molecular mechanisms by which DFX improves outcome in MDS, we analyzed the global gene expression in untreated MDS patients and those who were given DFX treatment. The gene expression profiles of bone marrow CD34+ cells were assessed by whole-genome microarrays. Initially, differentially expressed genes (DEGs) were determined between patients with normal ferritin levels and those with IO to address the effect of excessive iron on cellular pathways. These DEGs were annotated to Gene Ontology terms associated with cell cycle, apoptosis, adaptive immune response and protein folding and were enriched in cancer-related pathways. The deregulation of multiple cancer pathways in iron-overloaded patients suggests that IO is a cofactor favoring the progression of MDS. The DEGs between patients with IO and those treated with DFX were involved predominantly in biological processes related to the immune response and inflammation. These data indicate DFX modulates the immune response mainly via neutrophil-related genes. Suppression of negative regulators of blood cell differentiation essential for cell maturation and upregulation of heme metabolism observed in DFX-treated patients may contribute to the hematopoietic improvement.

Analysis of blood-based gene expression in idiopathic Parkinson disease

Neurology ◽

10.1212/wnl.0000000000004516 ◽

2017 ◽

Vol 89 (16) ◽

pp. 1676-1683 ◽

Cited By ~ 36

Author(s):

Ron Shamir ◽

Christine Klein ◽

David Amar ◽

Eva-Juliane Vollstedt ◽

Michael Bonin ◽

...

Keyword(s):

Gene Expression ◽

Parkinson Disease ◽

Gene Networks ◽

Large Scale ◽

Expression Profiles ◽

Area Under The Curve ◽

Gene Expression Profiles ◽

Gene Signature ◽

Gene Profiles ◽

Independent Test

Objective: To examine whether gene expression analysis of a large-scale Parkinson disease (PD) patient cohort produces a robust blood-based PD gene signature compared to previous studies that have used relatively small cohorts (≤220 samples). Methods: Whole-blood gene expression profiles were collected from a total of 523 individuals. After preprocessing, the data contained 486 gene profiles (n = 205 PD, n = 233 controls, n = 48 other neurodegenerative diseases) that were partitioned into training, validation, and independent test cohorts to identify and validate a gene signature. Batch-effect reduction and cross-validation were performed to ensure signature reliability. Finally, functional and pathway enrichment analyses were applied to the signature to identify PD-associated gene networks. Results: A gene signature of 100 probes that mapped to 87 genes, corresponding to 64 upregulated and 23 downregulated genes differentiating between patients with idiopathic PD and controls, was identified with the training cohort and successfully replicated in both an independent validation cohort (area under the curve [AUC] = 0.79, p = 7.13E–6) and a subsequent independent test cohort (AUC = 0.74, p = 4.2E–4). Network analysis of the signature revealed gene enrichment in pathways, including metabolism, oxidation, and ubiquitination/proteasomal activity, and misregulation of mitochondria-localized genes, including downregulation of COX4I1, ATP5A1, and VDAC3. Conclusions: We present a large-scale study of PD gene expression profiling. This work identifies a reliable blood-based PD signature and highlights the importance of large-scale patient cohorts in developing potential PD biomarkers.
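
For readers unfamiliar with the area under the curve reported above, the following sketch computes it from per-sample signature scores. It uses the standard pairwise (Mann-Whitney) definition: the AUC is the probability that a randomly chosen patient scores higher than a randomly chosen control. The scores are invented toy values and the function name is hypothetical; nothing here reproduces the study's pipeline.

```python
def auc(case_scores, control_scores):
    """Fraction of (case, control) pairs ranked correctly; ties count as half."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Toy signature scores (made up for illustration only).
cases = [0.9, 0.8, 0.4]
controls = [0.7, 0.3, 0.2]
print(auc(cases, controls))
```

An AUC of 0.5 corresponds to chance-level separation and 1.0 to perfect separation, which puts the reported 0.79 and 0.74 in context.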

Discovering Distinct Patterns in Gene Expression Profiles

Journal of Integrative Bioinformatics ◽

10.1515/jib-2008-105 ◽

2008 ◽

Vol 5 (2) ◽

Cited By ~ 1

Author(s):

Li Teng ◽

Laiwan Chan

Keyword(s):

Gene Expression ◽

Large Scale ◽

Expression Profiles ◽

Expression Patterns ◽

Gene Expression Profiles ◽

Clustering Methods ◽

Gene Expressions ◽

Real Gene ◽

Large Scale Dataset ◽

Coexpressed Genes

Summary. Traditional analysis of gene expression profiles uses clustering to find groups of coexpressed genes that have similar expression patterns. However, clustering is time-consuming and can be difficult for very large-scale datasets. We propose the idea of Discovering Distinct Patterns (DDP) in gene expression profiles. Because the patterns shown by gene expression reveal the underlying regulatory mechanisms, it is important to find all the different patterns present in a dataset when there is little prior knowledge; doing so is also a helpful start before further analysis. We propose an algorithm for DDP that iteratively picks out the pairs of gene expression patterns with the largest dissimilarities. This method can also be used as a preprocessing step to initialize centers for clustering methods such as K-means. Experiments on both synthetic and real gene expression datasets show that our method is efficient and very effective at finding distinct patterns that have gene functional significance.
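
The summary gives only the outline of DDP (iteratively pick the pair of patterns with the largest dissimilarity), so the sketch below is one possible reading rather than the authors' algorithm. Two details are assumptions: dissimilarity is taken to be 1 minus the Pearson correlation, and each chosen pair is removed from the pool before the next round. The function names are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def dissimilarity(x, y):
    """Assumed dissimilarity: 0 for identical shapes, 2 for mirror images."""
    return 1.0 - pearson(x, y)


def distinct_patterns(profiles, n_pairs=1):
    """Repeatedly pick the most dissimilar remaining pair of profiles."""
    remaining = dict(profiles)
    picked = []
    for _ in range(n_pairs):
        if len(remaining) < 2:
            break
        a, b = max(
            combinations(sorted(remaining), 2),
            key=lambda pair: dissimilarity(remaining[pair[0]], remaining[pair[1]]),
        )
        picked.extend([a, b])
        del remaining[a], remaining[b]  # assumed: a chosen pair is not reused
    return picked


# Toy profiles over four conditions.
profiles = {"up": [1, 2, 3, 4], "up2": [1, 2, 3, 5],
            "down": [4, 3, 2, 1], "flat": [2, 3, 2, 3]}
print(distinct_patterns(profiles, n_pairs=1))
```

The selected profiles could then seed K-means centers, as the summary suggests. Note that the exhaustive pair search here is quadratic in the number of profiles, so it does not reflect whatever the paper does to stay efficient on large datasets.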

CFTR ΔF508 mutation has minimal effect on the gene expression profile of differentiated human airway epithelia

AJP Lung Cellular and Molecular Physiology ◽

10.1152/ajplung.00065.2005 ◽

2005 ◽

Vol 289 (4) ◽

pp. L545-L553 ◽

Cited By ~ 29

Author(s):

Joseph Zabner ◽

Todd E. Scheetz ◽

Hakeem G. Almabrazi ◽

Thomas L. Casavant ◽

Jian Huang ◽

...

Keyword(s):

Gene Expression ◽

Cystic Fibrosis ◽

Large Scale ◽

Expression Profiles ◽

Expression Patterns ◽

Primary Cultures ◽

Gene Expression Profiles ◽

Filter Method ◽

Tissue Destruction ◽

Airway Epithelia

Cystic fibrosis (CF) is caused by mutations in the cystic fibrosis transmembrane conductance regulator (CFTR), an epithelial chloride channel regulated by phosphorylation. Most of the disease-associated morbidity is the consequence of chronic lung infection with progressive tissue destruction. As an approach to investigate the cellular effects of CFTR mutations, we used large-scale microarray hybridization to contrast the gene expression profiles of well-differentiated primary cultures of human CF and non-CF airway epithelia grown under resting culture conditions. We surveyed the expression profiles for 10 non-CF and 10 ΔF508 homozygote samples. Of the 22,283 genes represented on the Affymetrix U133A GeneChip, we found evidence of significant changes in expression in 24 genes by two-sample t-test (P < 0.00001). A second, three-filter method of comparative analysis found no significant differences between the groups. The levels of CFTR mRNA were comparable in both groups. There were no significant differences in the gene expression patterns between male and female CF specimens. There were 18 genes with significant increases and 6 genes with decreases in CF relative to non-CF samples. Although the function of many of the differentially expressed genes is unknown, one transcript that was elevated in CF, the KCl cotransporter (KCC4), is a candidate for further study. Overall, the results indicate that CFTR dysfunction has little direct impact on airway epithelial gene expression in samples grown under these conditions.
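
The per-gene comparison described above can be sketched as follows. The abstract does not say which variant of the two-sample t-test was used, so the pooled-variance (Student) form here is an assumption, and the function names are hypothetical. The second helper shows why a threshold as strict as P < 0.00001 is reasonable when 22,283 genes are tested: it gives the number of false positives expected by chance alone.

```python
import math
from statistics import mean, variance


def t_statistic(group_a, group_b):
    """Student's two-sample t statistic (pooled variance) and its degrees of freedom."""
    na, nb = len(group_a), len(group_b)
    pooled = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / (na + nb - 2)
    t = (mean(group_a) - mean(group_b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2


def expected_false_positives(n_tests, alpha):
    """Expected number of tests passing the threshold if no gene truly differs."""
    return n_tests * alpha


# Toy expression values for one gene in two groups (invented for illustration).
t, df = t_statistic([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(t, df)
print(expected_false_positives(22283, 0.00001))
```

With 10 samples per group, as in the study, the test would have 18 degrees of freedom. Converting the statistic to a P value requires the t distribution, which is left out to keep the sketch free of third-party dependencies.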

Dynamical consequences of regional heterogeneity in the brain’s transcriptional landscape

10.1101/2020.10.28.359943 ◽

2020 ◽

Cited By ~ 1

Author(s):

Gustavo Deco ◽

Kevin Aquino ◽

Aurina Arnatkevičiūtė ◽

Stuart Oldham ◽

Kristina Sabaroedin ◽

...

Keyword(s):

Gene Expression ◽

Large Scale ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Global Gene Expression ◽

Brain Regions ◽

Biophysical Model ◽

Neuronal Dynamics ◽

Regional Heterogeneity ◽

Magnetic Resonance Imaging (MRI)

Abstract. Brain regions vary in their molecular and cellular composition, but how this heterogeneity shapes neuronal dynamics is unclear. Here, we investigate the dynamical consequences of regional heterogeneity using a biophysical model of whole-brain functional magnetic resonance imaging (MRI) dynamics in humans. We show that models in which transcriptional variations in excitatory and inhibitory receptor (E:I) gene expression constrain regional heterogeneity more accurately reproduce the spatiotemporal structure of empirical functional connectivity estimates than do models constrained by global gene expression profiles and MRI-derived estimates of myeloarchitecture. We further show that regional heterogeneity is essential for yielding both ignition-like dynamics, which are thought to support conscious processing, and a wide variance of regional activity timescales, which supports a broad dynamical range. We thus identify a key role for E:I heterogeneity in generating complex neuronal dynamics and demonstrate the viability of using transcriptional data to constrain models of large-scale brain function.

Sex differences in viral entry protein expression and host transcript responses to SARS-CoV-2

10.21203/rs.3.rs-100914/v2 ◽

2020 ◽

Author(s):

Mengying Sun ◽

Rama Shankar ◽

Meehyun Ko ◽

Christopher Daniel Chang ◽

Shan-Ju Yeh ◽

...

Keyword(s):

Gene Expression ◽

Sex Differences ◽

Viral Entry ◽

Large Scale ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Scale Analysis ◽

Transcriptional Responses ◽

Deconvolution Analysis ◽

Large Scale Analysis

Abstract. Epidemiological studies suggest that men exhibit a higher mortality rate from COVID-19 than women, yet the underlying biology is largely unknown. Here, we seek to delineate sex differences in the gene expression of viral entry proteins ACE2 and TMPRSS2, and host transcriptional responses to SARS-CoV-2 through large-scale analysis of genomic and clinical data. We first compiled 220,000 human gene expression profiles from three databases and completed the meta-information through machine learning and manual annotation. Large-scale analysis of these profiles indicated that male samples show higher expression levels of ACE2 and TMPRSS2 than female samples, especially in the older group (>60 years) and in the kidney. Subsequent analysis of 6,031 COVID-19 patients at Mount Sinai Health System revealed that men have significantly higher creatinine levels, an indicator of impaired kidney function. Further analysis of 782 COVID-19 patient gene expression profiles taken from upper airway and blood suggested that men and women present distinct expression changes. Computational deconvolution analysis of these profiles revealed that male COVID-19 patients have enriched kidney-specific mesangial cells in blood compared to healthy patients. Together, this study suggests that biological differences in the kidney between sexes may contribute to sex disparity in COVID-19.
