Telescoping bimodal latent Dirichlet allocation to identify expression QTLs across tissues

2021 ◽  
Author(s):  
Ariel DH Gewirtz ◽  
F William Townes ◽  
Barbara E Engelhardt

Expression quantitative trait loci (eQTLs), or single nucleotide polymorphisms (SNPs) that affect average gene expression levels, provide important insights into context-specific gene regulation. Classic eQTL analyses use one-to-one association tests, which test gene-variant pairs individually and ignore correlations induced by gene regulatory networks and linkage disequilibrium. Probabilistic topic models, such as latent Dirichlet allocation, estimate latent topics for a collection of count observations. Prior multi-modal frameworks that bridge genotype and expression data assume matched sample numbers between modalities. However, many data sets have a nested structure where one individual has several associated gene expression samples and a single germline genotype vector. Here, we build a telescoping bimodal latent Dirichlet allocation (TBLDA) framework to learn shared topics across gene expression and genotype data that allows multiple RNA-sequencing samples to correspond to a single individual's genotype. By using raw count data, our model avoids possible adulteration via normalization procedures. Ancestral structure is captured in a genotype-specific latent space, effectively removing it from shared components. Using GTEx v8 expression data across ten tissues and genotype data, we show that the estimated topics capture meaningful and robust biological signal in both modalities, and identify associations within and across tissue types. We identify 53,358 cis-eQTLs and 1,173 trans-eQTLs by conducting eQTL mapping between the most informative features in each topic. Our TBLDA model is able to identify associations using raw sequencing count data when the samples in two separate data modalities are matched one-to-many, as is often the case in biological data. All software is available at: https://github.com/gewirtz/TBLDA
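The one-to-many ("nested") data layout described above can be illustrated with a small bookkeeping sketch. It shows only how several RNA-sequencing samples can point at one individual's genotype vector; it is not the TBLDA model, and every name and value in it (`genotypes`, `samples`, the tissue labels, the counts) is a hypothetical toy assumption.

```python
from collections import defaultdict

# Hypothetical toy data: one genotype vector per individual
# (minor-allele counts 0/1/2 at three SNPs).
genotypes = {
    "ind1": [0, 1, 2],
    "ind2": [2, 0, 1],
}

# Hypothetical raw expression counts: several samples may share an individual.
samples = [
    {"sample": "s1", "individual": "ind1", "tissue": "lung", "counts": [10, 0, 3]},
    {"sample": "s2", "individual": "ind1", "tissue": "liver", "counts": [2, 7, 1]},
    {"sample": "s3", "individual": "ind2", "tissue": "lung", "counts": [8, 1, 4]},
]


def group_samples_by_individual(samples):
    """Map each individual ID to the indices of its expression samples."""
    grouped = defaultdict(list)
    for idx, sample in enumerate(samples):
        grouped[sample["individual"]].append(idx)
    return dict(grouped)


def genotype_for_sample(sample, genotypes):
    """Resolve a sample to the single genotype vector of its individual."""
    return genotypes[sample["individual"]]


groups = group_samples_by_individual(samples)
```

In this layout the genotype matrix keeps one row per individual rather than being duplicated per sample, which is the structural property the abstract refers to as samples being "matched one-to-many".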

Mathematics ◽  
2021 ◽  
Vol 9 (7) ◽  
pp. 772
Author(s):  
Seonghun Kim ◽  
Seockhun Bae ◽  
Yinhua Piao ◽  
Kyuri Jo

Genomic profiles of cancer patients, such as gene expression, have become a major source of information for predicting responses to drugs in the era of personalized medicine. As large-scale drug screening data with cancer cell lines are available, a number of computational methods have been developed for drug response prediction. However, few methods incorporate both gene expression data and the biological network, which can harbor essential information about the underlying process of the drug response. We proposed an analysis framework called DrugGCN for prediction of Drug response using a Graph Convolutional Network (GCN). DrugGCN first generates a gene graph by combining a Protein-Protein Interaction (PPI) network and gene expression data with feature selection of drug-related genes, and the GCN model then detects local features, such as subnetworks of genes that contribute to the drug response, by localized filtering. We demonstrated the effectiveness of DrugGCN on biological data, where it achieved high prediction accuracy relative to competing methods.
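To make "localized filtering" on a gene graph concrete, here is a minimal sketch of one generic graph-convolution step. It uses the widely known normalized-adjacency propagation rule, ReLU(D^-1/2 (A + I) D^-1/2 H W), as a stand-in; the abstract does not specify which filter DrugGCN uses, so this should not be read as the authors' implementation. The three-gene path graph, the feature values and the weight matrix are toy assumptions.

```python
from math import sqrt


def normalized_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2 for a square adjacency matrix (list of lists)."""
    n = len(adj)
    # Add self-loops so each gene keeps its own signal.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain dense matrix product for small list-of-lists matrices."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def gcn_layer(adj, features, weights):
    """One graph-convolution step: neighbor averaging, linear map, then ReLU."""
    propagated = matmul(normalized_adjacency(adj), features)
    transformed = matmul(propagated, weights)
    return [[max(0.0, value) for value in row] for row in transformed]


# Toy gene graph (assumption): a path gene0 - gene1 - gene2, e.g. from PPI edges.
adj = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
features = [[1.0], [0.0], [0.0]]  # one expression feature per gene; only gene0 is "on"
weights = [[1.0]]                 # identity weight, so only the propagation is visible

out = gcn_layer(adj, features, weights)
```

After a single layer, the signal from gene0 reaches its direct neighbor gene1 but not gene2, which is the sense in which such a filter is local: each layer mixes information only across one hop of the graph.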


Circulation ◽  
2015 ◽  
Vol 132 (suppl_3) ◽  
Author(s):  
Martin I Sigurdsson ◽  
Mahyar Heydarpour ◽  
Louis Saddic ◽  
Tzuu-Wang Chang ◽  
Stanton K Shernan ◽  
...  

Introduction: The majority of information on the genetic background of atrial fibrillation (AF) results from genomic DNA variant analysis without consideration of tissue expression. Hypothesis: Analysis of tissue-specific gene expression in the left atrium (LA) can further the understanding of the molecular mechanism of identified AF risk variants, and identify novel genes and gene variants associated with AF. Methods: We isolated mRNA from samples of the LA free wall taken during mitral valve surgery in 62 Caucasian individuals. Gene expression in the LA was compared between patients who did and did not have post-operative AF (poAF) using high-throughput RNA expression. Using genotypes of 1.4 million single nucleotide polymorphisms (SNPs), we performed cis expression quantitative trait loci (eQTL) analysis, correlating gene expression of each gene with the genotypes of adjacent (<1 Mbp) SNPs. Results: We identified 23 differentially expressed genes in the LA of patients with poAF, including three potassium channel genes (KCNA7, KCNH8 and KCNK17). The largest expression difference was in LOC645323, a long non-coding RNA. The expression of PITX2, ZFHX3 and KCNN3, previously shown to be associated with AF, did not differ between patients with and without poAF. We identified 12,476 cis eQTL relationships in the LA, several of which included genetic regions and genes previously associated with AF. We confirmed an eQTL relationship between rs3744029 genotype and the expression of MYOZ1. Furthermore, we describe a novel eQTL relationship between rs6795970 genotype and the expression of the SCN10A gene. Conclusions: We have analysed the human LA expression via high-throughput RNA sequencing, and identified novel genes and gene variants likely involved in the molecular pathophysiology of AF.
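The cis-eQTL step in the Methods (correlating each gene's expression with genotypes of SNPs within 1 Mbp) can be sketched as follows. This is a simplified illustration under stated assumptions, not the study's pipeline: each gene is reduced to a single reference coordinate, association is measured with a plain Pearson correlation, and no covariates or multiple-testing correction are applied. All gene and SNP identifiers, positions and values are invented toy data.

```python
from math import sqrt

CIS_WINDOW_BP = 1_000_000  # the "<1 Mbp" window mentioned in the abstract


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation undefined; treated here as no association
    return cov / sqrt(var_x * var_y)


def cis_pairs(genes, snps, window=CIS_WINDOW_BP):
    """Yield (gene_id, snp_id) pairs on the same chromosome within `window` bp."""
    for gene_id, (gene_chrom, gene_pos) in genes.items():
        for snp_id, (snp_chrom, snp_pos) in snps.items():
            if gene_chrom == snp_chrom and abs(gene_pos - snp_pos) < window:
                yield gene_id, snp_id


def cis_eqtl_scan(genes, snps, expression, genotypes):
    """Correlate expression with genotype for every cis gene-SNP pair."""
    return {
        (gene_id, snp_id): pearson(genotypes[snp_id], expression[gene_id])
        for gene_id, snp_id in cis_pairs(genes, snps)
    }


# Toy inputs (assumptions): positions are a single reference coordinate per gene.
genes = {"GENE_A": ("chr1", 5_000_000), "GENE_B": ("chr2", 1_000_000)}
snps = {
    "snp1": ("chr1", 5_400_000),  # 0.4 Mbp from GENE_A: inside the window
    "snp2": ("chr1", 7_000_000),  # 2 Mbp from GENE_A: outside the window
    "snp3": ("chr2", 1_200_000),  # 0.2 Mbp from GENE_B: inside the window
}
genotypes = {  # minor-allele counts for five individuals
    "snp1": [0, 1, 2, 1, 0],
    "snp2": [1, 1, 0, 2, 0],
    "snp3": [2, 2, 0, 1, 0],
}
expression = {  # one expression value per individual
    "GENE_A": [1.0, 2.0, 3.0, 2.0, 1.0],
    "GENE_B": [5.0, 4.0, 4.5, 5.5, 4.0],
}

results = cis_eqtl_scan(genes, snps, expression, genotypes)
```

Restricting the scan to cis pairs keeps the number of tests far below the full gene-by-SNP product, which matters when the genotype panel contains on the order of a million SNPs.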


2021 ◽  
Author(s):  
Giulia Zancolli ◽  
Maarten Reijnders ◽  
Robert Waterhouse ◽  
Marc Robinson-Rechavi

Animals have repeatedly evolved specialized organs and anatomical structures to produce and deliver a cocktail of potent bioactive molecules to subdue prey or predators: venom. This makes it one of the most widespread convergent functions in the animal kingdom. Whether animals have adopted the same genetic toolkit to evolve venom systems is a fascinating question that still eludes us. Here, we performed the first comparative analysis of venom gland transcriptomes from 20 venomous species spanning the main Metazoan lineages, to test whether different animals have independently adopted similar molecular mechanisms to perform the same function. We found a strong convergence in gene expression profiles, with venom glands being more similar to each other than to any other tissue from the same species, and their differences closely mirroring the species phylogeny. Although venom glands secrete some of the fastest evolving molecules (toxins), their gene expression does not evolve faster than evolutionarily older tissues. We found 15 venom gland-specific gene modules enriched in endoplasmic reticulum stress and unfolded protein response pathways, indicating that animals have independently adopted stress response mechanisms to cope with mass production of toxins. This, in turn, activates regulatory networks for epithelial development, cell turnover and maintenance, which seem to be composed of both convergent and lineage-specific factors, possibly reflecting the different developmental origins of venom glands. This study represents the first step towards an understanding of the molecular mechanisms underlying the repeated evolution of one of the most successful adaptive traits in the animal kingdom.


Biotechnology ◽  
2019 ◽  
pp. 265-304
Author(s):  
David Correa Martins Jr. ◽  
Fabricio Martins Lopes ◽  
Shubhra Sankar Ray

The inference of Gene Regulatory Networks (GRNs) is a very challenging problem which has attracted increasing attention since the development of high-throughput sequencing and gene expression measurement technologies. Many models and algorithms have been developed to identify GRNs, mainly using gene expression profiles as the data source. As gene expression data usually have a limited number of samples and inherent noise, the integration of gene expression with several other sources of information can be vital for accurately inferring GRNs. For instance, some prior information about the overall topological structure of the GRN can guide inference techniques toward better results. In addition to gene expression data, biological information from heterogeneous data sources has recently been integrated by GRN inference methods as well. The objective of this chapter is to present an overview of GRN inference models and techniques, with a focus on the incorporation of prior information, such as global and local topological features, and the integration of several heterogeneous data sources.


2020 ◽  
pp. 1052-1075 ◽  
Author(s):  
Dina Elsayad ◽  
A. Ali ◽  
Howida A. Shedeed ◽  
Mohamed F. Tolba

Gene expression analysis is an important research area in bioinformatics. Gene expression data analysis aims to understand gene interactions, gene functionality and the effects of gene mutations. Gene regulatory network analysis is one of the tasks of gene expression data analysis; it studies the topological organization of gene interactions. The regulatory network is critical for understanding pathological phenotypes and normal cell physiology. Many studies focus on gene regulatory network analysis, but some algorithms are affected by data size: because their runtime is proportional to the data size, parallel algorithms have been presented to improve runtime and efficiency. This work presents background, mathematical models and comparisons of different gene regulatory network analysis techniques. In addition, this work proposes a Parallel Architecture for Gene Regulatory Network (PAGeneRN).

