Four-Dimensional Sparse Bayesian Tensor Decomposition for Gene Expression Data

2020
Author(s): Christopher C. Gill, Jonathan Marchini

Abstract: Disease etiology may be better understood through the study of gene expression in four-dimensional (4D) experiments that consist of measurements on multiple individuals, genes and tissues, and under multiple conditions or through time. We have developed a sparse Bayesian four-dimensional tensor decomposition method aimed at uncovering latent components or gene networks that could be linked to genetic variation. We fit the model with a variational Bayes algorithm, which provides fast and accurate analysis. In this brief note we illustrate the utility of the method using simulated datasets and show that, when 4D data are available, our method estimates the true structure of the dataset better than a 3D method applied to a single slice of the 4D dataset. We also compare the 4D method with the 3D method applied to a suitable unfolding of the dataset: the two perform similarly in this case, while the 4D method also accurately recovers the additional structure in the data. We provide software that implements the method in R.
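The abstract does not spell out the model. Assuming a CP/PARAFAC-style structure, in which each latent component is an outer product of individual, gene, tissue and condition loadings (an assumption, not stated above), the Python sketch below illustrates what "a suitable unfolding" of a 4D dataset into 3D can mean: merging the tissue and condition modes into one. It only builds a noise-free tensor from made-up loadings. It does not fit the sparse Bayesian model or implement variational Bayes, and all function names are hypothetical.

```python
import numpy as np

def cp_reconstruct(A, B, C, D):
    """Noise-free CP-style 4D tensor: X[i, j, k, l] = sum_r A[i, r] * B[j, r] * C[k, r] * D[l, r]."""
    return np.einsum("ir,jr,kr,lr->ijkl", A, B, C, D)

def unfold_last_two_modes(X):
    """Merge modes 3 and 4: (I, J, K, L) -> (I, J, K * L), with merged index m = k * L + l."""
    I, J, K, L = X.shape
    return X.reshape(I, J, K * L)

def khatri_rao(C, D):
    """Column-wise Kronecker product: row k * L + l holds C[k, r] * D[l, r]."""
    K, R = C.shape
    L = D.shape[0]
    return (C[:, None, :] * D[None, :, :]).reshape(K * L, R)

rng = np.random.default_rng(0)
n_ind, n_genes, n_tissues, n_conds, n_comp = 20, 50, 3, 4, 2   # made-up sizes

A = rng.normal(size=(n_ind, n_comp))       # individual scores
B = rng.normal(size=(n_genes, n_comp))     # gene loadings
B[rng.random(B.shape) < 0.8] = 0.0         # sparse gene loadings (roughly 80% zeros)
C = rng.normal(size=(n_tissues, n_comp))   # tissue loadings
D = rng.normal(size=(n_conds, n_comp))     # condition / time-point loadings

X = cp_reconstruct(A, B, C, D)             # 4D tensor
X3 = unfold_last_two_modes(X)              # 3D unfolding
E = khatri_rao(C, D)                       # loadings of the merged tissue-by-condition mode

print(X.shape, X3.shape)                                      # (20, 50, 3, 4) (20, 50, 12)
print(np.allclose(X3, np.einsum("ir,jr,mr->ijm", A, B, E)))   # True
```

The last check shows what the unfolding hides. In the 3D view, each component's tissue-by-condition loading is one flattened vector of length K·L (a Khatri-Rao column). A 3D method therefore treats it as an unconstrained vector, whereas a 4D model can represent the separate tissue and condition loadings directly.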

2018
Author(s): Satesh Ramdhani, Elisa Navarro, Evan Udine, Brian M. Schilder, Madison Parks, ...

Abstract: Recent human genetic studies suggest that cells of the innate immune system have a primary role in the pathogenesis of neurodegenerative diseases. However, these studies often do not elucidate how the genetic variants affect the biology of these cells to modulate disease risk. Here, we applied a tensor decomposition method to gene expression profiles of stimulated human monocytes and macrophages to uncover disease-associated gene networks linked to distal genetic variation. We report robust evidence that some disease-associated genetic variants affect the expression of multiple genes in trans. These include a Parkinson’s disease locus influencing the expression of genes mediated by a protease that controls lysosomal function, and Alzheimer’s disease loci influencing the expression of genes involved in type 1 interferon signaling, myeloid phagocytosis, and complement cascade pathways. Overall, we uncover gene networks in induced innate immune cells linked to disease-associated genetic variants, which may help elucidate the underlying biology of disease.


2021
Vol 11 (1)
Author(s): Kalifa Manjang, Shailesh Tripathi, Olli Yli-Harja, Matthias Dehmer, Galina Glazko, ...

Abstract: The identification of prognostic biomarkers for predicting cancer progression is an important problem for two reasons. First, such biomarkers find practical application in a clinical context for the treatment of patients. Second, interrogation of the biomarkers themselves is assumed to lead to novel insights into disease mechanisms and the underlying molecular processes that cause the pathological behavior. For breast cancer, many signatures based on gene expression values have been reported to be associated with overall survival. Consequently, such signatures have been used to suggest biological explanations of breast cancer and drug mechanisms. In this paper, we demonstrate for a large number of breast cancer signatures that such an implication is not justified. Our approach systematically eliminates all traces of biological meaning from the signature genes and shows that, among the remaining genes, surrogate gene sets can be formed with indistinguishable prognostic prediction capabilities and opposite biological meaning. Hence, our results demonstrate that none of the studied signatures has a sensible biological interpretation or meaning with respect to disease etiology. Overall, this shows that prognostic signatures are black-box models with sensible predictions of breast cancer outcome but no value for revealing causal connections. Furthermore, we show that the number of such surrogate gene sets is not small but very large.


2021
Vol 7 (1)
Author(s): Tiago Azevedo, Giovanna Maria Dimitri, Pietro Lió, Eric R. Gamazon

Abstract: Here, we performed a comprehensive intra-tissue and inter-tissue multilayer network analysis of the human transcriptome. We generated an atlas of communities in gene co-expression networks in 49 tissues (GTEx v8), evaluated their tissue specificity, and investigated their methodological implications. UMAP embeddings of gene expression from the communities (representing nearly 18% of all genes) robustly identified biologically meaningful clusters. Notably, new gene expression data can be embedded into our algorithmically derived models to accelerate discoveries in high-dimensional molecular datasets and downstream diagnostic or prognostic applications. We demonstrate the generalisability of our approach through systematic testing in external genomic and transcriptomic datasets. Methodologically, prioritisation of the communities in a transcriptome-wide association study of the biomarker C-reactive protein (CRP) in 361,194 individuals in the UK Biobank identified genetically determined expression changes associated with CRP and led to considerably improved performance. Furthermore, a deep learning framework applied to the communities in nearly 11,000 tumors profiled by The Cancer Genome Atlas across 33 different cancer types learned biologically meaningful latent spaces, representing metastasis (p < 2.2 × 10⁻¹⁶) and stemness (p < 2.2 × 10⁻¹⁶). Our study provides a rich genomic resource to catalyse research into inter-tissue regulatory mechanisms and their downstream consequences on human disease.


Neurology
2017
Vol 89 (16)
pp. 1676-1683
Author(s): Ron Shamir, Christine Klein, David Amar, Eva-Juliane Vollstedt, Michael Bonin, ...

Objective: To examine whether gene expression analysis of a large-scale Parkinson disease (PD) patient cohort produces a robust blood-based PD gene signature compared to previous studies that have used relatively small cohorts (≤220 samples). Methods: Whole-blood gene expression profiles were collected from a total of 523 individuals. After preprocessing, the data contained 486 gene profiles (n = 205 PD, n = 233 controls, n = 48 other neurodegenerative diseases) that were partitioned into training, validation, and independent test cohorts to identify and validate a gene signature. Batch-effect reduction and cross-validation were performed to ensure signature reliability. Finally, functional and pathway enrichment analyses were applied to the signature to identify PD-associated gene networks. Results: A gene signature of 100 probes that mapped to 87 genes, corresponding to 64 upregulated and 23 downregulated genes differentiating between patients with idiopathic PD and controls, was identified with the training cohort and successfully replicated in both an independent validation cohort (area under the curve [AUC] = 0.79, p = 7.13E–6) and a subsequent independent test cohort (AUC = 0.74, p = 4.2E–4). Network analysis of the signature revealed gene enrichment in pathways, including metabolism, oxidation, and ubiquitination/proteasomal activity, and misregulation of mitochondria-localized genes, including downregulation of COX4I1, ATP5A1, and VDAC3. Conclusions: We present a large-scale study of PD gene expression profiling. This work identifies a reliable blood-based PD signature and highlights the importance of large-scale patient cohorts in developing potential PD biomarkers.
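The reported AUC values summarize how well the signature separates patients with PD from controls. To make the number concrete, the Python sketch below computes the AUC as the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counting one half. This Mann-Whitney form equals the area under the ROC curve. The scores are invented for illustration, and the abstract does not say how per-sample signature scores were derived in the study.

```python
def auc(case_scores, control_scores):
    """AUC as the Mann-Whitney probability that a case outscores a control (ties count 1/2)."""
    wins = 0.0
    for x in case_scores:
        for y in control_scores:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))

# Hypothetical per-sample signature scores (not data from the study)
pd_scores = [0.9, 0.8, 0.4, 0.7]
control_scores = [0.3, 0.5, 0.8, 0.2]
print(auc(pd_scores, control_scores))   # 0.78125 (12.5 of 16 case-control pairs)
```

An AUC of 0.5 corresponds to a signature that ranks cases and controls no better than chance, and 1.0 to perfect separation.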


2021
Vol 16
Author(s): Min Yao, Caiyun Jiang, Chenglong Li, Yongxia Li, Shan Jiang, ...

Background: Mammalian genes are regulated at the transcriptional and post-transcriptional levels. These mechanisms may involve the direct promotion or inhibition of transcription via a regulator, or post-transcriptional regulation through factors such as micro (mi)RNAs. Objective: This study aimed to construct gene regulation relationships modulated by causality inference-based miRNA-(transition factor)-(target gene) networks and to analyze gene expression data to identify gene expression regulators. Methods: Mouse gene expression regulation relationships were manually curated from the literature using a text-mining method and then used to generate miRNA-(transition factor)-(target gene) networks. An algorithm was then introduced to identify gene expression regulators from transcriptome profiling data by applying enrichment analysis to these networks. Results: A total of 22,271 mouse gene expression regulation relationships were curated for 4,018 genes and 242 miRNAs. GEREA software was developed to perform the integrated analyses. We applied the algorithm to transcriptome data for synthetic miR-155 oligo-treated mouse CD4+ T-cells and confirmed that miR-155 is an important network regulator. The software was also tested on publicly available transcriptional profiling data for Salmonella infection, resulting in the identification of miR-125b as an important regulator. Conclusion: The causality inference-based miRNA-(transition factor)-(target gene) networks serve as a novel resource for gene expression regulation research, and GEREA is an effective and useful adjunct to the currently available methods. The regulatory networks and the algorithm implemented in the GEREA software package are available under a free academic license at http://www.thua45.cn/gerea.
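The abstract says regulators are identified "by applying enrichment analysis to these networks" but does not name the statistic. One common choice, assumed here and not confirmed for GEREA, is a one-sided hypergeometric test of whether a regulator's network targets are over-represented among differentially expressed genes. The Python sketch below uses only the standard library. The function name and all counts are hypothetical.

```python
from math import comb

def hypergeom_enrichment_p(n_universe, n_targets, n_de, n_overlap):
    """One-sided hypergeometric p-value P(X >= n_overlap).

    n_universe: background genes that could be observed
    n_targets:  background genes that are targets of the regulator in the network
    n_de:       differentially expressed (DE) genes in the profiling experiment
    n_overlap:  DE genes that are also targets of the regulator
    """
    total = comb(n_universe, n_de)
    upper = min(n_targets, n_de)
    hits = sum(
        comb(n_targets, k) * comb(n_universe - n_targets, n_de - k)
        for k in range(n_overlap, upper + 1)
    )
    return hits / total

# Hypothetical toy numbers: 20 background genes, 5 targets, 4 DE genes, 3 of them targets.
p = hypergeom_enrichment_p(20, 5, 4, 3)
print(round(p, 4))   # 0.032
```

With SciPy installed, `scipy.stats.hypergeom.sf(n_overlap - 1, n_universe, n_targets, n_de)` should give the same value. When many regulators are tested, the resulting p-values would normally also need a multiple-testing correction.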


2018
Author(s): Αλέξανδρος Τσακογιάννης

The differences between the sexes and the concept of sex determination have long fascinated, yet troubled, philosophers and scientists. Among sexually reproducing animals, teleost fishes show a very wide repertoire of reproductive modes. Besides gonochoristic species, in which each individual is either male or female throughout its life, fish include the only vertebrates in which hermaphroditism occurs naturally. Hermaphroditism is the capability of an organism to reproduce as both male and female during its life cycle, and it takes various forms. In sequential hermaphroditism, an individual first functions as a female and later changes sex to become male (protogyny), or vice versa (protandry). The diverse sex phenotypes of fish are regulated by a variety of sex determination mechanisms along a continuum of environmental and heritable factors. The vast majority of sexually dimorphic traits result from the differential expression of genes that are present in both sexes. To date, studies of sex-specific differences in gene expression have been conducted mainly in model fish species whose sex determination systems are well characterized at the genomic level, with distinguishable heteromorphic sex chromosomes, genetic sex determination and gonochorism. Among teleosts, the Sparidae family is considered one of the most diversified families in terms of reproductive systems, and is therefore a unique model for comparative studies of the molecular mechanisms underlying different sexual motifs. In this study, using RNA sequencing, we studied the transcriptomes of the gonads and brains of both sexes in five sparid species representing four different reproductive styles. 
Specifically, we explored the sex-specific expression patterns of a gonochoristic species (the common dentex, Dentex dentex), two protogynous hermaphrodites (the red porgy, Pagrus pagrus, and the common pandora, Pagellus erythrinus), the rudimentary hermaphrodite sharpsnout seabream (Diplodus puntazzo) and the protandrous gilthead seabream (Sparus aurata). In the brain we found only minor sex-related expression differences, indicating a more homogeneous and sexually plastic organ, whereas the gonads showed a plethora of sex-biased gene expression. The functional divergence of the two gonadal types is reflected in their transcriptomic profiles, both in the number of differentially expressed genes and in the expression magnitude (i.e., fold-change differences). Males showed almost twice as many up-regulated genes as females, indicating a male-biased expression tendency. Focusing on the pathways and genes implicated in sex determination/differentiation, we aimed to unveil the molecular pathways through which these non-model fish species develop a masculine or a feminine character. We identified the implicated pathways and major gene families (e.g., the Wnt/β-catenin and retinoic acid signaling pathways, Notch, TGFβ) behind sex-biased expression, and the recruitment of known sex-related genes (e.g., Dmrt1, Sox9, Sox3, Cyp19a, Figla, Ctnnb1, Gsdf9, Stra6) to either the male or the female type of gonad in these fish. We also carefully investigated the presence of genes reported to be involved in sex determination/differentiation mechanisms in other vertebrates and fish, and compared their expression patterns in the species under study. Expression profiling revealed known candidate genes establishing the common female (Cyp19a1, Sox3, Figla, Gdf9, Cyp26a, Ctnnb1, Dnmt1, Stra6) and male (Dmrt1, Sox9, Dnmt3aa, Rarb, Raraa, Hdac8, Tdrd7) identity of the gonad in these sparids. 
Additionally, we focused on genes contributing in a species-specific manner to either female (e.g., Wnt4a, Dmrt2a, Foxl2) or male (e.g., Amh, Dmrt3a, Cyp11b) characters, and discussed the expression patterns, in our species' gonadal transcriptomes, of factors that belong to important pathways and/or gene families in the sex determination (SD) context. Taken together, most of the studied genes form part of the cascade of sex determination, differentiation and reproduction across teleosts. In this study, we focused on genes that are active when sex is established (sex maintainers), revealing the basic “gene toolkit” and gene networks underlying functional sex in these five sparids. Comparing related species with alternative reproductive styles, we saw different combinations of genes with conserved sex-linked roles and some “handy” molecular players, in a “partially conserved” or “modulated” network that shapes the male and female phenotypes. The knowledge obtained in this study and the tools developed during the process lay the groundwork for future experiments that could improve sex control in these species and deepen our understanding of the complex process of sex differentiation in more flexible, multi-component systems such as those studied here.


2020
Author(s): Christopher W. Whelan, Robert E. Handsaker, Giulio Genovese, Seva Kashin, Monkol Lek, ...

Abstract: Two intriguing forms of genome structural variation (SV), namely dispersed duplications and de novo rearrangements of complex, multi-allelic loci, have long escaped genomic analysis. We describe a new way to find and characterize such variation by using identity-by-descent (IBD) relationships between siblings together with high-precision measurements of segmental copy number. Analyzing whole-genome sequence data from 706 families, we find hundreds of “IBD-discordant” (IBDD) copy number variants (CNVs): loci at which siblings’ CNV measurements and IBD states are mathematically inconsistent. We found that commonly IBDD CNVs identify dispersed duplications; we mapped 95 of these common dispersed duplications to their true genomic locations through family-based linkage and population linkage disequilibrium (LD), and found several to be in strong LD with genome-wide association study (GWAS) signals for common diseases or gene expression variation at their revealed genomic locations. Other CNVs that were IBDD in a single family appear to involve de novo mutations in complex and multi-allelic loci; we identified 26 de novo structural mutations that had not been detected in earlier analyses of the same families by diverse SV analysis methods. These included a de novo mutation of the amylase gene locus and multiple de novo mutations at chromosome 15q14. Combining these complex mutations with more conventional CNVs, we estimate that segmental mutations larger than 1 kb arise in about one in 22 human meioses. These methods complement previous techniques in that they interrogate genomic regions that are home to segmental duplications, high CNV allele frequencies, and multi-allelic CNVs.
Author Summary: Copy number variation is an important form of genetic variation in which individuals differ in the number of copies of segments of their genomes. Certain aspects of copy number variation have traditionally been difficult to study using short-read sequencing data. 
For example, standard analyses often cannot tell whether the duplicated copies of a segment are located near the original copy or are dispersed to other regions of the genome. Another aspect of copy number variation that has been difficult to study is the detection of mutations in the copy number of DNA segments passed down from parents to their children, particularly when the mutations affect genome segments which already display common copy number variation in the population. We develop an analytical approach to solving these problems when sequencing data is available for all members of families with at least two children. This method is based on determining the number of parental haplotypes the two siblings share at each location in their genome, and using that information to determine the possible inheritance patterns that might explain the copy numbers we observe in each family member. We show that dispersed duplications and mutations can be identified by looking for copy number variants that do not follow these expected inheritance patterns. We use this approach to determine the location of 95 common duplications which are dispersed to distant regions of the genome, and demonstrate that these duplications are linked to genetic variants that affect disease risk or gene expression levels. We also identify a set of copy number mutations not detected by previous analyses of sequencing data from a large cohort of families, and show that repetitive and complex regions of the genome undergo frequent mutations in copy number.
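The inheritance logic described above can be illustrated with a deliberately simplified Python sketch. It assumes exact integer copy numbers for a father, a mother and two siblings at one locus, no new mutations, and a known sibling IBD state (0, 1 or 2 shared parental haplotypes). It then asks whether any split of each parent's copy number into two haplotypes, combined with some transmission pattern that matches the IBD state, reproduces both siblings' copy numbers. In this toy model, a locus with no such explanation would be IBD-discordant. The real method works with high-precision, non-integer copy-number measurements across many families, and the function names here are hypothetical.

```python
from itertools import product

def haplotype_splits(total):
    """All ordered ways to split a diploid copy number into two haplotype copy numbers."""
    return [(a, total - a) for a in range(total + 1)]

def ibd_consistent(father, mother, sib1, sib2, ibd):
    """Toy check: can the siblings' copy numbers be explained given their IBD state?

    ibd is the number of parental haplotypes the siblings share (0, 1 or 2).
    Copy numbers are assumed to be exact integers with no new mutations.
    """
    for f_hap, m_hap in product(haplotype_splits(father), haplotype_splits(mother)):
        # pf*/pm* = index (0 or 1) of the paternal/maternal haplotype each sibling inherits
        for pf1, pm1, pf2, pm2 in product((0, 1), repeat=4):
            shared = int(pf1 == pf2) + int(pm1 == pm2)
            if shared != ibd:
                continue
            if f_hap[pf1] + m_hap[pm1] == sib1 and f_hap[pf2] + m_hap[pm2] == sib2:
                return True
    return False

print(ibd_consistent(2, 2, 2, 3, ibd=2))   # False: IBD2 siblings must have equal copy numbers
print(ibd_consistent(3, 2, 2, 3, ibd=0))   # True: e.g. father haplotypes (1, 2), mother (1, 1)
print(any(ibd_consistent(2, 2, 5, 2, ibd=s) for s in (0, 1, 2)))   # False
```

In the last call, no transmission pattern can give a child five copies when each parent carries two, so the discordance holds whatever the IBD state. This is the kind of pattern that, in this simplified setting, would point to a de novo change rather than to a misplaced duplication.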

