A data mining paradigm for identifying key factors in biological processes using gene expression data

2018 ◽  
Author(s):  
Jin Li ◽  
Le Zheng ◽  
Akihiko Uchiyama ◽  
Lianghua Bin ◽  
Theodora M. Mauro ◽  
...  

Abstract: A large volume of biological data is being generated to study the mechanisms of various biological processes. These data enable large-scale computational analyses that yield biological insights. However, mining the data efficiently for knowledge discovery remains a challenge. The heterogeneity of these data makes it difficult to integrate them consistently, slowing down the process of biological discovery. We introduce a data processing paradigm to identify key factors in biological processes via systematic collection of gene expression datasets, primary analysis of data, and evaluation of consistent signals. To demonstrate its effectiveness, we applied our paradigm to epidermal development and identified many genes that play a potential role in this process. Besides the known epidermal development genes, a substantial proportion of the identified genes are not yet supported by gain- or loss-of-function studies, yielding many novel candidate genes for future studies. Among them, we selected a top gene for loss-of-function experimental validation and confirmed its function in epidermal differentiation, demonstrating the ability of this paradigm to identify new factors in biological processes. In addition, this paradigm revealed many key genes in cold-induced thermogenesis using data from cold-challenged tissues, demonstrating its generalizability. This paradigm can lead to fruitful results for studying molecular mechanisms in an era of explosive accumulation of publicly available biological data.
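The abstract does not spell out how "consistent signals" are evaluated. As a minimal sketch of one common option, the Python below uses simple vote counting: a gene is kept if it changes in the same direction in a chosen fraction of datasets. The function name `consistent_genes`, the input layout (one dict of log2 fold changes per dataset), and the 0.75 cutoff are all assumptions made for illustration, not the authors' method.

```python
from collections import Counter


def consistent_genes(datasets, min_fraction=0.75):
    """Return genes that change in one direction in enough datasets.

    datasets: list of dicts mapping gene symbol -> log2 fold change
              (hypothetical input layout, one dict per dataset).
    min_fraction: share of all datasets that must agree on the direction.
    """
    up, down = Counter(), Counter()
    for fold_changes in datasets:
        for gene, value in fold_changes.items():
            if value > 0:
                up[gene] += 1
            elif value < 0:
                down[gene] += 1

    needed = min_fraction * len(datasets)
    result = {}
    for gene in set(up) | set(down):
        if up[gene] >= needed:
            result[gene] = "up"
        elif down[gene] >= needed:
            result[gene] = "down"
    return result


# Placeholder gene names; a gene missing from a dataset casts no vote.
example = [
    {"GENE_A": 1.2, "GENE_B": -0.5, "GENE_C": 0.3},
    {"GENE_A": 0.8, "GENE_B": -1.0, "GENE_C": -0.2},
    {"GENE_A": 2.0, "GENE_B": -0.3},
    {"GENE_A": 0.5, "GENE_B": 0.1, "GENE_C": 0.4},
]
print(consistent_genes(example))
```

Vote counting ignores effect sizes and sample sizes, so a real analysis would more likely combine p-values or effect estimates; it is used here only because it makes the idea of a consistent signal concrete.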

2008 ◽  
Vol 198 (3) ◽  
pp. 489-497 ◽  
Author(s):  
Noriko Sakai ◽  
Hiromi Terami ◽  
Shinobu Suzuki ◽  
Megumi Haga ◽  
Ken Nomoto ◽  
...  

Nuclear receptor subfamily 5, group A, member 1 (NR5A1, previously known as SF-1/AD4BP) is a transcription factor involved in the development of adrenal/gonadal tissues and in steroidogenic lineage cell differentiation in adult somatic stem cells. To understand the cellular signaling network that regulates NR5A1 gene expression, we carried out loss-of-function screening with an siRNA kinome library and gain-of-function screening with an addressable full-length cDNA library representing one quarter of the human genome. NR5A1 gene expression was activated in mesenchymal stem cells by siRNA directed against protein kinase C (PKC)-δ, erb-B3, RhoGAP (ARHGAP26), and hexokinase 2, none of which were previously known to be involved in NR5A1 gene expression. Among these, we identified crosstalk between the erb-B3 and PKC-δ signaling cascades. In addition, the gain-of-function studies indicated that sex-determining region Y (SRY)-box 15 (SOX15), TEA domain family member 4, KIAA1257 (a gene of unknown function), ADAM metallopeptidase with thrombospondin type 1 motif 6, Josephin domain containing 1, centromere protein, TATA box-binding protein-associated factor 5-like RNA polymerase, and inducible T-cell co-stimulator activate NR5A1 gene expression. These results provide new insights into the molecular mechanisms of NR5A1 gene expression.


Mathematics ◽  
2021 ◽  
Vol 9 (7) ◽  
pp. 772
Author(s):  
Seonghun Kim ◽  
Seockhun Bae ◽  
Yinhua Piao ◽  
Kyuri Jo

Genomic profiles of cancer patients such as gene expression have become a major source to predict responses to drugs in the era of personalized medicine. As large-scale drug screening data with cancer cell lines are available, a number of computational methods have been developed for drug response prediction. However, few methods incorporate both gene expression data and the biological network, which can harbor essential information about the underlying process of the drug response. We proposed an analysis framework called DrugGCN for prediction of Drug response using a Graph Convolutional Network (GCN). DrugGCN first generates a gene graph by combining a Protein-Protein Interaction (PPI) network and gene expression data with feature selection of drug-related genes, and the GCN model detects the local features such as subnetworks of genes that contribute to the drug response by localized filtering. We demonstrated the effectiveness of DrugGCN using biological data showing its high prediction accuracy among the competing methods.
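To make the idea of localized filtering on a gene graph concrete, here is a minimal sketch of a single graph convolution step in plain Python. It uses the simple first-order propagation rule ReLU(D^{-1/2}(A + I)D^{-1/2} X W), which is an assumption chosen for brevity; the published DrugGCN model may use a different localized filter. The names `normalize_adjacency`, `matmul`, and `gcn_layer` are illustrative only.

```python
import math


def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a square adjacency matrix."""
    n = len(adj)
    # Add self-loops so each gene keeps its own signal.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)]
            for row in x]


def gcn_layer(adj, features, weights):
    """One propagation step: ReLU(A_norm @ features @ weights)."""
    propagated = matmul(matmul(normalize_adjacency(adj), features), weights)
    return [[max(0.0, value) for value in row] for row in propagated]


# Three genes on a path graph (0 - 1 - 2), one expression feature each.
adjacency = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
expression = [[1.0], [2.0], [3.0]]
weights = [[1.0]]  # hypothetical 1x1 weight matrix
print(gcn_layer(adjacency, expression, weights))
```

Each output row mixes a gene's own value with those of its direct neighbours only, which is the sense in which the filter is local; stacking layers widens the neighbourhood.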


2019 ◽  
Author(s):  
Alessandro Greco ◽  
Jon Sanchez Valle ◽  
Vera Pancaldi ◽  
Anaïs Baudot ◽  
Emmanuel Barillot ◽  
...  

Abstract: Matrix Factorization (MF) is an established paradigm for large-scale biological data analysis with tremendous potential in computational biology. Here we challenge MF to depict the molecular bases of epidemiologically described Disease-Disease (DD) relationships. As a use case, we focus on the inverse comorbidity association between Alzheimer’s disease (AD) and lung cancer (LC), described as a lower-than-expected probability of developing LC in AD patients. To date, the molecular mechanisms underlying DD relationships remain poorly explained, and their better characterization might offer unprecedented clinical opportunities. To this end, we extend our previously designed MF-based framework for the molecular characterization of DD relationships. Considering AD-LC inverse comorbidity as a case study, we highlight multiple molecular mechanisms, including the previously identified immune system and mitochondrial metabolism. We then discriminate mechanisms specific to LC from those shared with other cancers through a pancancer analysis. Additionally, new candidate molecular players, such as Estrogen Receptor (ER), CDH1 and HDAC, are pinpointed as factors that might underlie the inverse relationship, opening the way to new investigations. Finally, some lung cancer subtype-specific factors are also detected, suggesting the existence of heterogeneity across patients in the context of inverse comorbidity as well.


2013 ◽  
Vol 2013 ◽  
pp. 1-8 ◽  
Author(s):  
Ettore Tiraboschi ◽  
Ramon Guirado ◽  
Dario Greco ◽  
Petri Auvinen ◽  
Jose Fernando Maya-Vetencourt ◽  
...  

The nervous system is highly sensitive to experience during early postnatal life, but this phase of heightened plasticity decreases with age. Recent studies have demonstrated that developmental-like plasticity can be reactivated in the visual cortex of adult animals through environmental or pharmacological manipulations. These findings provide a unique opportunity to study the cellular and molecular mechanisms of adult plasticity. Here we used the monocular deprivation paradigm to investigate large-scale gene expression patterns underlying the reinstatement of plasticity produced by fluoxetine in the adult rat visual cortex. We found changes, confirmed with RT-PCRs, in gene expression in different biological themes, such as chromatin structure remodelling, transcription factors, molecules involved in synaptic plasticity, extracellular matrix, and excitatory and inhibitory neurotransmission. Our findings reveal a key role for several molecules such as the metalloproteases Mmp2 and Mmp9 or the glycoprotein Reelin and open up new insights into the mechanisms underlying the reopening of the critical periods in the adult brain.


2008 ◽  
Vol 68 (2) ◽  
pp. 447-452 ◽  
Author(s):  
CA. Sommer ◽  
F. Henrique-Silva

Even though the molecular mechanisms underlying the Down syndrome (DS) phenotypes remain obscure, the characterization of the genes and conserved non-genic sequences of HSA21 together with large-scale gene expression studies in DS tissues are enhancing our understanding of this complex disorder. Also, mouse models of DS provide invaluable tools to correlate genes or chromosome segments to specific phenotypes. Here we discuss the possible contribution of HSA21 genes to DS and data from global gene expression studies of trisomic samples.


2020 ◽  
Vol 14 ◽  
pp. 117793222090616
Author(s):  
Badreddine Nouadi ◽  
Yousra Sbaoui ◽  
Mariame El Messal ◽  
Faiza Bennis ◽  
Fatima Chegdani

The integration of biological data is currently a major challenge for bioinformatics. Many studies have examined gene expression in the intestinal epithelial tissue of breastfed infants born at term, generating a large amount of data. The integration of these data is important for understanding the biological processes involved in bacterial colonization of the newborn's intestine, particularly through breast milk. This work aims to use bioinformatics approaches to provide a new representation and interpretation of the interactions between differentially expressed genes in the host intestine induced by the microbiota.


2007 ◽  
Vol 17 (07) ◽  
pp. 2477-2483 ◽  
Author(s):  
D. REMONDINI ◽  
N. NERETTI ◽  
C. FRANCESCHI ◽  
P. TIERI ◽  
J. M. SEDIVY ◽  
...  

We address the problem of finding large-scale functional and structural relationships between genes, given a time series of gene expression data, namely mRNA concentration values measured from genetically engineered rat fibroblast cell lines responding to conditional cMyc proto-oncogene activation. We show how it is possible to retrieve suitable information about the molecular mechanisms governing the cell response to conditional perturbations. This task is complex because typical high-throughput genomics experiments are performed with a high number of probesets (10^3–10^4 genes) and a limited number of observations (< 10^2 time points). In this paper, we develop a deeper analysis of our previous work [Remondini et al., 2005], in which we characterized some of the main features of a gene-gene interaction network reconstructed from temporal correlation of gene expression time series. A first advancement is based on the comparison of the reconstructed network with networks obtained from randomly generated data, in order to characterize which features retrieve real biological information and which are instead due to the characteristics of the network reconstruction method. The second and perhaps more relevant advancement is the characterization of the global change in co-expression pattern following cMyc activation as compared to a basal unperturbed state. We propose an analogy with a physical system in a critical state close to a phase transition (e.g. Potts ferromagnet), since the cell responds to the stimulus with high susceptibility, such that a single gene activation propagates to almost the entire genome. Our result concerns temporal properties of gene network dynamics, and there is experimental evidence that this can be related to spatial properties regarding the global organization of chromatin structure [Knoepfler et al., 2006].
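The comparison between a reconstructed network and networks from random data can be sketched as follows. This is a minimal illustration, not the authors' procedure: it links two genes when the absolute Pearson correlation of their time series reaches a threshold, and it builds the null by shuffling each gene's time points independently (a permutation null, swapped in here for the "randomly generated data" the abstract mentions). The function names, the threshold, and the toy series are all assumptions.

```python
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def correlation_edges(series, threshold):
    """Gene pairs (i, j), i < j, whose |Pearson r| over time is >= threshold."""
    return {(i, j) for i, j in combinations(range(len(series)), 2)
            if abs(pearson(series[i], series[j])) >= threshold}


def null_edge_counts(series, threshold, n_shuffles=200, seed=0):
    """Edge counts after independently shuffling each gene's time points."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_shuffles):
        shuffled = [rng.sample(row, len(row)) for row in series]
        counts.append(len(correlation_edges(shuffled, threshold)))
    return counts


# Toy data: four genes, six time points (made-up values).
series = [
    [1, 2, 3, 4, 5, 6],
    [2, 4, 6, 8, 10, 12],
    [6, 5, 4, 3, 2, 1],
    [3, 1, 4, 1, 5, 9],
]
observed = correlation_edges(series, threshold=0.99)
null = null_edge_counts(series, threshold=0.99)
print(len(observed), sum(null) / len(null))
```

Features of the observed network that also appear in the shuffled networks are more plausibly artefacts of the reconstruction method than biological signal, which is the distinction the abstract draws.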


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
John Girgis ◽  
Dabo Yang ◽  
Imane Chakroun ◽  
Yubing Liu ◽  
Alexandre Blais

Abstract. Background: The Six1 transcription factor is implicated in controlling the development of several tissue types, notably skeletal muscle. Six1 also contributes to muscle metabolism and its activity is associated with the fast-twitch, glycolytic phenotype. Six1 regulates the expression of certain genes of the fast muscle program by directly stimulating their transcription or indirectly acting through a long non-coding RNA. We hypothesized that additional mechanisms of action of Six1 might be at play. Methods: A combined analysis of gene expression profiling and genome-wide location analysis data was performed. Results were validated using in vivo RNA interference loss-of-function assays followed by measurement of gene expression by RT-PCR and transcriptional reporter assays. Results: The Slc16a10 gene, encoding the thyroid hormone transmembrane transporter MCT10, was identified as a gene with a transcriptional enhancer directly bound by Six1 and requiring Six1 activity for full expression in adult mouse tibialis anterior, a predominantly fast-twitch muscle. Of the various thyroid hormone transporters, MCT10 mRNA was found to be the most abundant in skeletal muscle, and to have a stronger expression in fast-twitch compared to slow-twitch muscle groups. Loss-of-function of MCT10 in the tibialis anterior recapitulated the effect of Six1 on the expression of fast-twitch muscle genes and led to lower activity of a thyroid hormone receptor-dependent reporter gene. Conclusions: These results shed light on the molecular mechanisms controlling the tissue expression profile of MCT10 and identify modulation of the thyroid hormone signaling pathway as an additional mechanism by which Six1 influences skeletal muscle metabolism.


2019 ◽  
Author(s):  
Yuhua Zhang ◽  
Corbin Quick ◽  
Ketian Yu ◽  
Alvaro Barbeira ◽  
Francesca Luca ◽  
...  

Abstract: Transcriptome-wide association studies (TWAS), an integrative framework using expression quantitative trait loci (eQTLs) to construct proxies for gene expression, have emerged as a promising method to investigate the biological mechanisms underlying associations between genotypes and complex traits. However, challenges remain in interpreting TWAS results, especially regarding their causal implications. In this paper, we describe a new computational framework, probabilistic TWAS (PTWAS), to detect associations and investigate causal relationships between gene expression and complex traits. We use established concepts and principles from instrumental variables (IV) analysis to delineate and address the unique challenges that arise in TWAS. PTWAS utilizes probabilistic eQTL annotations derived from multi-variant Bayesian fine-mapping analysis, conferring higher power to detect TWAS associations than existing methods. Additionally, PTWAS provides novel functionalities to evaluate the causal assumptions and estimate tissue- or cell-type-specific causal effects of gene expression on complex traits. These features make PTWAS uniquely suited for in-depth investigations of the biological mechanisms that contribute to complex trait variation. Using eQTL data across 49 tissues from GTEx v8, we apply PTWAS to analyze 114 complex traits using GWAS summary statistics from several large-scale projects, including the UK Biobank. Our analysis reveals an abundance of genes with strong evidence of eQTL-mediated causal effects on complex traits and highlights the heterogeneity and tissue-relevance of these effects across complex traits. We distribute software and eQTL annotations to enable users to perform rigorous TWAS analysis by leveraging the full potential of the latest GTEx multi-tissue eQTL data.
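The instrumental-variable idea behind this kind of analysis can be illustrated with the simplest single-variant case, the Wald ratio: if a variant changes expression by `beta_eqtl` and the trait by `beta_gwas`, the implied effect of expression on the trait is their ratio. The sketch below is a textbook simplification under the usual IV assumptions, not the PTWAS method, which the abstract describes as using multi-variant Bayesian fine-mapping. The function name and the example numbers are hypothetical.

```python
def wald_ratio(beta_eqtl, beta_gwas, se_gwas):
    """Single-variant IV estimate of the effect of expression on a trait.

    beta_eqtl: variant effect on gene expression (instrument strength)
    beta_gwas: variant effect on the trait
    se_gwas:   standard error of beta_gwas
    The standard error uses the first-order approximation that ignores
    uncertainty in beta_eqtl.
    """
    if beta_eqtl == 0:
        raise ValueError("variant has no effect on expression; not an instrument")
    estimate = beta_gwas / beta_eqtl
    std_error = se_gwas / abs(beta_eqtl)
    return estimate, std_error


# Made-up summary statistics for one variant.
effect, se = wald_ratio(beta_eqtl=0.5, beta_gwas=0.1, se_gwas=0.02)
print(effect, se)
```

The ratio is only interpretable as a causal effect when the variant influences the trait solely through expression of the gene, which is one of the assumptions the abstract says PTWAS is designed to evaluate.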


Author(s):  
Gonca Erdemci-Tandogan ◽  
M. Lisa Manning

Large-scale tissue deformation during biological processes such as morphogenesis requires cellular rearrangements. The simplest rearrangement in confluent cellular monolayers involves neighbor exchanges among four cells, called a T1 transition, in analogy to foams. But unlike foams, cells must execute a sequence of molecular processes, such as endocytosis of adhesion molecules, to complete a T1 transition. Such processes could take a long time compared to other timescales in the tissue. In this work, we incorporate this idea by augmenting vertex models to require a fixed, finite time for T1 transitions, which we call the “T1 delay time”. We study how variations in T1 delay time affect tissue mechanics, by quantifying the relaxation time of tissues in the presence of T1 delays and comparing that to the cell-shape based timescale that characterizes fluidity in the absence of any T1 delays. We show that the molecular-scale T1 delay timescale dominates over the cell shape-scale collective response timescale when the T1 delay time is the larger of the two. We extend this analysis to tissues that become anisotropic under convergent extension, finding similar results. Moreover, we find that increasing the T1 delay time increases the percentage of higher-fold coordinated vertices and rosettes, and decreases the overall number of successful T1s, contributing to a more elastic-like – and less fluid-like – tissue response. Our work suggests that molecular mechanisms that act as a brake on T1 transitions could stiffen global tissue mechanics and enhance rosette formation during morphogenesis.

