A closer look at cross-validation for assessing the accuracy of gene regulatory networks and models

2017
Author(s): Shayan Tabe-Bordbar, Amin Emad, Sihai Dave Zhao, Saurabh Sinha

Abstract: Cross-validation (CV) is a technique to assess the generalizability of a model to unseen data. This technique relies on assumptions that may not be satisfied when studying genomics datasets. For example, random CV (RCV) assumes that a randomly selected set of samples, the test set, represents unseen data well. This assumption does not hold when samples are obtained from different experimental conditions and the goal is to learn regulatory relationships among the genes that generalize beyond the observed conditions. In this study, we investigated how the CV procedure affects the assessment of methods used to learn gene regulatory networks. We compared the performance of a regression-based method for gene expression prediction as estimated using RCV with that estimated using a clustering-based CV (CCV) procedure. Our analysis illustrates that RCV can produce over-optimistic estimates of the generalizability of the model compared to CCV. Next, we defined the ‘distinctness’ of a test set from a training set and showed that this measure is predictive of the performance of the regression method. Finally, we introduced a simulated annealing method to construct partitions with gradually increasing distinctness and showed that the performance of different gene expression prediction methods can be better evaluated using this method.
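The contrast between RCV and CCV can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy two-dimensional samples, the pre-assigned cluster labels standing in for a clustering step, the function names, and in particular the `distinctness` score (mean distance from each test sample to its nearest training sample), which is a stand-in and not the definition used in the paper.

```python
import math
import random

def random_folds(n_samples, n_folds, seed=0):
    """Random CV (RCV): shuffle sample indices and deal them into folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[f::n_folds] for f in range(n_folds)]

def clustered_folds(cluster_labels):
    """Clustering-based CV (CCV): each fold is one whole cluster, so a test
    sample never shares a cluster with any training sample."""
    folds = {}
    for i, c in enumerate(cluster_labels):
        folds.setdefault(c, []).append(i)
    return [folds[c] for c in sorted(folds)]

def distinctness(samples, test_idx):
    """Hypothetical distinctness score (an assumption, not the paper's):
    mean Euclidean distance from each test sample to its nearest
    training sample."""
    test = set(test_idx)
    train = [s for i, s in enumerate(samples) if i not in test]
    nearest = [min(math.dist(samples[i], t) for t in train) for i in test_idx]
    return sum(nearest) / len(nearest)

# Toy data: three well-separated "experimental conditions", 10 samples each.
rng = random.Random(1)
centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
samples, labels = [], []
for c, (cx, cy) in enumerate(centers):
    for _ in range(10):
        samples.append((cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5)))
        labels.append(c)

rcv = [distinctness(samples, f) for f in random_folds(len(samples), 3)]
ccv = [distinctness(samples, f) for f in clustered_folds(labels)]
mean_rcv = sum(rcv) / len(rcv)
mean_ccv = sum(ccv) / len(ccv)
print(f"mean distinctness  RCV: {mean_rcv:.2f}  CCV: {mean_ccv:.2f}")
```

On data like this, random test folds sit close to their training samples, while cluster-held-out folds sit far from them, which is the situation in which an RCV estimate may look more optimistic than a CCV estimate.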

2021
Author(s): Hakimeh Khojasteh, Mohammad Hossein Olyaee, Alireza Khanteymoori

The development of computational methods to predict gene regulatory networks (GRNs) from gene expression data is a challenging task. Many machine learning methods, including supervised, unsupervised, and semi-supervised approaches, have been developed to infer GRNs. Most of these methods ignore the class imbalance problem, which can reduce the accuracy of predicting regulatory interactions in the network. Developing an effective method that accounts for imbalanced data is therefore also challenging. In this paper, we propose the EnGRNT approach, which uses ensemble-based methods to infer GRNs with high accuracy. In addition to the gene expression data, the proposed approach considers the topological features of the GRN. We applied our approach to the simulated Escherichia coli dataset. Experimental results demonstrate that the appropriateness of the inference method depends on the size and type of expression profiles in the microarray data. Except for multifactorial experimental conditions, the proposed approach outperforms unsupervised methods. These results support the application of EnGRNT to imbalanced datasets.
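A small sketch can show why class imbalance matters when predicting regulatory edges. The toy label vector, the function names `balanced_bootstrap` and `accuracy_and_recall`, and the use of majority-class undersampling are illustrative assumptions; the abstract does not say how EnGRNT itself handles imbalance, so this is one generic option an ensemble might use, not the authors' method.

```python
import random

def accuracy_and_recall(labels, preds):
    """Accuracy over all candidate edges, and recall on true edges only."""
    correct = sum(y == p for y, p in zip(labels, preds))
    true_pos = sum(y == 1 and p == 1 for y, p in zip(labels, preds))
    n_pos = sum(labels)
    return correct / len(labels), (true_pos / n_pos if n_pos else 0.0)

def balanced_bootstrap(labels, rng):
    """Indices of a class-balanced subset: every positive plus an equally
    sized random draw of negatives. Assumes positives are the minority."""
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    return pos + rng.sample(neg, len(pos))

# Toy candidate-edge labels: 10 true regulatory edges among 1000 gene pairs.
labels = [1] * 10 + [0] * 990

# A trivial predictor that never predicts an edge looks accurate
# while recovering no true edge at all.
always_no_edge = [0] * len(labels)
acc, rec = accuracy_and_recall(labels, always_no_edge)
print(f"accuracy={acc:.2f} recall={rec:.2f}")

# Each ensemble member could be trained on its own balanced subset.
rng = random.Random(0)
subsets = [balanced_bootstrap(labels, rng) for _ in range(5)]
```

Training each member on a different balanced subset lets the ensemble see all true edges while still sampling the much larger pool of non-edges.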


2021
Vol 12 (1)
Author(s): Mika J. Välimäki, Robert S. Leigh, Sini M. Kinnunen, Alexander R. March, Ana Hernández de Sande, ...

Abstract
Background: Pharmacological modulation of cell fate decisions and developmental gene regulatory networks holds promise for the treatment of heart failure. Compounds that target tissue-specific transcription factors could overcome non-specific effects of small molecules and lead to the regeneration of heart muscle following myocardial infarction. Due to cellular heterogeneity in the heart, the activation of gene programs representing specific atrial and ventricular cardiomyocyte subtypes would be highly desirable. Chemical compounds that modulate atrial and ventricular cell fate could be used to improve subtype-specific differentiation of endogenous or exogenously delivered progenitor cells in order to promote cardiac regeneration.
Methods: Transcription factor GATA4-targeted compounds that have previously shown in vivo efficacy in cardiac injury models were tested for stage-specific activation of atrial and ventricular reporter genes in differentiating pluripotent stem cells using a dual reporter assay. Chemically induced gene expression changes were characterized by qRT-PCR, global run-on sequencing (GRO-seq) and immunoblotting, and the network of cooperative proteins of GATA4 and NKX2-5 was further explored by examining the GATA4 and NKX2-5 interactome by BioID. Reporter gene assays were conducted to examine combinatorial effects of GATA-targeted compounds and bromodomain and extraterminal domain (BET) inhibition on chamber-specific gene expression.
Results: GATA4-targeted compounds 3i-1000 and 3i-1103 were identified as differential modulators of atrial and ventricular gene expression. More detailed structure-function analysis revealed a distinct subclass of GATA4/NKX2-5 inhibitory compounds with an acetyl lysine-like domain that contributed to ventricular cells (%Myl2-eGFP+). Additionally, BioID analysis indicated broad interaction between GATA4 and the BET family of proteins, such as BRD4. This indicated the involvement of epigenetic modulators in the regulation of GATA-dependent transcription. In line with this, reporter gene assays with combinatorial treatment of 3i-1000 and the BET bromodomain inhibitor (+)-JQ1 demonstrated the cooperative role of GATA4 and BRD4 in the modulation of chamber-specific cardiac gene expression.
Conclusions: Collectively, these results indicate the potential for therapeutic alteration of cell fate decisions and pathological gene regulatory networks by GATA4-targeted compounds modulating chamber-specific transcriptional programs in multipotent cardiac progenitor cells and cardiomyocytes. The compound scaffolds described within this study could be used to develop regenerative strategies for myocardial regeneration.


2021
Vol 22 (1)
Author(s): Neel Patel, William S. Bush

Abstract
Background: Transcriptional regulation is complex, requiring multiple cis (local) and trans acting mechanisms working in concert to drive gene expression, with disruption of these processes linked to multiple diseases. Previous computational attempts to understand the influence of regulatory mechanisms on gene expression have used prediction models containing input features derived from cis regulatory factors. However, local chromatin looping and trans-acting mechanisms are known to also influence transcriptional regulation, and their inclusion may improve model accuracy and interpretation. In this study, we create a general model of transcription factor influence on gene expression by incorporating both cis and trans gene regulatory features.
Results: We describe a computational framework to model gene expression for GM12878 and K562 cell lines. This framework weights the impact of transcription factor-based regulatory data using multi-omics gene regulatory networks to account for both cis and trans acting mechanisms, and measures of the local chromatin context. These prediction models perform significantly better compared to models containing cis-regulatory features alone. Models that additionally integrate long distance chromatin interactions (or chromatin looping) between distal transcription factor binding regions and gene promoters also show improved accuracy. As a demonstration of their utility, effect estimates from these models were used to weight cis-regulatory rare variants for sequence kernel association test analyses of gene expression.
Conclusions: Our models generate refined effect estimates for the influence of individual transcription factors on gene expression, allowing characterization of their roles across the genome. This work also provides a framework for integrating multiple data types into a single model of transcriptional regulation.

