Structure Learning for Hierarchical Regulatory Networks

2021 ◽  
Author(s):  
Anthony Federico ◽  
Joseph Kern ◽  
Xaralabos Varelas ◽  
Stefano Monti

Network analysis offers a powerful way to model the relationships between genes within biological regulatory networks. Inference of biological network structures is often performed on high-dimensional data, yet is hindered by the limited sample size of the high-throughput "omics" data typically available. To overcome this challenge, we exploit known organizing principles of biological networks, which are sparse, modular, and likely share a large portion of their underlying architecture. We present SHINE (Structure Learning for Hierarchical Networks), a framework for defining data-driven structural constraints and incorporating a shared learning paradigm for efficiently learning multiple networks from high-dimensional data. We show through simulations that SHINE improves performance when relatively few samples are available and multiple networks are desired, by reducing the complexity of the graphical search space and by taking advantage of shared structural information. We evaluated SHINE on TCGA Pan-Cancer data and found that the learned tumor-specific networks exhibit expected graph properties of real biological networks, recapture previously validated interactions, and recapitulate findings in the literature. Application of SHINE to the analysis of subtype-specific breast cancer networks identified key genes and biological processes for tumor maintenance and survival, as well as potential therapeutic targets for modulating known breast cancer disease genes.

2008 ◽  
Vol 14 (1) ◽  
pp. 49-63 ◽  
Author(s):  
Koenraad Van Leemput ◽  
Tim Van den Bulcke ◽  
Thomas Dhollander ◽  
Bart De Moor ◽  
Kathleen Marchal ◽  
...  

The development of structure-learning algorithms for gene regulatory networks depends heavily on the availability of synthetic data sets that contain both the original network and associated expression data. This article reports the application of SynTReN, an existing network generator that samples topologies from existing biological networks and uses Michaelis-Menten and Hill enzyme kinetics to simulate gene interactions. We illustrate the effects of different aspects of the expression data on the quality of the inferred network. The tested expression data parameters are network size, network topology, type and degree of noise, quantity of expression data, and interaction types between genes. This is done by applying three well-known inference algorithms to SynTReN data sets. The results show the power of synthetic data in revealing operational characteristics of inference algorithms that are unlikely to be discovered by means of biological microarray data only.
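
As a rough illustration of the kinetics named in this abstract, the sketch below writes Michaelis-Menten and Hill response functions in Python. It is not SynTReN's implementation: the function names, the multiplicative combination of one activator and one repressor, and all parameter values are assumptions chosen for illustration.

```python
def hill_activation(x, k, n=1.0):
    """Fraction of the maximal rate induced by an activator at level x.

    k is the half-saturation constant (must be > 0) and n the Hill
    coefficient. With n = 1 this reduces to the Michaelis-Menten form
    x / (k + x).
    """
    return x ** n / (k ** n + x ** n)


def hill_repression(x, k, n=1.0):
    """Fraction of the maximal rate remaining under a repressor at level x."""
    return k ** n / (k ** n + x ** n)


def target_rate(v_max, activator, repressor, k_act, k_rep, n=2.0):
    """Hypothetical transcription rate of a gene with one activator and
    one repressor, combined multiplicatively (an assumption of this sketch)."""
    return v_max * hill_activation(activator, k_act, n) * hill_repression(repressor, k_rep, n)
```

At `x == k` the activation term is exactly one half, and for any input the activation and repression terms with the same `k` and `n` sum to one.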


PLoS ONE ◽  
2020 ◽  
Vol 15 (12) ◽  
pp. e0243251
Author(s):  
Gustavo de los Campos ◽  
Torsten Pook ◽  
Agustin Gonzalez-Reymundez ◽  
Henner Simianer ◽  
George Mias ◽  
...  

Modern genomic data sets often involve multiple data-layers (e.g., DNA-sequence, gene expression), each of which itself can be high-dimensional. The biological processes underlying these data-layers can lead to intricate multivariate association patterns. We propose and evaluate two methods to determine the proportion of variance of an output data set that can be explained by an input data set when both data panels are high dimensional. Our approach uses random-effects models to estimate the proportion of variance of vectors in the linear span of the output set that can be explained by regression on the input set. We consider a method based on an orthogonal basis (Eigen-ANOVA) and one that uses random vectors (Monte Carlo ANOVA, MC-ANOVA) in the linear span of the output set. Using simulations, we show that the MC-ANOVA method gave nearly unbiased estimates. Estimates produced by Eigen-ANOVA were also nearly unbiased, except when the shared variance was very high (e.g., >0.9). We demonstrate the potential insight that can be obtained from the use of MC-ANOVA and Eigen-ANOVA by applying these two methods to the study of multi-locus linkage disequilibrium in chicken (Gallus gallus) genomes and to the assessment of inter-dependencies between gene expression, methylation, and copy-number-variants in data from breast cancer tumors from humans (Homo sapiens). Our analyses reveal that in chicken breeding populations ~50,000 evenly-spaced SNPs are enough to fully capture the span of whole-genome-sequencing genomes. In the study of multi-omic breast cancer data, we found that the span of copy-number-variants can be fully explained using either methylation or gene expression data and that roughly 74% of the variance in gene expression can be predicted from methylation data.


BMC Genomics ◽  
2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Zhihao Yao ◽  
Jing Zhang ◽  
Xiufen Zou

Abstract. Background: With the advance of high-throughput sequencing, large amounts of high-dimensional data are being generated. Detecting dependence or correlation between these datasets is becoming one of the most important issues in multi-dimensional data integration and co-expression network construction. RNA-sequencing data is widely used to construct gene regulatory networks. Such networks could be more accurate when methylation data, copy number aberration data and other types of data are introduced. Consequently, a general index for detecting relationships between high-dimensional data is indispensable. Results: We proposed a Kernel-Based RV-coefficient, named KBRV, for testing both linear and nonlinear correlation between two matrices by introducing kernel functions into RV2 (the modified RV-coefficient). A permutation test and other validation methods were used on simulated data to test the significance and rationality of KBRV. To demonstrate the advantages of KBRV in constructing gene regulatory networks, we applied this index to real datasets (ovarian cancer datasets and exon-level RNA-Seq data in human myeloid differentiation) to illustrate its superiority over vector correlation. Conclusions: We concluded that KBRV is an efficient index for detecting both linear and nonlinear relationships in high-dimensional data. The correlation method for high-dimensional data has possible applications in the construction of gene regulatory networks.
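
To make the idea of a matrix correlation with kernels concrete, here is a minimal pure-Python sketch of an RV-style coefficient computed from centered Gram matrices. It is not the authors' KBRV: with the linear kernel it gives the classical RV coefficient, the RV2 modification mentioned in the abstract is omitted, and the function names and the Gaussian bandwidth `sigma` are assumptions of this example.

```python
import math


def linear_kernel(a, b):
    return sum(x * y for x, y in zip(a, b))


def gaussian_kernel(a, b, sigma=1.0):
    # sigma is an illustrative bandwidth, not a value from the paper.
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def centered_gram(rows, kernel):
    """Sample-by-sample kernel matrix, double-centered."""
    n = len(rows)
    gram = [[kernel(rows[i], rows[j]) for j in range(n)] for i in range(n)]
    row_mean = [sum(gram[i]) / n for i in range(n)]
    grand_mean = sum(row_mean) / n
    # The Gram matrix is symmetric, so row means equal column means.
    return [[gram[i][j] - row_mean[i] - row_mean[j] + grand_mean
             for j in range(n)] for i in range(n)]


def frobenius_inner(a, b):
    # For symmetric matrices this equals trace(a @ b).
    return sum(x * y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


def kernel_rv(x_rows, y_rows, kernel=linear_kernel):
    """RV-style correlation between two data matrices with matched rows.

    Raises ZeroDivisionError if either matrix has no variation across rows.
    """
    a = centered_gram(x_rows, kernel)
    b = centered_gram(y_rows, kernel)
    return frobenius_inner(a, b) / math.sqrt(frobenius_inner(a, a) * frobenius_inner(b, b))
```

Rows are samples and must be matched between the two matrices; the number of columns may differ. A matrix compared with itself, or with a rescaled copy under the linear kernel, gives a value of 1.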


Author(s):  
Momotaz Begum ◽  
Bimal Chandra Das ◽  
Md. Zakir Hossain ◽  
Antu Saha ◽  
Khaleda Akther Papry

Manipulating high-dimensional data has been a major research challenge in computer science in recent years. Many clustering algorithms have been proposed to classify such data, and the Kohonen self-organizing map (KSOM) is one of them. However, this algorithm has some drawbacks, such as overlapping clusters and non-linear separability problems. Therefore, in this paper, we propose an improved KSOM (I-KSOM) that reduces these problems by measuring distances among objects using the EISEN Cosine correlation formula. As far as we know, no previous work has used EISEN Cosine correlation distance measurements to classify high-dimensional data sets. To assess the robustness of the proposed I-KSOM, we carry out experiments on several popular data sets: Iris, Seeds, Glass, Vertebral column, and Wisconsin breast cancer. Our proposed algorithm shows better results than the original KSOM and another modified KSOM in terms of predictive performance, topographic error, and quantization error.
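
The sketch below shows how a cosine-based distance could replace the Euclidean distance when a self-organizing map picks its best-matching unit. It rests on an assumption: the Eisen cosine correlation distance is taken here to be one minus the absolute uncentered correlation, which may differ from the exact formula used in the paper. The function names are hypothetical, and the neighborhood update of a full SOM is left out, so only the winning unit moves.

```python
import math


def eisen_distance(a, b):
    """Assumed form: 1 - |a . b| / (||a|| * ||b||). Vectors must be non-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return 1.0 - abs(dot) / norm


def best_matching_unit(sample, weights):
    """Index of the map unit whose weight vector is closest to the sample."""
    return min(range(len(weights)), key=lambda i: eisen_distance(sample, weights[i]))


def som_step(sample, weights, learning_rate=0.5):
    """One simplified training step: move only the winning unit toward the sample."""
    bmu = best_matching_unit(sample, weights)
    weights[bmu] = [w + learning_rate * (s - w) for w, s in zip(weights[bmu], sample)]
    return bmu
```

Because the distance depends only on the angle between vectors, samples that differ only in scale map to the same unit.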


2017 ◽  
Author(s):  
Rajni Jaiswal ◽  
Sabin Dhakal ◽  
Shaurya Jauhari

Abstract: Reconstruction of biological networks for topological analyses helps to identify correlations between various types of biomarkers. These networks have become vital components of systems biology in the present era. Genes are the basic physical and structural units of heredity. Genes act as instructions to make molecules called proteins. Alterations in the normal sequence of these genes are the root cause of various diseases, and cancer is a prominent example of a disease caused by gene alteration or mutation. These slight alterations can be detected by microarray analysis. The high-throughput data obtained from microarray experiments aid scientists in reconstructing cancer-specific gene regulatory networks. The purpose of the experiment is to find the overlap between the gene expression profiles of breast and lung cancer data, so that common hub genes can be identified and used as drug targets for treatment. In this study, the differentially expressed genes are first identified (lung cancer and breast cancer), followed by a filtration approach in which the most significant genes are chosen using a paired t-test, and then by gene regulatory network construction. The results have been checked and validated against the available databases and literature.
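
For readers unfamiliar with the paired t-test mentioned above, here is a minimal sketch of the test statistic for a single gene measured in matched samples. The function name and the pairing of "first" and "second" measurements are assumptions of this example; the abstract does not say how samples were paired.

```python
import math
import statistics


def paired_t_statistic(first, second):
    """t statistic for paired measurements (needs at least two pairs).

    t = mean(d) / (sd(d) / sqrt(n)), where d are the pairwise differences
    and sd is the sample standard deviation. The statistic has n - 1
    degrees of freedom.
    """
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / standard_error
```

A large absolute value of the statistic suggests a consistent shift between the paired measurements; converting it to a p-value requires the t distribution with `n - 1` degrees of freedom, which is not included here.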


2011 ◽  
Vol 4 (2) ◽  
pp. 8-12
Author(s):  
Leo Alexander T ◽  
Pari Dayal L ◽  
Valarmathi S ◽  
Ponnuraja C ◽  
...  
