Structure Learning for Hierarchical Regulatory Networks

Network analysis offers a powerful technique to model the relationships between genes within biological regulatory networks. Inference of biological network structures is often performed on high-dimensional data, yet is hindered by the limited sample size of the high-throughput "omics" data typically available. To overcome this challenge, we exploit known organizing principles of biological networks: they are sparse, modular, and likely share a large portion of their underlying architecture. We present SHINE (Structure Learning for Hierarchical Networks), a framework for defining data-driven structural constraints and incorporating a shared learning paradigm for efficiently learning multiple networks from high-dimensional data. We show through simulations that SHINE improves performance when relatively few samples are available and multiple networks are desired, by reducing the complexity of the graphical search space and by taking advantage of shared structural information. We evaluated SHINE on TCGA Pan-Cancer data and found that the learned tumor-specific networks exhibit expected graph properties of real biological networks, recapture previously validated interactions, and recapitulate findings in the literature. Application of SHINE to the analysis of subtype-specific breast cancer networks identified key genes and biological processes for tumor maintenance and survival, as well as potential therapeutic targets for modulating known breast cancer disease genes.

Relevant Attribute Discovery in High Dimensional Data: Application to Breast Cancer Gene Expressions

Rough Sets and Knowledge Technology - Lecture Notes in Computer Science ◽

10.1007/11795131_70 ◽

2006 ◽

pp. 482-489 ◽

Cited By ~ 11

Author(s):

Julio J. Valdés ◽

Alan J. Barton

Keyword(s):

Breast Cancer ◽

High Dimensional Data ◽

High Dimensional ◽

Cancer Gene ◽

Gene Expressions ◽

Data Application ◽

Relevant Attribute ◽

Breast Cancer Gene

Exploring the Operational Characteristics of Inference Algorithms for Transcriptional Networks by Means of Synthetic Data

Artificial Life ◽

10.1162/artl.2008.14.1.49 ◽

2008 ◽

Vol 14 (1) ◽

pp. 49-63 ◽

Cited By ~ 1

Author(s):

Koenraad Van Leemput ◽

Tim Van den Bulcke ◽

Thomas Dhollander ◽

Bart De Moor ◽

Kathleen Marchal ◽

...

Keyword(s):

Biological Networks ◽

Regulatory Networks ◽

Structure Learning ◽

Synthetic Data ◽

Network Size ◽

Transcriptional Networks ◽

Data Sets ◽

Expression Data ◽

Operational Characteristics ◽

Inference Algorithms

The development of structure-learning algorithms for gene regulatory networks depends heavily on the availability of synthetic data sets that contain both the original network and associated expression data. This article reports the application of SynTReN, an existing network generator that samples topologies from existing biological networks and uses Michaelis-Menten and Hill enzyme kinetics to simulate gene interactions. We illustrate the effects of different aspects of the expression data on the quality of the inferred network. The tested expression data parameters are network size, network topology, type and degree of noise, quantity of expression data, and interaction types between genes. This is done by applying three well-known inference algorithms to SynTReN data sets. The results show the power of synthetic data in revealing operational characteristics of inference algorithms that are unlikely to be discovered by means of biological microarray data only.
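
The Hill kinetics mentioned in the abstract can be illustrated with the standard textbook rate functions. The sketch below is not SynTReN's implementation; the function and parameter names (`hill_activation`, `hill_repression`, `k`, `n`) are hypothetical and chosen only for illustration.

```python
def hill_activation(x, k, n=1.0):
    """Fraction of the maximal rate driven by an activator at concentration x.

    k is the half-saturation constant and n the Hill coefficient;
    n = 1 reduces to the Michaelis-Menten form x / (k + x).
    These names are illustrative assumptions, not SynTReN parameters.
    """
    if x < 0 or k <= 0:
        raise ValueError("x must be non-negative and k positive")
    xn = x ** n
    return xn / (k ** n + xn)


def hill_repression(x, k, n=1.0):
    """Fraction of the maximal rate remaining under a repressor at concentration x."""
    return 1.0 - hill_activation(x, k, n)
```

At `x == k` both functions return one half regardless of `n`; larger `n` makes the response more switch-like around that point.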

A Sparse Structure Learning Algorithm for Gaussian Bayesian Network Identification from High-Dimensional Data

IEEE Transactions on Pattern Analysis and Machine Intelligence ◽

10.1109/tpami.2012.129 ◽

2013 ◽

Vol 35 (6) ◽

pp. 1328-1342 ◽

Cited By ~ 39

Author(s):

Shuai Huang ◽

Jing Li ◽

Jieping Ye ◽

Adam Fleisher ◽

Kewei Chen ◽

...

Keyword(s):

Bayesian Network ◽

Structure Learning ◽

Learning Algorithm ◽

High Dimensional Data ◽

High Dimensional ◽

Network Identification ◽

Gaussian Bayesian Network

ANOVA-HD: Analysis of variance when both input and output layers are high-dimensional

PLoS ONE ◽

10.1371/journal.pone.0243251 ◽

2020 ◽

Vol 15 (12) ◽

pp. e0243251

Author(s):

Gustavo de los Campos ◽

Torsten Pook ◽

Agustin Gonzalez-Reymundez ◽

Henner Simianer ◽

George Mias ◽

...

Keyword(s):

Breast Cancer ◽

Gene Expression ◽

Copy Number ◽

Homo Sapiens ◽

Linear Span ◽

Copy Number Variants ◽

High Dimensional ◽

Data Set ◽

Cancer Data ◽

Data Layers

Modern genomic data sets often involve multiple data-layers (e.g., DNA-sequence, gene expression), each of which itself can be high-dimensional. The biological processes underlying these data-layers can lead to intricate multivariate association patterns. We propose and evaluate two methods to determine the proportion of variance of an output data set that can be explained by an input data set when both data panels are high dimensional. Our approach uses random-effects models to estimate the proportion of variance of vectors in the linear span of the output set that can be explained by regression on the input set. We consider a method based on an orthogonal basis (Eigen-ANOVA) and one that uses random vectors (Monte Carlo ANOVA, MC-ANOVA) in the linear span of the output set. Using simulations, we show that the MC-ANOVA method gave nearly unbiased estimates. Estimates produced by Eigen-ANOVA were also nearly unbiased, except when the shared variance was very high (e.g., >0.9). We demonstrate the potential insight that can be obtained from the use of MC-ANOVA and Eigen-ANOVA by applying these two methods to the study of multi-locus linkage disequilibrium in chicken (Gallus gallus) genomes and to the assessment of inter-dependencies between gene expression, methylation, and copy-number-variants in data from breast cancer tumors from humans (Homo sapiens). Our analyses reveal that in chicken breeding populations ~50,000 evenly-spaced SNPs are enough to fully capture the span of whole-genome-sequencing genomes. In the study of multi-omic breast cancer data, we found that the span of copy-number-variants can be fully explained using either methylation or gene expression data and that roughly 74% of the variance in gene expression can be predicted from methylation data.

A general index for linear and nonlinear correlations for high dimensional genomic data

BMC Genomics ◽

10.1186/s12864-020-07246-x ◽

2020 ◽

Vol 21 (1) ◽

Author(s):

Zhihao Yao ◽

Jing Zhang ◽

Xiufen Zou

Keyword(s):

Gene Regulatory Networks ◽

Regulatory Networks ◽

High Dimensional Data ◽

Kernel Functions ◽

High Dimensional ◽

Vector Correlation ◽

General Index ◽

Gene Regulatory ◽

Linear And Nonlinear ◽

Rv Coefficient

Background: With the advance of high-throughput sequencing, high-dimensional data are generated. Detecting dependence or correlation between these datasets is becoming one of the most important issues in multi-dimensional data integration and co-expression network construction. RNA-sequencing data are widely used to construct gene regulatory networks. Such networks could be more accurate when methylation data, copy number aberration data, and other types of data are introduced. Consequently, a general index for detecting relationships between high-dimensional data is indispensable. Results: We proposed a kernel-based RV-coefficient, named KBRV, for testing both linear and nonlinear correlation between two matrices by introducing kernel functions into RV2 (the modified RV-coefficient). A permutation test and other validation methods were used on simulated data to test the significance and rationality of KBRV. To demonstrate the advantages of KBRV in constructing gene regulatory networks, we applied this index to real datasets (ovarian cancer datasets and exon-level RNA-Seq data in human myeloid differentiation) to illustrate its superiority over vector correlation. Conclusions: We concluded that KBRV is an efficient index for detecting both linear and nonlinear relationships in high-dimensional data. The correlation method for high-dimensional data has possible applications in the construction of gene regulatory networks.
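
For orientation, the classical RV coefficient and the diagonal-free variant usually called RV2 can be sketched in a few lines. This is only the linear baseline: the abstract does not give the KBRV formula, so no kernel step is shown, and a kernel variant would presumably substitute kernel Gram matrices for the linear ones used here. All function names are hypothetical.

```python
import math


def center_columns(m):
    """Subtract each column's mean; rows are samples, columns are variables."""
    n = len(m)
    means = [sum(col) / n for col in zip(*m)]
    return [[v - mu for v, mu in zip(row, means)] for row in m]


def linear_gram(m):
    """n x n matrix of inner products between the rows (samples) of m."""
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in m] for r1 in m]


def frobenius_inner(a, b):
    """Sum of element-wise products; equals trace(a @ b) for symmetric a, b."""
    return sum(x * y for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def rv_coefficient(x, y, drop_diagonal=False):
    """RV coefficient between two data matrices that share the same samples.

    drop_diagonal=True zeroes the Gram diagonals, which gives the
    modified coefficient (RV2) instead of the classical one.
    """
    sx = linear_gram(center_columns(x))
    sy = linear_gram(center_columns(y))
    if drop_diagonal:
        for i in range(len(sx)):
            sx[i][i] = 0.0
            sy[i][i] = 0.0
    denom = math.sqrt(frobenius_inner(sx, sx) * frobenius_inner(sy, sy))
    return frobenius_inner(sx, sy) / denom
```

Two matrices that differ only by a scale factor give a coefficient of 1, while matrices whose centered columns are mutually orthogonal give a classical RV of 0.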

An improved Kohonen self-organizing map clustering algorithm for high-dimensional data sets

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v24.i1.pp600-610 ◽

2021 ◽

Vol 24 (1) ◽

pp. 600

Author(s):

Momotaz Begum ◽

Bimal Chandra Das ◽

Md. Zakir Hossain ◽

Antu Saha ◽

Khaleda Akther Papry

Keyword(s):

Clustering Algorithm ◽

Clustering Algorithms ◽

High Dimensional Data ◽

Predictive Performance ◽

High Dimensional ◽

Data Sets ◽

Self Organizing Map ◽

Distance Measurements ◽

Cancer Data ◽

Self Organizing

Manipulating high-dimensional data has been a major research challenge in the field of computer science in recent years. Many clustering algorithms have been proposed to classify such data, and the Kohonen self-organizing map (KSOM) is one of them. However, this algorithm has drawbacks such as overlapping clusters and non-linear separability problems. Therefore, in this paper, we propose an improved KSOM (I-KSOM) that reduces these problems by measuring distances among objects using the EISEN Cosine correlation formula. So far as we know, no previous work has used EISEN Cosine correlation distance measurements to classify high-dimensional data sets. To assess the robustness of the proposed KSOM, we carry out experiments on several popular data sets: Iris, Seeds, Glass, Vertebral column, and Wisconsin breast cancer. Our proposed algorithm shows better results than the original KSOM and another modified KSOM in terms of predictive performance, as measured by topographic and quantization error.
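
The abstract does not spell out the distance formula. One common form of a cosine correlation distance is one minus the absolute uncentered correlation, and the sketch below assumes that form; it also shows how such a distance could pick the best-matching unit in a self-organizing map. Both function names are hypothetical, and this is not the authors' I-KSOM code.

```python
import math


def cosine_correlation_distance(x, y):
    """1 - |cos(angle between x and y)|; 0 when the vectors are collinear.

    Assumed form of the distance; the abstract does not state the formula.
    """
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    if norm == 0.0:
        raise ValueError("distance is undefined for a zero vector")
    return 1.0 - abs(dot) / norm


def best_matching_unit(sample, weights):
    """Index of the weight vector closest to the sample under this distance."""
    return min(range(len(weights)),
               key=lambda i: cosine_correlation_distance(sample, weights[i]))
```

Because the distance depends only on direction, vectors that differ by a scale factor are treated as identical, unlike the Euclidean distance used in a standard SOM.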

Inferring propensity amongst lung and breast carcinomas via overlapped gene expression profiles

10.1101/178558 ◽

2017 ◽

Author(s):

Rajni Jaiswal ◽

Sabin Dhakal ◽

Shaurya Jauhari

Keyword(s):

Gene Expression ◽

Lung Cancer ◽

Biological Networks ◽

Drug Targets ◽

Regulatory Networks ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Specific Gene ◽

Cancer Data ◽

Gene Regulatory

Reconstruction of biological networks for topological analyses helps identify correlations between various types of biomarkers. These networks have become vital components of systems biology. Genes are the basic physical and structural units of heredity and act as instructions for making molecules called proteins. Alterations in the normal sequence of these genes are the root cause of various diseases, and cancer is a prominent example of a disease caused by gene alteration or mutation. These slight alterations can be detected by microarray analysis. The high-throughput data obtained from microarray experiments aid scientists in reconstructing cancer-specific gene regulatory networks. The purpose of this experiment is to find the overlap between the gene expression profiles of breast and lung cancer data, so that common hub genes can be identified and used as drug targets for treatment. In this study, differentially expressed genes are first identified for lung cancer and breast cancer; a filtration step follows, in which the most significant genes are chosen using a paired t-test, and gene regulatory networks are then constructed. The results have been checked and validated against available databases and the literature.
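
The paired t-test used for gene selection can be sketched as follows. This is a minimal illustration of the test statistic only, assuming each gene has matched measurements under two conditions; the function name is hypothetical, and the study's actual pipeline, thresholds, and p-value computation are not described in the abstract.

```python
import math
import statistics


def paired_t_statistic(a, b):
    """t statistic and degrees of freedom for paired samples a and b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least two pairs")
    diffs = [x - y for x, y in zip(a, b)]
    sd = statistics.stdev(diffs)  # sample standard deviation of the differences
    if sd == 0.0:
        raise ValueError("all paired differences are identical")
    t = statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1
```

Genes would then be ranked by the magnitude of the statistic, or by a p-value taken from the t distribution with the returned degrees of freedom.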
