Reconstructing Gene Networks of Forest Trees from Gene Expression Data: Toward Higher-Resolution Approaches

Author(s):  
Matt Zinkgraf ◽  
Andrew Groover ◽  
Vladimir Filkov
Author(s):  
Crescenzio Gallo

The possible applications of modeling and simulation in the field of bioinformatics are very extensive, ranging from understanding basic metabolic pathways to exploring genetic variability. Experiments carried out with DNA microarrays allow researchers to measure expression levels for thousands of genes simultaneously, across different conditions and over time. A key step in the analysis of gene expression data is the detection of groups of genes that manifest similar expression patterns. In this chapter, the authors examine various methods for analyzing gene expression data, addressing the important topics of (1) selecting the most differentially expressed genes, (2) grouping them according to their relationships, and (3) classifying samples based on gene expression.
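As an illustration of topic (1), the sketch below ranks genes by the absolute value of Welch's t-statistic between two sample groups. It is a minimal example under stated assumptions, not the chapter's method: the function names `welch_t` and `top_genes`, the toy expression values, and the choice of a t-statistic as the ranking criterion are all hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic for two independent samples.

    Assumes each sample has at least two values and that the two
    sample variances are not both zero.
    """
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


def top_genes(expr, group_a, group_b, k):
    """Return the k gene names with the largest |t| between the two groups.

    expr maps a gene name to a list of expression values, one per sample;
    group_a and group_b are lists of sample indices.
    """
    scores = {
        gene: abs(welch_t([vals[i] for i in group_a], [vals[i] for i in group_b]))
        for gene, vals in expr.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy data (hypothetical): 3 genes, 6 samples.
# Samples 0-2 belong to condition A, samples 3-5 to condition B.
expr = {
    "geneA": [1.0, 1.2, 0.9, 5.1, 4.8, 5.3],
    "geneB": [2.0, 2.1, 1.9, 2.0, 2.2, 1.8],
    "geneC": [3.0, 3.5, 2.5, 4.0, 4.5, 3.5],
}
selected = top_genes(expr, [0, 1, 2], [3, 4, 5], k=2)
print(selected)
```

In practice a ranking like this would usually be followed by a multiple-testing correction, which the sketch omits.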


F1000Research ◽  
2022 ◽  
Vol 9 ◽  
pp. 1159
Author(s):  
Qian (Vicky) Wu ◽  
Wei Sun ◽  
Li Hsu

Gene expression data have been used to infer gene-gene networks (GGN) where an edge between two genes implies the conditional dependence of these two genes given all the other genes. Such gene-gene networks are often referred to as gene regulatory networks since they may reveal expression regulation. Most existing methods for identifying GGN employ penalized regression with an L1 (lasso), L2 (ridge), or elastic net penalty, which spans the range from the L1 to the L2 penalty. However, for high-dimensional gene expression data, a penalty that spans the range between the L0 and L1 penalties, such as the log penalty, is often needed for variable selection consistency. Thus, we develop a novel method that employs the log penalty within the framework of an earlier network identification method, space (Sparse PArtial Correlation Estimation), and implement it in an R package, space-log. We show that space-log is computationally efficient (source code implemented in C) and performs well compared with other methods, particularly for networks with hubs. Space-log is open source and available at GitHub, https://github.com/wuqian77/SpaceLog
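The notion of an edge as conditional dependence can be made concrete with partial correlations. The sketch below is not the space-log algorithm and does not use its R package; it only applies the standard identity that, for a precision (inverse covariance) matrix Ω, the partial correlation between genes i and j given all others is -Ω_ij / sqrt(Ω_ii Ω_jj). The function names and the toy matrix are assumptions made for illustration.

```python
from math import sqrt


def partial_correlations(precision):
    """Convert a precision matrix (list of lists) into partial correlations.

    rho[i][j] = -precision[i][j] / sqrt(precision[i][i] * precision[j][j])
    for i != j; the diagonal is set to 1.0 by convention.
    """
    p = len(precision)
    rho = [[1.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(p):
            if i != j:
                rho[i][j] = -precision[i][j] / sqrt(precision[i][i] * precision[j][j])
    return rho


def edges(rho, threshold=1e-8):
    """List gene pairs (i, j), i < j, whose partial correlation is nonzero."""
    p = len(rho)
    return [(i, j) for i in range(p) for j in range(i + 1, p) if abs(rho[i][j]) > threshold]


# Toy precision matrix (hypothetical) for a chain network 0 - 1 - 2:
# genes 0 and 2 are conditionally independent given gene 1.
precision = [
    [2.0, -1.0, 0.0],
    [-1.0, 2.0, -1.0],
    [0.0, -1.0, 2.0],
]
rho = partial_correlations(precision)
network = edges(rho)
print(network)
```

With real data the precision matrix is unknown and high-dimensional, which is why sparse estimators with a penalty are used to estimate these entries.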


Author(s):  
Guro Dørum ◽  
Lars Snipen ◽  
Margrete Solheim ◽  
Solve Saebo

Gene set analysis methods have become a widely used tool for including prior biological knowledge in the statistical analysis of gene expression data. Advantages of these methods include increased sensitivity, easier interpretation and more conformity in the results. However, gene set methods do not employ all the available information about gene relations. Genes are arranged in complex networks where the network distances contain detailed information about inter-gene dependencies. We propose a method that uses gene networks to smooth gene expression data with the aim of reducing the number of false positives and identifying important subnetworks. Gene dependencies are extracted from the network topology and are used to smooth genewise test statistics. To find the optimal degree of smoothing, we propose using a criterion that considers the correlation between the network and the data. The network smoothing is shown to improve the ability to identify important genes in simulated data. Applied to a real data set, the smoothing accentuates parts of the network with a high density of differentially expressed genes.
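The general idea of smoothing genewise statistics over a network can be sketched as follows. The abstract does not specify the smoothing kernel, so this example assumes one: each gene's statistic is replaced by a weighted average over all reachable genes, with weight `decay ** d` for a gene at shortest-path distance d. The names `bfs_distances`, `smooth_statistics`, and `decay`, and the toy network, are hypothetical.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path distances (in edges) from source to every reachable gene."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def smooth_statistics(adj, stats, decay=0.5):
    """Replace each statistic by a distance-weighted average over the network.

    A gene at distance d contributes with weight decay ** d (an assumed
    kernel); unreachable genes contribute nothing.
    """
    smoothed = {}
    for gene in adj:
        weights = {g: decay ** d for g, d in bfs_distances(adj, gene).items()}
        total = sum(weights.values())
        smoothed[gene] = sum(w * stats[g] for g, w in weights.items()) / total
    return smoothed


# Toy network (hypothetical): chain g1 - g2 - g3 plus an isolated gene g4.
adj = {"g1": ["g2"], "g2": ["g1", "g3"], "g3": ["g2"], "g4": []}
stats = {"g1": 4.0, "g2": 0.0, "g3": 4.0, "g4": 1.0}
smoothed = smooth_statistics(adj, stats, decay=0.5)
print(smoothed)
```

Here `decay` plays the role of the degree of smoothing: values near 0 leave the statistics almost unchanged, while values near 1 average over the whole connected component.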


2021 ◽  
Vol 22 (17) ◽  
pp. 9432
Author(s):  
Laurent A. Winckers ◽  
Chris T. Evelo ◽  
Egon L. Willighagen ◽  
Martina Kutmon

Some engineered nanomaterials incite toxicological effects, but the underlying molecular processes are understudied. The varied physicochemical properties cause different initial molecular interactions, complicating toxicological predictions. Gene expression data allow us to study the responses of genes and biological processes. Overrepresentation analysis identifies enriched biological processes using the experimental data but yields broad results instead of detailed toxicological processes. We demonstrate a targeted filtering approach to compare public gene expression data for low and high exposure on three cell lines to titanium dioxide nanobelts. Our workflow finds cell- and concentration-specific changes in affected pathways linked to four Gene Ontology terms (apoptosis, inflammation, DNA damage, and oxidative stress) to select pathways with a clear toxicity focus. We saw more differentially expressed genes at higher exposure, but our analysis identifies clear differences between the cell lines in affected processes. Colorectal adenocarcinoma cells showed resilience to both concentrations. Small airway epithelial cells displayed a cytotoxic response to the high concentration, but not as strongly as monocytic-like cells. The pathway-gene networks highlighted the gene overlap between altered toxicity-related pathways. The automated workflow is flexible and can focus on other biological processes by selecting other GO terms.
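Overrepresentation analysis is commonly based on a one-sided hypergeometric test. The sketch below shows that calculation in isolation; it is not the authors' workflow, and the function name, argument names, and the numbers in the example are assumptions made for illustration.

```python
from math import comb


def hypergeom_enrichment_p(universe, in_term, selected, overlap):
    """One-sided hypergeometric p-value P(X >= overlap).

    universe: number of genes measured
    in_term:  number of those genes annotated to the pathway or GO term
    selected: number of differentially expressed genes
    overlap:  number of differentially expressed genes annotated to the term
    """
    total = comb(universe, selected)
    upper = min(in_term, selected)
    # math.comb returns 0 when k > n, so impossible draws contribute nothing.
    tail = sum(
        comb(in_term, k) * comb(universe - in_term, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical example: 20 genes measured, 5 annotated to a term,
# 5 differentially expressed, all 5 of which fall in the term.
p_value = hypergeom_enrichment_p(universe=20, in_term=5, selected=5, overlap=5)
print(p_value)
```

Because many terms are tested at once, p-values from such a test are normally adjusted for multiple comparisons before pathways are reported as enriched.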


F1000Research ◽  
2020 ◽  
Vol 9 ◽  
pp. 1159
Author(s):  
Qian (Vicky) Wu ◽  
Wei Sun ◽  
Li Hsu

Gene expression data have been used to infer gene-gene networks (GGN) where an edge between two genes implies the conditional dependence of these two genes given all the other genes. Such gene-gene networks are often referred to as gene regulatory networks since they may reveal expression regulation. Most existing methods for identifying GGN employ penalized regression with an L1 (lasso), L2 (ridge), or elastic net penalty, which spans the range from the L1 to the L2 penalty. However, for high-dimensional gene expression data, a penalty that spans the range between the L0 and L1 penalties, such as the log penalty, is often needed for variable selection consistency. Thus, we develop a novel method that employs the log penalty within the framework of an earlier network identification method, space (Sparse PArtial Correlation Estimation), and implement it in an R package, space-log. We show that space-log is computationally efficient (source code implemented in C) and performs well compared with other methods, particularly for networks with hubs. Space-log is open source and available at GitHub, https://github.com/wuqian77/SpaceLog


2021 ◽  
Author(s):  
Jingyi Zhang ◽  
Farhan Ibrahim ◽  
Doaa Altarawy ◽  
Lenwood S Heath ◽  
Sarah Tulin

Background: Gene regulatory network (GRN) inference can now take advantage of powerful machine learning algorithms to predict the entire landscape of gene-to-gene interactions with the potential to complement traditional experimental methods in building gene networks. However, the dynamical nature of embryonic development -- representing the time-dependent interactions between thousands of transcription factors, signaling molecules, and effector genes -- is one of the most challenging arenas for GRN prediction. Results: In this work, we show that successful GRN predictions for developmental systems from gene expression data alone can be obtained with the Priors Enriched Absent Knowledge (PEAK) network inference algorithm. PEAK is a noise-robust method that models gene expression dynamics via ordinary differential equations and selects the best network based on information-theoretic criteria coupled with the machine learning algorithm Elastic net. We test our GRN prediction methodology using two gene expression data sets for the purple sea urchin (S. purpuratus) and cross-check our results against existing GRN models that have been constructed and validated by over 30 years of experimental results. Our results found a remarkably high degree of sensitivity in identifying known gene interactions in the network (maximum 76.32%). We also generated 838 novel predictions for interactions that have not yet been described, which provide a resource for researchers to use to further complete the sea urchin GRN. Conclusions: GRN predictions that match known gene interactions can be produced using gene expression data alone from developmental time series experiments.
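The evaluation described above, checking predicted interactions against a validated network, can be sketched as a set comparison. This is not the PEAK algorithm and does not reproduce its results: the function names and the toy edge sets are hypothetical, and interactions are assumed to be directed (regulator, target) pairs.

```python
def sensitivity(predicted, known):
    """Fraction of known interactions that were recovered (TP / (TP + FN))."""
    true_positives = len(predicted & known)
    return true_positives / len(known)


def novel_predictions(predicted, known):
    """Predicted interactions that are absent from the reference network."""
    return predicted - known


# Toy edge sets (hypothetical); each edge is a (regulator, target) pair.
known = {("a", "b"), ("b", "c"), ("c", "d"), ("a", "d")}
predicted = {("a", "b"), ("b", "c"), ("c", "d"), ("d", "a"), ("b", "d")}

recall = sensitivity(predicted, known)
novel = novel_predictions(predicted, known)
print(recall, sorted(novel))
```

Sensitivity alone does not penalize false positives, so novel predictions from such a comparison are candidates for experimental follow-up rather than confirmed interactions.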

