ReCodLiver0.9: Overcoming challenges in genome-scale metabolic reconstruction of a non-model species

Abstract: The availability of genome sequences, annotations, and knowledge of the biochemistry underlying metabolic transformations has led to the generation of metabolic network reconstructions for a wide range of organisms in bacteria, archaea, and eukaryotes. When modeled using mathematical representations, a reconstruction can simulate underlying genotype-phenotype relationships. Accordingly, genome-scale metabolic models (GEMs) can be used to predict the response of organisms to genetic and environmental variations. A bottom-up reconstruction procedure typically starts by generating a draft model from existing annotation data on a target organism. For model species, this part of the process can be straightforward, owing to the abundant organism-specific biochemical data. However, the process becomes complicated for non-model, less-annotated species. In this paper, we present a draft liver reconstruction, ReCodLiver0.9, of Atlantic cod (Gadus morhua), a non-model teleost fish, as a practical guide for cases with comparatively few resources. Although the reconstruction is considered a draft version, we show that it already has utility in elucidating metabolic response mechanisms to environmental toxicants by mapping gene expression data from exposure experiments onto the resulting model.

Author summary: Genome-scale metabolic models (GEMs) are constructed from reconstructed networks of the biochemical reactions carried out by an organism. The underlying biochemical knowledge in such networks can be transformed into mathematical models that can serve as a platform for answering biological questions. The availability of high-throughput biological data, including genomics, proteomics, and metabolomics data, supports the generation of such models for a large number of organisms. Nevertheless, challenges arise for non-model species, which are typically less well annotated. In this paper, we discuss these challenges and possible solutions in the context of generating a draft liver reconstruction of Atlantic cod (Gadus morhua). We also show how experimental data, here gene expression data, can be mapped onto the resulting model to understand the metabolic response of cod liver to environmental toxicants.
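Mapping expression data onto a GEM usually passes through each reaction's gene-protein-reaction (GPR) rule. The following is a minimal sketch, assuming the common convention that genes joined by AND (subunits of a complex) contribute their minimum and genes joined by OR (isozymes) their maximum. The function names, gene identifiers and values are hypothetical, and this is not necessarily the mapping used for ReCodLiver0.9.

```python
import re
from typing import Dict, List, Optional


def tokenize(rule: str) -> List[str]:
    # Split a GPR string into parentheses and whitespace-separated words.
    return re.findall(r"[()]|[^\s()]+", rule)


def evaluate_gpr(rule: str, expression: Dict[str, float]) -> Optional[float]:
    """Score a GPR rule: AND -> min (complex subunits), OR -> max (isozymes).

    Genes without a measured value are skipped (a permissive choice);
    returns None when no gene in the rule or sub-expression has a value.
    """
    tokens = tokenize(rule)
    pos = 0

    def parse_or() -> Optional[float]:
        nonlocal pos
        values = [parse_and()]
        while pos < len(tokens) and tokens[pos].lower() == "or":
            pos += 1
            values.append(parse_and())
        known = [v for v in values if v is not None]
        return max(known) if known else None

    def parse_and() -> Optional[float]:
        nonlocal pos
        values = [parse_atom()]
        while pos < len(tokens) and tokens[pos].lower() == "and":
            pos += 1
            values.append(parse_atom())
        known = [v for v in values if v is not None]
        return min(known) if known else None

    def parse_atom() -> Optional[float]:
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token == "(":
            value = parse_or()
            pos += 1  # consume the matching ")"
            return value
        return expression.get(token)

    return parse_or() if tokens else None


expression = {"g1": 5.0, "g2": 2.0, "g3": 8.0}
print(evaluate_gpr("(g1 and g2) or g3", expression))  # 8.0
print(evaluate_gpr("g1 and (g2 or g3)", expression))  # 5.0
```

The resulting per-reaction scores can then be used, for example, to weight or bound reaction fluxes; a stricter variant would return None for an AND clause as soon as any subunit is unmeasured.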

Frontiers in Molecular Biosciences, Vol 7, 2020. doi: 10.3389/fmolb.2020.591406

Author(s): Eileen Marie Hanna, Xiaokang Zhang, Marta Eide, Shirin Fallahi, Tomasz Furmanek, ...

Keyword(s): Gadus Morhua, Metabolic Response, Atlantic Cod, Metabolic Reconstruction, Environmental Toxicants, Model Species, Reconstruction Procedure, Wide Range, Draft Version, Genome Scale

Graph Convolutional Network for Drug Response Prediction Using Gene Expression Data

Mathematics, Vol 9 (7), pp. 772, 2021. doi: 10.3390/math9070772

Author(s): Seonghun Kim, Seockhun Bae, Yinhua Piao, Kyuri Jo

Keyword(s): Gene Expression, Gene Expression Data, Large Scale, Drug Response, Response Prediction, Biological Data, Expression Data, Convolutional Network, Essential Information, Protein Protein Interaction

Genomic profiles of cancer patients, such as gene expression profiles, have become a major source for predicting drug responses in the era of personalized medicine. As large-scale drug screening data from cancer cell lines have become available, a number of computational methods have been developed for drug response prediction. However, few methods incorporate both gene expression data and the biological network, which can harbor essential information about the underlying process of the drug response. We proposed an analysis framework called DrugGCN for prediction of Drug response using a Graph Convolutional Network (GCN). DrugGCN first generates a gene graph by combining a Protein-Protein Interaction (PPI) network and gene expression data with feature selection of drug-related genes, and the GCN model then detects local features, such as subnetworks of genes that contribute to the drug response, by localized filtering. We demonstrated the effectiveness of DrugGCN on biological data, where it showed high prediction accuracy compared with competing methods.
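To make "localized filtering" on a gene graph concrete, here is a minimal sketch of one graph-convolution step using the first-order propagation rule ReLU(D^-1/2 (A + I) D^-1/2 H W) of Kipf and Welling. This rule is swapped in for illustration only; DrugGCN's own filter may differ (for example, a Chebyshev-polynomial filter), its weights would be learned during training, and the graph, features and weights below are toy values.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    # Plain nested-list matrix product (rows of a times columns of b).
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def normalized_adjacency(adj: Matrix) -> Matrix:
    """Return D^-1/2 (A + I) D^-1/2 for an undirected gene graph A."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    degree = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)]
            for i in range(n)]


def gcn_layer(adj: Matrix, features: Matrix, weights: Matrix) -> Matrix:
    """One graph-convolution step, ReLU(A_norm @ H @ W): each gene mixes its
    own features with those of its direct neighbours."""
    propagated = matmul(normalized_adjacency(adj), matmul(features, weights))
    return [[max(0.0, x) for x in row] for row in propagated]


# Toy PPI-style graph: gene 0 - gene 1 - gene 2, one expression feature per gene.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
features = [[1.0], [2.0], [3.0]]
print(gcn_layer(adj, features, [[1.0]]))
```

Stacking such layers widens each gene's receptive field by one hop per layer, which is what lets a GCN pick up subnetwork-level signals rather than single-gene ones.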

HisCoM-PAGE: Hierarchical Structural Component Models for Pathway Analysis of Gene Expression Data

Genes, Vol 10 (11), pp. 931, 2019. doi: 10.3390/genes10110931. Cited by ~4

Author(s): Mok, Kim, Lee, Choi, Lee, ...

Keyword(s): Gene Expression, Pancreatic Cancer, Gene Expression Data, Pathway Analysis, Structural Component, Biological Data, Gene Set Enrichment Analysis, Expression Data, Global Test, Causal Pathways

Although several analyses have identified cancer-associated pathways from gene expression data, most of these are single-pathway analyses and thus do not consider correlations between pathways. In this paper, we propose a hierarchical structural component model for pathway analysis of gene expression data (HisCoM-PAGE), which accounts for the hierarchical structure of genes and pathways, as well as the correlations among pathways. Specifically, HisCoM-PAGE focuses on the survival phenotype and identifies its associated pathways. Moreover, its application to pancreatic cancer data demonstrated that HisCoM-PAGE could successfully identify pathways associated with pancreatic cancer prognosis. Simulation studies comparing the performance of HisCoM-PAGE with other competing methods, such as Gene Set Enrichment Analysis (GSEA), Global Test, and Wald-type test, showed that HisCoM-PAGE had the highest power to detect causal pathways in most simulation scenarios.
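The gene-to-pathway-to-phenotype hierarchy can be illustrated with a forward pass: gene weights produce latent pathway scores, and pathway coefficients combine those scores into a single linear predictor. The sketch below only evaluates that hierarchy for given weights; estimating the weights, which is the core of the method, is not shown. Plugging the predictor into a Cox partial likelihood is an assumption made here for the survival phenotype, and the paper's actual survival formulation may differ. All names and values are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pathway_scores(expr: Dict[str, float], pathways: Dict[str, List[str]],
                   gene_weights: Dict[Tuple[str, str], float]) -> Dict[str, float]:
    """Gene level -> pathway level: each pathway score is a weighted sum of its
    genes. Weights are keyed by (pathway, gene) so a gene shared by several
    pathways can contribute differently to each one."""
    return {p: sum(gene_weights[(p, g)] * expr[g] for g in genes)
            for p, genes in pathways.items()}


def linear_predictor(scores: Dict[str, float], pathway_coefs: Dict[str, float]) -> float:
    """Pathway level -> phenotype level: weighted sum of pathway scores."""
    return sum(pathway_coefs[p] * s for p, s in scores.items())


def cox_partial_log_likelihood(times: Sequence[float], events: Sequence[int],
                               eta: Sequence[float]) -> float:
    """Cox partial log-likelihood, assuming no tied event times.

    events[i] is 1 for an observed event and 0 for a censored sample."""
    ll = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if d_i:
            risk_set = sum(math.exp(eta[j]) for j, t_j in enumerate(times) if t_j >= t_i)
            ll += eta[i] - math.log(risk_set)
    return ll


expr = {"a": 1.0, "b": 2.0, "c": 3.0}
pathways = {"P1": ["a", "b"], "P2": ["b", "c"]}
weights = {("P1", "a"): 0.5, ("P1", "b"): 0.5, ("P2", "b"): 1.0, ("P2", "c"): -1.0}
scores = pathway_scores(expr, pathways, weights)  # {'P1': 1.5, 'P2': -1.0}
print(linear_predictor(scores, {"P1": 2.0, "P2": 1.0}))  # 2.0
```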

Clustering Genes Using Heterogeneous Data Sources

International Journal of Knowledge Discovery in Bioinformatics, Vol 1 (2), pp. 12-28, 2010. doi: 10.4018/jkdb.2010040102. Cited by ~3

Author(s): Erliang Zeng, Chengyong Yang, Tao Li, Giri Narasimhan

Keyword(s): Gene Expression, Gene Expression Data, Incomplete Data, Clustering Algorithm, Biological Data, Exploratory Analysis, Data Sources, Modular Organization, Constrained Clustering, Expression Data

Clustering of gene expression data is a standard exploratory technique used to identify closely related genes. Many other sources of data are also likely to be of great assistance in the analysis of gene expression data. These data provide a means to begin elucidating the large-scale modular organization of the cell. The authors consider the challenging task of developing exploratory analytical techniques that deal with multiple complete and incomplete information sources. The Multi-Source Clustering (MSC) algorithm they developed performs clustering with multiple, but complete, sources of data. To deal with incomplete data sources, the authors adopted the MPCK-means clustering algorithm to perform exploratory analysis on one complete source and other potentially incomplete sources provided in the form of constraints. This paper presents MSC, a new clustering algorithm for exploratory analysis using two or more diverse but complete data sources; studies the effectiveness of constraint sets and the robustness of the constrained clustering algorithm using multiple sources of incomplete biological data; and incorporates such incomplete data into a constrained clustering algorithm in the form of constraint sets.
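To show how pairwise constraints from an incomplete source can steer clustering of a complete source, here is a minimal sketch of COP-KMeans (Wagstaff et al., 2001), a simpler constrained k-means swapped in for illustration. It is not MPCK-means, which additionally learns a distance metric and treats constraints as soft penalties. The function names, toy points and naive initialisation are hypothetical choices.

```python
import math
from typing import List, Optional, Sequence, Tuple

Pair = Tuple[int, int]


def _partner(i: int, pair: Pair) -> Optional[int]:
    # Return the other index of a constraint pair if it involves point i.
    a, b = pair
    if a == i:
        return b
    if b == i:
        return a
    return None


def violates(i: int, cluster: int, labels: List[Optional[int]],
             must_link: Sequence[Pair], cannot_link: Sequence[Pair]) -> bool:
    """True if putting point i in `cluster` breaks a constraint with an
    already-assigned point."""
    for pair in must_link:
        j = _partner(i, pair)
        if j is not None and labels[j] is not None and labels[j] != cluster:
            return True
    for pair in cannot_link:
        j = _partner(i, pair)
        if j is not None and labels[j] == cluster:
            return True
    return False


def cop_kmeans(points: Sequence[Sequence[float]], k: int,
               must_link: Sequence[Pair] = (), cannot_link: Sequence[Pair] = (),
               max_iter: int = 100) -> Optional[List[int]]:
    """k-means whose assignment step respects hard must-link/cannot-link
    constraints. Returns None if some point has no feasible cluster."""
    centroids = [list(p) for p in points[:k]]  # naive init: first k points
    labels: List[Optional[int]] = [None] * len(points)
    for _ in range(max_iter):
        new_labels: List[Optional[int]] = [None] * len(points)
        for i, p in enumerate(points):
            # Try clusters from nearest to farthest until one is feasible.
            for c in sorted(range(k), key=lambda cl: math.dist(p, centroids[cl])):
                if not violates(i, c, new_labels, must_link, cannot_link):
                    new_labels[i] = c
                    break
            else:
                return None
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(k):
            members = [points[i] for i in range(len(points)) if labels[i] == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


points = [[0, 0], [0, 1], [10, 10], [10, 11]]
print(cop_kmeans(points, 2))                        # [0, 0, 1, 1]
print(cop_kmeans(points, 2, cannot_link=[(0, 1)]))  # [0, 1, 1, 1]
```

The second call shows that a single cannot-link constraint can override the geometry of the complete source, which is the intended effect but also why constraint quality matters.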

Integration of gene expression data into genome-scale metabolic models

Metabolic Engineering, Vol 6 (4), pp. 285-293, 2004. doi: 10.1016/j.ymben.2003.12.002. Cited by ~136

Author(s): Mats Åkesson, Jochen Förster, Jens Nielsen

Keyword(s): Gene Expression, Gene Expression Data, Expression Data, Metabolic Models, Genome Scale

A Novel Biomarker Identification Approach for Gastric Cancer Using Gene Expression and DNA Methylation Dataset

Frontiers in Genetics, Vol 12, 2021. doi: 10.3389/fgene.2021.644378

Author(s): Ge Zhang, Zijing Xue, Chaokun Yan, Jianlin Wang, Huimin Luo

Keyword(s): Gene Expression, Gastric Cancer, DNA Methylation, Feature Selection, Gene Expression Data, Complex Disease, Biological Data, Computational Method, Superior Performance, Expression Data

As a complex disease, gastric cancer has a high mortality rate, and there are few effective treatments for patients at an advanced stage. With the development of biological technology, large amounts of multi-omics data on gastric cancer have been generated, enabling computational methods to discover potential biomarkers of gastric cancer. Such biomarkers are very important for detecting gastric cancer at earlier stages and thus for providing timely treatment. However, most biological data are high-dimensional with small sample sizes, which makes them hard to process directly without feature selection. Moreover, using only one type of omic data, such as gene expression data, provides limited evidence for investigating gastric cancer-associated biomarkers. In this research, gene expression data and DNA methylation data are integrated to analyze gastric cancer, and a feature selection approach is proposed to identify possible biomarkers of gastric cancer. After the original data are pre-processed, mutual information (MI) is applied to select top-ranked genes. Then, fold change (FC) and the t-test are adopted to identify differentially expressed genes (DEGs). In particular, the false discovery rate (FDR) is used to adjust the p-values and further screen genes. For the chosen genes, a deep neural network (DNN) classifier is used to measure the quality of classification. The experimental results show that the approach can achieve superior performance in terms of accuracy and other metrics. Biological analysis of the chosen genes further validates the effectiveness of the approach.
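The FC and FDR screening step can be sketched as follows, assuming the t-test p-values have already been computed (for example, with a statistics library). The Benjamini-Hochberg procedure is used as the FDR adjustment, and the pseudocount, thresholds and function names are hypothetical; the MI pre-filter and the DNN classifier are not shown.

```python
import math
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def log2_fold_change(case: Sequence[float], control: Sequence[float],
                     pseudocount: float = 1.0) -> float:
    """log2 ratio of group means; the pseudocount avoids division by zero."""
    mean_case = sum(case) / len(case)
    mean_control = sum(control) / len(control)
    return math.log2((mean_case + pseudocount) / (mean_control + pseudocount))


def select_degs(p_values: Sequence[float], fold_changes: Sequence[float],
                q_cutoff: float = 0.05, lfc_cutoff: float = 1.0) -> List[int]:
    """Indices of genes passing both the FDR and the |log2 FC| thresholds."""
    q_values = benjamini_hochberg(p_values)
    return [i for i, (q, lfc) in enumerate(zip(q_values, fold_changes))
            if q <= q_cutoff and abs(lfc) >= lfc_cutoff]


print(select_degs([0.01, 0.04, 0.03, 0.20], [2.0, 3.0, -1.5, 0.1]))  # [0]
```

In this toy input, gene 1 has a raw p-value of 0.04 but an adjusted value of about 0.053, so it is dropped by the FDR screen despite its large fold change.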

A new GRASP metaheuristic for biclustering of gene expression data

2016. doi: 10.7287/peerj.preprints.1679v1

Author(s): Daniele Ferone, Angelo Facchiano, Anna Marabotti, Paola Festa

Keyword(s): Gene Expression, Local Search, Gene Expression Data, Spanning Trees, Complete Solution, Optimal Solution, Biological Data, Data Matrix, Expression Data, Local Search Procedure

The term biclustering refers to the simultaneous clustering of both genes and conditions. This task has generated considerable interest over the past few decades, particularly in relation to the analysis of high-dimensional gene expression data in information retrieval, knowledge discovery, and data mining [1]. Since the problem has been shown to be NP-complete, we have recently designed and implemented a GRASP metaheuristic [2,3,4]. The greedy criterion used in the construction phase uses the Euclidean distance to build spanning trees of the graph representing the input data matrix. Once a complete solution is obtained, the local search procedure tries both to enlarge the current solution and to improve its H-score by exchanging rows and columns. The proposed approach has been tested on 5 synthetic datasets [5]: 1) constant biclusters; 2) constant, upregulated biclusters; 3) shift-scale biclusters; 4) shift biclusters; and 5) scale biclusters. Compared with state-of-the-art competitors, its behaviour is excellent on shift datasets and very good on all other datasets except the scaled ones. To improve its behaviour on scaled data as well and to reduce running times, we have designed and preliminarily tested a variant of the existing GRASP whose local search phase returns an approximate locally optimal solution. The resulting algorithm promises to be a more efficient, general, and robust method for biclustering all kinds of biological data.
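In the biclustering literature, the H-score that such a local search tries to lower is usually Cheng and Church's mean squared residue. The sketch below assumes that definition; the function name and toy matrices are hypothetical.

```python
from typing import List, Sequence


def mean_squared_residue(matrix: Sequence[Sequence[float]],
                         rows: Sequence[int], cols: Sequence[int]) -> float:
    """Cheng-Church mean squared residue H(I, J) of the bicluster (rows, cols)."""
    sub: List[List[float]] = [[matrix[i][j] for j in cols] for i in rows]
    n_rows, n_cols = len(sub), len(sub[0])
    row_means = [sum(r) / n_cols for r in sub]
    col_means = [sum(sub[i][j] for i in range(n_rows)) / n_rows for j in range(n_cols)]
    overall = sum(row_means) / n_rows
    # Residue of a_ij after removing its row effect, column effect and the overall mean.
    total = sum((sub[i][j] - row_means[i] - col_means[j] + overall) ** 2
                for i in range(n_rows) for j in range(n_cols))
    return total / (n_rows * n_cols)


shift = [[1, 2, 3], [3, 4, 5]]  # second row = first row + 2
scale = [[1, 2, 3], [2, 4, 6]]  # second row = first row * 2
print(mean_squared_residue(shift, [0, 1], [0, 1, 2]))  # 0.0
print(mean_squared_residue(scale, [0, 1], [0, 1, 2]))  # about 0.1667
```

A purely additive (shift) pattern scores zero while a multiplicative (scale) pattern does not, which is one commonly cited reason why H-score-driven methods find scaled biclusters harder.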

Computational Models for the Analysis of Modern Biological Data

Handbook of Research on Systems Biology Applications in Medicine, pp. 117-125, 2009. doi: 10.4018/978-1-60566-076-9.ch006

Author(s): Tuan D. Pham

Keyword(s): Gene Expression, Gene Expression Data, Computational Models, Biological Data, Computational Techniques, Expression Data, Microarray Gene Expression, Proteomic Data, Microarray Gene, Increasing Demand

Computational models have been playing a significant role in the computer-based analysis of biological and biomedical data. Given the recent availability of genomic sequences, microarray gene expression data, and proteomic data, there is an increasing demand for developing and applying advanced computational techniques for exploring these types of data, such as: functional interpretation of gene expression data; deciphering how genes and proteins work together in pathways and networks; and extracting and analysing phenotypic features of mitotic cells for high-throughput screening of novel anti-mitotic drugs. Successful applications of advanced computational algorithms to modern life-science problems will have a significant impact on several important and promising issues related to genomic medicine, molecular imaging, and the scientific knowledge of the genetic basis of diseases. This chapter reviews the fusion of engineering, computer science, and information sciences with biology and medicine to address some of the latest technical developments in the computational analysis of modern biological data: microarray gene expression data, mass spectrometry data, and bioimaging.

A dynamic programing approach to integrate gene expression data and network information for pathway model generation

Bioinformatics, Vol 36 (1), pp. 169-176, 2019. doi: 10.1093/bioinformatics/btz467. Cited by ~3

Author(s): Yuexu Jiang, Yanchun Liang, Duolin Wang, Dong Xu, Trupti Joshi

Keyword(s): Gene Expression, Lung Cancer, Gene Expression Data, Biological Data, Human Lung Cancer, Supplementary Information, Model Generation, Expression Data, Pathway Model, Dynamic Programing

Motivation: As large amounts of biological data continue to be rapidly generated, a major focus of bioinformatics research has been integrating these data to identify active pathways or modules under certain experimental conditions or phenotypes. Although biologically significant modules can often be detected globally by many existing methods, the results are often hard to interpret or to use for pathway model generation and testing.

Results: To address this gap, we have developed the IMPRes algorithm, a new step-wise active pathway detection method using a dynamic programming approach. IMPRes takes advantage of the existing pathway interaction knowledge in the Kyoto Encyclopedia of Genes and Genomes (KEGG). Omics data are then used to assign penalties to genes, interactions and pathways. Finally, starting from one or multiple seed genes, a shortest path algorithm is applied to detect downstream pathways that best explain the gene expression data. Since dynamic programming enables detection one step at a time, researchers can easily trace the pathways, which may lead to more accurate drug design and more effective treatment strategies. Evaluation experiments on three yeast datasets have shown that IMPRes can achieve competitive or better performance than other state-of-the-art methods. Furthermore, a case study on a human lung cancer dataset provided several insights into genes and mechanisms involved in lung cancer that had not been discovered before.

Availability and implementation: The IMPRes visualization tool is available via a web server at http://digbio.missouri.edu/impres.

Supplementary information: Supplementary data are available at Bioinformatics online.
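IMPRes itself uses a step-wise dynamic programming search over KEGG interactions. To illustrate the underlying idea of penalised genes and interactions explored from seed genes, the sketch below swaps in a plain multi-source Dijkstra search. The graph, penalty values and function names are hypothetical, and the way IMPRes converts omics data into penalties is not reproduced; here a low penalty simply stands for a strongly perturbed gene.

```python
import heapq
import math
from typing import Dict, Iterable, List, Optional, Tuple


def penalized_shortest_paths(graph: Dict[str, List[str]], node_penalty: Dict[str, float],
                             seeds: Iterable[str],
                             edge_penalty: Optional[Dict[Tuple[str, str], float]] = None):
    """Multi-source Dijkstra. Stepping u -> v costs edge_penalty[(u, v)] (default 0)
    plus node_penalty[v] (default 0); seeds start at their own node penalty.
    Returns (distance, parent) maps for every reachable gene."""
    edge_penalty = edge_penalty or {}
    dist: Dict[str, float] = {s: node_penalty.get(s, 0.0) for s in seeds}
    parent: Dict[str, Optional[str]] = {s: None for s in dist}
    heap = [(d, s) for s, d in dist.items()]
    heapq.heapify(heap)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v in graph.get(u, []):
            nd = d + edge_penalty.get((u, v), 0.0) + node_penalty.get(v, 0.0)
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                parent[v] = u
                heapq.heappush(heap, (nd, v))
    return dist, parent


def trace_path(parent: Dict[str, Optional[str]], target: str) -> List[str]:
    """Follow parent links back to a seed and return the path seed -> target."""
    path: List[str] = []
    node: Optional[str] = target
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]


graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"]}
penalty = {"A": 0.0, "B": 0.9, "C": 0.1, "D": 0.2}
dist, parent = penalized_shortest_paths(graph, penalty, ["A"])
print(trace_path(parent, "D"))  # ['A', 'C', 'D']
```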

Application of Biological Domain Knowledge Based Feature Selection on Gene Expression Data

Entropy, Vol 23 (1), pp. 2, 2020. doi: 10.3390/e23010002

Author(s): Malik Yousef, Abhishek Kumar, Burcu Bakir-Gungor

Keyword(s): Gene Expression, Feature Selection, Data Analysis, Gene Expression Data, Gene Selection, Biological Data, Biological Information, Background Information, Biological Knowledge, Expression Data

In the last two decades, there have been massive advancements in high-throughput technologies, which have resulted in the exponential growth of public repositories of gene expression datasets for various phenotypes. It is possible to unravel biomarkers by comparing gene expression levels under different conditions, such as disease vs. control, treated vs. not treated, drug A vs. drug B, etc. This corresponds to a well-studied problem in machine learning, namely the feature selection problem. In biological data analysis, most computational feature selection methodologies were taken from other fields without considering the nature of biological data. Thus, integrative approaches that utilize biological knowledge while performing feature selection are necessary for this kind of data. The main idea behind the integrative gene selection process is to generate a ranked list of genes considering both the statistical metrics applied to the gene expression data and the biological background information provided as external datasets. One of the main goals of this review is to explore the existing methods that integrate different types of information in order to improve the identification of the biomolecular signatures of diseases and the discovery of new potential targets for treatment. These integrative approaches are expected to aid the prediction, diagnosis, and treatment of diseases, as well as to shed light on disease-state dynamics and the mechanisms of disease onset and progression. The integration of various types of biological information will necessitate the development of novel techniques for integration and data analysis. Another aim of this review is to encourage the bioinformatics community to develop new approaches for searching and determining significant groups/clusters of features based on one or more biological grouping functions.
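As a toy illustration of combining a per-gene statistic with biological grouping information, the sketch below computes Welch's t-statistic for each gene and then ranks groups (for example, pathways) by the mean absolute t of their measured members. This mean-|t| group score is a hypothetical choice for illustration, not a specific method from the review; the gene and group names are made up.

```python
import math
from statistics import mean, variance
from typing import Dict, List, Sequence, Tuple


def welch_t(case: Sequence[float], control: Sequence[float]) -> float:
    """Welch's t-statistic for a difference in means with unequal variances.
    Each group needs at least two values, and not both variances may be zero."""
    se = math.sqrt(variance(case) / len(case) + variance(control) / len(control))
    return (mean(case) - mean(control)) / se


def rank_groups(case: Dict[str, Sequence[float]], control: Dict[str, Sequence[float]],
                groups: Dict[str, List[str]]) -> List[Tuple[str, float]]:
    """Score each biological group by the mean |t| of its measured member
    genes and return groups sorted from highest to lowest score."""
    abs_t = {g: abs(welch_t(case[g], control[g])) for g in case if g in control}
    scores = {}
    for name, genes in groups.items():
        measured = [abs_t[g] for g in genes if g in abs_t]
        if measured:  # skip groups with no measured genes
            scores[name] = mean(measured)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


case = {"g1": [5, 6, 7], "g2": [1, 2, 3], "g3": [3, 4, 5]}
control = {"g1": [1, 2, 3], "g2": [1, 2, 3], "g3": [1, 2, 3]}
groups = {"P1": ["g1", "g3"], "P2": ["g2", "g3"]}
print(rank_groups(case, control, groups)[0][0])  # P1
```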
