Persistent homology analysis of brain transcriptome data in autism

Persistent homology methods have found applications in the analysis of multiple types of biological data, particularly imaging data or data with a spatial and/or temporal component. However, few studies have assessed the use of persistent homology for the analysis of gene expression data. Here we apply persistent homology methods to investigate the global properties of gene expression in post-mortem brain tissue (cerebral cortex) of individuals with autism spectrum disorders (ASD) and matched controls. We observe a significant difference in the geometry of inter-sample relationships between autism and healthy controls as measured by the sum of the death times of zero-dimensional components and the Euler characteristic. This observation is replicated across two distinct datasets, and we interpret it as evidence for an increased heterogeneity of gene expression in autism. We also assessed the topology of gene-level point clouds and did not observe significant differences between ASD and control transcriptomes, suggesting that the overall transcriptome organization is similar in ASD and healthy cerebral cortex. Overall, our study provides a novel framework for persistent homology analyses of gene expression data for genetically complex disorders.
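One of the inter-sample statistics named in the abstract, the sum of the death times of zero-dimensional components, can be illustrated with a minimal sketch. It assumes a Vietoris-Rips filtration on Euclidean distances between samples, where each zero-dimensional class dies when two components merge, so the finite death times coincide with the edge weights of a minimum spanning tree (some conventions scale them by one half). The function name and the toy points are hypothetical, and this is not the authors' pipeline.

```python
import math
from itertools import combinations

def h0_death_times(points):
    """Finite H0 death times of a Vietoris-Rips filtration, via Kruskal's algorithm."""
    n = len(points)
    parent = list(range(n))  # union-find forest over the samples

    def find(i):
        # Follow parent links to the component root, halving paths on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise distances, processed in increasing order.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    deaths = []
    for distance, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            # Two components merge here, so one H0 class dies at this scale.
            parent[root_i] = root_j
            deaths.append(distance)
    return deaths

# Toy "samples" in a two-dimensional expression space (made-up values).
samples = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
print(sum(h0_death_times(samples)))
```

Under this reading, a larger sum for one group of samples means its points lie further apart in expression space, which is consistent with the heterogeneity interpretation given in the abstract.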


Graph Convolutional Network for Drug Response Prediction Using Gene Expression Data

Mathematics ◽

10.3390/math9070772 ◽

2021 ◽

Vol 9 (7) ◽

pp. 772

Author(s):

Seonghun Kim ◽

Seockhun Bae ◽

Yinhua Piao ◽

Kyuri Jo

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Large Scale ◽

Drug Response ◽

Response Prediction ◽

Biological Data ◽

Expression Data ◽

Convolutional Network ◽

Essential Information ◽

Protein Protein Interaction

Genomic profiles of cancer patients such as gene expression have become a major source to predict responses to drugs in the era of personalized medicine. As large-scale drug screening data with cancer cell lines are available, a number of computational methods have been developed for drug response prediction. However, few methods incorporate both gene expression data and the biological network, which can harbor essential information about the underlying process of the drug response. We proposed an analysis framework called DrugGCN for prediction of Drug response using a Graph Convolutional Network (GCN). DrugGCN first generates a gene graph by combining a Protein-Protein Interaction (PPI) network and gene expression data with feature selection of drug-related genes, and the GCN model detects the local features such as subnetworks of genes that contribute to the drug response by localized filtering. We demonstrated the effectiveness of DrugGCN using biological data showing its high prediction accuracy among the competing methods.
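To make the idea of graph convolution on a gene graph concrete, the following is a minimal sketch of a single propagation step. It assumes the widely used first-order rule with a symmetrically normalized adjacency matrix and self-loops; the abstract does not say which filter DrugGCN uses, so this is an illustration of localized filtering in general rather than the published model. All names and values are hypothetical.

```python
import math

def normalized_adjacency(adj):
    """Return D^(-1/2) (A + I) D^(-1/2) for a symmetric adjacency matrix."""
    n = len(adj)
    with_loops = [
        [adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]
    degree = [sum(row) for row in with_loops]
    return [
        [with_loops[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)]
        for i in range(n)
    ]

def matmul(a, b):
    """Plain nested-list matrix product."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

def gcn_layer(adj, features, weights):
    """One step: ReLU(normalized_adjacency @ features @ weights)."""
    propagated = matmul(normalized_adjacency(adj), features)
    return [[max(0.0, value) for value in row] for row in matmul(propagated, weights)]

# Two interacting genes, one expression feature each, and a 1x1 weight matrix.
gene_graph = [[0.0, 1.0], [1.0, 0.0]]
expression = [[1.0], [3.0]]
print(gcn_layer(gene_graph, expression, [[1.0]]))
```

Each gene's output mixes its own expression with that of its direct neighbours in the graph, which is how such a layer picks up local subnetwork features.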


HisCoM-PAGE: Hierarchical Structural Component Models for Pathway Analysis of Gene Expression Data

Genes ◽

10.3390/genes10110931 ◽

2019 ◽

Vol 10 (11) ◽

pp. 931 ◽

Cited By ~ 4

Author(s):

Mok ◽

Kim ◽

Lee ◽

Choi ◽

Lee ◽

...

Keyword(s):

Gene Expression ◽

Pancreatic Cancer ◽

Gene Expression Data ◽

Pathway Analysis ◽

Structural Component ◽

Biological Data ◽

Gene Set Enrichment Analysis ◽

Expression Data ◽

Global Test ◽

Causal Pathways

Although there have been several analyses for identifying cancer-associated pathways, based on gene expression data, most of these are based on single pathway analyses, and thus do not consider correlations between pathways. In this paper, we propose a hierarchical structural component model for pathway analysis of gene expression data (HisCoM-PAGE), which accounts for the hierarchical structure of genes and pathways, as well as the correlations among pathways. Specifically, HisCoM-PAGE focuses on the survival phenotype and identifies its associated pathways. Moreover, its application to real biological data analysis of pancreatic cancer data demonstrated that HisCoM-PAGE could successfully identify pathways associated with pancreatic cancer prognosis. Simulation studies comparing the performance of HisCoM-PAGE with other competing methods such as Gene Set Enrichment Analysis (GSEA), Global Test, and Wald-type Test showed HisCoM-PAGE to have the highest power to detect causal pathways in most simulation scenarios.


Comparing deep belief networks with support vector machines for classifying gene expression data from complex disorders

FEBS Open Bio ◽

10.1002/2211-5463.12652 ◽

2019 ◽

Vol 9 (7) ◽

pp. 1232-1248 ◽

Cited By ~ 3

Author(s):

Johannes Smolander ◽

Matthias Dehmer ◽

Frank Emmert‐Streib

Keyword(s):

Gene Expression ◽

Support Vector Machines ◽

Gene Expression Data ◽

Support Vector ◽

Belief Networks ◽

Expression Data ◽

Deep Belief Networks ◽

Complex Disorders ◽

Vector Machines


SFARI Genes and where to find them; classification modelling to identify genes associated with Autism Spectrum Disorder from RNA-seq data

10.1101/2021.01.29.428754 ◽

2021 ◽

Author(s):

Magdalena Navarro ◽

T Ian Simpson

Keyword(s):

Gene Expression ◽

Autism Spectrum Disorder ◽

Differential Gene Expression ◽

Gene Expression Data ◽

Gene List ◽

Autism Spectrum ◽

Spectrum Disorder ◽

Expression Data ◽

Link Type ◽

Differential Gene

Motivation: Autism spectrum disorder (ASD) has a strong, yet heterogeneous, genetic component. Among the various methods that are being developed to help reveal the underlying molecular aetiology of the disease, one that is gaining popularity is the combination of gene expression and clinical genetic data. For ASD, the SFARI-gene database comprises lists of curated genes in which presumed causative mutations have been identified in patients. In order to predict novel candidate SFARI-genes we built classification models combining differential gene expression data for ASD patients and unaffected individuals with a gene’s status in the SFARI-gene list. Results: SFARI-genes were not found to be significantly associated with differential gene expression patterns, nor were they enriched in gene co-expression network modules that had a strong correlation with ASD diagnosis. However, network analysis and machine learning models that incorporate information from the whole gene co-expression network were able to predict novel candidate genes that share features of existing SFARI genes and have support for roles in ASD in the literature. We found a statistically significant bias related to the absolute level of gene expression for existing SFARI genes and their scores. It is essential that this bias be taken into account when studies interpret ASD gene expression data at gene, module and whole-network levels. Availability: Source code is available from GitHub (https://doi.org/10.5281/zenodo.4463693) and the accompanying data from The University of Edinburgh DataStore (https://doi.org/10.7488/ds/2980). Contact: [email protected]


Clustering Genes Using Heterogeneous Data Sources

International Journal of Knowledge Discovery in Bioinformatics ◽

10.4018/jkdb.2010040102 ◽

2010 ◽

Vol 1 (2) ◽

pp. 12-28 ◽

Cited By ~ 3

Author(s):

Erliang Zeng ◽

Chengyong Yang ◽

Tao Li ◽

Giri Narasimhan

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Incomplete Data ◽

Clustering Algorithm ◽

Biological Data ◽

Exploratory Analysis ◽

Data Sources ◽

Modular Organization ◽

Constrained Clustering ◽

Expression Data

Clustering of gene expression data is a standard exploratory technique used to identify closely related genes. Many other sources of data are also likely to be of great assistance in the analysis of gene expression data. Together, these data provide a means to begin elucidating the large-scale modular organization of the cell. The authors consider the challenging task of developing exploratory analytical techniques to deal with multiple complete and incomplete information sources. The Multi-Source Clustering (MSC) algorithm they developed performs clustering with multiple, but complete, sources of data. To deal with incomplete data sources, the authors adopted the MPCK-means clustering algorithm to perform exploratory analysis on one complete source and other potentially incomplete sources provided in the form of constraints. This paper presents a new clustering algorithm, MSC, to perform exploratory analysis using two or more diverse but complete data sources, studies the effectiveness of constraint sets and the robustness of the constrained clustering algorithm using multiple sources of incomplete biological data, and incorporates such incomplete data into the constrained clustering algorithm in the form of constraint sets.


A Novel Biomarker Identification Approach for Gastric Cancer Using Gene Expression and DNA Methylation Dataset

Frontiers in Genetics ◽

10.3389/fgene.2021.644378 ◽

2021 ◽

Vol 12 ◽

Author(s):

Ge Zhang ◽

Zijing Xue ◽

Chaokun Yan ◽

Jianlin Wang ◽

Huimin Luo

Keyword(s):

Gene Expression ◽

Gastric Cancer ◽

Dna Methylation ◽

Feature Selection ◽

Gene Expression Data ◽

Complex Disease ◽

Biological Data ◽

Computational Method ◽

Superior Performance ◽

Expression Data

As a complex disease, gastric cancer has a high mortality rate, and there are few effective treatments for patients at an advanced stage. With the development of biological technology, a large amount of multi-omics data on gastric cancer has been generated, which enables computational methods to discover potential biomarkers of gastric cancer. Such biomarkers will be very important for detecting gastric cancer at earlier stages and thus for providing timely treatment. However, most biological data are high-dimensional with a low sample size, and are hard to process directly without feature selection. Besides, using only some omics data, such as gene expression data, provides limited evidence for investigating gastric cancer-associated biomarkers. In this research, gene expression data and DNA methylation data are integrated to analyze gastric cancer, and a feature selection approach is proposed to identify possible biomarkers of gastric cancer. After the original data are pre-processed, mutual information (MI) is applied to select the top genes. Then, fold change (FC) and the t-test are adopted to identify differentially expressed genes (DEGs). In particular, the false discovery rate (FDR) is introduced to adjust the p-values and further screen genes. For the chosen genes, a deep neural network (DNN) model is used as the classifier to measure the quality of classification. The experimental results show that the approach can achieve superior performance in terms of accuracy and other metrics. Biological analysis of the chosen genes further validates the effectiveness of the approach.
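The FDR step in the screening pipeline above can be sketched as follows. The abstract does not state which FDR procedure is used, so this example assumes the common Benjamini-Hochberg adjustment; the function name and the p-values are made up for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted

# Hypothetical per-gene p-values from a t-test.
raw = [0.01, 0.04, 0.03, 0.5]
adjusted = benjamini_hochberg(raw)
selected = [i for i, q in enumerate(adjusted) if q < 0.05]
print(adjusted, selected)
```

Genes whose adjusted value falls below a chosen threshold (0.05 here, an arbitrary choice) would be kept for the classifier.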


ReCodLiver0.9: Overcoming challenges in genome-scale metabolic reconstruction of a non-model species

10.1101/2020.06.23.162792 ◽

2020 ◽

Cited By ~ 1

Author(s):

Eileen Marie Hanna ◽

Xiaokang Zhang ◽

Marta Eide ◽

Shirin Fallahi ◽

Tomasz Furmanek ◽

...

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Gadus Morhua ◽

Metabolic Response ◽

Atlantic Cod ◽

Biological Data ◽

Environmental Toxicants ◽

Expression Data ◽

Model Species ◽

Genome Scale

The availability of genome sequences, annotations and knowledge of the biochemistry underlying metabolic transformations has led to the generation of metabolic network reconstructions for a wide range of organisms across bacteria, archaea, and eukaryotes. When modeled using mathematical representations, a reconstruction can simulate underlying genotype-phenotype relationships. Accordingly, genome-scale models (GEMs) can be used to predict the response of organisms to genetic and environmental variations. A bottom-up reconstruction procedure typically starts by generating a draft model from existing annotation data on a target organism. For model species, this part of the process can be straightforward, due to the abundant organism-specific biochemical data. However, the process becomes complicated for non-model, less-annotated species. In this paper, we present a draft liver reconstruction, ReCodLiver0.9, of Atlantic cod (Gadus morhua), a non-model teleost fish, as a practicable guide for cases with comparably few resources. Although the reconstruction is considered a draft version, we show that it already has utility in elucidating metabolic response mechanisms to environmental toxicants by mapping gene expression data of exposure experiments to the resulting model.

Author summary: Genome-scale metabolic models (GEMs) are constructed based upon reconstructed networks that are carried out by an organism. The underlying biochemical knowledge in such networks can be transformed into mathematical models that could serve as a platform to answer biological questions. The availability of high-throughput biological data, including genomics, proteomics, and metabolomics data, supports the generation of such models for a large number of organisms. Nevertheless, challenges arise for non-model species, which are typically less annotated. In this paper, we discuss these challenges and possible solutions in the context of generating a draft liver reconstruction of Atlantic cod (Gadus morhua). We also show how experimental data, here gene expression data, can be mapped to the resulting model to understand the metabolic response of cod liver to environmental toxicants.


A new GRASP metaheuristic for biclustering of gene expression data

10.7287/peerj.preprints.1679v1 ◽

2016 ◽

Author(s):

Daniele Ferone ◽

Angelo Facchiano ◽

Anna Marabotti ◽

Paola Festa

Keyword(s):

Gene Expression ◽

Local Search ◽

Gene Expression Data ◽

Spanning Trees ◽

Complete Solution ◽

Optimal Solution ◽

Biological Data ◽

Data Matrix ◽

Expression Data ◽

Local Search Procedure

The term biclustering stands for simultaneous clustering of both genes and conditions. This task has generated considerable interest over the past few decades, particularly related to the analysis of high-dimensional gene expression data in information retrieval, knowledge discovery, and data mining [1]. Since the problem has been shown to be NP-complete, we have recently designed and implemented a GRASP metaheuristic [2,3,4]. The greedy criterion used in the construction phase uses the Euclidean distance to build spanning trees of the graph representing the input data matrix. Once a complete solution is obtained, the local search procedure tries both to enlarge the current solution and to improve its H-score by exchanging rows and columns. The proposed approach has been tested on 5 synthetic datasets [5]: 1) constant biclusters; 2) constant, upregulated biclusters; 3) shift-scale biclusters; 4) shift biclusters; and 5) scale biclusters. Compared with state-of-the-art competitors, its behaviour is excellent on shift datasets and very good on all other datasets except for scaled ones. In order to improve its behaviour on scaled data as well and to reduce running times, we have designed and preliminarily tested a variant of the existing GRASP, whose local search phase returns an approximate local optimal solution. The resulting algorithm promises to be a more efficient, general, and robust method for the biclustering of all kinds of biological data.
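The H-score that the local search tries to improve can be illustrated with a short sketch. It assumes that "H-score" refers to the mean squared residue commonly used in the biclustering literature (the abstract does not define it), where lower values indicate a more coherent bicluster. The function name and the data matrix are hypothetical.

```python
def h_score(matrix, rows, cols):
    """Mean squared residue of the bicluster given by row and column indices."""
    sub = [[matrix[i][j] for j in cols] for i in rows]
    n_rows, n_cols = len(rows), len(cols)
    row_means = [sum(row) / n_cols for row in sub]
    col_means = [sum(sub[i][j] for i in range(n_rows)) / n_rows for j in range(n_cols)]
    overall = sum(row_means) / n_rows
    # Residue of each cell after removing row, column, and overall effects.
    total = sum(
        (sub[i][j] - row_means[i] - col_means[j] + overall) ** 2
        for i in range(n_rows)
        for j in range(n_cols)
    )
    return total / (n_rows * n_cols)

# Rows 0 and 1 differ by a constant shift; row 2 breaks the pattern.
data = [
    [1.0, 2.0, 3.0],
    [2.0, 3.0, 4.0],
    [9.0, 0.0, 5.0],
]
print(h_score(data, [0, 1], [0, 1, 2]))
print(h_score(data, [0, 1, 2], [0, 1, 2]))
```

A pure shift pattern has zero residue under this score while a pure scale pattern generally does not, which may be related to the weaker behaviour on scaled datasets reported in the abstract.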


Computational Models for the Analysis of Modern Biological Data

Handbook of Research on Systems Biology Applications in Medicine ◽

10.4018/978-1-60566-076-9.ch006 ◽

2009 ◽

pp. 117-125

Author(s):

Tuan D. Pham

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Computational Models ◽

Biological Data ◽

Computational Techniques ◽

Expression Data ◽

Microarray Gene Expression ◽

Proteomic Data ◽

Microarray Gene ◽

Increasing Demand

Computational models play a significant role in the computer-based analysis of biological and biomedical data. Given the recent availability of genomic sequences, microarray gene expression data, and proteomic data, there is an increasing demand for developing and applying advanced computational techniques to explore these types of data, for example: functional interpretation of gene expression data; deciphering how genes and proteins work together in pathways and networks; and extracting and analysing phenotypic features of mitotic cells for high-throughput screening of novel anti-mitotic drugs. Successful applications of advanced computational algorithms to modern life-science problems will have a significant impact on several important and promising issues related to genomic medicine, molecular imaging, and the scientific knowledge of the genetic basis of diseases. This chapter reviews the fusion of engineering, computer science, and information sciences with biology and medicine to address some of the latest technical developments in the computational analysis of modern biological data: microarray gene expression data, mass spectrometry data, and bioimaging.


A dynamic programing approach to integrate gene expression data and network information for pathway model generation

Bioinformatics ◽

10.1093/bioinformatics/btz467 ◽

2019 ◽

Vol 36 (1) ◽

pp. 169-176 ◽

Cited By ~ 3

Author(s):

Yuexu Jiang ◽

Yanchun Liang ◽

Duolin Wang ◽

Dong Xu ◽

Trupti Joshi

Keyword(s):

Gene Expression ◽

Lung Cancer ◽

Gene Expression Data ◽

Biological Data ◽

Human Lung Cancer ◽

Supplementary Information ◽

Model Generation ◽

Expression Data ◽

Pathway Model ◽

Dynamic Programing

Motivation: As large amounts of biological data continue to be rapidly generated, a major focus of bioinformatics research has been aimed toward integrating these data to identify active pathways or modules under certain experimental conditions or phenotypes. Although biologically significant modules can often be detected globally by many existing methods, it is often hard to interpret or make use of the results toward pathway model generation and testing. Results: To address this gap, we have developed the IMPRes algorithm, a new step-wise active pathway detection method using a dynamic programing approach. IMPRes takes advantage of the existing pathway interaction knowledge in the Kyoto Encyclopedia of Genes and Genomes. Omics data are then used to assign penalties to genes, interactions and pathways. Finally, starting from one or multiple seed genes, a shortest path algorithm is applied to detect downstream pathways that best explain the gene expression data. Since dynamic programing enables detection one step at a time, it is easy for researchers to trace the pathways, which may lead to more accurate drug design and more effective treatment strategies. The evaluation experiments conducted on three yeast datasets have shown that IMPRes can achieve competitive or better performance than other state-of-the-art methods. Furthermore, a case study on a human lung cancer dataset was performed, and we provided several insights on genes and mechanisms involved in lung cancer that had not been discovered before. Availability and implementation: The IMPRes visualization tool is available via web server at http://digbio.missouri.edu/impres. Supplementary information: Supplementary data are available at Bioinformatics online.
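The shortest-path step described in the abstract can be sketched as follows. This is a generic Dijkstra search over a directed interaction graph with non-negative penalties on genes and interactions, not the IMPRes implementation; how IMPRes derives its penalties from omics data is not given here, so the gene names, penalty values, and function names are all hypothetical.

```python
import heapq

def cheapest_paths(graph, gene_penalty, seed):
    """Dijkstra search from a seed gene; path cost = interaction + gene penalties."""
    cost = {seed: 0.0}
    previous = {}
    queue = [(0.0, seed)]
    while queue:
        current_cost, gene = heapq.heappop(queue)
        if current_cost > cost.get(gene, float("inf")):
            continue  # stale queue entry, a cheaper route was already found
        for target, interaction_penalty in graph.get(gene, {}).items():
            new_cost = current_cost + interaction_penalty + gene_penalty.get(target, 0.0)
            if new_cost < cost.get(target, float("inf")):
                cost[target] = new_cost
                previous[target] = gene
                heapq.heappush(queue, (new_cost, target))
    return cost, previous

def trace(previous, target):
    """Walk back from a target gene to the seed, one step at a time."""
    path = [target]
    while path[-1] in previous:
        path.append(previous[path[-1]])
    return path[::-1]

# Hypothetical directed interactions with penalties (lower = better supported).
interactions = {"G1": {"G2": 1.0, "G3": 5.0}, "G2": {"G3": 1.0}}
penalties = {"G2": 0.5, "G3": 0.5}
cost, previous = cheapest_paths(interactions, penalties, "G1")
print(cost["G3"], trace(previous, "G3"))
```

Because each gene records the predecessor on its cheapest route, the detected pathway can be read back step by step from any downstream gene to the seed.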
