Prediction of a Cell-Class-Specific Mouse Mesoconnectome Using Gene Expression Data

Nestor Timonidis; Rembrandt Bakker; Paul Tiesinga

doi:10.1007/s12021-020-09471-x

Neuroinformatics ◽

10.1007/s12021-020-09471-x ◽

2020 ◽

Vol 18 (4) ◽

pp. 611-626

Author(s):

Nestor Timonidis ◽

Rembrandt Bakker ◽

Paul Tiesinga

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Computational Models ◽

Brain Connectivity ◽

Expression Patterns ◽

Enrichment Analysis ◽

Expression Data ◽

Binary Forms ◽

Major Step ◽

Cell Class

Abstract: Reconstructing brain connectivity at sufficient resolution for computational models designed to study the biophysical mechanisms underlying cognitive processes is extremely challenging. For such a purpose, a mesoconnectome that includes laminar and cell-class specificity would be a major step forward. We analyzed the ability of gene expression patterns to predict cell-class and layer-specific projection patterns and assessed the functional annotations of the most predictive groups of genes. To achieve our goal, we used publicly available volumetric gene expression and connectivity data and trained computational models to learn and predict cell-class and layer-specific axonal projections from gene expression data. Both projection strengths and binary forms of projection were predicted in two ways: from the expression of individual genes and from the co-expression of genes organized in spatial modules. For predicting the strength of projections, we found that ridge (L2-regularized) regression had the highest cross-validated accuracy, with a median r² score of 0.54, which corresponded for binarized predictions to a median area under the ROC curve of 0.89. Next, we identified 200 spatial gene modules using a dictionary learning and sparse coding approach. We found that these modules yielded predictions of comparable accuracy, with a median r² score of 0.51. Finally, a gene ontology enrichment analysis of the most predictive gene groups resulted in significant annotations related to postsynaptic function. Taken together, we have demonstrated a prediction workflow that can be used to perform multimodal data integration to improve the accuracy of the predicted mesoconnectome and support other neuroscience use cases.
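
The ridge regression step described in this abstract can be illustrated with a minimal sketch. This is not the authors' code: the data are synthetic, the penalty `alpha=1.0` and the 5-fold split are arbitrary assumptions, and the helper names (`solve`, `ridge_fit`, `cross_val_r2`) are hypothetical. It only shows how an L2-regularized linear model is fitted and scored with a cross-validated r² value.

```python
import random
from statistics import median

def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ridge_fit(X, y, alpha):
    """Minimize ||y - Xw - b||^2 + alpha * ||w||^2; the intercept b is not penalized."""
    n, p = len(X), len(X[0])
    mx = [sum(row[j] for row in X) / n for j in range(p)]
    my = sum(y) / n
    Xc = [[row[j] - mx[j] for j in range(p)] for row in X]  # centered features
    yc = [v - my for v in y]                                 # centered target
    gram = [[sum(r[i] * r[j] for r in Xc) + (alpha if i == j else 0.0)
             for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(Xc, yc)) for i in range(p)]
    w = solve(gram, xty)
    b = my - sum(wj * mj for wj, mj in zip(w, mx))
    return w, b

def predict(X, w, b):
    return [sum(wj * xj for wj, xj in zip(w, row)) + b for row in X]

def r2_score(y_true, y_pred):
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def cross_val_r2(X, y, alpha, k=5):
    """Return one held-out r2 score per fold (folds are interleaved index sets)."""
    scores = []
    for start in range(k):
        test = list(range(start, len(X), k))
        held = set(test)
        train = [i for i in range(len(X)) if i not in held]
        w, b = ridge_fit([X[i] for i in train], [y[i] for i in train], alpha)
        scores.append(r2_score([y[i] for i in test],
                               predict([X[i] for i in test], w, b)))
    return scores

# Synthetic stand-in for "gene expression -> projection strength" (assumption).
rng = random.Random(0)
true_w = [1.0, -2.0, 0.5, 0.0]
X = [[rng.gauss(0, 1) for _ in true_w] for _ in range(60)]
y = [sum(w * v for w, v in zip(true_w, row)) + rng.gauss(0, 0.1) for row in X]
scores = cross_val_r2(X, y, alpha=1.0, k=5)
print("median cross-validated r2:", round(median(scores), 3))
```

In a real analysis the rows of `X` would be brain regions, the columns genes, and `alpha` would be tuned rather than fixed.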

Prediction of a cell-type specific mouse mesoconnectome using gene expression data

10.1101/736520 ◽

2019 ◽

Author(s):

Nestor Timonidis ◽

Rembrandt Bakker ◽

Paul Tiesinga

Keyword(s):

Gene Expression ◽

Computational Models ◽

Missing Values ◽

Expression Patterns ◽

Enrichment Analysis ◽

Brain Regions ◽

Cell Type ◽

Cell Type Specificity ◽

Major Step ◽

Gene Modules

Abstract: Reconstructing brain connectivity at sufficient resolution for computational models designed to study the biophysical mechanisms underlying cognitive processes is extremely challenging. For such a purpose, a mesoconnectome that includes laminar and cell-type specificity would be a major step forward. We analyzed the ability of gene expression patterns to predict cell-type and laminar-specific projection patterns and examined the biological context of the most predictive groups of genes. To achieve our goal, we used publicly available volumetric gene expression and connectivity data and pre-processed them for prediction by averaging across brain regions, imputing missing values and rescaling. Afterwards, we predicted the strength of axonal projections and their binary form using expression patterns of individual genes and co-expression patterns of spatial gene modules. For predicting projection strength, we found that ridge (L2-regularized) regression had the highest cross-validated accuracy, with a median r² score of 0.54, which corresponded for binarized predictions to a median area under the ROC curve of 0.89. Next, we identified 200 spatial gene modules using a dictionary learning and sparse coding approach. We found that these modules yielded predictions of comparable accuracy, with a median r² score of 0.51. Finally, a gene ontology enrichment analysis of the most predictive gene groups resulted in significant annotations related to postsynaptic function. Taken together, we have demonstrated a prediction pipeline that can be used to perform multimodal data integration to improve the accuracy of the predicted mesoconnectome and support other neuroscience use cases.
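
The pre-processing steps named in this abstract (averaging across brain regions, imputing missing values, rescaling) might look roughly like the sketch below. The abstract does not say which imputation or rescaling method was used, so column-mean imputation and z-scoring are assumptions here, and the function names and toy values are hypothetical.

```python
from statistics import mean, pstdev

def average_by_region(voxels, region_of):
    """Average per-voxel expression vectors within each region (None = missing)."""
    grouped = {}
    for values, region in zip(voxels, region_of):
        grouped.setdefault(region, []).append(values)
    out = {}
    for region, rows in grouped.items():
        averaged = []
        for gene_values in zip(*rows):
            present = [v for v in gene_values if v is not None]
            # A gene stays missing only if no voxel in the region measured it.
            averaged.append(mean(present) if present else None)
        out[region] = averaged
    return out

def impute_and_rescale(matrix):
    """Fill missing entries with the gene (column) mean, then z-score each gene."""
    columns = []
    for col in zip(*matrix):
        present = [v for v in col if v is not None]
        fill = mean(present) if present else 0.0
        filled = [fill if v is None else v for v in col]
        mu, sd = mean(filled), pstdev(filled)
        columns.append([(v - mu) / sd if sd > 0 else 0.0 for v in filled])
    return [list(row) for row in zip(*columns)]

# Toy data: 5 voxels x 2 genes, assigned to 3 regions (all values are made up).
voxels = [[1.0, None], [3.0, 4.0], [5.0, None], [7.0, 2.0], [9.0, None]]
region_of = ["A", "A", "B", "B", "C"]
regional = average_by_region(voxels, region_of)
scaled = impute_and_rescale(list(regional.values()))
print(regional)
print(scaled)
```

Averaging first reduces the number of missing entries, so imputation only has to handle genes that were not measured anywhere in a region.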

Pathways Enrichment Analysis of Gene Expression Data in Type 2 Diabetes

Methods in Molecular Biology - Type 2 Diabetes ◽

10.1007/978-1-4939-9882-1_7 ◽

2019 ◽

pp. 119-128

Author(s):

Maysson Ibrahim

Keyword(s):

Gene Expression ◽

Type 2 Diabetes ◽

Gene Expression Data ◽

Enrichment Analysis ◽

Expression Data

Building Gene Networks by Analyzing Gene Expression Profiles

Advanced Methodologies and Technologies in Medicine and Healthcare - Advances in Medical Diagnosis, Treatment, and Care ◽

10.4018/978-1-5225-7489-7.ch003 ◽

2019 ◽

pp. 27-44

Author(s):

Crescenzio Gallo

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Gene Networks ◽

DNA Microarrays ◽

Expression Profiles ◽

Expression Patterns ◽

Gene Expression Profiles ◽

Expression Data ◽

Gene Expressions ◽

Over Time

The possible applications of modeling and simulation in bioinformatics are extensive, ranging from understanding basic metabolic pathways to exploring genetic variability. Experiments carried out with DNA microarrays allow researchers to measure expression levels for thousands of genes simultaneously, across different conditions and over time. A key step in the analysis of gene expression data is the detection of groups of genes that manifest similar expression patterns. In this chapter, the authors examine various methods for analyzing gene expression data, addressing the important topics of (1) selecting the most differentially expressed genes, (2) grouping them by means of their relationships, and (3) classifying samples based on gene expression.

CURVE-BASED CLUSTERING OF TIME COURSE GENE EXPRESSION DATA USING SELF-ORGANIZING MAPS

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720009004291 ◽

2009 ◽

Vol 07 (04) ◽

pp. 645-661 ◽

Cited By ~ 11

Author(s):

XIN CHEN

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Regulatory Networks ◽

Time Course ◽

Clustering Algorithm ◽

Expression Patterns ◽

Self Organizing Map ◽

Expression Data ◽

Wide Range ◽

Self Organizing

There is increasing interest in clustering time course gene expression data to investigate a wide range of biological processes. However, developing a clustering algorithm well suited to time course gene expression data remains challenging. As timing is an important factor in defining true clusters, a clustering algorithm should explore expression correlations between time points in order to achieve high clustering accuracy. Moreover, inter-cluster gene relationships are often desired in order to facilitate the computational inference of biological pathways and regulatory networks. In this paper, a new clustering algorithm called CurveSOM is developed to offer both features. It first represents each gene by a cubic smoothing spline fitted to its time course expression profile, and then groups genes into clusters by applying self-organizing map-based clustering to the resulting splines. CurveSOM has been tested on three well-studied yeast cell cycle datasets and compared with four popular programs: Cluster 3.0, GENECLUSTER, MCLUST, and SSClust. The results show that CurveSOM is a very promising tool for the exploratory analysis of time course expression data, as it is able not only to group genes into clusters with high accuracy but also to find true time-shifted correlations of expression patterns across clusters.
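
To make the idea of a time-shifted correlation concrete, the sketch below uses lagged Pearson correlation between two profiles. This is a generic technique, not CurveSOM's spline-and-SOM procedure, and the sinusoidal profiles and the `best_lag` helper are illustrative assumptions.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def best_lag(a, b, max_lag):
    """Return (lag, r) maximizing the correlation between a[t] and b[t + lag]."""
    best = None
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            x, y = a[:len(a) - lag], b[lag:]
        else:
            x, y = a[-lag:], b[:len(b) + lag]
        r = pearson(x, y)
        if best is None or r > best[1]:
            best = (lag, r)
    return best

# Two made-up periodic profiles; b follows a with a delay of two time points.
a = [math.sin(2 * math.pi * t / 8) for t in range(16)]
b = [math.sin(2 * math.pi * (t - 2) / 8) for t in range(16)]
lag, r = best_lag(a, b, max_lag=3)
print("best lag:", lag, "correlation:", round(r, 3))
```

Two genes with a low correlation at lag 0 can therefore still be related if one follows the other after a delay.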

Revealing Biological Pathways Implicated in Lung Cancer from TCGA Gene Expression Data Using Gene Set Enrichment Analysis

Cancer Informatics ◽

10.4137/cin.s13882 ◽

2014 ◽

Vol 13s1 ◽

pp. CIN.S13882 ◽

Cited By ~ 4

Author(s):

Binghuang Cai ◽

Xia Jiang

Keyword(s):

Gene Expression ◽

Lung Cancer ◽

Gene Expression Data ◽

Lung Squamous Cell Carcinoma ◽

Enrichment Analysis ◽

Gene Set Enrichment Analysis ◽

Expression Data ◽

Gene Set Enrichment ◽

Gene Set ◽

Pathway Gene

Analyzing biological system abnormalities in cancer patients based on measures of biological entities, such as gene expression levels, is an important and challenging problem. This paper applies existing methods, Gene Set Enrichment Analysis and Signaling Pathway Impact Analysis, to pathway abnormality analysis in lung cancer using microarray gene expression data. Gene expression data from studies of Lung Squamous Cell Carcinoma (LUSC) in The Cancer Genome Atlas project, and pathway gene set data from the Kyoto Encyclopedia of Genes and Genomes were used to analyze the relationship between pathways and phenotypes. Results, in the form of pathway rankings, indicate that some pathways may behave abnormally in LUSC. For example, both the cell cycle and viral carcinogenesis pathways ranked very high in LUSC. Furthermore, some pathways that are known to be associated with cancer, such as the p53 and the PI3K-Akt signal transduction pathways, were found to rank high in LUSC. Other pathways, such as bladder cancer and thyroid cancer pathways, were also ranked high in LUSC.
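
For readers unfamiliar with Gene Set Enrichment Analysis, its core quantity is a running-sum enrichment score computed over a ranked gene list. The sketch below implements that weighted running sum only; it omits the permutation-based significance testing that the full method uses, and the gene names (`g1` to `g6`), scores and gene sets are hypothetical.

```python
def enrichment_score(ranked_genes, scores, gene_set, p=1.0):
    """Running-sum enrichment score over a list ranked from high to low score.

    Steps up by |score|^p (normalized over the set) at genes in the set and
    steps down by a constant at genes outside it; returns the signed extreme.
    """
    hits = [g in gene_set for g in ranked_genes]
    n_miss = len(ranked_genes) - sum(hits)
    norm = sum(abs(s) ** p for s, h in zip(scores, hits) if h)
    running, best = 0.0, 0.0
    for s, h in zip(scores, hits):
        running += abs(s) ** p / norm if h else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best

# Hypothetical ranked list (e.g. by differential expression statistic).
ranked_genes = ["g1", "g2", "g3", "g4", "g5", "g6"]
scores = [3.0, 2.0, 1.0, -1.0, -2.0, -3.0]
es_top = enrichment_score(ranked_genes, scores, {"g1", "g2"})
es_bottom = enrichment_score(ranked_genes, scores, {"g5", "g6"})
print(es_top, es_bottom)
```

A set concentrated at the top of the ranking gets a strongly positive score and one concentrated at the bottom a strongly negative score; pathway rankings such as those reported above are derived from statistics of this kind.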

Gene Expression Studies to Identify Significant Genes in AR, MTOR, MAPK Pathways and their Overlapping Regulatory Role in Prostate Cancer

Journal of Integrative Bioinformatics ◽

10.1515/jib-2018-0080 ◽

2019 ◽

Vol 16 (3) ◽

Author(s):

Nimisha Asati ◽

Abhinav Mishra ◽

Ankita Shukla ◽

Tiratha Raj Singh

Keyword(s):

Gene Expression ◽

Prostate Cancer ◽

Gene Expression Data ◽

Meta Analysis ◽

Expression Patterns ◽

Mitogen Activated Protein Kinase ◽

Expression Data ◽

MAPK Pathways ◽

Expression Studies ◽

Gene Expression Studies

Abstract: Gene expression studies have revealed a large degree of variability in gene expression patterns, particularly in tissues, even in genetically identical individuals. This variability helps to reveal the components that fluctuate most during disease. Since the advent of gene expression studies, many microarray studies have been conducted in prostate cancer, but the results have varied across studies. To better understand the genetic and biological regulatory mechanisms of prostate cancer, we conducted a meta-analysis of three major pathways, i.e. androgen receptor (AR), mechanistic target of rapamycin (mTOR) and mitogen-activated protein kinase (MAPK), in prostate cancer. The meta-analysis was performed on human gene expression data from prostate cancer. Twelve datasets covering the AR, mTOR, and MAPK pathways were analyzed, from which thirteen potential biomarkers were identified. These findings were compiled from quantitative data analysis using different tools. Various interconnections were also found among the pathways under study. Our study suggests that microarray analysis of gene expression data and their pathway-level connections allows detection of potential predictors that may prove to be putative therapeutic targets with biological and functional significance in the progression of prostate cancer.

A performance analysis of clustering based algorithms for the microarray gene expression data

International Journal of Engineering & Technology ◽

10.14419/ijet.v7i2.21.12172 ◽

2018 ◽

Vol 7 (2.21) ◽

pp. 201 ◽

Cited By ~ 2

Author(s):

K Yuvaraj ◽

D Manjula

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

DNA Sequences ◽

Clustering Algorithms ◽

Expression Patterns ◽

Microarray Gene Expression Data ◽

Gene Clustering ◽

Expression Data ◽

Microarray Gene Expression ◽

Microarray Gene

Current advances in microarray technology permit simultaneous observation of the expression levels of a huge number of genes over various time points. Microarrays have gained great importance in the field of bioinformatics. A microarray comprises an ordered set of a large number of different deoxyribonucleic acid (DNA) sequences that can be used to measure both DNA and ribonucleic acid (RNA) differences. The gene expression (GE) profile aids in understanding the basic cause of gene activities and the development of genes, in identifying disorders such as cancer, and in analyzing their molecular pharmacology. Clustering is an important tool for analyzing such microarray gene expression data and has become a major part of gene expression analysis. Grouping genes that have identical expression patterns is known as gene clustering. A number of clustering algorithms have been applied to the analysis of microarray gene expression data. The aim of this paper is to analyze the precision level of the microarray data by using various clustering algorithms.

Performance Analysis of Hard and Soft Clustering Approaches For Gene Expression Data

International Journal of Rough Sets and Data Analysis ◽

10.4018/ijrsda.2015010104 ◽

2015 ◽

Vol 2 (1) ◽

pp. 58-69 ◽

Cited By ~ 8

Author(s):

P. K. Nizar Banu ◽

S. Andrews

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Mean Squared Error ◽

Clustering Algorithms ◽

Expression Patterns ◽

Absolute Error ◽

Tumor Formation ◽

Expression Data ◽

Self Organizing Maps ◽

Soft Clustering

Mining of gene expression data is growing rapidly as a way to predict gene expression patterns and assist clinicians in the early diagnosis of tumor formation. Clustering gene expression data is the most important phase, as it helps in finding groups of genes that are highly expressed or suppressed. This paper analyses the performance of the most representative hard and soft off-line clustering algorithms, namely K-Means, Fuzzy C-Means, Self Organizing Maps (SOM) based clustering and Genetic Algorithm (GA) based clustering, on a brain tumor gene expression dataset. Clusters produced by the clustering algorithms are indications of cellular processes. Clustering results are evaluated using clustering indices such as the Xie-Beni index (XB), Davies-Bouldin index (DB), Mean Absolute Error (MAE), Root Mean Squared Error (RMSE) and Dunn's Index (DI), along with the time taken, to assess the compactness and separation of clusters. Experimental results indicate that soft clustering approaches work well for predicting clusters of highly expressed and suppressed genes.
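
Of the indices listed, the Davies-Bouldin index is easy to state in code: for each cluster, take the worst ratio of summed within-cluster scatter to between-centroid distance, then average over clusters. The sketch below assumes Euclidean distance and uses made-up two-dimensional points; it is an illustration, not the paper's evaluation code.

```python
from math import dist

def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(c) / len(points) for c in zip(*points))

def davies_bouldin(clusters):
    """Davies-Bouldin index for a list of clusters; lower means better separated."""
    cents = [centroid(c) for c in clusters]
    # Scatter: mean distance from each point to its cluster centroid.
    scatter = [sum(dist(p, m) for p in c) / len(c) for c, m in zip(clusters, cents)]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        total += max((scatter[i] + scatter[j]) / dist(cents[i], cents[j])
                     for j in range(k) if j != i)
    return total / k

# Same cluster shapes, placed far apart and then close together.
far_apart = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 0.0), (10.0, 2.0)]]
close = [[(0.0, 0.0), (0.0, 2.0)], [(3.0, 0.0), (3.0, 2.0)]]
db_far = davies_bouldin(far_apart)
db_close = davies_bouldin(close)
print(db_far, db_close)
```

Because the index combines compactness and separation in one ratio, it allows the outputs of different algorithms to be compared on the same dataset.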

A Fast Quad-Tree Based Two Dimensional Hierarchical Clustering

Bioinformatics and Biology Insights ◽

10.4137/bbi.s10383 ◽

2012 ◽

Vol 6 ◽

pp. BBI.S10383

Author(s):

Priscilla Rajadurai ◽

Swamynathan Sankaranarayanan

Keyword(s):

Gene Expression ◽

Hierarchical Clustering ◽

Gene Expression Data ◽

Biological Networks ◽

Processing Time ◽

Clustering Algorithm ◽

Expression Patterns ◽

Expression Data ◽

Important Time ◽

Analogous Expression

Recently, microarray technologies have become a robust technique in the area of genomics. An important step in the analysis of gene expression data is the identification of groups of genes with analogous expression patterns. Cluster analysis partitions a given dataset into groups based on specified features. Euclidean distance is a widely used similarity measure for gene expression data that considers the amount of change in gene expression. However, the huge number of genes and the intricacy of biological networks have greatly increased the challenges of comprehending and interpreting the resulting groups of data, and have increased processing time. The proposed technique uses a quad-tree (QT) based fast two-dimensional hierarchical clustering algorithm. The construction of the closest-pair data structure at each level is an important time factor, which determines the processing time of clustering. The proposed model reduces the processing time and improves the analysis of gene expression data.
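
The cost of the closest-pair search is easiest to see in a naive baseline. The sketch below is plain agglomerative clustering with Euclidean distance and single linkage, which rescans all cluster pairs at every merge; it is not the quad-tree method proposed in the paper, and the linkage choice, function name and points are assumptions.

```python
from math import dist

def agglomerate(points, n_clusters):
    """Merge the closest pair of clusters until n_clusters remain."""
    clusters = [[p] for p in points]

    def linkage(a, b):
        # Single linkage: distance between the two nearest members.
        return min(dist(p, q) for p in a for q in b)

    while len(clusters) > n_clusters:
        # Brute-force closest-pair search over all cluster pairs.
        i, j = min(
            ((i, j) for i in range(len(clusters))
             for j in range(i + 1, len(clusters))),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] += clusters.pop(j)  # j > i, so index i stays valid
    return clusters

# Made-up two-dimensional points.
points = [(0, 0), (0, 1), (5, 5), (5, 6), (9, 0)]
result = agglomerate(points, n_clusters=3)
print(result)
```

Every merge here repeats the full pairwise search, which is the step a spatial index such as a quad-tree is meant to accelerate.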

TimesVector-Web: A Web Service for Analysing Time Course Transcriptome Data with Multiple Conditions

Genes ◽

10.3390/genes13010073 ◽

2021 ◽

Vol 13 (1) ◽

pp. 73

Author(s):

Jaeyeon Jang ◽

Inseung Hwang ◽

Inuk Jung

Keyword(s):

Gene Expression ◽

Web Service ◽

Gene Expression Data ◽

Time Course ◽

Expression Patterns ◽

Differentially Expressed Gene ◽

Expression Data ◽

Transcriptomic Response ◽

Time Course Data ◽

Multiple Conditions

From time course gene expression data, we may identify genes whose expression changes in a certain pattern across time. Such patterns are useful for investigating the transcriptomic response to a certain condition. In particular, it is of interest to compare two or more conditions to detect gene expression patterns that differ significantly between them. Time course analysis can be difficult with traditional differentially expressed gene (DEG) analysis methods, since they are based on pair-wise sample comparison instead of a series of time points. Moreover, the related tools are mostly available as local software, requiring technical expertise. Here, we present TimesVector-web, an easy-to-use web service for analysing time course gene expression data with multiple conditions. The web service was developed to (1) alleviate the burden of analyzing multi-class time course data and (2) provide downstream analysis of the results for biological interpretation, including TF, miRNA target, gene ontology and pathway analysis. TimesVector-web was validated using three case studies that use both microarray and RNA-seq time course data, and the results captured important biological findings from the original studies.
