GSAn: an alternative to enrichment analysis for annotating gene sets

2020 ◽  
Vol 2 (2) ◽  
Author(s):  
Aaron Ayllon-Benitez ◽  
Romain Bourqui ◽  
Patricia Thébault ◽  
Fleur Mougin

Abstract The revolution in new sequencing technologies is leading to new understandings of the relations between genotype and phenotype. To interpret and analyze data that are grouped according to a phenotype of interest, methods based on statistical enrichment have become a standard in biology. However, these methods synthesize the biological information by a priori selecting the over-represented terms and may suffer from focusing on the most studied genes that represent a limited coverage of annotated genes within a gene set. Semantic similarity measures have shown great results within the pairwise gene comparison by taking advantage of the underlying structure of the Gene Ontology. We developed GSAn, a novel gene set annotation method that uses semantic similarity measures to synthesize a priori Gene Ontology annotation terms. The originality of our approach is to identify the best compromise between the number of retained annotation terms that has to be drastically reduced and the number of related genes that has to be as large as possible. Moreover, GSAn offers interactive visualization facilities dedicated to the multi-scale analysis of gene set annotations. Compared to enrichment analysis tools, GSAn has shown excellent results in terms of maximizing the gene coverage while minimizing the number of terms.
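The compromise described in this abstract (few retained terms, many covered genes) resembles a set cover problem. The following is a minimal sketch of that trade-off only, assuming a plain greedy heuristic; it is not GSAn's method, which relies on semantic similarity measures, and the term names and gene identifiers are hypothetical.

```python
def greedy_cover(term_to_genes, gene_set):
    """Greedily pick terms until no remaining term covers a new gene.

    Illustrative heuristic only (an assumption, not GSAn's algorithm).
    Returns the chosen terms and the fraction of genes they cover.
    """
    if not term_to_genes or not gene_set:
        return [], 0.0
    uncovered = set(gene_set)
    chosen = []
    while uncovered:
        # Term covering the most still-uncovered genes; ties go to the
        # alphabetically first term because the candidates are sorted.
        best = max(sorted(term_to_genes),
                   key=lambda t: len(term_to_genes[t] & uncovered))
        gain = term_to_genes[best] & uncovered
        if not gain:
            break  # the remaining genes are not annotated by any term
        chosen.append(best)
        uncovered -= gain
    coverage = 1 - len(uncovered) / len(gene_set)
    return chosen, coverage


# Hypothetical annotations: term -> genes annotated with that term.
annotations = {
    "term_A": {"g1", "g2", "g3"},
    "term_B": {"g3", "g4"},
    "term_C": {"g4", "g5"},
    "term_D": {"g1"},
}
genes = {"g1", "g2", "g3", "g4", "g5", "g6"}

terms, coverage = greedy_cover(annotations, genes)
print(terms, round(coverage, 3))
```

In this toy input, two of the four terms are enough to cover every annotated gene, while g6 stays uncovered because no term annotates it.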

2019 ◽  
Author(s):  
Aaron Ayllon-Benitez ◽  
Romain Bourqui ◽  
Patricia Thébault ◽  
Fleur Mougin

Abstract The revolution in new sequencing technologies, by strongly improving the production of omics data, is leading to new understandings of the relations between genotype and phenotype. To interpret and analyze these massive data that are grouped according to a phenotype of interest, methods based on statistical enrichment have become a standard in biology. However, these methods synthesize the biological information by a priori selecting the over-represented terms and may suffer from focusing on the most studied genes that represent a limited coverage of annotated genes within the gene set. To address these limitations, we developed GSAn, a novel gene set annotation Web server that uses semantic similarity measures to reduce a priori Gene Ontology annotation terms. The originality of this new approach is to identify the best compromise between the number of retained annotation terms that has to be drastically reduced and the number of related genes that has to be as large as possible. Moreover, GSAn offers interactive visualization facilities dedicated to the multi-scale analysis of gene set annotations. GSAn is available at: https://gsan.labri.fr.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Yi Chen ◽  
Fons J. Verbeek ◽  
Katherine Wolstencroft

Abstract Background The hallmarks of cancer provide a highly cited and well-used conceptual framework for describing the processes involved in cancer cell development and tumourigenesis. However, methods for translating these high-level concepts into data-level associations between hallmarks and genes (for high-throughput analysis) vary widely between studies. The examination of different strategies to associate and map cancer hallmarks reveals significant differences, but also consensus. Results Here we present the results of a comparative analysis of cancer hallmark mapping strategies, based on Gene Ontology and biological pathway annotation, from different studies. By analysing the semantic similarity between annotations, and the resulting gene set overlap, we identify emerging consensus knowledge. In addition, we analyse the differences between hallmark and gene set associations using Weighted Gene Co-expression Network Analysis and enrichment analysis. Conclusions Reaching a community-wide consensus on how to identify cancer hallmark activity from research data would enable more systematic data integration and comparison between studies. These results highlight the current state of the consensus and offer a starting point for further convergence. In addition, we show how a lack of consensus can lead to large differences in the biological interpretation of downstream analyses and discuss the challenges of annotating changing and accumulating biological data, using intermediate knowledge resources that are also changing over time.


2013 ◽  
Vol 2013 ◽  
pp. 1-11 ◽  
Author(s):  
Gaston K. Mazandu ◽  
Nicola J. Mulder

Several approaches have been proposed for computing term information content (IC) and semantic similarity scores within the gene ontology (GO) directed acyclic graph (DAG). These approaches contributed to improving protein analyses at the functional level. Considering the recent proliferation of these approaches, a unified theory in a well-defined mathematical framework is necessary in order to provide a theoretical basis for validating these approaches. We review the existing IC-based ontological similarity approaches developed in the context of biomedical and bioinformatics fields to propose a general framework and unified description of all these measures. We have conducted an experimental evaluation to assess the impact of IC approaches, different normalization models, and correction factors on the performance of a functional similarity metric. Results reveal that considering only parents or only children of terms when assessing information content or semantic similarity scores negatively impacts the approach under consideration. This study produces a unified framework for current and future GO semantic similarity measures and provides theoretical basics for comparing different approaches. The experimental evaluation of different approaches based on different term information content models paves the way towards a solution to the issue of scoring a term’s specificity in the GO DAG.
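To make the notions of term information content and IC-based similarity concrete, here is a minimal sketch assuming one widely used annotation-based definition, IC(t) = -log p(t), together with Resnik similarity (the IC of the most informative common ancestor). It is not the unified framework proposed in the paper, and the toy DAG and annotation counts are hypothetical.

```python
import math

# Hypothetical toy DAG: each term maps to the set of its parents.
PARENTS = {
    "root": set(),
    "a": {"root"},
    "b": {"root"},
    "c": {"a", "b"},
    "d": {"a"},
}
# Hypothetical number of genes annotated directly to each term.
DIRECT = {"root": 0, "a": 1, "b": 2, "c": 3, "d": 2}


def ancestors(term):
    """Return the term together with all of its ancestors in the DAG."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content(term):
    """IC(t) = -log p(t), where p(t) is the share of annotations made to
    t or to any of its descendants."""
    total = sum(DIRECT.values())
    below = sum(n for t, n in DIRECT.items() if term in ancestors(t))
    return -math.log(below / total)


def resnik(t1, t2):
    """Resnik similarity: IC of the most informative common ancestor."""
    common = ancestors(t1) & ancestors(t2)
    return max(information_content(t) for t in common)


print(round(resnik("c", "d"), 4))  # terms sharing the ancestor "a"
print(resnik("b", "d") == 0)       # terms sharing only the root
```

Two terms whose only common ancestor is the root get a similarity of zero, because the root carries no information under this definition.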


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Katie Ovens ◽  
Farhad Maleki ◽  
B. Frank Eames ◽  
Ian McQuillan

Abstract Background Gene co-expression networks (GCNs) are not easily comparable due to their complex structure. In this paper, we propose a tool, Juxtapose, together with similarity measures that can be utilized for comparative transcriptomics between a set of organisms. While we focus on its application to comparing co-expression networks across species in evolutionary studies, Juxtapose is also generalizable to co-expression network comparisons across tissues or conditions within the same species. Methods A word embedding strategy commonly used in natural language processing was utilized in order to generate gene embeddings based on walks made throughout the GCNs. Juxtapose was evaluated based on its ability to embed the nodes of synthetic structures in the networks consistently while also generating biologically informative results. Evaluation of the techniques proposed in this research utilized RNA-seq datasets from GTEx, a multi-species experiment of prefrontal cortex samples from the Gene Expression Omnibus, as well as synthesized datasets. Biological evaluation was performed using gene set enrichment analysis and known gene relationships in literature. Results We show that Juxtapose is capable of globally aligning synthesized networks as well as identifying areas that are conserved in real gene co-expression networks without reliance on external biological information. Furthermore, output from a matching algorithm that uses cosine distance between GCN embeddings is shown to be an informative measure of similarity that reflects the amount of topological similarity between networks. Conclusions Juxtapose can be used to align GCNs without relying on known biological similarities and enables post-hoc analyses using biological parameters, such as orthology of genes, or conserved or variable pathways. Availability A development version of the software used in this paper is available at https://github.com/klovens/juxtapose
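The cosine distance between embeddings mentioned in the Results can be sketched as follows. This is a minimal illustration assuming a naive nearest-neighbour matching, not Juxtapose's matching algorithm; the gene names and two-dimensional vectors are hypothetical.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def best_matches(emb_a, emb_b):
    """For each gene in network A, return the gene in network B whose
    embedding lies at the smallest cosine distance (naive matching)."""
    return {
        gene: min(sorted(emb_b),
                  key=lambda other: cosine_distance(vec, emb_b[other]))
        for gene, vec in emb_a.items()
    }


# Hypothetical embeddings for two small networks.
network_a = {"geneA1": [1.0, 0.0], "geneA2": [0.0, 1.0]}
network_b = {"geneB1": [0.9, 0.1], "geneB2": [0.2, 0.8]}

matches = best_matches(network_a, network_b)
print(matches)
```

Cosine distance ignores vector length and compares direction only, so embeddings that differ merely by scale are treated as identical.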


2020 ◽  
Vol 21 (21) ◽  
pp. 8333
Author(s):  
Chiara C. Bortolasci ◽  
Briana Spolding ◽  
Srisaiyini Kidnapillai ◽  
Timothy Connor ◽  
Trang T.T. Truong ◽  
...  

Although neurogenesis is affected in several psychiatric diseases, the effects and mechanisms of action of psychoactive drugs on neurogenesis remain unknown and/or controversial. This study aims to evaluate the effects of psychoactive drugs on the expression of genes involved in neurogenesis. Neuronal-like cells (NT2-N) were treated with amisulpride (10 µM), aripiprazole (0.1 µM), clozapine (10 µM), lamotrigine (50 µM), lithium (2.5 mM), quetiapine (50 µM), risperidone (0.1 µM), or valproate (0.5 mM) for 24 h. Genome wide mRNA expression was quantified and analysed using gene set enrichment analysis, with the neurogenesis gene set retrieved from the Gene Ontology database and the Mammalian Adult Neurogenesis Gene Ontology (MANGO) database. Transcription factors that are more likely to regulate these genes were investigated to better understand the biological processes driving neurogenesis. Targeted metabolomics were performed using gas chromatography-mass spectrometry. Six of the eight drugs decreased the expression of genes involved in neurogenesis in both databases. This suggests that acute treatment with these psychoactive drugs negatively regulates the expression of genes involved in neurogenesis in vitro. SOX2 and three of its target genes (CCND1, BMP4, and DKK1) were also decreased after treatment with quetiapine. This can, at least in part, explain the mechanisms by which these drugs decrease neurogenesis at a transcriptional level in vitro. These results were supported by the finding of increased metabolite markers of mature neurons following treatment with most of the drugs tested, suggesting increased proportions of mature relative to immature neurons consistent with reduced neurogenesis.


Author(s):  
Kosa Goucher-Lambert ◽  
Joshua T. Gyory ◽  
Kenneth Kotovsky ◽  
Jonathan Cagan

Abstract Design activity can be supported using inspirational stimuli (e.g., analogies, patents, etc.), by helping designers overcome impasses or generate solutions with more positive characteristics during ideation. Design researchers typically generate inspirational stimuli a priori in order to investigate their impact. However, for a chosen stimulus to possess maximal utility, it should automatically reflect the current and ongoing progress of the designer. In this work, designers receive computationally selected inspirational stimuli midway through an ideation session in response to the state of their current solution. Sourced from a broad database of related example solutions, the semantic similarity between the content of the current design and concepts within the database determines which potential stimulus is received. Designers receive a particular stimulus based on three experimental conditions: a semantically near stimulus, a semantically far stimulus, or no stimulus (control). Results indicate that adaptive inspirational stimuli can be determined using Latent Semantic Analysis (LSA) and that semantic similarity measures are a promising approach for real-time monitoring of the design process. The ability to achieve differentiable near vs. far stimuli was validated using both semantic cosine similarity values and participant self-response ratings. As a further contribution, this work also explores the impact of different types of adaptive inspirational stimuli on design outcomes. Here, near inspirational stimuli increase the feasibility of design solutions. Results also demonstrate the significant impact of the overall inspirational stimulus innovativeness on final design outcomes, which may be greater than differences across individual sub-dimensions.

