GOblet: Annotation of anonymous sequence data with Gene Ontology and Pathway terms

2008 ◽  
Vol 5 (2) ◽  
Author(s):  
Detlef Groth ◽  
Stefanie Hartmann ◽  
Georgia Panopoulou ◽  
Albert J. Poustka ◽  
Steffen Hennig

Summary: The functional annotation of genomic data has become a major task for the ever-growing number of sequencing projects. To address this challenge, we recently developed GOblet, a free web service for the annotation of anonymous sequences with Gene Ontology (GO) terms. However, to overcome limitations of the GO terminology, and to aid in understanding not only single components but also systemic interactions between the individual components, we have now extended the GOblet web service to integrate pathway annotations as well. Furthermore, we extended and upgraded the data analysis pipeline with improved summaries, and added term enrichment and clustering algorithms. Finally, we are now making GOblet available as a stand-alone application for high-throughput processing on local machines. The advantages of this frequently requested feature are that a) the user can avoid the restrictions of our web service on uploading and processing large amounts of data, and that b) confidential data can be analysed without insecure transfer to a public web server. The stand-alone version of the web service has been implemented using platform-independent Tcl scripts, which can be run with just a single runtime file utilizing the Starkit technology. The GOblet web service and the stand-alone application are freely available at http://goblet.molgen.mpg.de.

2020 ◽  
Vol 11 ◽  
Author(s):  
Alejandro Abdala Asbun ◽  
Marc A. Besseling ◽  
Sergio Balzano ◽  
Judith D. L. van Bleijswijk ◽  
Harry J. Witte ◽  
...  

Marker gene sequencing of the rRNA operon (16S, 18S, ITS) or cytochrome c oxidase I (CO1) is a popular means to assess microbial communities of the environment, microbiomes associated with plants and animals, as well as communities of multicellular organisms via environmental DNA sequencing. Since this technique is based on sequencing a single gene, or even only parts of a single gene rather than the entire genome, the number of reads needed per sample to assess the microbial community structure is lower than that required for metagenome sequencing. This makes marker gene sequencing affordable to nearly any laboratory. Despite the relative ease and cost-efficiency of data generation, analyzing the resulting sequence data requires computational skills that may go beyond the standard repertoire of a current molecular biologist/ecologist. We have developed Cascabel, a scalable, flexible, and easy-to-use amplicon sequence data analysis pipeline, which uses Snakemake and a combination of existing and newly developed solutions for its computational steps. Cascabel takes the raw data as input and delivers a table of operational taxonomic units (OTUs) or Amplicon Sequence Variants (ASVs) in BIOM and text format, together with representative sequences. Cascabel is highly versatile software that allows users to customize several steps of the pipeline, such as selecting from a set of OTU clustering methods or performing ASV analysis. In addition, we designed Cascabel to run in any Linux/Unix computing environment, from desktop computers to computing servers, making use of parallel processing where possible. The analyses and results are fully reproducible and documented in an HTML and optional PDF report. Cascabel is freely available on GitHub: https://github.com/AlejandroAb/CASCABEL.


2019 ◽  
Author(s):  
Alejandro Abdala Asbun ◽  
Marc A Besseling ◽  
Sergio Balzano ◽  
Judith van Bleijswijk ◽  
Harry Witte ◽  
...  

Abstract: Marker gene sequencing of the rRNA operon (16S, 18S, ITS) or cytochrome c oxidase I (CO1) is a popular means to assess microbial communities of the environment, microbiomes associated with plants and animals, as well as communities of multicellular organisms via environmental DNA sequencing. Since this technique is based on sequencing a single gene rather than the entire genome, the number of reads needed per sample is lower than that required for metagenome sequencing, making marker gene sequencing affordable to nearly any laboratory. Despite the relative ease and cost-efficiency of data generation, analyzing the resulting sequence data requires computational skills that may go beyond the standard repertoire of a current molecular biologist/ecologist. We have developed Cascabel, a flexible and easy-to-use amplicon sequence data analysis pipeline, which uses Snakemake and a combination of existing and newly developed solutions for its computational steps. Cascabel takes the raw data as input and delivers a table of operational taxonomic units (OTUs) and a representative sequence tree. Our pipeline allows customizing the analyses by offering several choices for most of the steps, for example different OTU generating methods. The pipeline can make use of multiple computing nodes and scales from personal computers to computing servers. The analyses and results are fully reproducible and documented in an HTML and optional PDF report. Cascabel is freely available on GitHub: https://github.com/AlejandroAb/CASCABEL and licensed under GNU GPLv3.


2020 ◽  
Author(s):  
Maarten JMF Reijnders ◽  
Robert M Waterhouse

Abstract: The Gene Ontology (GO) is a cornerstone of functional genomics research that drives discoveries through knowledge-informed computational analysis of biological data from large-scale assays. Key to this success is how the GO can be used to support hypotheses or conclusions about the biology or evolution of a study system by identifying annotated functions that are overrepresented in subsets of genes of interest. Graphical visualisations of such GO term enrichment results are critical to aid interpretation and avoid biases by presenting researchers with intuitive visual data summaries. Amongst current visualisation tools and resources there is a lack of standalone open-source software solutions that facilitate systematic comparisons of multiple lists of GO terms. To address this we developed GO-Figure!, open-source Python software for producing user-customisable semantic similarity scatterplots of redundancy-reduced GO term lists. The lists are simplified by grouping together GO terms with similar functions using their quantified information contents and semantic similarities, with user control over grouping thresholds. Representatives are then selected for plotting in two-dimensional semantic space where similar GO terms are placed closer to each other on the scatterplot, with an array of user-customisable graphical attributes. GO-Figure! offers a simple solution for command-line plotting of informative summary visualisations of lists of GO terms, designed to support exploratory data analyses and multiple dataset comparisons.


2021 ◽  
Vol 1 ◽  
Author(s):  
Maarten J. M. F. Reijnders ◽  
Robert M. Waterhouse

The Gene Ontology (GO) is a cornerstone of functional genomics research that drives discoveries through knowledge-informed computational analysis of biological data from large-scale assays. Key to this success is how the GO can be used to support hypotheses or conclusions about the biology or evolution of a study system by identifying annotated functions that are overrepresented in subsets of genes of interest. Graphical visualizations of such GO term enrichment results are critical to aid interpretation and avoid biases by presenting researchers with intuitive visual data summaries. Amongst current visualization tools and resources there is a lack of standalone open-source software solutions that facilitate explorations of key features of multiple lists of GO terms. To address this we developed GO-Figure!, open-source Python software for producing user-customisable semantic similarity scatterplots of redundancy-reduced GO term lists. The lists are simplified by grouping together terms with similar functions using their quantified information contents and semantic similarities, with user control over grouping thresholds. Representatives are then selected for plotting in two-dimensional semantic space where similar terms are placed closer to each other on the scatterplot, with an array of user-customisable graphical attributes. GO-Figure! offers a simple solution for command-line plotting of informative summary visualizations of lists of GO terms, designed to support exploratory data analyses and dataset comparisons.
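The over-representation idea mentioned in the abstract above (annotated functions that occur more often in a gene subset than expected by chance) is commonly tested with an upper-tail hypergeometric probability. The following is a minimal sketch under that assumption; it is not taken from GO-Figure!, and the function name `enrichment_p_value` and all counts are hypothetical.

```python
from math import comb


def enrichment_p_value(population, annotated, sample, hits):
    """Upper-tail hypergeometric probability P(X >= hits).

    population: number of genes in the background set
    annotated:  background genes carrying the GO term
    sample:     number of genes of interest
    hits:       genes of interest carrying the GO term
    (All parameter names are illustrative assumptions.)
    """
    upper = min(annotated, sample)
    # Count all ways to draw `sample` genes from the background.
    total = comb(population, sample)
    # Sum the ways to draw at least `hits` annotated genes.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Toy numbers: 10 background genes, 4 annotated, 3 selected, 2 annotated hits.
# The tail is (6*6 + 4*1) / 120 = 1/3.
p = enrichment_p_value(10, 4, 3, 2)
```

In practice such p-values are usually corrected for multiple testing across all GO terms examined; that step is omitted here.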


2015 ◽  
pp. 27-43 ◽  
Author(s):  
Rui Yamaguchi ◽  
Seiya Imoto ◽  
Satoru Miyano

2018 ◽  
Author(s):  
Eugene W. Hinderer ◽  
Robert M. Flight ◽  
Rashmi Dubey ◽  
James N. MacLeod ◽  
Hunter N.B. Moseley

Abstract: Gene-annotation enrichment is a common method for utilizing ontology-based annotations in gene- and gene-product-centric knowledgebases. Effective utilization of these annotations requires inferring semantic linkages by tracing paths through the edges of the ontological graph, referred to as relations. However, some relations are semantically problematic with respect to scope, necessitating their omission lest erroneous term mappings occur. To address these issues, we present GOcats, a novel tool that organizes the Gene Ontology (GO) into subgraphs representing user-defined concepts, while ensuring that all appropriate relations are congruent with respect to scoping semantics. Here, we demonstrate the improvements in annotation enrichment by re-interpreting edges that would otherwise be omitted by traditional ancestor path-tracing methods. We demonstrate that GOcats’ unique handling of relations improves enrichment over conventional methods in the analysis of two different gene-expression datasets: a breast cancer microarray dataset and several horse cartilage development RNAseq datasets. With the breast cancer microarray dataset, we observed significant improvement (one-sided binomial test p-value=1.86E-25) in 182 of 217 significantly enriched GO terms identified from the conventional path traversal method when GOcats’ path traversal was used. We also found new significantly enriched terms using GOcats, whose biological relevancy has been experimentally demonstrated elsewhere. Likewise, on the horse RNAseq datasets, we observed a significant improvement in GO term enrichment when using GOcats’ path traversal: one-sided binomial test p-values range from 1.32E-03 to 2.58E-44.
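A one-sided binomial test of the kind reported above can be sketched as an exact upper-tail sum. The abstract does not state the null proportion or any tie handling, so the value 0.5 used below is an assumption, and the result need not reproduce the reported p-value. The function name `binomial_upper_tail` is hypothetical.

```python
from math import comb


def binomial_upper_tail(successes, trials, null_prob=0.5):
    """Exact one-sided p-value P(X >= successes) for X ~ Binomial(trials, null_prob).

    null_prob = 0.5 is an assumed null ("improvement is as likely as not");
    the source abstract does not specify it.
    """
    return sum(
        comb(trials, k) * null_prob**k * (1 - null_prob) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Counts taken from the abstract: 182 improved terms out of 217.
p_value = binomial_upper_tail(182, 217)
print(f"one-sided binomial p-value under the assumed null: {p_value:.3e}")
```

A small check of the arithmetic: with 3 trials and a null of 0.5, the probability of at least 2 successes is (3 + 1) / 8 = 0.5.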


2019 ◽  
Vol 24 (3) ◽  
pp. 213-223 ◽  
Author(s):  
Raimo Franke ◽  
Bettina Hinkelmann ◽  
Verena Fetz ◽  
Theresia Stradal ◽  
Florenz Sasse ◽  
...  

Mode of action (MoA) identification of bioactive compounds is very often a challenging and time-consuming task. We used a label-free kinetic profiling method based on an impedance readout to monitor the time-dependent cellular response profiles for the interaction of bioactive natural products and other small molecules with mammalian cells. Such approaches have rarely been used so far because of the lack of data mining tools to properly capture the characteristics of the impedance curves. We developed a data analysis pipeline for the xCELLigence Real-Time Cell Analysis detection platform to process the data, assess and score their reproducibility, and provide rank-based MoA predictions for a reference set of 60 bioactive compounds. The method can reveal additional, previously unknown targets, as exemplified by the identification of tubulin-destabilizing activities of the RNA synthesis inhibitor actinomycin D and the effects on DNA replication of vioprolide A. The data analysis pipeline is based on the statistical programming language R and is available to the scientific community through a GitHub repository.


2004 ◽  
Vol 32 (Web Server) ◽  
pp. W313-W317 ◽  
Author(s):  
D. Groth ◽  
H. Lehrach ◽  
S. Hennig

2021 ◽  
Author(s):  
Michelle Bieger ◽  
Quentin Changeat

Retrieval tools provide a way of determining an exoplanet atmosphere's temperature structure and composition from an observed planetary spectrum, working backwards to determine the chemistry and temperature by iteratively comparing synthetic spectra, constructed via a forward model, to the observed spectra and determining a best-fit result (Barstow and Heng, 2020). This talk presents the emission spectrum, the reanalysed transmission spectrum, and the retrieval analysis of WASP-79b, an inflated hot Jupiter first detected by Smalley et al. (2012). Previous transmission spectra of WASP-79b have been analysed in Sotzen et al. (2020), Skaf et al. (2020), and Rathcke et al. (2021); all studies agree on detections of H2O with various confidence levels, with the latter finding moderate evidence of an H- bound-free opacity, in contrast to the iron hydride abundance found by the other studies. Using the publicly available Iraclis data analysis pipeline and the Bayesian atmospheric retrieval framework TauREx 3, we will add to the global picture of this planet by examining the Hubble Space Telescope emission spectra captured by the Wide Field Camera 3 G141 grism (PI: David Sing, proposal ID: 14767).


2010 ◽  
Vol 6 (S274) ◽  
pp. 268-273
Author(s):  
N. Mandolesi ◽  
C. Burigana ◽  
A. Gruppuso ◽  
P. Procopio ◽  
S. Ricciardi

Abstract: This paper provides an overview of the ESA Planck mission and its scientific promises. Planck is equipped with a 1.5-m effective aperture telescope with two actively cooled instruments observing the sky in nine frequency channels from 30 GHz to 857 GHz: the Low Frequency Instrument (LFI) operating at 20 K with pseudo-correlation radiometers, and the High Frequency Instrument (HFI) with bolometers operating at 100 mK. After the successful launch in May 2009, Planck has already mapped the sky twice (at the time of writing this review) with the expected behavior, and it is planned to complete at least two further all-sky surveys. The first scientific results, consisting of an Early Release Compact Source Catalog (ERCSC) and about twenty papers on instrument performance in flight, the data analysis pipeline, and main astrophysical results, will be released in January 2011. The first publications of the main cosmological implications are expected in 2012.

