Crowdsourcing biocuration: the Community Assessment of Community Annotation with Ontologies (CACAO)

2021 ◽  
Author(s):  
Jolene Ramsey ◽  
Brenley McIntosh ◽  
Daniel Renfro ◽  
Suzanne A Aleksander ◽  
Sandra LaBonte ◽  
...  

Experimental data about known gene functions curated from the primary literature have enormous value for research scientists in understanding biology. Using the Gene Ontology (GO), manual curation by experts has provided an important resource for studying gene function, especially within model organisms. Unprecedented expansion of the scientific literature and validation of predicted proteins have increased both the value of these data and the challenge of keeping pace. The capture of literature-based functional annotations is limited by the ability of biocurators to handle the massive and rapidly growing scientific literature. Within the community-oriented wiki framework for GO annotation called the Gene Ontology Normal Usage Tracking System (GONUTS), we describe an approach to expand biocuration through crowdsourcing with undergraduates. This approach multiplies the number of high-quality annotations in international databases, enriches our coverage of the literature on normal gene function, and pushes the field in new directions. Through an intercollegiate competition judged by experienced biocurators, the Community Assessment of Community Annotation with Ontologies (CACAO), we have contributed nearly 5000 literature-based annotations. Many of those annotations are to organisms not currently well represented within GO. Over a ten-year history, our community contributors have spurred changes to the ontology not traditionally covered by professional biocurators. The CACAO principle of relying on community members to participate in and shape the future of biocuration in GO is a powerful and scalable model for promoting the scientific enterprise. It also provides undergraduate students with a unique and enriching introduction to critical reading of primary literature and acquisition of marketable skills.

Significance Statement: The primary scientific literature catalogs, in a human-readable format, the results of publicly funded research about gene function. Information captured from those studies in a widely adopted, machine-readable standard format comes in the form of Gene Ontology annotations about gene functions from all domains of life. Manual annotations based on inferences drawn directly from the scientific literature, including the evidence used to make such inferences, represent the best return on investment by improving data accessibility across the biological sciences. To supplement professional curation, our CACAO project enabled annotation of the scientific literature by community annotators, in this case undergraduates. This resulted in the contribution of thousands of validated entries to public resources. These annotations are now being used by scientists worldwide.
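A minimal sketch can make concrete the idea of a machine-readable annotation that carries its evidence. It parses one line laid out in the tab-separated GO Annotation File (GAF 2.2) format and checks whether the line's evidence code is experimental. The helper names `parse_gaf_line` and `is_experimental` and the example record are illustrative assumptions, and every identifier in the record is a placeholder rather than a real annotation. This is not how GONUTS or CACAO store annotations.

```python
from typing import Dict

# Field order of a GAF 2.2 annotation line (17 tab-separated columns).
GAF_COLUMNS = (
    "db", "db_object_id", "db_object_symbol", "qualifier", "go_id",
    "db_reference", "evidence_code", "with_from", "aspect",
    "db_object_name", "db_object_synonym", "db_object_type", "taxon",
    "date", "assigned_by", "annotation_extension", "gene_product_form_id",
)

# GO evidence codes for experimental and high-throughput experimental evidence.
EXPERIMENTAL_CODES = {
    "EXP", "IDA", "IPI", "IMP", "IGI", "IEP",
    "HTP", "HDA", "HMP", "HGI", "HEP",
}


def parse_gaf_line(line: str) -> Dict[str, str]:
    """Split one GAF annotation line into a dict keyed by column name."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(GAF_COLUMNS):
        raise ValueError(f"expected {len(GAF_COLUMNS)} columns, got {len(values)}")
    return dict(zip(GAF_COLUMNS, values))


def is_experimental(annotation: Dict[str, str]) -> bool:
    """True if the annotation is supported by an experimental evidence code."""
    return annotation["evidence_code"] in EXPERIMENTAL_CODES


# Hypothetical record: every identifier below is a placeholder.
example = "\t".join([
    "UniProtKB", "P00000", "geneX", "enables", "GO:0000000",
    "PMID:00000000", "IDA", "", "F", "hypothetical protein X",
    "", "protein", "taxon:000000", "20210101", "ExampleDB", "", "",
])
record = parse_gaf_line(example)
print(record["go_id"], record["evidence_code"], is_experimental(record))
```

The literature reference (column 6) and the evidence code (column 7) travel with every annotation. A downstream user can therefore filter for experimentally supported statements without rereading the paper.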

2019 ◽  
Vol 48 (D1) ◽  
pp. D650-D658 ◽  
Author(s):  
Julie Agapite ◽  
Laurent-Philippe Albou ◽  
Suzi Aleksander ◽  
Joanna Argasinska ◽  
...  

Abstract The Alliance of Genome Resources (Alliance) is a consortium of the major model organism databases and the Gene Ontology. It is guided by the vision of facilitating exploration of related genes in human and well-studied model organisms by providing a highly integrated and comprehensive platform that enables researchers to leverage the extensive body of genetic and genomic studies in these organisms. Initiated in 2016, the Alliance is building a central portal (www.alliancegenome.org) for access to data for the primary model organisms along with Gene Ontology data and human data. All data types represented in the Alliance portal (e.g. genomic data and phenotype descriptions) have common data models and workflows for curation. All data are open and freely available via a variety of mechanisms. Long-term plans for the Alliance project include coverage of additional model organisms, including those without dedicated curation communities. They also include the addition of new data types, with a particular focus on providing data and tools for the non-model-organism researcher that support enhanced discovery about human health and disease. Here we review current progress and present immediate plans for this new bioinformatics resource.


2018 ◽  
Author(s):  
Valerie Wood ◽  
Antonia Lock ◽  
Midori A. Harris ◽  
Kim Rutherford ◽  
Jürg Bähler ◽  
...  

Abstract The first decade of genome sequencing stimulated an explosion in the characterization of unknown proteins. More recently, the pace of functional discovery has slowed, leaving around 20% of the proteins even in well-studied model organisms without informative descriptions of their biological roles. Remarkably, many uncharacterized proteins are conserved from yeasts to human, suggesting that they contribute to fundamental biological processes. To fully understand biological systems in health and disease, we need to account for every part of the system. Unstudied proteins thus represent a collective blind spot that limits the progress of both basic and applied biosciences. We use a simple yet powerful metric based on Gene Ontology (GO) biological process terms to define characterized and uncharacterized proteins for human, budding yeast, and fission yeast. We then identify a set of conserved but unstudied proteins in S. pombe, and classify them based on a combination of orthogonal attributes determined by large-scale experimental and comparative methods. Finally, we explore possible reasons why these proteins remain neglected, and propose courses of action to raise their profile and thereby reap the benefits of completing the catalog of proteins’ biological roles.
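The abstract does not spell out the metric, so the sketch below is only a simplified stand-in. It assumes that a protein counts as characterized when it has at least one biological process annotation (aspect `P`) to a term in a caller-supplied set of informative terms. It ignores the root term GO:0008150 and any annotation carrying the ND (no biological data) evidence code. The function name and input layout are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

BP_ROOT = "GO:0008150"  # root term of the biological_process aspect

# (protein_id, go_id, aspect, evidence_code); aspect "P" = biological process
Annotation = Tuple[str, str, str, str]


def classify_proteins(
    proteome: Iterable[str],
    annotations: Iterable[Annotation],
    informative_terms: Set[str],
) -> Dict[str, bool]:
    """Map each protein to True (characterized) or False (uncharacterized)."""
    status = {protein: False for protein in proteome}
    for protein, go_id, aspect, evidence in annotations:
        if aspect != "P" or evidence == "ND" or go_id == BP_ROOT:
            continue  # not an informative biological process annotation
        if go_id in informative_terms and protein in status:
            status[protein] = True
    return status


proteome = ["p1", "p2", "p3"]
annotations = [
    ("p1", "GO:0006281", "P", "IMP"),  # DNA repair
    ("p2", "GO:0008150", "P", "ND"),   # root term, no data
    ("p3", "GO:0005634", "C", "IDA"),  # nucleus (cellular component)
]
status = classify_proteins(proteome, annotations, {"GO:0006281"})
unknown = [p for p, known in status.items() if not known]
print(f"{len(unknown)}/{len(status)} uncharacterized: {unknown}")
```

Starting from the full proteome rather than from the annotation file ensures that proteins with no annotations at all are counted as uncharacterized.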


mSphere ◽  
2020 ◽  
Vol 5 (1) ◽  
Author(s):  
Michelle Spoto ◽  
Changhui Guan ◽  
Elizabeth Fleming ◽  
Julia Oh

ABSTRACT The CRISPR/Cas system has significant potential to facilitate gene editing in a variety of bacterial species. CRISPR interference (CRISPRi) and CRISPR activation (CRISPRa) represent modifications of the CRISPR/Cas9 system that use a catalytically inactive Cas9 protein for transcription repression and activation, respectively. While CRISPRi and CRISPRa have tremendous potential to systematically investigate gene function in bacteria, few programs are specifically tailored to identify guides genome-wide in draft bacterial genomes. Furthermore, few programs offer open-source code with flexible design parameters for bacterial targeting. To address these limitations, we created GuideFinder, a customizable, user-friendly program that can design guides for any annotated bacterial genome. GuideFinder designs guides from NGG protospacer-adjacent motif (PAM) sites for any number of genes, using an annotated genome and a FASTA file supplied by the user. Guides are filtered according to user-defined design parameters and removed if they contain any off-target matches. Iteration with lowered parameter thresholds allows the program to design guides for genes that did not produce guides under the more stringent parameters, one of several features unique to GuideFinder. GuideFinder can also identify paired guides for targeting multiplicity, and we tested their validity experimentally. GuideFinder has been tested on a variety of diverse bacterial genomes, finding guides for 95% of genes on average. Moreover, guides designed by the program are functionally useful: with CRISPRi as the example application, they produced essential gene knockdown in two staphylococcal species. Through the large-scale generation of guides, this open-access software will improve access to CRISPR/Cas studies of a variety of bacterial species.
IMPORTANCE With the explosion in our understanding of human and environmental microbial diversity, corresponding efforts to understand gene function in these organisms are strongly needed. CRISPR/Cas9 technology has revolutionized interrogation of gene function in a wide variety of model organisms. Efficient CRISPR guide design is required for systematic gene targeting. However, existing tools are not adapted to the broad needs of microbial targeting. These needs include extraordinary genetic diversity at the species and subspecies levels, the overwhelming majority of which is represented by draft genomes. In addition, flexibility in guide design parameters is important to accommodate the wide range of factors that can affect guide efficacy, many of which can be species- and strain-specific. We designed GuideFinder, a customizable, user-friendly program that addresses the limitations of existing software. It can design guides for any annotated bacterial genome, with numerous features that facilitate guide design in a wide variety of microorganisms.
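The core design steps described for GuideFinder are: locate NGG PAM sites, filter guides by design parameters, discard off-target matches, and retry with relaxed thresholds. They can be sketched as follows. This is a hypothetical illustration, not GuideFinder's code. It assumes 20-nt guides and uses GC content as the only design parameter. It also treats an off-target as any additional exact match on either strand of the genome, whereas practical tools also consider mismatches. All function names and the GC windows are illustrative assumptions.

```python
import re
from typing import List, Optional, Sequence, Tuple

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")
Guide = Tuple[str, int, str]  # (strand, 0-based start on that strand, guide sequence)


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_ngg_guides(seq: str, guide_len: int = 20) -> List[Guide]:
    """Every guide_len-nt protospacer immediately 5' of an NGG PAM, both strands."""
    guides = []
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        # Zero-width lookahead so overlapping PAM sites are all reported.
        for match in re.finditer(r"(?=[ACGT]GG)", s):
            pam = match.start()
            if pam >= guide_len:
                guides.append((strand, pam - guide_len, s[pam - guide_len:pam]))
    return guides


def gc_fraction(s: str) -> float:
    return (s.count("G") + s.count("C")) / len(s)


def count_occurrences(text: str, pattern: str) -> int:
    """Count overlapping exact occurrences of pattern in text."""
    count, i = 0, text.find(pattern)
    while i != -1:
        count += 1
        i = text.find(pattern, i + 1)
    return count


def design_guides(
    gene: str,
    genome: str,
    gc_windows: Sequence[Tuple[float, float]] = ((0.4, 0.6), (0.25, 0.75), (0.2, 0.8)),
) -> Tuple[List[Guide], Optional[Tuple[float, float]]]:
    """Return guides from the strictest GC window that yields any, plus that window."""
    # Search both genome strands; "|" keeps matches from spanning the junction.
    both_strands = genome + "|" + revcomp(genome)
    unique = [g for g in find_ngg_guides(gene)
              if count_occurrences(both_strands, g[2]) == 1]
    for low, high in gc_windows:  # iterate from stringent to relaxed thresholds
        passing = [g for g in unique if low <= gc_fraction(g[2]) <= high]
        if passing:
            return passing, (low, high)
    return [], None


gene = "GATTACAGATTACAGATTAC" + "TGG"  # toy 20-nt guide followed by a TGG PAM
print(design_guides(gene, genome="AAAA" + gene + "AAAA"))
```

In the toy example the only guide has 30% GC. It fails the strictest window and is recovered when the thresholds are relaxed, which mirrors the iterative behavior the abstract describes.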


2019 ◽  
Vol 18 (4) ◽  
pp. ar56 ◽  
Author(s):  
April A. Nelms ◽  
Miriam Segura-Totten

Student engagement in the analysis of primary scientific literature increases critical thinking, scientific literacy, data evaluation, and science process skills. However, little is known about the process by which expertise in reading scientific articles develops. We therefore compared how faculty experts and student novices engage with a research article. We performed think-aloud interviews of biology faculty and undergraduates as they read through a scientific article and analyzed these interviews using qualitative methods. We grounded data interpretation in cognitive load theory and the ICAP (interactive, constructive, active, and passive) framework. Our results revealed that faculty have more complex schemas than students and that they reduce cognitive load through two main mechanisms: summarizing and note-taking. Faculty also engage with articles at a higher cognitive level than students do, described as constructive by the ICAP framework. More complex schemas, effective reduction of cognitive load, and deeper engagement with the text may help explain why faculty encountered fewer comprehension difficulties than students in our study. Finally, faculty analyze and evaluate data more often than students when reading the text. We also discuss successful pedagogical approaches for instructors who wish to enhance undergraduates’ comprehension and analysis of research articles.


Parasitology ◽  
2012 ◽  
Vol 139 (5) ◽  
pp. 589-604 ◽  
Author(s):  
JOHNATHAN J. DALZELL ◽  
NEIL D. WARNOCK ◽  
PAUL MCVEIGH ◽  
NIKKI J. MARKS ◽  
ANGELA MOUSLEY ◽  
...  

SUMMARY Almost a decade has passed since the first report of RNA interference (RNAi) in a parasitic helminth. Whilst much progress has been made with RNAi informing gene function studies in disparate nematode and flatworm parasites, substantial and seemingly prohibitive difficulties have been encountered in some species, hindering progress. An appraisal of current practices, trends and ideals of RNAi experimental design in parasitic helminths is both timely and necessary for a number of reasons: firstly, the increasing availability of parasitic helminth genome/transcriptome resources means there is a growing need for gene function tools such as RNAi; secondly, fundamental differences and unique challenges exist for parasite species which do not apply to model organisms; thirdly, the inherent variation in experimental design, and reported difficulties with reproducibility undermine confidence. Ideally, RNAi studies of gene function should adopt standardised experimental design to aid reproducibility, interpretation and comparative analyses. Although the huge variations in parasite biology and experimental endpoints make RNAi experimental design standardization difficult or impractical, we must strive to validate RNAi experimentation in helminth parasites. To aid this process we identify multiple approaches to RNAi experimental validation and highlight those which we deem to be critical for gene function studies in helminth parasites.


2019 ◽  
Author(s):  
Matej Mihelčić ◽  
Tomislav Šmuc ◽  
Fran Supek

Abstract Genes with similar roles in the cell are known to cluster on chromosomes, thus benefiting from coordinated regulation. This allows gene function to be inferred by transferring annotations from genomic neighbors, following the guilt-by-association principle. We performed a systematic search for co-occurrence of >1000 gene functions in genomic neighborhoods across 1669 prokaryotic, 49 fungal and 80 metazoan genomes, revealing prevalent patterns that cannot be explained by clustering of functionally similar genes. Very often, pairs of dissimilar gene functions (corresponding to semantically distant Gene Ontology terms) are significantly co-located on chromosomes. These neighborhood associations are often as conserved across genomes as the known associations between similar functions. This suggests selective benefits from clustering of certain diverse functions, which may conceivably play complementary roles in the cell. We propose a simple encoding of chromosomal gene order, the neighborhood function profiles (NFP), which draws on diverse gene clustering patterns to predict gene function and phenotype. NFPs yield a 26–46% increase in predictive power over state-of-the-art approaches that propagate function across neighborhoods, thus providing hundreds of novel, high-confidence gene function inferences per genome. Furthermore, we demonstrate that the effect of structural variation on gene function distribution across chromosomes may be used to predict the phenotypes of individuals from their genome sequences.
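A simplified reading of the neighborhood function profile (NFP) idea is sketched below. Each gene is encoded as a binary vector recording which GO functions occur among its genomic neighbors. The gene's own annotations are excluded so that the vector could serve as features for predicting that gene's function. Several choices are assumptions for illustration: the fixed window of `window` genes on each side, the binary encoding, and treating the chromosome as linear rather than circular. The function name and toy data are hypothetical, and the published NFP encoding may differ.

```python
from typing import Dict, List, Set


def neighborhood_profiles(
    gene_order: List[str],
    annotations: Dict[str, Set[str]],
    functions: List[str],
    window: int = 2,
) -> Dict[str, List[int]]:
    """Binary vector per gene: 1 if a function occurs among its neighbors."""
    column = {f: i for i, f in enumerate(functions)}
    profiles = {}
    for pos, gene in enumerate(gene_order):
        vector = [0] * len(functions)
        start, stop = max(0, pos - window), min(len(gene_order), pos + window + 1)
        # Neighbors only: the gene's own annotations are deliberately excluded.
        neighbors = gene_order[start:pos] + gene_order[pos + 1:stop]
        for neighbor in neighbors:
            for function in annotations.get(neighbor, set()):
                if function in column:
                    vector[column[function]] = 1
        profiles[gene] = vector
    return profiles


# Toy chromosome with two hypothetical functions F1 and F2.
order = ["g1", "g2", "g3", "g4"]
known = {"g1": {"F1"}, "g3": {"F2"}, "g4": {"F1"}}
for gene, vector in neighborhood_profiles(order, known, ["F1", "F2"], window=1).items():
    print(gene, vector)
```

Vectors like these could then be passed as input features to any classifier trained on already annotated genes. That training step is not shown here.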


FACETS ◽  
2020 ◽  
Vol 5 (1) ◽  
pp. 642-650 ◽
Author(s):  
Christopher J. Lortie ◽  
Malory Owen

There is a gap between fundamental science and environmental managers. Many general solutions exist, including better use of the primary scientific literature for decision-making. Herein, we provide a list of 10 simple rules to support environmental management through better scientific writing, and we suggest practices for more transparent publications. These rules can also be used as a checklist for reusing the primary literature when searching for relevant evidence in the environmental sciences. Knowledge in papers needs to be better structured to support connections within sustainable societies.


F1000Research ◽  
2019 ◽  
Vol 8 ◽  
pp. 153 ◽  
Author(s):  
Chirag Gupta ◽  
Andy Pereira

Predicting gene functions from genome sequence alone has been difficult, and the functions of a large fraction of plant genes remain unknown. However, leveraging the vast amount of currently available gene expression data has the potential to facilitate our understanding of plant gene functions, especially in determining complex traits. Gene coexpression networks—created by integrating multiple expression datasets—connect genes with similar patterns of expression across multiple conditions. Dense gene communities in such networks, commonly referred to as modules, often indicate that the member genes are functionally related. As such, these modules serve as tools for generating new testable hypotheses, including the prediction of gene function and importance. Recently, we have seen a paradigm shift from the traditional “global” to more defined, context-specific coexpression networks. Such coexpression networks imply genetic correlations in specific biological contexts such as during development or in response to a stress. In this short review, we highlight a few recent studies that attempt to fill the large gaps in our knowledge about cellular functions of plant genes using context-specific coexpression networks.
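A coexpression network and its modules can be sketched in a few lines. Two genes are connected when the Pearson correlation of their expression profiles reaches a threshold, and the connected components are reported as modules. Restricting the calculation to the samples from one condition, via the hypothetical `columns` argument, gives a context-specific network. The 0.9 threshold and the use of connected components are simplifying assumptions. Established pipelines typically use more elaborate module detection, such as clustering on topological overlap.

```python
import math
from itertools import combinations
from typing import Dict, List, Optional, Sequence, Set


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either profile is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def coexpression_modules(
    expression: Dict[str, List[float]],
    threshold: float = 0.9,
    columns: Optional[List[int]] = None,
) -> List[Set[str]]:
    """Connect genes with correlation >= threshold; return connected components."""
    if columns is not None:  # keep only samples from one biological context
        expression = {g: [v[i] for i in columns] for g, v in expression.items()}
    genes = list(expression)
    edges: Dict[str, Set[str]] = {g: set() for g in genes}
    for a, b in combinations(genes, 2):
        if pearson(expression[a], expression[b]) >= threshold:
            edges[a].add(b)
            edges[b].add(a)
    seen: Set[str] = set()
    modules = []
    for gene in genes:
        if gene in seen:
            continue
        stack, module = [gene], set()
        while stack:  # depth-first traversal of one component
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            module.add(current)
            stack.extend(edges[current] - seen)
        modules.append(module)
    return modules


profiles = {"gA": [1, 2, 3, 4], "gB": [2, 4, 6, 8], "gC": [4, 3, 2, 1], "gD": [8, 6, 4, 2]}
print(coexpression_modules(profiles))
```

Passing a subset of sample indices can reveal modules that are invisible in the global network. For example, two genes may co-vary only under one stress treatment, and that pattern is diluted when all samples are pooled.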

