community curation
Recently Published Documents


TOTAL DOCUMENTS

29
(FIVE YEARS 11)

H-INDEX

6
(FIVE YEARS 2)

Author(s):  
Wouter Addink ◽  
Sharif Islam ◽  
Jose Alonso

DiSSCo (Distributed System of Scientific Collections) is a research infrastructure (RI) under development, which will provide services for the global research community to support and enhance physical and digital access to the natural history collections in Europe. These services include training, support, documentation and e-services. This talk will focus on the e-services and will give an overview of the current status, roadmap and first results as an introduction to the next talks in the session, which focus on some of the services in more detail and the standards work undertaken in Biodiversity Information Standards (TDWG) to enable them. The RI community will provide the envisioned e-services, which will use the novel FAIR Digital Object (FDO) infrastructure serving digital specimens from the European collections. The infrastructure will provide integrated data analysis, enhanced interpretation, annotation and access services for community curation and visualisation. The FDO infrastructure enables specimen data to be (re-)connected with genomic, geographical, morphological, taxonomic and environmental information through the digital specimen, making them Digital Extended Specimens. A large number of user stories have been collected through the DiSSCo-linked projects ICEDIG, SYNTHESYS+ and DiSSCo Prepare, to guide which e-services to build and what functionality to provide. These user stories are publicly available in a GitHub repository. The e-services are developed based on the user stories and prioritization provided by collection providers and the scientific community. A variety of mechanisms are used to collect input: surveys, workshops, roundtables and work package meetings, and feedback from users who have already been using beta versions of some of the services. DiSSCo aims to become operational in 2026, but several of the services are already being piloted or implemented.
Experimental services and demonstrators are publicly available through DiSSCo Labs for testing and feedback. By connecting the specimen data with derived and related information in a FAIR way (Findable, Accessible, Interoperable and Reusable), the e-services will accelerate biodiversity discovery and support novel research questions. The FDO infrastructure has a data model that also integrates the PROV Ontology (PROV-O), which allows for the e-services to capture activities and improve the visibility of researcher contributions. This vision towards FAIR and high quality data is essential for community curation of the specimen data and making better use of the limited number of experts available. To provide the DiSSCo e-services in a FAIR way, the data derived from the natural history collections in Europe needs to be integrated as one virtual collection. The data has to be findable and accessible as soon as it is being created for services like a Specimen Data Refinery prior to publication in a facility like GBIF (Global Biodiversity Information Facility). This requires new standards for describing collections and specimen data. Standards being created to fill these gaps are TDWG CD (Collection Descriptions) and TDWG MIDS (Minimum Information about a Digital Specimen). The DiSSCo e-Services vision brings the data, standards, and processes together to serve the user community.


2020 ◽  
Author(s):  
John Zobolas ◽  
Jin-Dong Kim ◽  
Martin Kuiper ◽  
Steven Vercruysse

One of the many challenges that biocurators face is the continuous evolution of ontologies and controlled vocabularies and their lack of coverage of biological concepts. To help biocurators annotate new information that cannot yet be covered with terms from authoritative resources, we produced an update of PubDictionaries: a resource of publicly editable, simple-structured dictionaries, accessible through a dedicated REST API. PubDictionaries was equipped with both an enhanced API and a new software client that connects it to the Unified Biological Dictionaries (UBDs) uniform data exchange format. This client enables efficient search and retrieval of ad hoc created terms, and easy integration with tools that further support the curator’s specific annotation tasks. A demo that combines the Visual Syntax Method (VSM) interface for general-purpose knowledge formalization with this new PubDictionaries-powered UBD client shows it is now easy to incorporate the user-created PubDictionaries terminologies into biocuration tools.


2020 ◽  
Vol 49 (D1) ◽  
pp. D1464-D1471 ◽  
Author(s):  
Valentin Guignon ◽  
Abdel Toure ◽  
Gaëtan Droc ◽  
Jean-François Dufayard ◽  
Matthieu Conte ◽  
...  

Abstract Comparative genomics is the analysis of genomic relationships among different species and serves as a significant base for evolutionary and functional genomic studies. GreenPhylDB (https://www.greenphyl.org) is a database designed to facilitate the exploration of gene families and homologous relationships among plant genomes, including staple crops critically important for global food security. GreenPhylDB has been available since 2007, after the release of the Arabidopsis thaliana and Oryza sativa genomes, and has undergone multiple releases. With the number of plant genomes currently available, it becomes challenging to select a single reference for comparative genomics studies, but there is still a lack of databases taking advantage of several genomes per species for orthology detection. GreenPhylDBv5 introduces the concept of comparative pangenomics by harnessing multiple genome sequences per species. We created 19 pangenes and processed them with other species still relying on one genome. In total, 46 plant species were considered to build gene families and predict their homologous relationships through phylogenetic-based analyses. In addition, since the previous publication, we rejuvenated the website and included a new set of original tools including protein-domain combination, tree topologies searches and a section for users to store their own results in order to support community curation efforts.


Database ◽  
2020 ◽  
Vol 2020 ◽  
Author(s):  
Valerio Arnaboldi ◽  
Daniela Raciti ◽  
Kimberly Van Auken ◽  
Juancarlos N Chan ◽  
Hans-Michael Müller ◽  
...  

Abstract Biological knowledgebases rely on expert biocuration of the research literature to maintain up-to-date collections of data organized in machine-readable form. To enter information into knowledgebases, curators need to follow three steps: (i) identify papers containing relevant data, a process called triaging; (ii) recognize named entities; and (iii) extract and curate data in accordance with the underlying data models. WormBase (WB), the authoritative repository for research data on Caenorhabditis elegans and other nematodes, uses text mining (TM) to semi-automate its curation pipeline. In addition, WB engages its community, via an Author First Pass (AFP) system, to help recognize entities and classify data types in their recently published papers. In this paper, we present a new WB AFP system that combines TM and AFP into a single application to enhance community curation. The system employs string-searching algorithms and statistical methods (e.g. support vector machines (SVMs)) to extract biological entities and classify data types, and it presents the results to authors in a web form where they validate the extracted information, rather than enter it de novo as the previous form required. With this new system, we lessen the burden for authors, while at the same time receiving valuable feedback on the performance of our TM tools. The new user interface also links out to specific structured data submission forms, e.g. for phenotype or expression pattern data, giving the authors the opportunity to contribute a more detailed curation that can be incorporated into WB with minimal curator review. Our approach is generalizable and could be applied to additional knowledgebases that would like to engage their user community in assisting with the curation. In the five months following the launch of the new system, the response rate has been comparable with that of the previous AFP version, but the quality and quantity of the data received have greatly improved.
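
As a rough illustration of the string-searching step described in this abstract, the sketch below matches a small lexicon of entity names against paper text. It is not WormBase's implementation: the function name `find_entities`, the lexicon layout, and the example gene names are all assumptions made for this example, and the SVM-based data type classification is left out.

```python
import re
from typing import Dict, List, Set


def find_entities(text: str, lexicon: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Return, per entity type, the lexicon terms that occur in `text`.

    Hypothetical helper for illustration only. A term counts as a hit when
    it is not embedded in a longer token, so "daf-16" does not match inside
    "daf-160".
    """
    found: Dict[str, List[str]] = {}
    for entity_type, terms in lexicon.items():
        hits: List[str] = []
        for term in sorted(terms):
            # Plain \b is unreliable around hyphens, so guard both sides
            # against word characters and hyphens explicitly.
            pattern = r"(?<![\w-])" + re.escape(term) + r"(?![\w-])"
            if re.search(pattern, text):
                hits.append(term)
        found[entity_type] = hits
    return found


# Assumed example input; the lexicon would normally come from the knowledgebase.
example_text = "Loss of daf-16 suppresses the daf-2 phenotype; daf-160 is not listed."
example_lexicon = {"gene": {"daf-16", "daf-2", "unc-22"}}
example_hits = find_entities(example_text, example_lexicon)
```

In a form-based workflow such as the one the abstract describes, hits like these would be shown to the author for validation rather than written to the database directly.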


Database ◽  
2020 ◽  
Vol 2020 ◽  
Author(s):  
Antonia Lock ◽  
Midori A Harris ◽  
Kim Rutherford ◽  
Jacqueline Hayles ◽  
Valerie Wood

Abstract Maximizing the impact and value of scientific research requires efficient knowledge distribution, which increasingly depends on the integration of standardized published data into online databases. To make data integration more comprehensive and efficient for fission yeast research, PomBase has pioneered a community curation effort that engages publication authors directly in FAIR-sharing of data representing detailed biological knowledge from hypothesis-driven experiments. Canto, an intuitive online curation tool that enables biologists to describe their detailed functional data using shared ontologies, forms the core of PomBase’s system. With 8 years’ experience, and as the author response rate reaches 50%, we review community curation progress and the insights we have gained from the project. We highlight incentives and nudges we deploy to maximize participation, and summarize project outcomes, which include increased knowledge integration and dissemination as well as the unanticipated added value arising from co-curation by publication authors and professional curators.


PLoS ONE ◽  
2019 ◽  
Vol 14 (10) ◽  
pp. e0224086 ◽  
Author(s):  
Marcela K. Tello-Ruiz ◽  
Cristina F. Marco ◽  
Fei-Man Hsu ◽  
Rajdeep S. Khangura ◽  
Pengfei Qiao ◽  
...  

2019 ◽  
Vol 21 (5) ◽  
pp. 1697-1705 ◽  
Author(s):  
Jon Ison ◽  
Hervé Ménager ◽  
Bryan Brancotte ◽  
Erik Jaaniso ◽  
Ahto Salumets ◽  
...  

Abstract The corpus of bioinformatics resources is huge and expanding rapidly, presenting life scientists with a growing challenge in selecting tools that fit the desired purpose. To address this, the European Infrastructure for Biological Information is supporting a systematic approach towards a comprehensive registry of tools and databases for all domains of bioinformatics, provided under a single portal (https://bio.tools). We describe here the practical means by which scientific communities, including individual developers and projects, through major service providers and research infrastructures, can describe their own bioinformatics resources and share these via bio.tools.


2019 ◽  
Vol 67 (1) ◽  
Author(s):  
Lina Ma ◽  
Jiabao Cao ◽  
Lin Liu ◽  
Zhao Li ◽  
Huma Shireen ◽  
...  

2019 ◽  
Author(s):  
Marcela K. Tello-Ruiz ◽  
Cristina F. Marco ◽  
Fei-Man Hsu ◽  
Rajdeep S. Khangura ◽  
Pengfei Qiao ◽  
...  

Abstract The sophistication of gene prediction algorithms and the abundance of RNA-based evidence for the maize genome may suggest that manual curation of gene models is no longer necessary. However, quality metrics generated by the MAKER-P gene annotation pipeline identified 17,225 of 130,330 (13%) protein-coding transcripts in the B73 Reference Genome V4 gene set with models of low concordance to available biological evidence. Working with eight graduate students, we used the Apollo annotation editor to curate 86 transcript models flagged by quality metrics and a complementary method using the Gramene gene tree visualizer. All of the triaged models had significant errors – including missing or extra exons, non-canonical splice sites, and incorrect UTRs. A correct transcript model existed for about 60% of genes (or transcripts) flagged by quality metrics; we attribute this to the convention of elevating the transcript with the longest coding sequence (CDS) to the canonical, or first, position. The remaining 40% of flagged genes resulted in novel annotations and represent a manual curation space of about 10% of the maize genome (~4,000 protein-coding genes). MAKER-P metrics have a specificity of 100%, and a sensitivity of 85%; the gene tree visualizer has a specificity of 100%. Together with the Apollo graphical editor, our double triage provides an infrastructure to support the community curation of eukaryotic genomes by scientists, students, and potentially even citizen scientists.
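
The figures in this abstract can be checked with a few lines of arithmetic. The sketch below recomputes the flagged fraction from the two counts given above and shows the standard definitions of sensitivity and specificity. The confusion-matrix counts passed to those two functions are made-up illustrative values, not data from the study, which reports only the resulting percentages.

```python
def flagged_percent(flagged: int, total: int) -> float:
    """Percentage of transcripts flagged by the quality metrics."""
    return 100.0 * flagged / total


def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of truly erroneous models that the triage flags."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of correct models that the triage leaves unflagged."""
    return true_neg / (true_neg + false_pos)


# Counts taken from the abstract: 17,225 of 130,330 transcripts flagged.
pct = round(flagged_percent(17_225, 130_330), 1)

# Hypothetical counts chosen only to reproduce the reported 85% and 100%.
sens = sensitivity(true_pos=85, false_neg=15)
spec = specificity(true_neg=50, false_pos=0)
```

A specificity of 100% means no correct model was flagged in error, which is consistent with the abstract's statement that all triaged models had significant errors.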

