PlutoF: Biodiversity data management platform for the complete data lifecycle

Author(s):  
Urmas Kõljalg ◽  
Kessy Abarenkov ◽  
Allan Zirk ◽  
Veljo Runnel ◽  
Timo Piirmann ◽  
...  

The PlutoF online platform (https://plutof.ut.ee) is built for the management of biodiversity data. The concept is to provide a common workbench where the full data lifecycle can be managed, with seamless data sharing between single users, workgroups and institutions. Today, large and sophisticated biodiversity datasets are increasingly developed and managed by international workgroups. PlutoF's ambition is to serve such collaborative projects as well as to provide data management services to single users, museum or private collections, and research institutions. Data management in PlutoF follows the logical order of the data lifecycle (Fig. 1). First, project metadata is uploaded, including the project description, data management plan, participants, sampling areas, etc. Data upload and management activities then follow, often linked to internal data sharing. Some data analyses can be performed directly in the workbench, or data can be exported in standard formats. PlutoF also includes a data publishing module: users can publish their data, generating a citable DOI, without the dataset leaving the PlutoF workbench. PlutoF is part of the DataCite collaboration (https://datacite.org) and has so far released more than 600,000 DOIs. Another option is to publish observation or collection datasets via the GBIF (Global Biodiversity Information Facility) portal. A new feature implemented in 2019 allows users to publish high-throughput sequencing data as taxon occurrences in GBIF. There is an additional option to send specific datasets directly to the Pensoft online journals. Ultimately, PlutoF works as a data archive, which completes the data lifecycle. Users can manage different data types in PlutoF. The most common types include specimen and living specimen data, nucleotide sequences, human observations, material samples, taxonomic backbones and ecological data. Another important feature is that these data types can be managed as single datasets or projects.
PlutoF follows several biodiversity standards. Examples include Darwin Core, GGBN (Global Genome Biodiversity Network), EML (Ecological Metadata Language), MCL (Microbiological Common Language), and MIxS (Minimum Information about any (x) Sequence).
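To illustrate the Darwin Core standard named above, a minimal occurrence record can be written as term–value pairs. This is a generic sketch, not PlutoF's internal representation: the term names (occurrenceID, scientificName, etc.) are standard Darwin Core terms, but the values are invented for illustration.

```python
import csv
import io

# Standard Darwin Core terms; the values are invented for illustration.
occurrence = {
    "occurrenceID": "urn:uuid:example-0001",
    "basisOfRecord": "HumanObservation",
    "scientificName": "Amanita muscaria",
    "eventDate": "2019-09-14",
    "decimalLatitude": 58.3776,
    "decimalLongitude": 26.7290,
    "countryCode": "EE",
}

# Darwin Core data are commonly exchanged as delimited text
# ("simple Darwin Core"), e.g. inside a Darwin Core Archive:
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=list(occurrence))
writer.writeheader()
writer.writerow(occurrence)
print(buf.getvalue())
```

Such a flat, term-per-column layout is what makes datasets exported from one platform (e.g. PlutoF) ingestible by another (e.g. GBIF).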

PeerJ ◽  
2021 ◽  
Vol 9 ◽  
pp. e11333
Author(s):  
Daniyar Karabayev ◽  
Askhat Molkenov ◽  
Kaiyrgali Yerulanuly ◽  
Ilyas Kabimoldayev ◽  
Asset Daniyarov ◽  
...  

Background High-throughput sequencing platforms generate massive amounts of high-dimensional genomic data. Modern, user-friendly bioinformatics tools are therefore essential for the analysis and interpretation of sequencing data. Different standard data types and file formats have been developed to store and analyze sequence and genomics data. Variant Call Format (VCF) is the most widespread genomics file type and standard format for storing genomic information and variants of sequenced samples. Results Existing tools for processing VCF files usually lack an intuitive graphical interface, offering only a command-line interface that may be challenging for the broader biomedical community interested in genomics data analysis. re-Searcher solves this problem by pre-processing VCF files in chunks, so that the whole file does not have to be loaded into the computer's RAM. The tool can be used as a standalone, user-friendly, multiplatform GUI application as well as a web application (https://nla-lbsb.nu.edu.kz). The software, including source code, tested VCF files and additional information, is publicly available in the GitHub repository (https://github.com/LabBandSB/re-Searcher).
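The chunking idea described in the abstract can be sketched generically: stream VCF data lines and process them in fixed-size batches rather than materialising the whole file. This is a hedged illustration of the technique, not re-Searcher's actual code; the function names and the tiny inline VCF fragment are invented.

```python
# Generic sketch of chunked VCF processing (not re-Searcher's implementation):
# records are streamed in fixed-size batches so the whole file never has to
# be held in RAM at once.

def vcf_records(lines):
    """Yield parsed data lines of a VCF, skipping '##' meta and '#CHROM' header."""
    for line in lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # First five of the mandatory VCF columns
        yield {"CHROM": fields[0], "POS": int(fields[1]),
               "ID": fields[2], "REF": fields[3], "ALT": fields[4]}

def in_chunks(records, size):
    """Group an iterator of records into lists of at most `size` items."""
    chunk = []
    for rec in records:
        chunk.append(rec)
        if len(chunk) == size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk

# Demo on an invented two-variant VCF fragment:
sample_vcf = """##fileformat=VCFv4.2
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO
1\t12345\trs001\tA\tG\t50\tPASS\t.
1\t67890\t.\tC\tT\t99\tPASS\t.
"""
chunks = list(in_chunks(vcf_records(sample_vcf.splitlines()), size=1))
```

In practice the same pattern applies with a file handle in place of `splitlines()`, so memory use is bounded by the chunk size, not the file size.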


GigaScience ◽  
2019 ◽  
Vol 8 (12) ◽  
Author(s):  
Julien Tremblay ◽  
Etienne Yergeau

Abstract Background With the advent of high-throughput sequencing, microbiology is becoming increasingly data-intensive. Because of its low cost, robust databases, and established bioinformatic workflows, sequencing of 16S/18S/ITS ribosomal RNA (rRNA) gene amplicons, which provides a marker of choice for phylogenetic studies, has become ubiquitous. Many established end-to-end bioinformatic pipelines are available to perform short amplicon sequence data analysis. These pipelines suit a general audience, but few options exist for more specialized users who are experienced in code scripting, Linux-based systems, and high-performance computing (HPC) environments. For such an audience, existing pipelines can be limiting, as they do not fully leverage modern HPC capabilities or allow tweaking and optimization. Moreover, a wealth of stand-alone software packages that perform specific targeted bioinformatic tasks are increasingly accessible, and finding a way to easily integrate these applications in a pipeline is critical to the evolution of bioinformatic methodologies. Results Here we describe AmpliconTagger, a short rRNA marker gene amplicon pipeline coded in a Python framework that enables fine-tuning and integration of virtually any potential rRNA gene amplicon bioinformatic procedure. It is designed to work within an HPC environment, supporting a complex network of job dependencies with a smart-restart mechanism in case of job failure or parameter modifications. As proof of concept, we present end results obtained with AmpliconTagger using 16S, 18S and ITS rRNA short gene amplicons and Pacific Biosciences long-read amplicon data as input. Conclusions Using a selection of published algorithms for generating operational taxonomic units and amplicon sequence variants and for computing downstream taxonomic summaries and diversity metrics, we demonstrate the performance and versatility of our pipeline for systematic analyses of amplicon sequence data.
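The job-dependency network with smart restart described in the abstract can be sketched in a few lines. This is a generic illustration of the mechanism, not AmpliconTagger's actual implementation: jobs run only once their prerequisites are done, and jobs already completed on a previous run are skipped.

```python
# Generic sketch of a dependency-aware pipeline with smart restart
# (not AmpliconTagger's actual code): each job waits for its prerequisites,
# and jobs already recorded as done are skipped when the pipeline restarts.

def run_pipeline(jobs, deps, done, execute):
    """jobs: list of job names in any order.
    deps: name -> set of prerequisite job names.
    done: set of already-completed jobs (the restart state).
    execute: callable invoked for each job that still needs to run."""
    pending = [j for j in jobs if j not in done]
    while pending:
        progressed = False
        for job in list(pending):
            if deps.get(job, set()) <= done:  # all prerequisites satisfied
                execute(job)
                done.add(job)
                pending.remove(job)
                progressed = True
        if not progressed:
            raise RuntimeError("circular or unsatisfiable dependencies: %s" % pending)
    return done

# Invented example: 'merge' failed on a previous run, so on restart only
# 'merge' and its downstream job 'cluster' are re-executed.
deps = {"trim": set(), "merge": {"trim"}, "cluster": {"merge"}}
ran = []
run_pipeline(["trim", "merge", "cluster"], deps, done={"trim"}, execute=ran.append)
# ran == ["merge", "cluster"]; the completed 'trim' step was skipped
```

On a real cluster, `execute` would submit to the scheduler with dependency flags and `done` would be persisted between runs, but the restart logic is the same.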


2011 ◽  
Vol 148-149 ◽  
pp. 397-402
Author(s):  
Yang Cheng ◽  
Yang Liang ◽  
Luo Hong

The management and sharing of varied automotive intake and exhaust system data has become imperative for modern automotive companies. Based on the needs of a certain company's NVH department, much work has been done to analyze the data management methods and data features. A web platform based on the B/S (browser/server) model has then been built, and solutions for transformation between different data types, data visualization, processing of massive data and dynamic data expansion are presented. Based on these key technologies, a practical web-based information system is designed for the management of intake and exhaust system data, which remarkably facilitates the company's data management work.


Author(s):  
Florencia Grattarola ◽  
Daniel Pincheira-Donoso

Data-sharing has become a key component in the modern scientific era of large-scale research, with numerous advantages for both data collectors and users. However, data-sharing in Uruguay remains neglected, given that the major public sources of biodiversity information (government and academia) are not open-access. As a consequence, the patterns and drivers of biodiversity in this country remain poorly understood, and so does our ability to manage and conserve its biodiversity. To overcome this critical gap, collaborative strategies are needed to communicate the importance and benefits of data openness and exchange, provide technical tools and training on all aspects of data management and sharing practices, and focus on incentives and motivation structures for data-holders. Here, we introduce the Biodiversidata initiative (www.biodiversidata.org) – a novel Uruguayan Consortium of Biodiversity Data. Biodiversidata is a collaboration among experts with the aim of improving the country's biodiversity knowledge and the open access of the vast resources they generate. Biodiversidata aims to collate the first comprehensive open-access database on Uruguay's whole biodiversity, to support advancements in scientific research and conservation actions. Currently, Biodiversidata consists of over 30 experts from national and international institutions, studying diverse biodiversity groups. After less than two years, we have collected, curated and standardised a dataset of ~70,000 records of primary biodiversity data of tetrapod species – the most comprehensive open biodiversity database gathered for Uruguay to date.
However, the process is hampered by multiple challenges: the lack of support for the sampling of specimens and the maintenance of collections has contributed to a situation where data are often perceived as personal property rather than collective resources; institutions have no plans or strategies for the digitisation of their collections, which places biodiversity data in Uruguay at risk of being lost; scarce governmental and academic incentive structures for open scientific research relegate data-sharing to a personal decision; although scientists are individually willing to share their research data, the lack of data management plans within their research groups hampers the capacity to digitise the data and thus make them available; and former initiatives aimed at creating comprehensive biodiversity databases did not consider the balance between openness and gain for researchers, framing data-sharing as an obligation rather than a path to recognition, which negatively impacted scientists' willingness to open their data.
To overcome some of these challenges, we decided to direct Biodiversidata at individual researchers and experts rather than institutions. We approached them with the plan of collecting the maximum possible amount of data on vertebrate, invertebrate and plant species and using it to collaboratively generate impactful scientific research. An important aspect was that we requested only data fitting the premise of primary biodiversity data (i.e., records that document the occurrence of a species in space and time). This meant cleaning and standardising very heterogeneous information from a variety of source types and formats, including updating scientific names and georeferencing sampling locations. Centralising the cleaning process allowed researchers to send their raw records without spending time cleaning them themselves and, as a consequence, enlarged the amount of data being collated.
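The kind of cleaning and standardisation described here can be sketched as a small normalisation step. This is a hedged illustration only: the input field names, date formats and example values are invented, and Biodiversidata's actual workflow is not shown in the text.

```python
# Hedged sketch of standardising a heterogeneous occurrence record into
# primary biodiversity data (species + place + time). Field names and
# formats are invented for illustration.
from datetime import datetime

def standardise(record):
    """Normalise one raw record to scientificName / decimalLatitude /
    decimalLongitude / eventDate (ISO 8601)."""
    # Collapse whitespace and fix binomial capitalisation
    name = " ".join(record["species"].strip().split()).capitalize()
    # Accept either ISO (YYYY-MM-DD) or day/month/year dates
    date = None
    for fmt in ("%Y-%m-%d", "%d/%m/%Y"):
        try:
            date = datetime.strptime(record["date"], fmt).date().isoformat()
            break
        except ValueError:
            continue  # unresolvable dates are left empty rather than guessed
    return {
        "scientificName": name,
        "decimalLatitude": float(record["lat"]),
        "decimalLongitude": float(record["lon"]),
        "eventDate": date,
    }

raw = {"species": "  liolaemus   wiegmannii ", "date": "14/09/2018",
       "lat": "-34.9011", "lon": "-56.1645"}
clean = standardise(raw)
```

Centralising steps like this is what lets contributors submit raw records in whatever form they hold them.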
Collectively, Biodiversidata's approach to changing the culture of data-sharing has relied on reinforcing a culture of scientific collaboration that benefits not only researchers at the individual level but also progress on larger-scale issues as a whole. There is a long way to go on the subject of open research data in Uruguay; however, aiming strategies at people, capitalising on data management and progressing with step-by-step rewards is already showing encouraging preliminary results.


2021 ◽  
Author(s):  
Danying Shao ◽  
Gretta Kellogg ◽  
Ali Nematbakhsh ◽  
Prashant Kuntala ◽  
Shaun Mahony ◽  
...  

Reproducibility is a significant challenge in (epi)genomic research due to the complexity of experiments composed of traditional biochemistry and informatics. Recent advances have exacerbated this challenge as high-throughput sequencing data is generated at an unprecedented pace. Here we report on our development of a Platform for Epi-Genomic Research (PEGR), a web-based project management platform that tracks and quality controls experiments from conception to publication-ready figures, compatible with multiple assays and bioinformatic pipelines. It supports rigor and reproducibility for biochemists working at the wet bench, while continuing to fully support reproducibility and reliability for bioinformaticians through integration with the Galaxy platform.


Author(s):  
Philipp Conzett ◽  
Trond Kvamme

The Research Data Alliance (RDA) is a neutral international network that aims to promote data sharing and data-driven research. The efforts of RDA are organized in a number of groups, including national nodes, where contributors work together to develop and adopt approaches that foster the uptake of standards and good practice in research data management through all stages of the data lifecycle. Since 2019, Norway has had its own national RDA group. This article gives a short introduction to the Norwegian RDA group. In section 1 we provide some background information about RDA. Section 2 describes the Norwegian RDA group, including its background and organisational structure, as well as past and future activities.


2021 ◽  
Vol 9 ◽  
Author(s):  
Ferdinando Urbano ◽  
Francesca Cagnacci

The current and future consequences of anthropogenic impacts such as climate change and habitat loss on ecosystems will be better understood and therefore addressed if diverse ecological data from multiple environmental contexts are more effectively shared. Re-use requires that data are readily available to the scientific scrutiny of the research community. A number of repositories to store shared data have emerged in different ecological domains and developments are underway to define common data and metadata standards. Nevertheless, the goal is far from being achieved and many challenges still need to be addressed. The definition of best practices for data sharing and re-use can benefit from the experience accumulated by pilot collaborative projects. The Euromammals bottom-up initiative has pioneered collaborative science in spatial animal ecology since 2007. It involves more than 150 institutes to address scientific, management and conservation questions regarding terrestrial mammal species in Europe using data stored in a shared database. In this manuscript we present some key lessons that we have learnt from the process of making shared data and knowledge accessible to researchers and we stress the importance of data management for data quality assurance. We suggest putting in place a pro-active data review before data are made available in shared repositories via robust technical support and users’ training in data management and standards. We recommend pursuing the definition of common data collection protocols, data and metadata standards, and shared vocabularies with direct involvement of the community to boost their implementation. We stress the importance of knowledge sharing, in addition to data sharing. We show the crucial relevance of collaborative networking with pro-active involvement of data providers in all stages of the scientific process. 
Our main message is that for data-sharing collaborative efforts to obtain substantial and durable scientific returns, the goals should not only consist in the creation of e-infrastructures and software tools but primarily in the establishment of a network and of community trust. This requires only moderate investment, but over a long-term horizon.

