biotoolsSchema: a formalized schema for bioinformatics software description

GigaScience ◽  
2021 ◽  
Vol 10 (1) ◽  
Author(s):  
Jon Ison ◽  
Hans Ienasescu ◽  
Emil Rydza ◽  
Piotr Chmura ◽  
Kristoffer Rapacki ◽  
...  

Abstract

Background: Life scientists routinely face massive and heterogeneous data analysis tasks and must find and access the most suitable databases or software in a jungle of web-accessible resources. The diversity of information used to describe life-scientific digital resources presents an obstacle to their utilization. Although several standardization efforts are emerging, no information schema has been sufficiently detailed to enable uniform semantic and syntactic description—and cataloguing—of bioinformatics resources.

Findings: Here we describe biotoolsSchema, a formalized information model that balances the needs of conciseness for rapid adoption against the provision of rich technical information and scientific context. biotoolsSchema results from a series of community-driven workshops and is deployed in the bio.tools registry, providing the scientific community with >17,000 machine-readable and human-understandable descriptions of software and other digital life-science resources. We compare our approach to related initiatives and provide alignments to foster interoperability and reusability.

Conclusions: biotoolsSchema supports the formalized, rigorous, and consistent specification of the syntax and semantics of bioinformatics resources, and enables cataloguing efforts such as bio.tools that help scientists to find, comprehend, and compare resources. The use of biotoolsSchema in bio.tools promotes the FAIRness of research software, a key element of open and reproducible developments for data-intensive sciences.
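As an informal illustration of the kind of machine-readable description such a schema enables, the sketch below models a registry-style tool entry as a Python dictionary. The tool, its URL, and the minimal-validity check are all invented for this example; the field names only approximate the general shape of the registry's JSON serialization and its EDAM annotations, not the authoritative schema.

```python
# Hypothetical bio.tools-style entry; field names are illustrative,
# not the authoritative biotoolsSchema definition.
entry = {
    "name": "ExampleAligner",  # invented tool
    "description": "Pairwise sequence alignment tool.",
    "homepage": "https://example.org/aligner",  # invented URL
    "toolType": ["Command-line tool"],
    "function": [{
        "operation": [{"term": "Sequence alignment",
                       "uri": "http://edamontology.org/operation_0292"}],
        "input":  [{"data": {"term": "Sequence"}}],
        "output": [{"data": {"term": "Sequence alignment"}}],
    }],
}

# Invented minimal check: core descriptive fields plus one annotated function.
REQUIRED = ("name", "description", "homepage")

def is_minimally_valid(e):
    return all(e.get(k) for k in REQUIRED) and bool(e.get("function"))

print(is_minimally_valid(entry))  # → True
```

A real registry entry carries far more (versions, licenses, publications, contacts); the point here is only that a structured, machine-checkable description makes cataloguing and comparison possible at all.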

2018 ◽  
Vol 42 (3) ◽  
pp. 433-440
Author(s):  
Wolfram Horstmann

Abstract

Academic libraries are under severe pressure to transform toward a new form of information organization. Book circulation, learning space, and licensing of digital content continue to be core services, but the sciences and the humanities increasingly request support for their novel publishing activities. Open Access in the sciences, digital editions in the humanities, and data-intensive operations in research, e.g. data management plans, research software support, or data curation and preservation services, are becoming mission critical. Thus, in order to remain the central partner for academic information within the institution, libraries need to change. But how fast can libraries possibly change if existing services are also to be continued, particularly considering that budget increases are rare? Experiences at the State and University Library Göttingen (SUB) illustrate the opportunities and challenges.


2015 ◽  
Author(s):  
Martin Fenner

Yesterday Julie McMurry and co-authors published a preprint, "10 Simple rules for design, provision, and reuse of persistent identifiers for life science data". This is an important paper trying to address a fundamental problem: how can we make persistent ...


2009 ◽  
Vol 2 (4) ◽  
pp. 36-52 ◽  
Author(s):  
Dimitrios A. Koutsomitropoulos ◽  
Georgia D. Solomou ◽  
Andreas D. Alexopoulos ◽  
Theodore S. Papatheodorou

Metadata applications have evolved over time into highly structured "islands of information" about digital resources, often bearing a strong semantic interpretation. Rarely, however, are these semantics communicated in machine-readable and machine-understandable ways. At the same time, the process of transforming the implied metadata knowledge into explicit Semantic Web descriptions can be problematic and is not always evident. In this article we build upon the well-established Dublin Core metadata standard, as well as other metadata schemata that often appear in digital repository set-ups, and propose a corresponding Semantic Web OWL ontology. In this process we cope in novel ways with the discrepancies and incompatibilities that are indicative of such attempts. Moreover, we show the potential and necessity of this approach by demonstrating inferences on the resulting ontology, instantiated with actual metadata records. We conclude by presenting a working prototype that provides inference-based querying on top of digital repositories.
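A rough sketch of the first step in a transformation like this: turning a flat Dublin Core record into RDF statements that an OWL ontology can then interpret. The record, subject URI, and helper function below are invented for illustration; only the namespace is the standard DCMI terms namespace.

```python
# Hedged sketch: serializing a Dublin Core record as RDF Turtle.
# The record and subject URI are invented; properties use DCMI terms.
DC_NS = "http://purl.org/dc/terms/"

record = {"title": "Repository item 42",
          "creator": "A. Author",
          "date": "2009-11-01"}

def to_turtle(subject, rec):
    # One subject, one predicate-object pair per metadata field.
    lines = [f"<{subject}>"]
    items = sorted(rec.items())
    for i, (prop, value) in enumerate(items):
        end = " ." if i == len(items) - 1 else " ;"
        lines.append(f'    <{DC_NS}{prop}> "{value}"{end}')
    return "\n".join(lines)

print(to_turtle("http://example.org/item/42", record))
```

Once records are expressed as triples like these, an OWL reasoner can classify and query them, which is where the discrepancies between schemata surface and must be reconciled.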


2014 ◽  
Vol 644-650 ◽  
pp. 3256-3259
Author(s):  
Yu Kai Li ◽  
Di Xin ◽  
Hong Gang Liu

Cloud computing is a typical network computing model. Large-scale network applications based on cloud computing exhibit distributed, heterogeneous, data-intensive characteristics. The smart grid is a complex system running online in real time, and it is a data-intensive application. How to effectively integrate multiple data centers in the smart grid and let them work together in the cloud computing environment, and how to distribute data rationally across the smart grid, are open questions. We therefore propose a global placement strategy based on a genetic algorithm, and we give a data placement scheme for data-intensive applications. Using the simulation toolkit CloudSim, we conducted simulation experiments and analyzed the effectiveness of the scheme.
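As a hedged sketch of the general technique the abstract names, the toy genetic algorithm below assigns datasets to data centers so as to keep frequently co-accessed datasets together while respecting a per-center capacity. The dependency matrix, capacity, penalty weight, and GA parameters are all invented and far simpler than a real smart-grid setting.

```python
import random

random.seed(42)  # deterministic toy run

N_DATASETS, N_CENTERS, CAP = 12, 3, 5  # invented problem size and capacity

# Hypothetical symmetric traffic matrix between dataset pairs.
dep = [[(i * j) % 5 for j in range(N_DATASETS)] for i in range(N_DATASETS)]

def cost(placement):
    # Cross-center traffic: dependent datasets on different centers pay dep[i][j].
    traffic = sum(dep[i][j]
                  for i in range(N_DATASETS)
                  for j in range(i + 1, N_DATASETS)
                  if placement[i] != placement[j])
    # Heavy penalty for exceeding a center's capacity.
    loads = [placement.count(c) for c in range(N_CENTERS)]
    overload = sum(max(0, load - CAP) for load in loads)
    return traffic + 100 * overload

def evolve(pop_size=30, generations=60, mutation_rate=0.1):
    # Chromosome: center index per dataset.
    pop = [[random.randrange(N_CENTERS) for _ in range(N_DATASETS)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=cost)
        survivors = pop[:pop_size // 2]          # truncation selection
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = random.sample(survivors, 2)
            cut = random.randrange(1, N_DATASETS)
            child = a[:cut] + b[cut:]            # one-point crossover
            if random.random() < mutation_rate:  # point mutation
                child[random.randrange(N_DATASETS)] = random.randrange(N_CENTERS)
            children.append(child)
        pop = survivors + children
    return min(pop, key=cost)

best = evolve()
print(best, cost(best))
```

The penalty weight makes capacity violations dominate the fitness, so the search is pushed toward feasible placements first and low-traffic placements second, a common way to fold constraints into a GA objective.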


2021 ◽  
Author(s):  
Renato Alves ◽  
Dimitrios Bampalikis ◽  
Leyla Jael Castro ◽  
José María Fernández ◽  
Jennifer Harrow ◽  
...  

Data Management Plans are now considered a key element of Open Science. They describe the data management life cycle for the data to be collected, processed, and/or generated within the lifetime of a particular project or activity. A Software Management Plan (SMP) plays the same role, but for software. Beyond its management perspective, the main advantage of an SMP is that it both provides clear context for the software being developed and raises awareness. Although a few SMPs are already available, most of them require significant technical knowledge to be used effectively. ELIXIR has developed a low-barrier SMP, specifically tailored for life science researchers and aligned with the FAIR Research Software principles. Starting from the Four Recommendations for Open Source Software, the ELIXIR SMP was iteratively refined by surveying the practices of the community and incorporating the feedback received. Currently available as a survey, future plans for the ELIXIR SMP include a human- and machine-readable version that can be automatically queried and connected to relevant tools and metrics within the ELIXIR Tools ecosystem and beyond.


F1000Research ◽  
2015 ◽  
Vol 4 ◽  
pp. 127
Author(s):  
Andreas Drakos ◽  
Vassilis Protonotarios ◽  
Nikos Manouselis

The agINFRA project (www.aginfra.eu) was a European Commission-funded project under the 7th Framework Programme that aimed to introduce agricultural scientific communities to the vision of open and participatory data-intensive science. agINFRA has now evolved into the European hub for data-powered research on agriculture, food and the environment, serving the research community through multiple roles. Working on enhancing the interoperability between heterogeneous data sources, the agINFRA project has left a set of grid- and cloud-based services that can be reused by future initiatives and adopted by existing ones, in order to facilitate the dissemination of agricultural research, educational, and other types of data. On top of that, agINFRA provided a set of domain-specific recommendations for the publication of agri-food research outcomes. This paper discusses the concept of the agINFRA project and presents its major outcomes, as adopted by existing initiatives active in the context of agricultural research and education.


2018 ◽  
Vol 60 (5-6) ◽  
pp. 327-333 ◽  
Author(s):  
René Jäkel ◽  
Eric Peukert ◽  
Wolfgang E. Nagel ◽  
Erhard Rahm

Abstract The efficient and intelligent handling of large, often distributed and heterogeneous data sets increasingly determines scientific and economic competitiveness in most application areas. Mobile applications, social networks, multimedia collections, sensor networks, data-intensive scientific experiments, and complex simulations nowadays generate a huge data deluge. Processing and analyzing these data sets with innovative methods opens up new opportunities for their exploitation and for new insights. Nevertheless, the resulting resource requirements usually exceed the capabilities of state-of-the-art methods for the acquisition, integration, analysis, and visualization of data; these challenges are summarized under the term big data. ScaDS Dresden/Leipzig, a Germany-wide competence center for collaborative big data research, bundles efforts to realize data-intensive applications for a wide range of use cases in science and industry. In this article, we present the basic concept of the competence center and give insights into some of its research topics.


2012 ◽  
Vol 249-250 ◽  
pp. 533-541
Author(s):  
Gui Yang Jin ◽  
Fu Zai Lv ◽  
Zhan Qin Xiang

Many capital-intensive industries, such as iron and steel, energy, refining, petrochemicals, and manufacturing, face the pressure of continuously increasing maintenance costs. In recent years, condition-based maintenance has become an important tool for reducing maintenance costs in these industries. It is a very information-intensive domain, and good condition-based maintenance systems need to integrate multiple heterogeneous data sources. In this paper, we use an ontology to describe the semantics of condition-based maintenance concepts. It serves two main purposes: (1) offering a common understanding of the condition-based maintenance domain, and (2) making the knowledge held in the ontology machine-readable and explicit, and thus easy to process and reuse.


F1000Research ◽  
2018 ◽  
Vol 7 ◽  
pp. 742 ◽  
Author(s):  
Bjorn Gruening ◽  
Olivier Sallou ◽  
Pablo Moreno ◽  
Felipe da Veiga Leprevost ◽  
Hervé Ménager ◽  
...  

Software containers are changing the way scientists and researchers develop, deploy and exchange scientific software. They allow labs of all sizes to easily install bioinformatics software, maintain multiple versions of the same software and combine tools into powerful analysis pipelines. However, containers and software packages should be produced under certain rules and standards in order to be reusable, compatible and easy to integrate into pipelines and analysis workflows. Here, we present a set of recommendations developed by the BioContainers Community to produce standardized bioinformatics packages and containers. These recommendations provide practical guidelines to make bioinformatics software more discoverable, reusable and transparent. They aim to guide developers, organisations, journals and funders in increasing the quality and sustainability of research software.


Author(s):  
Gaurav Vaidya ◽  
Hilmar Lapp ◽  
Nico Cellinese

Most biological data and knowledge are directly or indirectly linked to biological taxa via taxon names. Using taxon names is one of the most fundamental and ubiquitous ways in which a wide range of biological data are integrated, aggregated, and indexed, from genomic and microbial diversity to macro-ecological data. To this day, the names used, as well as most methods and resources developed for this purpose, are drawn from Linnaean nomenclature. This leads to numerous problems when applied to data-intensive science that depends on computation to take full advantage of the vast, and rapidly increasing, amount of available digital biodiversity data. The theoretical and practical complexities of reconciling taxon names and concepts have plagued the systematics community for decades. Now more than ever, Linnaean names based in Linnaean taxonomy, by far the most prevalent means of linking data to taxa, are unfit for the age of computation-driven data science, due to fundamental theoretical and practical shortfalls that cannot be cured. We propose an alternative approach based on the use of phylogenetic clade definitions, a well-developed method for unambiguously defining the semantics of a clade concept in terms of shared evolutionary ancestry (de Queiroz and Gauthier 1990, de Queiroz and Gauthier 1994). These semantics allow locating the defined clade on any phylogeny, or showing that a clade is inconsistent with the topology of a given phylogeny and hence cannot be present on it at all. We have built a workflow for expressing phylogenetic clade definitions in terms of shared-ancestor and excluded-lineage properties, and for locating these definitions on any input phylogeny. Once these definitions have been located, we can use the list of species found within that clade on that phylogeny to aggregate occurrence data from the Global Biodiversity Information Facility (GBIF).
Thus, our approach uses clade definitions with machine-understandable semantics to programmatically and reproducibly aggregate biodiversity data by higher-level taxonomic concepts. This approach has several advantages over the use of taxonomic hierarchies: Unlike taxa, the semantics of clade definitions can be expressed in unambiguous, machine-understandable and reproducible terms and language. The resolution of a given clade definition will depend on the phylogeny being used. Thus, if the phylogeny of groups of interest is updated in light of new evolutionary knowledge, the clade definition can be applied to the new phylogeny to obtain an updated list of clade members consistent with the updated evolutionary knowledge. Machine reproducibility of analyses is possible simply by archiving the machine-readable representations of the clade definition and the phylogeny being used. Clade definitions can be created by biologists as needed or can be reused from those published in peer-reviewed journals. In addition, nearly 300 peer-reviewed clade definitions were recently published as part of the Phylonym volume of the PhyloCode (de Queiroz et al. 2020) and are now available on the Regnum website. 
As part of the Phyloreferencing Project, we digitize this collection as a machine-readable ontology, where each clade is represented as a class defined by logical conjunctions for class membership, corresponding to a set of necessary and sufficient conditions of shared or divergent evolutionary ancestry. We call these classes phyloreferences, and have created a fully automated workflow for digitizing the Regnum database content into an OWL ontology (W3C OWL Working Group 2012) that we call the Clade Ontology. This ontology includes reference phylogenies and additional metadata about the verbatim clade definitions. Once complete, the Clade Ontology will include all clade definitions from RegNum, both those included in Phylonym after passing peer-review, and those contributed by the community, whether or not under the PhyloCode nomenclature. As an openly available community resource, this will allow researchers to use them to aggregate biodiversity data for comparative biology with grouping semantics that are transparent, machine-processable, and reproducible. In our presentation, we will demonstrate the use of phyloreferences to locate clades on the Open Tree of Life synthetic tree (Hinchliff et al. 2015), to retrieve lists of species in each clade, and to use them to find and aggregate occurrence records in GBIF. We will also describe the workflow we are currently using to build and test the Clade Ontology, and describe our plans for publishing this resource. Finally, we will discuss the advantages and disadvantages of this approach as compared to taxonomic checklists.
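The core resolution step described above can be sketched in a few lines: find the node subtending all "internal" specifiers of a clade definition, then reject the match if an "external" (excluded) specifier falls inside that clade. The toy parent-pointer phylogeny and function names below are invented for illustration and stand in for the project's OWL-based machinery.

```python
# Toy phylogeny as child -> parent pointers; node names are invented stand-ins.
parent = {"human": "hominini", "chimp": "hominini",
          "hominini": "homininae", "gorilla": "homininae",
          "homininae": "hominidae", "orangutan": "hominidae"}

def ancestors(node):
    # Path from a node up to the root, inclusive.
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path

def mrca(a, b):
    # Most recent common ancestor: first shared node on the two root paths.
    seen = set(ancestors(a))
    return next(n for n in ancestors(b) if n in seen)

def resolve_clade(internal_specifiers, external_specifiers):
    """Return the node subtending all internal specifiers, or None if any
    external specifier lies inside it (definition inconsistent with tree)."""
    node = internal_specifiers[0]
    for other in internal_specifiers[1:]:
        node = mrca(node, other)
    if any(node in ancestors(e) for e in external_specifiers):
        return None
    return node

print(resolve_clade(["human", "gorilla"], ["orangutan"]))  # → homininae
print(resolve_clade(["human", "orangutan"], ["gorilla"]))  # → None
```

Swapping in a different `parent` map reproduces the key property the abstract emphasizes: the same definition resolves to a different (or no) node on an updated phylogeny, without any change to the definition itself.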

