Enabling Machines to Integrate Biodiversity Data with Evolutionary Knowledge

Author(s):  
Gaurav Vaidya ◽  
Hilmar Lapp ◽  
Nico Cellinese

Most biological data and knowledge are directly or indirectly linked to biological taxa via taxon names. Using taxon names is one of the most fundamental and ubiquitous ways in which a wide range of biological data are integrated, aggregated, and indexed, from genomic and microbial diversity to macro-ecological data. To this day, the names used, as well as most methods and resources developed for this purpose, are drawn from Linnaean nomenclature. This leads to numerous problems when applied to data-intensive science that depends on computation to take full advantage of the vast – and rapidly increasing – amount of available digital biodiversity data. The theoretical and practical complexities of reconciling taxon names and concepts have plagued the systematics community for decades. Now more than ever, Linnaean names based in Linnaean taxonomy, by far the most prevalent means of linking data to taxa, are unfit for the age of computation-driven data science because of fundamental theoretical and practical shortfalls that cannot be cured. We propose an alternative approach based on phylogenetic clade definitions, a well-developed method for unambiguously defining the semantics of a clade concept in terms of shared evolutionary ancestry (de Queiroz and Gauthier 1990, de Queiroz and Gauthier 1994). These semantics allow locating the defined clade on any phylogeny, or showing that a clade is inconsistent with the topology of a given phylogeny and hence cannot be present on it at all. We have built a workflow for expressing phylogenetic clade definitions in terms of shared ancestor and excluded lineage properties, and for locating these definitions on any input phylogeny. Once a definition has been located, we can use the list of species found within that clade on that phylogeny to aggregate occurrence data from the Global Biodiversity Information Facility (GBIF).
Thus, our approach uses clade definitions with machine-understandable semantics to programmatically and reproducibly aggregate biodiversity data by higher-level taxonomic concepts. This approach has several advantages over the use of taxonomic hierarchies: Unlike taxa, the semantics of clade definitions can be expressed in unambiguous, machine-understandable and reproducible terms and language. The resolution of a given clade definition will depend on the phylogeny being used. Thus, if the phylogeny of groups of interest is updated in light of new evolutionary knowledge, the clade definition can be applied to the new phylogeny to obtain an updated list of clade members consistent with the updated evolutionary knowledge. Machine reproducibility of analyses is possible simply by archiving the machine-readable representations of the clade definition and the phylogeny being used. Clade definitions can be created by biologists as needed or can be reused from those published in peer-reviewed journals. In addition, nearly 300 peer-reviewed clade definitions were recently published as part of the Phylonym volume of the PhyloCode (de Queiroz et al. 2020) and are now available on the Regnum website.
As part of the Phyloreferencing Project, we digitize this collection as a machine-readable ontology, where each clade is represented as a class defined by logical conjunctions for class membership, corresponding to a set of necessary and sufficient conditions of shared or divergent evolutionary ancestry. We call these classes phyloreferences, and have created a fully automated workflow for digitizing the Regnum database content into an OWL ontology (W3C OWL Working Group 2012) that we call the Clade Ontology. This ontology includes reference phylogenies and additional metadata about the verbatim clade definitions. Once complete, the Clade Ontology will include all clade definitions from Regnum, both those included in Phylonym after passing peer review and those contributed by the community, whether or not under the PhyloCode nomenclature. As an openly available community resource, this will allow researchers to use these definitions to aggregate biodiversity data for comparative biology with grouping semantics that are transparent, machine-processable, and reproducible. In our presentation, we will demonstrate the use of phyloreferences to locate clades on the Open Tree of Life synthetic tree (Hinchliff et al. 2015), to retrieve lists of species in each clade, and to use them to find and aggregate occurrence records in GBIF. We will also describe the workflow we are currently using to build and test the Clade Ontology, and describe our plans for publishing this resource. Finally, we will discuss the advantages and disadvantages of this approach as compared to taxonomic checklists.
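The idea of locating a clade definition on an arbitrary phylogeny can be illustrated with a small sketch. It is not the Phyloreferencing Project's OWL-based workflow: the nested-tuple tree encoding, the `resolve_clade` helper, and the toy great-ape trees are all assumptions made for illustration. The function looks for the smallest clade containing every included ("internal") specifier and reports failure if that clade also contains an excluded ("external") specifier.

```python
from typing import Optional, Union

# Hypothetical encoding: a leaf is a taxon name, an internal node is a tuple of subtrees.
Tree = Union[str, tuple]


def leaves(tree: Tree) -> frozenset:
    """Return the set of leaf names found under a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    result = frozenset()
    for child in tree:
        result |= leaves(child)
    return result


def resolve_clade(tree: Tree, internal: set, external: set) -> Optional[frozenset]:
    """Return the members of the smallest clade containing all internal
    specifiers, or None if the definition cannot be satisfied on this tree."""
    members = leaves(tree)
    if not internal <= members:
        return None  # at least one specifier is absent from this phylogeny
    # Descend while a single child still contains every internal specifier.
    if not isinstance(tree, str):
        for child in tree:
            if internal <= leaves(child):
                return resolve_clade(child, internal, external)
    if members & external:
        return None  # the definition is inconsistent with this topology
    return members


# Two toy phylogenies standing in for an original and an updated tree.
original = ((("Homo", "Pan"), "Gorilla"), "Pongo")
updated = (("Homo", ("Pan", "Gorilla")), "Pongo")

for phylogeny in (original, updated):
    print(resolve_clade(phylogeny, {"Homo", "Gorilla"}, {"Pongo"}))
```

The same definition is applied unchanged to both trees; only the phylogeny varies, which is the property that makes archiving the definition together with the phylogeny sufficient for reproducing a member list. The resulting leaf names could then be used as the species list for an occurrence query.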

Author(s):  
Dmitry Schigel ◽  
Anders Andersson ◽  
Andrew Bissett ◽  
Anders Finstad ◽  
Frode Fossøy ◽  
...  

Most users will foresee the use of genetic sequences in the context of molecular ecology or phylogenetic research; however, a sequence with coordinates and a timestamp is a valuable biodiversity occurrence that is useful in a much broader context than its original purpose. To uncover this potential, sequence-derived data need to become findable, accessible, interoperable, and reusable through generalist biodiversity data platforms. Stimulated by the Biodiversity_Next discussions in 2019, we have worked for about 10 months to put together practical data mapping and data publishing experiences in Norway, Australia, Sweden, and Denmark, as well as in the UNITE and the GBIF (Global Biodiversity Information Facility) networks. The resulting guide was put together to provide practical instruction for mapping sequence-derived data. Biodiversity data communities remain dominated by the macroscopic, easily detectable, morphologically identifiable species. This is not only true for citizen science and other forms of biodiversity popularization, but is also visible in the university and museum department structures, financial resource allocations, biodiversity legislation, and policy design. Recent decades of molecular advances have increased the power of genetic methods for detecting, describing, and documenting global biodiversity. We have yet to see a wide shift of data-generating efforts from the traditional taxonomic foci of biodiversity assessments to more balanced and inclusive systems focusing on all functionally important taxa and environments. These include soil, limnic and marine environments, decomposing plants and deadwood, and all life therein. Environmental DNA data enable recording of present and past presence of micro- and macroscopic organisms with minimal effort and by non-invasive methods. The apparent ease of these methods requires a cautious approach to the resulting data and their interpretation.
It remains important to define and agree on the organism recording and reporting routines for genetic data. DNA data represent a major addition to the many ways in which GBIF and other biodiversity data platforms index the living world. Our guide rests on the shoulders of those who have been developing and improving MIxS (Minimum Information about any (x) Sequence), GGBN (Global Genome Biodiversity Network) and other data standards. The added value of publishing sequence-derived data through non-genetic biodiversity discovery platforms relates to spatio-temporal occurrences and sequence-based names. Reporting sequence-derived occurrences in an open and reproducible way has a wide range of benefits: notably, it increases citability, highlights the taxa concerned in the context of biological conservation, and contributes to taxonomic and ecological knowledge.


PeerJ ◽  
2019 ◽  
Vol 6 ◽  
pp. e6193 ◽  
Author(s):  
Simon Orozco-Arias ◽  
Ana María Núñez-Rincón ◽  
Reinel Tabares-Soto ◽  
Diana López-Álvarez

The co-occurrence of plant species is a fundamental aspect of plant ecology that contributes to understanding ecological processes, including the establishment of ecological communities and its applications in biological conservation. A priori algorithms can be used to measure the co-occurrence of species in a spatial distribution given by coordinates. We used 17 species of the genus Brachypodium, downloaded from the Global Biodiversity Information Facility data repository or obtained from bibliographical sources, to test an algorithm with the spatial points process technique used by Silva et al. (2016), generating association rules for co-occurrence analysis. Brachypodium spp. has emerged as an effective model for monocot species, growing in different environments, latitudes, and elevations, thereby representing a wide range of biotic and abiotic conditions that may be associated with adaptive natural genetic variation. We created seven datasets of two, three, four, six, seven, 15, and 17 species in order to test the algorithm with four different distances (1, 5, 10, and 20 km). Several measurements (support, confidence, lift, Chi-square, and p-value) were used to evaluate the quality of the results generated by the algorithm. No negative association rules were created in the datasets, while 95 positive co-occurrence rules were found for datasets with six, seven, 15, and 17 species. Using 20 km in the dataset with 17 species, we found 16 positive co-occurrences involving five species, suggesting that these species are coexisting. These findings are corroborated by the results obtained in the dataset with 15 species, where two species with broad range distributions present in the previous dataset are eliminated, obtaining seven positive co-occurrences. We found that B. sylvaticum has co-occurrence relations with several species, such as B. pinnatum, B. rupestre, B. retusum, and B. phoenicoides, due to its wide distribution in Europe, Asia, and North Africa.
We demonstrate the utility of the algorithm implemented for the analysis of co-occurrence of 17 species of the genus Brachypodium, agreeing with distributions existing in nature. Data mining has been applied in the field of biological sciences, where a great amount of complex and noisy data of unprecedented proportions has been generated in recent years. In particular, ecological data analysis represents an opportunity to explore and comprehend biological systems with data mining and bioinformatics tools.
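Three of the rule-quality measures named above (support, confidence, and lift) have standard definitions that a short sketch can make concrete. This is not the authors' implementation: the `rule_metrics` helper is hypothetical, and the four "neighbourhoods" are invented toy data in which each set lists the species recorded within some distance of a reference point.

```python
def rule_metrics(neighbourhoods, antecedent, consequent):
    """Support, confidence, and lift for the rule antecedent -> consequent.

    Each neighbourhood is a set of species names recorded together.
    """
    n = len(neighbourhoods)
    n_a = sum(1 for s in neighbourhoods if antecedent in s)
    n_c = sum(1 for s in neighbourhoods if consequent in s)
    n_both = sum(1 for s in neighbourhoods if antecedent in s and consequent in s)

    support = n_both / n                           # P(A and C)
    confidence = n_both / n_a if n_a else 0.0      # P(C | A)
    lift = confidence / (n_c / n) if n_c else 0.0  # P(C | A) / P(C)
    return support, confidence, lift


# Invented toy data, for illustration only.
toy = [
    {"B. sylvaticum", "B. pinnatum"},
    {"B. sylvaticum", "B. pinnatum"},
    {"B. sylvaticum"},
    {"B. retusum"},
]

print(rule_metrics(toy, "B. sylvaticum", "B. pinnatum"))
```

A lift above 1 indicates that the two species are recorded together more often than expected if they were independent, which is the sense in which a rule counts as a positive co-occurrence; a lift below 1 would point toward a negative association.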


2014 ◽  
Author(s):  
Alexandre Antonelli ◽  
Fabien L. Condamine ◽  
Hannes Hettling ◽  
Karin Nilsson ◽  
R Henrik Nilsson ◽  
...  

Rapidly growing biological data volumes – including molecular sequences, species traits, geographic occurrences, specimen collections, and fossil records – hold an unprecedented, yet largely unexplored potential to reveal how ecological and evolutionary processes generate and maintain biodiversity. Most biodiversity studies integrating ecological data and evolutionary history use an idiosyncratic step-by-step approach for the reconstruction of time-calibrated phylogenies in light of ecological and evolutionary scenarios. Here we introduce a conceptual framework, termed SUPERSMART (Self-Updating Platform for Estimating Rates of Speciation and Migration, Ages, and Relationships of Taxa), and provide a proof of concept for dealing with the moving targets of biodiversity research. This framework reconstructs dated phylogenies based on the assembly of molecular datasets and collects pertinent data on ecology, distribution, and fossils of the focal clade. The data handled for each step are continuously updated as databases accumulate new records. We exemplify the practice of our method by presenting comprehensive phylogenetic and dating analyses for the orders Primates and the Gentianales. We believe that this emerging framework will provide an invaluable tool for a wide range of hypothesis-driven research questions in ecology and evolution.


Author(s):  
Michael Trizna ◽  
Torsten Dikow

Taxonomic revisions contain crucial biodiversity data in the material examined sections for each species. In entomology, material examined lists minimally include the collecting locality, date of collection, and the number of specimens of each collection event. Insect species might be represented in taxonomic revisions by only a single specimen or hundreds to thousands of specimens. Furthermore, revisions of insect genera might treat small genera with few species or include tens to hundreds of species. Summarizing data from such large and complex material examined lists and revisions is cumbersome, time-consuming, and prone to errors. However, providing data on the seasonal incidence, abundance, and collecting period of species is an important way to mobilize primary biodiversity data to understand a species' occurrence or rarity. Here, we present SpOccSum (Species Occurrence Summary), a tool to easily obtain metrics of seasonal incidence from specimen occurrence data in taxonomic revisions. SpOccSum is written in Python (Python Software Foundation 2019) and accessible through the Anaconda Python/R Data Science Platform as a Jupyter Notebook (Kluyver et al. 2016). The tool takes a simple list of specimen data containing species name, locality, date of collection (preferably separated by day, month, and year), and number of specimens in CSV format and generates a series of tables and graphs summarizing: number of specimens per species, number of specimens collected per month, number of unique collection events, as well as the earliest and most recent collecting year of each species. The results can be exported as graphics or as CSV-formatted tables and can easily be included in manuscripts for publication.
An example of an early version of the summary produced by SpOccSum can be viewed in Tables 1 and 2 of Markee and Dikow (2018). To accommodate seasonality in the Northern and Southern Hemispheres, users can choose to start the data display with either January or July. When geographic coordinates are available and species have widespread distributions spanning, for example, the equator, the user can itemize particular regions such as North of Tropic of Cancer (23.5˚N), Tropic of Cancer to the Equator, Equator to Tropic of Capricorn, and South of Tropic of Capricorn (23.5˚S). Other features currently in development include the ability to produce distribution maps from the provided data (when geographic coordinates are included) and the option to export specimen occurrence data as a Darwin Core Archive ready for upload to the Global Biodiversity Information Facility (GBIF).
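The kind of summary described above can be sketched with the Python standard library alone. This is not SpOccSum's code: the column names (`species`, `locality`, `day`, `month`, `year`, `count`), the `summarize` and `month_order` helpers, and the placeholder records are all assumptions made for illustration.

```python
import csv
import io
from collections import defaultdict


def month_order(start_month: int = 1) -> list:
    """Months 1-12 rotated to begin at start_month (e.g. 7 for a July start)."""
    return [(start_month - 1 + i) % 12 + 1 for i in range(12)]


def summarize(csv_text: str, start_month: int = 1) -> dict:
    """Per-species totals, monthly counts, collection events, and year range."""
    specimens = defaultdict(int)
    by_month = defaultdict(lambda: defaultdict(int))
    events = defaultdict(set)
    years = defaultdict(set)
    for row in csv.DictReader(io.StringIO(csv_text)):
        species, count = row["species"], int(row["count"])
        specimens[species] += count
        by_month[species][int(row["month"])] += count
        # A collection event is taken here to be a unique locality and date.
        events[species].add((row["locality"], row["day"], row["month"], row["year"]))
        years[species].add(int(row["year"]))
    return {
        species: {
            "specimens": specimens[species],
            "per_month": [by_month[species][m] for m in month_order(start_month)],
            "collection_events": len(events[species]),
            "earliest_year": min(years[species]),
            "latest_year": max(years[species]),
        }
        for species in specimens
    }


# Placeholder records, invented for illustration.
SAMPLE = """species,locality,day,month,year,count
Species A,Site 1,3,1,1998,2
Species A,Site 1,3,1,1998,1
Species A,Site 2,15,7,2005,4
Species B,Site 3,9,11,2010,1
"""

for name, stats in summarize(SAMPLE, start_month=7).items():
    print(name, stats)
```

Rotating the month axis with `start_month=7` is one simple way to model the January-or-July display choice, so that a Southern Hemisphere summer is not split across the two ends of a table or graph.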


Author(s):  
Nico Franz ◽  
Edward Gilbert ◽  
Beckett Sterner

We provide an overview and update on initiatives and approaches to add taxonomic data intelligence to distributed biodiversity knowledge networks. "Taxonomic intelligence" for biodiversity data is defined here as the ability to identify and reconcile source-contextualized taxonomic name-to-meaning relationships (Remsen 2016). We review the scientific opportunities, as well as information-technological and socio-economic pathways - both existing and envisioned - to embed de-centralized taxonomic data intelligence into the biodiversity data publication and knowledge integration processes. We predict that the success of this project will ultimately rest on our ability to up-value the roles and recognition of systematic expertise and experts in large, aggregated data environments. We will argue that these environments will need to adhere to criteria for responsible data science and interests of coherent communities of practice (Wenger 2000, Stoyanovich et al. 2017). This means allowing for fair, accountable, and transparent representation and propagation of evolving systematic knowledge and enduring or newly apparent conflict in systematic perspective (Sterner and Franz 2017, Franz and Sterner 2018, Sterner et al. 2019). We will demonstrate, in principle and through concrete use cases, how to de-centralize systematic knowledge while maintaining alignments between congruent or conflicting taxonomic concept labels (Franz et al. 2016a, Franz et al. 2016b, Franz et al. 2019). The suggested approach uses custom-configured logic representation and reasoning methods, based on the Region Connection Calculus (RCC-5) alignment language. The approach offers syntactic consistency and semantic applicability or scalability across a wide range of biodiversity data products, ranging from occurrence records to phylogenomic trees.
We will also illustrate how this kind of taxonomic data intelligence can be captured and propagated through existing or envisioned metadata conventions and standards (e.g., Senderov et al. 2018). Having established an intellectual opportunity, as well as a technical solution pathway, we turn to the issue of developing an implementation and adoption strategy. Which biodiversity data environments are currently the most taxonomically intelligent, and why? How is this level of taxonomic data intelligence created, maintained, and propagated outward? How are taxonomic data intelligence services motivated or incentivized, both at the level of individuals and organizations? Which "concerned entities" within the greater biodiversity data publication enterprise are best positioned to promote such services? Are the most valuable lessons for biodiversity data science "hidden" in successful social media applications? What are good, feasible, incremental steps towards improving taxonomic data intelligence for a diversity of data publishers?
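The five base relations of the RCC-5 alignment language mentioned above (congruence, proper inclusion, inverse proper inclusion, overlap, and exclusion) can be illustrated with a deliberately simplified sketch. It assumes that each taxonomic concept is represented by a non-empty set of member names, which is an assumption for illustration only: the logic-based reasoning described in the abstract works from expert-asserted articulations rather than from complete member lists. The `rcc5` helper, the relation symbols, and the concept contents are hypothetical.

```python
def rcc5(a: frozenset, b: frozenset) -> str:
    """Classify the RCC-5 relation between two non-empty sets of members."""
    if a == b:
        return "=="  # congruent
    if a > b:
        return ">"   # a properly includes b
    if a < b:
        return "<"   # a is properly included in b
    if a & b:
        return "><"  # partial overlap
    return "!"       # disjoint (exclusion)


# Invented concepts: the same name used with different meanings by two sources.
genus_sec_source_1 = frozenset({"sp1", "sp2", "sp3"})
genus_sec_source_2 = frozenset({"sp1", "sp2"})
other_sec_source_2 = frozenset({"sp3", "sp4"})

print(rcc5(genus_sec_source_1, genus_sec_source_2))  # >
print(rcc5(genus_sec_source_1, other_sec_source_2))  # ><
```

Keeping the source alongside each name is what lets two uses of the same label be recorded as related by proper inclusion or overlap instead of being silently treated as identical.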


Author(s):  
O. Y. Balalaieva ◽  

The purpose of the article is to study the dynamics of electronic dictionary development abroad and in Ukraine using methods of analysis of scientific sources, comparison, generalization and systematization. Electronic dictionaries have been found to be a relatively new phenomenon in the lexicographic market, evolving over the decades from machine-readable dictionaries that were exact copies of paper editions to complex digital lexicographic systems with a powerful arsenal of functions. The stages of development of autonomous and online dictionaries are described. Owing to their advanced search capabilities, speed, simplicity, ease of use, accessibility and compactness, electronic dictionaries have gained popularity among a wide range of users. Today they are used in many spheres of human activity – scientific, educational, professional, everyday communication. However, the analysis of the current level of development of Ukrainian electronic resources indicates a shortage of electronic dictionaries of both common and terminological vocabulary. The lack of electronic dictionaries is due to a number of objective problems, both practical and theoretical, which is why domestic computer lexicography is a promising area of further research.


Fault Tolerant Reliable Protocol (FTRP) is proposed as a novel routing protocol designed for Wireless Sensor Networks (WSNs). FTRP offers fault-tolerant reliability for packet exchange and support for dynamic network changes. The key concept is logical clustering of nodes. The protocol delegates routing ownership to the cluster heads, where the fault tolerance functionality is implemented. FTRP utilizes cluster head nodes along with cluster head groups to store packets in transit. In addition, FTRP utilizes broadcast, which reduces the message overhead as compared to classical flooding mechanisms. FTRP manipulates Time to Live values for the various routing messages to control message broadcast. FTRP utilizes jitter in message transmission to reduce the effect of synchronized node states, which in turn reduces collisions. FTRP performance has been extensively evaluated through simulations against the Ad-hoc On-demand Distance Vector (AODV) and Optimized Link State (OLSR) routing protocols. Packet Delivery Ratio (PDR), Aggregate Throughput and End-to-End delay (E-2-E) were used as performance metrics. In terms of PDR and aggregate throughput, FTRP is found to be an excellent performer in all mobility scenarios, whether the network is sparse or dense. In stationary scenarios, FTRP performed well in sparse networks; in dense networks, however, FTRP’s performance degraded, yet remained within an acceptable range. This degradation is attributed to synchronized node states. Reliably delivering a message comes at a cost in terms of E-2-E delay. Results show that FTRP is a good performer in all mobility scenarios where the network is sparse. In the sparse stationary scenario, FTRP is a good performer; in dense stationary scenarios, however, FTRP’s E-2-E delay is not acceptable. There are times when receiving a network message is more important than other costs such as energy or delay.
That makes FTRP suitable for a wide range of WSN applications, such as military applications that monitor soldiers’ biological data and supplies on the battlefield, and battle damage assessment. FTRP can also be used in health applications, in addition to a wide range of geo-fencing, environmental monitoring, resource monitoring, production line monitoring, agriculture and animal tracking applications. FTRP should be avoided in dense stationary deployments such as, but not limited to, scenarios where fast application response is critical and lives are at risk, such as biohazard detection or intensive care units.


2019 ◽  
Vol 70 (10) ◽  
pp. 3738-3740

Tonsillectomy in children or adults is an intervention commonly encountered in ENT (ear, nose, and throat) and head and neck surgical practice. The current tendency is to perform this type of surgery in major ambulatory surgery centers. Two objectives are thus pursued: first, improving the patient's quality of life through reintegration into the family as quickly as possible, and second, reducing the expenses associated with continuous hospitalization. Any tertiary (multidisciplinary) sleep center must ensure the complete diagnosis and treatment (including surgery) of sleep respiratory disorders. Under these conditions, the selection of patients and especially the implementation of specific protocols to control postoperative complications become essential. The present paper describes our experience of tonsillectomy as treatment for selected patients with chronic rhonchopathy (snoring) and mild to moderate obstructive sleep apnoea. We present the impact of antibiotic protocols on reducing the main morbid outcomes following tonsillectomy in our day surgery center. The obtained results can also be a prerequisite for the integrative approach to patients with sleep apnoea who were recommended surgical treatment. Considering the wide range of therapeutic modalities used in sleep apnoea, each with its specific advantages and disadvantages, more extensive and multicenter studies are needed. Keywords: post-tonsillectomy morbidity, day surgery center, sleep disorders


Nanomaterials ◽  
2021 ◽  
Vol 11 (7) ◽  
pp. 1861
Author(s):  
Armin Mooranian ◽  
Melissa Jones ◽  
Corina Mihaela Ionescu ◽  
Daniel Walker ◽  
Susbin Raj Wagle ◽  
...  

The utilisation of bioartificial organs is of significant interest to many due to their versatility in treating a wide range of disorders. Microencapsulation has a potentially significant role in such organs. In order to utilise microcapsules, accurate characterisation and analysis are required to assess their properties and suitability. Bioartificial organs or transplantable microdevices must also account for immunogenic considerations, which will be discussed in detail. One of the most thoroughly characterised cases is the bioartificial pancreas, including the microencapsulation of islets or other cells, and it will be the focus of this review. Overall, this review will discuss the traditional and modern technologies which are necessary for the characterisation of properties for transplantable microdevices or organs, summarising analysis of the microcapsule itself, the cells, and finally a working organ. Furthermore, immunogenic considerations of such organs are another important aspect addressed within this review. The various techniques, methodologies, advantages, and disadvantages will all be discussed. Hence, the purpose of this review is to provide an updated examination of all processes for the analysis of a working, biocompatible artificial organ.

