Biodiversity databases in Russia: towards a national portal

2017 ◽  
Vol 3 (3) ◽  
pp. 560-576 ◽  
Author(s):  
Natalya V. Ivanova ◽  
Maxim P. Shashkov

Russia holds massive biodiversity data accumulated in botanical and zoological collections, literature publications, annual reports of nature reserves, and reports from nature conservation and monitoring projects. While some data have been digitized and organized in databases or spreadsheets, most of the biodiversity data in Russia remain dormant and digitally inaccessible. The concept of open access to research data is spreading, yet the lack of a data publishing tradition and of data standards use remains prominent. Russia lacks a national biodiversity information system; most biodiversity data are not available, and the data that are available are not consolidated. As a result, Russian biodiversity data remain fragmented and inaccessible to researchers. The majority of Russian biodiversity databases have no web interface and are accessible only to a limited number of researchers. The main reason for this lack of access is that the databases were originally developed only as local resources. In addition, many sources were built in desktop database environments, mainly MS Access and, in some cases, earlier DBMSs for DOS, i.e., file-server systems that lack the functionality to expose records through a web interface. Among the databases that do have a web interface, a few information systems offer interactive maps of species occurrence data and allow registered users to upload data. It is important to note that the conceptual structures of these databases were created without taking into account modern standards such as Darwin Core; furthermore, some data sources were developed before the first working version of Darwin Core was released in 2001. Despite the complexity and size of the biodiversity data landscape in Russia, interest in publishing data through international biodiversity portals is increasing among Russian researchers. Since 2014, institutional data publishers in Russia have published about 140 000 species occurrences through gbif.org. This increase in data publishing activity calls for the creation of a GBIF node in Russia, aiming to support Russian biodiversity experts in international data work.
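As a rough illustration of how this publishing activity can be tracked, the minimal sketch below queries the public GBIF occurrence API for the number of records published by institutions in Russia; the `publishingCountry` filter and the `count` field are part of the documented API, and the number returned is simply whatever GBIF holds at query time.

```python
# A minimal sketch: ask the public GBIF occurrence API how many occurrence
# records have been published by institutions in Russia (publishingCountry=RU).
# limit=0 requests no result rows, only the total count.
import json
import urllib.request

url = "https://api.gbif.org/v1/occurrence/search?publishingCountry=RU&limit=0"
with urllib.request.urlopen(url) as resp:
    result = json.load(resp)

print("Occurrence records published from Russia:", result["count"])
```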

Author(s):  
Atriya Sen ◽  
Nico Franz ◽  
Beckett Sterner ◽  
Nate Upham

We present a visual and interactive taxonomic Artificial Intelligence (AI) tool, the Automated Taxonomic Concept Reasoner (ATCR), whose graphical web interface is under development and will also become available via an Application Programming Interface (API). The tool employs automated reasoning (Beeson 2014) to align multiple taxonomies visually, in a web browser, using user- or expert-provided taxonomic articulations, i.e. Region Connection Calculus (RCC-5) relationships between taxonomic concepts, provided in a specific logical language (Fig. 1). It does this by representing the problem of taxonomic alignment under these constraints as logical inference, performing the inferences computationally with the powerful Microsoft Z3 Satisfiability Modulo Theories (SMT) solver (de Moura and Bjørner 2008). This tool represents a further development of utilities for the taxonomic concept approach, which fundamentally addresses the challenge of robust biodiversity data aggregation in light of the multiple conflicting sources (and source classifications) from which primary biodiversity data almost invariably originate. The approach has proven superior to aggregation based solely on the syntax and semantics provided by the Darwin Core standard (Franz and Sterner 2018). Fig. 1 provides an artificial example of such an alignment. Two taxonomies, A and B, are shown. There are five taxonomic concepts: A.One, A.Two, A.Three, B.One and B.Two. A.Two and A.Three are sub-concepts (children) of A.One, and B.Two is a sub-concept (child) of B.One; these relationships are represented by the direction of the grey arrows. The undirected mustard-coloured lines represent the articulations referred to above. These may be of five kinds: congruent (==), included in (<), includes (>), overlap (><), and disjoint. These five relationships are known in the AI literature as the Region Connection Calculus-5 (RCC-5) (Randell et al. 1992, Bennett 1994) and, taken jointly and exclusively, have desirable properties for representing spatial relationships. A provided articulation may also be an arbitrary disjunction of these five fundamental kinds, allowing some degree of logical uncertainty to be represented. Then, under three assumptions (that "sibling" concepts are disjoint in their instances, that all instances of a parent concept are instances of at least one of its child concepts, and that every concept has at least one instance), the SMT-based automated reasoner is able to deduce the relationships represented by the undirected green lines. It is also able to deduce disjunctive relationships where these are logically implied. ATCR is related to Euler/X (Franz et al. 2015), an existing tool for the same kinds of taxonomic alignment problems, which was used, for example, to obtain an alignment of two influential primate classifications (Franz et al. 2016).
It differs from Euler/X in that it employs a different logical encoding that enables more efficient and more informative computational reasoning, and also in that it provides a graphical web interface, which Euler/X does not.
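To make the idea of alignment-as-inference concrete, the following sketch (not ATCR's actual encoding, which is richer and more efficient) models some of the concepts from Fig. 1 as sets over a small, hypothetical instance domain and asks the Z3 SMT solver whether an articulation is entailed by the parent-child constraints and the three assumptions above; the concept names, the domain size and the example articulation are illustrative only.

```python
# A minimal sketch of RCC-5-style reasoning with the Z3 SMT solver.
# Each taxonomic concept is a set of membership bits over a small finite domain.
from z3 import Solver, Bool, Or, And, Not, Implies, sat

N = 6  # size of the hypothetical instance domain

def concept(name):
    return [Bool(f"{name}_{i}") for i in range(N)]

A1, A2, A3, B1 = concept("A.One"), concept("A.Two"), concept("A.Three"), concept("B.One")

s = Solver()
for i in range(N):
    s.add(Implies(A2[i], A1[i]), Implies(A3[i], A1[i]))  # children lie inside the parent
    s.add(Implies(A1[i], Or(A2[i], A3[i])))              # parent covered by its children
    s.add(Not(And(A2[i], A3[i])))                        # sibling concepts are disjoint
s.add(Or(*A2), Or(*A3), Or(*B1))                         # every concept has an instance
s.add(*[A2[i] == B1[i] for i in range(N)])               # articulation: A.Two == B.One

# Is "B.One is included in A.One" entailed? Assert its negation and check.
s.push()
s.add(Or(*[And(B1[i], Not(A1[i])) for i in range(N)]))
print("B.One included in A.One entailed:", s.check() != sat)  # unsat => entailed
s.pop()
```

An unsatisfiable result for the negated claim means the relationship holds in every model of the constraints, which is how a deduced (green) articulation can be read off from the solver.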


Author(s):  
Edward Gilbert ◽  
Corinna Gries ◽  
Nico Franz ◽  
Leslie R. Landrum ◽ 
Thomas H. Nash III

The SEINet Portal Network has a complex social and development history spanning nearly two decades. Initially established as a basic online search engine for a select handful of biological collections curated within the southwestern United States, SEINet has since matured into a biodiversity data network incorporating more than 330 institutions and 1,900 individual data contributors. Participating institutions manage and publish over 14 million specimen records, 215,000 observations, and 8 million images. Approximately 70% of the collections use the data portal as their primary "live" specimen management platform. The SEINet interface now supports 13 regional data portals distributed across the United States and northern Mexico (http://symbiota.org/docs/seinet/). Through many collaborative efforts, it has matured into a tool for biodiversity data exploration, which includes species inventories, interactive identification keys, specimen and field images, taxonomic information, species distribution maps, and taxonomic descriptions. SEINet's initial developmental goals were to construct a read-only interface that integrated specimen records harvested from a handful of distributed natural history databases. Intermittent network connectivity and inconsistent data exchange protocols frequently restricted data persistence. National funding opportunities supported a complete redesign towards a centralized data cache model with periodic "snapshot" updates from the original data sources. A service-based management infrastructure was integrated into the interface to mobilize small- to medium-sized collections (<1 million specimen records) that commonly lack the consistent infrastructure and technical expertise to maintain a standards-compliant specimen database. These developments were the precursors to the Symbiota software project (Gries et al. 2014). Through further development of Symbiota, SEINet transformed into a robust specimen management system specifically geared toward specimen digitization, with features including data entry from label images, harvesting data from specimen duplicates, batch georeferencing, data validation and cleaning, progress reporting, and additional tools to improve the efficiency of the digitization process. The central developmental paradigm focused on data mobilization through the production of: a versatile import module capable of ingesting a diverse range of data structures; a robust toolkit to assist in digitizing and managing specimen data and images; and a Darwin Core Archive (DwC-A) compliant data publishing and export toolkit to facilitate data distribution to global aggregators such as the Global Biodiversity Information Facility (GBIF) and iDigBio. User interfaces consist of a decentralized network of regional data portals, all connecting to a centralized shared data source. Each of the 13 data portals is configured to present a regional perspective specifically tailored to the needs of the local research community.
This infrastructure has supported the formation of regional consortia, which provide network support to aid local institutions in digitizing and publishing their collections within the network. The community-based infrastructure creates a sense of ownership, perhaps even good-natured competition, among the data providers and offers extra incentive to improve data quality and expand the network. Certain areas of development remain challenging in spite of the project's overall success. For instance, data managers continuously struggle to maintain a current local taxonomic thesaurus used for name validation, data cleaning, and the resolution of taxonomic discrepancies commonly encountered when integrating collection datasets. We will discuss the successes and challenges associated with the long-term sustainability model and explore potential future paths for SEINet that support the long-term goal of maintaining a data provider in full compliance with the FAIR principles of making datasets findable, accessible, interoperable, and reusable (Wilkinson et al. 2016).
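For readers unfamiliar with the export format mentioned above, the sketch below packages a single occurrence record as a minimal Darwin Core Archive: a zip file containing a meta.xml descriptor that maps each column of a tab-delimited core file to a Darwin Core term. The file names, fields and record are illustrative, and this is not Symbiota's actual export code.

```python
# A minimal sketch of building a Darwin Core Archive (DwC-A):
# a zip holding a meta.xml descriptor plus a tab-delimited occurrence core file.
import csv
import zipfile

records = [{
    "occurrenceID": "urn:uuid:00000000-0000-0000-0000-000000000001",
    "scientificName": "Quercus gambelii",
    "eventDate": "1998-06-12",
    "decimalLatitude": "34.19",
    "decimalLongitude": "-111.93",
}]
fields = list(records[0].keys())

# Write the tab-delimited occurrence core file.
with open("occurrence.txt", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=fields, delimiter="\t")
    writer.writeheader()
    writer.writerows(records)

# Map each column to its Darwin Core term in a minimal meta.xml descriptor.
field_xml = "\n    ".join(
    '<field index="%d" term="http://rs.tdwg.org/dwc/terms/%s"/>' % (i, term)
    for i, term in enumerate(fields)
)
meta = f"""<archive xmlns="http://rs.tdwg.org/dwc/text/">
  <core rowType="http://rs.tdwg.org/dwc/terms/Occurrence" encoding="UTF-8"
        fieldsTerminatedBy="\\t" linesTerminatedBy="\\n" ignoreHeaderLines="1">
    <files><location>occurrence.txt</location></files>
    <id index="0"/>
    {field_xml}
  </core>
</archive>
"""

# Zip the core file and the descriptor into the archive.
with zipfile.ZipFile("dwca.zip", "w") as z:
    z.write("occurrence.txt")
    z.writestr("meta.xml", meta)
```

An archive of this shape is what aggregators such as GBIF and iDigBio ingest, which is why a DwC-A export toolkit is the natural endpoint of the digitization workflow described above.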


Author(s):  
Lyubomir Penev ◽  
Teodor Georgiev ◽  
Viktor Senderov ◽  
Mariya Dimitrova ◽  
Pavel Stoev

As one of the first advocates of open access and open data in the field of biodiversity publishing, Pensoft has adopted a multiple data publishing model, resulting in the ARPHA-BioDiv toolbox (Penev et al. 2017). ARPHA-BioDiv consists of several data publishing workflows and tools described in the Strategies and Guidelines for Publishing of Biodiversity Data and elsewhere: Data underlying research results are deposited in an external repository and/or published as supplementary file(s) to the article and then linked/cited in the article text; supplementary files are published under their own DOIs and bear their own citation details. Data deposited in trusted repositories and/or supplementary files are described in data papers; data papers may be submitted in text format or converted into manuscripts from Ecological Metadata Language (EML) metadata. Integrated narrative and data publishing is realised by the Biodiversity Data Journal, where structured data are imported into the article text from tables or via web services and downloaded/distributed from the published article. Data are published in structured, semantically enriched, full-text XMLs, so that several data elements can thereafter easily be harvested by machines. Linked Open Data (LOD) are extracted from literature, converted into interoperable RDF triples in accordance with the OpenBiodiv-O ontology (Senderov et al. 2018) and stored in the OpenBiodiv Biodiversity Knowledge Graph.
The above-mentioned approaches are supported by a whole ecosystem of additional workflows and tools, for example: (1) pre-publication data auditing, involving both human and machine data quality checks (workflow 2); (2) web-service integration with data repositories and data centres, such as the Global Biodiversity Information Facility (GBIF), Barcode of Life Data Systems (BOLD), Integrated Digitized Biocollections (iDigBio), Data Observation Network for Earth (DataONE), Long Term Ecological Research (LTER), PlutoF, Dryad, and others (workflows 1, 2); (3) semantic markup of the article texts in the TaxPub format, facilitating further extraction, distribution and re-use of sub-article elements and data (workflows 3, 4); (4) server-to-server import of specimen data from GBIF, BOLD, iDigBio and PlutoF into manuscript text (workflow 3); (5) automated conversion of EML metadata into data paper manuscripts (workflow 2); (6) export of Darwin Core Archives and automated deposition in GBIF (workflow 3); (7) submission of individual images and supplementary data under their own DOIs to the Biodiversity Literature Repository, BLR (workflows 1-3); (8) conversion of key data elements from TaxPub articles and taxonomic treatments extracted by Plazi into RDF handled by OpenBiodiv (workflow 5). These approaches represent different aspects of the prospective scholarly publishing of biodiversity data which, in combination with the text and data mining (TDM) technologies for legacy literature (PDF) developed by Plazi, lay the groundwork for an entire data publishing ecosystem for biodiversity, supplying FAIR (Findable, Accessible, Interoperable and Reusable) data to several interoperable overarching infrastructures, such as GBIF, BLR, Plazi TreatmentBank and OpenBiodiv, as well as to various end users.
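The RDF conversion step (workflow 5 and item 8 above) can be sketched with rdflib as follows; the namespace, class and property names below are placeholders chosen for illustration and are not the actual OpenBiodiv-O terms.

```python
# A minimal sketch of expressing an extracted statement as RDF triples with rdflib.
# The "obdv" namespace and its terms are placeholders, not actual OpenBiodiv-O terms.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import DCTERMS, RDF

OBDV = Namespace("http://example.org/openbiodiv-placeholder/")  # hypothetical namespace
g = Graph()
g.bind("obdv", OBDV)
g.bind("dcterms", DCTERMS)

treatment = URIRef("http://example.org/treatment/123")          # hypothetical identifier
g.add((treatment, RDF.type, OBDV.TaxonomicTreatment))           # placeholder class
g.add((treatment, DCTERMS.title, Literal("Treatment of an example species")))
g.add((treatment, OBDV.mentionsTaxon, Literal("Examplea specifica")))  # placeholder property

print(g.serialize(format="turtle"))
```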


Author(s):  
Lauren Weatherdon

Ensuring that we have the data and information necessary to make informed decisions is a core requirement in an era of increasing complexity and anthropogenic impact. With cumulative challenges such as the decline in biodiversity and accelerating climate change, the need for spatially explicit and methodologically consistent data that can be compiled to produce useful and reliable indicators of biological change and ecosystem health is growing. Technological advances, including satellite imagery, are beginning to make this a reality, yet uptake of biodiversity information standards and scaling of data to ensure their applicability at multiple levels of decision-making are still in progress. The complementary Essential Biodiversity Variables (EBVs) and Essential Ocean Variables (EOVs), combined with Darwin Core and other data and metadata standards, provide the underpinnings necessary to produce data that can inform indicators. However, perhaps the largest challenge in developing global indicators of biological change is achieving consistent and holistic coverage over time, with recognition of biodiversity data as global assets that are critical to tracking progress toward the UN Sustainable Development Goals and Targets set by the international community (see Jensen and Campbell (2019) for discussion). In this talk, I will describe some of the efforts towards producing and collating effective biodiversity indicators, such as those based on authoritative datasets like the World Database on Protected Areas (https://www.protectedplanet.net/), and work achieved through the Biodiversity Indicators Partnership (https://www.bipindicators.net/). I will also highlight some of the characteristics of effective indicators, and global biodiversity reporting and communication needs as we approach 2020 and beyond.


2019 ◽  
Vol 2 ◽  
Author(s):  
Lyubomir Penev

"Data ownership" is actually an oxymoron, because there could not be a copyright (ownership) on facts or ideas, hence no data onwership rights and law exist. The term refers to various kinds of data protection instruments: Intellectual Property Rights (IPR) (mostly copyright) asserted to indicate some kind of data ownership, confidentiality clauses/rules, database right protection (in the European Union only), or personal data protection (GDPR) (Scassa 2018). Data protection is often realised via different mechanisms of "data hoarding", that is witholding access to data for various reasons (Sieber 1989). Data hoarding, however, does not put the data into someone's ownership. Nonetheless, the access to and the re-use of data, and biodiversuty data in particular, is hampered by technical, economic, sociological, legal and other factors, although there should be no formal legal provisions related to copyright that may prevent anyone who needs to use them (Egloff et al. 2014, Egloff et al. 2017, see also the Bouchout Declaration). One of the best ways to provide access to data is to publish these so that the data creators and holders are credited for their efforts. As one of the pioneers in biodiversity data publishing, Pensoft has adopted a multiple-approach data publishing model, resulting in the ARPHA-BioDiv toolbox and in extensive Strategies and Guidelines for Publishing of Biodiversity Data (Penev et al. 2017a, Penev et al. 2017b). ARPHA-BioDiv consists of several data publishing workflows: Deposition of underlying data in an external repository and/or its publication as supplementary file(s) to the related article which are then linked and/or cited in-tex. Supplementary files are published under their own DOIs to increase citability). Description of data in data papers after they have been deposited in trusted repositories and/or as supplementary files; the systme allows for data papers to be submitted both as plain text or converted into manuscripts from Ecological Metadata Language (EML) metadata. Import of structured data into the article text from tables or via web services and their susequent download/distribution from the published article as part of the integrated narrative and data publishing workflow realised by the Biodiversity Data Journal. Publication of data in structured, semanticaly enriched, full-text XMLs where data elements are machine-readable and easy-to-harvest. Extraction of Linked Open Data (LOD) from literature, which is then converted into interoperable RDF triples (in accordance with the OpenBiodiv-O ontology) (Senderov et al. 2018) and stored in the OpenBiodiv Biodiversity Knowledge Graph Deposition of underlying data in an external repository and/or its publication as supplementary file(s) to the related article which are then linked and/or cited in-tex. Supplementary files are published under their own DOIs to increase citability). Description of data in data papers after they have been deposited in trusted repositories and/or as supplementary files; the systme allows for data papers to be submitted both as plain text or converted into manuscripts from Ecological Metadata Language (EML) metadata. Import of structured data into the article text from tables or via web services and their susequent download/distribution from the published article as part of the integrated narrative and data publishing workflow realised by the Biodiversity Data Journal. 
Publication of data in structured, semanticaly enriched, full-text XMLs where data elements are machine-readable and easy-to-harvest. Extraction of Linked Open Data (LOD) from literature, which is then converted into interoperable RDF triples (in accordance with the OpenBiodiv-O ontology) (Senderov et al. 2018) and stored in the OpenBiodiv Biodiversity Knowledge Graph In combination with text and data mining (TDM) technologies for legacy literature (PDF) developed by Plazi, these approaches show different angles to the future of biodiversity data publishing and, lay the foundations of an entire data publishing ecosystem in the field, while also supplying FAIR (Findable, Accessible, Interoperable and Reusable) data to several interoperable overarching infrastructures, such as Global Biodiversity Information Facility (GBIF), Biodiversity Literature Repository (BLR), Plazi TreatmentBank, OpenBiodiv, as well as to various end users.
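The EML-to-data-paper conversion mentioned above can be sketched as follows; the element paths are simplified and the input file name is hypothetical, so this is only an outline of the idea rather than the ARPHA implementation.

```python
# A minimal sketch of the EML-to-data-paper idea: pull a few fields from an
# Ecological Metadata Language (EML) file and lay them out as a manuscript
# skeleton. Real EML documents (and the ARPHA conversion) carry far more structure.
import xml.etree.ElementTree as ET

def eml_to_manuscript(eml_path: str) -> str:
    # In EML, the namespaced <eml:eml> root wraps an unqualified <dataset> element.
    dataset = ET.parse(eml_path).getroot().find("dataset")
    title = dataset.findtext("title", default="(untitled dataset)")
    abstract = dataset.findtext("abstract/para", default="")
    creators = [
        " ".join(filter(None, [c.findtext("individualName/givenName"),
                               c.findtext("individualName/surName")]))
        for c in dataset.findall("creator")
    ]
    return "\n".join([
        f"Title: {title}",
        f"Authors: {', '.join(creators)}",
        "Abstract:",
        abstract,
    ])

print(eml_to_manuscript("dataset_metadata.xml"))  # hypothetical input file name
```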


Author(s):  
Prabha Selvaraj ◽  
Sumathi Doraikannan ◽  
Vijay Kumar Burugari

Big data and the IoT have an impact on many areas, including science, health, engineering, medicine, finance, business, and, above all, society. The growth of security intelligence creates a need for new techniques that rely on big data and big data analytics. IoT security is not only about securing the device itself; it must also address the web interfaces, cloud services, and other devices that interact with it. Many techniques address challenges such as individual privacy and the inference and aggregation attacks that make it possible to re-identify individuals even after they have been removed from a dataset. It is understood that a few security vulnerabilities can lead to an insecure web interface. This chapter discusses these security challenges and how big data can be used to address them. It also analyzes various attacks and threat modeling in detail. Two case studies in two different areas are also discussed.


2011 ◽  
pp. 1195-1205
Author(s):  
Muneesh Kumar ◽  
Mamta Sareen

The emergence of the Internet has revolutionized the way businesses are conducted. The impact of e-commerce is pervasive, on both companies and society as a whole. It has the potential to affect the pace of economic development and, in turn, to influence the process of human development at the global level. However, growth in e-commerce is being impaired by the issue of trust in the buyer-seller relationship, which arises from the virtual nature of the e-commerce environment. The online trading environment is constrained by a number of factors, including the web interface, which in turn influences the user experience. This article identifies various dimensions of the web interface that have the potential to influence trust in e-commerce. The empirical evidence presented in the article is based on a survey of the web interfaces of 65 Indian e-Marketplaces.


Author(s):  
Muhammad Nazrul Islam ◽  
Franck Tétard

Interface signs are the communication cues of web interfaces through which users interact; examples include small images, navigational links, buttons and thumbnails. Although intuitive interface signs are crucial elements of a good user interface (UI), prior research has largely ignored them in the UI design and usability evaluation process. This chapter outlines how a design science research (DSR) approach is used to develop a Human-Computer Interaction (HCI) artifact (a semiotic framework) for the design and evaluation of user-intuitive web interface signs. The chapter describes how the principles and guidelines of the DSR approach are adopted while performing the activities of the DSR process model to construct the artifact.


2019 ◽  
Vol 16 (3) ◽  
pp. 297-305
Author(s):  
Anna-Leena Saarela ◽  
Anja Walzer ◽  
Anne Mari Juppo

Background: Interactive response technologies are used in clinical trials to provide services such as automated randomization and medication logistics management. The objective of this article is to investigate the usage of the telephone (Interactive Voice Response) and web (Interactive Web Response) interfaces of interactive response technologies at clinical investigator sites, to obtain information about end users' preferences between the telephone and web interfaces, and to explore the relevance of the telephone interface in this setting. Methods: The data consist of an online survey conducted in spring 2016 with clinical investigators, study nurses, and pharmacists in 13 countries. Results: Ninety-eight percent of survey respondents preferred the web interface over the telephone interface, the most important reason being superior usability. However, the respondents indicated that the usability of interactive response technology interfaces is not optimal, and that a lack of integration and consistency across systems is common. A vast majority of interactive response technology end users at clinical sites prefer the web interface over the telephone interface, but most also feel there needs to be a back-up system. Conclusions: Based on the results, it would be beneficial to improve the usability of interactive response technology interfaces and to increase consistency across systems. Support for and training of users, as well as clarification of responsibilities between sites and the sponsor, should also be focal points. Study sponsors should explore with interactive response technology service providers how removing the telephone interface would affect future studies, and whether there could be a more efficient means of providing a reliable back-up to the web interface than a dedicated telephone interface.


Author(s):  
Alan Rea

In this chapter, the author argues that virtual reality (VR) does have a place in e-commerce as a Web 2.0 application. However, VR is not ready to supplant standard e-commerce Web interfaces with a completely immersive VR environment. Rather, VRCommerce must rely on a mixed platform presentation to accommodate diverse levels of usability, technical feasibility, and user trust. The author proposes that e-commerce sites that want to implement VRCommerce offer at least three layers of interaction: a standard Web interface, embedded VR objects in a Web interface, and semi-immersive VR within an existing Web interface. This system is termed the Layered Virtual Reality Commerce System, or LaVRCS. This proposed LaVRCS framework can work in conjunction with Rich Internet Applications, Webtops, and other Web 2.0 applications to offer another avenue of interaction within the e-commerce realm. With adoption and development, LaVRCS will help propel e-commerce into the Web 3.0 realm and beyond.

