Digitization of US Herbaria - How close did we get to the 2020 goal?

A discussion session held at a National Science Foundation-sponsored Herbarium Networks Workshop at Michigan State University in September of 2004 resulted in a rallying objective: make all botanical specimen information in United States collections available online by 2020. Rabeler and Macklin 2006 outlined a toolkit for realizing this ambitious goal, which included: a review of relevant and state-of-the-art web resources, data exchange standards and, mechanisms to maximize efficiencies while minimizing costs. a review of relevant and state-of-the-art web resources, data exchange standards and, mechanisms to maximize efficiencies while minimizing costs. Given that we are now in the year 2020, it seems appropriate to examine the progress towards the objective of making all US botanical specimen collections data available online. Our presentation will attempt to answer several questions: How close have we come to meeting the original objective? What fraction of “digitized” specimens are minimally represented by a catalog number, a determination, and/or a photograph? What fraction has been thoroughly transcribed? How close have we come to attaining a seamlessly integrated, comprehensive, and national view of botanical specimen data that guides a stakeholder to appropriate resources regardless of their entry point? What “holes” in this effort still exist and what might be required to fill them? How close have we come to meeting the original objective? What fraction of “digitized” specimens are minimally represented by a catalog number, a determination, and/or a photograph? What fraction has been thoroughly transcribed? How close have we come to attaining a seamlessly integrated, comprehensive, and national view of botanical specimen data that guides a stakeholder to appropriate resources regardless of their entry point? What “holes” in this effort still exist and what might be required to fill them? Given our interest in the success of both the Global Biodiversity Information Facility (GBIF) and the Integrated Digitized Biocollections (iDigBio), as well as the overwhelming likelihood that either one of these initiatives is the usual entry point for someone seeking US-based botanical data, we approached the answers to the above questions by first crafting a repeatable data download and processing workflow in early July 2020. This resulted in 25.6M records of plant, fungi, and Chromista from 216 datasets available through GBIF and 32.8M comparable records available through iDigBio from 525 recordsets. We attempted to align these seemingly discordant sets of records and also chose Darwin Core terms that were best suited to match the four hierarchical levels of digitization defined in the Minimal Information for Digital Specimens (MIDS) (van Egmond et al. 2019). During the analysis/comparison of the datasets, we found several examples where the number of data records from an institution seemed much lower than expected. From a combination of analyzing record content in GBIF/iDigBio and consulting regional/taxonomic portals, it became evident that, besides datasets only being included in either GBIF or iDigBio, there was a significant number of records in regional/taxonomic portals that were not yet made available through either GBIF or iDigBio. Progress on digitization has benefited greatly from the US National Science Foundation's creation of the Advancing Digitization of Biodiversity Collections (ADBC) program, and funding of the 15 Thematic Collection Networks (TCN). The launching of new projects and the ensuing digitization of herbarium collections have led to a multitude of new specimen portals and the enhancement of existing software like Symbiota (Gries et al. 2014). But, it has also led to insufficient data sharing among projects and inadequately aligned data synchronization practices between aggregators. Consistency in terms of data availability and quality between GBIF and iDigBio is low, and the chronic lack of record-level identifiers consistently restricts the flow of enhancements made to records. We conclude that there remains substantial work to be done on the national infrastructure and on international best practices to help facilitate collaboration and to realize the original objective of making all US botanical specimen collections data available online.

Download Full-text

The European Journal of Taxonomy: Enhancing taxonomic publications for dynamic data exchange and navigation

Biodiversity Information Science and Standards ◽

10.3897/biss.3.37199 ◽

2019 ◽

Vol 3 ◽

Author(s):

Laurence Bénichou ◽

Isabelle Gerard ◽

Chloé Chester ◽

Donat Agosti

Keyword(s):

Natural History ◽

Data Exchange ◽

State Of The Art ◽

Global Biodiversity Information Facility ◽

Set Up ◽

Semantic Markup ◽

Pdf Format ◽

Text And Data Mining ◽

Biodiversity Information ◽

Publishing Workflow

The European Journal of Taxonomy (EJT) was initiated by a consortium of European natural history publishers to take advantage of the shift from paper to electronic-only publishing (Benichou et al. 2011). Whilst originally publishing in PDF format has been considered the state of the art, it became recently obvious that complementary dissemination channels help to disseminate taxonomic data - one of the pillars of Natural History institutions research - more widely and efficiently (Côtez et al. 2018). The adoption of semantic markup and assignment of persistent identifiers for content allow more comprehensive citations of the article, including elements therein, such as images, taxonomic treatments, and materials citation. It also allows more in-depth analyses and visualization of the contribution of collections, authors, or specimens to taxonomic output and third parties, such as the Global Biodiversity Information Facility, for reuse of the data or building the catalogue of life. In this presentation, EJT will be used to outline the nature of natural history publishers and their technical set up. This is followed by a description of the post-publishing workflow using the Plazi workflow and dissemination via the Biodiversity Literature Repository (BLR) and TreatmentBank. It outlines switching the publishing workflow to an increased use of extended markup language (XML) and visualization of the output and concludes by publishing guidelines that enable more efficient text and data mining of the content of taxonomic publications.

Download Full-text

All-gather Algorithms Resilient to Imbalanced Process Arrival Patterns

ACM Transactions on Architecture and Code Optimization ◽

10.1145/3460122 ◽

2021 ◽

Vol 18 (4) ◽

pp. 1-22

Author(s):

Jerzy Proficz

Keyword(s):

Experimental Evaluation ◽

Data Exchange ◽

State Of The Art ◽

Monitoring And Evaluation ◽

The Other ◽

Early Data ◽

Cluster Architecture ◽

Novel Algorithms

Two novel algorithms for the all-gather operation resilient to imbalanced process arrival patterns (PATs) are presented. The first one, Background Disseminated Ring (BDR), is based on the regular parallel ring algorithm often supplied in MPI implementations and exploits an auxiliary background thread for early data exchange from faster processes to accelerate the performed all-gather operation. The other algorithm, Background Sorted Linear synchronized tree with Broadcast (BSLB), is built upon the already existing PAP-aware gather algorithm, that is, Background Sorted Linear Synchronized tree (BSLS), followed by a regular broadcast distributing gathered data to all participating processes. The background of the imbalanced PAP subject is described, along with the PAP monitoring and evaluation topics. An experimental evaluation of the algorithms based on a proposed mini-benchmark is presented. The mini-benchmark was performed over 2,000 times in a typical HPC cluster architecture with homogeneous compute nodes. The obtained results are analyzed according to different PATs, data sizes, and process numbers, showing that the proposed optimization works well for various configurations, is scalable, and can significantly reduce the all-gather elapsed times, in our case, up to factor 1.9 or 47% in comparison with the best state-of-the-art solution.

Download Full-text

Genus Allium in CSBG Digital Herbarium

BIO Web of Conferences ◽

10.1051/bioconf/20202400042 ◽

2020 ◽

Vol 24 ◽

pp. 00042

Author(s):

Nataliya Kovtonyuk ◽

Irina Han ◽

Evgeniya Gatilova ◽

Nikolai Friesen

Keyword(s):

Vascular Plants ◽

Russian Far East ◽

Far East ◽

Botanical Garden ◽

International Standards ◽

Type Specimens ◽

Global Biodiversity Information Facility ◽

East Europe ◽

Herbarium Collections ◽

Biodiversity Information

Two herbarium collections (NS and NSK) of the Central Siberian Botanical Garden SB RAS keep about 740,000 specimens of vascular plants, collected in Siberia, Russian Far East, Europe, Asia and North America. Genus Allium s. lat. Is presented by 6224 herbarium sheets, all of them were scanned using international standards: at a resolution of 600 dpi, the barcode for each specimen, 24-color scale and scale bar. Images and metadata are stored at the CSBG SB RAS Digital Herbarium, generated by ScanWizard Botany and MiVapp Botany software (Microtek, Taiwan). Datasets were published via IPT at the Global Biodiversity Information Facility portal (gbif.org). In total 207 species of the genus Allium are placed in the CSBS Digital Herbarium, which includes representatives from 13 subgenera and 49 sections of the genus. 35 type specimens of 18 species and subspecies of the genus Allium are hosted in CSBG Herbarium collections.

Download Full-text

Data Innovations for Transboundary Freshwater Resources Management: Are Obligations Related to Information Exchange Still Needed?

Brill Research Perspectives in International Water Law ◽

10.1163/23529369-12340016 ◽

2020 ◽

Vol 4 (4) ◽

pp. 3-78 ◽

Cited By ~ 1

Author(s):

Christina Leb

Keyword(s):

Spatial Data ◽

Information Exchange ◽

Data Exchange ◽

Data Availability ◽

Water Law ◽

Space Technology ◽

Time Data ◽

Related Data ◽

International Water Law ◽

International Water

AbstractCross-border data and information exchange is one of the most challenging issues for transboundary water management. While the regular exchange of data and information has been identified as one of the general principles of international water law, only a minority of treaties include direct obligations related to mutual data exchange. Technological innovations related to real-time data availability, space technology and earth observation have led to an increase in quality and availability of hydrological, meteorological and geo-spatial data. These innovations open new avenues for access to water related data and transform data and information exchange globally. This monograph is an exploratory assessment of the potential impacts of these disruptive technologies on data and information exchange obligations in international water law.

Download Full-text

The NASA Operation IceBridge Sea Ice Freeboard, Snow Depth and Thickness Product

10.5194/egusphere-egu2020-11405 ◽

2020 ◽

Author(s):

Jeremy Harbeck ◽

Nathan Kurtz ◽

Alek Petty

Keyword(s):

Image Analysis ◽

Sea Ice ◽

Snow Depth ◽

State Of The Art ◽

Early Years ◽

Data Availability ◽

Atmospheric Models ◽

Laser Altimetry ◽

Self Consistent ◽

Along Track

<p>Over the eleven-year lifetime of NASA&#8217;s Operation IceBridge, the Project Science Office has released an along-track sea ice freeboard, snow depth and thickness product in varying forms. Multiple versions of archival products are available for a number of the project&#8217;s early years and more recently quicklook versions, rapid-turnaround products primarily produced for summer sea ice forecasting, have been available for Arctic campaigns. During 2020, the mission&#8217;s close-out year, we are producing a final archival version of the product that will fill gaps in data availability and incorporate multiple improvements in the processing chain. These improvements include laser altimetry and snow radar pre-processing and ingestion upgrades, improved image analysis, updated tide and atmospheric models, updated gridding methodology and enhanced product outputs. The final result will constitute a state-of-the-art, internally self-consistent data product for all springtime Arctic and Antarctic Operation IceBridge campaigns.</p>

Download Full-text

The CHILDES Project: Tools for Analyzing Talk: Vol. 1. Transcription format and programs; Vol. 2. The database (3rd ed.). B. MacWhinney. Mahwah, NJ: Erlbaum, 2000. Pp. 366 (Vol. 1); Pp. 418 (Vol. 2).

Applied Psycholinguistics ◽

10.1017/s0142716402222079 ◽

2002 ◽

Vol 23 (2) ◽

pp. 304-306

Author(s):

Diane E. Beals

Keyword(s):

Data Sharing ◽

Data Exchange ◽

State Of The Art ◽

Child Language ◽

The State ◽

Exchange System ◽

Language Analysis ◽

Language Data ◽

User Friendly

Since the late 1980s, the Child Language Data Exchange System (CHILDES) has defined the state of the art of collection, analysis, archiving, and data sharing of transcriptions of children's language. Starting from scratch in 1987, Brian MacWhinney, along with many other leaders in child language, developed highly useful tools for the computerization of transcripts and their analysis. I have used the transcription conventions and analysis programs since 1989 and have seen the system evolve from a simple DOS-based program to one that handles much broader and more complex analyses within more user-friendly Windows and Macintosh platforms. This latest (third) edition of the manual that accompanies the CHILDES system reflects a more stable version of the Conventions for Human Analysis of Transcripts (CHAT) and Child Language Analysis (CLAN) programs than prior editions, which felt like works in progress. This version is written as a finished product with procedures and programs that have settled down into stable patterns of operation.

Download Full-text

A Dynamic Dashboarding Application for Fleet Monitoring Using Semantic Web of Things Technologies

Sensors ◽

10.3390/s20041152 ◽

2020 ◽

Vol 20 (4) ◽

pp. 1152 ◽

Cited By ~ 2

Author(s):

Sander Vanden Hautte ◽

Pieter Moens ◽

Joachim Van Herwegen ◽

Dieter De Paepe ◽

Bram Steenwinckel ◽

...

Keyword(s):

Semantic Web ◽

Data Exchange ◽

State Of The Art ◽

Sensor Data ◽

Vast Amount ◽

Data Formats ◽

Sensor Configuration ◽

Fixed Structure ◽

The Web ◽

At Will

In industry, dashboards are often used to monitor fleets of assets, such as trains, machines or buildings. In such industrial fleets, the vast amount of sensors evolves continuously, new sensor data exchange protocols and data formats are introduced, new visualization types may need to be introduced and existing dashboard visualizations may need to be updated in terms of displayed sensors. These requirements motivate the development of dynamic dashboarding applications. These, as opposed to fixed-structure dashboard applications, allow users to create visualizations at will and do not have hard-coded sensor bindings. The state-of-the-art in dynamic dashboarding does not cope well with the frequent additions and removals of sensors that must be monitored—these changes must still be configured in the implementation or at runtime by a user. Also, the user is presented with an overload of sensors, aggregations and visualizations to select from, which may sometimes even lead to the creation of dashboard widgets that do not make sense. In this paper, we present a dynamic dashboard that overcomes these problems. Sensors, visualizations and aggregations can be discovered automatically, since they are provided as RESTful Web Things on a Web Thing Model compliant gateway. The gateway also provides semantic annotations of the Web Things, describing what their abilities are. A semantic reasoner can derive visualization suggestions, given the Thing annotations, logic rules and a custom dashboard ontology. The resulting dashboarding application automatically presents the available sensors, visualizations and aggregations that can be used, without requiring sensor configuration, and assists the user in building dashboards that make sense. This way, the user can concentrate on interpreting the sensor data and detecting and solving operational problems early.

Download Full-text

Automatic question generation and answer assessment: a survey

Research and Practice in Technology Enhanced Learning ◽

10.1186/s41039-021-00151-1 ◽

2021 ◽

Vol 16 (1) ◽

Author(s):

Bidyut Das ◽

Mukta Majumder ◽

Santanu Phadikar ◽

Arif Ahmed Sekh

Keyword(s):

State Of The Art ◽

Assessment System ◽

Learning System ◽

Learning Resources ◽

Question Generation ◽

Web Resources ◽

Assessment Strategies ◽

Self Learning ◽

Automatic Question Generation ◽

Resources Assessment

AbstractLearning through the internet becomes popular that facilitates learners to learn anything, anytime, anywhere from the web resources. Assessment is most important in any learning system. An assessment system can find the self-learning gaps of learners and improve the progress of learning. The manual question generation takes much time and labor. Therefore, automatic question generation from learning resources is the primary task of an automated assessment system. This paper presents a survey of automatic question generation and assessment strategies from textual and pictorial learning resources. The purpose of this survey is to summarize the state-of-the-art techniques for generating questions and evaluating their answers automatically.

Download Full-text

State of the art data exchange in agriculture in the EU27 & Switzerland: survey of the agriXchange project

Suomen Maataloustieteellisen Seuran Tiedote ◽

10.33354/smst.75647 ◽

2012 ◽

pp. 1-4 ◽

Cited By ~ 1

Author(s):

Frederick Teye ◽

Henri Hoslter ◽

Liisa Pesonen

Keyword(s):

Data Integration ◽

Data Exchange ◽

Service Providers ◽

State Of The Art ◽

The State ◽

Future Research ◽

Member States ◽

Data Harmonization ◽

Eu Member States ◽

The Eu

Within the agricultural knowledge-based bio-economy, information sharing is an important issue. Information systems for agricultural supply food chain network are not standardized. This reduces efficiency in the exchange of information in agri-business processes. To address these problems, agriXchange, an EU-funded coordination and support action was setup to develop a sustainable network system for common data exchange in the agricultural sector. The overall objectives are to: a) establish a platform on data exchange in agriculture in the EU, b) develop a reference framework for interoperability of data exchange, and c) identify the main challenges for harmonizing data exchange. Analysis of the situation concerning data exchange in agriculture in individual EU member states (including Switzerland) is an integral component of this harmonization support action. In this paper the results of the investigation of the state-of-the art around agricultural data exchange in EU member states is reported. This research on data exchange and data integration was carried out in 27 EU member states and Switzerland. The investigation employed experts to quantitatively and qualitatively inquire about agricultural data exchange in the EU. A framework was developed to inquire the different integration levels, within as well as between enterprises in agriculture. Based on the analysis of the state of the art, the challenges for future research and trends data exchange in European agriculture were identified. The results showed that there are substantial differences across the EU in relation to the level of data integration and standardization. Member states can be divided into different groups from; none or hardly any data integration to quite well developed infrastructures such as France, Germany, The Netherlands and Denmark. The most important findings identified were with the aging population of farmers which manifests itself through the lack of adaption and investments in new technology, especially in Southern and Eastern countries. Availability of mobile and broadband infrastructure was a major problem in rural areas for most countries in a quantitative sense, but for ICT developed agricultural countries more of a quality of service problem. Cost of acquiring data exchange capable equipment, data exchange formats, proprietary data formats and complexity in machines was also a major concern. As a recommendation, it was noted that open networks with flexible relationships between network partners will facilitate successful integration of systems. The importance of agricultural data exchange in the EU has broadly been recognized, however all service providers and users need to be convinced about the benefits. Finally, focus should be on putting research information into practice to demonstrate how data harmonization processes can work, however, this should be kept flexible and hence keep the rigidity of (formal) standardization processes minimal in agricultural data harmonization.

Download Full-text

Revisiting Mobile Crowdsensing: An Open Challenge

10.5121/csit.2021.111509 ◽

2021 ◽

Author(s):

Vittalis Ayu

Keyword(s):

Data Collection ◽

Data Quality ◽

Context Awareness ◽

Data Exchange ◽

Data Availability ◽

Mobility Patterns ◽

Mobile Crowdsensing ◽

Collection Process ◽

Data Collection Process

Mobile crowdsensing has become a new paradigm that enables citizens to participate in the sensing process by voluntarily gathering data from their smartphones to accomplish some given task. However, performing the sensing task generate lots of data resulting in various quality of the sensed data and high sensing cost in term of resource consumption. This matter became a significant concern in mobile crowdsensing as the mobile nodes which act as crowd sensors have limited resources. Moreover, an opportunistic mobile crowdsensing mechanism does not require user involvement, so the data collection process must be autonomous and intelligent to sense the data in the proper context. That is why context-awareness is also essential in opportunistic crowdsensing to maintain the sensed data quality. In this mini-review, we revisit the possibility of enhancing the mobile crowdsensing mechanism. We argue that improving the data collection process, including context-awareness, can optimize in-node data availability and sensed data quality. Besides, we also argue that finding optimization on inter-node data exchange mechanisms will increase the quality of the in-node data. Furthermore, smartphones that are related to humans as their owners reflect humans' physical and social behavior. We believe that considering contexts such as human social relationships and human mobility patterns can benefit the optimization strategies.

Download Full-text