Unexploited Biodiversity Data Sources: The case of airborne pollen

Author(s):  
Arturo H. Ariño ◽  
Mónica González-Alonso ◽  
Anabel Pérez de Zabalza

With more than one billion primary biodiversity data records (PBR), the Global Biodiversity Information Facility (GBIF) is the largest and, arguably, most comprehensive and accurate resource about the biodiversity data on the planet. Yet, its gaps (taxonomical, geographical or chronological, among others) have often been brought to attention (Gaijy et al. 2013) and efforts are continuously made to ensure more uniform coverage. Especially as data obtained through this resource are increasingly being used for science, policy, and conservation (Ariño et al. 2018), drawing on every possible source of information to complement already existing data opens new opportunities for supplying the integrative knowledge required for global endeavors, such as understanding the global patterns of ecosystem and environment changes. One such potential source that exists, but so far has experienced little integration, is the vast body of data acquired through airborne particle monitoring systems (for example, the European Aeroallergen Network, EAN). A large portion of pollen data is comprised of quantitative sampling of airborne pollen collected through semi-automated spore traps throughout the world. Its main use is clinical, as it forms the basis of the widespread allergen forecast bulletins. While geolocating the source of airborne pollen is fraught with obviously large uncertainty radii, the time and taxon components of the PBR remain highly precise and are therefore fit for many other uses (Hill et al. 2010). Presence data and, under certain circumstances, frequency data inferred from pollen counts have often been proposed as an excellent proxy for past climate change assessments as far back as the start of the Holocene (Mauri et al. 2015) and might therefore also be used for current climate change detection.
We call for a concerted effort throughout the palynological community to first harmonize, and then standardize, pollen data acquisition through the adoption of Darwin Core (DwC) and, eventually, DwC extensions, in order to mine current data and pipeline future airborne pollen data as PBR. Success in this endeavor may contribute to a better understanding of global change.
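As a rough illustration of what publishing a pollen count as a PBR could involve, the sketch below maps one daily trap reading onto Darwin Core term names. The `PollenCount` structure, its field names, the choice of `MaterialSample` as basis of record, and the free-text quantity type are assumptions made for this example; they are not taken from the abstract or from any existing pollen network schema.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class PollenCount:
    """Hypothetical daily reading from one trap (illustrative field names)."""
    station_id: str
    taxon: str
    day: date
    grains_per_m3: float
    lat: float
    lon: float
    uncertainty_m: float  # assumed radius within which the pollen may originate


def to_darwin_core(rec: PollenCount) -> dict:
    """Map one reading to a flat dict keyed by Darwin Core term names."""
    return {
        # Identifier pattern is an assumption, not a community convention.
        "occurrenceID": f"{rec.station_id}:{rec.taxon}:{rec.day.isoformat()}",
        "basisOfRecord": "MaterialSample",  # assumed choice for trapped pollen
        "scientificName": rec.taxon,
        "eventDate": rec.day.isoformat(),
        "decimalLatitude": rec.lat,
        "decimalLongitude": rec.lon,
        # The large source-area uncertainty is carried explicitly.
        "coordinateUncertaintyInMeters": rec.uncertainty_m,
        "organismQuantity": rec.grains_per_m3,
        "organismQuantityType": "pollen grains per cubic metre of air",
        "samplingProtocol": "volumetric spore trap",
    }
```

The time and taxon components stay precise (`eventDate`, `scientificName`), while the geographic imprecision is stated in `coordinateUncertaintyInMeters` rather than hidden.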

2020 ◽  
Vol 8 ◽  
Author(s):  
Dave Karlsson ◽  
Mattias Forshage ◽  
Kevin Holston ◽  
Fredrik Ronquist

Despite Sweden's strong entomological tradition, large portions of its insect fauna remain poorly known. As part of the Swedish Taxonomy Initiative, launched in 2002 to document all multi-cellular species occurring in the country, the first taxonomically-broad inventory of the country's insect fauna was initiated, the Swedish Malaise Trap Project (SMTP). In total, 73 Malaise traps were deployed at 55 localities representing a wide range of habitats across the country. Most traps were run continuously from 2003 to 2006 or for a substantial part of that time period. The total catch is estimated to contain 20 million insects, distributed over 1,919 samples (Karlsson et al. 2020). The samples have been sorted into more than 300 taxonomic units, which are made available for expert identification. Thus far, more than 100 taxonomists have been involved in identifying the sorted material, recording the presence of 4,000 species. One third of these had not been recorded from Sweden before and 700 have tentatively been identified as new to science. Here, we describe the SMTP dataset, published through the Global Biodiversity Information Facility (GBIF). Data on the sorted material are available in the "SMTP Collection Inventory" dataset. It currently includes more than 130,000 records of taxonomically-sorted samples. Data on the identified material are published using the Darwin Core standard for sample-based data. That information is divided up into group-specific datasets, as the sample set processed for each group is different and in most cases non-overlapping. The current data are divided into 79 taxonomic datasets, largely corresponding to taxonomic sorting fractions. The orders Diptera and Hymenoptera together comprise about 90% of the specimens in the material and these orders are mainly sorted to family or subfamily. The remaining insect taxa are mostly sorted to the order level. 
In total, the 79 datasets currently available comprise around 165,000 specimens, that is, about 1% of the total catch. However, the data are now accumulating rapidly and will be published continuously. The SMTP dataset is unique in that it contains a large proportion of data on previously poorly-known taxa in the Diptera and Hymenoptera.


2020 ◽  
Vol 15 (4) ◽  
pp. 411-437 ◽  
Author(s):  
Marcos Zárate ◽  
Germán Braun ◽  
Pablo Fillottrani ◽  
Claudio Delrieux ◽  
Mirtha Lewis

Great progress has recently been made in digitizing the world’s available Biodiversity and Biogeography data, but managing data from many different providers and research domains remains a challenge. A review of the current landscape of metadata standards and ontologies in Biodiversity sciences suggests that existing standards, such as the Darwin Core terminology, are inadequate for describing Biodiversity data in a semantically meaningful and computationally useful way. As a contribution to fill this gap, we present an ontology-based system, called BiGe-Onto, designed to manage Biodiversity and Biogeography data together. As data sources, we use two internationally recognized repositories: the Global Biodiversity Information Facility (GBIF) and the Ocean Biogeographic Information System (OBIS). The BiGe-Onto system is composed of (i) the BiGe-Onto Architecture, (ii) a conceptual model called BiGe-Onto specified in OntoUML, (iii) an operational version of BiGe-Onto encoded in OWL 2, and (iv) an integrated dataset for its exploitation through a SPARQL endpoint. We will show use cases that allow researchers to answer questions that draw on information from both domains.


2018 ◽  
Vol 2 ◽  
pp. e27087
Author(s):  
Donald Hobern ◽  
Andrea Hahn ◽  
Tim Robertson

For more than a decade, the biodiversity informatics community has recognised the importance of stable resolvable identifiers to enable unambiguous references to data objects and the associated concepts and entities, including museum/herbarium specimens and, more broadly, all records serving as evidence of species occurrence in time and space. Early efforts built on the Darwin Core institutionCode, collectionCode and catalogueNumber terms, treated as a triple and expected to uniquely identify a specimen. Following review of current technologies for globally unique identifiers, TDWG adopted Life Science Identifiers (LSIDs) (Pereira et al. 2009). Unfortunately, the key stakeholders in the LSID consortium soon withdrew support for the technology, leaving TDWG committed to a moribund technology. Subsequently, publishers of biodiversity data have adopted a range of technologies to provide unique identifiers, including (among others) HTTP Universal Resource Identifiers (URIs), Universal Unique Identifiers (UUIDs), Archival Resource Keys (ARKs), and Handles. Each of these technologies has merit but they do not provide consistent guarantees of persistence or resolvability. More importantly, the heterogeneity of these solutions hampers delivery of services that can treat all of these data objects as part of a consistent linked-open-data domain. The geoscience community has established the System for Earth Sample Registration (SESAR) that enables collections to publish standard metadata records for their samples and for each of these to be associated with an International Geo Sample Number (IGSN http://www.geosamples.org/igsnabout). IGSNs follow a standard format, distribute responsibility for uniqueness between SESAR and the publishing collections, and support resolution via HTTP URI or Handles. Each IGSN resolves to a standard metadata page, roughly equivalent in detail to a Darwin Core specimen record.
The standardisation of identifiers has allowed the community to secure support from some journal publishers for promotion and use of IGSNs within articles. The biodiversity informatics community encompasses a much larger number of publishers and greater pre-existing variation in identifier formats. Nevertheless, it would be possible to deliver a shared global identifier scheme with the same features as IGSNs by building off the aggregation services offered by the Global Biodiversity Information Facility (GBIF). The GBIF data index includes normalised Darwin Core metadata for all data records from registered data sources and could serve as a platform for resolution of HTTP URIs and/or Handles for all specimens and for all occurrence records. The most significant trade-off requiring consideration would be between autonomy for collections and other publishers in how they format identifiers within their own data and the benefits that may arise from greater consistency and predictability in the form of resolvable identifiers.
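The "triple" approach described in this abstract can be sketched in a few lines, which also shows why it gives no guarantee of uniqueness. This is a minimal illustration only: the colon-joined format and the helper names are assumptions, and the code uses the Darwin Core spelling `catalogNumber` for the term the abstract calls catalogueNumber.

```python
from collections import Counter

# Darwin Core terms that make up the triple.
TRIPLE_TERMS = ("institutionCode", "collectionCode", "catalogNumber")


def triple(record: dict) -> str:
    """Join the three terms with ':' (separator is an assumption)."""
    return ":".join((record.get(term) or "").strip() for term in TRIPLE_TERMS)


def duplicated_triples(records: list) -> list:
    """Return triples shared by more than one record, sorted."""
    counts = Counter(triple(r) for r in records)
    return sorted(key for key, n in counts.items() if n > 1)
```

Nothing in the scheme prevents two publishers, or two rows from one publisher, from producing the same string, which is one reason a registry-backed identifier such as the IGSN is attractive.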


PeerJ ◽  
2017 ◽  
Vol 5 ◽  
pp. e4059 ◽  
Author(s):  
Marconi Campos-Cerqueira ◽  
T. Mitchell Aide

Background: Climate change and infectious diseases threaten animal and plant species, even in natural and protected areas. To cope with these changes, species may acclimate, adapt, move or decline. Here, we test for shifts in anuran distributions in the Luquillo Mountains (LM), a tropical montane forest in Puerto Rico, by comparing species distributions from historical (1931–1989) and current data (2015/2016).
Methods: Historical data, which included different methodologies, were gathered through the Global Biodiversity Information Facility (GBIF) and published literature, and the current data were collected using acoustic recorders along three elevational transects.
Results: In the recordings, we detected the 12 native frog species known to occur in LM. Over a span of ∼25 years, two species have become extinct and four species suffered extirpation in lowland areas. As a consequence, low elevation areas in the LM (<300 m) have lost at least six anuran species.
Discussion: We hypothesize that these extirpations are due to the effects of climate change and infectious diseases, which are restricting many species to higher elevations and a much smaller area. Land use change is not responsible for these changes because LM has been a protected reserve for the past 80 years. However, previous studies indicate that (1) climate change has increased temperatures in Puerto Rico, and (2) Batrachochytrium dendrobatidis (Bd) was found in 10 native species and early detection of Bd coincides with anuran declines in the LM. Our study confirms the general impressions of amphibian population extirpations at low elevations, and corroborates the levels of threat assigned by IUCN.


Author(s):  
Michael Trizna ◽  
Torsten Dikow

Taxonomic revisions contain crucial biodiversity data in the material examined sections for each species. In entomology, material examined lists minimally include the collecting locality, date of collection, and the number of specimens of each collection event. Insect species might be represented in taxonomic revisions by only a single specimen or hundreds to thousands of specimens. Furthermore, revisions of insect genera might treat small genera with few species or include tens to hundreds of species. Summarizing data from such large and complex material examined lists and revisions is cumbersome, time-consuming, and prone to errors. However, providing data on the seasonal incidence, abundance, and collecting period of species is an important way to mobilize primary biodiversity data to understand a species' occurrence or rarity. Here, we present SpOccSum (Species Occurrence Summary), a tool to easily obtain metrics of seasonal incidence from specimen occurrence data in taxonomic revisions. SpOccSum is written in Python (Python Software Foundation 2019) and accessible through the Anaconda Python/R Data Science Platform as a Jupyter Notebook (Kluyver et al. 2016). The tool takes a simple list of specimen data containing species name, locality, date of collection (preferably separated by day, month, and year), and number of specimens in CSV format and generates a series of tables and graphs summarizing: number of specimens per species, number of specimens collected per month, number of unique collection events, as well as the earliest and most recent collecting year of each species. The results can be exported as graphics or as CSV-formatted tables and can easily be included in manuscripts for publication.
An example of an early version of the summary produced by SpOccSum can be viewed in Tables 1, 2 from Markee and Dikow (2018). To accommodate seasonality in the Northern and Southern Hemispheres, users can choose to start the data display with either January or July. When geographic coordinates are available and species have widespread distributions spanning, for example, the equator, the user can itemize particular regions such as North of Tropic of Cancer (23.5˚N), Tropic of Cancer to the Equator, Equator to Tropic of Capricorn, and South of Tropic of Capricorn (23.5˚S). Other features currently in development include the ability to produce distribution maps from the provided data (when geographic coordinates are included) and the option to export specimen occurrence data as a Darwin-Core Archive ready for upload to the Global Biodiversity Information Facility (GBIF).
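The kind of summary described in this abstract can be approximated with the Python standard library alone. The sketch below is not SpOccSum's code: the column names (`species`, `locality`, `day`, `month`, `year`, `specimens`) and the definition of a collection event as a unique locality and date combination are assumptions chosen for illustration.

```python
import csv
import io
from collections import defaultdict


def summarize(csv_text: str) -> dict:
    """Per-species totals, monthly counts, event counts and year range."""
    per_species = defaultdict(int)
    per_month = defaultdict(lambda: defaultdict(int))
    events = defaultdict(set)
    years = defaultdict(list)

    for row in csv.DictReader(io.StringIO(csv_text)):
        sp = row["species"]
        n = int(row["specimens"])
        month, year = int(row["month"]), int(row["year"])
        per_species[sp] += n
        per_month[sp][month] += n
        # Assumed definition: one event per unique locality and date.
        events[sp].add((row["locality"], row["day"], month, year))
        years[sp].append(year)

    return {
        sp: {
            "specimens": per_species[sp],
            "by_month": dict(per_month[sp]),
            "collection_events": len(events[sp]),
            "earliest_year": min(years[sp]),
            "latest_year": max(years[sp]),
        }
        for sp in per_species
    }
```

Starting the monthly display in January or July, as the abstract describes for the two hemispheres, would then only be a matter of the order in which the keys of `by_month` are read out.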


2020 ◽  
Author(s):  
Annalisa Canu ◽  
Arnoldo Vargiu ◽  
Grazia Pellizzaro

Airborne pollen data are an important source of information on flowering phenology, because they record the response of plants surrounding the sampling station, rather than the responses of individual plants, as with direct phenological observation. Plant phenology is a good indicator of vegetation responses to long-term variation in temperature. Furthermore, several studies have shown that aerobiological data series and pollen season are often strongly correlated with climate change.
This research aims to analyze airborne pollen data of Poaceae and Fagaceae measured from 1986 to 2008 in an urban area of northern Sardinia (Italy) and to investigate the trends in these data and their relationship with meteorological parameters using time series analysis. The aerobiological monitoring station was located in the center of the city very close to a public garden, and it is part of both the Italian (A.I.A.) and the European Aeroallergen monitoring networks. Meteorological data were recorded during the same period by an automatic weather station.
The following parameters were calculated for each pollen type: start, end and duration of pollen season, date of peak pollen concentration, number of days from the beginning of the season to the peak, annual pollen index (API), percentage distribution of API and maximum daily concentration.
The correlation between meteorological variables and the different characteristics of pollen seasons was analyzed using Spearman’s correlation tests.
A linear regression model was used for the trend analysis of the API of airborne pollen of the two families from 1986 to 2008.
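Two of the quantities named in this abstract lend themselves to a short worked example: the annual pollen index and its linear trend over the years. The sketch assumes that the API is the sum of daily concentrations over one year and that the trend is an ordinary least-squares fit of API against year; the abstract does not spell out either definition, and the function names are illustrative.

```python
def annual_pollen_index(daily: list) -> float:
    """Sum of daily concentrations for one year (assumed API definition)."""
    return sum(daily)


def peak_day(daily: list) -> int:
    """1-based position of the highest daily concentration in the series."""
    return max(range(len(daily)), key=daily.__getitem__) + 1


def ols_trend(years: list, values: list) -> tuple:
    """Ordinary least-squares fit; returns (slope per year, intercept)."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
```

A positive slope would indicate an API that rises over the study period; whether such a trend is statistically meaningful needs a significance test that this sketch does not include.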


Author(s):  
Arthur Chapman ◽  
Lee Belbin ◽  
Paula Zermoglio ◽  
John Wieczorek ◽  
Paul Morris ◽  
...  

The quality of biodiversity data publicly accessible via aggregators such as GBIF (Global Biodiversity Information Facility), the ALA (Atlas of Living Australia), iDigBio (Integrated Digitized Biocollections), and OBIS (Ocean Biogeographic Information System) is often questioned, especially by the research community. The Data Quality Interest Group, established by Biodiversity Information Standards (TDWG) and GBIF, has been engaged in four main activities: developing a framework for the assessment and management of data quality using a fitness for use approach; defining a core set of standardised tests and associated assertions based on Darwin Core terms; gathering and classifying user stories to form contextual-themed use cases, such as species distribution modelling, agrobiodiversity, and invasive species; and developing a standardised format for building and managing controlled vocabularies of values. Using the developed framework, data quality profiles have been built from use cases to represent user needs. Quality assertions can then be used to filter data suitable for a purpose. The assertions can also be used to provide feedback to data providers and custodians to assist in improving data quality at the source. In a case study, two different implementations of tests and assertions based around the Darwin Core "Event Date" terms were also tested against GBIF data, to demonstrate that the tests are implementation agnostic, can be run on large aggregated datasets, and can make biodiversity data more fit for typical research uses.
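To make the idea of a test that yields an assertion more concrete, here is a minimal sketch of one check on the Darwin Core event date terms. It is not one of the Interest Group's standardised tests: the function name, the three result labels, and the rule that `eventDate` must be a single ISO 8601 calendar date consistent with any supplied `year`, `month` and `day` are all assumptions made for illustration.

```python
from datetime import date


def check_event_date(record: dict) -> str:
    """Return an illustrative assertion label for one occurrence record."""
    raw = (record.get("eventDate") or "").strip()
    if not raw:
        return "PREREQUISITES_NOT_MET"  # nothing to test
    try:
        parsed = date.fromisoformat(raw)  # rejects e.g. 2020-02-30
    except ValueError:
        return "NOT_COMPLIANT"
    parts = (("year", parsed.year), ("month", parsed.month), ("day", parsed.day))
    for term, actual in parts:
        stated = record.get(term)
        if stated in (None, ""):
            continue  # an absent atomic term cannot contradict eventDate
        try:
            if int(stated) != actual:
                return "NOT_COMPLIANT"
        except (TypeError, ValueError):
            return "NOT_COMPLIANT"  # non-numeric year, month or day
    return "COMPLIANT"
```

Records could then be filtered for a given use by keeping only those labelled `COMPLIANT`, while the other labels can be reported back to the data provider.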


Author(s):  
Edward Gilbert ◽  
Corinna Gries ◽  
Nico Franz ◽  
Leslie R. Landrum ◽  
Thomas H. Nash III

The SEINet Portal Network has a complex social and development history spanning nearly two decades. Initially established as a basic online search engine for a select handful of biological collections curated within the southwestern United States, SEINet has since matured into a biodiversity data network incorporating more than 330 institutions and 1,900 individual data contributors. Participating institutions manage and publish over 14 million specimen records, 215,000 observations, and 8 million images. Approximately 70% of the collections make use of the data portal as their primary "live" specimen management platform. The SEINet interface now supports 13 regional data portals distributed across the United States and northern Mexico (http://symbiota.org/docs/seinet/). Through many collaborative efforts, it has matured into a tool for biodiversity data exploration, which includes species inventories, interactive identification keys, specimen and field images, taxonomic information, species distribution maps, and taxonomic descriptions. SEINet’s initial developmental goals were to construct a read-only interface that integrated specimen records harvested from a handful of distributed natural history databases. Intermittent network connectivity and inconsistent data exchange protocols frequently restricted data persistence. National funding opportunities supported a complete redesign towards the development of a centralized data cache model with periodic "snapshot" updates from original data sources. A service-based management infrastructure was integrated into the interface to mobilize small- to medium-sized collections (<1 million specimen records) that commonly lack consistent infrastructure and technical expertise to maintain a standard compliant specimen database. These developments were the precursors to the Symbiota software project (Gries et al. 2014).
Through further development of Symbiota, SEINet transformed into a robust specimen management system specifically geared toward specimen digitization with features including data entry from label images, harvesting data from specimen duplicates, batch georeferencing, data validation and cleaning, generating progress reports, and additional tools to improve the efficiency of the digitization process. The central developmental paradigm focused on data mobilization through the production of: a versatile import module capable of ingesting a diverse range of data structures, a robust toolkit to assist in digitizing and managing specimen data and images, and a Darwin Core Archive (DwC-A) compliant data publishing and export toolkit to facilitate data distribution to global aggregators such as Global Biodiversity Information Facility (GBIF) and iDigBio. User interfaces consist of a decentralized network of regional data portals, all connecting to a centralized shared data source. Each of the 13 data portals is configured to present a regional perspective specifically tailored to represent the needs of the local research community. This infrastructure has supported the formation of regional consortia, which provide network support to aid local institutions in digitizing and publishing their collections within the network. The community-based infrastructure creates a sense of ownership – perhaps even good-natured competition – by the data providers and provides extra incentive to improve data quality and expand the network. Certain areas of development remain challenging in spite of the project's overall success.
For instance, data managers continuously struggle to maintain a current local taxonomic thesaurus used for name validation, data cleaning, and to resolve taxonomic discrepancies commonly encountered when integrating collection datasets. We will discuss the successes and challenges associated with the long-term sustainability model and explore potential future paths for SEINet that support the long-term goal of maintaining a data provider that is in full compliance with the FAIR use principles of making the datasets findable, accessible, interoperable, and reusable (Wilkinson et al. 2016).

