Going Molecular: Sequence-based spatiotemporal biodiversity evidence in GBIF

Author(s):  
Dmitry Schigel ◽  
Thomas Jeppesen ◽  
Robert Finn ◽  
Guy Cochrane ◽  
Urmas Kõljalg ◽  
...  

The Global Biodiversity Information Facility (GBIF) was established by governments in 2001, largely through the initiative and leadership of the natural history collections community, following the 1999 recommendation by a working group under the Megascience Forum (predecessor of the Global Science Forum) of the Organization for Economic Cooperation and Development (OECD). Over 20 years, GBIF has helped develop standards and convened a global community of data-publishing institutions, aggregating over one billion specimen occurrence records freely and openly available for use in research and policy making. These GBIF-mediated data range from vouchered museum specimens to observation records generated by humans and machines. New data are being generated from integrated remote sensing, ecological sampling, and molecular sequencing that have strong geospatial components but lack traditional vouchers. GBIF is working with partners to develop best practices for bringing these data into the GBIF architecture. Following discussions during the second Global Biodiversity Information Conference in 2018, GBIF and the European Bioinformatics Institute (EMBL-EBI), supported by ELIXIR, have extended collaboration to share species occurrence records known only from their genetic material. When data providers contribute coordinates along with the sequences to the European Nucleotide Archive (ENA), the records will appear on GBIF maps and in spatial searches. This collaboration enables significant new molecular data streams to become discoverable through GBIF.org: by mid-March 2019, over 7.8 million individual occurrence records were available via the ENA, and over 13.2 million records as standardized Darwin Core sampling-event datasets via MGnify, a resource that provides taxonomic and functional annotations on sequences derived from environmental sequencing projects. 
Sequence-based occurrence records published by ENA and MGnify boost representation of microbial diversity, which was underrepresented at GBIF. The ELIXIR-ENA-MGnify-GBIF partnership is working on further refinement of the dynamic data linkages, frequency of updates and other improvements. The API-based tool that connects GBIF data infrastructures is open to new data contributors and for indexes of molecular occurrences. Indexing of these data streams is dependent on the presence of a name (any rank) with the sequence. Under the current Codes of nomenclature, animals, fungi, plants, and algae cannot be described based exclusively on sequence data. Yet, a significant volume of biodiversity data has only been represented by DNA sequences. Barcoding and sequence clustering procedures vary among taxa and research communities, but clusters can be related to a taxon with a Latin name. Many DNA similarity clusters do not contain a sequence from a formally described taxon; however, these sequence clusters provide provisional molecular names for nomenclatural communication. In the best cases, curated libraries of reference sequences, their metadata, clusters, alignments, and links to individuals and physical material become de facto naming conventions for certain taxonomic groups, and co-exist with Latin names. Integration of molecular names into the taxonomic backbone of GBIF started with Fungi and UNITE, a data management and identification environment for fungal ITS barcodes with 87,000+ fungal species hypotheses demarcating 800,000+ sequence specimens as of March 2019. Checklist publication of all names in UNITE through GBIF.org, including Linnaean names and stable, DOI-trackable molecular sequence-based ‘species hypotheses’, enables indexing of fungal metabarcoding data worldwide, such as BIOWIDE. 
As names are currently essential to indexing the world’s occurrence data, GBIF will develop similar linkages with names in the Barcode of Life data system (BOLD) and in SILVA - a resource for high-quality ribosomal RNA sequence data and taxonomy, and welcomes other reference systems to this development. Expanding the molecular data streams (Fig. 1) allows GBIF to address spatial, temporal and taxonomic gaps and biases, and to support large-scale data-intensive research openly and worldwide.
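The two conditions described in this abstract (a name of any rank is needed for indexing, and coordinates are needed for a record to appear on maps and in spatial searches) can be illustrated with a minimal sketch. The field names follow the Darwin Core terms scientificName, decimalLatitude and decimalLongitude; the function name, the dictionary layout and the example values are hypothetical and are not taken from the GBIF, ENA or MGnify pipelines.

```python
from typing import Optional


def _to_float(value) -> Optional[float]:
    """Return value as a float, or None if it is missing or not numeric."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def classify_record(record: dict) -> str:
    """Classify a sequence-derived record by the information it carries.

    Assumption for illustration: a record needs a name (any rank) to be
    indexed, and valid coordinates to be usable in spatial searches.
    """
    name = (record.get("scientificName") or "").strip()
    if not name:
        return "not indexable"

    lat = _to_float(record.get("decimalLatitude"))
    lon = _to_float(record.get("decimalLongitude"))
    has_coords = (
        lat is not None
        and lon is not None
        and -90.0 <= lat <= 90.0
        and -180.0 <= lon <= 180.0
    )
    return "indexable and mappable" if has_coords else "indexable, not mappable"


# Hypothetical records, for illustration only.
examples = [
    {"scientificName": "Fungi", "decimalLatitude": "56.1", "decimalLongitude": "10.2"},
    {"scientificName": "Fungi"},
    {"decimalLatitude": "56.1", "decimalLongitude": "10.2"},
]
for rec in examples:
    print(classify_record(rec))
```

A name at kingdom rank is enough to pass the first check here, which mirrors the "any rank" wording above; a real pipeline would presumably also match the name against a taxonomic backbone.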

Phytotaxa ◽  
2014 ◽  
Vol 176 (1) ◽  
pp. 219 ◽  
Author(s):  
ASHA J. DISSANAYAKE ◽  
RUVISHIKA S. JAYAWARDENA ◽  
SARANYAPHAT BOONMEE ◽  
KASUN M. THAMBUGALA ◽  
QING TIAN ◽  
...  

The family Myriangiaceae is relatively poorly known amongst the Dothideomycetes and includes genera which are saprobic, epiphytic and parasitic on the bark, leaves and branches of various plants. The family has not undergone any recent revision; however, molecular data has shown it to be a well-resolved family closely linked to Elsinoaceae in Myriangiales. Both morphological and molecular characters indicate that Elsinoaceae differs from Myriangiaceae. In Elsinoaceae, small numbers of asci form in locules in light coloured pseudostromata, which form typical scab-like blemishes on leaf or fruit surfaces. The coelomycetous, “Sphaceloma”-like asexual state of Elsinoaceae forms more frequently than the sexual state; conidiogenesis is phialidic and conidia are 1-celled and hyaline. In Myriangiaceae, locules with single asci are scattered in superficial, coriaceous to sub-carbonaceous, black ascostromata and do not form scab-like blemishes. No asexual state is known. In this study, we revisit the family Myriangiaceae, and accept ten genera, providing descriptions and discussion on the generic types of Anhellia, Ascostratum, Butleria, Dictyocyclus, Diplotheca, Eurytheca, Hemimyriangium, Micularia, Myriangium and Zukaliopsis. The genera of Myriangiaceae are compared and contrasted. Myriangium duriaei is the type species of the family, while Diplotheca is similar and may possibly be congeneric. The placement of Anhellia in Myriangiaceae is supported by morphological and molecular data. Because of similarities with Myriangium, Ascostratum (A. insigne), Butleria (B. inaghatahani), Dictyocyclus (D. hydrangea), Eurytheca (E. trinitensis), Hemimyriangium (H. betulae), Micularia (M. merremiae) and Zukaliopsis (Z. amazonica) are placed in Myriangiaceae. Molecular sequence data from fresh collections is required to confirm the relationships and placement of the genera in this family.


2013 ◽  
Vol 58 (4) ◽  
Author(s):  
Kurt Galbreath ◽  
Kristina Ragaliauskaite ◽  
Leonas Kontrimavichus ◽  
Arseny Makarikov ◽  
Eric Hoberg

Abstract: Hymenolepidid cestodes in Myodes glareolus from Lithuania and additional specimens originally attributed to Arostrilepis horrida from the Republic of Belarus are now referred to A. tenuicirrosa. Our study includes the first records of A. tenuicirrosa from the European (western) region of the Palearctic, and contributes to the recognition of A. horrida (sensu lato) as a complex of cryptic species distributed broadly across the Holarctic. Specimens of A. tenuicirrosa from Lithuania were compared to cestodes representing apparently disjunct populations in the eastern Palearctic based on structural characters of adult parasites and molecular sequence data from nuclear (ITS2) and mitochondrial (cytochrome b) genes. Morphological and molecular data revealed low levels of divergence between eastern and western populations. Phylogeographic relationships among populations and host biogeographic history suggest that limited intraspecific diversity within A. tenuicirrosa may reflect a Late Pleistocene transcontinental range expansion from an East Asian point of origin.


Parasite ◽  
2021 ◽  
Vol 28 ◽  
pp. 59
Author(s):  
Camila Pantoja ◽  
Anna Faltýnková ◽  
Katie O’Dwyer ◽  
Damien Jouet ◽  
Karl Skírnisson ◽  
...  

The biodiversity of freshwater ecosystems globally still leaves much to be discovered, not least in the trematode parasite fauna they support. Echinostome trematode parasites have complex, multiple-host life-cycles, often involving migratory bird definitive hosts, thus leading to widespread distributions. Here, we examined the echinostome diversity in freshwater ecosystems at high latitude locations in Iceland, Finland, Ireland and Alaska (USA). We report 14 echinostome species identified morphologically and molecularly from analyses of nad1 and 28S rDNA sequence data. We found echinostomes parasitising snails of 11 species from the families Lymnaeidae, Planorbidae, Physidae and Valvatidae. The number of echinostome species in different hosts did not vary greatly and ranged from one to three species. Of these 14 trematode species, we discovered four species (Echinoparyphium sp. 1, Echinoparyphium sp. 2, Neopetasiger sp. 5, and Echinostomatidae gen. sp.) as novel in Europe; we provide descriptions for the newly recorded species and those not previously associated with DNA sequences. Two species from Iceland (Neopetasiger islandicus and Echinoparyphium sp. 2) were recorded in both Iceland and North America. All species found in Ireland are new records for this country. Through the integrative taxonomic approach taken, both morphological and molecular data are provided for comparison with future studies to elucidate many of the unknown parasite life cycles and transmission routes. Our reports of species distributions spanning Europe and North America highlight the need for parasite biodiversity assessments across large geographical areas.


Author(s):  
T.S. Kemp

The vast majority of living and fossil mammals are placentals. Today there are about 4,400 species, which are traditionally organised into 18 Orders, with an extra one if the Pinnipedia are separated from the Carnivora, and a twentieth if the recently extinct Malagasy order Bibymalagasia is recognised as such. There have been many attempts to discover supraordinal groupings from amongst these Orders based on morphological characters, though few proposals have been universally accepted. It is only with the advent of increasingly large sets of molecular sequence data in the last few years that a reasonably robust resolution looks imminent, although these contemporary analyses are remarkably and controversially at odds with the traditional ones. Novacek et al. (1988) summarised the then-current situation regarding supraordinal classification of placentals, a time at which morphology was still dominant but molecular data was at the threshold of significance. They accepted a basal group Edentata that combined the Xenarthra of the New World with the Pholidota of the Old, based on a few cranial characters, loss of the anterior teeth, and reduction of the enamel of the remaining ones. This left the rest of the living placentals as a monophyletic group Epitheria, sharing such apparently minor characters as the shape of the stapes bone in the ear. They found very little resolution within the Epitheria, and concluded that there was a polychotomy of no fewer than nine lineages arranged as a ‘star’ phylogeny. No remnant of the previously recognised taxon Ferungulata, created by Simpson (1945) for the Carnivora plus the ungulate orders Artiodactyla, Perissodactyla, Proboscidea, Hyracoidea, Sirenia, and Tubulidentata, remained. On the other hand, three supraordinal taxa of earlier authors did survive. 
One was Gregory’s (1910) Archonta, consisting of generally conservative forms and by now composed of the Primates, Dermoptera, Scandentia, and Chiroptera, but excluding the Lipotyphla. The second was Glires, originating with Linnaeus (1758) and widely accepted ever since, for the Rodentia and Lagomorpha; Novacek et al. (1988) tentatively placed the Macroscelidea as the sister-group of the Glires. The third supraordinal taxon recognised was, like Glires, well-established if not universally accepted.


2020 ◽  
Author(s):  
Yang Young Lu ◽  
Jiaxing Bai ◽  
Yiwen Wang ◽  
Ying Wang ◽  
Fengzhu Sun

Abstract. Motivation: Rapid developments in sequencing technologies have boosted the generation of high volumes of sequence data. To archive and analyze those data, one primary step is sequence comparison. Alignment-free sequence comparison based on k-mer frequencies offers a computationally efficient solution, yet in practice, the k-mer frequency vectors for large k of practical interest lead to excessive memory and storage consumption. Results: We report CRAFT, a general genomic/metagenomic search engine to learn compact representations of sequences and perform fast comparison between DNA sequences. Specifically, given genome or high throughput sequencing (HTS) data as input, CRAFT maps the data into a much smaller embedding space and locates the best matching genome in the archived massive sequence repositories. With 10^2–10^4-fold reduction of storage space, CRAFT performs fast query for gigabytes of data within seconds or minutes, achieving performance comparable to six state-of-the-art alignment-free measures. Availability: CRAFT offers a user-friendly graphical user interface with one-click installation on Windows and Linux operating systems, freely available at https://github.com/jiaxingbai/[email protected]; [email protected]. Supplementary information: Supplementary data are available at Bioinformatics online.
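To make the memory problem concrete, here is a minimal Python sketch of plain alignment-free comparison with k-mer frequency vectors. It is not CRAFT's learned embedding, only the baseline representation the abstract says becomes too large for big k. The function names and the choice of cosine distance are assumptions made for illustration.

```python
from collections import Counter
from math import sqrt

DNA = set("ACGT")


def kmer_frequencies(seq: str, k: int) -> dict:
    """Relative frequencies of the k-mers in seq, skipping k-mers with non-ACGT symbols."""
    seq = seq.upper()
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= DNA
    )
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}


def cosine_distance(p: dict, q: dict) -> float:
    """Cosine distance between two sparse k-mer frequency vectors (0 = same direction)."""
    dot = sum(v * q.get(kmer, 0.0) for kmer, v in p.items())
    norm_p = sqrt(sum(v * v for v in p.values()))
    norm_q = sqrt(sum(v * v for v in q.values()))
    if norm_p == 0.0 or norm_q == 0.0:
        return 1.0
    return 1.0 - dot / (norm_p * norm_q)


def dense_vector_length(k: int) -> int:
    """Number of entries in a dense k-mer frequency vector over the DNA alphabet."""
    return 4 ** k


if __name__ == "__main__":
    a = kmer_frequencies("ACGTACGTACGTTTGA", k=3)
    b = kmer_frequencies("ACGTACGTACGAATGA", k=3)
    print(round(cosine_distance(a, b), 3))
    # The dense vector grows exponentially with k.
    for k in (4, 8, 12):
        print(k, dense_vector_length(k))
```

A dense vector has 4^k entries, so k = 12 already means about 16.8 million values per sequence, which is the storage cost that a compact embedding is meant to avoid.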


2021 ◽  
Author(s):  
Manuela Mejía Estrada ◽  
Luz Fernanda Jiménez-Segura ◽  
Iván Soto Calderón

DNA barcoding was proposed in response to the mismatch between the small number of taxonomists and the large number of species; the method requires the construction of reference collections of DNA sequences that represent existing biodiversity. Freshwater fishes are key indicators for understanding biogeography around the world. Colombia, with 1610 species of freshwater fishes, is the second richest country in the world in this group. However, genetic information on these species remains limited. The contribution to a reference library of DNA barcodes for Colombian freshwater fishes highlights the importance of biological collections and seeks to strengthen inventories and taxonomy of such collections in future studies. This dataset contributes to the knowledge of the DNA barcodes and occurrence records of 96 species of freshwater fishes from Colombia. The species represented in this dataset correspond to an addition to BOLD public databases of 39 species. Forty-nine specimens were collected in the Atrato basin and 708 in the Magdalena-Cauca basin during the period from 2010 to 2020. Two species (Loricariichthys brunneus and Poecilia sphenops) are considered exotic to the Atrato, Cauca and Magdalena basins, and four species (Oncorhynchus mykiss, Oreochromis niloticus, Parachromis friedrichsthalii and Xiphophorus helleri) are exotic to Colombian hydrogeographic regions. All specimens are deposited in the CIUA collection at the University of Antioquia and have their DNA barcodes made publicly available in the Barcode of Life Data System (BOLD) online database; the distribution dataset can be freely accessed through the Global Biodiversity Information Facility (GBIF).


ZooKeys ◽  
2018 ◽  
Vol 751 ◽  
pp. 129-146 ◽  
Author(s):  
Robert Mesibov

A total of ca 800,000 occurrence records from the Australian Museum (AM), Museums Victoria (MV) and the New Zealand Arthropod Collection (NZAC) were audited for changes in selected Darwin Core fields after processing by the Atlas of Living Australia (ALA; for AM and MV records) and the Global Biodiversity Information Facility (GBIF; for AM, MV and NZAC records). Formal taxon names in the genus- and species-groups were changed in 13–21% of AM and MV records, depending on dataset and aggregator. There was little agreement between the two aggregators on processed names, with names changed in two to three times as many records by one aggregator alone compared to records with names changed by both aggregators. The type status of specimen records did not change with name changes, resulting in confusion as to the name with which a type was associated. Data losses of up to 100% were found after processing in some fields, apparently due to programming errors. The taxonomic usefulness of occurrence records could be improved if aggregators included both original and the processed taxonomic data items for each record. It is recommended that end-users check original and processed records for data loss and name replacements after processing by aggregators.
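The check recommended to end-users can be sketched in a few lines of Python. This is an illustrative sketch only, not the audit procedure used in the study: it assumes that original and processed records are available as dictionaries keyed by a shared record identifier, and that fields carry Darwin Core names such as scientificName and typeStatus. The function name and the sample records are hypothetical.

```python
def audit_fields(original: dict, processed: dict, fields: list) -> dict:
    """Count, per field, how many records changed value or lost their value.

    original and processed map a record identifier to a dict of field values.
    Records missing from processed are skipped. A value counts as "lost" when
    it was non-empty before processing and empty afterwards; any other
    difference (including a value added by the aggregator) counts as "changed".
    """
    report = {field: {"changed": 0, "lost": 0} for field in fields}
    for record_id, before_record in original.items():
        after_record = processed.get(record_id)
        if after_record is None:
            continue
        for field in fields:
            before = (before_record.get(field) or "").strip()
            after = (after_record.get(field) or "").strip()
            if before and not after:
                report[field]["lost"] += 1
            elif before != after:
                report[field]["changed"] += 1
    return report


# Hypothetical records, for illustration only.
original = {
    "rec-1": {"scientificName": "Aus bus", "typeStatus": "holotype"},
    "rec-2": {"scientificName": "Cus dus", "typeStatus": ""},
}
processed = {
    "rec-1": {"scientificName": "Aus xus", "typeStatus": "holotype"},
    "rec-2": {"scientificName": "Cus dus", "typeStatus": ""},
}
print(audit_fields(original, processed, ["scientificName", "typeStatus"]))
```

In the sample data the name of rec-1 changes while its typeStatus does not, which is the situation the abstract describes as leaving the type associated with an unclear name.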


Author(s):  
Beckett Sterner ◽  
Nathan Upham ◽  
Prashant Gupta ◽  
Caleb Powell ◽  
Nico Franz

Making the most of biodiversity data requires linking observations of biological species from multiple sources both efficiently and accurately (Bisby 2000, Franz et al. 2016). Aggregating occurrence records using taxonomic names and synonyms is computationally efficient but known to experience significant limitations on accuracy when the assumption of one-to-one relationships between names and biological entities breaks down (Remsen 2016, Franz and Sterner 2018). Taxonomic treatments and checklists provide authoritative information about the correct usage of names for species, including operational representations of the meanings of those names in the form of range maps, reference genetic sequences, or diagnostic traits. They increasingly provide taxonomic intelligence in the form of precise description of the semantic relationships between different published names in the literature. Making this authoritative information Findable, Accessible, Interoperable, and Reusable (FAIR; Wilkinson et al. 2016) would be a transformative advance for biodiversity data sharing and help drive adoption and novel extensions of existing standards such as the Taxonomic Concept Schema and the OpenBiodiv Ontology (Kennedy et al. 2006, Senderov et al. 2018). We call for the greater, global Biodiversity Information Standards (TDWG) and taxonomy community to commit to extending and expanding on how FAIR applies to biodiversity data and include practical targets and criteria for the publication and digitization of taxonomic concept representations and alignments in taxonomic treatments, checklists, and backbones. As a motivating case, consider the abundantly sampled North American deer mouse—Peromyscus maniculatus (Wagner 1845)—which was recently split from one continental species into five more narrowly defined forms, so that the name P. maniculatus is now only applied east of the Mississippi River (Bradley et al. 2019, Greenbaum et al. 2019). 
That single change instantly rendered ambiguous ~7% of North American mammal records in the Global Biodiversity Information Facility (n=242,663, downloaded 2021-06-04; GBIF.org 2021) and ⅓ of all National Ecological Observatory Network (NEON) small mammal samples (n=10,256, downloaded 2021-06-27). While this type of ambiguity is common in name-based databases when species are split, the example of P. maniculatus is particularly striking for its impact upon biological questions ranging from hantavirus surveillance in North America to studies of climate change impacts upon rodent life-history traits. Of special relevance to NEON sampling is recent evidence suggesting deer mice potentially transmit SARS-CoV-2 (Griffin et al. 2021). Automating the updating of occurrence records in such cases and others will require operational representations of taxonomic concepts (e.g., range maps, reference sequences, and diagnostic traits) that are FAIR in addition to taxonomic concept alignment information (Franz and Peet 2009). Despite steady progress, it remains difficult to find, access, and reuse authoritative information about how to apply taxonomic names even when it is already digitized. It can also be difficult to tell without manual inspection whether similar types of concept representations derived from multiple sources, such as range maps or reference sequences selected from different research articles or checklists, are in fact interoperable for a particular application. The issue is therefore different from important ongoing efforts to digitize trait information in species circumscriptions, for example, and focuses on how already digitized knowledge can best be packaged to inform human experts and artificial intelligence applications (Sterner and Franz 2017). 
We therefore propose developing community guidelines and criteria for FAIR taxonomic concept representations as "semantic artefacts" of general relevance to linked open data and life sciences research (Le Franc et al. 2020).


IMA Fungus ◽  
2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Kai Wang ◽  
Timo Sipilä ◽  
Kirk Overmyer

Abstract: Protomyces is an understudied genus of yeast-like fungi currently defined as phytopathogens of only Umbelliferae and Compositae. Species relationships and boundaries remain controversial and molecular data are lacking. Of the 82 named Protomyces, we found few recent studies and six available cultures. We previously isolated Protomyces strains from wild Arabidopsis thaliana, a member of Brassicaceae, a family distant from accepted Protomyces hosts. We previously sequenced the genomes of all available Protomyces species, and P. arabidopsidicola sp. nov. strain C29, from Arabidopsis. Phylogenomics suggests this new species occupied a unique position in the genus. Genomic, morphological, and physiological characteristics distinguished P. arabidopsidicola sp. nov. from other Protomyces. Nuclear gene phylogenetic marker analysis suggests actin1 gene DNA sequences could be used with nuclear ribosomal DNA internal transcribed spacer sequences for rapid identification of Protomyces species. Previous studies demonstrated P. arabidopsidicola sp. nov. could persist on the Arabidopsis phyllosphere and Protomyces sequences were discovered on Arabidopsis at multiple sites in different countries. We conclude that the strain C29 represents a novel Protomyces species and propose the name of P. arabidopsidicola sp. nov. Consequently, we propose that Protomyces is not strictly associated only with the previously recognized host plants.


2019 ◽  
Vol 42 (1) ◽  
pp. 75-100 ◽  
Author(s):  
C.G. Boluda ◽  
V.J. Rico ◽  
P.K. Divakar ◽  
O. Nadyeina ◽  
L. Myllys ◽  
...  

In many lichen-forming fungi, molecular phylogenetic analyses lead to the discovery of cryptic species within traditional morphospecies. However, in some cases, molecular sequence data also question the separation of phenotypically characterised species. Here we apply an integrative taxonomy approach – including morphological, chemical, molecular, and distributional characters – to re-assess species boundaries in a traditionally speciose group of hair lichens, Bryoria sect. Implexae. We sampled multilocus sequence and microsatellite data from 142 specimens from a broad intercontinental distribution. Molecular data included DNA sequences of the standard fungal markers ITS, IGS, GAPDH, two newly tested loci (FRBi15 and FRBi16), and SSR frequencies from 18 microsatellite markers. Datasets were analysed with Bayesian and maximum likelihood phylogenetic reconstruction, phenogram reconstruction, STRUCTURE Bayesian clustering, principal coordinate analysis, haplotype network, and several different species delimitation analyses (ABGD, PTP, GMYC, and DISSECT). Additionally, past population demography and divergence times are estimated. The different approaches to species recognition do not support the monophyly of the 11 currently accepted morphospecies, and rather suggest the reduction of these to four phylogenetic species. Moreover, three of these are relatively recent in origin and cryptic, including phenotypically and chemically variable specimens. Issues regarding the integration of an evolutionary perspective into taxonomic conclusions in species complexes that have undergone recent diversification are discussed. The four accepted species, all epitypified by sequenced material, are Bryoria fuscescens, B. glabra, B. kockiana, and B. pseudofuscescens. Ten species-rank names are reduced to synonymy. In the absence of molecular data, they can be recorded as the B. fuscescens complex. 
Intraspecific phenotype plasticity and factors affecting the speciation of different morphospecies in this group of Bryoria are outlined.

