Biospytial: spatial graph-based computing for ecological Big Data

GigaScience ◽  
2020 ◽  
Vol 9 (5) ◽  
Author(s):  
Juan M Escamilla Molgora ◽  
Luigi Sedda ◽  
Peter M Atkinson

Background: The exponential accumulation of environmental and ecological data, together with the adoption of open data initiatives, brings opportunities and challenges for integrating and synthesising relevant knowledge, and these need to be addressed given the ongoing environmental crises.
Findings: Here we present Biospytial, a modular open source knowledge engine designed to import, organise, analyse and visualise big spatial ecological datasets using the power of graph theory. The engine uses a hybrid graph-relational approach to store and access information. A graph data structure uses linkage relationships to build semantic structures, which are stored as complex data structures in a graph database, while tabular and geospatial data are stored in an efficient spatial relational database system. We provide an application using information on species occurrences, their taxonomic classification and climatic datasets. We built a knowledge graph of the Tree of Life embedded in an environmental and geographical grid to analyse threatened species co-occurring with jaguars (Panthera onca).
Conclusions: The Biospytial approach reduces the complexity of joining datasets through multiple tabular relations, while its scalable design eases the problem of merging datasets from different sources. Its modular design makes it possible to run several instances simultaneously, allowing fast and efficient handling of big ecological datasets. The example provided demonstrates the engine’s capabilities for basic graph manipulation, analysis and visualisation of taxonomic groups co-occurring in space. It also points to potential avenues for novel ecological analyses, biodiversity syntheses and species distribution models aided by a network of taxonomic and spatial relationships.
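The central idea, joining occurrence records to grid cells and then walking up the taxonomic tree to summarise what co-occurs with a focal species, can be shown in miniature. This is a minimal sketch under stated assumptions, not Biospytial's API: the taxonomy, grid-cell ids and occurrence records are hypothetical toy data, and plain Python dictionaries stand in for the graph database and spatial relational database that the engine uses.

```python
from collections import Counter

# Hypothetical toy taxonomy: each taxon points to its parent (a tiny slice of a "Tree of Life").
PARENT = {
    "Panthera onca": "Felidae",
    "Puma concolor": "Felidae",
    "Tapirus bairdii": "Tapiridae",
    "Ara macao": "Psittacidae",
    "Felidae": "Carnivora",
    "Tapiridae": "Perissodactyla",
    "Psittacidae": "Psittaciformes",
}

# Hypothetical occurrence records: (grid cell id, species name).
OCCURRENCES = [
    ("c1", "Panthera onca"), ("c1", "Tapirus bairdii"), ("c1", "Ara macao"),
    ("c2", "Panthera onca"), ("c2", "Puma concolor"), ("c2", "Tapirus bairdii"),
    ("c3", "Ara macao"),
]


def build_cell_graph(occurrences):
    """Link each grid cell to the set of species recorded in it."""
    cells = {}
    for cell, species in occurrences:
        cells.setdefault(cell, set()).add(species)
    return cells


def ancestor(taxon, steps):
    """Follow `steps` parent links up the toy taxonomy."""
    for _ in range(steps):
        taxon = PARENT[taxon]
    return taxon


def co_occurring(focal, occurrences, steps=1):
    """Count, per ancestor taxon, how often its species share a cell with the focal species."""
    counts = Counter()
    for members in build_cell_graph(occurrences).values():
        if focal in members:
            for species in members - {focal}:
                counts[ancestor(species, steps)] += 1
    return counts


print(co_occurring("Panthera onca", OCCURRENCES, steps=1).most_common())
```

Raising `steps` aggregates the co-occurring taxa at coarser ranks (here from family to order), the kind of traversal the abstract argues a graph makes simpler than chained tabular joins.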

Author(s):  
Natalya Ivanova ◽  
Maxim Shashkov

Russia currently does not have a national biodiversity information system and is still not a GBIF (Global Biodiversity Information Facility) member. Nevertheless, GBIF is the largest source of biodiversity data for Russia. As of August 2020, >5M species occurrences were available through the GBIF portal, of which 54% were published by Russian organisations. There are 107 institutions from Russia that have become GBIF publishers, and 357 datasets have been published. An important trend in data mobilization in Russia is the considerable contribution of citizen science. The most popular platform is iNaturalist. This year, the related GBIF dataset (Ueda 2020) became the largest one for Russia (793,049 species occurrences as of 2020-08-11). The first observation for Russia was posted in 2011, but iNaturalist only started becoming popular in 2017. That year, 88 observers added >4500 observations that represented 1390 new species for Russia, 7- and 2-fold more, respectively, than in the previous 6 years. Russia now has nearly 12,000 observers, about 15,000 observed species and >1M research-grade observations. The ratio of observations of Tracheophyta, Chordata and Arthropoda in Russia differs from that at the global scale. In the global iNaturalist GBIF dataset, these groups have almost equal numbers of observations. In Russia, by contrast, vascular plants make up 2/3 of the observations. This is due to the "Flora of Russia" project, which attracted many professional botanists as both observers and experts. Thanks to their activity, Russia has a high proportion of research-grade observations in iNaturalist: 78% versus 60% globally. Another consequence of wide participation by professional researchers is the high rate of species accumulation. For some taxonomic groups, the conspicuous species have already been recorded. There are about 850 bird species in Russia, of which 398 were observed in 2018 and only 83 new species were added in 2019.
The number of new species recorded over time is currently decreasing, despite the increase in observers and overall user activity. Russian iNaturalist observers have shared many archive photos (taken in past years). In 2018, these made up nearly 1/4 of all observations and about 3/4 of the new species for the year, with similar trends in 2019. Archive photos are usually posted from December until April, but the 2020 pandemic lockdown spurred a new wave of archive photo mobilisation in April and May. There are many iNaturalist projects for protected areas in Russia: 27 for strict nature reserves and national parks, and about 300 for others. About 100,000 observations (7.5% of all Russian observations) from the umbrella project "Protected areas of Russia" represent >34% of the species diversity observed in Russia. For some regions, e.g., Novosibirsk, Nizhniy Novgorod and Vladimir Oblasts, almost all protected areas are covered by iNaturalist projects, which are often the only source of biodiversity data available for them. There are also other popular citizen science platforms developed by Russian researchers. The first is the Russian birdwatching network RU-BIRDS.RU. The related GBIF dataset (Ukolov et al. 2019) is the third largest dataset for Russia (>370,000 species occurrences). Another Russian citizen science system is wildlifemonitoring.ru, which includes thematic resources for different taxonomic groups of vertebrates. It is a crowd-sourced web-GIS maintained by the Siberian Environmental Center NGO in Novosibirsk. Notably, iNaturalist activities in Russia have developed more as a social network than as a way to attract volunteers to participate in scientific research: of the 746 publications citing the iNaturalist dataset, only 18 include co-authors from Russia.
iNaturalist data are used for the management of regional red lists (in the Republic of Bashkortostan, Novosibirsk Oblast and others) and as an additional information source for regional inventories. RU-BIRDS data were used in the European Russia Breeding Bird Atlas and in the new edition of the European Breeding Bird Atlas. In Russia, citizen science activities contribute significantly to filling gaps in the global biodiversity map. However, Russian iNaturalist observations available through GBIF are counted as published from the USA. This is not ideal, because the iNaturalist GBIF dataset is growing rapidly, and in the future it will represent more occurrences than all other datasets for Russia combined. In our opinion, iNaturalist data should be repatriated during publication through GBIF, as is done for the eBird dataset (Levatich and Ligocki 2020).
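The repatriation issue comes down to the difference between where an occurrence is located and which country published it. As a hedged sketch, assuming the `country`, `publishingCountry` and `limit` parameters of GBIF's public occurrence-search API behave as documented (with `limit=0` returning only a `count`), the share of Russian records published from the USA could be measured as below. This counts every US-published dataset, not only iNaturalist; restricting it to iNaturalist would need that dataset's key, which the text does not give.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Public GBIF occurrence-search endpoint.
GBIF_OCCURRENCE_SEARCH = "https://api.gbif.org/v1/occurrence/search"


def occurrence_count_url(country, publishing_country=None):
    """Build a search URL; limit=0 asks for the record count without any records."""
    params = {"country": country, "limit": 0}
    if publishing_country is not None:
        params["publishingCountry"] = publishing_country
    return f"{GBIF_OCCURRENCE_SEARCH}?{urlencode(params)}"


def fetch_count(url):
    """Read the 'count' field of the JSON response (requires network access)."""
    with urlopen(url) as response:
        return json.load(response)["count"]


def foreign_published_share(country, publishing_country):
    """Fraction of a country's occurrences that were published from another country."""
    total = fetch_count(occurrence_count_url(country))
    foreign = fetch_count(occurrence_count_url(country, publishing_country))
    return foreign / total
```

Calling `foreign_published_share("RU", "US")` would return the current share; no figure is shown here because it changes as data are added.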


2021 ◽  
Vol 9 ◽  
Author(s):  
Renee A. Catullo ◽  
Rhiannon Schembri ◽  
Leonardo Gonçalves Tedeschi ◽  
Mark D. B. Eldridge ◽  
Leo Joseph ◽  
...  

Environmental catastrophes are increasing in frequency and severity under climate change, and they substantially impact biodiversity. Recovery actions after catastrophes depend on prior benchmarking of biodiversity, which in turn requires, at a minimum, a critical assessment of taxonomy and species-level diversity. Long-term recovery of species also requires an understanding of within-species diversity. Australia’s 2019–2020 bushfires were unprecedented in their extent and severity and affected large areas of habitat that is not adapted to fire. Assessments of the fires’ impacts on vertebrates identified 114 species that were a high priority for management. In response, we compiled explicit information on taxonomic diversity and genetic diversity within fire-impacted vertebrates for government agencies undertaking rapid conservation assessments. Here we discuss what we learned from our effort to benchmark pre-fire taxonomic and genetic diversity after the event. We identified a significant number of candidate species (genetic units that may be undescribed species), particularly in frogs and mammals. Reptiles and mammals also had high levels of intraspecific genetic structure relevant to conservation management. The first challenge was making published genetic data fit for purpose, because original publications often focussed on a different question and did not provide raw sequence read data. Gaining access to analytical files and compiling appropriate individual metadata was also time-consuming. For many species, researchers held significant unpublished data, and identifying which data existed was challenging. For both published and unpublished data, substantial sampling gaps prevented parts of a species’ distribution from being assigned to a conservation unit. Summarising sampling gaps across species revealed that many areas were poorly sampled across taxonomic groups.
To resolve these issues and prepare responses to future catastrophes, we recommend that researchers embrace open data principles, including providing detailed metadata. Governments need to invest in a skilled taxonomic workforce to document and describe biodiversity before an event and to assess its impacts afterward. Natural history collections should also expand their DNA collections to target sampling gaps, and revise their collection strategies to take more population-scale DNA samples in order to document within-species genetic diversity.
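The abstract does not describe how sampling gaps were summarised across species. One simple approach, sketched here under stated assumptions, overlays each species' distribution on a common grid and counts, per cell, the species expected there that have no genetic sample from it. The species labels, cell ids and the `min_fraction` threshold are hypothetical.

```python
from collections import Counter

# Hypothetical grid cells covered by each species' distribution.
RANGES = {
    "frog_a": {"g1", "g2", "g3"},
    "frog_b": {"g2", "g3", "g4"},
    "mammal_a": {"g1", "g3", "g4"},
    "reptile_a": {"g3", "g4"},
}

# Hypothetical grid cells with at least one genetic sample, per species.
SAMPLED = {
    "frog_a": {"g1"},
    "frog_b": {"g2", "g4"},
    "mammal_a": {"g1"},
    "reptile_a": set(),
}


def gap_counts(ranges, sampled):
    """Per cell, count species whose range covers it but which have no sample there."""
    counts = Counter()
    for species, cells in ranges.items():
        for cell in cells - sampled.get(species, set()):
            counts[cell] += 1
    return counts


def poorly_sampled(ranges, sampled, min_fraction):
    """Cells where at least `min_fraction` of the species occurring there lack samples."""
    present = Counter(cell for cells in ranges.values() for cell in cells)
    gaps = gap_counts(ranges, sampled)
    return sorted(cell for cell, n in present.items() if gaps[cell] / n >= min_fraction)


print(poorly_sampled(RANGES, SAMPLED, min_fraction=0.75))  # prints ['g3']
```

Cells flagged at a high `min_fraction` are gaps shared across taxonomic groups, the situation the abstract describes.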


The 2017 SIS Conference aims to highlight the crucial role of Statistics in Data Science. In this new domain of ‘meaning’ extracted from data, the growing amount of data produced and made available in databases has brought new challenges. These involve different fields: statistics, machine learning, information and computer science, optimization and pattern recognition. Together, these fields make a considerable contribution to the analysis of ‘Big data’, open data, and relational and complex data, both structured and unstructured. The aim is to collect contributions from the different domains of Statistics on high-dimensional data quality validation, sample extraction, dimensionality reduction, pattern selection, data modelling, hypothesis testing and confirming conclusions drawn from the data.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Andrea Lienhard ◽  
Günther Krisper

A challenge for taxonomists all over the world and across all taxonomic groups is recognizing and delimiting species, and cryptic species are even more challenging. However, accurate identification is fundamental for all biological studies, from ecology to conservation biology. We used a multidisciplinary approach including genetic, morphological and ecological data to assess whether an easily recognizable, widely distributed and euryoecious mite taxon represents one and the same species. According to phylogenetic analyses (based on mitochondrial and nuclear genes) and species delimitation analyses, five distinct putative species were detected, supported by high genetic distances. These genetic lineages correlate well with ecological data, and each species could be associated with its own (micro)habitat. Subsequently, slight morphological differences were found that provide additional evidence that five different species occur in Central and Southern Europe. The minute size and characteristic habitus of Caleremaeus monilipes led to a potentially higher species diversity being overlooked. This problem might concern several other “well-known” euryoecious microarthropods. Five new species of the genus Caleremaeus are described, namely Caleremaeus mentobellus sp. nov., C. lignophilus sp. nov., C. alpinus sp. nov., C. elevatus sp. nov., and C. hispanicus sp. nov. Additionally, a morphological evaluation of C. monilipes is presented.
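The abstract does not name the species delimitation methods used. As a generic illustration of how high genetic distances can separate putative species, the sketch below computes uncorrected p-distances between aligned sequences and groups specimens by single linkage under a distance threshold. This is a simplified distance-threshold approach, not the authors' pipeline; the sequences and the 0.15 threshold are hypothetical, and real delimitation with tree-based or coalescent methods is considerably more involved.

```python
from itertools import combinations


def p_distance(a, b):
    """Uncorrected p-distance: proportion of differing sites between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def cluster(seqs, threshold):
    """Single-linkage grouping: specimens closer than `threshold` end up in the same group."""
    parent = {name: name for name in seqs}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        if p_distance(seqs[a], seqs[b]) < threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in seqs:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical 10-site alignment for four specimens.
SEQS = {"A1": "ACGTACGTAC", "A2": "ACGTACGTAT", "B1": "TTGAACCTGC", "B2": "TTGAACCTGA"}
print(cluster(SEQS, threshold=0.15))  # prints [['A1', 'A2'], ['B1', 'B2']]
```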


2015 ◽  
Author(s):  
Elita Baldridge ◽  
David J. Harris ◽  
Xiao Xiao ◽  
Ethan P. White

A number of different models have been proposed as descriptions of the species-abundance distribution (SAD). Most evaluations of these models use only one or two models, focus on only a single ecosystem or taxonomic group, or fail to use appropriate statistical methods. We use likelihood and AIC to compare the fit of four of the most widely used models to data on over 16,000 communities from a diverse array of taxonomic groups and ecosystems. Across all datasets combined, the log-series, Poisson lognormal, and negative binomial all yield similar overall fits to the data. Therefore, when correcting for differences in the number of parameters, the log-series generally provides the best fit to data. Within individual datasets some other distributions performed nearly as well as the log-series even after correcting for the number of parameters. The Zipf distribution is generally a poor characterization of the SAD.
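The parameter correction comes from AIC = 2k - 2 ln L, where k is the number of fitted parameters and L the maximised likelihood. In their usual forms the log-series and Zipf have one parameter, while the Poisson lognormal and negative binomial have two, so similar likelihoods leave the log-series with a lower AIC. The sketch below fits only the two one-parameter models, using the standard library and golden-section search; it is an illustration, not the authors' code, and the abundance vector is hypothetical.

```python
import math


def loglik_logseries(p, abundances):
    """Log-likelihood of the log-series: P(n) = -p**n / (n * ln(1 - p)) for n >= 1."""
    log_c = math.log(-1.0 / math.log(1.0 - p))
    return sum(log_c + n * math.log(p) - math.log(n) for n in abundances)


def zeta(a, terms=1000):
    """Riemann zeta for a > 1: partial sum plus an Euler-Maclaurin tail estimate."""
    head = sum(k ** -a for k in range(1, terms))
    tail = terms ** (1 - a) / (a - 1) + 0.5 * terms ** -a + a * terms ** (-a - 1) / 12
    return head + tail


def loglik_zipf(a, abundances):
    """Log-likelihood of the Zipf distribution: P(n) = n**-a / zeta(a) for n >= 1."""
    log_z = math.log(zeta(a))
    return sum(-a * math.log(n) - log_z for n in abundances)


def maximise(f, lo, hi, tol=1e-7):
    """Golden-section search for the maximiser of a unimodal function on [lo, hi]."""
    g = (math.sqrt(5) - 1) / 2
    while hi - lo > tol:
        c, d = hi - g * (hi - lo), lo + g * (hi - lo)
        if f(c) > f(d):
            hi = d
        else:
            lo = c
    x = (lo + hi) / 2
    return x, f(x)


def aic(loglik, k):
    """Akaike information criterion for a model with k fitted parameters."""
    return 2 * k - 2 * loglik


def fit_models(abundances):
    """Fit both one-parameter models by maximum likelihood; return {model: (parameter, AIC)}."""
    p, ll_ls = maximise(lambda q: loglik_logseries(q, abundances), 1e-6, 1 - 1e-9)
    s, ll_zf = maximise(lambda a: loglik_zipf(a, abundances), 1.0001, 6.0)
    return {"log-series": (p, aic(ll_ls, 1)), "Zipf": (s, aic(ll_zf, 1))}


# Hypothetical abundances (individuals per species) for one community.
ABUNDANCES = [1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 5, 7, 9, 12, 20, 35]

for model, (param, score) in fit_models(ABUNDANCES).items():
    print(f"{model}: parameter = {param:.4f}, AIC = {score:.2f}")
```

The model with the lower AIC is preferred; extending the comparison to the two-parameter models would simply use k = 2 in `aic`.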


Ecology ◽  
2017 ◽  
Author(s):  
Friedrich Recknagel

The emerging discipline of ecological informatics takes into account the data-intensive nature of ecology, the valuable information content of ecological data, the growing capacity of computational technology to leverage complex data, and the critical need to inform the sustainable management of complex ecosystems. It encompasses novel concepts and techniques for image- and genome-based monitoring, data management, data analysis, synthesis and forecasting.


2021 ◽  
Author(s):  
Alice Fremand

Open data is not a new concept. In 1959, over sixty years ago, knowledge sharing was at the heart of the Antarctic Treaty, whose Article III 1(c) states: “scientific observations and results from Antarctica shall be exchanged and made freely available”. At around the same time, the World Data Centre (WDC) system was created to manage and distribute the data collected during the International Geophysical Year (1957-1958), which was led by the International Council for Science (ICSU), laying the foundations of today’s research data management practices.
What about now? The WDC system lives on through the World Data System (WDS). Open data has been endorsed by a majority of funders and stakeholders. Technology has evolved dramatically. And the profession of data manager/curator has emerged. Their professional expertise means that their role extends far beyond the long-term curation and publication of datasets.
Data managers are involved in all stages of the data life cycle, from data management planning and data accessioning to data publication and re-use. They implement open data policies, help write data management plans and advise on how to manage data during, and beyond the life of, a science project. Working with software developers as well as scientists, they are developing new strategies to publish data via data catalogues, via more sophisticated map-based viewer services or in machine-readable form via APIs. They often bring expertise from their own field to help scientists satisfy the Findable, Accessible, Interoperable and Re-usable (FAIR) principles. Recent years have seen the growth of a large community of experts, which is essential for sharing, discussing and setting new standards and procedures. Data are published to be re-used, and data managers are key to promoting high-quality datasets and participation in large data compilations.
To date, there is no magic formula for FAIR data. The Research Data Alliance is a valuable platform that allows data managers and researchers to work together and to develop and adopt infrastructure that promotes data sharing and data-driven research. However, the challenge of properly describing each dataset remains. Today, scientists expect more and more from their data publications and data requests: they want interactive maps and more complex data systems, and they want to query data, combine data from different sources and publish them rapidly. By developing new procedures and standards and exploring new technologies, data managers help lay the foundations for data science.


PeerJ ◽  
2016 ◽  
Vol 4 ◽  
pp. e2823 ◽  
Author(s):  
Elita Baldridge ◽  
David J. Harris ◽  
Xiao Xiao ◽  
Ethan P. White

A number of different models have been proposed as descriptions of the species-abundance distribution (SAD). Most evaluations of these models use only one or two models, focus on only a single ecosystem or taxonomic group, or fail to use appropriate statistical methods. We use likelihood and AIC to compare the fit of four of the most widely used models to data on over 16,000 communities from a diverse array of taxonomic groups and ecosystems. Across all datasets combined the log-series, Poisson lognormal, and negative binomial all yield similar overall fits to the data. Therefore, when correcting for differences in the number of parameters the log-series generally provides the best fit to data. Within individual datasets some other distributions performed nearly as well as the log-series even after correcting for the number of parameters. The Zipf distribution is generally a poor characterization of the SAD.


1999 ◽  
Vol 18 (1) ◽  
pp. 45-65 ◽  
Author(s):  
Esam O. Abdulsamad ◽  
Roberto Barbieri

In the coastal area of northeastern Cyrenaica (Libya), the excellent exposures of the Cenozoic limestone sequences of Al Jabal al Akhdar, which average around 1000 m in thickness, allow detailed stratigraphic investigations to be undertaken. This study of the biostratigraphy and depositional environments has been augmented by an analysis of the microfacies and of matrix-free foraminiferal assemblages. The biotic contents of the microfacies provide a good tool for correlation with the Letter classification developed from the Indo-Pacific region. The palaeoecological significance of the biota has been evaluated by comparison with the ecological requirements of their present-day counterparts. Limitations on the palaeoecological interpretations are mainly due to inadequate links with existing ecological datasets and to some local bias in fossil recovery caused by unfavourable lithologies. In the investigated Eocene to Miocene shallow marine carbonate succession, nine different microfacies and sub-microfacies were distinguished on the basis of depositional texture and biotic components. Wilson’s standard carbonate facies belts, integrated with present-day foraminiferal distribution models, have been used as a reference in the microfacies analysis and description. Most of the microfossils present are foraminifera; a total of 150 taxa, including larger, small and planktonic foraminifera, have been recognized, and their stratigraphic and palaeoecological distribution is reported. Physiographically, the rock sequences investigated are referred to a shelf–carbonate platform complex, in which the depositional environments range from open shelf to restricted platform conditions. The nature and distribution of the foraminiferal assemblages and related biota, together with sedimentological evidence, indicate a generalized shallowing-upward trend with several bathymetric oscillations, especially in the Oligocene.
These reflect the interplay between local tectonics and large-scale eustatic changes.


2011 ◽  
Vol 8 (3) ◽  
pp. 324-326 ◽  
Author(s):  
Luciana H. Y. Kamino ◽  
João Renato Stehmann ◽  
Silvana Amaral ◽  
Paulo De Marco ◽  
Thiago F. Rangel ◽  
...  

The workshop ‘Species distribution models: applications, challenges and perspectives’, held in Belo Horizonte (Brazil) on 29–30 August 2011, aimed to review the state of the art in species distribution modelling (SDM) in the Neotropical realm. It brought together researchers in ecology, evolution, biogeography and conservation with different backgrounds and research interests. Applying SDM in the megadiverse Neotropics, where data on species occurrences are scarce, presents several challenges: acknowledging the limitations imposed by data quality, including surveys as an integral part of SDM studies, and designing analyses in accordance with the question investigated. Specific solutions were discussed, and a code of good practice for SDM studies and related field surveys was drafted.

