Addressing Uncertainties in Machine Learning Predictions of Conservation Status

Author(s):  
Barnaby Walker ◽  
Tarciso Leão ◽  
Steven Bachman ◽  
Eve Lucas ◽  
Eimear Nic Lughadha

Extinction risk assessments are increasingly important to many stakeholders (Bennun et al. 2017) but there remain large gaps in our knowledge about the status of many species. The IUCN Red List of Threatened Species (IUCN 2019, hereafter Red List) is the most comprehensive assessment of extinction risk. However, it includes assessments of just 7% of all vascular plants, while 18% of all assessed animals lack sufficient data to assign a conservation status. The wide availability of species occurrence information through digitised natural history collections and aggregators such as the Global Biodiversity Information Facility (GBIF), coupled with machine learning methods, provides an opportunity to fill these gaps in our knowledge.

Machine learning approaches have already been proposed to guide conservation assessment efforts (Nic Lughadha et al. 2018), assign a conservation status to species with insufficient data for a full assessment (Bland et al. 2014), and predict the number of threatened species across the world (Pelletier et al. 2018).

The wide range in sources of species occurrence records can lead to data quality issues, such as missing, imprecise, or mistaken information. These data quality issues may be compounded in databases that aggregate information from multiple sources: many such records derive from field observations (78% for plant species in GBIF; Meyer et al. 2016) largely unsupported by voucher specimens that would allow confirmation or correction of their identification. Even where voucher specimens do exist, different taxonomic or geographic information can be held for a single collection event represented by duplicate specimens deposited in different natural history collections. Tools are available to help clean species occurrence data, but these cannot deal with problems like specimen misidentification, which previous work (Nic Lughadha et al. 2019) has shown to have a large impact on preliminary assessments of conservation status.

Machine learning models based on species occurrence records have been reported to predict with high accuracy the conservation status of species. However, given the black-box nature of some of the better machine learning models, it is unclear how well these accuracies apply beyond the data on which the models were trained. Practices for training machine learning models differ between studies, but more interrogation of these models is required if we are to know how much to trust their predictions.

To address these problems, we compare predictions made by a machine learning model when trained on specimen occurrence records that have benefitted from minimal or more thorough cleaning, with those based on records from an expert-curated database. We then explore different techniques to interrogate machine learning models and quantify the uncertainty in their predictions.
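As an illustration of the kind of model interrogation described above, the following minimal sketch (not the authors' code) trains a random forest on hypothetical occurrence-derived features and uses class probabilities and feature importances as simple measures of prediction uncertainty and model behaviour; the feature names and synthetic data are illustrative assumptions only.

```python
# A minimal sketch of interrogating a conservation-status classifier and
# quantifying prediction uncertainty. All features and labels are synthetic.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical occurrence-derived predictors per species.
feature_names = ["eoo_km2", "aoo_km2", "n_records", "latitudinal_range"]
X = rng.lognormal(mean=5, sigma=2, size=(500, len(feature_names)))
y = (X[:, 0] < np.median(X[:, 0])).astype(int)  # 1 = "threatened" (toy label)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(n_estimators=500, oob_score=True, random_state=0)
model.fit(X_train, y_train)

# Class probabilities give a per-species measure of predictive uncertainty.
proba = model.predict_proba(X_test)
entropy = -np.sum(proba * np.log(proba + 1e-12), axis=1)

print("Out-of-bag accuracy:", round(model.oob_score_, 3))
print("Most uncertain predictions (highest entropy):", np.argsort(entropy)[-5:])
print("Feature importances:",
      dict(zip(feature_names, model.feature_importances_.round(3))))
```

Per-species predictive entropy of this kind is one simple way to flag predictions that warrant expert review before being used to prioritise assessment effort.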

2020 ◽  
Vol 21 (8) ◽  
Author(s):  
Iyan Robiansyah ◽  
Wita Wardani

Abstract. Robiansyah I, Wardani W. 2020. Increasing accuracy: The advantage of using open access species occurrence database in the Red List assessment. Biodiversitas 21: 3658-3664. The IUCN Red List is the most widely used instrument for assessing and communicating the extinction risk of species. One of the criteria used in the IUCN Red List is the geographical range of the species assessed (criterion B), expressed as extent of occurrence (EOO) and/or area of occupancy (AOO). Although this criterion is presumed to be the easiest to complete, as it is based mainly on species occurrence data, some assessments fail to make full use of freely available databases. Here, we reassessed the conservation status of Cibotium arachnoideum, a tree fern distributed in Sumatra and Borneo. This species was previously assessed by Praptosuwiryo (2020, Biodiversitas 21 (4): 1379-1384), who classified it as Endangered (EN) under criteria B2ab(i,ii,iii); C2a(ii). Using additional data from herbarium specimens recorded in the Global Biodiversity Information Facility (GBIF) and from peer-reviewed scientific papers, we show in the present paper that C. arachnoideum has a larger extent of occurrence (EOO) and area of occupancy (AOO), more locations, and a different conservation status than reported by Praptosuwiryo (2020). Our results are supported by a predicted suitable habitat map of C. arachnoideum produced with the MaxEnt modelling method. Based on our assessment, we propose Vulnerable (VU) C2a(i) as the global conservation status for C. arachnoideum. Our study demonstrates the advantage of using open access databases to increase the accuracy of extinction risk assessments under the IUCN Red List criteria in regions like Indonesia, where adequate taxonomic information is not always readily available.
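For readers unfamiliar with criterion B, the sketch below illustrates, under stated assumptions, how EOO and AOO can be approximated from occurrence records: EOO as the area of a minimum convex polygon around the records, and AOO as the number of occupied 2 km x 2 km grid cells multiplied by 4 km². The coordinates are hypothetical and the equirectangular projection is a rough, equator-adjacent approximation; actual assessments typically use dedicated tools such as GeoCAT or the R packages ConR and rCAT.

```python
# A minimal sketch of criterion B metrics from occurrence points (toy data).
import numpy as np
from scipy.spatial import ConvexHull

# Hypothetical occurrence records (longitude, latitude in decimal degrees).
records = np.array([
    [101.2, -0.5], [101.9, -1.1], [102.4, 0.3],
    [110.3, 1.2], [111.0, 0.8], [113.5, 1.9],
])

# Project degrees to kilometres (equirectangular; adequate near the equator).
lat0 = np.deg2rad(records[:, 1].mean())
xy_km = np.column_stack([
    records[:, 0] * 111.320 * np.cos(lat0),  # longitude -> km
    records[:, 1] * 110.574,                 # latitude  -> km
])

# EOO: area of the minimum convex polygon around all records.
eoo_km2 = ConvexHull(xy_km).volume  # in 2-D, .volume is the polygon area

# AOO: count occupied 2 km grid cells, each contributing 4 km^2.
cells = {tuple(cell) for cell in np.floor(xy_km / 2.0).astype(int)}
aoo_km2 = 4 * len(cells)

print(f"EOO ~ {eoo_km2:.0f} km^2; AOO ~ {aoo_km2} km^2 "
      f"({len(cells)} occupied 2 km cells)")
```

Adding further records from open access sources, as the authors did, can only enlarge the convex hull and the set of occupied cells, which is why additional occurrence data tend to raise EOO and AOO estimates.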


2015 ◽  
Vol 370 (1662) ◽  
pp. 20140015 ◽  
Author(s):  
Neil Brummitt ◽  
Steven P. Bachman ◽  
Elina Aletrari ◽  
Helen Chadburn ◽  
Janine Griffiths-Lee ◽  
...  

The IUCN Sampled Red List Index (SRLI) is a policy response by biodiversity scientists to the need to estimate trends in extinction risk of the world's diminishing biological diversity. Assessments of plant species for the SRLI project rely predominantly on herbarium specimen data from natural history collections, in the overwhelming absence of accurate population data or detailed distribution maps for the vast majority of plant species. This creates difficulties in re-assessing these species, and hence in re-calculating the SRLI, because a genuine change in conservation status must be observed under the same Red List criteria to be distinguished from a mere increase in the knowledge available for that species. However, the same specimen data identify precise localities where threatened species have previously been collected; these can be used to model species ranges and to target fieldwork in order to test specimen-based range estimates and collect population data for SRLI plant species. Here, we outline a strategy for prioritizing fieldwork efforts so that a wider range of IUCN Red List criteria can be applied to assessments of plant species, or of any taxa with detailed locality or natural history specimen data, producing a more robust estimation of the SRLI.
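For reference, a (Sampled) Red List Index value is calculated from the Red List categories of the assessed species using the standard equal-steps category weights; the sketch below shows that calculation on a purely illustrative category list.

```python
# A minimal sketch of the Red List Index calculation (illustrative data).
WEIGHTS = {"LC": 0, "NT": 1, "VU": 2, "EN": 3, "CR": 4, "EX": 5}
W_EX = WEIGHTS["EX"]

def red_list_index(categories):
    """RLI = 1 - sum(category weights) / (W_EX * number of assessed species)."""
    assessed = [c for c in categories if c in WEIGHTS]  # Data Deficient excluded
    return 1 - sum(WEIGHTS[c] for c in assessed) / (W_EX * len(assessed))

# Hypothetical sample of assessed plant species.
print(round(red_list_index(["LC", "LC", "NT", "VU", "EN", "CR", "LC", "EX"]), 3))
```

An index of 1 would mean all sampled species are Least Concern and 0 that all are Extinct, which is why consistent re-assessment under the same criteria matters when tracking the index over time.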


2020 ◽  
Vol 2 (1) ◽  
pp. 3-6
Author(s):  
Eric Holloway

Imagination Sampling is the use of a person as an oracle for generating or improving machine learning models. Previous work demonstrated a general system that uses Imagination Sampling to obtain multibox models. Here, the possibility of importing such models as the starting point for further automatic enhancement is explored.
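The abstract does not specify the Imagination Sampling procedure itself; the sketch below is only a generic illustration of a person acting as an oracle in a model-improvement loop, and the function names, candidate-generation step, and scoring interface are assumptions rather than the author's method.

```python
# A generic human-as-oracle improvement loop (assumed structure, toy example).
import random

def propose_candidates(base_config, n=3):
    """Randomly perturb a base configuration to create candidate models."""
    return [{**base_config, "threshold": round(random.uniform(0.1, 0.9), 2)}
            for _ in range(n)]

def human_oracle(candidate):
    """Ask the person to score a candidate configuration from 0 to 10."""
    return float(input(f"Score {candidate} (0-10): "))

best = {"threshold": 0.5}
for _ in range(2):  # a couple of oracle-guided improvement rounds
    candidates = propose_candidates(best)
    scores = [human_oracle(c) for c in candidates]
    best = candidates[scores.index(max(scores))]
print("Selected configuration:", best)
```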


2021 ◽  
Author(s):  
Norberto Sánchez-Cruz ◽  
Jose L. Medina-Franco

Epigenetic targets are a significant focus for drug discovery research, as demonstrated by the eight approved epigenetic drugs for the treatment of cancer and the increasing availability of chemogenomic data related to epigenetics. These data represent a large body of structure-activity relationships that has not yet been exploited for the development of predictive models to support medicinal chemistry efforts. Herein, we report the first large-scale study of 26,318 compounds with a quantitative measure of biological activity for 55 protein targets with epigenetic activity. Through a systematic comparison of machine learning models trained on molecular fingerprints of different design, we built predictive models with high accuracy for the epigenetic target profiling of small molecules. The models were thoroughly validated, showing mean precisions up to 0.952 for the epigenetic target prediction task. Our results indicate that the models reported herein have considerable potential to identify small molecules with epigenetic activity. Therefore, the models were implemented as a freely accessible and easy-to-use web application.
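As a hedged illustration of the general approach described (not the authors' pipeline), the sketch below encodes molecules as Morgan fingerprints with RDKit and cross-validates a random forest classifier for a single target; the SMILES strings and activity labels are toy placeholders rather than epigenetic bioactivity data.

```python
# A minimal sketch of fingerprint-based target-activity modelling (toy data).
import numpy as np
from rdkit import Chem
from rdkit.Chem import AllChem
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

smiles = ["CCO", "c1ccccc1", "CC(=O)Oc1ccccc1C(=O)O", "CCN(CC)CC",
          "c1ccc2c(c1)cccc2", "CC(C)Cc1ccc(cc1)C(C)C(=O)O",
          "CN1CCC[C@H]1c1cccnc1", "O=C(O)c1ccccc1"]
labels = np.array([0, 0, 1, 0, 1, 1, 0, 1])  # toy "active against target" labels

def fingerprint(smi, n_bits=2048):
    """Encode a SMILES string as a Morgan fingerprint bit vector."""
    mol = Chem.MolFromSmiles(smi)
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=n_bits)
    return np.array(fp)

X = np.array([fingerprint(s) for s in smiles])

model = RandomForestClassifier(n_estimators=200, random_state=0)
scores = cross_val_score(model, X, labels, cv=2, scoring="precision")
print("Mean cross-validated precision:", scores.mean().round(3))
```

In a multi-target profiling setting, one such model would be trained and validated per target, with the per-target precisions averaged to report an overall figure.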


2020 ◽  
Author(s):  
Shreya Reddy ◽  
Lisa Ewen ◽  
Pankti Patel ◽  
Prerak Patel ◽  
Ankit Kundal ◽  
...  

As bots become more prevalent and smarter in the modern age of the internet, it becomes ever more important that they be identified and removed. Recent research has established machine learning methods as accurate and as the gold standard for bot identification on social media. Unfortunately, machine learning models come with drawbacks such as lengthy training times, difficult feature selection, and demanding pre-processing tasks. To overcome these difficulties, we propose a blockchain framework for bot identification. At present it is unknown how this method will perform, but it highlights a substantial gap in research in this area.

