Combining Regional Habitat Selection Models for Large-Scale Prediction: Circumpolar Habitat Selection of Southern Ocean Humpback Whales

2021
Vol 13 (11)
pp. 2074
Author(s):  
Ryan R. Reisinger ◽  
Ari S. Friedlaender ◽  
Alexandre N. Zerbini ◽  
Daniel M. Palacios ◽  
Virginia Andrews-Goff ◽  
...  

Machine learning algorithms are often used to model and predict animal habitat selection—the relationships between animal occurrences and habitat characteristics. For broadly distributed species, habitat selection often varies among populations and regions; thus, it would seem preferable to fit region- or population-specific models of habitat selection for more accurate inference and prediction, rather than fitting large-scale models using pooled data. However, where the aim is to make range-wide predictions, including areas for which there are no existing data or models of habitat selection, how can regional models best be combined? We propose that ensemble approaches commonly used to combine different algorithms for a single region can be reframed, treating regional habitat selection models as the candidate models. By doing so, we can incorporate regional variation when fitting predictive models of animal habitat selection across large ranges. We test this approach using satellite telemetry data from 168 humpback whales across five geographic regions in the Southern Ocean. Using random forests, we fitted a large-scale model relating humpback whale locations, versus background locations, to 10 environmental covariates, and made a circumpolar prediction of humpback whale habitat selection. We also fitted five regional models, the predictions of which we used as input features for four ensemble approaches: an unweighted ensemble, an ensemble weighted by environmental similarity in each cell, stacked generalization, and a hybrid approach wherein the environmental covariates and regional predictions were used as input features in a new model. We tested the predictive performance of these approaches on an independent validation dataset of humpback whale sightings and whaling catches. These multiregional ensemble approaches resulted in models with higher predictive performance than the naive circumpolar model. These approaches can be used to incorporate regional variation in animal habitat selection when fitting range-wide predictive models using machine learning algorithms. This can yield more accurate predictions across regions or populations of animals that may show variation in habitat selection.
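A minimal sketch of how such regional predictions can be combined is shown below, assuming each regional random forest has already been fitted with scikit-learn; `regional_models`, `X_train`, `y_train`, `X_new` and the per-cell `weights` are hypothetical placeholders, and the logistic-regression meta-learner in the stacking step is an illustrative choice rather than the exact configuration used in the study.

```python
# Sketch: combining regional habitat selection models into range-wide ensembles.
# Assumes `regional_models` is a list of fitted scikit-learn classifiers (one per
# region) and X arrays hold the environmental covariates for the cells to predict.
import numpy as np
from sklearn.linear_model import LogisticRegression

def regional_predictions(regional_models, X):
    """Stack each regional model's predicted probability of presence as a column."""
    return np.column_stack([m.predict_proba(X)[:, 1] for m in regional_models])

def unweighted_ensemble(regional_models, X):
    """Simple mean of the regional predictions."""
    return regional_predictions(regional_models, X).mean(axis=1)

def weighted_ensemble(regional_models, X, weights):
    """Weighted mean, e.g. with weights derived from environmental similarity per
    cell; `weights` has shape (n_cells, n_regions) and each row sums to 1."""
    preds = regional_predictions(regional_models, X)
    return (preds * weights).sum(axis=1)

def stacked_generalization(regional_models, X_train, y_train, X_new):
    """Train a meta-learner on the regional predictions (logistic regression here
    is an illustrative meta-learner, not necessarily the one used in the study)."""
    meta = LogisticRegression(max_iter=1000)
    meta.fit(regional_predictions(regional_models, X_train), y_train)
    return meta.predict_proba(regional_predictions(regional_models, X_new))[:, 1]
```

The hybrid approach described above would instead concatenate the environmental covariates with these regional predictions before fitting a new model on the combined features.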

2020
Vol 7 (12)
pp. 201347
Author(s):  
Elena Schall ◽  
Karolin Thomisch ◽  
Olaf Boebel ◽  
Gabriele Gerlach ◽  
Stefanie Spiesecke ◽  
...  

Southern Hemisphere humpback whales (Megaptera novaeangliae) inhabit a wide variety of ecosystems, including both low- and high-latitude areas. Understanding the habitat selection of humpback whale populations is key for humpback whale stock management and general ecosystem management. In the Atlantic sector of the Southern Ocean (ASSO), the investigation of baleen whale distribution by sighting surveys is temporally restricted to the austral summer. The implementation of autonomous passive acoustic monitoring, in turn, allows the study of vocal baleen whales year-round. This study describes the results of analysing passive acoustic data spanning 12 recording positions throughout the ASSO, applying a combination of automatic and manual analysis methods to register humpback whale acoustic activity. Humpback whales were present at nine recording positions, with higher acoustic activity towards lower latitudes and the eastern and western edges of the ASSO. During all months except December (the month with the fewest recordings), humpback whale acoustic activity was registered in the ASSO. The acoustic presence of humpback whales at various locations in the ASSO confirms previous observations that part of the population remains in high-latitude waters beyond austral summer, presumably to feed. The spatial and temporal extent of humpback whale presence in the ASSO suggests that this area may be used by multiple humpback whale breeding populations as a feeding ground.
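As an illustration of the automated part of such an analysis, the sketch below flags time windows with elevated energy in a low-frequency band of a recording; it is a generic band-limited energy detector under assumed frequency and threshold settings, not the detection method used in this study.

```python
# Illustrative band-limited energy detector for flagging possible humpback whale
# acoustic activity in a recording; a sketch of the kind of automatic pre-screening
# that manual review would follow, not the study's actual detector.
import numpy as np
from scipy.io import wavfile
from scipy.signal import spectrogram

def flag_activity(wav_path, f_lo=100.0, f_hi=2000.0, threshold_db=10.0):
    rate, audio = wavfile.read(wav_path)
    if audio.ndim > 1:                        # keep first channel if multichannel
        audio = audio[:, 0]
    freqs, times, sxx = spectrogram(audio.astype(float), fs=rate, nperseg=4096)
    band = (freqs >= f_lo) & (freqs <= f_hi)  # assumed humpback song energy band
    band_power_db = 10 * np.log10(sxx[band].sum(axis=0) + 1e-12)
    noise_floor = np.median(band_power_db)
    # return time stamps where in-band energy exceeds the noise floor by the threshold
    return times[band_power_db > noise_floor + threshold_db]
```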


2021
Author(s):  
Norberto Sánchez-Cruz ◽  
Jose L. Medina-Franco

Epigenetic targets are a significant focus for drug discovery research, as demonstrated by the eight approved epigenetic drugs for the treatment of cancer and the increasing availability of chemogenomic data related to epigenetics. These data represent a large number of structure-activity relationships that have not yet been exploited for the development of predictive models to support medicinal chemistry efforts. Herein, we report the first large-scale study of 26,318 compounds with a quantitative measure of biological activity for 55 protein targets with epigenetic activity. Through a systematic comparison of machine learning models trained on molecular fingerprints of different design, we built predictive models with high accuracy for the epigenetic target profiling of small molecules. The models were thoroughly validated, showing mean precisions up to 0.952 for the epigenetic target prediction task. Our results indicate that the models reported herein have considerable potential to identify small molecules with epigenetic activity. Therefore, our results were implemented as a freely accessible and easy-to-use web application.
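A rough sketch of this kind of fingerprint-based activity modelling is given below; Morgan fingerprints and a random forest are illustrative choices rather than the specific fingerprint designs or algorithms compared in the study, and `smiles`/`active` stand in for the curated compound structures and per-target activity labels.

```python
# Sketch: fingerprint-based activity model for one epigenetic target.
import numpy as np
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_score

def featurize(smiles_list, radius=2, n_bits=2048):
    """Convert SMILES strings into Morgan fingerprint bit vectors."""
    fps = []
    for smi in smiles_list:
        mol = Chem.MolFromSmiles(smi)
        fp = AllChem.GetMorganFingerprintAsBitVect(mol, radius, nBits=n_bits)
        arr = np.zeros((n_bits,), dtype=np.int32)
        DataStructs.ConvertToNumpyArray(fp, arr)
        fps.append(arr)
    return np.array(fps)

# `smiles` and `active` are placeholders for the curated compound list and the
# binary activity labels for one of the 55 epigenetic targets.
X = featurize(smiles)
X_train, X_test, y_train, y_test = train_test_split(X, active, test_size=0.2,
                                                    random_state=0)
model = RandomForestClassifier(n_estimators=500, n_jobs=-1, random_state=0)
model.fit(X_train, y_train)
print("precision:", precision_score(y_test, model.predict(X_test)))
```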


2021
Vol 28 (1)
pp. e100251
Author(s):  
Ian Scott ◽  
Stacey Carter ◽  
Enrico Coiera

Machine learning algorithms are being used to screen and diagnose disease, prognosticate and predict therapeutic responses. Hundreds of new algorithms are being developed, but whether they improve clinical decision making and patient outcomes remains uncertain. If clinicians are to use algorithms, they need to be reassured that key issues relating to their validity, utility, feasibility, safety and ethical use have been addressed. We propose a checklist of 10 questions that clinicians can ask of those advocating for the use of a particular algorithm, but which does not expect clinicians, as non-experts, to demonstrate mastery over what can be highly complex statistical and computational concepts. The questions are: (1) What is the purpose and context of the algorithm? (2) How good were the data used to train the algorithm? (3) Were there sufficient data to train the algorithm? (4) How well does the algorithm perform? (5) Is the algorithm transferable to new clinical settings? (6) Are the outputs of the algorithm clinically intelligible? (7) How will this algorithm fit into and complement current workflows? (8) Has use of the algorithm been shown to improve patient care and outcomes? (9) Could the algorithm cause patient harm? and (10) Does use of the algorithm raise ethical, legal or social concerns? We provide examples where an algorithm may raise concerns and apply the checklist to a recent review of diagnostic imaging applications. This checklist aims to assist clinicians in assessing algorithm readiness for routine care and to identify situations where further refinement and evaluation are required prior to large-scale use.


2021
Vol 9 (1)
Author(s):  
Moritz Mercker ◽  
Philipp Schwemmer ◽  
Verena Peschko ◽  
Leonie Enners ◽  
Stefan Garthe

Background: New wildlife telemetry and tracking technologies have become available in the last decade, leading to a large increase in the volume and resolution of animal tracking data. These technical developments have been accompanied by various statistical tools aimed at analysing the data obtained by these methods. Methods: We used simulated habitat and tracking data to compare some of the different statistical methods frequently used to infer local resource selection and large-scale attraction/avoidance from tracking data. Notably, we compared spatial logistic regression models (SLRMs), spatio-temporal point process models (ST-PPMs), step selection models (SSMs), and integrated step selection models (iSSMs) and their interplay with habitat and animal movement properties in terms of statistical hypothesis testing. Results: We demonstrated that only iSSMs and ST-PPMs showed nominal type I error rates in all studied cases, whereas SSMs may slightly, and SLRMs may frequently and strongly, exceed these levels. On average, iSSMs appeared to have more robust and higher statistical power than ST-PPMs. Conclusions: Based on our results, we recommend the use of iSSMs to infer habitat selection or large-scale attraction/avoidance from animal tracking data. Further advantages over other approaches include short computation times, predictive capacity, and the possibility of deriving mechanistic movement models.
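For orientation, the core of an (integrated) step selection fit is a conditional logistic regression over matched used/available steps; the sketch below assumes statsmodels' ConditionalLogit and a pre-built data frame of candidate steps (columns `used`, `stratum`, `habitat`, `step_length`, `turn_angle`), which would normally come from a step-generation routine not shown here.

```python
# Minimal sketch of an integrated step selection fit: conditional logistic
# regression on matched used/available steps, with movement covariates included
# (which is what distinguishes an iSSM from a plain SSM).
import numpy as np
import pandas as pd
from statsmodels.discrete.conditional_models import ConditionalLogit

def fit_issm(steps: pd.DataFrame):
    # add movement terms derived from step length and turning angle
    steps = steps.assign(log_sl=np.log(steps["step_length"]),
                         cos_ta=np.cos(steps["turn_angle"]))
    covariates = ["habitat", "step_length", "log_sl", "cos_ta"]
    # each stratum groups one observed step with its random alternatives
    model = ConditionalLogit(steps["used"], steps[covariates],
                             groups=steps["stratum"])
    return model.fit()

# fit_issm(steps).params then holds the selection coefficients per covariate.
```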


2021
Vol 11 (1)
Author(s):  
Imogen Schofield ◽  
David C. Brodbelt ◽  
Noel Kennedy ◽  
Stijn J. M. Niessen ◽  
David B. Church ◽  
...  

Cushing's syndrome is an endocrine disease in dogs that negatively impacts the quality of life of affected animals. Cushing's syndrome can be a challenging diagnosis to confirm; therefore, new methods to aid diagnosis are warranted. Four machine-learning algorithms were applied to predict a future diagnosis of Cushing's syndrome, using structured clinical data from the VetCompass programme in the UK. Dogs suspected of having Cushing's syndrome were included in the analysis and classified based on their final reported diagnosis within their clinical records. Demographic and clinical features available at the point of first suspicion by the attending veterinarian were included within the models. The machine-learning methods were able to classify the recorded Cushing's syndrome diagnoses with good predictive performance. The LASSO penalised regression model showed the best overall performance when applied to the test set, with an AUROC = 0.85 (95% CI 0.80–0.89), sensitivity = 0.71, specificity = 0.82, PPV = 0.75 and NPV = 0.78. The findings of our study indicate that machine-learning methods could predict the diagnosis that a practicing veterinarian would later record. New approaches using these methods could support clinical decision-making and contribute to improved diagnosis of Cushing's syndrome in dogs.
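As a rough illustration of the best-performing approach, the sketch below fits an L1-penalised (LASSO) logistic regression with scikit-learn and reports AUROC, sensitivity and specificity; `X` and `y` are placeholders for the VetCompass features and diagnosis labels, and the regularisation strength is arbitrary rather than the value tuned in the study.

```python
# Sketch: LASSO-penalised classification of suspected Cushing's syndrome cases.
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, confusion_matrix

# X: demographic/clinical features at first suspicion; y: final recorded diagnosis
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25,
                                                    stratify=y, random_state=0)
lasso = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
lasso.fit(X_train, y_train)

prob = lasso.predict_proba(X_test)[:, 1]
pred = (prob >= 0.5).astype(int)
tn, fp, fn, tp = confusion_matrix(y_test, pred).ravel()
print("AUROC:", roc_auc_score(y_test, prob))
print("sensitivity:", tp / (tp + fn), "specificity:", tn / (tn + fp))
```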


2021
Vol 13 (9)
pp. 1837
Author(s):  
Eve Laroche-Pinel ◽  
Sylvie Duthoit ◽  
Mohanad Albughdadi ◽  
Anne D. Costard ◽  
Jacques Rousseau ◽  
...  

Wine growing needs to adapt to confront climate change, as water scarcity is becoming increasingly important in many regions. Vineyards have been located in dry areas for decades, so they need particularly resilient varieties and/or a sufficient water supply at key development stages in case of severe drought. With climate change and decreasing water availability, some vineyard regions face difficulties because of unsuitable varieties, inappropriate vine management or limited water access. Decision support tools are therefore required to optimize water use or to adapt agronomic practices. This study aimed at monitoring vine water status at a large scale with Sentinel-2 images. The goal was to provide a solution that gives spatialized and temporal information throughout the season on the water status of the vines. For this purpose, thirty-six plots were monitored in total over three years (2018, 2019 and 2020). Vine water status was measured with stem water potential in field measurements from pea size to ripening stage. Simultaneously, Sentinel-2 images were downloaded and processed to extract band reflectance values and compute vegetation indices. In our study, we tested five supervised regression machine learning algorithms to find possible relationships between stem water potential and data acquired from Sentinel-2 images (band reflectance values and vegetation indices). A regression model using the Red, NIR, Red-Edge and SWIR bands gave promising results for predicting stem water potential (R² = 0.40, RMSE = 0.26).
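A minimal sketch of the regression step is given below; a random forest regressor stands in for one of the five supervised algorithms tested (the individual algorithms are not reproduced here), and `X`/`y` are placeholders for the per-observation band reflectances and measured stem water potentials.

```python
# Sketch: predicting stem water potential from Sentinel-2 band reflectances
# (e.g. Red, NIR, Red-Edge, SWIR) and reporting R2 and RMSE via cross-validation.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import r2_score, mean_squared_error

# X: array (n_observations, n_bands) of reflectances; y: measured stem water potential
model = RandomForestRegressor(n_estimators=300, random_state=0)
y_pred = cross_val_predict(model, X, y, cv=5)
print("R2:", r2_score(y, y_pred))
print("RMSE:", np.sqrt(mean_squared_error(y, y_pred)))
```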


2020
Vol 8 (Suppl 3)
pp. A62-A62
Author(s):  
Dattatreya Mellacheruvu ◽  
Rachel Pyke ◽  
Charles Abbott ◽  
Nick Phillips ◽  
Sejal Desai ◽  
...  

Background: Accurately identified neoantigens can be effective therapeutic agents in both adjuvant and neoadjuvant settings. A key challenge for neoantigen discovery has been the availability of accurate prediction models for MHC peptide presentation. We have shown previously that our proprietary model based on (i) large-scale, in-house mono-allelic data, (ii) custom features that model antigen processing, and (iii) advanced machine learning algorithms has strong performance. We have extended upon this work by systematically integrating large quantities of high-quality, publicly available data, implementing new modelling algorithms, and rigorously testing our models. These extensions lead to substantial improvements in performance and generalizability. Our algorithm, named Systematic HLA Epitope Ranking Pan Algorithm (SHERPA™), is integrated into the ImmunoID NeXT Platform®, our immuno-genomics and transcriptomics platform specifically designed to enable the development of immunotherapies. Methods: In-house immunopeptidomic data were generated using stably transfected HLA-null K562 cell lines that express a single HLA allele of interest, followed by immunoprecipitation using the W6/32 antibody and LC-MS/MS. Public immunopeptidomics data were downloaded from repositories such as MassIVE and processed uniformly using in-house pipelines to generate peptide lists filtered at a 1% false discovery rate. Other metrics (features) were either extracted from source data or generated internally by re-processing samples utilizing the ImmunoID NeXT Platform. Results: We generated large-scale and high-quality immunopeptidomics data using approximately 60 mono-allelic cell lines that unambiguously assign peptides to their presenting alleles to create our primary models. Briefly, our primary ‘binding’ algorithm models MHC-peptide binding using peptide and binding-pocket features, while our primary ‘presentation’ model uses additional features to model antigen processing and presentation. Both primary models have significantly higher precision across all recall values in multiple test data sets, including mono-allelic cell lines and multi-allelic tissue samples. To further improve the performance of our model, we expanded the diversity of our training set using high-quality, publicly available mono-allelic immunopeptidomics data. Furthermore, multi-allelic data were integrated by resolving peptide-to-allele mappings using our primary models. We then trained a new model using the expanded training data and a new composite machine learning architecture. The resulting secondary model further improves performance and generalizability across several tissue samples. Conclusions: Improving technologies for neoantigen discovery is critical for many therapeutic applications, including personalized neoantigen vaccines and neoantigen-based biomarkers for immunotherapies. Our new and improved algorithm (SHERPA) has significantly higher performance than a state-of-the-art public algorithm and furthers this objective.
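For context only, the sketch below shows the generic shape of such a presentation classifier: peptides observed in mono-allelic immunopeptidomics data serve as positives, random proteome peptides as decoys, and a model is trained on simple sequence encodings. SHERPA's features and architecture are proprietary, so this is not a reproduction of it; `presented` and `decoys` are hypothetical inputs.

```python
# Generic illustration of an MHC peptide-presentation classifier (NOT SHERPA):
# one-hot encoded peptides with a gradient-boosted classifier separating
# presented peptides from proteome-derived decoys for a single HLA allele.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def one_hot(peptide, max_len=11):
    """One-hot encode a peptide, zero-padded to max_len positions."""
    mat = np.zeros((max_len, len(AMINO_ACIDS)))
    for pos, aa in enumerate(peptide[:max_len]):
        mat[pos, AA_INDEX[aa]] = 1.0
    return mat.ravel()

# `presented` and `decoys` would be lists of peptide strings for one HLA allele.
X = np.array([one_hot(p) for p in presented + decoys])
y = np.array([1] * len(presented) + [0] * len(decoys))

clf = GradientBoostingClassifier()
clf.fit(X, y)
# clf.predict_proba(...) then ranks candidate peptides for that allele.
```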

