Weakly Supervised Faster-RCNN+FPN to classify small animals in camera trap images

2021 ◽  
Author(s):  
Pierrick Pochelu ◽  
Clara Erard ◽  
Philippe Cordier ◽  
Serge G. Petiton ◽  
Bruno Conche

Camera traps have revolutionized the study of many animal species that were previously nearly impossible to observe because of their habitat or behavior. Deep learning has the potential to absorb this workload by automatically classifying the images by taxon or as empty. However, a standard deep neural network classifier fails because animals often occupy only a small portion of the high-definition images. We therefore propose a workflow named Weakly Object Detection Faster-RCNN+FPN that suits this challenge. The model is weakly supervised because it requires only the animal taxon label per image, without any manual bounding-box annotations. First, it automatically performs weakly supervised bounding-box annotation using motion across multiple frames. Then, it trains a Faster-RCNN+FPN model with this weak supervision. Experimental results have been obtained on two datasets and an easily reproducible testbed.
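
As a rough illustration of this two-stage idea, the sketch below derives weak boxes from motion between burst frames with OpenCV and uses them to train torchvision's off-the-shelf Faster-RCNN+FPN. It is a minimal sketch, not the authors' implementation: the frame-differencing heuristic, the `NUM_TAXA` constant, and the `loader` variable are all assumptions.

```python
# Minimal sketch of the two-stage workflow (illustrative, not the paper's code):
# stage 1 derives pseudo bounding boxes from inter-frame motion, stage 2 trains
# torchvision's Faster-RCNN+FPN on them with the image-level taxon label.
import cv2
import torch
import torchvision

def motion_pseudo_boxes(frames, thresh=25, min_area=500):
    """Derive weak bounding boxes from motion between consecutive burst frames."""
    grays = [cv2.cvtColor(f, cv2.COLOR_BGR2GRAY) for f in frames]
    boxes = []
    for prev, curr in zip(grays, grays[1:]):
        diff = cv2.absdiff(prev, curr)                 # pixel-wise motion
        _, mask = cv2.threshold(diff, thresh, 255, cv2.THRESH_BINARY)
        mask = cv2.dilate(mask, None, iterations=2)    # merge nearby blobs
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
        for c in contours:
            x, y, w, h = cv2.boundingRect(c)
            if w * h >= min_area:                      # drop tiny noise blobs
                boxes.append([x, y, x + w, y + h])
    return boxes

NUM_TAXA = 20   # assumed number of taxa; labels 1..NUM_TAXA (0 = background)
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(
    weights=None, num_classes=NUM_TAXA + 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.005, momentum=0.9)
model.train()
for frames, taxon_label in loader:       # `loader`: hypothetical dataset of
    boxes = motion_pseudo_boxes(frames)  # image bursts with a taxon label each
    if not boxes:
        continue                         # no motion found: skip this burst
    image = torch.from_numpy(frames[0]).permute(2, 0, 1).float() / 255
    targets = [{"boxes": torch.tensor(boxes, dtype=torch.float32),
                "labels": torch.full((len(boxes),), taxon_label)}]
    loss = sum(model([image], targets).values())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```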


2020 ◽  
Vol 34 (2) ◽  
pp. 165-180 ◽  
Author(s):  
Clemens-Alexander Brust ◽  
Christoph Käding ◽  
Joachim Denzler

Abstract Large amounts of labeled training data are one of the main contributors to the great success that deep models have achieved in the past. Label acquisition for tasks other than benchmarks can pose a challenge due to requirements of both funding and expertise. By selecting unlabeled examples that are promising in terms of model improvement and only asking for the respective labels, active learning can increase the efficiency of the labeling process in terms of time and cost. In this work, we describe combinations of an incremental learning scheme and methods of active learning that allow for continuous exploration of newly observed unlabeled data. We describe selection criteria based on model uncertainty as well as expected model output change (EMOC). An object detection task is evaluated in a continuous exploration context on the PASCAL VOC dataset. We also validate a weakly supervised system based on active and incremental learning in a real-world biodiversity application where images from camera traps are analyzed. Labeling only 32 images by accepting or rejecting proposals generated by our method yields an increase in accuracy from 25.4% to 42.6%.
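
As a sketch of the model-uncertainty criterion only (the EMOC criterion and the incremental-learning machinery are left out), the selection step could look like the following, assuming `model` is any image classifier returning logits:

```python
# Minimal sketch of uncertainty-based sample selection for active learning.
# Images with the highest predictive entropy are sent for labeling.
import torch
import torch.nn.functional as F

def select_most_uncertain(model, unlabeled_images, k=32):
    """Rank unlabeled images by predictive entropy and return the top-k indices."""
    model.eval()
    scores = []
    with torch.no_grad():
        for i, x in enumerate(unlabeled_images):     # x: CxHxW tensor
            probs = F.softmax(model(x.unsqueeze(0)), dim=1)
            entropy = -(probs * probs.clamp_min(1e-12).log()).sum()
            scores.append((entropy.item(), i))
    scores.sort(reverse=True)                 # highest entropy = most uncertain
    return [i for _, i in scores[:k]]         # indices to send for labeling
```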


PLoS ONE ◽  
2021 ◽  
Vol 16 (3) ◽  
pp. e0247536 ◽  
Author(s):  
Bart J. Harmsen ◽  
Nicola Saville ◽  
Rebecca J. Foster

Population assessments of wide-ranging, cryptic, terrestrial mammals rely on camera trap surveys. While camera trapping is a powerful method of detecting presence, it is difficult to distinguish rarity from a low detection rate. The margay (Leopardus wiedii) is an example of a species considered rare based on its low detection rates across its range. Although margays have a wide distribution, detection rates with camera traps are universally low; consequently, the species is listed as Near Threatened. Our 12-year camera trap study of margays in protected broadleaf forest in Belize suggests that while margays have a low detection rate, they do not seem to be rare; rather, they are difficult to detect with camera traps. We detected a maximum of 187 individuals, all with few or no recaptures over the years (mean = 2.0 captures/individual ± SD 2.1), with two-thirds of individuals detected only once. The few individuals that were recaptured across years exhibited long tenures of up to 9 years and were at least 10 years old at their final detection. We detected multiple individuals of both sexes at the same locations during the same survey, suggesting overlapping ranges with non-exclusive territories and providing further evidence of a high-density population. By studying the sparse annual datasets across multiple years, we found evidence of an abundant margay population in the forest of the Cockscomb Basin, which might have been deemed low-density and rare if studied only in the short term. We encourage more long-term camera trap studies to assess the population status of semi-arboreal carnivore species that have hitherto been considered rare based on low detection rates.


Author(s):  
Yutong Wang ◽  
Jiyuan Zheng ◽  
Qijiong Liu ◽  
Zhou Zhao ◽  
Jun Xiao ◽  
...  

Automatic question generation for an answer within a given passage is useful for many applications, such as question answering systems and dialogue systems. Current neural-based methods mostly take two steps: they extract several important sentences based on the candidate answer through manual rules or supervised neural networks, and then use an encoder-decoder framework to generate questions about these sentences. These approaches still require two steps and neglect the semantic relations between the answer and the context of the whole passage, which are sometimes necessary for answering the question. To address this problem, we propose the Weakly Supervision Enhanced Generative Network (WeGen), which automatically discovers relevant features of the passage given the answer span in a weakly supervised manner to improve the quality of generated questions. More specifically, we devise a discriminator, the Relation Guider, to capture the relations between the passage and the associated answer, and the Multi-Interaction mechanism is then deployed to transfer the knowledge dynamically to our question generation system. Experiments show the effectiveness of our method in both automatic and human evaluations.
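
The following toy sketch illustrates only the combined-objective pattern the abstract describes: a generation loss plus an auxiliary relevance loss from a discriminator. Both toy modules and the 0.5 weight are fabricated placeholders, not WeGen's architecture.

```python
# Toy multi-task objective loosely mirroring the WeGen idea: a generator loss
# combined with an auxiliary discriminator ("Relation Guider") loss.
import torch
import torch.nn as nn

VOCAB, DIM = 1000, 64

class ToyGenerator(nn.Module):
    """Embeds the passage and predicts a question-token distribution."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, DIM)
        self.decode = nn.Linear(DIM, VOCAB)
    def forward(self, passage):                  # passage: (batch, seq) ints
        return self.decode(self.embed(passage).mean(1))

class ToyRelationGuider(nn.Module):
    """Scores whether a passage is relevant to the marked answer span."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, DIM)
        self.score = nn.Linear(DIM, 1)
    def forward(self, passage):
        return self.score(self.embed(passage).mean(1)).squeeze(1)

gen, guider = ToyGenerator(), ToyRelationGuider()
opt = torch.optim.Adam(list(gen.parameters()) + list(guider.parameters()))

passage = torch.randint(0, VOCAB, (8, 30))       # fabricated batch
question_tok = torch.randint(0, VOCAB, (8,))     # one target token per sample
relevance = torch.randint(0, 2, (8,)).float()    # weak relevance labels

gen_loss = nn.functional.cross_entropy(gen(passage), question_tok)
rel_loss = nn.functional.binary_cross_entropy_with_logits(
    guider(passage), relevance)
loss = gen_loss + 0.5 * rel_loss                 # 0.5 is an assumed weight
opt.zero_grad(); loss.backward(); opt.step()
```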


2019 ◽  
Author(s):  
Sadoune Ait Kaci Azzou ◽  
Liam Singer ◽  
Thierry Aebischer ◽  
Madleina Caduff ◽  
Beat Wolf ◽  
...  

Summary Camera traps and acoustic recording devices are essential tools to quantify the distribution, abundance, and behavior of mobile species. Varying detection probabilities among device locations must be accounted for when analyzing such data, which is generally done using occupancy models. We introduce a Bayesian Time-dependent Observation Model for Camera Trap data (Tomcat), suited to estimating relative event densities in space and time. Tomcat makes it possible to learn about the environmental requirements and daily activity patterns of species while accounting for imperfect detection. It further implements a sparse model that deals well with a large number of potentially highly correlated environmental variables. By integrating both spatial and temporal information, we extend the notion of the overlap coefficient between species to time and space to study niche partitioning. We illustrate the power of Tomcat through an application to camera trap data of eight sympatrically occurring duiker Cephalophinae species in the savanna-rainforest ecotone of the Central African Republic, and find that most species pairs show little overlap. Exceptions are those for which one species is very rare, likely as a result of direct competition.
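
For the familiar temporal case, the overlap coefficient Δ = ∫ min(f₁(t), f₂(t)) dt between two species' activity densities can be estimated as in the sketch below; it uses a plain (non-circular) kernel density estimate and fabricated detection times, whereas Tomcat itself extends the idea jointly to space and time.

```python
# Minimal sketch of the classical coefficient of overlap between two species'
# daily activity densities, estimated from detection times (hours, 0-24).
import numpy as np
from scipy.stats import gaussian_kde

def temporal_overlap(times_a, times_b, grid_points=288):
    """Overlap coefficient: 0 = fully disjoint activity, 1 = identical."""
    grid = np.linspace(0.0, 24.0, grid_points)
    dx = grid[1] - grid[0]
    f_a = gaussian_kde(times_a)(grid)
    f_b = gaussian_kde(times_b)(grid)
    f_a /= f_a.sum() * dx                 # renormalize on [0, 24]
    f_b /= f_b.sum() * dx
    return float(np.minimum(f_a, f_b).sum() * dx)

# Example: a dawn-active vs. a midday-active species overlap only partly.
rng = np.random.default_rng(0)
dawn_species = np.clip(rng.normal(6, 1.5, 200), 0, 24)
midday_species = np.clip(rng.normal(13, 2.0, 200), 0, 24)
print(round(temporal_overlap(dawn_species, midday_species), 2))
```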


2020 ◽  
Author(s):  
Thel Lucie ◽  
Chamaillé-Jammes Simon ◽  
Keurinck Léa ◽  
Catala Maxime ◽  
Packer Craig ◽  
...  

Abstract Ecologists increasingly rely on camera trap data to estimate a wide range of biological parameters such as occupancy, population abundance or activity patterns. Because of the huge amount of data collected, the assistance of non-scientists is often sought after, but an assessment of the data quality is a prerequisite to their use.

We tested whether citizen science data from one of the largest citizen science projects - Snapshot Serengeti - could be used to study breeding phenology, an important life-history trait. In particular, we tested whether the presence of juveniles (less than one month or less than 12 months old) of three ungulate species in the Serengeti: topi Damaliscus jimela, kongoni Alcelaphus buselaphus and Grant’s gazelle Nanger granti, could be reliably detected by the “naive” volunteers vs. trained observers. We expected a positive correlation between the proportion of volunteers identifying juveniles and their effective presence within photographs, as assessed by the trained observers.

We first checked the agreement between the trained observers for age classes and species and found it to be good (Fleiss’ κ > 0.61 for juveniles less than one month and less than 12 months old), suggesting that morphological criteria can be used successfully to determine age. The relationship between the proportion of volunteers detecting juveniles less than a month old and their actual presence plateaued at 0.45 for Grant’s gazelle and reached 0.70 for topi and 0.56 for kongoni. The same relationships were, however, much stronger for juveniles younger than 12 months, to the point that their presence was perfectly detected by volunteers for topi and kongoni.

Volunteers’ classification allows a rough, moderately accurate, but quick sorting of photograph sequences with/without juveniles. Obtaining accurate data, however, appears more difficult. We discuss the limitations of using citizen science camera trap data to study breeding phenology, and options to improve the detection of juveniles, such as the addition of aging criteria on the online citizen science platforms, or the use of machine learning.
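
The agreement check between trained observers can be reproduced with a standard Fleiss' kappa computation, as in this sketch (the ratings matrix is fabricated purely for illustration; statsmodels is assumed):

```python
# Inter-observer agreement via Fleiss' kappa over age-class assignments.
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# rows = photograph sequences, columns = observers,
# values = assigned age class (0 = no juvenile, 1 = <1 month, 2 = <12 months)
ratings = np.array([
    [1, 1, 1],
    [0, 0, 1],
    [2, 2, 2],
    [1, 1, 2],
    [0, 0, 0],
])
counts, _ = aggregate_raters(ratings)   # subjects x categories count table
kappa = fleiss_kappa(counts)
print(f"Fleiss' kappa = {kappa:.2f}")   # > 0.61 indicates substantial agreement
```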


2020 ◽  
Vol 20 (4) ◽  
Author(s):  
Paula Ribeiro Prist ◽  
Guilherme S. T. Garbino ◽  
Fernanda Delborgo Abra ◽  
Thais Pagotto ◽  
Osnir Ormon Giacon

Abstract The water opossum (Chironectes minimus) is a semi-aquatic mammal that is infrequently sampled in Atlantic rainforest areas in Brazil. Here we report on new records of C. minimus in the state of São Paulo, southeastern Brazil, and comment on its behavior and ecology. We placed nine camera traps in culverts and cattle boxes under a highway, between 2017 and 2019. From a total of 6,750 camera-trap-days, we obtained 16 records of C. minimus (0.24 records/100 camera-trap-days) in two cameras placed in culverts over streams. Most of the records were made between May and August, in the dry season and in the first six hours after sunset. The new records are from a highly degraded area with some riparian forests. The records lie approximately 30 km away from the nearest protected area where the species is known to occur. We suggest that C. minimus has some tolerance to degraded habitats, as long as the water bodies and riparian forests are minimally preserved. The new records presented here also fill a distribution gap in western São Paulo state.
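
The reported rate is simply the record count scaled by survey effort:

```python
# Detection rate from the reported survey effort (arithmetic check):
records = 16
effort = 6750                        # camera-trap-days
rate = records / effort * 100        # records per 100 camera-trap-days
print(f"{rate:.2f} records/100 camera-trap-days")   # -> 0.24
```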


2018 ◽  
Vol 40 (1) ◽  
pp. 118 ◽  
Author(s):  
Bronwyn A. Fancourt ◽  
Mark Sweaney ◽  
Don B. Fletcher

Camera traps are being used increasingly for wildlife management and research. When choosing camera models, practitioners often consider camera trigger speed to be one of the most important factors to maximise species detections. However, factors such as detection zone will also influence detection probability. As part of a rabbit eradication program, we performed a pilot study to compare rabbit (Oryctolagus cuniculus) detections using the Reconyx PC900 (faster trigger speed, narrower detection zone) and the Ltl Acorn Ltl-5310A (slower trigger speed, wider detection zone). Contrary to our predictions, the slower-trigger-speed cameras detected rabbits more than twice as often as the faster-trigger-speed cameras, suggesting that the wider detection zone more than compensated for the relatively slower trigger time. We recommend context-specific field trials to ensure cameras are appropriate for the required purpose. Missed detections could lead to incorrect inferences and potentially misdirected management actions.
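
A toy geometric model makes the trade-off concrete: a camera only captures an animal if its time inside the detection zone exceeds the trigger delay, so a wide zone can beat a fast trigger. All numbers below are illustrative assumptions, not the two cameras' specifications.

```python
# Toy model: capture is possible only if time-in-zone exceeds trigger delay.
import math

def capture_possible(zone_angle_deg, trigger_delay_s, distance_m, speed_m_s):
    """True if an animal crossing the detection zone at the given distance
    and speed stays in view longer than the trigger delay."""
    zone_width = 2 * distance_m * math.tan(math.radians(zone_angle_deg / 2))
    time_in_zone = zone_width / speed_m_s
    return time_in_zone > trigger_delay_s

# Fast trigger + narrow zone vs. slow trigger + wide zone (assumed values):
rabbit = dict(distance_m=3.0, speed_m_s=3.0)
print(capture_possible(zone_angle_deg=10, trigger_delay_s=0.2, **rabbit))  # False
print(capture_possible(zone_angle_deg=50, trigger_delay_s=0.8, **rabbit))  # True
```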


2019 ◽  
Author(s):  
Hayder Yousif

[ACCESS RESTRICTED TO THE UNIVERSITY OF MISSOURI AT REQUEST OF AUTHOR.] Camera traps are a popular tool to sample animal populations because they are noninvasive, detect a variety of species, and can record many thousands of animal detections per deployment. Cameras are typically set to take bursts of multiple images for each detection, and are deployed in arrays of dozens or hundreds of sites, often resulting in millions of images per study. The task of converting images to animal detection records from such large image collections is daunting, and made worse by situations that generate copious empty pictures from false triggers (e.g. camera malfunction or moving vegetation) or pictures of humans. We offer the first widely available computer vision tool for processing camera trap images. Our results show that the tool is accurate and yields substantial time savings when processing large image datasets, thus improving our ability to monitor wildlife across large scales with camera traps. In this dissertation, we have developed new image/video processing and computer vision algorithms for efficient and accurate object detection and sequence-level classification from natural-scene camera-trap images. This work addresses the following major tasks:

(1) Human-animal detection. We develop a fast and accurate scheme for human-animal detection from highly cluttered camera-trap images using joint background modeling and deep learning classification. Specifically, we first develop an effective background modeling and subtraction scheme to generate region proposals for the foreground objects. We then develop a cross-frame image patch verification to reduce the number of foreground object proposals. Finally, we perform a complexity-accuracy analysis of deep convolutional neural networks (DCNNs) to develop a fast deep learning classification scheme that sorts these region proposals into three categories: humans, animals, and background patches (a schematic sketch of this proposal-then-classify pattern follows the abstract). The optimized DCNN is able to maintain a high level of accuracy while reducing the computational complexity by a factor of 14. Our experimental results demonstrate that the proposed method outperforms existing methods on the camera-trap dataset.

(2) Object segmentation from natural scenes. We first design and train a fast DCNN for animal-human-background object classification, which is used to analyze the input image and generate multi-layer feature maps representing the responses of different image regions to the animal-human-background classifier. From these feature maps, we construct the so-called deep objectness graph for accurate animal-human object segmentation with graph cut. The segmented object regions from each image in the sequence are then verified and fused in the temporal domain using background modeling. Our experimental results demonstrate that our proposed method outperforms existing state-of-the-art methods on the camera-trap dataset with highly cluttered natural scenes.

(3) DCNN-domain background modeling. We replace the background model with a new, more efficient deep-learning-based model. The input frames are segmented into regions through the deep objectness graph, and the region boundaries of the input frames are multiplied by each other to obtain the regions of movement patches. We construct the background representation using the temporal information of the co-located patches. We propose to fuse the subtraction and foreground/background pixel classification of two representations: a) chromaticity and b) deep pixel information.

(4) Sequence-level object classification. We propose a new method for sequence-level video recognition with application to animal species recognition from camera trap images. First, using background modeling and cross-frame patch verification, we develop a scheme to generate candidate object regions, or object proposals, in the spatiotemporal domain. Second, we develop a dynamic programming optimization approach to identify the best temporal subset of object proposals. Third, we aggregate and fuse the features of these selected object proposals for efficient sequence-level animal species classification.
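
A minimal sketch of the proposal-then-classify pattern in task (1) is given below: background subtraction generates foreground region proposals, and a small CNN labels each patch as human, animal, or background. The untrained ResNet-18 stands in for the optimized DCNN described above; cross-frame verification and the later stages are omitted.

```python
# Sketch: background-subtraction region proposals + CNN patch classification.
import cv2
import torch
from torchvision.models import resnet18

CLASSES = ["human", "animal", "background"]
classifier = resnet18(num_classes=len(CLASSES))   # stand-in for the tuned DCNN
classifier.eval()

def classify_proposals(frames, min_area=900):
    """Yield (box, label) for foreground regions found across a frame burst."""
    subtractor = cv2.createBackgroundSubtractorMOG2()
    for frame in frames:                          # frame: HxWx3 uint8 array
        mask = subtractor.apply(frame)            # foreground mask
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
        for c in contours:
            x, y, w, h = cv2.boundingRect(c)
            if w * h < min_area:                  # drop small noise blobs
                continue
            patch = cv2.resize(frame[y:y + h, x:x + w], (224, 224))
            tensor = torch.from_numpy(patch).permute(2, 0, 1).float() / 255
            with torch.no_grad():
                pred = classifier(tensor.unsqueeze(0)).argmax(1).item()
            yield (x, y, w, h), CLASSES[pred]
```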

