Deep neural networks for animal object detection and recognition in the wild

2019 ◽  
Author(s):  
Hayder Yousif

Camera traps are a popular tool to sample animal populations because they are noninvasive, detect a variety of species, and can record many thousands of animal detections per deployment. Cameras are typically set to take bursts of multiple images for each detection and are deployed in arrays of dozens or hundreds of sites, often resulting in millions of images per study. The task of converting images to animal detection records from such large image collections is daunting, and is made worse by situations that generate copious empty pictures from false triggers (e.g. camera malfunction or moving vegetation) or pictures of humans. We offer the first widely available computer vision tool for processing camera trap images. Our results show that the tool is accurate and yields substantial time savings when processing large image datasets, thus improving our ability to monitor wildlife across large scales with camera traps. In this dissertation, we have developed new image/video processing and computer vision algorithms for efficient and accurate object detection and sequence-level classification from natural-scene camera-trap images. This work addresses the following major tasks:

(1) Human-animal detection. We develop a fast and accurate scheme for human-animal detection from highly cluttered camera-trap images using joint background modeling and deep learning classification. Specifically, we first develop an effective background modeling and subtraction scheme to generate region proposals for the foreground objects. We then develop a cross-frame image patch verification to reduce the number of foreground object proposals. Finally, we perform a complexity-accuracy analysis of deep convolutional neural networks (DCNNs) to develop a fast deep learning classification scheme that classifies these region proposals into three categories: human, animal, and background patches. The optimized DCNN maintains a high level of accuracy while reducing the computational complexity by 14 times. Our experimental results demonstrate that the proposed method outperforms existing methods on the camera-trap dataset.

(2) Object segmentation from natural scenes. We first design and train a fast DCNN for animal-human-background object classification, which is used to analyze the input image and generate multi-layer feature maps representing the responses of different image regions to the animal-human-background classifier. From these feature maps, we construct the so-called deep objectness graph for accurate animal-human object segmentation with graph cut. The segmented object regions from each image in the sequence are then verified and fused in the temporal domain using background modeling. Our experimental results demonstrate that our proposed method outperforms existing state-of-the-art methods on the camera-trap dataset with highly cluttered natural scenes.

(3) DCNN-domain background modeling. We replace the background model with a new, more efficient deep-learning-based model. The input frames are segmented into regions through the deep objectness graph, and the region boundaries of the input frames are then multiplied by each other to obtain the moving-region patches. We construct the background representation using the temporal information of the co-located patches. We propose to fuse the subtraction and foreground/background pixel classification of two representations: a) chromaticity and b) deep pixel information.
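To make the region-proposal step of task (1) concrete, here is a minimal Python sketch using OpenCV's MOG2 background subtractor as a stand-in for the dissertation's joint background model; the function name `propose_regions` and the `min_area` filter are illustrative, and the classification of each proposed patch by the optimized DCNN is omitted.

```python
import cv2
import numpy as np

# Sketch only: MOG2 stands in for the dissertation's background model.
subtractor = cv2.createBackgroundSubtractorMOG2(history=50, varThreshold=25)

def propose_regions(frames, min_area=400):
    """Yield (frame_index, bounding_box) foreground proposals from a burst of frames."""
    for i, frame in enumerate(frames):
        mask = subtractor.apply(frame)
        # Suppress shadows (MOG2 marks them as 127) and clean up speckle noise.
        mask = cv2.threshold(mask, 200, 255, cv2.THRESH_BINARY)[1]
        mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
        for c in contours:
            if cv2.contourArea(c) >= min_area:
                yield i, cv2.boundingRect(c)  # (x, y, w, h)
```

Each yielded patch would then be cropped and passed to the human/animal/background classifier; the cross-frame patch verification step is likewise not shown.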
(4) Sequence-level object classification. We propose a new method for sequence-level video recognition with application to animal species recognition from camera-trap images. First, using background modeling and cross-frame patch verification, we develop a scheme to generate candidate object regions, or object proposals, in the spatiotemporal domain. Second, we develop a dynamic programming optimization approach to identify the best temporal subset of object proposals. Third, we aggregate and fuse the features of these selected object proposals for efficient sequence-level animal species classification.
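The dynamic programming step can be sketched as a Viterbi-style pass that picks one proposal per frame, trading classifier confidence against temporal overlap. This is an illustrative reconstruction under stated assumptions, not the dissertation's exact formulation; `scores`, `overlaps`, and the weight `lam` are assumed inputs.

```python
import numpy as np

def select_proposals(scores, overlaps, lam=0.5):
    """Pick one proposal per frame maximizing classifier score plus
    temporal consistency, via dynamic programming (Viterbi-style).

    scores[t][j]      : classifier confidence of proposal j in frame t
    overlaps[t][i][j] : IoU between proposal i in frame t and proposal j in frame t+1
    """
    T = len(scores)
    best = [np.asarray(scores[0], dtype=float)]  # accumulated scores per proposal
    back = []                                    # backpointers for path recovery
    for t in range(1, T):
        s = np.asarray(scores[t], dtype=float)
        # trans[i][j] = accumulated score of ending at proposal j via predecessor i
        trans = best[-1][:, None] + lam * np.asarray(overlaps[t - 1])
        back.append(trans.argmax(axis=0))
        best.append(s + trans.max(axis=0))
    # Backtrack the highest-scoring temporal path.
    path = [int(best[-1].argmax())]
    for bp in reversed(back):
        path.append(int(bp[path[-1]]))
    return path[::-1]  # selected proposal index per frame
```

The selected proposals' features would then be aggregated (e.g. averaged) for the sequence-level species classifier.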

2019 ◽  
Author(s):  
Eric Devost ◽  
Sandra Lai ◽  
Nicolas Casajus ◽  
Dominique Berteaux

SUMMARY
Camera traps now represent a reliable, efficient and cost-effective technique to monitor wildlife and collect biological data in the field. However, efficiently extracting information from the massive amount of images generated is often extremely time-consuming and may now represent the most rate-limiting step in camera trap studies.
To help overcome this challenge, we developed FoxMask, a new tool performing automatic detection of animal presence in short sequences of camera trap images. FoxMask uses background estimation and foreground segmentation algorithms to detect the presence of moving objects (most likely, animals) in images.
We analyzed a sample dataset from camera traps used to monitor activity on arctic fox Vulpes lagopus dens to test the parameter settings and the performance of the algorithm. The shape and color of arctic foxes and their background at snowmelt and during the summer growing season were highly variable, thus offering challenging testing conditions. We compared the automated animal detection performed by FoxMask to a manual review of the image series.
The performance analysis indicated that the proportion of images correctly classified by FoxMask as containing an animal or not was very high (>90%). FoxMask is thus highly efficient at reducing the workload by eliminating most false triggers (images without an animal). We provide parameter recommendations to facilitate usage, and we present the cases where the algorithm performs less efficiently to stimulate further development.
FoxMask is an easy-to-use tool freely available to ecologists performing camera trap data extraction. By minimizing analytical time, computer-assisted image analysis will allow collection of increased sample sizes and testing of new biological questions.
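FoxMask's exact background-estimation and segmentation algorithms are not detailed here, but the core idea of flagging images with moving objects in a short burst can be sketched with a median-image background model; `detect_motion`, `diff_thresh`, and `min_area` below are hypothetical names and settings, not FoxMask's actual parameters.

```python
import cv2
import numpy as np

def detect_motion(image_paths, diff_thresh=30, min_area=500):
    """Flag images in a short camera-trap sequence that contain a moving
    object, using the per-pixel median image as the background estimate."""
    frames = [cv2.imread(p, cv2.IMREAD_GRAYSCALE) for p in image_paths]
    background = np.median(np.stack(frames), axis=0).astype(np.uint8)
    flagged = []
    for path, frame in zip(image_paths, frames):
        diff = cv2.absdiff(frame, background)
        mask = cv2.threshold(diff, diff_thresh, 255, cv2.THRESH_BINARY)[1]
        mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
        if any(cv2.contourArea(c) >= min_area for c in contours):
            flagged.append(path)  # likely contains an animal
    return flagged
```

Images not flagged would be treated as false triggers and removed from the manual-review queue.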


2021 ◽  
Vol 43 (4) ◽  
pp. 139-151
Author(s):  
Nguyen Ai Tam ◽  
Nguyen Van Tay ◽  
Nguyen Thi Kim Yen ◽  
Ha Thang Long

Kon Ka Kinh National Park (KKK NP) is a priority zone for biodiversity protection in Vietnam as well as in ASEAN. To survey the current diversity of fauna in the southern part of the KKK NP, we conducted camera-trapping surveys in 2017, 2018, and 2019. Twenty-eight infrared camera traps were set up at elevations between 1,041 and 1,497 meters, for a total of 360 camera-trap survey days. As a result, we recorded a total of 27 animal species, five of which are listed in the IUCN Red List of Threatened Species (IUCN, 2020). The survey results showed a high richness of wildlife in the southern park region, and they also revealed human disturbance to wildlife in the park. This was the first time camera traps were used to survey wildlife diversity in the southern region of the KKK NP. Conducting camera-trap surveys across the whole KKK NP is essential for monitoring and identifying priority areas for wildlife conservation in the national park.


Computers ◽  
2022 ◽  
Vol 11 (1) ◽  
pp. 13
Author(s):  
Imran Zualkernan ◽  
Salam Dhou ◽  
Jacky Judas ◽  
Ali Reza Sajun ◽  
Brylle Ryan Gomez ◽  
...  

Camera traps deployed in remote locations provide an effective method for ecologists to monitor and study wildlife in a non-invasive way. However, current camera traps suffer from two problems. First, the images are manually classified and counted, which is expensive. Second, due to manual coding, the results are often stale by the time they reach the ecologists. Using the Internet of Things (IoT) combined with deep learning represents a good solution to both problems, as the images can be classified automatically and the results made immediately available to ecologists. This paper proposes an IoT architecture that uses deep learning on edge devices to convey animal classification results to a mobile app using the LoRaWAN low-power, wide-area network. The primary goal of the proposed approach is to reduce the cost of the wildlife monitoring process for ecologists and to provide real-time animal sightings data from the camera traps in the field. Camera trap image data consisting of 66,400 images were used to train the InceptionV3, MobileNetV2, ResNet18, EfficientNetB1, DenseNet121, and Xception neural network models. While the performance of the trained models was statistically different (Kruskal–Wallis: accuracy H(5) = 22.34, p < 0.05; F1-score H(5) = 13.82, p = 0.0168), there was only a 3% difference in the F1-score between the worst (MobileNetV2) and the best model (Xception). Moreover, the models made similar errors (Adjusted Rand Index (ARI) > 0.88 and Adjusted Mutual Information (AMI) > 0.82). Subsequently, the best model, Xception (accuracy = 96.1%; F1-score = 0.87; F1-score = 0.97 with oversampling), was optimized and deployed on the Raspberry Pi, Google Coral, and Nvidia Jetson edge devices using both the TensorFlow Lite and TensorRT frameworks. Optimizing the models to run on edge devices reduced the average macro F1-score to 0.7 and adversely affected the minority classes, reducing their F1-score to as low as 0.18. Upon stress testing, by processing 1000 images consecutively, the Jetson Nano, running a TensorRT model, outperformed the others with a latency of 0.276 s/image (s.d. = 0.002) while consuming an average current of 1665.21 mA. The Raspberry Pi consumed the least average current (838.99 mA) with a ten-times-worse latency of 2.83 s/image (s.d. = 0.036). The Nano was the only reasonable option as an edge device because it could capture most animals, whose maximum speeds were below 80 km/h, including goats, lions, and ostriches. While the proposed architecture is viable, unbalanced data remain a challenge, and the results could potentially be improved by using object detection to reduce imbalances and by exploring semi-supervised learning.
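As a rough illustration of the edge-deployment step, the sketch below converts a Keras model to TensorFlow Lite with post-training optimization and times per-image inference, mirroring the paper's latency stress test; the Xception instantiation, `classes=25`, and the 100-image loop are placeholders, not the paper's actual configuration.

```python
import time
import numpy as np
import tensorflow as tf

# Placeholder model: an untrained Xception head; the paper's class count is assumed.
model = tf.keras.applications.Xception(weights=None, classes=25)

# Convert to TensorFlow Lite with default post-training optimizations.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

interpreter = tf.lite.Interpreter(model_content=tflite_model)
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

latencies = []
for _ in range(100):  # the paper stress-tested 1000 consecutive images
    image = np.random.rand(1, *inp["shape"][1:]).astype(np.float32)
    start = time.perf_counter()
    interpreter.set_tensor(inp["index"], image)
    interpreter.invoke()
    _ = interpreter.get_tensor(out["index"])
    latencies.append(time.perf_counter() - start)

print(f"mean latency: {np.mean(latencies):.3f} s/image (s.d. {np.std(latencies):.3f})")
```

On the Jetson devices the paper instead used TensorRT, which follows a different conversion path; this sketch covers only the TensorFlow Lite route.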


2018 ◽  
Vol 115 (25) ◽  
pp. E5716-E5725 ◽  
Author(s):  
Mohammad Sadegh Norouzzadeh ◽  
Anh Nguyen ◽  
Margaret Kosmala ◽  
Alexandra Swanson ◽  
Meredith S. Palmer ◽  
...  

Having accurate, detailed, and up-to-date information about the location and behavior of animals in the wild would improve our ability to study and conserve ecosystems. We investigate the ability to automatically, accurately, and inexpensively collect such data, which could help catalyze the transformation of many fields of ecology, wildlife biology, zoology, conservation biology, and animal behavior into “big data” sciences. Motion-sensor “camera traps” enable collecting wildlife pictures inexpensively, unobtrusively, and frequently. However, extracting information from these pictures remains an expensive, time-consuming, manual task. We demonstrate that such information can be automatically extracted by deep learning, a cutting-edge type of artificial intelligence. We train deep convolutional neural networks to identify, count, and describe the behaviors of 48 species in the 3.2 million-image Snapshot Serengeti dataset. Our deep neural networks automatically identify animals with >93.8% accuracy, and we expect that number to improve rapidly in years to come. More importantly, if our system classifies only images it is confident about, our system can automate animal identification for 99.3% of the data while still performing at the same 96.6% accuracy as that of crowdsourced teams of human volunteers, saving >8.4 y (i.e., >17,000 h at 40 h/wk) of human labeling effort on this 3.2 million-image dataset. Those efficiency gains highlight the importance of using deep neural networks to automate data extraction from camera-trap images, reducing a roadblock for this widely used technology. Our results suggest that deep learning could enable the inexpensive, unobtrusive, high-volume, and even real-time collection of a wealth of information about vast numbers of animals in the wild.
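The confidence-gated pipeline described above can be sketched in a few lines: images whose top-class probability clears a threshold are labeled automatically, and the rest are routed to human volunteers. The function name and the 0.95 threshold below are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def triage(probabilities, threshold=0.95):
    """Split predictions into auto-labeled and human-review sets based on
    the network's top-class confidence."""
    probabilities = np.asarray(probabilities)   # shape: (n_images, n_classes)
    confidence = probabilities.max(axis=1)
    labels = probabilities.argmax(axis=1)
    auto = confidence >= threshold
    # Labels to keep automatically, and indices of images needing human review.
    return labels[auto], np.flatnonzero(~auto)
```

Raising the threshold trades automation rate for accuracy, which is how the system matches human-volunteer accuracy on 99.3% of the data.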


2018 ◽  
Author(s):  
Michael A. Tabak ◽  
Mohammad S. Norouzzadeh ◽  
David W. Wolfson ◽  
Steven J. Sweeney ◽  
Kurt C. VerCauteren ◽  
...  

Abstract
1. Motion-activated cameras ("camera traps") are increasingly used in ecological and management studies for remotely observing wildlife and have been regarded as among the most powerful tools for wildlife research. However, studies involving camera traps result in millions of images that need to be analyzed, typically by visually observing each image, in order to extract data that can be used in ecological analyses.
2. We trained machine learning models using convolutional neural networks with the ResNet-18 architecture and 3,367,383 images to automatically classify wildlife species from camera trap images obtained from five states across the United States. We tested our model on an independent subset of images from the United States not seen during training and on an out-of-sample (or "out-of-distribution" in the machine learning literature) dataset of ungulate images from Canada. We also tested the ability of our model to distinguish empty images from those with animals in another out-of-sample dataset from Tanzania, containing a faunal community that was novel to the model.
3. The trained model classified approximately 2,000 images per minute on a laptop computer with 16 gigabytes of RAM and achieved 98% accuracy at identifying species in the United States, the highest accuracy of such a model to date. Out-of-sample validation from Canada achieved 82% accuracy, and the model correctly identified 94% of images containing an animal in the dataset from Tanzania. We provide an R package (Machine Learning for Wildlife Image Classification; MLWIC) that allows users to (a) implement the trained model presented here and (b) train their own model using classified images of wildlife from their studies.
4. The use of machine learning to rapidly and accurately classify wildlife in camera trap images can facilitate non-invasive sampling designs in ecological studies by reducing the burden of manually analyzing images. We present an R package making these methods accessible to ecologists. We discuss the implications of this technology for ecology and considerations that should be addressed in future implementations of these methods.
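A minimal PyTorch sketch of the kind of training setup described in point 2, assuming a ResNet-18 backbone fine-tuned for species classification; `num_species`, the optimizer settings, and the data loader are placeholders rather than the authors' actual configuration, which is distributed via the MLWIC R package.

```python
import torch
import torch.nn as nn
from torchvision import models

# Placeholder class count; the paper's exact species list is not reproduced here.
num_species = 28

# ResNet-18 backbone with a replaced classification head.
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, num_species)

optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
criterion = nn.CrossEntropyLoss()

def train_epoch(loader, device="cuda"):
    """One pass over a (hypothetical) DataLoader of labeled camera-trap crops."""
    model.to(device).train()
    for images, targets in loader:
        optimizer.zero_grad()
        loss = criterion(model(images.to(device)), targets.to(device))
        loss.backward()
        optimizer.step()
```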


Author(s):  
Matthew Kutugata ◽  
Jeremy Baumgardt ◽  
John A. Goolsby ◽  
Alexis E. Racelis

Abstract Camera traps provide a low-cost approach to collecting data and monitoring wildlife across large scales, but hand-labeling images at a rate that outpaces their accumulation is difficult. Deep learning, a subdiscipline of machine learning and computer science, can address the issue of automatically classifying camera-trap images with a high degree of accuracy. This technique, however, may be less accessible to ecologists or small-scale conservation projects, and it has serious limitations. In this study, we trained a simple deep learning model using a dataset of 120,000 images to identify the presence of nilgai Boselaphus tragocamelus, a regionally specific nonnative game animal, in camera-trap images with an overall accuracy of 97%. We trained a second model to identify 20 groups of animals and one group of images without any animals present, labeled as "none," with an accuracy of 89%. Lastly, we tested the multigroup model on images of similar species collected in the southwestern United States, resulting in significantly lower precision and recall for each group. This study highlights the potential of deep learning for automating camera-trap image-processing workflows, provides a brief overview of image-based deep learning, and discusses the often-understated limitations and methodological considerations in the context of wildlife conservation and species monitoring.
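The per-group precision and recall comparison in the final experiment can be reproduced with a standard classification report; the sketch below uses scikit-learn with toy placeholder labels standing in for the study's actual predictions.

```python
from sklearn.metrics import classification_report

# Toy placeholder labels; real inputs would be the model's predictions on the
# out-of-sample southwestern US images versus the hand-labeled ground truth.
y_true = ["nilgai", "none", "coyote", "nilgai", "none"]
y_pred = ["nilgai", "nilgai", "coyote", "nilgai", "none"]

# Prints per-group precision, recall, and F1, exposing which groups degrade.
print(classification_report(y_true, y_pred, zero_division=0))
```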


Author(s):  
A. G. Zotin ◽  
A. V. Proskurin

Abstract. Camera traps, which provide an enormous number of images during a season, help to remotely observe animals in the wild. However, analyzing such image collections manually is impossible. In this research, we develop a method for automatic animal detection based on background modeling of scenes under complex shooting conditions. First, we design a fast algorithm to select images without motion. Second, the images are processed by a modified Multi-Scale Retinex (MSR) algorithm to correct uneven illumination. Finally, the background is subtracted from each incoming image using an adaptive threshold. The threshold value is adjusted by a saliency map, which is calculated using a pyramid consisting of the original image and images modified by the MSR algorithm. The proposed method achieves high animal-detection performance.
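The illumination-correction step can be sketched with the classic Multi-Scale Retinex formulation (the log-ratio of the image to Gaussian-smoothed versions of itself at several scales); the paper uses a modified MSR, so the `sigmas` and normalization below are standard defaults, not the authors' settings.

```python
import cv2
import numpy as np

def multiscale_retinex(image, sigmas=(15, 80, 250)):
    """Classic MSR: average the log-ratio of the image to Gaussian-blurred
    versions at several scales, evening out uneven illumination."""
    img = image.astype(np.float64) + 1.0            # avoid log(0)
    msr = np.zeros_like(img)
    for sigma in sigmas:
        blur = cv2.GaussianBlur(img, (0, 0), sigma)  # kernel size derived from sigma
        msr += np.log(img) - np.log(blur)
    msr /= len(sigmas)
    # Rescale to a displayable 8-bit range.
    return cv2.normalize(msr, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
```

The corrected images would then feed the background-subtraction stage, with the saliency-map pyramid tuning its adaptive threshold.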


2018 ◽  
Vol 10 (1) ◽  
pp. 80-91 ◽  
Author(s):  
Marco Willi ◽  
Ross T. Pitman ◽  
Anabelle W. Cardoso ◽  
Christina Locke ◽  
Alexandra Swanson ◽  
...  

2020 ◽  
Vol E103.B (12) ◽  
pp. 1394-1402
Author(s):  
Hiroshi SAITO ◽  
Tatsuki OTAKE ◽  
Hayato KATO ◽  
Masayuki TOKUTAKE ◽  
Shogo SEMBA ◽  
...  
