Fusion of Deep Learning Features for Wild Animal Detection

Author(s):  
Anamika Dhillon ◽  
Gyanendra K. Verma
2020 ◽  
Vol E103.B (12) ◽  
pp. 1394-1402
Author(s):  
Hiroshi SAITO ◽  
Tatsuki OTAKE ◽  
Hayato KATO ◽  
Masayuki TOKUTAKE ◽  
Shogo SEMBA ◽  
...  

Author(s):  
Amal A. Moustafa ◽  
Ahmed Elnakib ◽  
Nihal F. F. Areed

This paper presents a methodology for Age-Invariant Face Recognition (AIFR), based on the optimization of deep learning features. The proposed method extracts deep learning features using transfer deep learning, extracted from the unprocessed face images. To optimize the extracted features, a Genetic Algorithm (GA) procedure is designed in order to select the most relevant features to the problem of identifying a person based on his/her facial images over different ages. For classification, K-Nearest Neighbor (KNN) classifiers with different distance metrics are investigated, i.e., Correlation, Euclidian, Cosine, and Manhattan distance metrics. Experimental results using a Manhattan distance KNN classifier achieves the best Rank-1 recognition rate of 86.2% and 96% on the standard FGNET and MORPH datasets, respectively. Compared to the state-of-the-art methods, our proposed method needs no preprocessing stages. In addition, the experiments show its privilege over other related methods.


2019 ◽  
Author(s):  
◽  
Hayder Yousif

[ACCESS RESTRICTED TO THE UNIVERSITY OF MISSOURI AT REQUEST OF AUTHOR.] Camera traps are a popular tool to sample animal populations because they are noninvasive, detect a variety of species, and can record many thousands of animal detections per deployment. Cameras are typically set to take bursts of multiple images for each detection, and are deployed in arrays of dozens or hundreds of sites, often resulting in millions of images per study. The task of converting images to animal detection records from such large image collections is daunting, and made worse by situations that generate copious empty pictures from false triggers (e.g. camera malfunction or moving vegetation) or pictures of humans. We offer the first widely available computer vision tool for processing camera trap images. Our results show that the tool is accurate and results in substantial time savings for processing large image datasets, thus improving our ability to monitor wildlife across large scales with camera traps. In this dissertation, we have developed new image/video processing and computer vision algorithms for efficient and accurate object detection and sequence-level classiffication from natural scene camera-trap images. This work addresses the following five major tasks: (1) Human-animal detection. We develop a fast and accurate scheme for human-animal detection from highly cluttered camera-trap images using joint background modeling and deep learning classification. Specifically, first, We develop an effective background modeling and subtraction scheme to generate region proposals for the foreground objects. We then develop a cross-frame image patch verification to reduce the number of foreground object proposals. Finally, We perform complexity-accuracy analysis of deep convolutional neural networks (DCNN) to develop a fast deep learning classification scheme to classify these region proposals into three categories: human, animals, and background patches. The optimized DCNN is able to maintain high level of accuracy while reducing the computational complexity by 14 times. Our experimental results demonstrate that the proposed method outperforms existing methods on the camera-trap dataset. (2) Object segmentation from natural scene. We first design and train a fast DCNN for animal-human-background object classification, which is used to analyze the input image to generate multi-layer feature maps, representing the responses of different image regions to the animal-human-background classifier. From these feature maps, we construct the so-called deep objectness graph for accurate animal-human object segmentation with graph cut. The segmented object regions from each image in the sequence are then verfied and fused in the temporal domain using background modeling. Our experimental results demonstrate that our proposed method outperforms existing state-of-the-art methods on the camera-trap dataset with highly cluttered natural scenes. (3) DCNN domain background modeling. We replaced the background model with a new more efficient deep learning based model. The input frames are segmented into regions through the deep objectness graph then the region boundaries of the input frames are multiplied by each other to obtain the regions of movement patches. We construct the background representation using the temporal information of the co-located patches. We propose to fuse the subtraction and foreground/background pixel classiffcation of two representation : a) chromaticity and b) deep pixel information. (4) Sequence-level object classiffcation. We proposed a new method for sequence-level video recognition with application to animal species recognition from camera trap images. First, using background modeling and cross-frame patch verification, we developed a scheme to generate candidate object regions or object proposals in the spatiotemporal domain. Second, we develop a dynamic programming optimization approach to identify the best temporal subset of object proposals. Third, we aggregate and fuse the features of these selected object proposals for efficient sequence-level animal species classification.


Sign in / Sign up

Export Citation Format

Share Document