Automatic Recognition of Flock Behavior of Chickens with Convolutional Neural Network and Kinect Sensor

Author(s):  
Haitao Pu ◽  
Jian Lian ◽  
Mingqu Fan

In this paper, we propose an automatic convolutional neural network (CNN)-based method to recognize chicken flock behavior within a poultry farm using a Kinect sensor. It addresses the difficulties of flock behavior image classification by leveraging a data-driven mechanism and exploiting automatically extracted multi-scale image features that combine both the local and global characteristics of the image. To the best of our knowledge, this is probably the first application of a deep learning strategy to domestic animal behavior recognition. To verify the performance of our proposed method, we conducted comparative experiments between state-of-the-art methods and ours. Experimental results show that our approach outperforms the state-of-the-art methods in both effectiveness and efficiency. Our proposed CNN architecture for recognizing chicken flock behavior achieves an accuracy of 99.17%.
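As a rough illustration of the stated idea of combining local and global image characteristics in one classifier, the following PyTorch sketch pools features at two network depths and concatenates them; the layer sizes, Kinect depth-frame resolution, and the number of behavior classes are our assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class FlockBehaviorCNN(nn.Module):
    """Minimal multi-scale CNN sketch: shallow (local) and deep (more
    global) features are pooled and concatenated before classification."""
    def __init__(self, num_classes=4):  # hypothetical behavior classes
        super().__init__()
        self.block1 = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        self.block2 = nn.Sequential(
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(16 + 32, num_classes)

    def forward(self, x):          # x: Kinect depth frame, (B, 1, H, W)
        f1 = self.block1(x)        # local, shallow features
        f2 = self.block2(f1)       # deeper, more global features
        v = torch.cat([self.pool(f1).flatten(1),
                       self.pool(f2).flatten(1)], dim=1)
        return self.fc(v)

logits = FlockBehaviorCNN()(torch.randn(2, 1, 120, 160))
print(logits.shape)  # torch.Size([2, 4])
```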

2018 ◽  
Vol 4 (9) ◽  
pp. 107 ◽  
Author(s):  
Mohib Ullah ◽  
Ahmed Mohammed ◽  
Faouzi Alaya Cheikh

Articulation modeling, feature extraction, and classification are the important components of pedestrian segmentation. Usually, these components are modeled independently of each other and then combined sequentially. However, this approach is prone to poor segmentation if any individual component is weakly designed. To cope with this problem, we propose a spatio-temporal convolutional neural network named PedNet which exploits temporal information for spatial segmentation. The backbone of PedNet consists of an encoder–decoder network for downsampling and upsampling the feature maps, respectively. The input to the network is a set of three frames and the output is a binary mask of the segmented regions in the middle frame. Unlike classical deep models where the convolution layers are followed by a fully connected layer for classification, PedNet is a Fully Convolutional Network (FCN). It is trained end-to-end and segmentation is achieved without the need for any pre- or post-processing. The main characteristic of PedNet is its unique design: it performs segmentation on a frame-by-frame basis, but it uses temporal information from the previous and the following frame to segment pedestrians in the current frame. Moreover, to combine the low-level features with the high-level semantic information learned by the deeper layers, we used long skip connections from the encoder to the decoder network and concatenated the output of the low-level layers with that of the higher-level layers. This approach helps to obtain segmentation maps with sharp boundaries. To show the potential benefits of temporal information, we also visualized different layers of the network. The visualization showed that the network learned different information from the consecutive frames and then combined it optimally to segment the middle frame. We evaluated our approach on eight challenging datasets where humans are involved in different activities with severe articulation (football, road crossing, surveillance). On the widely used CamVid dataset, the performance of the segmentation algorithm is compared against seven state-of-the-art methods. Performance is reported in terms of precision/recall, F1, F2, and mIoU. The qualitative and quantitative results show that PedNet achieves promising results against state-of-the-art methods, with substantial improvement in all the performance metrics.
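The following sketch captures the described input/output contract: three consecutive frames stacked along the channel axis, an encoder–decoder with one long skip connection, and a one-channel mask for the middle frame. The depth, widths, and upsampling choice are our simplifications, not the published PedNet configuration.

```python
import torch
import torch.nn as nn

class PedNetSketch(nn.Module):
    """Illustrative spatio-temporal encoder-decoder: frames t-1, t, t+1
    in, binary-mask logits for frame t out."""
    def __init__(self):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(9, 32, 3, padding=1), nn.ReLU())
        self.down = nn.MaxPool2d(2)
        self.enc2 = nn.Sequential(nn.Conv2d(32, 64, 3, padding=1), nn.ReLU())
        self.up = nn.Upsample(scale_factor=2, mode='bilinear',
                              align_corners=False)
        # Long skip: upsampled deep features are concatenated with the
        # matching encoder features (64 + 32 channels) for sharp boundaries.
        self.dec = nn.Sequential(nn.Conv2d(96, 32, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(32, 1, 1))  # mask logits

    def forward(self, frames):            # frames: (B, 3, 3, H, W)
        x = frames.flatten(1, 2)          # stack 3 RGB frames -> 9 channels
        e1 = self.enc1(x)
        e2 = self.enc2(self.down(e1))
        d = torch.cat([self.up(e2), e1], dim=1)
        return self.dec(d)                # apply sigmoid at loss/inference

mask = PedNetSketch()(torch.randn(1, 3, 3, 64, 64))
print(mask.shape)  # torch.Size([1, 1, 64, 64])
```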


Author(s):  
Jianwen Jiang ◽  
Di Bao ◽  
Ziqiang Chen ◽  
Xibin Zhao ◽  
Yue Gao

3D shape retrieval has attracted much attention and has recently become a hot topic in the computer vision field. With the development of deep learning, 3D shape retrieval has also made great progress, and many view-based methods have been introduced in recent years. However, how to represent 3D shapes better is still a challenging problem. At the same time, the intrinsic hierarchical associations among views have not been well utilized. In order to tackle these problems, in this paper we propose a multi-loop-view convolutional neural network (MLVCNN) framework for 3D shape retrieval. In this method, multiple groups of views are first extracted from different loop directions. Given these multiple loop views, the proposed MLVCNN framework introduces a hierarchical view-loop-shape architecture, i.e., the view level, the loop level, and the shape level, to conduct 3D shape representation at different scales. At the view level, a convolutional neural network is first trained to extract view features. Then, the proposed Loop Normalization and an LSTM are utilized for each view loop to generate the loop-level features, which consider the intrinsic associations of the different views in the same loop. Finally, all the loop-level descriptors are combined into a shape-level descriptor for 3D shape representation, which is used for 3D shape retrieval. Our proposed method has been evaluated on the public 3D shape benchmark ModelNet40. Experiments and comparisons with the state-of-the-art methods show that the proposed MLVCNN method achieves significant performance improvement on 3D shape retrieval tasks, outperforming the state-of-the-art methods by 4.84% in mAP. We have also evaluated the proposed method on the 3D shape classification task, where MLVCNN also achieves superior performance compared with recent methods.
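A compact sketch of the view → loop → shape hierarchy described above: a per-view CNN, per-loop sequence modeling with an LSTM, and fusion into one shape descriptor. The view extractor, feature sizes, and mean fusion are simplifications; Loop Normalization is approximated here by a LayerNorm over each loop, which is our stand-in rather than the paper's operator.

```python
import torch
import torch.nn as nn

class MLVCNNSketch(nn.Module):
    """View-level CNN -> loop-level LSTM -> shape-level descriptor."""
    def __init__(self, feat=64):
        super().__init__()
        self.view_cnn = nn.Sequential(          # stand-in view extractor
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, feat))
        self.loop_norm = nn.LayerNorm(feat)     # stand-in for Loop Norm.
        self.lstm = nn.LSTM(feat, feat, batch_first=True)

    def forward(self, loops):                   # (B, L, V, 1, H, W)
        B, L, V = loops.shape[:3]
        views = self.view_cnn(loops.flatten(0, 2))        # (B*L*V, feat)
        views = self.loop_norm(views.view(B * L, V, -1))  # per-loop norm
        _, (h, _) = self.lstm(views)                      # loop-level code
        loop_feats = h[-1].view(B, L, -1)
        return loop_feats.mean(dim=1)            # shape-level descriptor

desc = MLVCNNSketch()(torch.randn(2, 3, 12, 1, 32, 32))
print(desc.shape)  # torch.Size([2, 64])
```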


2020 ◽  
Vol 10 (11) ◽  
pp. 2733-2738
Author(s):  
Yanxia Sun ◽  
Peiqing ◽  
Xiaoxu Geng ◽  
Haiying Wang ◽  
Jinke Wang ◽  
...  

Accurate optic cup and optic disc (OC, OD) segmentation is the prerequisite for cup-to-disc ratio (CDR) calculation. In this paper, a new fully convolutional network (FCN) with a multi-scale residual module is proposed. Firstly, a polar coordinate transformation is introduced to balance the cup-disc proportion under spatial constraints, and CLAHE is applied to the fundus images for contrast enhancement. Secondly, the W-Net-R model is proposed as the main framework, with the standard convolution unit replaced by the multi-scale residual module. Finally, a multi-label cost function is utilized to guide training. In the experiments, the REFUGE dataset was used for training, validation, and testing. We obtained MIoU of 0.979 and 0.904 for OD and OC segmentation, a relative improvement of 4.04% and 3.55% over U-Net, respectively. The experimental results prove that our proposed method is superior to other state-of-the-art schemes for OC and OD segmentation, and it could be a useful tool for early glaucoma screening.
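The two preprocessing steps named above (CLAHE contrast enhancement and the polar transform) can be sketched with OpenCV as follows. The tile size, clip limit, output size, and the assumption that the disc center and radius are already known are ours, not parameters reported in the paper.

```python
import cv2
import numpy as np

def preprocess(fundus_bgr, disc_center, radius):
    # CLAHE on the luminance channel only, to enhance contrast without
    # distorting color.
    lab = cv2.cvtColor(fundus_bgr, cv2.COLOR_BGR2LAB)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    lab[:, :, 0] = clahe.apply(lab[:, :, 0])
    enhanced = cv2.cvtColor(lab, cv2.COLOR_LAB2BGR)
    # Polar coordinates turn the roughly concentric cup/disc rings into
    # horizontal bands, balancing their pixel share in the image.
    polar = cv2.warpPolar(enhanced, (256, 256), disc_center, radius,
                          cv2.WARP_POLAR_LINEAR)
    return polar

img = (np.random.rand(512, 512, 3) * 255).astype(np.uint8)  # stand-in image
print(preprocess(img, (256, 256), 200).shape)  # (256, 256, 3)
```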


2021 ◽  
Vol 303 ◽  
pp. 01058
Author(s):  
Meng-Di Deng ◽  
Rui-Sheng Jia ◽  
Hong-Mei Sun ◽  
Xing-Li Zhang

The resolution of seismic section images directly affects the subsequent interpretation of seismic data. In order to improve the spatial resolution of low-resolution seismic section images, a super-resolution reconstruction method based on multi-scale convolution is proposed. This method designs a multi-scale convolutional neural network to learn high/low-resolution image feature pairs, realizing mapping from low-resolution to high-resolution seismic section images. The multi-scale convolutional neural network model consists of four convolutional layers and a sub-pixel convolution layer. The convolution operations learn abundant seismic section image features, and the sub-pixel convolution layer reconstructs the high-resolution seismic section image. The experimental results show that the proposed method is superior to the comparison methods in peak signal-to-noise ratio (PSNR) and structural similarity (SSIM). In total training and reconstruction time, our method takes about 22% less than the FSRCNN method and about 18% less than the ESPCN method.
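The stated layout (four convolutional layers followed by a sub-pixel convolution layer) maps directly onto PyTorch's PixelShuffle, as in the sketch below; the channel widths and kernel sizes are illustrative guesses, not the paper's configuration.

```python
import torch
import torch.nn as nn

class SeismicSRSketch(nn.Module):
    """Four conv layers for feature learning, then a sub-pixel
    (PixelShuffle) layer for upscaling a grayscale seismic section."""
    def __init__(self, scale=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 64, 5, padding=2), nn.ReLU(),
            nn.Conv2d(64, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            # The last conv emits scale**2 channels per output pixel ...
            nn.Conv2d(32, scale ** 2, 3, padding=1))
        # ... which PixelShuffle rearranges into a scale-times larger image.
        self.shuffle = nn.PixelShuffle(scale)

    def forward(self, lr_section):             # (B, 1, H, W)
        return self.shuffle(self.features(lr_section))

sr = SeismicSRSketch(scale=2)(torch.randn(1, 1, 48, 48))
print(sr.shape)  # torch.Size([1, 1, 96, 96])
```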


2020 ◽  
Author(s):  
Fábia Isabella Pires Enembreck ◽  
Erikson Freitas de Morais ◽  
Marcella Scoczynski Ribeiro Martins

Abstract The person re-identification problem addresses the task of identifying whether a person watched by security cameras in a surveillance environment has appeared in the scene before. This problem is considered challenging, since the images obtained by the cameras are subject to many variations, such as lighting, perspective, and occlusions. This work aims to develop two robust approaches based on deep learning techniques for person re-identification that take these variations into account. The first approach uses a Siamese neural network composed of two identical subnets. This model receives two input images that may or may not be of the same person. The second approach consists of a triplet neural network with three identical subnets, which receives a reference image of a certain person, a second image of the same person, and another image of a different person. Both approaches use identical subnets, composed of a convolutional neural network which extracts general characteristics from each image and an autoencoder model responsible for handling the high variations the input images may undergo. To compare the developed networks, three datasets were used, and the accuracy and CMC curve metrics were applied in the analysis. The experiments showed an improvement in the results with the use of the autoencoder in the subnets. Besides, the triplet neural network presented promising results in comparison with the Siamese neural network and state-of-the-art methods.
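The triplet setup described above can be sketched as one shared embedding network applied to anchor, positive, and negative images and trained with a margin-based triplet loss. The placeholder embedding net below stands in for the paper's CNN + autoencoder subnet; image sizes and the margin are our assumptions.

```python
import torch
import torch.nn as nn

# Shared embedding network: the same weights process all three inputs.
embed = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 128))

triplet_loss = nn.TripletMarginLoss(margin=1.0)

anchor = torch.randn(4, 3, 128, 64)    # reference person
positive = torch.randn(4, 3, 128, 64)  # same person, other camera/view
negative = torch.randn(4, 3, 128, 64)  # different person

# Pull anchor-positive together, push anchor-negative apart by a margin.
loss = triplet_loss(embed(anchor), embed(positive), embed(negative))
loss.backward()   # all three branches share the same weights
print(float(loss))
```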


Author(s):  
K. Rahmani ◽  
H. Mayer

In this paper we present a pipeline for high-quality semantic segmentation of building facades using a Structured Random Forest (SRF), a Region Proposal Network (RPN) based on a Convolutional Neural Network (CNN), as well as rectangular fitting optimization. Our main contribution is that we employ features created by the RPN as channels in the SRF. We empirically show that this is very effective, especially for doors and windows. Our pipeline is evaluated on two datasets, where we outperform current state-of-the-art methods. Additionally, we quantify the contribution of the RPN and the rectangular fitting optimization to the accuracy of the result.
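The key idea, using CNN feature maps as additional per-pixel channels for a forest, can be sketched as below. A plain scikit-learn RandomForestClassifier stands in for the Structured Random Forest and a small conv stack stands in for the RPN backbone; both are placeholders for illustration, as are the dummy labels.

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.ensemble import RandomForestClassifier

cnn = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU())  # RPN stand-in

image = torch.rand(1, 3, 64, 64)                 # facade image stand-in
with torch.no_grad():
    feat = cnn(image)                            # (1, 8, 64, 64)
# Append the learned feature maps to the raw RGB channels.
channels = torch.cat([image, feat], dim=1)[0]    # (11, 64, 64)

X = channels.permute(1, 2, 0).reshape(-1, 11).numpy()   # pixels x channels
y = np.random.randint(0, 2, len(X))              # dummy window/door labels
clf = RandomForestClassifier(n_estimators=10).fit(X, y)
print(clf.predict(X[:5]))
```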


Author(s):  
Murali Kanthi ◽  
Thogarcheti Hitendra Sarma ◽  
Chigarapalle Shoba Bindu

Deep learning methods are state-of-the-art approaches for pixel-based hyperspectral image (HSI) classification. High classification accuracy has been achieved by extracting deep features from both spatial and spectral channels. However, the efficiency of such spatial-spectral approaches depends on the spatial dimension of each patch, and there is no theoretically valid approach to find the optimal spatial dimension. It is more appropriate to extract spatial features by considering varying neighborhood scales in the spatial dimensions. In this regard, this article proposes a deep convolutional neural network (CNN) model wherein three different multi-scale spatial-spectral patches are used to extract features in both the spatial and spectral channels. In order to extract these potential features, the proposed deep learning architecture takes three patches of various scales in the spatial dimension. 3D convolution is performed on each selected patch, and the process runs over the entire image. The proposed model is named the multi-scale three-dimensional convolutional neural network (MS-3DCNN). Its efficiency is verified through experimental studies on three publicly available benchmark datasets: Pavia University, Indian Pines, and Salinas. It is empirically shown that the classification accuracy of the proposed model is improved compared with the remaining state-of-the-art methods.
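The multi-scale idea can be sketched as three parallel 3D-convolution branches, one per spatial patch size, whose outputs are concatenated before classification. The patch sizes (5, 7, 9), band count, channel widths, and fusion are our illustrative choices, not the MS-3DCNN specification.

```python
import torch
import torch.nn as nn

class MS3DCNNSketch(nn.Module):
    """Three spatial neighbourhoods of the same pixel, each run through a
    3D convolution over (bands, height, width), then fused for the label."""
    def __init__(self, num_classes=16):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(nn.Conv3d(1, 8, (7, 3, 3), padding=(3, 1, 1)),
                          nn.ReLU(), nn.AdaptiveAvgPool3d(1), nn.Flatten())
            for _ in range(3)])
        self.fc = nn.Linear(3 * 8, num_classes)

    def forward(self, patches):   # list of 3 tensors (B, 1, bands, s, s)
        feats = [b(p) for b, p in zip(self.branches, patches)]
        return self.fc(torch.cat(feats, dim=1))

bands = 30
x = [torch.randn(2, 1, bands, s, s) for s in (5, 7, 9)]  # 3 patch scales
print(MS3DCNNSketch()(x).shape)  # torch.Size([2, 16])
```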


2020 ◽  
Vol 4 (1) ◽  
pp. 87-107
Author(s):  
Ranjan Mondal ◽  
Moni Shankar Dey ◽  
Bhabatosh Chanda

Abstract Mathematical morphology is a powerful tool for image processing tasks. The main difficulty in designing a mathematical morphological algorithm is deciding the order of operators/filters and the corresponding structuring elements (SEs). In this work, we develop a morphological network composed of alternating sequences of dilation and erosion layers, which, depending on the learned SEs, may form opening or closing layers. These layers in the right order, along with a linear combination of their outputs, are useful for extracting image features and processing them. The structuring elements in the network are learned by the back-propagation method, guided by minimization of the loss function. The efficacy of the proposed network is established by applying it to two interesting image restoration problems, namely de-raining and de-hazing. The results are comparable to those of many state-of-the-art algorithms for most of the images. It is also worth mentioning that the number of network parameters is much smaller than that of popular convolutional neural networks for similar tasks. The source code can be found at https://github.com/ranjanZ/Mophological-Opening-Closing-Net
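A dilation layer with a learnable structuring element can be written as a max over shifted pixels plus SE values, which is differentiable and therefore trainable by back-propagation as the abstract describes. The sketch below is our minimal version (the kernel size and zero initialization are assumptions); erosion is the dual, with a min over differences.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LearnedDilation2d(nn.Module):
    """Grayscale dilation with a learnable structuring element:
    out(x) = max_s (in(x + s) + SE(s))."""
    def __init__(self, ksize=5):
        super().__init__()
        self.ksize = ksize
        self.se = nn.Parameter(torch.zeros(ksize * ksize))

    def forward(self, x):                       # x: (B, 1, H, W)
        B, _, H, W = x.shape
        # Gather each pixel's k x k neighbourhood as a column ...
        patches = F.unfold(x, self.ksize, padding=self.ksize // 2)
        patches = patches + self.se.view(1, -1, 1)   # ... add SE values
        out = patches.max(dim=1).values              # morphological max
        return out.view(B, 1, H, W)

# The SE receives gradients through the max, so it is learned by ordinary
# back-propagation, as in the paper:
layer = LearnedDilation2d()
y = layer(torch.rand(1, 1, 32, 32))
y.sum().backward()
print(layer.se.grad is not None)  # True
```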

