Information theory based pruning for CNN compression and its application to image classification and action recognition

Author(s):  
Hai-Hong Phan ◽  
Ngoc-Son Vu
2019 ◽  
Vol 277 ◽  
pp. 02034
Author(s):  
Sophie Aubry ◽  
Sohaib Laraba ◽  
Joëlle Tilmanne ◽  
Thierry Dutoit

In this paper, a methodology for recognizing actions in RGB videos is proposed that takes advantage of recent breakthroughs in deep learning. Following the development of Convolutional Neural Networks (CNNs), research was conducted on transforming skeletal motion data into 2D images. Building on multiple works that study the conversion of RGB-D data into 2D images, the proposed solution requires only RGB videos rather than RGB-D videos. From a video stream (RGB images), a two-dimensional skeleton of 18 joints is extracted for each detected body with OpenPose, a DNN-based human pose estimator. The skeleton data are encoded into the Red, Green and Blue channels of images, and different ways of encoding motion data into images were studied. We successfully use state-of-the-art deep neural networks designed for image classification to recognize actions. Based on a study of the related works, we chose the image classification models SqueezeNet, AlexNet, DenseNet, ResNet, Inception and VGG, and retrained them to perform action recognition. All tests use the NTU RGB+D database. The highest accuracy is obtained with ResNet: 83.317% cross-subject and 88.780% cross-view, which outperforms most state-of-the-art results.
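The joint-to-channel encoding described above can be sketched roughly as follows. This is an illustrative example only: the function `skeleton_sequence_to_image` and its layout (joints as rows, frames as columns, with x, y and confidence mapped to R, G and B) are hypothetical choices, not the paper's exact encoding.

```python
def skeleton_sequence_to_image(seq):
    """Encode a 2D-skeleton sequence as an RGB image (illustrative sketch,
    not necessarily the exact encoding studied in the paper).

    seq: nested list of shape (T, J, 3) holding (x, y, confidence) for
    J joints over T frames, e.g. J = 18 joints from OpenPose.
    Returns a (J, T, 3) nested list of 0-255 ints: rows index joints,
    columns index frames, and x, y, confidence fill the R, G, B channels.
    """
    T, J = len(seq), len(seq[0])
    # Per-channel min/max over the whole sequence, for normalization.
    lows = [min(seq[t][j][c] for t in range(T) for j in range(J)) for c in range(3)]
    highs = [max(seq[t][j][c] for t in range(T) for j in range(J)) for c in range(3)]

    def scale(v, c):
        span = highs[c] - lows[c]
        return round((v - lows[c]) / span * 255) if span else 0

    # Transpose so that joints index rows and frames index columns.
    return [[[scale(seq[t][j][c], c) for c in range(3)]
             for t in range(T)] for j in range(J)]

# Toy example: 4 frames of an 18-joint skeleton with synthetic values.
frames = [[[t + j, t * j, 0.5] for j in range(18)] for t in range(4)]
image = skeleton_sequence_to_image(frames)
print(len(image), len(image[0]), len(image[0][0]))  # 18 4 3
```

The resulting fixed-size image can then be fed to any off-the-shelf image classification CNN, which is what makes the retraining approach above possible.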


2021 ◽  
Vol 40 ◽  
pp. 03014
Author(s):  
Ritik Pandey ◽  
Yadnesh Chikhale ◽  
Ritik Verma ◽  
Deepali Patil

Human action recognition has become an important research area in computer vision, image processing, and human-machine or human-object interaction due to its large number of real-time applications. Action recognition is the identification of different actions from video clips (sequences of 2D frames), where the action may occur anywhere in the video. This generalizes the image classification task to multiple frames, with the predictions from each frame then collected. Different approaches have been proposed in the literature to improve recognition accuracy. In this paper we propose a deep learning based model for action recognition, with the main focus on a CNN model for image classification. The action videos are converted into frames and pre-processed before being sent to our model, which recognizes the different actions accurately.
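The frame-wise construction described above can be illustrated with a simple score-averaging aggregator. The helper `aggregate_frame_predictions` is hypothetical, and averaging is only one plausible way of collecting per-frame predictions into a video-level label:

```python
def aggregate_frame_predictions(frame_scores):
    """Average per-frame class scores into one video-level prediction.

    frame_scores: list of per-frame score vectors (one entry per class),
    e.g. softmax outputs of an image-classification CNN applied to each
    frame of the clip.
    Returns (predicted_class_index, averaged_scores).
    """
    n_frames, n_classes = len(frame_scores), len(frame_scores[0])
    avg = [sum(scores[c] for scores in frame_scores) / n_frames
           for c in range(n_classes)]
    # The video-level label is the class with the highest average score.
    return max(range(n_classes), key=avg.__getitem__), avg

# Three frames scored over three hypothetical action classes.
scores = [[0.7, 0.2, 0.1],
          [0.5, 0.4, 0.1],
          [0.6, 0.3, 0.1]]
label, avg = aggregate_frame_predictions(scores)
print(label)  # 0
```

Averaging smooths out frames where the action is ambiguous; alternatives such as majority voting over per-frame argmax labels fit the same interface.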


2018 ◽  
Vol 310 ◽  
pp. 277-286 ◽  
Author(s):  
Yue Song ◽  
Yang Liu ◽  
Quanxue Gao ◽  
Xinbo Gao ◽  
Feiping Nie ◽  
...  

Author(s):  
Chongwen Liu ◽  
Zhaowei Shang ◽  
Bo Lin ◽  
Yuan Yan Tang

Multi-task learning (MTL) methods learn a problem together with other related problems simultaneously. The major challenge of MTL is how to selectively screen the shared information: the information of each task must be related to the others, because sharing information between two unrelated tasks degrades the performance of both. Ensuring that the auxiliary problems are truly related to the main task is therefore the most important point in MTL. In this paper, we design a novel algorithm that calculates the degrees of relationship among tasks using a semantic space of features in each task, and then builds a semantic tree to achieve better learning performance. We propose an MTL method based on this algorithm which achieves good experimental performance. Our experiments are conducted on both image classification and video action recognition, compared with state-of-the-art MTL methods; our method delivers good performance on the four public datasets.
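As a rough illustration of measuring relatedness between tasks in a shared feature space, one could compare tasks by the cosine similarity of their mean feature vectors. The function `task_relatedness` below is a hypothetical sketch of this general idea, not the semantic-tree algorithm the paper proposes:

```python
import math

def task_relatedness(feats_a, feats_b):
    """Cosine similarity between two tasks' mean feature vectors --
    one plausible relatedness score, not the paper's exact algorithm.

    feats_a, feats_b: lists of feature vectors (one per training sample),
    assumed to live in the same feature space.
    """
    def mean_vector(feats):
        n = len(feats)
        return [sum(v[i] for v in feats) / n for i in range(len(feats[0]))]

    a, b = mean_vector(feats_a), mean_vector(feats_b)
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Two closely related tasks and one unrelated task (toy features).
task1 = [[1.0, 0.0], [0.9, 0.1]]
task2 = [[0.8, 0.2], [1.0, 0.0]]
task3 = [[0.0, 1.0], [0.1, 0.9]]
print(task_relatedness(task1, task2) > task_relatedness(task1, task3))  # True
```

A score like this could gate sharing: pairs of tasks below a relatedness threshold would simply not exchange information, avoiding the degradation caused by unrelated tasks.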


2016 ◽  
Vol 55 ◽  
pp. 64-76 ◽  
Author(s):  
Teng Li ◽  
Zhijun Meng ◽  
Bingbing Ni ◽  
Jianbing Shen ◽  
Meng Wang

Author(s):  
Charles A. Doan ◽  
Ronaldo Vigo

Abstract. Several empirical investigations have explored whether observers prefer to sort sets of multidimensional stimuli into groups by employing one-dimensional or family-resemblance strategies. Although one-dimensional sorting strategies have been the prevalent finding for these unsupervised classification paradigms, several researchers have provided evidence that the choice of strategy may depend on the particular demands of the task. To account for this disparity, we propose that observers extract relational patterns from stimulus sets that facilitate the development of optimal classification strategies for relegating category membership. We conducted a novel constrained categorization experiment to empirically test this hypothesis by instructing participants to either add or remove objects from presented categorical stimuli. We employed generalized representational information theory (GRIT; Vigo, 2011b, 2013a, 2014) and its associated formal models to predict and explain how human beings chose to modify these categorical stimuli. Additionally, we compared model performance to predictions made by a leading prototypicality measure in the literature.

