Skeleton-Based Action Recognition with Joint Coordinates as Feature Using Neural Oblivious Decision Ensembles

Recognition of human behavior is critical in video monitoring, human-computer interaction, video comprehension, and virtual reality. The key problem with behaviour recognition in video surveillance is the high degree of variation between and within subjects. Numerous studies have suggested background-insensitive skeleton-based as the proven detection technique. The present state-of-the-art approaches to skeleton-based action recognition rely primarily on Recurrent Neural Networks (RNN) and Convolution Neural Networks (CNN). Both methods take dynamic human skeleton as the input to the network. We chose to handle skeleton data differently, relying solely on its skeleton joint coordinates as the input. The skeleton joints’ positions are defined in (x, y) coordinates. In this paper, we investigated the incorporation of the Neural Oblivious Decision Ensemble (NODE) into our proposed action classifier network. The skeleton is extracted using a pose estimation technique based on the Residual Network (ResNet). It extracts the 2D skeleton of 18 joints for each detected body. The joint coordinates of the skeleton are stored in a table in the form of rows and columns. Each row represents the position of the joints. The structured data are fed into NODE for label prediction. With the proposed network, we obtain 97.5% accuracy on RealWorld (HAR) dataset. Experimental results show that the proposed network outperforms one the state-of-the-art approaches by 1.3%. In conclusion, NODE is a promising deep learning technique for structured data analysis as compared to its machine learning counterparts such as the GBDT packages; Catboost, and XGBoost.

Download Full-text

Levenshtein Augmentation Improves Performance of SMILES Based Deep-Learning Synthesis Prediction

10.26434/chemrxiv.12562121 ◽

2020 ◽

Author(s):

Dean Sumner ◽

Jiazhen He ◽

Amol Thakkar ◽

Ola Engkvist ◽

Esben Jannik Bjerrum

Keyword(s):

Neural Networks ◽

Pattern Recognition ◽

Deep Learning ◽

Recurrent Neural Networks ◽

Data Augmentation ◽

State Of The Art ◽

Sequence Similarity ◽

Learning Models ◽

Underlying Network

<p>SMILES randomization, a form of data augmentation, has previously been shown to increase the performance of deep learning models compared to non-augmented baselines. Here, we propose a novel data augmentation method we call “Levenshtein augmentation” which considers local SMILES sub-sequence similarity between reactants and their respective products when creating training pairs. The performance of Levenshtein augmentation was tested using two state of the art models - transformer and sequence-to-sequence based recurrent neural networks with attention. Levenshtein augmentation demonstrated an increase performance over non-augmented, and conventionally SMILES randomization augmented data when used for training of baseline models. Furthermore, Levenshtein augmentation seemingly results in what we define as <i>attentional gain </i>– an enhancement in the pattern recognition capabilities of the underlying network to molecular motifs.</p>

Download Full-text

Multi-View Attribute Graph Convolution Networks for Clustering

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2020/411 ◽

2020 ◽

Cited By ~ 1

Author(s):

Jiafeng Cheng ◽

Qianqian Wang ◽

Zhiqiang Tao ◽

Deyan Xie ◽

Quanxue Gao

Keyword(s):

Neural Networks ◽

State Of The Art ◽

Graph Embedding ◽

Structured Data ◽

Attention Networks ◽

Graph Data ◽

Graph Reconstruction ◽

Node Attributes ◽

Graph Neural Networks ◽

Geometric Relationship

Graph neural networks (GNNs) have made considerable achievements in processing graph-structured data. However, existing methods can not allocate learnable weights to different nodes in the neighborhood and lack of robustness on account of neglecting both node attributes and graph reconstruction. Moreover, most of multi-view GNNs mainly focus on the case of multiple graphs, while designing GNNs for solving graph-structured data of multi-view attributes is still under-explored. In this paper, we propose a novel Multi-View Attribute Graph Convolution Networks (MAGCN) model for the clustering task. MAGCN is designed with two-pathway encoders that map graph embedding features and learn the view-consistency information. Specifically, the first pathway develops multi-view attribute graph attention networks to reduce the noise/redundancy and learn the graph embedding features for each multi-view graph data. The second pathway develops consistent embedding encoders to capture the geometric relationship and probability distribution consistency among different views, which adaptively finds a consistent clustering embedding space for multi-view attributes. Experiments on three benchmark graph datasets show the superiority of our method compared with several state-of-the-art algorithms.

Download Full-text

Action recognition based on 2D skeletons extracted from RGB videos

MATEC Web of Conferences ◽

10.1051/matecconf/201927702034 ◽

2019 ◽

Vol 277 ◽

pp. 02034

Author(s):

Sophie Aubry ◽

Sohaib Laraba ◽

Joëlle Tilmanne ◽

Thierry Dutoit

Keyword(s):

Neural Networks ◽

Image Classification ◽

Action Recognition ◽

State Of The Art ◽

Video Stream ◽

Motion Data ◽

Rgb Images ◽

Human Pose ◽

2D Images ◽

Made In

In this paper a methodology to recognize actions based on RGB videos is proposed which takes advantages of the recent breakthrough made in deep learning. Following the development of Convolutional Neural Networks (CNNs), research was conducted on the transformation of skeletal motion data into 2D images. In this work, a solution is proposed requiring only the use of RGB videos instead of RGB-D videos. This work is based on multiple works studying the conversion of RGB-D data into 2D images. From a video stream (RGB images), a two-dimension skeleton of 18 joints for each detected body is extracted with a DNN-based human pose estimator called OpenPose. The skeleton data are encoded into Red, Green and Blue channels of images. Different ways of encoding motion data into images were studied. We successfully use state-of-the-art deep neural networks designed for image classification to recognize actions. Based on a study of the related works, we chose to use image classification models: SqueezeNet, AlexNet, DenseNet, ResNet, Inception, VGG and retrained them to perform action recognition. For all the test the NTU RGB+D database is used. The highest accuracy is obtained with ResNet: 83.317% cross-subject and 88.780% cross-view which outperforms most of state-of-the-art results.

Download Full-text

RGBD camera monitoring system for Alzheimer's disease assessment using Recurrent Neural Networks with Parametric Bias action recognition

IFAC Proceedings Volumes ◽

10.3182/20140824-6-za-1003.02199 ◽

2014 ◽

Vol 47 (3) ◽

pp. 3863-3868 ◽

Cited By ~ 10

Author(s):

Sabrina Iarlori ◽

Francesco Ferracuti ◽

Andrea Giantomassi ◽

Sauro Longhi

Keyword(s):

Alzheimer’S Disease ◽

Neural Networks ◽

Alzheimer's Disease ◽

Monitoring System ◽

Action Recognition ◽

Recurrent Neural Networks ◽

Disease Assessment

Download Full-text

ARRNET: Action recognition through recurrent neural networks

2016 International Conference on Signal Processing and Communications (SPCOM) ◽

10.1109/spcom.2016.7746614 ◽

2016 ◽

Cited By ~ 2

Author(s):

Kumaresh Krishnan ◽

Nikita Prabhu ◽

R. Venkatesh Babu

Keyword(s):

Neural Networks ◽

Action Recognition ◽

Recurrent Neural Networks

Download Full-text

Action recognition on video using recurrent neural networks

Program systems theory and applications ◽

10.25209/2079-3316-2017-8-4-327-345 ◽

2017 ◽

Vol 8 (4) ◽

pp. 327-345

Author(s):

Aleksandr Buyko ◽

◽

Andrey Vinogradov ◽

Keyword(s):

Neural Networks ◽

Action Recognition ◽

Recurrent Neural Networks

Download Full-text

Accurate and efficient time-domain classification with adaptive spiking recurrent neural networks

10.1101/2021.03.22.436372 ◽

2021 ◽

Author(s):

Bojian Yin ◽

Federico Corradi ◽

Sander M. Bohté

Keyword(s):

Neural Networks ◽

Time Domain ◽

Recurrent Neural Networks ◽

State Of The Art ◽

Spiking Neurons ◽

Recurrent Networks ◽

Computationally Efficient ◽

Hardware Implementations ◽

Comparable Performance ◽

The Time Domain

ABSTRACTInspired by more detailed modeling of biological neurons, Spiking neural networks (SNNs) have been investigated both as more biologically plausible and potentially more powerful models of neural computation, and also with the aim of extracting biological neurons’ energy efficiency; the performance of such networks however has remained lacking compared to classical artificial neural networks (ANNs). Here, we demonstrate how a novel surrogate gradient combined with recurrent networks of tunable and adaptive spiking neurons yields state-of-the-art for SNNs on challenging benchmarks in the time-domain, like speech and gesture recognition. This also exceeds the performance of standard classical recurrent neural networks (RNNs) and approaches that of the best modern ANNs. As these SNNs exhibit sparse spiking, we show that they theoretically are one to three orders of magnitude more computationally efficient compared to RNNs with comparable performance. Together, this positions SNNs as an attractive solution for AI hardware implementations.

Download Full-text

An empirical study on temporal modeling for online action detection

Complex & Intelligent Systems ◽

10.1007/s40747-021-00534-3 ◽

2021 ◽

Author(s):

Wen Wang ◽

Xiaojiang Peng ◽

Yu Qiao ◽

Jian Cheng

Keyword(s):

Neural Networks ◽

Empirical Study ◽

Recurrent Neural Networks ◽

State Of The Art ◽

Deep Convolutional Neural Networks ◽

Temporal Modeling ◽

Action Detection ◽

Modeling Methods ◽

Feature Extractor ◽

First Time

AbstractOnline action detection (OAD) is a practical yet challenging task, which has attracted increasing attention in recent years. A typical OAD system mainly consists of three modules: a frame-level feature extractor which is usually based on pre-trained deep Convolutional Neural Networks (CNNs), a temporal modeling module, and an action classifier. Among them, the temporal modeling module is crucial which aggregates discriminative information from historical and current features. Though many temporal modeling methods have been developed for OAD and other topics, their effects are lack of investigation on OAD fairly. This paper aims to provide an empirical study on temporal modeling for OAD including four meta types of temporal modeling methods, i.e. temporal pooling, temporal convolution, recurrent neural networks, and temporal attention, and uncover some good practices to produce a state-of-the-art OAD system. Many of them are explored in OAD for the first time, and extensively evaluated with various hyper parameters. Furthermore, based on our empirical study, we present several hybrid temporal modeling methods. Our best networks, i.e. , the hybridization of DCC, LSTM and M-NL, and the hybridization of DCC and M-NL, which outperform previously published results with sizable margins on THUMOS-14 dataset (48.6% vs. 47.2%) and TVSeries dataset (84.3% vs. 83.7%).

Download Full-text

Handwritten Bangla Character Recognition Using the State-of-the-Art Deep Convolutional Neural Networks

Computational Intelligence and Neuroscience ◽

10.1155/2018/6747098 ◽

2018 ◽

Vol 2018 ◽

pp. 1-13 ◽

Cited By ~ 18

Author(s):

Md Zahangir Alom ◽

Paheding Sidike ◽

Mahmudul Hasan ◽

Tarek M. Taha ◽

Vijayan K. Asari

Keyword(s):

Neural Networks ◽

Object Recognition ◽

Convolutional Neural Networks ◽

Character Recognition ◽

State Of The Art ◽

The State ◽

Superior Performance ◽

Deep Convolutional Neural Networks ◽

Practical Applications ◽

High Degree

In spite of advances in object recognition technology, handwritten Bangla character recognition (HBCR) remains largely unsolved due to the presence of many ambiguous handwritten characters and excessively cursive Bangla handwritings. Even many advanced existing methods do not lead to satisfactory performance in practice that related to HBCR. In this paper, a set of the state-of-the-art deep convolutional neural networks (DCNNs) is discussed and their performance on the application of HBCR is systematically evaluated. The main advantage of DCNN approaches is that they can extract discriminative features from raw data and represent them with a high degree of invariance to object distortions. The experimental results show the superior performance of DCNN models compared with the other popular object recognition approaches, which implies DCNN can be a good candidate for building an automatic HBCR system for practical applications.

Download Full-text

DialogueRNN: An Attentive RNN for Emotion Detection in Conversations

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33016818 ◽

2019 ◽

Vol 33 ◽

pp. 6818-6825 ◽

Cited By ~ 27

Author(s):

Navonil Majumder ◽

Soujanya Poria ◽

Devamanyu Hazarika ◽

Rada Mihalcea ◽

Alexander Gelbukh ◽

...

Keyword(s):

Neural Networks ◽

Social Media ◽

Recurrent Neural Networks ◽

Opinion Mining ◽

State Of The Art ◽

Emotion Detection ◽

Emotion Classification ◽

Argumentation Mining ◽

Consumer Feedback ◽

The Individual

Emotion detection in conversations is a necessary step for a number of applications, including opinion mining over chat history, social media threads, debates, argumentation mining, understanding consumer feedback in live conversations, and so on. Currently systems do not treat the parties in the conversation individually by adapting to the speaker of each utterance. In this paper, we describe a new method based on recurrent neural networks that keeps track of the individual party states throughout the conversation and uses this information for emotion classification. Our model outperforms the state-of-the-art by a significant margin on two different datasets.

Download Full-text