A Computer Vision-Based Yoga Pose Grading Approach Using Contrastive Skeleton Feature Representations

The main objective of yoga pose grading is to assess the input yoga pose and compare it to a standard pose in order to provide a quantitative evaluation as a grade. In this paper, a computer vision-based yoga pose grading approach is proposed using contrastive skeleton feature representations. First, the proposed approach extracts human body skeleton keypoints from the input yoga pose image and then feeds their coordinates into a pose feature encoder, which is trained using contrastive triplet examples; finally, a comparison of similar encoded pose features is made. Furthermore, to tackle the inherent challenge of composing contrastive examples in pose feature encoding, this paper proposes a new strategy to use both a coarse triplet example—comprised of an anchor, a positive example from the same category, and a negative example from a different category, and a fine triplet example—comprised of an anchor, a positive example, and a negative example from the same category with different pose qualities. Extensive experiments are conducted using two benchmark datasets to demonstrate the superior performance of the proposed approach.

Download Full-text

Residual Augmented Attentional U-Shaped Network for Spectral Reconstruction from RGB Images

Remote Sensing ◽

10.3390/rs13010115 ◽

2020 ◽

Vol 13 (1) ◽

pp. 115

Author(s):

Jiaojiao Li ◽

Chaoxiong Wu ◽

Rui Song ◽

Yunsong Li ◽

Weiying Xie

Keyword(s):

Superior Performance ◽

Feature Maps ◽

Deep Convolutional Neural Networks ◽

Second Order Statistics ◽

Feature Representations ◽

Quantitative Measurements ◽

Spectral Reconstruction ◽

Perceptual Comparison ◽

Benchmark Datasets ◽

Rgb Images

Deep convolutional neural networks (CNNs) have been successfully applied to spectral reconstruction (SR) and acquired superior performance. Nevertheless, the existing CNN-based SR approaches integrate hierarchical features from different layers indiscriminately, lacking an investigation of the relationships of intermediate feature maps, which limits the learning power of CNNs. To tackle this problem, we propose a deep residual augmented attentional u-shape network (RA2UN) with several double improved residual blocks (DIRB) instead of paired plain convolutional units. Specifically, a trainable spatial augmented attention (SAA) module is developed to bridge the encoder and decoder to emphasize the features in the informative regions. Furthermore, we present a novel channel augmented attention (CAA) module embedded in the DIRB to rescale adaptively and enhance residual learning by using first-order and second-order statistics for stronger feature representations. Finally, a boundary-aware constraint is employed to focus on the salient edge information and recover more accurate high-frequency details. Experimental results on four benchmark datasets demonstrate that the proposed RA2UN network outperforms the state-of-the-art SR methods under quantitative measurements and perceptual comparison.

Download Full-text

Drug-Drug Interaction Predicting by Neural Network Using Integrated Similarity

Scientific Reports ◽

10.1038/s41598-019-50121-3 ◽

2019 ◽

Vol 9 (1) ◽

Cited By ~ 10

Author(s):

Narjes Rohani ◽

Changiz Eslahchi

Keyword(s):

Neural Network ◽

Drug Interaction ◽

Side Effect ◽

Network Architecture ◽

Selection Process ◽

Superior Performance ◽

Multiple Drug ◽

Interaction Prediction ◽

Benchmark Datasets ◽

Drug Drug Interaction

Abstract Drug-Drug Interaction (DDI) prediction is one of the most critical issues in drug development and health. Proposing appropriate computational methods for predicting unknown DDI with high precision is challenging. We proposed "NDD: Neural network-based method for drug-drug interaction prediction" for predicting unknown DDIs using various information about drugs. Multiple drug similarities based on drug substructure, target, side effect, off-label side effect, pathway, transporter, and indication data are calculated. At first, NDD uses a heuristic similarity selection process and then integrates the selected similarities with a nonlinear similarity fusion method to achieve high-level features. Afterward, it uses a neural network for interaction prediction. The similarity selection and similarity integration parts of NDD have been proposed in previous studies of other problems. Our novelty is to combine these parts with new neural network architecture and apply these approaches in the context of DDI prediction. We compared NDD with six machine learning classifiers and six state-of-the-art graph-based methods on three benchmark datasets. NDD achieved superior performance in cross-validation with AUPR ranging from 0.830 to 0.947, AUC from 0.954 to 0.994 and F-measure from 0.772 to 0.902. Moreover, cumulative evidence in case studies on numerous drug pairs, further confirm the ability of NDD to predict unknown DDIs. The evaluations corroborate that NDD is an efficient method for predicting unknown DDIs. The data and implementation of NDD are available at https://github.com/nrohani/NDD.

Download Full-text

An efficient pruning scheme of deep neural networks for Internet of Things applications

EURASIP Journal on Advances in Signal Processing ◽

10.1186/s13634-021-00744-4 ◽

2021 ◽

Vol 2021 (1) ◽

Author(s):

Chen Qi ◽

Shibo Shen ◽

Rongpeng Li ◽

Zhifeng Zhao ◽

Qing Liu ◽

...

Keyword(s):

Neural Network ◽

Neural Networks ◽

Internet Of Things ◽

Deep Neural Networks ◽

Computational Cost ◽

Superior Performance ◽

Compact Structure ◽

Resource Limited ◽

Benchmark Datasets ◽

Iot Devices

AbstractNowadays, deep neural networks (DNNs) have been rapidly deployed to realize a number of functionalities like sensing, imaging, classification, recognition, etc. However, the computational-intensive requirement of DNNs makes it difficult to be applicable for resource-limited Internet of Things (IoT) devices. In this paper, we propose a novel pruning-based paradigm that aims to reduce the computational cost of DNNs, by uncovering a more compact structure and learning the effective weights therein, on the basis of not compromising the expressive capability of DNNs. In particular, our algorithm can achieve efficient end-to-end training that transfers a redundant neural network to a compact one with a specifically targeted compression rate directly. We comprehensively evaluate our approach on various representative benchmark datasets and compared with typical advanced convolutional neural network (CNN) architectures. The experimental results verify the superior performance and robust effectiveness of our scheme. For example, when pruning VGG on CIFAR-10, our proposed scheme is able to significantly reduce its FLOPs (floating-point operations) and number of parameters with a proportion of 76.2% and 94.1%, respectively, while still maintaining a satisfactory accuracy. To sum up, our scheme could facilitate the integration of DNNs into the common machine-learning-based IoT framework and establish distributed training of neural networks in both cloud and edge.

Download Full-text

Adaptive Channel Selection for Robust Visual Object Tracking with Discriminative Correlation Filters

International Journal of Computer Vision ◽

10.1007/s11263-021-01435-1 ◽

2021 ◽

Author(s):

Tianyang Xu ◽

Zhenhua Feng ◽

Xiao-Jun Wu ◽

Josef Kittler

Keyword(s):

Object Tracking ◽

Augmented Lagrangian Method ◽

Channel Selection ◽

Image Feature ◽

Superior Performance ◽

Appearance Model ◽

Visual Object ◽

Correlation Filters ◽

Visual Object Tracking ◽

Feature Representations

AbstractDiscriminative Correlation Filters (DCF) have been shown to achieve impressive performance in visual object tracking. However, existing DCF-based trackers rely heavily on learning regularised appearance models from invariant image feature representations. To further improve the performance of DCF in accuracy and provide a parsimonious model from the attribute perspective, we propose to gauge the relevance of multi-channel features for the purpose of channel selection. This is achieved by assessing the information conveyed by the features of each channel as a group, using an adaptive group elastic net inducing independent sparsity and temporal smoothness on the DCF solution. The robustness and stability of the learned appearance model are significantly enhanced by the proposed method as the process of channel selection performs implicit spatial regularisation. We use the augmented Lagrangian method to optimise the discriminative filters efficiently. The experimental results obtained on a number of well-known benchmarking datasets demonstrate the effectiveness and stability of the proposed method. A superior performance over the state-of-the-art trackers is achieved using less than $$10\%$$ 10 % deep feature channels.

Download Full-text

Computer Vision-Based Human Body Segmentation and Posture Estimation

IEEE Transactions on Systems Man and Cybernetics - Part A Systems and Humans ◽

10.1109/tsmca.2008.2008397 ◽

2009 ◽

Cited By ~ 1

Author(s):

Chia-Feng Juang ◽

Chia-Ming Chang ◽

Jiuh-Rou Wu ◽

Demei Lee

Keyword(s):

Computer Vision ◽

Human Body ◽

Posture Estimation ◽

Body Segmentation ◽

Human Body Segmentation

Download Full-text

Epileptic Seizure Prediction Using Deep Transformer Model

International Journal of Neural Systems ◽

10.1142/s0129065721500581 ◽

2021 ◽

Author(s):

Abhijeet Bhattacharya ◽

Tanmay Baweja ◽

S. P. K. Karri

Keyword(s):

Signal Processing ◽

Deep Learning ◽

False Positive Rate ◽

Superior Performance ◽

Seizure Prediction ◽

Advantages And Disadvantages ◽

Positive Rate ◽

Benchmark Datasets ◽

Automated Screening ◽

Transformer Model

The electroencephalogram (EEG) is the most promising and efficient technique to study epilepsy and record all the electrical activity going in our brain. Automated screening of epilepsy through data-driven algorithms reduces the manual workload of doctors to diagnose epilepsy. New algorithms are biased either towards signal processing or deep learning, which holds subjective advantages and disadvantages. The proposed pipeline is an end-to-end automated seizure prediction framework with a Fourier transform feature extraction and deep learning-based transformer model, a blend of signal processing and deep learning — this imbibes the potential features to automatically identify the attentive regions in EEG signals for effective screening. The proposed pipeline has demonstrated superior performance on the benchmark dataset with average sensitivity and false-positive rate per hour (FPR/h) as 98.46%, 94.83% and 0.12439, 0, respectively. The proposed work shows great results on the benchmark datasets and a big potential for clinics as a support system with medical experts monitoring the patients.

Download Full-text

Random Fourier Features via Fast Surrogate Leverage Weighted Sampling

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i04.5920 ◽

2020 ◽

Vol 34 (04) ◽

pp. 4844-4851

Author(s):

Fanghui Liu ◽

Xiaolin Huang ◽

Yudong Chen ◽

Jie Yang ◽

Johan Suykens

Keyword(s):

State Of The Art ◽

Sampling Strategy ◽

Prediction Performance ◽

Current State ◽

Kernel Approximation ◽

Sampling Process ◽

New Strategy ◽

The Matrix ◽

Benchmark Datasets ◽

Random Fourier Features

In this paper, we propose a fast surrogate leverage weighted sampling strategy to generate refined random Fourier features for kernel approximation. Compared to the current state-of-the-art method that uses the leverage weighted scheme (Li et al. 2019), our new strategy is simpler and more effective. It uses kernel alignment to guide the sampling process and it can avoid the matrix inversion operator when we compute the leverage function. Given n observations and s random features, our strategy can reduce the time complexity for sampling from O(ns2+s3) to O(ns2), while achieving comparable (or even slightly better) prediction performance when applied to kernel ridge regression (KRR). In addition, we provide theoretical guarantees on the generalization performance of our approach, and in particular characterize the number of random features required to achieve statistical guarantees in KRR. Experiments on several benchmark datasets demonstrate that our algorithm achieves comparable prediction performance and takes less time cost when compared to (Li et al. 2019).

Download Full-text

Self-Amplificated Network: Learning fine-grained learner with few samples

Journal of Physics Conference Series ◽

10.1088/1742-6596/2050/1/012006 ◽

2021 ◽

Vol 2050 (1) ◽

pp. 012006

Author(s):

Xili Dai ◽

Chunmei Ma ◽

Jingwei Sun ◽

Tao Zhang ◽

Haigang Gong ◽

...

Keyword(s):

Deep Neural Networks ◽

Classification Problem ◽

The Self ◽

Superior Performance ◽

Query Image ◽

Network Learning ◽

Fine Grained ◽

Support Set ◽

Meta Learning ◽

Benchmark Datasets

Abstract Training deep neural networks from only a few examples has been an interesting topic that motivated few shot learning. In this paper, we study the fine-grained image classification problem in a challenging few-shot learning setting, and propose the Self-Amplificated Network (SAN), a method based on meta-learning to tackle this problem. The SAN model consists of three parts, which are the Encoder, Amplification and Similarity Modules. The Encoder Module encodes a fine-grained image input into a feature vector. The Amplification Module is used to amplify subtle differences between fine-grained images based on the self attention mechanism which is composed of multi-head attention. The Similarity Module measures how similar the query image and the support set are in order to determine the classification result. In-depth experiments on three benchmark datasets have showcased that our network achieves superior performance over the competing baselines.

Download Full-text

3D visualization of human body internal structures surface during stereo-endoscopic operations using computer vision techniques

PRZEGLĄD ELEKTROTECHNICZNY ◽

10.15199/48.2021.09.06 ◽

2021 ◽

Vol 1 (9) ◽

pp. 32-35

Author(s):

Karina SELIVANOVA

Keyword(s):

Computer Vision ◽

Human Body ◽

3D Visualization ◽

Internal Structures

Download Full-text

Residual Invertible Spatio-Temporal Network for Video Super-Resolution

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33015981 ◽

2019 ◽

Vol 33 ◽

pp. 5981-5988 ◽

Cited By ~ 12

Author(s):

Xiaobin Zhu ◽

Zhuangzi Li ◽

Xiao-Yu Zhang ◽

Changsheng Li ◽

Yaqi Liu ◽

...

Keyword(s):

Spatial Information ◽

Super Resolution ◽

Temporal Consistency ◽

Temporal Network ◽

Convolutional Network ◽

Feature Representations ◽

Video Frames ◽

Temporal Features ◽

Benchmark Datasets ◽

Spatio Temporal

Video super-resolution is a challenging task, which has attracted great attention in research and industry communities. In this paper, we propose a novel end-to-end architecture, called Residual Invertible Spatio-Temporal Network (RISTN) for video super-resolution. The RISTN can sufficiently exploit the spatial information from low-resolution to high-resolution, and effectively models the temporal consistency from consecutive video frames. Compared with existing recurrent convolutional network based approaches, RISTN is much deeper but more efficient. It consists of three major components: In the spatial component, a lightweight residual invertible block is designed to reduce information loss during feature transformation and provide robust feature representations. In the temporal component, a novel recurrent convolutional model with residual dense connections is proposed to construct deeper network and avoid feature degradation. In the reconstruction component, a new fusion method based on the sparse strategy is proposed to integrate the spatial and temporal features. Experiments on public benchmark datasets demonstrate that RISTN outperforms the state-ofthe-art methods.

Download Full-text