Study on a visual coder acceleration algorithm for image classification applying dynamic scaling training techniques

Abstract: Image classification is now widely used in autonomous driving, where Convolutional Neural Networks (CNNs) are the dominant models. In this paper, Vision Transformer (ViT) networks replace deep convolutional networks in order to compress the network size and improve model accuracy. Because training ViT to sufficient accuracy requires a large dataset, a variant of ViT, Data-Efficient Image Transformers (DEIT), is used instead. In addition, to greatly reduce memory use and shorten computing time in practical use, the network's size and training speed are scaled flexibly through both adaptive width and adaptive depth. We introduce DEIT, the width-adaptive technique, and the depth-adaptive technique, and combine them for image classification. Experiments on the CIFAR-100 dataset demonstrate the superiority of the algorithm in image classification scenarios.
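The abstract does not say how the adaptive width and depth are implemented. As a rough illustration only, one common reading is that a width multiplier keeps a fraction of the attention heads in each layer and a depth multiplier keeps a fraction of the Transformer layers. The function name `select_subnetwork`, its parameters, and the even-spacing rule for layers are all assumptions made for this sketch, not the authors' method.

```python
def select_subnetwork(num_layers, num_heads, width_mult, depth_mult):
    """Pick which layers and attention heads a scaled-down sub-network keeps.

    Hypothetical helper: width_mult scales the number of heads per layer,
    depth_mult scales the number of layers. Both must lie in (0, 1].
    """
    if not (0 < width_mult <= 1 and 0 < depth_mult <= 1):
        raise ValueError("multipliers must be in the interval (0, 1]")

    kept_heads = max(1, round(num_heads * width_mult))
    kept_layers = max(1, round(num_layers * depth_mult))

    # Spread the kept layers evenly over the original stack (an assumption).
    layer_ids = [(i * num_layers) // kept_layers for i in range(kept_layers)]
    head_ids = list(range(kept_heads))
    return {"layers": layer_ids, "heads": head_ids}


if __name__ == "__main__":
    # A 12-layer, 6-head model scaled to half width and half depth.
    print(select_subnetwork(12, 6, width_mult=0.5, depth_mult=0.5))
```

In a setup like this, the same full-size weights would be shared by every sub-network, so a smaller configuration can be chosen at inference time without retraining.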


Efficient End-to-End Sentence-Level Lipreading with Temporal Convolutional Networks

Applied Sciences ◽

10.3390/app11156975 ◽

2021 ◽

Vol 11 (15) ◽

pp. 6975

Author(s):

Tao Zhang ◽

Lun He ◽

Xudong Li ◽

Guoqing Feng

Keyword(s):

Performance Improvement ◽

State Of The Art ◽

Error Rates ◽

Convolutional Network ◽

Convolutional Networks ◽

Sentence Level ◽

End To End ◽

High Level ◽

Improved Accuracy ◽

Talking Face

Lipreading aims to recognize sentences being spoken by a talking face. In recent years, lipreading methods have achieved a high level of accuracy on large datasets and made breakthrough progress. However, lipreading is still far from solved: existing methods tend to have high error rates on in-the-wild data and suffer from vanishing gradients during training and slow convergence. To overcome these problems, we propose an efficient end-to-end sentence-level lipreading model that uses an encoder based on a 3D convolutional network, ResNet50, and a Temporal Convolutional Network (TCN), with a CTC objective function as the decoder. More importantly, the proposed architecture incorporates the TCN as a feature learner to decode features. This partly eliminates the vanishing gradients and insufficient performance of RNNs (LSTM, GRU), and yields a notable performance improvement as well as faster convergence. Experiments show that training and convergence are 50% faster than the state-of-the-art method, and that accuracy improves by 2.4% on the GRID dataset.
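To make the role of CTC more concrete, the sketch below shows greedy (best-path) CTC decoding, a standard way to turn per-frame scores into a label sequence: take the most likely label per frame, collapse consecutive repeats, and drop the blank symbol. The abstract only states that a CTC objective is used, so the decoding strategy, the toy alphabet, and the scores here are assumptions for illustration.

```python
from itertools import groupby


def ctc_greedy_decode(frame_scores, alphabet, blank=0):
    """Best-path CTC decoding: argmax per frame, collapse repeats, drop blanks."""
    # Most likely label index for every time step.
    best_path = [
        max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores
    ]
    # Merge runs of the same label, then remove the blank symbol.
    collapsed = [label for label, _ in groupby(best_path)]
    return "".join(alphabet[label] for label in collapsed if label != blank)


# Hypothetical three-symbol alphabet; index 0 is the CTC blank.
ALPHABET = ["-", "a", "b"]
SCORES = [
    [0.10, 0.80, 0.10],  # a
    [0.10, 0.70, 0.20],  # a again, merged with the previous frame
    [0.90, 0.05, 0.05],  # blank, separates the two "a" labels
    [0.20, 0.70, 0.10],  # a
    [0.10, 0.20, 0.70],  # b
]

if __name__ == "__main__":
    print(ctc_greedy_decode(SCORES, ALPHABET))  # prints "aab"
```

The blank frame is what lets the output contain two adjacent "a" labels; without it, the repeated frames would collapse into one.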


Knowledge-aware Multi-modal Adaptive Graph Convolutional Networks for Fake News Detection

ACM Transactions on Multimedia Computing Communications and Applications ◽

10.1145/3451215 ◽

2021 ◽

Vol 17 (3) ◽

pp. 1-23

Author(s):

Shengsheng Qian ◽

Jun Hu ◽

Quan Fang ◽

Changsheng Xu

Keyword(s):

Social Media ◽

Visual Information ◽

Representation Learning ◽

Fake News ◽

Unified Framework ◽

Model Learning ◽

Convolutional Network ◽

Textual Information ◽

Convolutional Networks ◽

Real World Datasets

In this article, we focus on the fake news detection task and aim to automatically identify fake news from the vast amount of social media posts. To date, many approaches have been proposed to detect fake news, including traditional learning methods and deep learning-based models. However, three challenges remain: (i) how to represent social media posts effectively, since post content is varied and highly complicated; (ii) how to design a data-driven method that makes the model flexible enough to deal with samples in different contexts and news backgrounds; and (iii) how to fully utilize the additional auxiliary information (the background knowledge and multi-modal information) of posts for better representation learning. To tackle these challenges, we propose novel Knowledge-aware Multi-modal Adaptive Graph Convolutional Networks (KMAGCN), which capture semantic representations by jointly modeling textual information, knowledge concepts, and visual information in a unified framework for fake news detection. We model posts as graphs and use a knowledge-aware multi-modal adaptive graph learning principle for effective feature learning. Compared with existing methods, the proposed KMAGCN addresses the challenges from three aspects: (1) it models posts as graphs to capture non-consecutive and long-range semantic relations; (2) it proposes a novel adaptive graph convolutional network to handle the variability of graph data; and (3) it leverages textual information, knowledge concepts, and visual information jointly for model learning. We have conducted extensive experiments on three public real-world datasets, and the superior results demonstrate the effectiveness of KMAGCN compared with other state-of-the-art algorithms.


Image Classification Based On Deep Convolutional Network And Gaussian Aggregate Encoding

2020 IEEE 32nd International Conference on Tools with Artificial Intelligence (ICTAI) ◽

10.1109/ictai50040.2020.00089 ◽

2020 ◽

Author(s):

Fengge Wang ◽

Xiaolin Tian ◽

Yang Zhang ◽

Nan Jia ◽

Tiantian Lu

Keyword(s):

Image Classification ◽

Convolutional Network ◽

Deep Convolutional Network


Densely connected convolutional networks for breast cancer histopathological image classification

International Conference on Signal Image Processing and Communication (ICSIPC 2021) ◽

10.1117/12.2600183 ◽

2021 ◽

Author(s):

Jie Li ◽

JinLing Chen ◽

Chengming Zhao

Keyword(s):

Breast Cancer ◽

Image Classification ◽

Convolutional Networks ◽

Histopathological Image ◽

Histopathological Image Classification


Dual Graph Convolutional Network for Hyperspectral Image Classification With Limited Training Samples

IEEE Transactions on Geoscience and Remote Sensing ◽

10.1109/tgrs.2021.3061088 ◽

2021 ◽

pp. 1-18

Author(s):

Xin He ◽

Yushi Chen ◽

Pedram Ghamisi

Keyword(s):

Image Classification ◽

Hyperspectral Image ◽

Dual Graph ◽

Hyperspectral Image Classification ◽

Convolutional Network ◽

Training Samples ◽

Limited Training Samples


Polarimetric Convolutional Network for PolSAR Image Classification

IEEE Transactions on Geoscience and Remote Sensing ◽

10.1109/tgrs.2018.2879984 ◽

2019 ◽

Vol 57 (5) ◽

pp. 3040-3054 ◽

Cited By ~ 19

Author(s):

Xu Liu ◽

Licheng Jiao ◽

Xu Tang ◽

Qigong Sun ◽

Dan Zhang

Keyword(s):

Image Classification ◽

Convolutional Network


Adaptive convolutional network for SAR image classification

The Journal of Engineering ◽

10.1049/joe.2019.0565 ◽

2019 ◽

Vol 2019 (20) ◽

pp. 6868-6872

Author(s):

Shuang Xia ◽

Ze Yu ◽

JinDong Yu

Keyword(s):

Image Classification ◽

SAR Image ◽

Convolutional Network


Spectral-Spatial Graph Convolutional Networks for Semi-Supervised Hyperspectral Image Classification

2018 International Conference on Wavelet Analysis and Pattern Recognition (ICWAPR) ◽

10.1109/icwapr.2018.8521407 ◽

2018 ◽

Author(s):

Anyong Qin ◽

Chang Liu ◽

Zhaowei Shang ◽

Jinyu Tian

Keyword(s):

Image Classification ◽

Hyperspectral Image ◽

Hyperspectral Image Classification ◽

Convolutional Networks ◽

Spatial Graph


MR-GCN: Multi-Relational Graph Convolutional Networks based on Generalized Tensor Product

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2020/175 ◽

2020 ◽

Author(s):

Zhichao Huang ◽

Xutao Li ◽

Yunming Ye ◽

Michael K. Ng

Keyword(s):

Tensor Product ◽

Convolution Operator ◽

State Of The Art ◽

Single Type ◽

Convolutional Network ◽

Convolutional Networks ◽

Node Classification ◽

Relational Graphs ◽

Eigen Decomposition ◽

Single Relation

Graph Convolutional Networks (GCNs) have been extensively studied in recent years. Most existing GCN approaches are designed for homogeneous graphs with a single type of relation. However, heterogeneous graphs with multiple types of relations are also ubiquitous, and there is a lack of methodologies to tackle such graphs. Some previous studies address the issue by performing a conventional GCN on each single relation and then blending the results. However, because the convolutional kernels neglect the correlations across relations, this strategy is sub-optimal. In this paper, we propose the Multi-Relational Graph Convolutional Network (MR-GCN) framework by developing a novel convolution operator on multi-relational graphs. In particular, our multi-dimensional convolution operator extends graph spectral analysis to the eigen-decomposition of a Laplacian tensor. The eigen-decomposition is formulated with a generalized tensor product, which can correspond to any unitary transform rather than being limited to the Fourier transform. We conduct comprehensive experiments on four real-world multi-relational graphs to solve the semi-supervised node classification task, and the results show the superiority of MR-GCN over state-of-the-art competitors.
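The baseline that the abstract criticizes, running a conventional GCN on each relation separately and then blending the results, can be sketched as follows. This is not MR-GCN itself: it uses the familiar symmetric-normalized propagation rule with self-loops, a ReLU, and a plain average as the blend, all of which are assumptions chosen for brevity. The relation names and the toy data are hypothetical.

```python
def matmul(a, b):
    """Multiply two dense matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def normalize_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2 for a dense, non-negative adjacency matrix."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]  # at least 1 because of the self-loop
    return [[a_hat[i][j] / (deg[i] * deg[j]) ** 0.5 for j in range(n)] for i in range(n)]


def gcn_layer(adj, features, weight):
    """One graph convolution followed by ReLU."""
    hidden = matmul(matmul(normalize_adjacency(adj), features), weight)
    return [[max(0.0, value) for value in row] for row in hidden]


def blended_per_relation_gcn(adjs, features, weights):
    """Run one GCN layer per relation, then average the outputs element-wise."""
    outputs = [gcn_layer(adj, features, w) for adj, w in zip(adjs, weights)]
    rows, cols = len(outputs[0]), len(outputs[0][0])
    return [
        [sum(out[i][j] for out in outputs) / len(outputs) for j in range(cols)]
        for i in range(rows)
    ]


# Hypothetical toy graph: 3 nodes, 2 relation types, 2 features per node.
ADJ_FRIEND = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
ADJ_COAUTHOR = [[0, 0, 1], [0, 0, 0], [1, 0, 0]]
FEATURES = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
IDENTITY = [[1.0, 0.0], [0.0, 1.0]]

if __name__ == "__main__":
    for row in blended_per_relation_gcn(
        [ADJ_FRIEND, ADJ_COAUTHOR], FEATURES, [IDENTITY, IDENTITY]
    ):
        print(row)
```

Each relation is processed in isolation here, which is exactly why correlations across relations are lost before the blend.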


Graph Convolutional Networks for Exercise Motion Classification

Proceedings of the Human Factors and Ergonomics Society Annual Meeting ◽

10.1177/1071181321651255 ◽

2021 ◽

Vol 65 (1) ◽

pp. 685-689

Author(s):

Parian Haghighat ◽

Aden Prince ◽

Heejin Jeong

Keyword(s):

Nearest Neighbor ◽

Current Model ◽

Model Performance ◽

Classification Performance ◽

Great Promise ◽

Graph Classification ◽

Convolutional Network ◽

Motion Data ◽

Convolutional Networks ◽

Personal Fitness

The growth in self-fitness mobile applications has encouraged people to turn to personal fitness, which entails integrating self-tracking applications with exercise motion data to reduce fatigue and mitigate the risk of injury. Advancements in computer vision and motion capture technologies hold great promise for improving exercise classification performance. This study investigates the performance of a supervised deep learning model, a Graph Convolutional Network (GCN), in classifying three workouts using motion data from the Azure Kinect device. The model defines the skeleton as a graph and combines GCN layers, a readout layer, and multi-layer perceptrons to build an end-to-end framework for graph classification. The model achieves an accuracy of 95.86% in classifying 19,442 frames. The current model exchanges feature information between each joint and its 1-nearest neighbor, whose impact fades in graph-level classification. Therefore, a future study on improved feature utilization could enhance the model's performance in classifying inter-user exercise variation.
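The pipeline described above (skeleton as a graph, neighbor feature exchange, readout, classifier) can be illustrated with a heavily simplified sketch. It replaces the learned GCN layers with a plain one-hop mean over each joint and its direct neighbors, uses a mean readout, and stands in a single linear layer for the multi-layer perceptron. The four-joint skeleton, the feature values, and the classifier weights are all made up for illustration and are not taken from the study.

```python
def aggregate_one_hop(features, edges):
    """Average each joint's feature vector with those of its direct neighbors."""
    neighbours = [[i] for i in range(len(features))]  # every joint includes itself
    for a, b in edges:
        neighbours[a].append(b)
        neighbours[b].append(a)
    dim = len(features[0])
    return [
        [sum(features[j][d] for j in group) / len(group) for d in range(dim)]
        for group in neighbours
    ]


def mean_readout(node_features):
    """Collapse per-joint features into one graph-level vector."""
    count = len(node_features)
    return [sum(column) / count for column in zip(*node_features)]


def linear_classifier(graph_vector, weights, biases):
    """Return the index of the highest-scoring class."""
    scores = [
        sum(w * x for w, x in zip(row, graph_vector)) + b
        for row, b in zip(weights, biases)
    ]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical skeleton: 4 joints connected in a chain, 2 features per joint.
EDGES = [(0, 1), (1, 2), (2, 3)]
JOINT_FEATURES = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
# Hypothetical weights for three workout classes.
CLASS_WEIGHTS = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
CLASS_BIASES = [0.0, 0.0, 0.0]

if __name__ == "__main__":
    mixed = aggregate_one_hop(JOINT_FEATURES, EDGES)
    graph_vector = mean_readout(mixed)
    print(linear_classifier(graph_vector, CLASS_WEIGHTS, CLASS_BIASES))
```

Because the readout averages over all joints, information exchanged only between immediate neighbors is diluted at the graph level, which matches the limitation the abstract points out.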
