CNN and RNN mixed model for image classification

2019 ◽  
Vol 277 ◽  
pp. 02001 ◽  
Author(s):  
Qiwei Yin ◽  
Ruixun Zhang ◽  
XiuLi Shao

In this paper, we propose a mixed CNN (convolutional neural network) and RNN (recurrent neural network) model for image classification, called the CNN-RNN model. Image data can be viewed as two-dimensional wave data, and the convolution operation is a filtering process: it filters out non-critical band information in an image and retains the important features of the image. The CNN-RNN model uses the RNN to capture the dependency and continuity features in the intermediate-layer outputs of the CNN, and then connects these intermediate features to the final fully-connected network for classification prediction, which yields better classification accuracy. At the same time, to satisfy the RNN's restriction on the length of the input sequence and to prevent gradient explosion or vanishing gradients in the network, this paper applies the wavelet transform (WT), a method related to the Fourier transform, to filter the input data. We test the proposed CNN-RNN model on the widely used CIFAR-10 dataset. The results show that the proposed method achieves better classification performance than the original CNN network and merits further investigation.
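
As an illustration of the general idea only (not the authors' exact architecture), the following minimal PyTorch sketch re-interprets the CNN's intermediate feature maps as a sequence and feeds them to an RNN before the final fully-connected classifier; the layer sizes, the use of a GRU, and the row-wise sequencing are assumptions.

```python
# Hedged sketch of a CNN-RNN hybrid for CIFAR-10-sized inputs.
# The exact layer configuration is an assumption for illustration.
import torch
import torch.nn as nn

class CNNRNNClassifier(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        # Convolutional stage: filters out non-critical information, keeps salient features.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                              # 32 x 16 x 16 for a 32x32 input
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                              # 64 x 8 x 8
        )
        # Recurrent stage: treat each row of the feature map as one step of a sequence,
        # so the RNN can model dependencies between intermediate CNN features.
        self.rnn = nn.GRU(input_size=64 * 8, hidden_size=128, batch_first=True)
        self.fc = nn.Linear(128, num_classes)             # final fully-connected prediction

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = self.cnn(x)                               # (B, 64, 8, 8)
        b, c, h, w = feats.shape
        seq = feats.permute(0, 2, 1, 3).reshape(b, h, c * w)  # (B, 8, 64*8)
        _, hidden = self.rnn(seq)                         # hidden: (1, B, 128)
        return self.fc(hidden[-1])                        # class logits

logits = CNNRNNClassifier()(torch.randn(4, 3, 32, 32))    # e.g. a CIFAR-10-sized batch
```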

2021 ◽  
Vol 11 (15) ◽  
pp. 6721
Author(s):  
Jinyeong Wang ◽  
Sanghwan Lee

In increasing manufacturing productivity with automated surface inspection in smart factories, the demand for machine vision is rising. Recently, convolutional neural networks (CNNs) have demonstrated outstanding performance and solved many problems in the field of computer vision, and many machine vision systems therefore adopt CNNs for surface defect inspection. In this study, we developed an effective data augmentation method for grayscale images in CNN-based machine vision with mono cameras. Our method can be applied to grayscale industrial images, and it achieves outstanding performance in both image classification and object detection tasks. The main contributions of this study are as follows: (1) We propose a data augmentation method that can be applied when training CNNs with industrial images taken by mono cameras. (2) We demonstrate that image classification and object detection performance improve when training with industrial image data augmented by the proposed method. Through the proposed method, many machine-vision problems involving mono cameras can be effectively solved using CNNs.
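
The abstract does not detail the augmentation operations, so the following is only a hedged sketch of a grayscale augmentation pipeline for mono-camera images; the specific transforms (flips, small rotations, brightness/contrast jitter, additive noise) are assumptions, not the paper's method.

```python
# Hedged sketch: grayscale augmentation pipeline for mono-camera industrial images.
import torch
from torchvision import transforms

class AddGaussianNoise:
    """Add pixel noise to simulate sensor variation on a mono camera (illustrative)."""
    def __init__(self, std: float = 0.02):
        self.std = std

    def __call__(self, img: torch.Tensor) -> torch.Tensor:
        return (img + torch.randn_like(img) * self.std).clamp(0.0, 1.0)

grayscale_augment = transforms.Compose([
    transforms.Grayscale(num_output_channels=1),     # ensure a single-channel image
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomRotation(degrees=5),            # small rotation, defect stays visible
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.ToTensor(),                           # -> float tensor in [0, 1]
    AddGaussianNoise(std=0.02),
])
```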


Mathematics ◽  
2021 ◽  
Vol 9 (6) ◽  
pp. 624
Author(s):  
Stefan Rohrmanstorfer ◽  
Mikhail Komarov ◽  
Felix Mödritscher

With the ever-increasing amount of image data, it has become a necessity to automatically locate and process information in these images. Because fashion is captured in images, the fashion sector provides an ideal foundation for a service or application built on an image classification model. In this article, the state of the art for image classification is analyzed and discussed. Based on this knowledge, four different approaches are implemented to extract features from fashion data. For this purpose, a human-worn fashion dataset with 2567 images was created and then significantly enlarged through the performed image operations. The results show that convolutional neural networks are the undisputed standard for classifying images, and that TensorFlow is the best library to build them. Moreover, through the introduction of dropout layers, data augmentation, and transfer learning, model overfitting was successfully prevented, and the validation accuracy on the created dataset was incrementally improved from an initial 69% to a final 84%. More distinct apparel such as trousers, shoes, and hats was classified more accurately than other upper-body clothes.
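
The exact model configuration is not given in the abstract, but the overfitting countermeasures it names (dropout layers, data augmentation, transfer learning) can be sketched in TensorFlow/Keras roughly as follows; the MobileNetV2 backbone, image size, and number of classes are assumptions for illustration only.

```python
# Hedged TensorFlow/Keras sketch: transfer learning + augmentation + dropout.
import tensorflow as tf

NUM_CLASSES = 4  # assumed grouping, e.g. trousers, shoes, hats, upper-body clothes

base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet")
base.trainable = False  # transfer learning: reuse ImageNet features, train only the head

model = tf.keras.Sequential([
    tf.keras.Input(shape=(224, 224, 3)),
    tf.keras.layers.RandomFlip("horizontal"),        # data augmentation layers
    tf.keras.layers.RandomRotation(0.05),
    tf.keras.layers.Rescaling(1.0 / 127.5, offset=-1.0),  # scale pixels to [-1, 1]
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dropout(0.3),                    # dropout against overfitting
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```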


2018 ◽  
Vol 12 (04) ◽  
pp. 481-500 ◽  
Author(s):  
Naifan Zhuang ◽  
The Duc Kieu ◽  
Jun Ye ◽  
Kien A. Hua

With the growth of crowd phenomena in the real world, crowd scene understanding is becoming an important task for anomaly detection and public security. Visual ambiguities and occlusions, high density, low mobility, and scene semantics, however, make this problem a great challenge. In this paper, we propose an end-to-end deep architecture, convolutional nonlinear differential recurrent neural networks (CNDRNNs), for crowd scene understanding. CNDRNNs consist of GoogleNet Inception V3 convolutional neural networks (CNNs) and nonlinear differential recurrent neural networks (RNNs). Different from traditional non-end-to-end solutions, which separate feature extraction from parameter learning, CNDRNN uses a unified deep model to optimize the parameters of the CNN and the RNN jointly, and thus has the potential to produce a more harmonious model. The proposed architecture takes sequential raw image data as input and does not rely on tracklet or trajectory detection; it therefore has clear advantages over traditional flow-based and trajectory-based methods, especially in challenging crowd scenarios with high density and low mobility. Taking advantage of both the CNN and the RNN, CNDRNN can effectively analyze crowd semantics: the CNN is good at modeling the semantic crowd scene information, while the nonlinear differential RNN models the motion information. The individual and increasing orders of the derivative of states (DoS) in the differential RNN progressively build up the ability of the long short-term memory (LSTM) gates to detect different levels of salient dynamical patterns, with deeper stacked layers modeling higher orders of DoS. Lastly, existing LSTM-based crowd scene solutions explore deep temporal information and are claimed to be "deep in time." Our proposed CNDRNN, by contrast, models spatial and temporal information in a unified architecture and is thus "deep in space and time." Extensive performance studies on the Violent-Flows, CUHK Crowd, and NUS-HGA datasets show that the proposed technique significantly outperforms state-of-the-art methods.
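
A rough PyTorch sketch of the end-to-end CNN+RNN pattern described above is given below; a small CNN stands in for GoogleNet Inception V3 and a standard LSTM stands in for the nonlinear differential RNN (dRNN), so this is a simplification of the idea, not the authors' CNDRNN model.

```python
# Hedged sketch: per-frame CNN features fed to an LSTM, trained end-to-end on raw clips.
import torch
import torch.nn as nn

class CrowdSceneNet(nn.Module):
    def __init__(self, num_classes: int = 2, feat_dim: int = 128):
        super().__init__()
        self.cnn = nn.Sequential(                     # per-frame spatial features
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim),
        )
        self.lstm = nn.LSTM(feat_dim, 128, batch_first=True)  # temporal dynamics
        self.fc = nn.Linear(128, num_classes)

    def forward(self, clips: torch.Tensor) -> torch.Tensor:
        # clips: (batch, time, channels, height, width) of raw frames; no tracklets needed
        b, t, c, h, w = clips.shape
        feats = self.cnn(clips.reshape(b * t, c, h, w)).reshape(b, t, -1)
        _, (hidden, _) = self.lstm(feats)
        return self.fc(hidden[-1])                    # e.g. violent vs. non-violent scene

logits = CrowdSceneNet()(torch.randn(2, 16, 3, 112, 112))  # two 16-frame clips
```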


2021 ◽  
Author(s):  
Daisuke Matsuoka

Image data classification using machine learning is an effective method for detecting atmospheric phenomena. However, extreme weather events with only a small number of cases reduce classification accuracy owing to the imbalance between the target class and the other classes. In order to build a highly accurate classification model, we held a data analysis competition to determine the best classification performance for two classes of cloud image data: tropical cyclones (including their precursors) and all other clouds. The top models in the competition used minority-class oversampling, majority-class undersampling, ensemble learning, deeper neural networks, and cost-sensitive loss functions to improve the imbalanced classification performance. In particular, the best of the 209 submissions improved classification capability by 65.4% over comparable conventional methods as measured by the false alarm ratio.
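
Two of the imbalance countermeasures named above, minority-class oversampling and a cost-sensitive loss, can be sketched in PyTorch as follows; the class counts and weighting scheme are illustrative assumptions, not the competition entries' settings.

```python
# Hedged sketch: weighted sampling (oversampling) and a cost-sensitive loss
# for an imbalanced two-class cloud dataset.
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

# Toy two-class dataset: 95% "other clouds" (label 0), 5% "tropical cyclone" (label 1).
labels = torch.cat([torch.zeros(950, dtype=torch.long), torch.ones(50, dtype=torch.long)])
images = torch.randn(1000, 1, 64, 64)
dataset = TensorDataset(images, labels)

# Oversample the minority class so each batch is roughly balanced.
class_counts = torch.bincount(labels).float()
sample_weights = (1.0 / class_counts)[labels]        # rare class -> larger sampling weight
sampler = WeightedRandomSampler(sample_weights, num_samples=len(labels), replacement=True)
loader = DataLoader(dataset, batch_size=32, sampler=sampler)

# Cost-sensitive loss: misclassifying the rare class costs more.
class_weights = class_counts.sum() / (2.0 * class_counts)
criterion = torch.nn.CrossEntropyLoss(weight=class_weights)
```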


2020 ◽  
Vol 397 ◽  
pp. 48-59 ◽  
Author(s):  
Riqiang Gao ◽  
Yuankai Huo ◽  
Shunxing Bao ◽  
Yucheng Tang ◽  
Sanja L. Antic ◽  
...  
