Convolutional Neural Networks for Visual Information Analysis with Limited Computing Resources

Author(s):  
Paraskevi Nousi ◽  
Emmanouil Patsiouras ◽  
Anastasios Tefas ◽  
Ioannis Pitas


Sensors ◽  
2021 ◽  
Vol 22 (1) ◽  
pp. 72
Author(s):  
Sanghun Jeon ◽  
Ahmed Elsharkawy ◽  
Mun Sang Kim

In visual speech recognition (VSR), speech is transcribed using only visual information to interpret tongue and teeth movements. Recently, deep learning has shown outstanding performance in VSR, with accuracy exceeding that of lipreaders on benchmark datasets. However, several problems still exist when using VSR systems. A major challenge is the distinction of words with similar pronunciation, called homophones; these lead to word ambiguity. Another technical limitation of traditional VSR systems is that visual information does not provide sufficient data for learning words such as “a”, “an”, “eight”, and “bin” because their lengths are shorter than 0.02 s. This report proposes a novel lipreading architecture that combines three different convolutional neural networks (CNNs; a 3D CNN, a densely connected 3D CNN, and a multi-layer feature fusion 3D CNN), which are followed by a two-layer bi-directional gated recurrent unit. The entire network was trained using connectionist temporal classification. The results of the standard automatic speech recognition evaluation metrics show that the proposed architecture reduced the character and word error rates of the baseline model by 5.681% and 11.282%, respectively, for the unseen-speaker dataset. Our proposed architecture exhibits improved performance even when visual ambiguity arises, thereby increasing VSR reliability for practical applications.
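For concreteness, the following is a minimal PyTorch sketch of this kind of pipeline: a 3D convolutional front-end feeding a two-layer bi-directional GRU, trained with connectionist temporal classification (CTC). The single 3D CNN front-end (standing in for the paper's three combined CNNs), all layer sizes, and the character-set size are illustrative assumptions, not the authors' configuration.

```python
# Hypothetical sketch: 3D CNN front-end -> 2-layer BiGRU -> CTC loss.
import torch
import torch.nn as nn

class LipreadingNet(nn.Module):
    def __init__(self, num_chars=28, hidden=256):
        super().__init__()
        # 3D convolutions over (time, height, width) of the mouth region
        self.frontend = nn.Sequential(
            nn.Conv3d(1, 32, kernel_size=(3, 5, 5), padding=(1, 2, 2)),
            nn.ReLU(),
            nn.MaxPool3d(kernel_size=(1, 2, 2)),
            nn.Conv3d(32, 64, kernel_size=(3, 3, 3), padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d((None, 4, 4)),  # pool space, keep time
        )
        self.gru = nn.GRU(64 * 4 * 4, hidden, num_layers=2,
                          bidirectional=True, batch_first=True)
        self.classifier = nn.Linear(2 * hidden, num_chars)  # chars + blank

    def forward(self, x):                         # x: (batch, 1, T, H, W)
        f = self.frontend(x)                      # (batch, C, T, 4, 4)
        f = f.permute(0, 2, 1, 3, 4).flatten(2)   # (batch, T, C*4*4)
        out, _ = self.gru(f)
        return self.classifier(out).log_softmax(-1)  # CTC wants log-probs

# CTC training step on dummy data (75 frames per clip)
model = LipreadingNet()
ctc = nn.CTCLoss(blank=0)
video = torch.randn(2, 1, 75, 64, 64)
logp = model(video).permute(1, 0, 2)              # (T, batch, classes)
targets = torch.randint(1, 28, (2, 20))           # dummy transcriptions
loss = ctc(logp, targets,
           input_lengths=torch.full((2,), 75),
           target_lengths=torch.full((2,), 20))
loss.backward()
```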


2021 ◽  
pp. 1-19
Author(s):  
Michelle Torres ◽  
Francisco Cantú

Abstract We provide an introduction to the functioning, implementation, and challenges of convolutional neural networks (CNNs) for classifying visual information in the social sciences. This tool can help scholars make the tedious task of classifying images and extracting information from them more efficient. We illustrate the implementation and impact of this methodology by coding handwritten information from vote tallies. Our paper not only demonstrates the contributions of CNNs for both scholars and policy practitioners, but also presents the practical challenges and limitations of the method, providing advice on how to deal with these issues.
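As a concrete illustration of the kind of classifier the authors describe, here is a minimal, hypothetical PyTorch CNN for coding handwritten digits from scanned tallies; the architecture, input size, and class count are assumptions for illustration only, not the paper's model.

```python
# Hypothetical minimal CNN for handwritten-digit coding from tallies.
import torch
import torch.nn as nn

class TallyDigitCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        # two conv/pool stages halve the 28x28 input twice -> 7x7 maps
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 7 * 7, num_classes)

    def forward(self, x):                    # x: (batch, 1, 28, 28)
        return self.classifier(self.features(x).flatten(1))

logits = TallyDigitCNN()(torch.randn(4, 1, 28, 28))
print(logits.shape)  # torch.Size([4, 10])
```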


2021 ◽  
Vol 15 ◽  
Author(s):  
Yufeng Zheng ◽  
Jun Huang ◽  
Tianwen Chen ◽  
Yang Ou ◽  
Wu Zhou

Convolutional neural networks (CNNs) are a powerful tool for image classification that has been widely adopted in applications of automated scene segmentation and identification. However, the mechanisms underlying CNN image classification remain to be elucidated. In this study, we developed a new approach to address this issue by investigating transfer of learning in representative CNNs (AlexNet, VGG, ResNet-101, and Inception-ResNet-v2) on classifying geometric shapes based on local/global features or invariants. While the local features are based on simple components, such as the orientation of a line segment or whether two lines are parallel, the global features are based on the whole object, such as whether an object has a hole or whether an object is inside another object. Six experiments were conducted to test two hypotheses on CNN shape classification. The first hypothesis is that transfer of learning based on local features is higher than transfer of learning based on global features. The second hypothesis is that CNNs with more layers and advanced architectures exhibit higher transfer of learning based on global features. The first two experiments examined how the CNNs transferred learning of discriminating local features (square, rectangle, trapezoid, and parallelogram). The other four experiments examined how the CNNs transferred learning of discriminating global features (presence of a hole, connectivity, and the inside/outside relationship). While the CNNs exhibited robust learning on classifying shapes, transfer of learning varied from task to task and from model to model. The results rejected both hypotheses. First, some CNNs exhibited lower transfer of learning based on local features than on global features. Second, the advanced CNNs exhibited lower transfer of learning on global features than the earlier models did. Among the tested geometric features, we found that learning to discriminate the inside/outside relationship was the most difficult to transfer, indicating an effective benchmark for developing future CNNs. In contrast to the “ImageNet” approach, which employs natural images to train and analyze CNNs, these results show proof of concept for a “ShapeNet” approach that employs well-defined geometric shapes to elucidate the strengths and limitations of the computation in CNN image classification. This “ShapeNet” approach will also provide insights into understanding visual information processing in the primate visual system.
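The transfer-of-learning measurements described above follow a standard recipe that can be sketched as follows: train a CNN on one set of shape classes, then freeze its convolutional features and refit only a new classifier head on a second set of classes, using the frozen-feature accuracy as a proxy for transfer. The backbone, class counts, and training details below are illustrative assumptions, not the study's exact protocol.

```python
# Hedged sketch of measuring transfer between two shape tasks.
import torch
import torch.nn as nn
from torchvision import models

# Stage 1: learn a source task, e.g., square vs. rectangle vs.
# trapezoid vs. parallelogram (local features).
model = models.resnet18(weights=None, num_classes=4)
# ... train on the source shape dataset ...

# Stage 2: transfer to a target task, e.g., hole vs. no hole (global
# feature), by freezing all convolutional layers and relearning only
# the classifier head.
for p in model.parameters():
    p.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, 2)   # fresh head, trainable
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)

x, y = torch.randn(8, 3, 224, 224), torch.randint(0, 2, (8,))
loss = nn.CrossEntropyLoss()(model(x), y)       # only the head learns
loss.backward()
optimizer.step()
```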


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Yaoda Xu ◽  
Maryam Vaziri-Pashkam

Abstract Convolutional neural networks (CNNs) are increasingly used to model human vision due to their high object categorization capabilities and general correspondence with human brain responses. Here we evaluate the performance of 14 different CNNs compared with human fMRI responses to natural and artificial images using representational similarity analysis. Despite the presence of some CNN-brain correspondence and CNNs’ impressive ability to fully capture lower-level visual representations of real-world objects, we show that CNNs do not fully capture higher-level visual representations of real-world objects, nor those of artificial objects, at either lower or higher levels of visual representation. The latter is particularly critical, as the processing of both real-world and artificial visual stimuli engages the same neural circuits. We report similar results regardless of differences in CNN architecture, training, or the presence of recurrent processing. This indicates some fundamental differences exist in how the brain and CNNs represent visual information.
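Representational similarity analysis, as used in this comparison, reduces to a short computation: build a representational dissimilarity matrix (RDM) over the same image set from a CNN layer's responses and from fMRI voxel patterns, then correlate the two RDMs. The sketch below uses made-up data shapes to show the core steps.

```python
# Minimal RSA sketch with random stand-in data.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

n_images = 20
cnn_features = np.random.randn(n_images, 4096)   # one CNN layer per image
fmri_patterns = np.random.randn(n_images, 500)   # voxel responses per image

# 1 - Pearson correlation distance between every pair of images
# (pdist returns the condensed upper triangle of the RDM)
rdm_cnn = pdist(cnn_features, metric="correlation")
rdm_brain = pdist(fmri_patterns, metric="correlation")

# CNN-brain correspondence: rank correlation of the two RDMs
rho, _ = spearmanr(rdm_cnn, rdm_brain)
print(f"RDM correlation: {rho:.3f}")
```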


2021 ◽  
Vol 40 ◽  
pp. 03048
Author(s):  
Vaibhav Narawade ◽  
Aneesh Potnis ◽  
Vishwaroop Ray ◽  
Pratik Rathor

Our project intends to classify movies into the three most probable genres they belong to, from a predefined set of 25 genres, based on only one image: the movie poster. We use convolutional neural networks (CNNs) to realize this project, as we believe they are well suited to extracting features and visual information from the image. Rather than a multi-class classification problem, in which the input is assigned to exactly one class, this project is more correctly described as a multi-label classification problem, as a movie can belong to more than one genre. We present a comparative study of different architectures and tune them to yield the best result based on the metric of accuracy. We applied various techniques, such as data augmentation and L2 regularization, to deduce which of the tested models performs best.
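A minimal sketch of this multi-label setup, under assumed details not given above: a CNN with 25 per-genre logits trained with binary cross-entropy (so each genre is an independent yes/no decision), L2 regularization applied as weight decay, and simple augmentation transforms.

```python
# Hypothetical multi-label genre classifier for movie posters.
import torch
import torch.nn as nn
from torchvision import models, transforms

augment = transforms.Compose([            # data augmentation for posters
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(brightness=0.2),
])

model = models.resnet18(weights=None)
model.fc = nn.Linear(model.fc.in_features, 25)     # one logit per genre
criterion = nn.BCEWithLogitsLoss()                 # multi-label, not softmax
optimizer = torch.optim.Adam(model.parameters(),
                             lr=1e-4, weight_decay=1e-4)  # L2 regularization

posters = augment(torch.rand(4, 3, 224, 224))      # dummy poster batch
labels = torch.zeros(4, 25)
labels[:, [2, 7]] = 1.0                   # a movie can have several genres
loss = criterion(model(posters), labels)
loss.backward()
optimizer.step()

# At inference, keep the three most probable genres per poster:
probs = torch.sigmoid(model(posters))
top3 = probs.topk(3, dim=1).indices
```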


PLoS ONE ◽  
2021 ◽  
Vol 16 (1) ◽  
pp. e0245230
Author(s):  
Marco Seeland ◽  
Patrick Mäder

Humans’ decision-making process often relies on utilizing visual information from different views or perspectives. However, in machine-learning-based image classification, we typically infer an object’s class from just a single image of the object. Especially for challenging classification problems, the visual information conveyed by a single image may be insufficient for an accurate decision. We propose a classification scheme that relies on fusing visual information captured in images depicting the same object from multiple perspectives. Convolutional neural networks are used to extract and encode visual features from the multiple views, and we propose strategies for fusing this information. More specifically, we investigate the following three strategies: (1) fusing convolutional feature maps at differing network depths; (2) fusing bottleneck latent representations prior to classification; and (3) score fusion. We systematically evaluate these strategies on three datasets from different domains. Our findings emphasize the benefit of integrating information fusion into the network rather than performing it by post-processing of classification scores. Furthermore, we demonstrate through a case study that already-trained networks can easily be extended by the best fusion strategy, outperforming other approaches by a large margin.
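Two of the three strategies can be sketched compactly: fusing bottleneck latent representations of the views before a shared classifier (strategy 2) versus averaging per-view classification scores (strategy 3). The backbone and dimensions below are illustrative assumptions, not the paper's configuration.

```python
# Hedged sketch of bottleneck fusion vs. score fusion for two views.
import torch
import torch.nn as nn
from torchvision import models

backbone = models.resnet18(weights=None)
backbone.fc = nn.Identity()               # expose 512-d bottleneck features

view_a = torch.randn(4, 3, 224, 224)      # same objects, two perspectives
view_b = torch.randn(4, 3, 224, 224)

# Strategy 2: concatenate bottleneck representations, then classify
feat = torch.cat([backbone(view_a), backbone(view_b)], dim=1)  # (4, 1024)
fused_head = nn.Linear(1024, 10)
logits_fused = fused_head(feat)

# Strategy 3: score fusion, i.e., average per-view class scores
head = nn.Linear(512, 10)
logits_scored = (head(backbone(view_a)) + head(backbone(view_b))) / 2
```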

