A novel keyframe extraction method for video classification using deep neural networks

Author(s):  
Rukiye Savran Kızıltepe ◽  
John Q. Gan ◽  
Juan José Escobar

Abstract Combining convolutional neural networks (CNNs) and recurrent neural networks (RNNs) produces a powerful architecture for video classification problems as spatial–temporal information can be processed simultaneously and effectively. Using transfer learning, this paper presents a comparative study to investigate how temporal information can be utilized to improve the performance of video classification when CNNs and RNNs are combined in various architectures. To enhance the performance of the identified architecture for effective combination of CNN and RNN, a novel action template-based keyframe extraction method is proposed by identifying the informative region of each frame and selecting keyframes based on the similarity between those regions. Extensive experiments on KTH and UCF-101 datasets with ConvLSTM-based video classifiers have been conducted. Experimental results are evaluated using one-way analysis of variance, which reveals the effectiveness of the proposed keyframe extraction method in the sense that it can significantly improve video classification accuracy.
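
The abstract does not specify the similarity measure or the exact selection rule. A minimal sketch, assuming each frame's informative region has already been reduced to a feature vector (a hypothetical input) and that a frame is kept as a keyframe when its cosine similarity to the most recent keyframe falls below a hypothetical threshold; this is an illustration of similarity-based keyframe selection, not the authors' action template method:

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def select_keyframes(region_features: List[Sequence[float]], threshold: float = 0.9) -> List[int]:
    """Greedy keyframe selection (hypothetical rule and threshold).

    region_features[i] is an assumed feature vector for the informative
    region of frame i. A frame becomes a keyframe when its similarity to
    the most recently selected keyframe drops below `threshold`.
    """
    if not region_features:
        return []
    keyframes = [0]  # always keep the first frame
    for i in range(1, len(region_features)):
        last = region_features[keyframes[-1]]
        if cosine_similarity(region_features[i], last) < threshold:
            keyframes.append(i)
    return keyframes
```

For example, `select_keyframes([[1, 0], [0.99, 0.05], [0, 1], [0.05, 1], [1, 1]])` keeps frames 0, 2 and 4, dropping the near-duplicates. Comparing against the last keyframe rather than the previous frame prevents slow drift from being ignored.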

Electronics ◽  
2021 ◽  
Vol 10 (11) ◽  
pp. 1367
Author(s):  
Raghida El Saj ◽  
Ehsan Sedgh Gooya ◽  
Ayman Alfalou ◽  
Mohamad Khalil

Privacy-preserving deep neural networks have become essential and have attracted the attention of many researchers due to the need to maintain the privacy and confidentiality of personal and sensitive data. Their importance has increased with the widespread use of neural networks as a service in unsecured cloud environments. Different methods have been proposed and developed to solve the privacy-preserving problem by applying deep neural networks to encrypted data. In this article, we review some of the most relevant and well-known computational and perceptual image encryption methods. These methods and their results are presented and compared, and their conditions of use, as well as the robustness of some of them against attacks, are discussed. Some of the methods have demonstrated an ability to hide information and make it difficult for adversaries to retrieve while maintaining high classification accuracy. Based on these results, we suggest developing and using some of the cited privacy-preserving methods in applications other than classification.


Author(s):  
Vishal Babu Siramshetty ◽  
Dac-Trung Nguyen ◽  
Natalia J. Martinez ◽  
Anton Simeonov ◽  
Noel T. Southall ◽  
...  

The rise of novel artificial intelligence methods necessitates a comparison of this wave of new approaches with classical machine learning for a typical drug discovery project. Inhibition of the potassium ion channel whose alpha subunit is encoded by the human Ether-à-go-go-Related Gene (hERG) leads to a prolonged QT interval of the cardiac action potential and is a significant safety pharmacology target for the development of new medicines. Several computational approaches have been employed to develop prediction models for assessing the hERG liabilities of small molecules, including recent work using deep learning methods. Here we perform a comprehensive comparison of prediction models based on classical (random forests and gradient boosting) and modern (deep neural networks and recurrent neural networks) artificial intelligence methods. The training set (~9000 compounds) was compiled by integrating hERG bioactivity data from the ChEMBL database with experimental data generated from an in-house, high-throughput thallium flux assay. We utilized different molecular descriptors, including latent descriptors, which are real-valued continuous vectors derived from chemical autoencoders trained on a large chemical space (> 1.5 million compounds). The models were prospectively validated on ~840 in-house compounds screened in the same thallium flux assay. The deep neural networks performed significantly better than the classical methods with the latent descriptors. The recurrent neural networks that operate on SMILES provided the highest model sensitivity. The best models were merged into a consensus model that offered superior performance compared to reference models from academic and commercial domains. Further, we shed light on the potential of artificial intelligence methods to exploit chemistry big data and generate novel chemical representations useful in predictive modeling and tailoring new chemical space.
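
The abstract does not say how the best models were merged. A minimal sketch, assuming a simple probability-averaging consensus with a hypothetical 0.5 decision threshold, together with the sensitivity metric mentioned above (fraction of true hERG blockers that are predicted as blockers):

```python
from typing import List, Sequence


def consensus_predict(model_probs: Sequence[Sequence[float]], threshold: float = 0.5) -> List[int]:
    """Average per-compound probabilities across models, then threshold.

    model_probs[m][i] is model m's predicted probability that compound i
    is a hERG blocker (assumed input format). Returns 1 (blocker) or 0.
    """
    n_models = len(model_probs)
    n_compounds = len(model_probs[0])
    labels = []
    for i in range(n_compounds):
        mean_p = sum(probs[i] for probs in model_probs) / n_models
        labels.append(1 if mean_p >= threshold else 0)
    return labels


def sensitivity(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """True positive rate: TP / (TP + FN); 0.0 if there are no positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if tp + fn else 0.0
```

With three models scoring three compounds as `[[0.9, 0.2, 0.6], [0.7, 0.4, 0.3], [0.8, 0.1, 0.5]]`, the mean probabilities are about 0.80, 0.23 and 0.47, so the consensus labels are `[1, 0, 0]`. Weighted averaging or majority voting are equally plausible alternatives.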


2018 ◽  
Vol 28 (4) ◽  
pp. 735-744 ◽  
Author(s):  
Michał Koziarski ◽  
Bogusław Cyganek

Abstract Due to the advances made in recent years, methods based on deep neural networks have been able to achieve state-of-the-art performance in various computer vision problems. In some tasks, such as image recognition, neural-based approaches have even been able to surpass human performance. However, the benchmarks on which neural networks achieve these impressive results usually consist of fairly high-quality data. In practical applications, on the other hand, we are often faced with low-quality images affected by factors such as low resolution, the presence of noise, or a small dynamic range. It is unclear how resilient deep neural networks are to such factors. In this paper we experimentally evaluate the impact of low resolution on the classification accuracy of several notable neural architectures of recent years. Furthermore, we examine the possibility of improving neural networks’ performance in low-resolution image recognition by applying super-resolution prior to classification. The results of our experiments indicate that contemporary neural architectures remain significantly affected by low image resolution. By applying super-resolution prior to classification, we were able to alleviate this issue to a large extent as long as the resolution of the images did not decrease too severely. However, for very low-resolution images the classification accuracy remained considerably affected.
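
One common way to simulate low-resolution inputs for such an evaluation is to downsample an image and then upscale it back to the classifier's input size. A minimal sketch on a grayscale image stored as a list of rows, assuming average pooling for the downsampling and nearest-neighbour upscaling for simplicity (the paper's exact degradation and its super-resolution step are not reproduced here; bicubic interpolation or a learned super-resolution model would replace `upsample_nearest` in a fuller pipeline):

```python
from typing import List

Image = List[List[float]]


def downsample(img: Image, factor: int) -> Image:
    """Average-pool a grayscale image by `factor` (H and W must be multiples of it)."""
    h, w = len(img), len(img[0])
    out = []
    for r in range(0, h, factor):
        row = []
        for c in range(0, w, factor):
            block = [img[i][j] for i in range(r, r + factor) for j in range(c, c + factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def upsample_nearest(img: Image, factor: int) -> Image:
    """Nearest-neighbour upscaling: repeat every pixel factor x factor times."""
    return [[v for v in row for _ in range(factor)] for row in img for _ in range(factor)]
```

Running `upsample_nearest(downsample(img, s), s)` for increasing `s` yields images of the original size but progressively less detail, which can then be fed to a fixed classifier to measure the accuracy drop.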


2018 ◽  
Vol 12 (04) ◽  
pp. 481-500 ◽  
Author(s):  
Naifan Zhuang ◽  
The Duc Kieu ◽  
Jun Ye ◽  
Kien A. Hua

With the growth of crowd phenomena in the real world, crowd scene understanding is becoming an important task in anomaly detection and public security. Visual ambiguities and occlusions, high density, low mobility, and scene semantics, however, make this problem a great challenge. In this paper, we propose an end-to-end deep architecture, convolutional nonlinear differential recurrent neural networks (CNDRNNs), for crowd scene understanding. CNDRNNs consist of GoogLeNet Inception V3 convolutional neural networks (CNNs) and nonlinear differential recurrent neural networks (RNNs). Unlike traditional non-end-to-end solutions, which separate the steps of feature extraction and parameter learning, CNDRNN utilizes a unified deep model to optimize the parameters of the CNN and RNN jointly. It thus has the potential to generate a more harmonious model. The proposed architecture takes sequential raw image data as input and does not rely on tracklet or trajectory detection. It therefore has clear advantages over traditional flow-based and trajectory-based methods, especially in challenging crowd scenarios with high density and low mobility. By combining CNN and RNN, CNDRNN can effectively analyze crowd semantics: the CNN is good at modeling the semantic information of the crowd scene, while the nonlinear differential RNN models the motion information. The individual and increasing orders of derivative of states (DoS) in the differential RNN progressively build up the ability of the long short-term memory (LSTM) gates to detect different levels of salient dynamical patterns, with deeper stacked layers modeling higher orders of DoS. 
Lastly, existing LSTM-based crowd scene solutions explore deep temporal information and are claimed to be “deep in time.” Our proposed method CNDRNN, however, models the spatial and temporal information in a unified architecture and achieves “deep in space and time.” Extensive performance studies on the Violent-Flows, CUHK Crowd, and NUS-HGA datasets show that the proposed technique significantly outperforms state-of-the-art methods.
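
The derivative of states (DoS) can be approximated by finite differences of the hidden-state sequence, each order being the difference of the previous one. A minimal sketch, assuming plain discrete differences (as in differential RNNs, where the DoS signal modulates the LSTM gates); it computes only the DoS signals and does not implement the gated cell or the CNDRNN architecture itself:

```python
from typing import List, Sequence


def derivatives_of_states(states: Sequence[Sequence[float]], order: int) -> List[List[List[float]]]:
    """Finite-difference DoS of orders 1..order.

    states[t] is the hidden-state vector at time t. The n-th order DoS at
    time t is d^n s_t = d^(n-1) s_t - d^(n-1) s_(t-1). result[n-1] holds
    the n-th order sequence, which is n steps shorter than `states`.
    """
    result = []
    current = [list(s) for s in states]
    for _ in range(order):
        current = [[a - b for a, b in zip(current[t], current[t - 1])]
                   for t in range(1, len(current))]
        result.append(current)
    return result
```

For states following s_t = t² (`[[0.0], [1.0], [4.0], [9.0]]`), the first-order DoS is `[[1.0], [3.0], [5.0]]` (velocity) and the second-order DoS is `[[2.0], [2.0]]` (constant acceleration), which illustrates why higher orders capture different levels of dynamical patterns.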


2020 ◽  
Vol 2020 ◽  
pp. 1-10
Author(s):  
Wei Wang ◽  
Yiyang Hu ◽  
Ting Zou ◽  
Hongmei Liu ◽  
Jin Wang ◽  
...  

Because deep neural networks (DNNs) are both memory-intensive and computation-intensive, they are difficult to deploy on embedded systems with limited hardware resources. DNN models therefore need to be compressed and accelerated. By applying depthwise separable convolutions, MobileNet reduces the number of parameters and the computational complexity with little loss of classification precision. Based on MobileNet, three improved MobileNet models with local receptive field expansion in shallow layers, called Dilated-MobileNet (Dilated Convolution MobileNet) models, are proposed, in which dilated convolutions are introduced into a specific convolutional layer of the MobileNet model. Dilated convolutions enlarge the receptive field of the convolution filters without increasing the number of parameters, which yields better classification accuracy. The experiments were performed on the Caltech-101, Caltech-256, and Tubingen Animals with Attributes datasets. The results show that Dilated-MobileNets can obtain up to 2% higher classification accuracy than MobileNet.
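
The claim that dilation enlarges the receptive field at no parameter cost follows from standard convolution arithmetic: a k-tap kernel with dilation d spans k + (k − 1)(d − 1) input positions while still having k weights per dimension. A minimal sketch of that arithmetic; the layer stacks in the usage note are hypothetical and are not the three Dilated-MobileNet configurations from the paper:

```python
from typing import Sequence, Tuple


def effective_kernel(k: int, dilation: int) -> int:
    """Input span covered by a k-tap kernel with the given dilation rate."""
    return k + (k - 1) * (dilation - 1)


def depthwise_params(k: int, channels: int) -> int:
    """Weights of a k x k depthwise convolution (bias ignored); independent of dilation."""
    return k * k * channels


def receptive_field(layers: Sequence[Tuple[int, int, int]]) -> int:
    """Receptive field of stacked conv layers given as (kernel, stride, dilation)."""
    rf, jump = 1, 1
    for k, stride, dilation in layers:
        rf += (effective_kernel(k, dilation) - 1) * jump
        jump *= stride  # distance between adjacent outputs, in input pixels
    return rf
```

Two stacked 3 × 3 layers give `receptive_field([(3, 1, 1), (3, 1, 1)]) == 5`, while setting the second layer's dilation to 2 gives 7, yet `depthwise_params(3, 32)` is 288 weights in both cases.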


Entropy ◽  
2019 ◽  
Vol 21 (5) ◽  
pp. 456 ◽  
Author(s):  
Hao Cheng ◽  
Dongze Lian ◽  
Shenghua Gao ◽  
Yanlin Geng

Inspired by the pioneering work applying the information bottleneck (IB) principle to the analysis of deep neural networks (DNNs), we thoroughly study the relationship among model accuracy, $I(X;T)$, and $I(T;Y)$, where $I(X;T)$ and $I(T;Y)$ are the mutual information of the DNN's output $T$ with the input $X$ and the label $Y$, respectively. We then design an information plane-based framework to evaluate the capability of DNNs (including CNNs) for image classification. Instead of the output of each hidden layer, our framework focuses on the model output $T$. We successfully apply the framework to many scenarios arising in deep learning and image classification, such as image classification with unbalanced data distributions, model selection, and transfer learning. The experimental results verify the effectiveness of the information plane-based framework: it may facilitate quick model selection and determine the number of samples needed for each class in unbalanced classification problems. Furthermore, the framework explains the efficiency of transfer learning in deep learning.
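
The abstract does not state which mutual information estimator is used. A minimal sketch of the plug-in (empirical-frequency) estimate of $I(T;Y)$ in bits, assuming the model output $T$ has been discretized, here simply to the predicted class label; continuous outputs would require binning or a different estimator:

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def mutual_information(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Plug-in estimate of I(A;B) in bits from paired discrete samples.

    I(A;B) = sum over (x, y) of p(x, y) * log2(p(x, y) / (p(x) * p(y))),
    with all probabilities replaced by empirical frequencies.
    """
    n = len(a)
    joint = Counter(zip(a, b))
    pa = Counter(a)
    pb = Counter(b)
    mi = 0.0
    for (x, y), c in joint.items():
        p_xy = c / n
        mi += p_xy * math.log2(p_xy / ((pa[x] / n) * (pb[y] / n)))
    return mi
```

If predictions match balanced binary labels exactly (`t == y == [0, 0, 1, 1]`), the estimate equals the label entropy of 1 bit; if predictions are independent of the labels (`t = [0, 1, 0, 1]`), it is 0. Higher $I(T;Y)$ therefore tracks how much label information the output carries.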


Algorithms ◽  
2019 ◽  
Vol 12 (4) ◽  
pp. 85 ◽  
Author(s):  
Ioannis E. Livieris

During the last few decades, machine learning has become a significant tool for extracting useful knowledge from economic data to assist decision-making. In this work, we evaluate the performance of weight-constrained recurrent neural networks on economic forecasting classification problems. These networks are efficiently trained with a recently proposed training algorithm that has two major advantages: first, it exploits the numerical efficiency and very low memory requirements of limited-memory BFGS matrices; second, it uses a gradient-projection strategy to handle the bounds on the weights. The reported numerical experiments present the classification accuracy of the proposed model and provide empirical evidence that bounding the weights of the recurrent neural network yields more stable and reliable learning.
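
The gradient-projection idea for box constraints on the weights is simple: after each update, every weight is clipped back into its bounds. A minimal sketch that swaps the limited-memory BFGS direction for a plain gradient step to isolate the projection; the loss, bounds, learning rate, and step count are hypothetical. (SciPy's `scipy.optimize.minimize` with `method="L-BFGS-B"` and a `bounds` argument is an existing bound-constrained L-BFGS solver, though not necessarily the algorithm used in this work.)

```python
from typing import Callable, List, Sequence


def project(w: Sequence[float], lower: float, upper: float) -> List[float]:
    """Project weights onto the box [lower, upper] by element-wise clipping."""
    return [min(max(x, lower), upper) for x in w]


def projected_gradient_descent(grad: Callable[[List[float]], List[float]],
                               w0: Sequence[float], lower: float, upper: float,
                               lr: float = 0.1, steps: int = 100) -> List[float]:
    """Minimise a differentiable loss subject to lower <= w_i <= upper."""
    w = project(w0, lower, upper)
    for _ in range(steps):
        g = grad(w)
        # Take an unconstrained step, then project back into the feasible box.
        w = project([wi - lr * gi for wi, gi in zip(w, g)], lower, upper)
    return w
```

For the loss Σ(wᵢ − 3)² with gradient 2(wᵢ − 3) and bounds [−1, 1], the unconstrained minimum lies outside the box, so the iterates stop at the boundary `[1.0, 1.0]`; for a target of 0.5 inside the box, they converge to 0.5 as usual.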


2019 ◽  
Vol 2019 ◽  
pp. 1-7 ◽  
Author(s):  
Yu Fujinami-Yokokawa ◽  
Nikolas Pontikos ◽  
Lizhu Yang ◽  
Kazushige Tsunoda ◽  
Kazutoshi Yoshitake ◽  
...  

Purpose. To illustrate a data-driven deep learning approach to predicting the gene responsible for inherited retinal disorder (IRD) in macular dystrophy caused by ABCA4 and RP1L1 gene aberrations, in comparison with retinitis pigmentosa caused by EYS gene aberration and normal subjects. Methods. Seventy-five subjects with IRD or no ocular disease were ascertained from the database of the Japan Eye Genetics Consortium: 10 with ABCA4 retinopathy, 20 with RP1L1 retinopathy, 28 with EYS retinopathy, and 17 normal subjects. Horizontal and vertical cross-sectional spectral-domain optical coherence tomography (SD-OCT) scans at the central fovea were cropped and adjusted to a resolution of 400 pixels/inch and a size of 750 × 500 pixels for learning. Subjects were randomly split into training and test sets at a 3 : 1 ratio. The commercially available learning tool Medic mind was applied to this four-class classification problem. The classification accuracy, sensitivity, and specificity were calculated during the learning process. This process was repeated four times with random assignment to training and test sets to control for selection bias. For each training/testing process, the classification accuracy was calculated per gene category. Results. A total of 178 images from 75 subjects were included in this study. The mean training accuracy was 98.5% (range 90.6%–100.0%). The mean overall test accuracy was 90.9% (82.0%–97.6%). The mean test accuracy per gene category was 100% for ABCA4, 78.0% for RP1L1, 89.8% for EYS, and 93.4% for normal subjects. The test accuracy for RP1L1 and EYS was lower than the training accuracy, which suggests overfitting. Conclusion. This study highlighted a novel application of deep neural networks to predicting the causative gene in IRD retinopathies from SD-OCT, with high prediction accuracy. 
It is anticipated that deep neural networks will be integrated into general screening to support clinical and genetic diagnosis, as well as to enrich clinical education.
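
The evaluation protocol (a random 3 : 1 split of subjects, repeated four times, with accuracy reported per gene category) can be sketched independently of the commercial tool. A minimal sketch, assuming the split is done at the subject level so that no subject's images appear in both sets, and that seeds 0 to 3 stand in for the four random assignments; the function names and seeding are hypothetical:

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def split_subjects(subject_ids: Sequence[str], seed: int,
                   train_fraction: float = 0.75) -> Tuple[List[str], List[str]]:
    """Randomly split unique subjects about 3:1 so no subject is in both sets."""
    subjects = sorted(set(subject_ids))  # sort first so the split depends only on the seed
    random.Random(seed).shuffle(subjects)
    n_train = round(len(subjects) * train_fraction)
    return subjects[:n_train], subjects[n_train:]


def per_class_accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """Fraction of correctly classified test images within each true class."""
    total: Dict[str, int] = defaultdict(int)
    correct: Dict[str, int] = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += int(t == p)
    return {c: correct[c] / total[c] for c in total}
```

With 75 subjects, each split places 56 in training and 19 in test; averaging `per_class_accuracy` over the four seeds gives per-gene mean test accuracies analogous to those reported above.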


Sensors ◽  
2020 ◽  
Vol 20 (17) ◽  
pp. 4992
Author(s):  
Shuli Xing ◽  
Malrey Lee

Because citrus fruit is rich in vitamins, citrus is an important crop around the world. However, citrus yields are often reduced by damage from various pests and diseases. To mitigate these problems, several convolutional neural networks were applied to detect them. Notably, the performance of the selected models degraded as the size of the target object in the image decreased. To adapt to scale changes, a new feature reuse method named bridge connection was developed. With bridge connections, the accuracy of the baseline networks was improved at little additional computational cost. The proposed BridgeNet-19 achieved the highest classification accuracy (95.47%), followed by the pre-trained VGG-19 (95.01%) and VGG-19 with bridge connections (94.73%). The use of bridge connections also makes image acquisition by sensors more flexible, since less effort is needed to adjust the distance between the camera and the pests and diseases.


Author(s):  
Hajar Maseeh Yasin ◽  
Adnan Mohsin Abdulazeez

Image compression is an essential technology for encoding and improving various forms of images in the digital era. Researchers have extended the principle of deep learning, one of the most exciting machine learning approaches, to different kinds of neural networks, showing that it is the most versatile way to analyze, classify, and compress images. Many types of neural networks are used for image compression, such as deep neural networks, artificial neural networks, recurrent neural networks, and convolutional neural networks. This review therefore discusses how deep learning can be applied through various neural networks to achieve better image compression with high accuracy, minimal loss, and superior visual quality, and analyzes its application to different types of images.

