Video Classification Using 3D Convolutional Neural Network

Author(s):  
K. Jairam Naik ◽  
Annukriti Soni

Because video contains both spatial and temporal features, it poses a challenging classification problem. Each frame holds spatial information, while the context of that frame relative to the frames preceding it in time carries temporal information. Several methods have been devised for video classification, but each has its own drawbacks. One such method is the convolutional neural network (CNN), a class of deep learning model that can operate directly on raw inputs. However, most such models are limited to two-dimensional inputs. This chapter implements a three-dimensional convolutional neural network (3D CNN) for video classification and analyses the classification accuracy it achieves. 3D convolutional networks are well suited to video classification because they apply convolutions jointly across the spatial and temporal dimensions.
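
As a concrete illustration of the idea of convolving in 3D space, the sketch below builds a small 3D CNN over short video clips. It is a minimal example assuming PyTorch, not the chapter's actual architecture; the layer sizes, clip length, and class count are illustrative.

```python
# Minimal sketch (not the chapter's exact architecture): a small 3D CNN that
# classifies short video clips. Assumes PyTorch; all sizes are illustrative.
import torch
import torch.nn as nn

class Simple3DCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            # Conv3d convolves jointly over (frames, height, width),
            # so temporal and spatial patterns are learned together.
            nn.Conv3d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(kernel_size=2),
            nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),
        )
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, x):
        # x: (batch, channels, frames, height, width)
        h = self.features(x).flatten(1)
        return self.classifier(h)

clip = torch.randn(2, 3, 16, 112, 112)    # two 16-frame RGB clips
logits = Simple3DCNN()(clip)              # shape: (2, num_classes)
```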

2019 ◽  
Vol 12 (1) ◽  
pp. 108 ◽  
Author(s):  
Juhyun Lee ◽  
Jungho Im ◽  
Dong-Hyun Cha ◽  
Haemi Park ◽  
Seongmun Sim

For a long time, researchers have tried to find a way to analyze tropical cyclone (TC) intensity in real time. Since there is no standardized method for estimating TC intensity, and the most widely used method is a manual algorithm based on satellite cloud images, estimates carry a bias that varies with the TC center and shape. In this study, we adopted convolutional neural networks (CNNs), a state-of-the-art approach to analyzing image patterns, to estimate TC intensity by mimicking human cloud pattern recognition. Both a two-dimensional CNN (2D-CNN) and a three-dimensional CNN (3D-CNN) were used to analyze the relationship between multi-spectral geostationary satellite images and TC intensity. Our best-optimized model produced a root mean squared error (RMSE) of 8.32 kts, a performance improvement of roughly 35% over an existing CNN-based model that uses a single-channel image. Moreover, we analyzed the characteristics of multi-spectral satellite-based TC images according to intensity using a heat map, one of the visualization tools for CNNs. It shows that the stronger the intensity of the TC, the greater the influence of the TC center in the lower atmosphere. This is consistent with results from the existing TC initialization method with numerical simulations based on dynamical TC models. Our study suggests that a deep learning approach can be used to interpret the behavioral characteristics of TCs.
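
To make the 2D-CNN/3D-CNN distinction for multi-spectral imagery concrete, the sketch below shows the two input layouts. It is an illustration only (assuming PyTorch); the paper's actual networks, band counts, and image sizes are not reproduced here.

```python
# Illustrative only: how multi-spectral satellite imagery is typically fed to a
# 2D-CNN versus a 3D-CNN for a single regression target such as TC intensity.
import torch
import torch.nn as nn

bands, height, width = 4, 64, 64          # hypothetical band count and grid size

# 2D-CNN: spectral bands become input channels; a kernel mixes all bands at once.
x2d = torch.randn(8, bands, height, width)
conv2d = nn.Conv2d(bands, 16, kernel_size=3, padding=1)

# 3D-CNN: bands form a depth axis; the kernel also slides across the spectrum,
# so inter-band structure is modeled explicitly.
x3d = torch.randn(8, 1, bands, height, width)
conv3d = nn.Conv3d(1, 16, kernel_size=(3, 3, 3), padding=1)

print(conv2d(x2d).shape)   # (8, 16, 64, 64)
print(conv3d(x3d).shape)   # (8, 16, 4, 64, 64)
```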


Author(s):  
Ainaz Hajimoradlou ◽  
Gioachino Roberti ◽  
David Poole

Landslides, the movement of soil and rock under the influence of gravity, are common phenomena that cause significant human and economic losses every year. Experts use heterogeneous features such as slope, elevation, land cover, lithology, rock age, and rock family to predict landslides. To work with such features, we adapted convolutional neural networks to consider relative spatial information for the prediction task. Traditional filters in these networks either have a fixed orientation or are rotationally invariant. Intuitively, the filters should orient uphill, but there is not enough data to learn the concept of uphill; instead, it can be provided as prior knowledge. We propose a model called the Locally Aligned Convolutional Neural Network (LACNN), which follows the ground surface at multiple scales to predict possible landslide occurrence for a single point. To validate our method, we created a standardized dataset of georeferenced images consisting of the heterogeneous features as inputs, and compared our method to several baselines, including linear regression, a neural network, and a convolutional network, using log-likelihood error and Receiver Operating Characteristic (ROC) curves on the test set. Our model achieves a 2-7% improvement in accuracy and a 2-15% gain in log-likelihood over the baselines.
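
The evaluation described above relies on standard quantities; as a hedged illustration, the snippet below computes a per-point log-likelihood and an ROC curve with scikit-learn on synthetic predictions, not on the paper's data.

```python
# Sketch of the kind of evaluation the abstract describes (log-likelihood and
# ROC curves on a held-out set); predictions and labels here are synthetic.
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=1000)                    # 1 = landslide point
y_prob = np.clip(0.3 * y_true + rng.random(1000) * 0.7, 1e-6, 1 - 1e-6)

# Average log-likelihood of the ground truth under the predicted probabilities.
log_lik = np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))

fpr, tpr, _ = roc_curve(y_true, y_prob)
print(f"log-likelihood: {log_lik:.3f}, AUC: {roc_auc_score(y_true, y_prob):.3f}")
```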


Author(s):  
Ryan Hogan ◽  
Christoforos Christoforou

To inform a proper diagnosis and understanding of Alzheimer's Disease (AD), deep learning has emerged as an alternative approach for detecting physical brain changes in magnetic resonance imaging (MRI). The advancement of deep learning within biomedical imaging, particularly in MRI, has proven to be an efficient resource for abnormality detection, using convolutional neural networks (CNNs) to perform feature mapping within multilayer perceptrons. In this study, we test the feasibility of using three-dimensional convolutional neural networks to identify neurophysiological degeneration in whole-brain scans, differentiating AD patients from controls. In particular, we propose and train a 3D-CNN model to classify MRI scans of cognitively healthy individuals and AD patients. We validate the proposed model on a large dataset of more than seven hundred MRI scans (half AD). Our results show a validation accuracy of 79%, which is on par with the current state of the art. The benefit of the proposed 3D network is that it can assist in the exploration and detection of AD by mapping the complex heterogeneity of the brain, particularly in the limbic system and temporal lobe. The goal of this research is to measure the efficacy and predictive power of 3D convolutional networks in detecting the progression of neurodegeneration in MRI brain scans of healthy controls (HC) and AD patients.
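
For orientation only, a minimal sketch of a volumetric binary classifier of the kind described (AD vs. control) is shown below, assuming PyTorch. It is not the authors' architecture; the volume size and layer widths are placeholders.

```python
# Minimal sketch (not the authors' model): a 3D CNN over single-channel MRI
# volumes with a two-class head. Assumes PyTorch; sizes are placeholders.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv3d(1, 8, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool3d(2),
    nn.Conv3d(8, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool3d(1), nn.Flatten(),
    nn.Linear(16, 2),                        # logits for {control, AD}
)
volume = torch.randn(4, 1, 96, 96, 96)       # hypothetical resampled brain volumes
print(model(volume).shape)                   # (4, 2)
```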


Author(s):  
Ilyenko Anna ◽  
◽  
Ilyenko Sergii ◽  
Herasymenko Marharyta

During the research, the existing biometric cryptographic systems were analysed. Several methods for generating biometric features were considered and compared with a cryptographic key. For comparing compact vectors of biometric images with cryptographic keys, the following methods are analysed: the design and training of bidirectional associative memory, and the design and training of single-layer and multilayer neural networks. A comparative analysis of algorithms for extracting primary biometric features and matching the generated image to a private key within the proposed authentication system showed that deep convolutional networks combined with neural-network bidirectional associative memory are the most effective approach for processing the data. The research proposes an approach based on integrating a biometric system with a cryptographic module, which allows a secret cryptographic key generated from a biometric sample to be used as the output of a neural network. The RSA algorithm is chosen to generate the private cryptographic key, using convolutional neural networks and Python libraries. The software authentication module is implemented on a client-server architecture using standard Python libraries. Such an authentication system is intended for systems that store user data and valuable information resources, or in which the user performs operations that require a cryptographic key. The proposed software module based on convolutional neural networks is a suitable tool for ensuring the confidentiality of information in information and communication systems, since protecting an information system from unauthorized access remains one of the most pressing problems. The module addresses the problem of securely generating and storing the secret key; the authors propose combining the convolutional neural network with bidirectional associative memory, which is used to recognize the biometric sample, generate the image, and match it with a cryptographic key. This approach reduces the probability of type I and type II errors in the authentication system, and the absolute number of errors was reduced by an average factor of 1.5. The proportion of images correctly recognized by the combination of convolutional networks and neural-network bidirectional associative memory in the authentication module increased to 96.97%, an improvement of between 1.01 and 1.08 times on average. The authors further plan a number of scientific and technical solutions to develop and implement effective methods and tools that meet the requirements, principles, and approaches of cybersecurity and cryptosystems, in order to ensure the integrity and confidentiality of information in experimental computer systems and networks.
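
The abstract names RSA and Python libraries without specifying a package. Purely as an illustration, the snippet below generates and serializes an RSA private key with the widely used `cryptography` library; binding the key to a biometric sample, which is the paper's actual contribution, is not shown.

```python
# Illustration only: generating and serializing an RSA private key with the
# `cryptography` package (one possible Python library; the paper does not name one).
from cryptography.hazmat.primitives.asymmetric import rsa
from cryptography.hazmat.primitives import serialization

private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)

pem = private_key.private_bytes(
    encoding=serialization.Encoding.PEM,
    format=serialization.PrivateFormat.PKCS8,
    encryption_algorithm=serialization.NoEncryption(),
)
print(pem.decode()[:64], "...")   # PEM-encoded key material (truncated)
```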


2017 ◽  
Vol 2017 ◽  
pp. 1-9 ◽  
Author(s):  
Eduardo Carabez ◽  
Miho Sugi ◽  
Isao Nambu ◽  
Yasuhiro Wada

From enabling basic communication to moving through an environment, several attempts are being made in the field of brain-computer interfaces (BCI) to assist people who find it difficult or impossible to perform certain activities. Focusing on these people as potential users of BCI, we obtained electroencephalogram (EEG) readings from nine healthy subjects who were presented with auditory stimuli via earphones from six different virtual directions. We presented the stimuli following the oddball paradigm to elicit P300 waves within the subjects' brain activity for later identification and classification using convolutional neural networks (CNNs). The CNN models are given a novel single-trial three-dimensional (3D) representation of the EEG data as input, maintaining temporal and spatial information as close to the experimental setup as possible, a relevant characteristic since eliciting the P300 has been shown to produce stronger activity in certain brain regions. Here, we present the results of CNN models using the proposed 3D input for three different stimulus presentation time intervals (500, 400, and 300 ms) and compare them to previous studies and other common classifiers. Our results show >80% accuracy for all CNN models using the proposed 3D input in single-trial P300 classification.
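
To convey the general idea of a 3D single-trial EEG representation, the sketch below maps a hypothetical electrode montage onto a small scalp-like grid and stacks time samples as the third axis. The paper's specific mapping and dimensions are not reproduced; everything here is an assumption for illustration.

```python
# Hypothetical illustration of a 3D EEG input: electrodes are placed on a small
# 2D grid approximating their scalp layout, and time samples form the third axis.
import numpy as np

n_channels, n_samples = 8, 150            # assumed montage size and window length
eeg = np.random.randn(n_channels, n_samples)

# Assumed (row, col) position of each electrode on a 3x3 scalp grid.
grid_pos = [(0, 1), (1, 0), (1, 1), (1, 2), (2, 0), (2, 1), (2, 2), (0, 0)]

volume = np.zeros((n_samples, 3, 3))      # (time, rows, cols)
for ch, (r, c) in enumerate(grid_pos):
    volume[:, r, c] = eeg[ch]

print(volume.shape)                        # (150, 3, 3): one 3D sample per trial
```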


2017 ◽  
Vol 2017 ◽  
pp. 1-8 ◽  
Author(s):  
Qiang Lan ◽  
Zelong Wang ◽  
Mei Wen ◽  
Chunyuan Zhang ◽  
Yijie Wang

Convolutional neural networks have proven to be highly successful in applications such as image classification, object tracking, and many other tasks based on 2D inputs. Recently, researchers have started to apply convolutional neural networks to video classification, which constitutes a 3D input and requires far larger amounts of memory and much more computation. FFT-based methods can reduce the amount of computation, but this generally comes at the cost of an increased memory requirement. The Winograd Minimal Filtering Algorithm (WMFA), on the other hand, can reduce the number of operations required and thus speed up the computation without increasing the required memory. This strategy was shown to be successful for 2D neural networks. We implement the algorithm for 3D convolutional neural networks, apply it to a popular 3D convolutional neural network used to classify videos, and compare it to cuDNN. For our highly optimized implementation of the algorithm, we observe a twofold speedup for most of the 3D convolution layers of our test network compared to the cuDNN version.
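
For readers unfamiliar with WMFA, the snippet below sketches the 1D minimal filtering case F(2,3): it produces two outputs of a 3-tap filter from four inputs with four elementwise multiplications instead of six, using the standard transform matrices. The 2D and 3D variants used in convolution layers nest this same construction.

```python
# Sketch of 1D Winograd minimal filtering F(2,3) with the standard transforms:
# y = A^T [(G g) * (B^T d)], verified against direct correlation.
import numpy as np

B_T = np.array([[1, 0, -1, 0],
                [0, 1,  1, 0],
                [0, -1, 1, 0],
                [0, 1,  0, -1]], dtype=float)
G = np.array([[1.0, 0.0, 0.0],
              [0.5, 0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0, 0.0, 1.0]])
A_T = np.array([[1, 1, 1, 0],
                [0, 1, -1, -1]], dtype=float)

d = np.array([1.0, 2.0, 3.0, 4.0])   # 4 input values
g = np.array([0.5, 1.0, -0.25])      # 3 filter taps

y_winograd = A_T @ ((G @ g) * (B_T @ d))          # 4 elementwise multiplications
y_direct = np.array([d[0:3] @ g, d[1:4] @ g])     # direct correlation, 6 multiplies
print(np.allclose(y_winograd, y_direct))           # True
```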


2021 ◽  
Author(s):  
Ankur Gupta ◽  
Krishnan Raghavachari

Deep learning methods provide a novel way to establish a correlation between two quantities. In this context, computer vision techniques like 3D Convolutional Neural Networks (3D-CNN) become a natural choice to associate a molecular property with its structure, due to the inherent three-dimensional nature of a molecule. However, traditional 3D input data structures are intrinsically sparse, which tends to induce instabilities during the learning process and, in turn, may lead to under-fitted results. To address this deficiency, we propose to use quantum-chemically derived molecular topological features, namely the Localized Orbital Locator (LOL) and the Electron Localization Function (ELF), as molecular descriptors, which provide a relatively denser input representation in three-dimensional space. Such topological features provide a detailed picture of the atomic configuration and inter-atomic interactions in the molecule and are thus well suited for predicting properties that are highly dependent on molecular geometry. Herein, we demonstrate the efficacy of our proposed model by applying it to the task of predicting atomization energies for the QM9-G4MP2 dataset, which contains ~134k molecules. Furthermore, we incorporated the Δ-ML approach into our model, allowing us to reach beyond benchmark accuracy levels (~1.0 kJ mol⁻¹). We consistently obtain MAEs on the order of 0.1 kcal mol⁻¹ (~0.42 kJ mol⁻¹) versus G4(MP2) theory using relatively modest models, which could potentially be improved further using additional compute resources.
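
As a hedged illustration of the Δ-ML idea mentioned above, the snippet below uses synthetic numbers: the model learns only the small correction between a cheap baseline and the accurate reference, and the baseline is added back at prediction time. No real energies or trained networks are involved.

```python
# Sketch of the Δ-ML idea: learn the correction (delta) between a cheap baseline
# and the high-level reference, then add the baseline back at prediction time.
import numpy as np

rng = np.random.default_rng(1)
cheap = rng.normal(-400.0, 50.0, size=100)           # low-level energies (arbitrary units)
delta_true = rng.normal(0.0, 1.0, size=100)          # small correction to the cheap method
accurate = cheap + delta_true                        # "high-level-quality" reference

delta_pred = delta_true + rng.normal(0, 0.05, 100)   # stand-in for a trained model's output
final_pred = cheap + delta_pred                      # Δ-ML prediction

mae = np.mean(np.abs(final_pred - accurate))
print(f"MAE of Δ-ML prediction: {mae:.3f}")
```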


2018 ◽  
Author(s):  
Edouard A Hay ◽  
Raghuveer Parthasarathy

Three-dimensional microscopy is increasingly prevalent in biology due to the development of techniques such as multiphoton, spinning disk confocal, and light sheet fluorescence microscopies. These methods enable unprecedented studies of life at the microscale, but bring with them larger and more complex datasets. New image processing techniques are therefore called for to analyze the resulting images in an accurate and efficient manner. Convolutional neural networks are becoming the standard for classification of objects within images due to their accuracy and generalizability compared to traditional techniques. Their application to data derived from 3D imaging, however, is relatively new and has mostly been in the areas of magnetic resonance imaging and computed tomography. It remains unclear, for images of discrete cells in variable backgrounds as are commonly encountered in fluorescence microscopy, whether convolutional neural networks provide sufficient performance to warrant their adoption, especially given the challenges of human comprehension of their classification criteria and their requirements for large training datasets. We therefore applied a 3D convolutional neural network to distinguish bacteria and non-bacterial objects in 3D light sheet fluorescence microscopy images of larval zebrafish intestines. We find that the neural network is as accurate as human experts, outperforms random forest and support vector machine classifiers, and generalizes well to a different bacterial species through the use of transfer learning. We also discuss network design considerations, and describe the dependence of accuracy on dataset size and data augmentation. We provide source code, labeled data, and descriptions of our analysis pipeline to facilitate adoption of convolutional neural network analysis for three-dimensional microscopy data.

Author summary: The abundance of complex, three-dimensional image datasets in biology calls for new image processing techniques that are both accurate and fast. Deep learning techniques, in particular convolutional neural networks, have achieved unprecedented accuracies and speeds across a large variety of image classification tasks. However, it is unclear whether their use is warranted in noisy, heterogeneous 3D microscopy datasets, especially considering their requirements for large, labeled datasets and their lack of comprehensible features. To assess this, we provide a case study, applying convolutional neural networks as well as feature-based methods to light sheet fluorescence microscopy datasets of bacteria in the intestines of larval zebrafish. We find that the neural network is as accurate as human experts, outperforms the feature-based methods, and generalizes well to a different bacterial species through the use of transfer learning.
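
The transfer learning step described above generally amounts to reusing the trained convolutional layers and retraining only the classification head. The sketch below shows this pattern in PyTorch on a placeholder 3D CNN; it is not the authors' released code.

```python
# Generic transfer-learning sketch (not the authors' code): freeze the trained
# convolutional layers of a placeholder 3D CNN and retrain only the final head.
import torch.nn as nn

base = nn.Sequential(
    nn.Conv3d(1, 8, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool3d(2),
    nn.Conv3d(8, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool3d(1), nn.Flatten(),
    nn.Linear(16, 2),                      # bacteria vs. non-bacterial object
)
# ... weights trained on the first bacterial species would be loaded here ...

for p in base.parameters():                # freeze everything,
    p.requires_grad = False
base[-1] = nn.Linear(16, 2)                # then replace and retrain only the head

trainable = [p for p in base.parameters() if p.requires_grad]
print(sum(p.numel() for p in trainable), "trainable parameters")
```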


Author(s):  
Muhammad Hanif Ahmad Nizar ◽  
Chow Khuen Chan ◽  
Azira Khalil ◽  
Ahmad Khairuddin Mohamed Yusof ◽  
Khin Wee Lai

Background: Valvular heart disease is a serious disease leading to mortality and increasing medical care costs. The aortic valve is the valve most commonly affected by this disease. Doctors rely on echocardiography for diagnosing and evaluating valvular heart disease. However, echocardiographic images are of poorer quality compared with Computed Tomography and Magnetic Resonance Imaging scans. This study proposes the development of Convolutional Neural Networks (CNN) that can function optimally during a live echocardiographic examination for detection of the aortic valve. An automated detection system in an echocardiogram will improve the accuracy of medical diagnosis and can provide further medical analysis from the resulting detection.

Methods: Two detection architectures, the Single Shot Multibox Detector (SSD) and the Faster Region-based Convolutional Neural Network (Faster R-CNN), with various feature extractors, were trained on echocardiography images from 33 patients. The models were then tested on 10 echocardiography videos.

Results: Faster R-CNN Inception v2 showed the highest accuracy (98.6%), followed closely by SSD Mobilenet v2. In terms of speed, SSD Mobilenet v2 incurred a loss of 46.81% in frames per second (fps) during real-time detection but still performed better than the other neural network models. Additionally, SSD Mobilenet v2 used the least Graphics Processing Unit (GPU) resources, while Central Processing Unit (CPU) usage was relatively similar across all models.

Conclusion: Our findings provide a foundation for implementing a convolutional detection system in echocardiography for medical purposes.
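
As a minor aside, the frames-per-second comparison reported above can be reproduced with a simple timing loop. In the sketch below, `detect_fn` and `video_frames` are hypothetical stand-ins for a trained detector's inference call and a decoded echocardiogram video.

```python
# Illustrative throughput check for a frames-per-second comparison between
# detectors; `detect_fn` is a hypothetical stand-in for SSD or Faster R-CNN inference.
import time

def measure_fps(detect_fn, frames):
    start = time.perf_counter()
    for frame in frames:
        detect_fn(frame)                   # run inference on one echocardiogram frame
    elapsed = time.perf_counter() - start
    return len(frames) / elapsed

# fps = measure_fps(detect_fn, video_frames)   # e.g. compare SSD vs. Faster R-CNN
```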

