3D CNN
Recently Published Documents


TOTAL DOCUMENTS: 379 (FIVE YEARS: 338)

H-INDEX: 14 (FIVE YEARS: 10)

2022, Vol 72, pp. 103334
Author(s): Li Kang, Ziqi Zhou, Jianjun Huang, Wenzhong Han

2022, Vol 14 (2), pp. 359
Author(s): Ali Jamali, Masoud Mahdianpari

The use of machine learning algorithms to classify complex landscapes has been revolutionized by the introduction of deep learning techniques, particularly in remote sensing. Convolutional neural networks (CNNs) have shown great success in the classification of complex, high-dimensional remote sensing imagery, specifically in wetland classification. Meanwhile, transformers have become the state of the art in natural language processing (NLP). Although transformers have been studied for a few remote sensing applications, the integration of deep CNNs and transformers has not, particularly for wetland mapping. In this study, we therefore explore the potential, and the limitations to be overcome, of a multi-model deep learning network that integrates a modified version of the well-known deep CNN VGG-16, a 3D CNN, and a Swin transformer for complex coastal wetland classification. Moreover, we compare the proposed multi-model technique with several solo models, namely random forest (RF), support vector machine (SVM), VGG-16, 3D CNN, and Swin transformer, at a pilot site in the city of Saint John, New Brunswick, Canada. In terms of F1 score, the multi-model network obtained values of 0.87, 0.88, 0.89, 0.91, 0.93, 0.93, and 0.93 for the recognition of shrub wetland, fen, bog, aquatic bed, coastal marsh, forested wetland, and freshwater marsh, respectively. The results suggest that the multi-model network outperforms the solo classifiers by 3.36% to 33.35% in average accuracy, indicating the high potential of integrating CNNs with cutting-edge transformers for the classification of complex landscapes in remote sensing.
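The abstract does not specify how the backbone outputs are combined. One simple possibility is late fusion: average the per-class probabilities produced by each solo model and take the argmax. The sketch below is illustrative only (the model names and probability values are made up, not taken from the paper); the class list follows the seven wetland types in the text.

```python
CLASSES = ["shrub wetland", "fen", "bog", "aquatic bed",
           "coastal marsh", "forested wetland", "freshwater marsh"]

def fuse_probabilities(prob_vectors):
    """Average per-class probabilities from several models (late fusion)."""
    n_models = len(prob_vectors)
    n_classes = len(prob_vectors[0])
    return [sum(p[c] for p in prob_vectors) / n_models
            for c in range(n_classes)]

# Toy example: three "models" score one pixel; two favour shrub wetland.
p_vgg   = [0.7, 0.1, 0.04, 0.04, 0.04, 0.04, 0.04]
p_3dcnn = [0.6, 0.2, 0.04, 0.04, 0.04, 0.04, 0.04]
p_swin  = [0.1, 0.7, 0.04, 0.04, 0.04, 0.04, 0.04]

fused = fuse_probabilities([p_vgg, p_3dcnn, p_swin])
prediction = CLASSES[max(range(len(fused)), key=fused.__getitem__)]
```

Averaging damps the errors of any single backbone, which is one way an ensemble can beat every solo classifier on average accuracy; weighted fusion or a learned fusion head are common refinements.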


Author(s): S. El Kohli, Y. Jannaj, M. Maanan, H. Rhinane

Abstract. Cheating in exams is a worldwide phenomenon that hinders efforts to assess students' skills and growth. Scientific and technological progress has made it possible to develop detection systems, in particular systems that monitor the movements and gestures of candidates, individually or collectively, during an exam. Deep learning (DL) concepts are widely used in image processing and machine learning applications. Our system builds on advances in artificial intelligence, particularly 3D convolutional neural networks (3D CNNs), object detection methods, OpenCV, and especially Google TensorFlow, to provide real-time, optimized computer vision. In the proposed approach, we build a detection system able to predict fraud during exams: a 3D CNN is trained on 7,638 selected images to generate the model, and an object detector identifies prohibited items. These experiments achieve a detection performance of 95% accuracy, with close correlation between the training and validation sets.
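The abstract gives no pipeline details, but a 3D CNN consumes short clips (a stack of consecutive frames) rather than single images, so any real-time monitoring system must buffer the camera stream into fixed-length windows. A minimal sketch, with illustrative `clip_len` and `stride` values that are not taken from the paper:

```python
def make_clips(frames, clip_len=16, stride=8):
    """Group consecutive frames into overlapping fixed-length clips.

    Each clip becomes one input sample for a 3D CNN (depth x height x
    width); overlapping windows give the detector denser temporal
    coverage of the exam video.
    """
    clips = []
    for start in range(0, len(frames) - clip_len + 1, stride):
        clips.append(frames[start:start + clip_len])
    return clips

frames = list(range(40))   # stand-ins for 40 decoded video frames
clips = make_clips(frames)  # windows start at frames 0, 8, 16, 24
```

In a real system each element of `frames` would be a decoded image array, and each clip would be resized and normalized before being fed to the network.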


2022, Vol 70 (3), pp. 4675-4690
Author(s): Muneeb Ur Rehman, Fawad Ahmed, Muhammad Attique Khan, Usman Tariq, Faisal Abdulaziz Alfouzan, ...

2022, Vol 70 (2), pp. 2655-2677
Author(s): Shoroog Khenkar, Salma Kammoun Jarraya

2021, Vol 12 (1), pp. 174
Author(s): Byungjin Kang, Inho Park, Changmin Ok, Sungho Kim

Recently, hyperspectral image (HSI) classification using deep learning has been actively studied with 2D and 3D convolutional neural networks (CNNs). However, these networks learn spatial information together with spectral information; this can increase classification accuracy, but it does not focus on the purely spectral information that is a major advantage of HSI. The 1D CNN, which learns only pure spectral information, is in turn limited because it uses only adjacent spectral bands. In this paper, we propose a One-Dimensional Parallel Atrous Convolutional Neural Network (ODPA-CNN) that learns, for HSI classification, not only adjacent spectral information but also spectral information at a certain distance, extracting features in parallel branches that cover bands at varying distances. The proposed method excludes spatial information, such as the shape of an object, and classifies HSI using only the spectral information of the object's material. Atrous convolution is a convolution not over adjacent spectral bands but between bands separated by a certain distance. We compare the proposed model with other models on various datasets, including data we collected ourselves. Experimental results show higher performance than some 3D CNN models and other 1D CNN methods. In addition, using datasets with randomized spatial content, we expose the vulnerabilities of 3D CNNs and show that the proposed model is robust to datasets containing little spatial information.
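The core operation can be sketched in plain Python: a valid-mode dilated (atrous) 1D convolution, where the dilation factor sets the distance between the spectral bands each filter tap reads. This is a simplified, single-filter illustration of the idea, not the authors' implementation.

```python
def atrous_conv1d(signal, kernel, dilation):
    """Valid-mode 1D atrous (dilated) cross-correlation.

    With dilation d, consecutive kernel taps read samples d positions
    apart, so the filter mixes spectral bands separated by a fixed
    distance instead of only adjacent bands.
    """
    k = len(kernel)
    span = (k - 1) * dilation + 1   # receptive field of the dilated filter
    out = []
    for start in range(len(signal) - span + 1):
        out.append(sum(kernel[j] * signal[start + j * dilation]
                       for j in range(k)))
    return out

spectrum = list(range(8))           # toy 8-band spectrum
near = atrous_conv1d(spectrum, [1, -1], dilation=1)  # adjacent-band differences
far  = atrous_conv1d(spectrum, [1, -1], dilation=3)  # bands three steps apart
```

Running several such branches with different dilations in parallel, then concatenating their features, mirrors the "parallel atrous" design described in the abstract.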


Sensors, 2021, Vol 22 (1), pp. 72
Author(s): Sanghun Jeon, Ahmed Elsharkawy, Mun Sang Kim

In visual speech recognition (VSR), speech is transcribed using only visual information to interpret tongue and teeth movements. Recently, deep learning has shown outstanding performance in VSR, with accuracy exceeding that of lipreaders on benchmark datasets. However, several problems still exist when using VSR systems. A major challenge is the distinction of words with similar pronunciation, called homophones; these lead to word ambiguity. Another technical limitation of traditional VSR systems is that visual information does not provide sufficient data for learning words such as “a”, “an”, “eight”, and “bin” because their lengths are shorter than 0.02 s. This report proposes a novel lipreading architecture that combines three different convolutional neural networks (CNNs; a 3D CNN, a densely connected 3D CNN, and a multi-layer feature fusion 3D CNN), which are followed by a two-layer bi-directional gated recurrent unit. The entire network was trained using connectionist temporal classification. The results of the standard automatic speech recognition evaluation metrics show that the proposed architecture reduced the character and word error rates of the baseline model by 5.681% and 11.282%, respectively, for the unseen-speaker dataset. Our proposed architecture exhibits improved performance even when visual ambiguity arises, thereby increasing VSR reliability for practical applications.
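The character and word error rates quoted above are both edit-distance ratios: Levenshtein distance between the reference and the hypothesis, divided by the reference length, computed over characters for CER and over words for WER. A minimal sketch of the standard metric (not the authors' evaluation code; the example strings are invented, echoing short words like "bin" from the abstract):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences, by dynamic programming."""
    m, n = len(ref), len(hyp)
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            cur[j] = min(prev[j] + 1,         # deletion
                         cur[j - 1] + 1,      # insertion
                         prev[j - 1] + cost)  # substitution
        prev = cur
    return prev[n]

def error_rate(ref, hyp):
    """CER when ref/hyp are strings; WER when they are word lists."""
    return edit_distance(ref, hyp) / len(ref)

cer = error_rate("bin blue", "bin blew")                  # character level
wer = error_rate("bin blue".split(), "bin blew".split())  # word level
```

Because `edit_distance` compares sequence elements generically, the same function serves both metrics; only the tokenization changes.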

