Deep Learning Frameworks Applied For Audio-Visual Scene Classification

2021 ◽  
Author(s):  
Lam Pham ◽  
Alexander Schindler ◽  
Mina Schutz ◽  
Jasmin Lampert ◽  
Sven Schlarb ◽  
...  

In this paper, we present deep learning frameworks for audio-visual scene classification (SC) and indicate how individual visual and audio features, as well as their combination, affect SC performance. Our extensive experiments, conducted on the DCASE (IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events) Task 1B development dataset, achieve best classification accuracies of 82.2%, 91.1%, and 93.9% with audio input only, visual input only, and combined audio-visual input, respectively. The highest classification accuracy of 93.9%, obtained from an ensemble of audio-based and visual-based frameworks, shows an improvement of 16.5% over the DCASE baseline.

Information ◽  
2021 ◽  
Vol 12 (6) ◽  
pp. 249
Author(s):  
Xin Jin ◽  
Yuanwen Zou ◽  
Zhongbing Huang

The cell cycle is an important process in cellular life. In recent years, some image processing methods have been developed to determine the cell cycle stages of individual cells. However, in most of these methods, cells have to be segmented, and their features need to be extracted. During feature extraction, some important information may be lost, resulting in lower classification accuracy. Thus, we used a deep learning method to retain all cell features. To address the insufficient number and imbalanced distribution of original images, we used the Wasserstein generative adversarial network with gradient penalty (WGAN-GP) for data augmentation. A residual network (ResNet), one of the most widely used deep learning classification networks, was used for image classification. Our method classified cell cycle images with an accuracy of 83.88%, an increase of 4.48 percentage points over the 79.40% obtained in previous experiments. Another dataset was used to verify the effect of our model and, compared with the accuracy from previous results, our accuracy increased by 12.52%. The results showed that our new cell cycle image classification system based on WGAN-GP and ResNet is useful for the classification of imbalanced images. Moreover, our method could potentially solve the low classification accuracy in biomedical images caused by insufficient numbers of original images and the imbalanced distribution of original images.
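
For readers unfamiliar with WGAN-GP, the critic objective in its standard published form is sketched below. The abstract does not state the objective, the penalty weight, or the sampling scheme used in this study, so the symbols here ($D$ for the critic, $P_r$ and $P_g$ for the real and generated distributions, $\lambda$ for the penalty weight) are assumptions taken from the general formulation rather than from this paper.

```latex
% Standard WGAN-GP critic loss (general formulation, not taken from this abstract)
L_D = \mathbb{E}_{\tilde{x} \sim P_g}\big[D(\tilde{x})\big]
    - \mathbb{E}_{x \sim P_r}\big[D(x)\big]
    + \lambda \, \mathbb{E}_{\hat{x} \sim P_{\hat{x}}}
      \Big[\big(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1\big)^2\Big],
\qquad
\hat{x} = \epsilon x + (1 - \epsilon)\tilde{x}, \quad \epsilon \sim U[0, 1]
```

The last term penalizes critic gradients whose norm departs from 1 at points interpolated between real and generated images, which is what distinguishes WGAN-GP from the weight-clipping variant.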


2022 ◽  
Vol 10 (1) ◽  
pp. 0-0

Effective productivity estimates of freshly produced crops are essential for efficient farming, commercial planning, and logistical support. Over the past ten years, machine learning (ML) algorithms have been widely used for grading and classifying agricultural products. However, precise assessment of the maturity level of tomatoes using ML algorithms remains challenging because these algorithms rely on hand-crafted features. Hence, in this paper we propose a deep learning based tomato maturity grading system that increases the accuracy and adaptability of maturity grading with less training data. The performance of the proposed system is assessed on real tomato datasets collected from open fields using a Nikon D3500 CCD camera. The proposed approach achieved an average maturity classification accuracy of 99.8%, which is promising in comparison with other state-of-the-art methods.


2019 ◽  
Author(s):  
Sahil Nalawade ◽  
Gowtham Murugesan ◽  
Maryam Vejdani-Jahromi ◽  
Ryan A. Fisicaro ◽  
Chandan Ganesh Bangalore Yogananda ◽  
...  

Isocitrate dehydrogenase (IDH) mutation status is an important marker in glioma diagnosis and therapy. We propose a novel automated pipeline for predicting IDH status noninvasively using deep learning and T2-weighted (T2w) MR images with minimal preprocessing (N4 bias correction and normalization to zero mean and unit variance). T2w MRI and genomic data were obtained from The Cancer Imaging Archive dataset (TCIA) for 260 subjects (120 high-grade and 140 low-grade gliomas). A fully automated 2D densely connected model was trained to classify IDH mutation status on 208 subjects and tested on another held-out set of 52 subjects, using 5-fold cross validation. Data leakage was avoided by ensuring subject separation during the slice-wise randomization. A mean classification accuracy of 90.5% was achieved for each axial slice in predicting the three classes of no tumor, IDH mutated and IDH wild-type. A test accuracy of 83.8% was achieved in predicting IDH mutation status for individual subjects on the test dataset of 52 subjects. We demonstrate a deep learning method to predict IDH mutation status using T2w MRI alone. Radiologic imaging studies using deep learning methods must address data leakage (subject duplication) in the randomization process to avoid upward bias in the reported classification accuracy.
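
The subject separation described above can be illustrated with a minimal sketch, assuming slices are identified by hypothetical `(subject_id, slice_index)` pairs. The function name `subject_wise_split`, the toy data, and the 20% test fraction (chosen to mirror the 208/52 split) are illustrative assumptions, not the authors' pipeline.

```python
import random
from collections import defaultdict


def subject_wise_split(slices, test_fraction=0.2, seed=0):
    """Split (subject_id, slice_index) pairs so no subject lands in both sets."""
    # Group slices by the subject they came from.
    by_subject = defaultdict(list)
    for subject_id, slice_index in slices:
        by_subject[subject_id].append((subject_id, slice_index))

    # Shuffle subjects, not slices, so a subject is assigned as a whole.
    subjects = sorted(by_subject)
    random.Random(seed).shuffle(subjects)
    n_test = max(1, round(len(subjects) * test_fraction))

    test = [s for sid in subjects[:n_test] for s in by_subject[sid]]
    train = [s for sid in subjects[n_test:] for s in by_subject[sid]]
    return train, test


# Hypothetical toy data: 10 subjects with 3 axial slices each.
slices = [(f"subj{i:02d}", k) for i in range(10) for k in range(3)]
train, test = subject_wise_split(slices)
```

Shuffling individual slices instead would let neighbouring slices of one subject fall on both sides of the split, which is the leakage the abstract warns about.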


Author(s):  
M. Papadomanolaki ◽  
M. Vakalopoulou ◽  
S. Zagoruyko ◽  
K. Karantzalos

In this paper we evaluated deep-learning frameworks based on Convolutional Neural Networks for the accurate classification of multispectral remote sensing data. Certain state-of-the-art models were tested on the publicly available SAT-4 and SAT-6 high resolution satellite multispectral datasets. In particular, the benchmark included the AlexNet, AlexNet-small and VGG models, which were trained and applied to both datasets exploiting all the available spectral information. Deep Belief Networks, Autoencoders and other semi-supervised frameworks were also compared. The high level features calculated from the tested models classified the different land cover classes with very high accuracy rates, i.e., above 99.9%. The experimental results demonstrate the great potential of advanced deep-learning frameworks for the supervised classification of high resolution multispectral remote sensing data.


2019 ◽  
Vol 9 (21) ◽  
pp. 4500 ◽  
Author(s):  
Phung ◽  
Rhee

Research on clouds has an enormous influence on sky sciences and related applications, and cloud classification plays an essential role in it. Much research has been conducted, including both traditional machine learning approaches and deep learning approaches. Compared with traditional machine learning approaches, deep learning approaches have achieved better results. However, most deep learning models need large amounts of training data because of their large number of parameters, so they cannot reach high accuracy on small datasets. In this paper, we propose a complete solution for high-accuracy classification of cloud image patches on small datasets. First, we design a convolutional neural network (CNN) model suited to small datasets. Second, we apply regularization techniques to improve generalization and avoid overfitting. Finally, we introduce a model average ensemble to reduce the variance of prediction and increase the classification accuracy. We evaluate the proposed solution on the Singapore whole-sky imaging categories (SWIMCAT) dataset, where it achieves perfect classification accuracy for most classes, confirming the robustness of the proposed model.


Author(s):  
Tong Lin ◽  
Xin Chen ◽  
Xiao Tang ◽  
Ling He ◽  
...  

This paper discusses the use of deep convolutional neural networks for radar target classification. The work has three parts. First, effective data augmentation methods are used to enlarge the dataset and address class imbalance. Second, using deep learning techniques, we explore an effective framework for classifying and identifying targets based on radar spectral map data. With data augmentation and this framework, we achieved an overall classification accuracy of 0.946. Third, we investigated the automatic annotation of image ROIs (regions of interest). By adjusting the model, we obtained 93% accuracy in automatic labeling and classification of targets for both the car and cyclist categories.


2021 ◽  
Vol 924 (1) ◽  
pp. 012022
Author(s):  
Y Hendrawan ◽  
B Rohmatulloh ◽  
I Prakoso ◽  
V Liana ◽  
M R Fauzy ◽  
...  

Tempe is a traditional food originating from Indonesia, made by fermenting soybeans with Rhizopus mold. The purpose of this study was to classify three quality levels of soybean tempe (fresh, consumable, and non-consumable) using convolutional neural network (CNN) based deep learning. Four pre-trained CNNs were used in this study: SqueezeNet, GoogLeNet, ResNet50, and AlexNet. The sensitivity analysis showed that a quality classification accuracy of 100% can be achieved with six configurations: AlexNet with the SGDm optimizer and a learning rate of 0.0001; GoogLeNet with the Adam optimizer and a learning rate of 0.0001; GoogLeNet with the RMSProp optimizer and a learning rate of 0.0001; ResNet50 with the Adam optimizer and a learning rate of 0.00005; ResNet50 with the Adam optimizer and a learning rate of 0.0001; and SqueezeNet with the RMSProp optimizer and a learning rate of 0.0001. In further testing on the testing-set data, the classification accuracy based on the confusion matrix reached 98.33%. The combination of the CNN model and a low-cost commercial digital camera can later be used to detect the quality of soybean tempe, with the advantages of being non-destructive, rapid, accurate, low-cost, and real-time.
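
As a small illustration of how an accuracy figure is read off a confusion matrix, the sketch below divides the diagonal (correct predictions) by the total count. The 3x3 matrix is entirely hypothetical and is not the study's data; the function name `accuracy_from_confusion` is likewise an assumption made for this example.

```python
def accuracy_from_confusion(matrix):
    """Overall accuracy: correctly classified samples divided by all samples."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Hypothetical counts; rows are true classes, columns are predicted classes
# (fresh, consumable, non-consumable). These are NOT the paper's results.
confusion = [
    [20, 0, 0],
    [1, 18, 1],
    [0, 0, 20],
]
accuracy = accuracy_from_confusion(confusion)
```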


2021 ◽  
Author(s):  
Lam Pham ◽  
Hieu Tang ◽  
Anahid Jalal ◽  
Alexander Schindler ◽  
Ross King

In this paper, we present a low-complexity deep learning framework for acoustic scene classification (ASC). The proposed framework can be separated into three main steps: front-end spectrogram extraction, back-end classification, and late fusion of predicted probabilities. First, we use a Mel filter, a Gammatone filter and the Constant-Q Transform (CQT) to transform the raw audio signal into spectrograms, in which both frequency and temporal features are represented. The three spectrograms are then fed into three individual back-end convolutional neural networks (CNNs), which classify them into ten urban scenes. Finally, a late fusion of the three predicted probabilities obtained from the three CNNs is conducted to achieve the final classification result. To reduce the complexity of our proposed CNN network, we apply two model compression techniques: model restriction and decomposed convolution. Our extensive experiments, conducted on the DCASE 2021 (IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events) Task 1A development dataset, achieve a low-complexity CNN-based framework with 128 KB of trainable parameters and a best classification accuracy of 66.7%, improving on the DCASE baseline by 19.0%.
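
The late-fusion step can be sketched as follows, assuming the simplest rule of averaging the per-class probabilities of the three CNNs. The abstract does not state which fusion rule is used, so mean fusion, the function name `late_fusion`, and the toy probabilities (three classes instead of ten, for brevity) are all assumptions made for illustration.

```python
def late_fusion(prob_sets):
    """Average per-class probabilities across models; return (fused, argmax)."""
    n_models = len(prob_sets)
    n_classes = len(prob_sets[0])
    # Mean fusion is an assumption; the abstract does not name the rule.
    fused = [sum(p[c] for p in prob_sets) / n_models for c in range(n_classes)]
    predicted = max(range(n_classes), key=fused.__getitem__)
    return fused, predicted


# Hypothetical outputs of the Mel, Gammatone and CQT branches for one clip.
mel_probs = [0.6, 0.3, 0.1]
gammatone_probs = [0.2, 0.5, 0.3]
cqt_probs = [0.3, 0.4, 0.3]
fused, predicted = late_fusion([mel_probs, gammatone_probs, cqt_probs])
```

In this toy case the Mel branch alone would pick class 0, while the fused prediction is class 1, which shows how fusion can overrule a single confident branch.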

