Comparison of Pre-Trained CNNs for Audio Classification Using Transfer Learning

2021 ◽  
Vol 10 (4) ◽  
pp. 72
Author(s):  
Eleni Tsalera ◽  
Andreas Papadakis ◽  
Maria Samarakou

The paper investigates retraining options and the performance of pre-trained Convolutional Neural Networks (CNNs) for sound classification. CNNs were initially designed for image classification and recognition and were later extended to sound classification. Transfer learning is a promising paradigm in which already trained networks are retrained on different datasets. We selected three ‘Image’- and two ‘Sound’-trained CNNs, namely, GoogLeNet, SqueezeNet, ShuffleNet, VGGish, and YAMNet, and applied transfer learning. We explored the influence of key retraining parameters, including the optimizer, the mini-batch size, the learning rate, and the number of epochs, on classification accuracy and on the processing time needed both for sound preprocessing (preparing the scalograms and spectrograms) and for CNN training. The UrbanSound8K, ESC-10, and Air Compressor open sound datasets were employed. Using a two-fold criterion based on classification accuracy and time needed, we selected the ‘champion’ transfer-learning parameter combinations, discussed the consistency of the classification results, and explored possible benefits from fusing the classification estimations. The Sound CNNs achieved better classification accuracy, reaching an average of 96.4% for UrbanSound8K, 91.25% for ESC-10, and 100% for the Air Compressor dataset.
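
As a rough illustration of the parameter sweep and the accuracy/time trade-off described above, the following sketch enumerates a retraining grid and picks a 'champion' run. The grid values, the `pick_champion` rule (fastest run within a tolerance of the best accuracy), and the toy results are all assumptions made for illustration; the abstract does not state the values searched or the exact selection rule.

```python
from itertools import product

# Hypothetical search space: the abstract names the parameters, not their values.
OPTIMIZERS = ["sgdm", "adam"]
BATCH_SIZES = [32, 64]
LEARNING_RATES = [1e-3, 1e-4]
EPOCHS = [5, 10]


def build_grid():
    """Return every (optimizer, batch size, learning rate, epochs) combination."""
    keys = ("optimizer", "batch_size", "learning_rate", "epochs")
    return [
        dict(zip(keys, combo))
        for combo in product(OPTIMIZERS, BATCH_SIZES, LEARNING_RATES, EPOCHS)
    ]


def pick_champion(results, tolerance=0.01):
    """Pick the fastest run whose accuracy is within `tolerance` of the best.

    `results` is a non-empty list of dicts with 'accuracy' (0..1) and
    'seconds' keys. This rule is an assumed reading of a two-fold
    accuracy/time criterion, not the paper's stated procedure.
    """
    best = max(r["accuracy"] for r in results)
    close = [r for r in results if r["accuracy"] >= best - tolerance]
    return min(close, key=lambda r: r["seconds"])


# Toy numbers for illustration only; they are not results from the paper.
toy_results = [
    {"name": "run_a", "accuracy": 0.950, "seconds": 900},
    {"name": "run_b", "accuracy": 0.945, "seconds": 400},
    {"name": "run_c", "accuracy": 0.900, "seconds": 200},
]
champion = pick_champion(toy_results)
```

With the toy numbers, `run_b` is selected: it is within the tolerance of the most accurate run while taking less than half the time.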

Agriculture ◽  
2021 ◽  
Vol 11 (2) ◽  
pp. 115
Author(s):  
Blanca Dalila Pérez-Pérez ◽  
Juan Pablo García Vázquez ◽  
Ricardo Salomón-Torres

Convolutional neural networks (CNNs) have proven their efficiency in various applications in agriculture. In crops such as dates, they have been used mainly for the identification and sorting of ripe fruits. The aim of this study was to evaluate the performance of eight different CNNs, considering transfer learning for their training, as well as five hyperparameters. The CNN architectures evaluated were VGG-16, VGG-19, ResNet-50, ResNet-101, ResNet-152, AlexNet, Inception V3, and a CNN trained from scratch. The hyperparameters analyzed were the number of layers, the number of epochs, the batch size, the optimizer, and the learning rate. Accuracy and processing time were used to determine the performance of the CNN architectures in classifying mature dates of the Medjool cultivar. The model obtained from the VGG-19 architecture with a batch size of 128 and the Adam optimizer with a learning rate of 0.01 presented the best performance, with an accuracy of 99.32%. We concluded that the VGG-19 model can be used to build computer vision systems that help producers improve their sorting process to detect the Tamar stage of a Medjool date.


Electronics ◽  
2021 ◽  
Vol 10 (15) ◽  
pp. 1807
Author(s):  
Sascha Grollmisch ◽  
Estefanía Cano

Including unlabeled data in the training process of neural networks using Semi-Supervised Learning (SSL) has shown impressive results in the image domain, where state-of-the-art results were obtained with only a fraction of the labeled data. Recent SSL methods have in common that they rely strongly on the augmentation of unannotated data. This is largely unexplored for audio data. In this work, SSL using the state-of-the-art FixMatch approach is evaluated on three audio classification tasks, including music, industrial sounds, and acoustic scenes. The performance of FixMatch is compared to Convolutional Neural Networks (CNNs) trained from scratch, Transfer Learning, and SSL using the Mean Teacher approach. Additionally, a simple yet effective approach for selecting suitable augmentation methods for FixMatch is introduced. FixMatch with the proposed modifications always outperformed Mean Teacher and the CNNs trained from scratch. For the industrial sounds and music datasets, the CNN baseline performance using the full dataset was reached with less than 5% of the initial training data, demonstrating the potential of recent SSL methods for audio data. Transfer Learning outperformed FixMatch only for the most challenging dataset from acoustic scene classification, showing that there is still room for improvement.
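
The reliance on augmentation mentioned above can be made concrete with a minimal sketch of the unlabeled-data term in the standard FixMatch formulation: the prediction on a weakly augmented view becomes a hard pseudo-label only if it is confident enough, and the prediction on a strongly augmented view is trained against it. The function name, the plain-list inputs, and the default threshold of 0.95 are assumptions for illustration; the modifications proposed in the paper are not reproduced here.

```python
import math


def fixmatch_unlabeled_loss(weak_probs, strong_probs, threshold=0.95):
    """Mean masked cross-entropy over a non-empty batch of unlabeled examples.

    weak_probs[i]   : class probabilities predicted for the weakly augmented view
    strong_probs[i] : class probabilities predicted for the strongly augmented view
    """
    total = 0.0
    for weak, strong in zip(weak_probs, strong_probs):
        confidence = max(weak)
        if confidence < threshold:
            continue  # low-confidence predictions contribute no loss
        pseudo_label = weak.index(confidence)
        # Cross-entropy against the hard pseudo-label; clamp to avoid log(0).
        total += -math.log(max(strong[pseudo_label], 1e-12))
    # Average over the whole batch, including the masked-out examples.
    return total / len(weak_probs)
```

Dividing by the full batch size rather than by the number of confident examples means the unlabeled term grows naturally as more predictions pass the threshold during training.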


Author(s):  
Sebastian Nowak ◽  
Narine Mesropyan ◽  
Anton Faron ◽  
Wolfgang Block ◽  
Martin Reuter ◽  
...  

Abstract
Objectives: To investigate the diagnostic performance of deep transfer learning (DTL) to detect liver cirrhosis from clinical MRI.
Methods: The dataset for this retrospective analysis consisted of 713 (343 female) patients who underwent liver MRI between 2017 and 2019. In total, 553 of these subjects had a confirmed diagnosis of liver cirrhosis, while the remainder had no history of liver disease. T2-weighted MRI slices at the level of the caudate lobe were manually exported for DTL analysis. Data were randomly split into training, validation, and test sets (70%/15%/15%). A ResNet50 convolutional neural network (CNN) pre-trained on the ImageNet archive was used for cirrhosis detection with and without upstream liver segmentation. Classification performance for detection of liver cirrhosis was compared to two radiologists with different levels of experience (4th-year resident, board-certified radiologist). Segmentation was performed using a U-Net architecture built on a pre-trained ResNet34 encoder. Differences in classification accuracy were assessed by the χ² test.
Results: Dice coefficients for automatic segmentation were above 0.98 for both validation and test data. The classification accuracy of liver cirrhosis on validation (vACC) and test (tACC) data for the DTL pipeline with upstream liver segmentation (vACC = 0.99, tACC = 0.96) was significantly higher compared to the resident (vACC = 0.88, p < 0.01; tACC = 0.91, p = 0.01) and to the board-certified radiologist (vACC = 0.96, p < 0.01; tACC = 0.90, p < 0.01).
Conclusion: This proof-of-principle study demonstrates the potential of DTL for detecting cirrhosis based on standard T2-weighted MRI. The presented method for image-based diagnosis of liver cirrhosis demonstrated expert-level classification accuracy.
Key Points
• A pipeline consisting of two convolutional neural networks (CNNs) pre-trained on an extensive natural image database (ImageNet archive) enables detection of liver cirrhosis on standard T2-weighted MRI.
• High classification accuracy can be achieved even without altering the pre-trained parameters of the convolutional neural networks.
• Other abdominal structures apart from the liver were relevant for detection when the network was trained on unsegmented images.
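
For readers unfamiliar with the Dice coefficient reported for the segmentation step, the sketch below computes it for two binary masks using the usual definition, Dice = 2|A ∩ B| / (|A| + |B|). The function name and the nested-list mask representation are illustrative assumptions; the study's own evaluation code is not described in the abstract.

```python
def dice_coefficient(pred_mask, true_mask):
    """Dice = 2*|A ∩ B| / (|A| + |B|) for two equally sized binary masks."""
    flat_pred = [v for row in pred_mask for v in row]
    flat_true = [v for row in true_mask for v in row]
    if len(flat_pred) != len(flat_true):
        raise ValueError("masks must have the same number of pixels")
    overlap = sum(1 for p, t in zip(flat_pred, flat_true) if p and t)
    size_sum = sum(1 for p in flat_pred if p) + sum(1 for t in flat_true if t)
    # Two empty masks agree perfectly by convention.
    return 1.0 if size_sum == 0 else 2.0 * overlap / size_sum


# Toy 2x3 masks: three predicted pixels, two true pixels, two in common.
pred = [[0, 1, 1], [0, 1, 0]]
true = [[0, 1, 1], [0, 0, 0]]
score = dice_coefficient(pred, true)
```

A value of 1.0 means the predicted and reference masks coincide exactly, so coefficients above 0.98 indicate near-complete overlap.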


2021 ◽  
Vol 65 (1) ◽  
pp. 11-22
Author(s):  
Mengyao Lu ◽  
Shuwen Jiang ◽  
Cong Wang ◽  
Dong Chen ◽  
Tian’en Chen

Highlights
• A classification model for the front and back sides of tobacco leaves was developed for application in industry.
• A tobacco leaf grading method that combines a CNN with double-branch integration was proposed.
• The A-ResNet network was proposed and compared with other classic CNN networks.
• The grading accuracy of eight different grades was 91.30% and the testing time was 82.180 ms, showing a relatively high classification accuracy and efficiency.
Abstract. Flue-cured tobacco leaf grading is a key step in the production and processing of Chinese-style cigarette raw materials, directly affecting cigarette blend and quality stability. At present, manual grading of tobacco leaves is dominant in China, resulting in unsatisfactory grading quality and consuming considerable material and financial resources. In this study, for fast, accurate, and non-destructive tobacco leaf grading, 2,791 flue-cured tobacco leaves of eight different grades in south Anhui Province, China, were chosen as the study sample, and a tobacco leaf grading method that combines convolutional neural networks and double-branch integration was proposed. First, a classification model for the front and back sides of tobacco leaves was trained by transfer learning. Second, two processing methods (equal-scaled resizing and cropping) were used to obtain global images and local patches from the front sides of tobacco leaves. A global image-based tobacco leaf grading model was then developed using the proposed A-ResNet-65 network, and a local patch-based tobacco leaf grading model was developed using the ResNet-34 network. These two networks were compared with classic deep learning networks, such as VGGNet, GoogLeNet-V3, and ResNet. Finally, the grading results of the two grading models were integrated to realize tobacco leaf grading. The tobacco leaf classification accuracy of the final model, for eight different grades, was 91.30%, and grading of a single tobacco leaf required 82.180 ms. The proposed method achieved a relatively high grading accuracy and efficiency. It provides a method for industrial implementation of tobacco leaf grading and offers a new approach for the quality grading of other agricultural products.
Keywords: Convolutional neural network, Deep learning, Image classification, Transfer learning, Tobacco leaf grading


Author(s):  
Ch. Sanjeev Kumar Dash ◽  
Ajit Kumar Behera ◽  
Satchidananda Dehuri ◽  
Sung-Bae Cho

The classification of diseases is one of the fundamental problems for a medical practitioner, and one that might be substantially improved by intelligent systems. The present work aims to show how an intelligent system supporting medical decisions can be developed by hybridizing radial basis function neural networks (RBFNs) and differential evolution (DE). To this end, a two-phase learning algorithm with a modified kernel for radial basis function neural networks is proposed for classification. In phase one, differential evolution is used to determine the parameters of the modified kernel. The second phase focuses on optimizing the weights for learning the networks. The proposed method is validated using five medical datasets: bupa liver disorders, pima Indians diabetes, new thyroid, statlog (heart), and hepatitis. In addition, a predefined set of basis functions is considered to gain insight into which basis function is better for which kind of domain through an empirical analysis. The experimental results indicate that the classification accuracy of the proposed method, with 95% and 98% confidence intervals, is better than that of the baseline classifier (i.e., simple RBFNs) on all of the aforementioned datasets. In the case of an imbalanced dataset like new thyroid, the authors have noted that, with a 98% confidence level, the classification accuracy of the proposed method based on the multi-quadratic kernel is better than that of other kernels; however, in the case of hepatitis, the proposed method based on the cubic kernel is promising.


2020 ◽  
Vol 12 (11) ◽  
pp. 1780 ◽  
Author(s):  
Yao Liu ◽  
Lianru Gao ◽  
Chenchao Xiao ◽  
Ying Qu ◽  
Ke Zheng ◽  
...  

Convolutional neural networks (CNNs) have been widely applied in hyperspectral imagery (HSI) classification. However, their classification performance might be limited by the scarcity of labeled data to be used for training and validation. In this paper, we propose a novel lightweight shuffled group convolutional neural network (abbreviated as SG-CNN) to achieve efficient training with a limited training dataset in HSI classification. SG-CNN consists of SG conv units that employ conventional and atrous convolution in different groups, followed by a channel shuffle operation and a shortcut connection. In this way, SG-CNNs have fewer trainable parameters, whilst they can still be accurately and efficiently trained with fewer labeled samples. Transfer learning between different HSI datasets is also applied to the SG-CNN to further improve the classification accuracy. To evaluate the effectiveness of SG-CNNs for HSI classification, experiments were conducted on three public HSI datasets, with models pretrained on HSIs from different sensors. SG-CNNs with different levels of complexity were tested, and their classification results were compared with fine-tuned ShuffleNet2, ResNeXt, and their original counterparts. The experimental results demonstrate that SG-CNNs can achieve competitive classification performance when the amount of labeled training data is limited, while efficiently providing satisfactory classification results.
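
The channel shuffle operation mentioned above can be sketched as a pure index permutation in the style popularized by ShuffleNet: view the channels as a (groups, channels-per-group) grid, transpose it, and flatten. The function below works on a plain list of channel identifiers and is an illustrative assumption; how SG-CNN wires this step into its SG conv units is not detailed in the abstract.

```python
def channel_shuffle(channels, groups):
    """Interleave channels so each output group mixes all input groups.

    Equivalent to reshaping to (groups, per_group), transposing to
    (per_group, groups), and flattening back to one list.
    """
    n = len(channels)
    if groups <= 0 or n % groups != 0:
        raise ValueError("number of channels must be divisible by groups")
    per_group = n // groups
    return [
        channels[g * per_group + i]
        for i in range(per_group)
        for g in range(groups)
    ]


# Six channels in two groups: [0, 1, 2] and [3, 4, 5].
shuffled = channel_shuffle([0, 1, 2, 3, 4, 5], groups=2)
```

After the shuffle, neighbouring channels come from different groups, which lets the next grouped convolution see information from every group.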


Computers ◽  
2020 ◽  
Vol 9 (2) ◽  
pp. 46 ◽  
Author(s):  
Markus-Oliver Tamm ◽  
Yar Muhammad ◽  
Naveed Muhammad

Imagined speech is a relatively new electroencephalography (EEG) neuro-paradigm, which has seen little use in Brain-Computer Interface (BCI) applications. Imagined speech can be used to allow physically impaired patients to communicate and to use smart devices by imagining desired commands, which are then detected and executed by the smart device. The goal of this research is to verify previous classification attempts and then design a new, more efficient neural network that is noticeably less complex (fewer layers) and still achieves a comparable classification accuracy. The classifiers are designed to distinguish between EEG signal patterns corresponding to imagined speech of different vowels and words. This research uses a dataset that consists of 15 subjects imagining saying the five main vowels (a, e, i, o, u) and six different words. Two previous studies on imagined speech classification are verified, as those studies used the same dataset used here. The replicated results are compared. The main goal of this study is to take the proposed convolutional neural network (CNN) model from one of the replicated studies and make it much simpler, while attempting to retain a similar accuracy. The pre-processing of the data is described, and a new CNN classifier with three different transfer learning methods is described and used to classify EEG signals. Classification accuracy is used as the performance metric. The newly proposed CNN, which uses half as many layers and less complex pre-processing methods, achieved a considerably lower accuracy, but still managed to outperform the initial model proposed by the authors of the dataset by a considerable margin. It is recommended that further studies on classifying imagined speech use more data and more powerful machine learning techniques. Transfer learning proved beneficial and should be used to improve the effectiveness of neural networks.


2021 ◽  
Vol 11 (13) ◽  
pp. 5796
Author(s):  
Loris Nanni ◽  
Gianluca Maguolo ◽  
Sheryl Brahnam ◽  
Michelangelo Paci

Research in sound classification and recognition is rapidly advancing in the field of pattern recognition. One important area in this field is environmental sound recognition, whether it concerns the identification of endangered species in different habitats or the type of interfering noise in urban environments. Since environmental audio datasets are often limited in size, a robust model able to perform well across different datasets is of strong research interest. In this paper, ensembles of classifiers are combined that exploit six data augmentation techniques and four signal representations for retraining five pre-trained convolutional neural networks (CNNs); these ensembles are tested on three freely available environmental audio benchmark datasets: (i) bird calls, (ii) cat sounds, and (iii) the Environmental Sound Classification (ESC-50) database for identifying sources of noise in environments. To the best of our knowledge, this is the most extensive study investigating ensembles of CNNs for audio classification. The best-performing ensembles are compared and shown to either outperform or perform comparably to the best methods reported in the literature on these datasets, including on the challenging ESC-50 dataset. We obtained 97% accuracy on the bird dataset, 90.51% on the cat dataset, and 88.65% on ESC-50 using different approaches. In addition, the same ensemble model trained on the three datasets managed to reach the same results on the bird and cat datasets while losing only 0.1% on ESC-50. Thus, we have managed to create an off-the-shelf ensemble that can be trained on different datasets and reach performance competitive with the state of the art.
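
One common way to combine the outputs of several retrained CNNs is the sum rule, which averages per-class scores across models and picks the highest-scoring class. The sketch below assumes this rule and a plain-list score format purely for illustration; the abstract does not state which fusion rule the ensembles use, and the scores shown are toy values.

```python
def fuse_by_sum_rule(score_lists):
    """Average per-class scores across models; return (fused scores, label)."""
    if not score_lists:
        raise ValueError("need at least one model's scores")
    n_classes = len(score_lists[0])
    if any(len(scores) != n_classes for scores in score_lists):
        raise ValueError("every model must score the same classes")
    n_models = len(score_lists)
    fused = [
        sum(scores[c] for scores in score_lists) / n_models
        for c in range(n_classes)
    ]
    return fused, fused.index(max(fused))


# Toy scores from three hypothetical models over three classes.
fused, label = fuse_by_sum_rule([
    [0.50, 0.40, 0.10],
    [0.20, 0.70, 0.10],
    [0.45, 0.45, 0.10],
])
```

Here the first model alone would choose class 0, but the averaged scores favour class 1, which is the kind of correction an ensemble is meant to provide.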


2021 ◽  
Author(s):  
Jingru Fang ◽  
Bo Yin ◽  
Xiaopeng Ji ◽  
Zehua Du

Abstract. Neural networks have achieved success in the task of environmental sound classification. However, traditional neural network models have too many parameters and a high computational cost. Lightweight networks address these problems by compressing parameters, but at the expense of classification accuracy. To overcome these limitations of existing research, we propose a two-stream model based on two lightweight convolutional neural networks, called TSLCNN-DS, which saves memory and improves the classification performance of environmental sounds. Specifically, we first used data patching and data balancing to slightly expand the amount of experimental data. Then we designed two lightweight and efficient classification networks based on the attention mechanism and residual learning. Finally, Dempster-Shafer evidence theory was used to fuse the outputs of the two networks into the integrated two-stream model. Experiments show that the model achieves a classification accuracy of 97.44% on the UrbanSound8K dataset, using only 0.12 M parameters.
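
The fusion step can be illustrated with Dempster's rule of combination in its simplest setting, where each network's output is treated as a mass function that assigns belief only to single classes. Under that assumption the rule reduces to a normalized element-wise product. The function name and the singleton-only restriction are assumptions for this sketch; the paper may build its mass functions differently (for example, by reserving mass for uncertainty).

```python
def dempster_combine(m1, m2):
    """Combine two mass functions that put mass only on single classes.

    m1[c] and m2[c] are the masses each source assigns to class c, and each
    list sums to 1. The conflict K is the mass on disagreeing class pairs,
    so 1 - K equals the sum of the element-wise products.
    """
    if len(m1) != len(m2):
        raise ValueError("both sources must cover the same classes")
    agreement = [a * b for a, b in zip(m1, m2)]
    normaliser = sum(agreement)  # this is 1 - K
    if normaliser == 0:
        raise ValueError("sources are in total conflict; rule is undefined")
    return [value / normaliser for value in agreement]


# Toy outputs from two hypothetical networks over three classes.
fused = dempster_combine([0.6, 0.3, 0.1], [0.5, 0.4, 0.1])
```

Because agreement is multiplied, classes that both sources support are reinforced, and the fused belief in class 0 rises to about 0.70 from 0.6 and 0.5 individually.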


2021 ◽  
Vol 11 (17) ◽  
pp. 7929
Author(s):  
Jiayi Fan ◽  
Jongwook Kim ◽  
Insu Jung ◽  
Yongkeun Lee

Diagnosis of skin diseases by human experts is a laborious task prone to subjective judgment. Aided by computer technology and machine learning, it is possible to improve the efficiency and robustness of skin disease classification. Deep transfer learning using off-the-shelf deep convolutional neural networks (CNNs) has huge potential in the automation of skin disease classification tasks. However, complicated architectures seem to be too heavy for the classification of only a few skin disease classes. In this paper, multiple factors are investigated in order to study potential ways to improve the classification accuracy of skin diseases. First, two different off-the-shelf architectures, namely AlexNet and ResNet50, are evaluated. Then, transfer learning is compared with training from scratch. In order to reduce the complexity of the network, the effects of shortening the depths of deep CNNs are investigated. Furthermore, different data augmentation techniques based on basic image manipulation are compared. Finally, the choice of mini-batch size is studied. Experiments were carried out on the HAM10000 skin disease dataset. The results show that the ResNet50-based model is more accurate than the AlexNet-based model. The knowledge transferred from the ImageNet database helps to improve the accuracy of the model. Reducing the number of stages in the ResNet50-based model can reduce complexity while maintaining good accuracy. Additionally, the type of data augmentation and the choice of mini-batch size can also affect the classification accuracy of skin diseases.

