A Study of Features and Deep Neural Network Architectures and Hyper-Parameters for Domestic Audio Classification

2021 ◽  
Vol 11 (11) ◽  
pp. 4880
Author(s):  
Abigail Copiaco ◽  
Christian Ritz ◽  
Nidhal Abdulaziz ◽  
Stefano Fasciani

Recent methodologies for audio classification frequently involve cepstral and spectral features, applied to single-channel recordings of acoustic scenes and events. Further, transfer learning has been widely used over the years and has proven to be an efficient alternative to training neural networks from scratch. The lower time and resource requirements of pre-trained models allow for more versatility in developing classification systems. However, information on classification performance when using different features for multi-channel recordings is often limited. Furthermore, pre-trained networks are initially trained on larger databases and are often unnecessarily large. This poses a challenge when developing systems for devices with limited computational resources, such as mobile or embedded devices. This paper presents a detailed study of the most apparent and widely used cepstral and spectral features for multi-channel audio applications. Accordingly, we propose the use of spectro-temporal features. Additionally, the paper details the development of a compact version of the AlexNet model for computationally limited platforms through studies of performance against various architectural and parameter modifications of the original network. The aim is to minimize the network size while maintaining the series network architecture and preserving the classification accuracy. Considering that other state-of-the-art compact networks present complex directed acyclic graphs, a series architecture offers an advantage in customizability. Experimentation was carried out in Matlab, using a database that we generated for this task, which consists of four-channel synthetic recordings of both sound events and scenes. The top-performing methodology resulted in a weighted F1-score of 87.92% for scalogram features classified via the modified AlexNet-33 network, which has a size of 14.33 MB. The AlexNet network returned 86.24% at a size of 222.71 MB.
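The abstract reports a weighted F1-score but does not define it. As a minimal sketch, assuming the usual support-weighted definition (per-class F1 averaged with weights proportional to each class's number of true samples), the metric could be computed as follows. The function name and the toy labels are illustrative and not taken from the paper.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores (assumed definition)."""
    support = Counter(y_true)  # number of true samples per class
    total = len(y_true)
    score = 0.0
    for label, n_true in support.items():
        # True positives and predicted positives for this class.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        n_pred = sum(1 for p in y_pred if p == label)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        # Weight each class by its share of the true labels.
        score += (n_true / total) * f1
    return score


# Toy example with hypothetical labels for two sound classes.
y_true = ["a", "a", "a", "b"]
y_pred = ["a", "a", "b", "b"]
print(round(weighted_f1(y_true, y_pred), 4))
```

Weighting by support keeps frequent classes from being under-represented in the average, which matters when event and scene classes are unevenly populated.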

Author(s):  
R. Istrate ◽  
F. Scheidegger ◽  
G. Mariani ◽  
D. Nikolopoulos ◽  
C. Bekas ◽  
...  

In recent years, an increasing number of researchers and practitioners have been suggesting algorithms for large-scale neural network architecture search: genetic algorithms, reinforcement learning, learning curve extrapolation, and accuracy predictors. None of them, however, demonstrated high performance without training new experiments in the presence of unseen datasets. We propose a new deep neural network accuracy predictor that estimates, in fractions of a second, classification performance for unseen input datasets without training. In contrast to previously proposed approaches, our prediction is calibrated not only on the topological network information but also on a characterization of the dataset difficulty, which allows us to re-tune the prediction without any training. Our predictor achieves a throughput exceeding 100 networks per second on a single GPU, thus creating the opportunity to perform large-scale architecture search within a few minutes. We present results of two searches performed in 400 seconds on a single GPU. Our best discovered networks reach 93.67% accuracy for CIFAR-10 and 81.01% for CIFAR-100, verified by training. These networks are competitive in performance with other automatically discovered state-of-the-art networks; however, we needed only a small fraction of the time to solution and computational resources.


2021 ◽  
Vol 2 (4) ◽  
Author(s):  
Sarun Paisarnsrisomsuk ◽  
Carolina Ruiz ◽  
Sergio A. Alvarez

Deep neural networks can provide accurate automated classification of human sleep signals into sleep stages that enables more effective diagnosis and treatment of sleep disorders. We develop a deep convolutional neural network (CNN) that attains state-of-the-art sleep stage classification performance on input data consisting of human sleep EEG and EOG signals. Nested cross-validation is used for optimal model selection and reliable estimation of out-of-sample classification performance. The resulting network attains a classification accuracy of 84.50 ± 0.13%; its performance exceeds human expert inter-scorer agreement, even on single-channel EEG input data, therefore providing more objective and consistent labeling than human experts demonstrate as a group. We focus on analyzing the learned internal data representations of our network, with the aim of understanding the development of class differentiation ability across the layers of processing units, as a function of layer depth. We approach this problem visually, using t-Stochastic Neighbor Embedding (t-SNE), and propose a pooling variant of Centered Kernel Alignment (CKA) that provides an objective quantitative measure of the development of sleep stage specialization and differentiation with layer depth. The results reveal a monotonic progression of both of these sleep stage modeling abilities as layer depth increases.
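The abstract names nested cross-validation without describing the loop structure. The following is a minimal, generic sketch of that idea, not the authors' pipeline: an inner loop selects a candidate model using only the outer-training data, and an outer loop scores the selected candidate on data never used for selection. The `evaluate` callable, the fold counts, and the interleaved fold assignment are all assumptions made for illustration.

```python
def k_folds(indices, k):
    """Split indices into k interleaved folds of near-equal size."""
    return [indices[i::k] for i in range(k)]


def nested_cv(n_samples, candidates, evaluate, outer_k=5, inner_k=3):
    """Return one held-out score per outer fold.

    evaluate(candidate, train_idx, test_idx) -> float is a hypothetical
    user-supplied callable that trains on train_idx and scores on test_idx.
    """
    outer_folds = k_folds(list(range(n_samples)), outer_k)
    outer_scores = []
    for i, test_idx in enumerate(outer_folds):
        train_idx = [j for f, fold in enumerate(outer_folds) if f != i for j in fold]

        def inner_score(candidate):
            # Inner loop: average validation score over folds of train_idx only.
            inner_folds = k_folds(train_idx, inner_k)
            scores = []
            for m, val_idx in enumerate(inner_folds):
                fit_idx = [j for f, fold in enumerate(inner_folds) if f != m for j in fold]
                scores.append(evaluate(candidate, fit_idx, val_idx))
            return sum(scores) / len(scores)

        best = max(candidates, key=inner_score)
        # Outer loop: score the selected candidate on the untouched outer fold.
        outer_scores.append(evaluate(best, train_idx, test_idx))
    return outer_scores
```

Because model selection never sees the outer test fold, the mean and spread of the returned scores estimate out-of-sample performance without the optimistic bias of selecting and testing on the same data.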


Diagnostics ◽  
2020 ◽  
Vol 10 (2) ◽  
pp. 110 ◽  
Author(s):  
Pius Kwao Gadosey ◽  
Yujian Li ◽  
Enock Adjei Agyekum ◽  
Ting Zhang ◽  
Zhaoying Liu ◽  
...  

In image segmentation tasks in computer vision, achieving high accuracy while requiring fewer computations and faster inference is a major challenge. This is especially important in medical imaging tasks, but one metric is usually compromised for the other. To address this problem, this paper presents an extremely fast, small and computationally effective deep neural network called Stripped-Down UNet (SD-UNet), designed for the segmentation of biomedical data on devices with limited computational resources. By making use of depthwise separable convolutions in the entire network, we design a lightweight deep convolutional neural network architecture inspired by the widely adopted U-Net model. To counteract the expected performance degradation in the process, we introduce a weight standardization algorithm with the group normalization method. We demonstrate that SD-UNet has three major advantages: (i) smaller model size (23x smaller than U-Net); (ii) 8x fewer parameters; and (iii) faster inference time with a computational complexity lower than 8M floating point operations (FLOPs). Experiments on the benchmark dataset of the International Symposium on Biomedical Imaging (ISBI) challenge for segmentation of neuronal structures in electron microscopic (EM) stacks and the Medical Segmentation Decathlon (MSD) challenge brain tumor segmentation (BRATs) dataset show that the proposed model achieves comparable and sometimes better results compared to the current state-of-the-art.
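To illustrate why depthwise separable convolutions shrink a network, the sketch below compares weight counts for a single layer. It ignores bias terms, and the layer dimensions (64 input channels, 128 output channels, 3x3 kernel) are hypothetical values chosen for the example, not figures from SD-UNet.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (biases ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a depthwise k x k convolution followed by a 1 x 1 pointwise one."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


# Hypothetical layer: 64 -> 128 channels with a 3 x 3 kernel.
std = standard_conv_params(64, 128, 3)
sep = depthwise_separable_params(64, 128, 3)
print(std, sep, round(std / sep, 2))
```

The saving per layer depends on the kernel size and channel counts, so the overall reduction for a full network is an aggregate over its layers rather than a single fixed factor.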


Author(s):  
Qingjun Wang ◽  
Peng Lu

With the continuous expansion of the application scope of computer network technology, malicious attacks across the Internet have caused serious harm to computer users and network resources. This paper applies artificial intelligence (AI) to computer network technology and studies its application there, designing an intrusion detection model based on an improved back propagation (BP) neural network. The work studies the attack principle, analyzes the characteristics of the attack method, extracts feature data, establishes feature sets, and uses agent technology as the supporting technology. A simulation experiment is used to demonstrate the system's improvement in terms of false alarm rate, convergence speed, and false negative rate; the reported rate reached 86.7%. The results show that this fast algorithm reduces the training time of the network, reduces the network size, improves the classification performance, and improves the intrusion detection rate.


2014 ◽  
Vol 2014 ◽  
pp. 1-10 ◽  
Author(s):  
Shengkun Xie ◽  
Sridhar Krishnan

Classification of electroencephalography (EEG) is the most useful diagnostic and monitoring procedure for epilepsy study. A reliable algorithm that can be easily implemented is the key to this procedure. In this paper a novel signal feature extraction method based on dynamic principal component analysis and nonoverlapping moving window is proposed. Along with this new technique, two detection methods based on extracted sparse features are applied to deal with signal classification. The obtained results demonstrated that our proposed methodologies are able to differentiate EEGs from controls and interictal for epilepsy diagnosis and to separate EEGs from interictal and ictal for seizure detection. Our approach yields high classification accuracy for both single-channel short-term EEGs and multichannel long-term EEGs. The classification performance of the method is also compared with other state-of-the-art techniques on the same datasets and the effect of signal variability on the presented methods is also studied.


2013 ◽  
Vol 13 (3) ◽  
pp. 142-151 ◽  
Author(s):  
Muhammad Ibn Ibrahimy ◽  
Rezwanul Ahsan ◽  
Othman Omran Khalifa

This paper presents an application of an artificial neural network to the classification of single-channel EMG signals in the context of hand motion detection. Seven statistical input features, extracted from preprocessed single-channel EMG signals recorded for four predefined hand motions, are used for the neural network classifier. Different neural network structures, based on the number of hidden neurons and two prominent training algorithms, are considered to determine their applicability to EMG signal classification. The classification performance is analyzed for different neural network architectures by considering the number of input features, number of hidden neurons, learning algorithms, correlation between network outputs and targets, and mean square error. Of the Levenberg-Marquardt and scaled conjugate gradient learning algorithms, the former shows better classification performance. The outcomes of the research show that the optimal design of the Levenberg-Marquardt based neural network classifier can perform well, with an average classification success rate of 88.4%. A comparison of results is also presented to validate the effectiveness of the designed neural network classifier in discriminating EMG signals.


2016 ◽  
Vol 15 (06) ◽  
pp. 1313-1343 ◽  
Author(s):  
Rahime Ceylan ◽  
Hasan Koyuncu

A Neural Network (NN) is an effective classifier, but it generally uses Backpropagation-type algorithms, which are insufficient because they become trapped in local minima of the error rate. To eliminate this handicap, stochastic optimization algorithms are used to update the parameters of the NN. Particle Swarm Optimization (PSO) is one of these and provides a robust coherence with NN. In existing studies on hybrid PSO-NN, the position and velocity boundaries of weight and bias are chosen equal or set free in space, which leaves the performance of PSO-NN uncertain. In this paper, the limitations of weight velocity (wv), weight position (wp), bias velocity (bv) and bias position (bp) are varied individually and their effects on the output of the hybrid structure are examined. The resulting structure is called Bounded PSO-NN because it adjusts the optimum operating conditions (intervals). For performance evaluation, the proposed method is tested on binary and multiclass pattern classification using six medical datasets: Wisconsin Breast Cancer (WBC), Pima Indian Diabetes (PID), Bupa Liver Disorders (BLD), Heart Statlog (HS), Breast Tissue (BT) and Dermatology Data (DD). Analysis of the results revealed that Bounded PSO-NN has a faster processing time than general PSO-NNs that use the set-free condition or the wpi = bpi and wvi = bvi conditions. The advantage in processing time is about 199 s (set-free) and 307 s (wpi = bpi and wvi = bvi) for training, and about 16 ms (set-free) and 9 ms (wpi = bpi and wvi = bvi) for testing. In terms of classification performance, PSO-NN (set-free condition), PSO-NN (wpi = bpi and wvi = bvi) and PSO-NN with individual boundary adjustment (Bounded PSO-NN) achieve accuracy rates of, respectively, 69.84%, 95.31% and 97.22% on WBC; 47.01%, 76.69% and 77.73% on PID; 55.36%, 67.54% and 73.91% on BLD; 64.82%, 81.48% and 85.56% on HS; 75%, 92.31% and 100% on BT; and 27.47%, 92.31% and 100% on DD. In light of the experiments, Bounded PSO-NN is better than general PSO-NNs at obtaining the optimum results. Consequently, the importance of the limitations is clarified, and it is shown that each limitation must be adjusted individually rather than set free or chosen equal.
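The idea of giving weights and biases their own velocity and position intervals can be sketched with a standard PSO update followed by clamping. This is an illustration under stated assumptions, not the authors' implementation: the symmetric intervals, the numeric bounds, the inertia and acceleration coefficients, and all function names are hypothetical.

```python
import random


def clamp(value, low, high):
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(high, value))


def pso_step(x, v, pbest, gbest, v_bound, x_bound, rng,
             inertia=0.7, c1=1.5, c2=1.5):
    """One standard PSO update for a single coordinate, then clamp v and x."""
    r1, r2 = rng.random(), rng.random()
    v_new = inertia * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
    v_new = clamp(v_new, -v_bound, v_bound)      # velocity limit (wv or bv)
    x_new = clamp(x + v_new, -x_bound, x_bound)  # position limit (wp or bp)
    return x_new, v_new


def update_particle(particle, pbest, gbest, bounds, rng):
    """Update weights and biases in place, each group with its own intervals."""
    for group in ("weight", "bias"):
        xs, vs = particle[group]["x"], particle[group]["v"]
        for i in range(len(xs)):
            xs[i], vs[i] = pso_step(xs[i], vs[i], pbest[group][i], gbest[group][i],
                                    bounds[group]["v"], bounds[group]["x"], rng)
    return particle


# Hypothetical interval half-widths; the abstract does not report the values used.
bounds = {"weight": {"v": 0.5, "x": 2.0}, "bias": {"v": 0.1, "x": 1.0}}
```

Setting the four bounds to the same value recovers the "chosen equal" configuration, and making them very large approximates the set-free one, so the same routine can be used to compare the three conditions.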


2017 ◽  
Author(s):  
Pierre Peterlongo ◽  
Chloé Riou ◽  
Erwan Drezen ◽  
Claire Lemaitre

Motivation: Next Generation Sequencing (NGS) data provide unprecedented access to life mechanisms. In particular, these data make it possible to detect polymorphisms such as SNPs and indels. As these polymorphisms represent a fundamental source of information in agronomy, environment or medicine, their detection in NGS data is now a routine task. The main methods for their prediction usually need a reference genome. However, non-model organisms and highly divergent genomes such as in cancer studies are extensively investigated. Results: We propose DiscoSnp++, in which we revisit the DiscoSnp algorithm. DiscoSnp++ is designed for detecting and ranking all kinds of SNPs and small indels from raw read set(s). It outputs files in fasta and VCF formats. In particular, predicted variants can be automatically localized afterwards on a reference genome if available. Its usage is extremely simple and its low resource requirements make it usable on common desktop computers. Results show that DiscoSnp++ performs better than state-of-the-art methods in terms of computational resources and in terms of result quality. An important novelty is the de novo detection of indels, for which we obtained 99% precision when calling indels on simulated human datasets and 90% recall on high-confidence indels from the Platinum dataset. License: GNU Affero general public license. Availability: https://github.com/GATB/[email protected]


2021 ◽  
Author(s):  
Mehmet Bilal ER ◽  
Esme ISIK ◽  
Ibrahim ISIK

Parkinson's disease (PD) results from the dysfunction of the brain cells that contain dopamine, the substance that enables brain cells to interact with each other. PD can cause many motor and non-motor symptoms, including ones affecting speech and smell. One of the difficulties that Parkinson's patients can experience is a change in speech or difficulty speaking. Therefore, a correct diagnosis in the early period is important in reducing the possible effects of speech disorders caused by the disease. The speech signals of Parkinson's patients show major differences compared to those of healthy people. In this study, a new approach for detecting PD from speech sounds is proposed, based on pre-trained deep networks and Long Short-Term Memory (LSTM), using mel-spectrograms obtained from speech signals denoised with Variational Mode Decomposition (VMD). The proposed model consists of four stages. In the first stage, noise is removed by applying VMD to the signals. In the second stage, mel-spectrograms are extracted from the VMD-enhanced sound signals. In the third stage, pre-trained deep networks are used to extract deep features from the mel-spectrograms. For this purpose, ResNet-18, ResNet-50 and ResNet-101 models are used as the pre-trained deep network architectures. In the last stage, classification is performed by giving these features as input to the LSTM model, which is designed to capture sequential information from the extracted features. Experiments are performed with the PC-GITA dataset, which consists of two classes and is widely used in the literature. The results obtained with the proposed method are compared with the latest methods in the literature, and the proposed method is seen to achieve better classification performance.


Author(s):  
E. Barnefske ◽  
H. Sternberg

Point clouds give a very detailed and sometimes very accurate representation of the geometry of captured objects. In surveying, point clouds captured with laser scanners or camera systems are an intermediate result that must be processed further. Often the point cloud has to be divided into regions of similar types (object classes) for the next process steps. These classifications are very time-consuming and cost-intensive compared to acquisition. In order to automate this process step, convolutional neural networks (ConvNet), which take over the classification task, are investigated in detail. In addition to the network architecture, the classification performance of a ConvNet depends on the training data with which the task is learned. This paper presents and evaluates the point cloud classification tool (PCCT) developed at HCU Hamburg. With the PCCT, large point cloud collections can be semi-automatically classified. Furthermore, the influence of erroneous points in three-dimensional point clouds is investigated. The network architecture PointNet is used for this investigation.

