Feature Learning with Multi-objective Evolutionary Computation in the Generation of Acoustic Features

2019 ◽  
Vol 22 (64) ◽  
pp. 14-35
Author(s):  
José Antonio Alves Menezes ◽  
Giordano Cabral ◽  
Bruno Gomes ◽  
Paulo Pereira

Choosing audio features has long been a central concern for audio classification experts, who regard this step as probably the most important effort in solving the classification problem. In this sense, Feature Learning techniques can generate new features better suited to the classification model than conventional features. However, these techniques are generally domain-agnostic and can be applied to many kinds of raw data, whereas less agnostic approaches learn knowledge restricted to the area studied, and audio data requires this kind of domain-specific knowledge. Many techniques seek to improve the generation of new acoustic features, among which stands out the use of evolutionary algorithms to explore the space of analytical functions. However, the efforts made so far leave room for improvement. The purpose of this work is to propose and evaluate a multi-objective alternative for the exploration of analytical audio features. In addition, experiments were arranged to validate the method, with the help of a computational prototype that implemented the proposed solution. The results confirmed the effectiveness of the model while showing that there is still room for improvement in the chosen segment.
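As a rough illustration of the multi-objective idea only (not the authors' analytical-expression representation or operators), the sketch below searches binary feature subsets under two assumed objectives, validation accuracy and subset size, with a crude Pareto-style selection built on scikit-learn; the dataset, classifier, and evolutionary settings are placeholders.

```python
# Minimal sketch of multi-objective evolutionary feature search (illustrative only).
# Assumed objectives: maximize validation accuracy, minimize number of features.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=300, n_features=40, n_informative=8, random_state=0)

def objectives(mask):
    if mask.sum() == 0:
        return (0.0, mask.size)                        # empty subset is worthless
    acc = cross_val_score(KNeighborsClassifier(), X[:, mask], y, cv=3).mean()
    return (acc, int(mask.sum()))                      # (maximize, minimize)

def dominates(a, b):
    return a[0] >= b[0] and a[1] <= b[1] and a != b

pop = rng.random((20, X.shape[1])) < 0.3               # random binary masks
for gen in range(15):
    children = pop.copy()
    children ^= rng.random(children.shape) < 0.05      # bit-flip mutation
    union = np.vstack([pop, children])
    scores = [objectives(m) for m in union]
    # keep non-dominated individuals first (crude Pareto selection)
    front = [i for i, s in enumerate(scores)
             if not any(dominates(t, s) for t in scores)]
    order = front + [i for i in np.argsort([-s[0] for s in scores]) if i not in front]
    pop = union[order[:20]]

best = max(range(len(pop)), key=lambda i: objectives(pop[i])[0])
print("best (accuracy, n_features):", objectives(pop[best]))
```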

Symmetry ◽  
2020 ◽  
Vol 12 (11) ◽  
pp. 1822
Author(s):  
Zohaib Mushtaq ◽  
Shun-Feng Su

Over the past few years, the study of environmental sound classification (ESC) has become very popular due to the intricate nature of environmental sounds. This paper reports our study on employing various acoustic feature aggregation and data enhancement approaches for the effective classification of environmental sounds. The proposed data augmentation techniques are mixtures of the reinforcement, aggregation, and combination of distinct acoustic features. These features are known as spectrogram image features (SIFs) and are retrieved by different audio feature extraction techniques. All audio features used in this manuscript are categorized into two groups: one with general features and the other with Mel filter bank-based acoustic features. Two novel and innovative features based on the logarithmic scale of the Mel spectrogram (Mel), Log (Log-Mel) and Log (Log (Log-Mel)), denoted as L2M and L3M, are introduced in this paper. In our study, three prevailing ESC benchmark datasets, ESC-10, ESC-50, and UrbanSound8K (Us8k), are used. Most of the audio clips in these datasets are not fully occupied by sound and include silent parts. Therefore, silence trimming is implemented as one of the pre-processing techniques. The training is conducted by using the transfer learning model DenseNet-161, which is further fine-tuned with individual optimal learning rates based on the discriminative learning technique. The proposed methodologies attain state-of-the-art outcomes for all used ESC datasets, i.e., 99.22% for ESC-10, 98.52% for ESC-50, and 97.98% for Us8k. This work also considers real-time audio data to evaluate the performance and efficiency of the proposed techniques. The implemented approaches also achieve competitive results on real-time audio data.
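The abstract defines L2M and L3M as nested logarithms of the Mel spectrogram. A minimal librosa sketch of how such representations could be computed follows; the example clip, the epsilon offset, and the min-shift used to keep every log argument positive are assumptions, not the paper's exact recipe.

```python
# Hedged sketch: nested log-Mel spectrograms (L2M, L3M) as described in the abstract.
# The epsilon offset, min-shift, and Mel parameters are assumptions, not the paper's values.
import numpy as np
import librosa

y, sr = librosa.load(librosa.ex("trumpet"))            # any mono audio clip
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)

eps = 1e-6                                             # keeps every log argument positive
log_mel = np.log(mel + eps)                            # Mel     -> Log-Mel
l2m = np.log(log_mel - log_mel.min() + eps)            # Log-Mel -> L2M
l3m = np.log(l2m - l2m.min() + eps)                    # L2M     -> L3M

print(mel.shape, log_mel.shape, l2m.shape, l3m.shape)  # identical image shapes
```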


Author(s):  
Ritam Guha ◽  
Manosij Ghosh ◽  
Pawan Kumar Singh ◽  
Ram Sarkar ◽  
Mita Nasipuri

Abstract In any multi-script environment, handwritten script classification is an unavoidable prerequisite before document images are fed to their respective Optical Character Recognition (OCR) engines. Over the years, this complex pattern classification problem has been addressed by researchers proposing various feature vectors, mostly of large dimension, thereby increasing the computational complexity of the whole classification model. Feature Selection (FS) can serve as an intermediate step to reduce the size of the feature vectors by restricting them only to the essential and relevant features. In the present work, we address this issue by introducing a new FS algorithm, called Hybrid Swarm and Gravitation-based FS (HSGFS). This algorithm has been applied over three feature vectors introduced in the literature recently: the Distance-Hough Transform (DHT), Histogram of Oriented Gradients (HOG), and Modified log-Gabor (MLG) filter transform. Three state-of-the-art classifiers, namely Multi-Layer Perceptron (MLP), K-Nearest Neighbour (KNN), and Support Vector Machine (SVM), are used to evaluate the optimal subset of features generated by the proposed FS model. Handwritten datasets at the block, text-line, and word level, consisting of 12 officially recognized Indic scripts, were prepared for experimentation. An average improvement of 2–5% in classification accuracy is achieved by utilizing only about 75–80% of the original feature vectors on all three datasets. The proposed method also shows better performance when compared to some popularly used FS models. The code used for implementing HSGFS can be found at the following GitHub link: https://github.com/Ritam-Guha/HSGFS.
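HSGFS is a wrapper-style FS method in which candidate subsets are scored by classifiers such as KNN. The sketch below illustrates only that subset-scoring step, under an assumed dataset and an assumed accuracy/size trade-off; the swarm- and gravitation-based search operators themselves are not reproduced here.

```python
# Hedged sketch: wrapper-style scoring of a candidate feature subset, as used by
# wrapper FS methods; the HSGFS search operators are not reproduced here.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_digits(return_X_y=True)                  # stand-in for the script features

def subset_score(mask, alpha=0.99):
    """Higher is better: classification accuracy traded off against subset size."""
    if not mask.any():
        return 0.0
    acc = cross_val_score(KNeighborsClassifier(), X[:, mask], y, cv=3).mean()
    return alpha * acc + (1 - alpha) * (1 - mask.sum() / mask.size)

rng = np.random.default_rng(1)
candidate = rng.random(X.shape[1]) < 0.8              # e.g. keep roughly 80% of features
print("fitness of candidate subset:", round(subset_score(candidate), 4))
```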


Author(s):  
Xiao Yang ◽  
Madian Khabsa ◽  
Miaosen Wang ◽  
Wei Wang ◽  
Ahmed Hassan Awadallah ◽  
...  

Community-based question answering (CQA) websites represent an important source of information. As a result, the problem of matching the most valuable answers to their corresponding questions has become an increasingly popular research topic. We frame this task as a binary (relevant/irrelevant) classification problem and present an adversarial training framework to alleviate the label imbalance issue. We employ a generative model to iteratively sample a subset of challenging negative samples to fool our classification model. Both models are alternately optimized using the REINFORCE algorithm. The proposed method is completely different from previous ones, in which negative samples in the training set are used directly or uniformly down-sampled. Further, we propose Multi-scale Matching, which explicitly inspects the correlation between words and n-grams at different levels of granularity. We evaluate the proposed method on the SemEval 2016 and SemEval 2017 datasets and achieve state-of-the-art or comparable performance.
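A toy sketch of the adversarial negative-sampling loop with a REINFORCE update follows. The embedding shapes, the linear generator, and the bilinear matcher standing in for the classification model are illustrative assumptions, not the paper's architecture.

```python
# Hedged sketch of adversarial negative sampling with a REINFORCE update.
# Shapes, the linear generator, and the bilinear matcher are illustrative assumptions.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
q = torch.randn(1, 64)                                   # one question embedding (toy)
neg_pool = torch.randn(500, 64)                          # pool of candidate negative answers

generator = torch.nn.Linear(64, 1)                       # scores each negative candidate
matcher = torch.nn.Bilinear(64, 64, 1)                   # stand-in for the matching model
g_opt = torch.optim.Adam(generator.parameters(), lr=1e-3)

for step in range(200):
    logits = generator(neg_pool).squeeze(-1)             # (500,)
    probs = F.softmax(logits, dim=0)
    idx = torch.multinomial(probs, num_samples=32)       # sample hard negatives
    with torch.no_grad():                                 # reward: how strongly each sample
        reward = torch.sigmoid(                           # currently fools the matcher
            matcher(q.expand(32, -1), neg_pool[idx])).squeeze(-1)
    log_p = torch.log(probs[idx] + 1e-12)
    loss = -(reward * log_p).mean()                       # REINFORCE: raise fooling reward
    g_opt.zero_grad(); loss.backward(); g_opt.step()
```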


2018 ◽  
Vol 101 (6) ◽  
pp. 1967-1976 ◽  
Author(s):  
Shiva Ahmadi ◽  
Ahmad Mani-Varnosfaderani ◽  
Biuck Habibi

Abstract Motor oil classification is important for quality control and the identification of oil adulteration. In this work, we propose a simple, rapid, inexpensive and nondestructive approach based on image analysis and pattern recognition techniques for the classification of nine different types of motor oils according to their corresponding color histograms. For this, we applied color histograms in different color spaces such as red green blue (RGB), grayscale, and hue saturation intensity (HSI) in order to extract features that can help with the classification procedure. These color histograms and their combinations were used as input for model development and then were statistically evaluated by using linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), and support vector machine (SVM) techniques. Here, two common solutions for solving a multiclass classification problem were applied: (1) transformation into a binary classification problem using a one-against-all (OAA) approach and (2) extension from binary classifiers to a single globally optimized multilabel classification model. In the OAA strategy, LDA, QDA, and SVM reached up to 97% in terms of accuracy, sensitivity, and specificity for both the training and test sets. In the extension from the binary case, despite good performances by the SVM classification model, QDA and LDA provided better results, up to 92% for the RGB-grayscale-HSI color histograms and up to 93% for the HSI color map, respectively. In order to reduce the number of independent variables for modeling, a principal component analysis algorithm was used. Our results suggest that the proposed method is promising for the identification and classification of different types of motor oils.
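The sketch below shows one way per-channel color histograms can be extracted and fed to an LDA classifier. Random images stand in for the motor-oil photographs, and OpenCV's HSV conversion is used as a rough substitute for the HSI space mentioned in the abstract; both substitutions are assumptions.

```python
# Hedged sketch: per-channel color histograms as features, classified with LDA.
# Random images replace the motor-oil photos; HSV approximates the paper's HSI space.
import numpy as np
import cv2
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def histogram_features(img_bgr, bins=32):
    gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)
    hsv = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2HSV)
    feats = []
    for plane in (*cv2.split(img_bgr), gray, *cv2.split(hsv)):
        h = cv2.calcHist([plane], [0], None, [bins], [0, 256]).ravel()
        feats.append(h / h.sum())                        # normalized histogram per plane
    return np.concatenate(feats)

rng = np.random.default_rng(0)
imgs = rng.integers(0, 256, size=(90, 64, 64, 3), dtype=np.uint8)
labels = np.repeat(np.arange(9), 10)                     # nine pretend "oil types"
X = np.array([histogram_features(im) for im in imgs])

lda = LinearDiscriminantAnalysis().fit(X, labels)
print("training accuracy:", lda.score(X, labels))
```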


2019 ◽  
Author(s):  
Nansu Zong ◽  
Rachael Sze Nga Wong ◽  
Victoria Ngo ◽  
Yue Yu ◽  
Ning Li

Abstract Motivation: Despite the existing classification- and inference-based machine learning methods that show promising results in drug-target prediction, these methods possess inevitable limitations: 1) results are often biased because the classification-based methods lack negative samples, and 2) novel drug-target associations with new (or isolated) drugs/targets cannot be explored by inference-based methods. As big data continues to boom, there is a need for a scalable, robust, and accurate solution that can process large heterogeneous datasets and yield valuable predictions. Results: We introduce a drug-target prediction method that improves our previously proposed method in three aspects: 1) we constructed a heterogeneous network which incorporates 12 repositories and includes 7 types of biomedical entities (20,119 entities, 194,296 associations), 2) we enhanced the feature learning method with Node2Vec, a scalable state-of-the-art feature learning method, and 3) we integrated the originally proposed inference-based model with a classification model, which is further fine-tuned by a negative sample selection algorithm. The proposed method shows better results for drug-target association prediction: a 95.3% ROC AUC score compared to the existing methods in the 10-fold cross-validation tests. We studied the biased learning/testing in network-based pairwise prediction and concluded the best training strategy. Finally, we conducted a disease-specific prediction task based on 20 diseases. New drug-target associations were successfully predicted with an average ROC AUC of 97.2% (validated based on DrugBank 5.1.0). The experiments show the reliability of the proposed method in predicting novel drug-target associations for disease treatment.
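As a hedged illustration of the feature-learning step, the sketch below runs the open-source node2vec package on a small random graph and builds pairwise (Hadamard) features for association prediction with logistic regression; the graph, the negative sampling, and the classifier are placeholders rather than the authors' 12-repository network or model.

```python
# Hedged sketch: Node2Vec embeddings on a toy graph, combined into pairwise features
# for link (association) prediction. Graph, Hadamard features, and classifier are assumptions.
import networkx as nx
import numpy as np
from node2vec import Node2Vec
from sklearn.linear_model import LogisticRegression

G = nx.fast_gnp_random_graph(200, 0.05, seed=0)          # stand-in heterogeneous network
n2v = Node2Vec(G, dimensions=64, walk_length=20, num_walks=50, workers=1)
model = n2v.fit(window=5, min_count=1)

def pair_feature(u, v):
    return model.wv[str(u)] * model.wv[str(v)]           # Hadamard edge embedding

pos = list(G.edges())[:200]
neg = [(u, v) for u, v in nx.non_edges(G)][:200]         # naive negative sampling
X = np.array([pair_feature(u, v) for u, v in pos + neg])
y = np.array([1] * len(pos) + [0] * len(neg))

clf = LogisticRegression(max_iter=1000).fit(X, y)
print("training accuracy:", clf.score(X, y))
```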


2020 ◽  
Author(s):  
Enrique Garcia-Ceja ◽  
Vajira Thambawita ◽  
Steven Hicks ◽  
Debesh Jha ◽  
Petter Jakobsen ◽  
...  

In this paper, we present HTAD: A Home Tasks Activities Dataset. The dataset contains wrist-accelerometer and audio data from people performing at-home tasks such as sweeping, brushing teeth, washing hands, or watching TV. These activities represent a subset of the activities needed to live independently. Being able to detect activities with wearable devices in real time has the potential to enable assistive technologies with applications in different domains such as elderly care and mental health monitoring. Preliminary results show that using machine learning with the dataset leads to promising results, but also that there is still room for improvement. By making this dataset public, researchers can test different machine learning algorithms for activity recognition, especially sensor data fusion methods.
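One plausible way to use such a dataset is feature-level fusion of accelerometer statistics with audio features before classification. The sketch below illustrates this with synthetic signals standing in for HTAD clips; the window length, features, and classifier are assumptions.

```python
# Hedged sketch: feature-level fusion of wrist-accelerometer and audio features.
# Synthetic signals stand in for HTAD clips; features and classifier are illustrative.
import numpy as np
import librosa
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
sr = 8000

def fused_features(accel_xyz, audio, sr):
    acc_stats = np.concatenate([accel_xyz.mean(axis=0), accel_xyz.std(axis=0)])
    mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=13).mean(axis=1)
    return np.concatenate([acc_stats, mfcc])             # simple early fusion

# 40 fake 3-second windows labelled with 4 pretend activities
X = np.array([fused_features(rng.normal(size=(150, 3)),
                             rng.normal(size=sr * 3).astype(np.float32), sr)
              for _ in range(40)])
y = rng.integers(0, 4, size=40)

clf = RandomForestClassifier(random_state=0).fit(X, y)
print("training accuracy:", clf.score(X, y))
```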


Author(s):  
Sourabh Suke ◽  
Ganesh Regulwar ◽  
Nikesh Aote ◽  
Pratik Chaudhari ◽  
Rajat Ghatode ◽  
...  

This project describes "VoiEmo - A Speech Emotion Recognizer", a system for recognizing the emotional state of an individual from his/her speech. For example, one's speech becomes loud and fast, with a higher and wider pitch range, when in a state of fear, anger, or joy, whereas the human voice is generally slow and low pitched in sadness and tiredness. We have developed classification models for speech emotion detection based on Convolutional Neural Networks (CNNs), Support Vector Machine (SVM), and Multilayer Perceptron (MLP) classifiers, which make predictions from acoustic features of the speech signal such as Mel Frequency Cepstral Coefficients (MFCCs). Our models have been trained to recognize eight common emotions (neutral, calm, happy, sad, angry, fearful, disgust, surprise). For training and testing the models, we have used relevant data from the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) dataset and the Toronto Emotional Speech Set (TESS) dataset. The system is advantageous as it can provide a general idea of the emotional state of an individual based on the acoustic features of speech, irrespective of the language spoken; moreover, it also saves time and effort. Speech emotion recognition systems have applications in various fields such as call centers and BPOs, criminal investigation, psychiatric therapy, the automobile industry, etc.
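A minimal sketch of the MFCC-plus-SVM pipeline, one of the model types mentioned above, follows. Synthetic tones stand in for RAVDESS/TESS clips, and the feature settings and kernel are illustrative assumptions.

```python
# Hedged sketch: clip-level MFCC features feeding an SVM classifier.
# Synthetic tones replace RAVDESS/TESS clips; settings are illustrative.
import numpy as np
import librosa
from sklearn.svm import SVC

sr = 22050
rng = np.random.default_rng(0)

def mfcc_vector(y, sr, n_mfcc=40):
    """Mean MFCCs over time: a compact clip-level acoustic feature."""
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).mean(axis=1)

# Two fake "emotion classes": noisy low tones vs. noisy high tones.
t = np.linspace(0, 1, sr, endpoint=False)
clips, labels = [], []
for i in range(20):
    f0 = 150 if i % 2 == 0 else 600
    clips.append(np.sin(2 * np.pi * f0 * t) + 0.1 * rng.normal(size=sr))
    labels.append("low" if f0 == 150 else "high")

X = np.array([mfcc_vector(c.astype(np.float32), sr) for c in clips])
clf = SVC(kernel="rbf").fit(X, labels)
print("training accuracy:", clf.score(X, labels))
```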


Electronics ◽  
2021 ◽  
Vol 10 (20) ◽  
pp. 2471
Author(s):  
Iordanis Thoidis ◽  
Marios Giouvanakis ◽  
George Papanikolaou

In this study, we aim to learn highly descriptive representations for a wide set of machinery sounds and exploit this knowledge to perform condition monitoring of mechanical equipment. We propose a comprehensive feature learning approach that operates on raw audio, by supervising the formation of salient audio embeddings in latent states of a deep temporal convolutional neural network. By fusing the supervised feature learning approach with an unsupervised deep one-class neural network, we are able to model the characteristics of each source and implicitly detect anomalies in different operational states of industrial machines. Moreover, we enable the exploitation of spatial audio information in the learning process, by formulating a novel front-end processing strategy for circular microphone arrays. Experimental results on the MIMII dataset demonstrate the effectiveness of the proposed method, reaching a state-of-the-art mean AUC score of 91.0%. Anomaly detection performance is significantly improved by incorporating multi-channel audio data in the feature extraction process, as well as training the convolutional neural network on the spatially invariant front-end. Finally, the proposed semi-supervised approach allows the concise modeling of normal machine conditions and accurately detects system anomalies, compared to existing anomaly detection methods.
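One common way to realize a deep one-class network is a Deep SVDD-style objective that pulls embeddings of normal data toward a fixed center and scores anomalies by their distance from it. The toy sketch below shows only that objective; the encoder, data, and hyperparameters are assumptions, not the paper's temporal convolutional model.

```python
# Hedged sketch of a one-class objective in the Deep SVDD style: pull embeddings of
# "normal" audio toward a fixed center and score anomalies by distance to it.
# Encoder, data, and hyperparameters are toy assumptions, not the paper's model.
import torch

torch.manual_seed(0)
normal = torch.randn(256, 128)                      # embeddings of normal machine audio (toy)
encoder = torch.nn.Sequential(torch.nn.Linear(128, 64), torch.nn.ReLU(),
                              torch.nn.Linear(64, 32))
with torch.no_grad():
    center = encoder(normal).mean(dim=0)            # fixed hypersphere center

opt = torch.optim.Adam(encoder.parameters(), lr=1e-3)
for epoch in range(100):
    z = encoder(normal)
    loss = ((z - center) ** 2).sum(dim=1).mean()    # shrink normal data toward the center
    opt.zero_grad(); loss.backward(); opt.step()

def anomaly_score(x):
    with torch.no_grad():
        return ((encoder(x) - center) ** 2).sum(dim=1)

print(anomaly_score(torch.randn(4, 128)))           # larger score => more anomalous
```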


2021 ◽  
Author(s):  
Shrey Bansal ◽  
Mukul Singh ◽  
Rahul Dubey ◽  
Bijaya Ketan Panigrahi

Abstract In early 2020, the world was amid a significant pandemic due to the novel coronavirus disease outbreak, commonly called COVID-19. Coronavirus disease is a lung infection caused by the Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) virus. Because of its high transmission rate, it is crucial to detect cases as soon as possible in order to effectively control the spread of the pandemic and treat patients in the early stages. RT-PCR-based kits are the current standard for COVID-19 diagnosis, but these tests take considerable time despite their high precision. A faster automated diagnostic tool is therefore required for the effective screening of COVID-19. In this study, a new semi-supervised feature learning technique is proposed to screen COVID-19 patients using chest CT scans. The model proposed in this study uses a three-step architecture, consisting of a Convolutional Autoencoder-based unsupervised feature extractor, a Multi-Objective Genetic Algorithm-based feature selector, and a Bagging Ensemble of Support Vector Machines (SVMs) as the classifier. The Autoencoder generates a diverse set of features from the images, and an optimal subset, free of redundant and irrelevant features, is selected by the evolutionary selector. The Ensemble of SVMs then performs the binary classification of the features. The proposed architecture has been designed to provide precise and robust diagnostics for binary classification (COVID vs. non-COVID). A dataset of 1252 COVID-19 CT scan images, collected from 60 patients, has been used to train and evaluate the model. The experimental results demonstrate the superiority of the proposed methodology in comparison to existing methods.
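A sketch of the final classification stage alone, a bagging ensemble of SVMs, is given below using scikit-learn. The synthetic feature matrix stands in for features that would have already passed the autoencoder and genetic-algorithm selection; the hyperparameters are illustrative.

```python
# Hedged sketch of the final stage only: a bagging ensemble of SVMs over a feature
# matrix assumed to have already passed the autoencoder and GA-based selection.
# Synthetic data replaces the CT-scan features; hyperparameters are illustrative.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=600, n_features=64, n_informative=12,
                           random_state=0)               # stand-in for selected features
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

ensemble = BaggingClassifier(SVC(kernel="rbf", C=1.0),   # base SVM learners
                             n_estimators=15,
                             max_samples=0.8,            # bootstrap 80% per learner
                             random_state=0)
ensemble.fit(X_tr, y_tr)
print("test accuracy (COVID vs. non-COVID stand-in):", ensemble.score(X_te, y_te))
```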

