Using the Fisher Vector Approach for Cold Identification

2021 ◽  
Vol 25 (2) ◽  
pp. 223-232
Author(s):  
José Vicente Egas-López ◽  
Gábor Gosztolya

In this paper, we present a computational paralinguistic method for assessing from speech whether a person has an upper respiratory tract infection (i.e., a cold). A system that can accurately detect a cold can help predict its propagation. For this purpose, we use Mel-frequency Cepstral Coefficients (MFCCs) extracted from the utterances as audio-signal representations, which allow us to fit a generative Gaussian Mixture Model (GMM) that serves to produce an encoding based on the Fisher Vector (FV) approach. We use the URTIC dataset provided by the organizers of the ComParE Challenge 2017 of the Interspeech Conference. Classification is done with a linear-kernel Support Vector Machine (SVM); owing to the high class imbalance in the training dataset, we undersample the majority class, that is, we reduce its number of samples to that of the minority class. We find that applying Power Normalization (PN) and Principal Component Analysis (PCA) to the Fisher vector features is an effective strategy for improving classification performance. We obtain better performance than that of the Bag-of-Audio-Words approach reported in the challenge paper.
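
Two of the steps named in the abstract above, undersampling the majority class and power normalization, can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names, the exponent `alpha=0.5`, the fixed random seed, and the L2 normalization that follows the power step are all assumptions made here for the example.

```python
import math
import random


def undersample(samples, labels, seed=0):
    """Randomly drop samples so every class keeps as many items as the smallest class."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    n_min = min(len(items) for items in by_class.values())
    out_x, out_y = [], []
    for y, items in by_class.items():
        for x in rng.sample(items, n_min):
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


def power_normalize(vec, alpha=0.5):
    """Signed power normalization, then L2 normalization (the L2 step is an assumption)."""
    powered = [math.copysign(abs(v) ** alpha, v) for v in vec]
    norm = math.sqrt(sum(v * v for v in powered))
    return [v / norm for v in powered] if norm > 0 else powered


# Hypothetical toy data: 8 "healthy" items and 2 "cold" items.
balanced_x, balanced_y = undersample(list(range(10)), [0] * 8 + [1] * 2)
normalized = power_normalize([4.0, -9.0, 0.0])
```

The power step compresses large feature values while keeping their sign, which is why it is often applied before a linear classifier.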

2014 ◽  
Author(s):  
Liang Liu

Falls are a leading cause of accidental death among people over the age of 65 in the United States. Fall detection methods have been implemented on a variety of fall monitoring devices. Because Doppler radar protects privacy, is non-invasive, and is independent of lighting, I design a fall detection system based on a Doppler radar sensor. This dissertation explores different Doppler radar sensor configurations and positions in both lab and real senior home environments, along with signal processing and machine learning algorithms. First, I design the system based on data collected in the lab with three configurations: two floor radars; one ceiling and one wall radar; and one ceiling and one floor radar. The performance of the sensor positions and features is evaluated with four classifiers: support vector machine, nearest neighbor, naïve Bayes, and hidden Markov model. In the real senior home, I investigate the system by evaluating the variance in detection caused by differences in subjects and environment settings in the training dataset. Moreover, I adapt the automatic fall detection system to an actual retirement community apartment. I examine different features: Mel-frequency cepstral coefficients (MFCCs), local binary patterns (LBP), and a combined version of the features selected with the RELIEF algorithm. I also improve detection performance with both a pre-screener and feature selection. I fuse the radar fall detection system with motion sensors. Finally, I develop a standalone fall detection system whose results are displayed on a purpose-built webpage.


2019 ◽  
Vol 33 (35) ◽  
pp. 1950438 ◽  
Author(s):  
Manish Gupta ◽  
Shambhu Shankar Bharti ◽  
Suneeta Agarwal

Speech is a convenient medium for communication among human beings. Speaker recognition is the process of automatically recognizing a speaker from the information contained in the speech signal. In this paper, a new two-level approach to speaker recognition from the speech signal is proposed. At the first level, the gender of the speaker is recognized; at the second level, the speaker is recognized based on the gender identified at the first level. Once the gender is known, the search space for the second level is halved, because the speaker recognition system searches only the set of speech signals belonging to the identified gender. To identify gender, the gender-specific features Mel Frequency Cepstral Coefficients (MFCC) and pitch are used. The speaker is recognized using the speaker-specific features MFCC, pitch, and RASTA-PLP. Support Vector Machine (SVM) and Gaussian Mixture Model (GMM) classifiers are used for identifying the gender and recognizing the speaker, respectively. Experiments are performed on speech signals from two databases: “IIT-Madras speech synthesis and recognition” (containing English speech samples spoken by eight male and eight female speakers from eight different regions) and “ELSDSR” (containing English speech samples spoken by five male and five female speakers). Experimentally, the two-level approach reduces the time taken for speaker recognition by 30–32% compared with recognizing the speaker without first identifying the gender (the single-level approach). The accuracy of speaker recognition also improves from 99.7% with the single-level approach to 99.9% with the proposed approach. The experiments indicate that a speech signal with a minimum duration of 1.12 (after removing silent parts) is sufficient for recognizing the speaker.
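
The two-level search described above can be illustrated with a small sketch. A nearest-reference rule stands in for both the SVM gender classifier and the GMM speaker models used in the paper, and all names and toy vectors below are hypothetical; the point is only that the second level searches the speakers of one gender rather than the full set.

```python
import math


def nearest(query, candidates):
    """Return the key whose reference vector is closest (Euclidean) to the query."""
    return min(candidates, key=lambda k: math.dist(query, candidates[k]))


def identify_two_level(query, gender_refs, speaker_refs, speaker_gender):
    """Level 1 picks a gender; level 2 searches only speakers of that gender."""
    gender = nearest(query, gender_refs)
    shortlist = {s: v for s, v in speaker_refs.items() if speaker_gender[s] == gender}
    return gender, nearest(query, shortlist), len(shortlist)


# Hypothetical 2-D feature vectors, used only to exercise the control flow.
gender_refs = {"female": [1.0, 0.0], "male": [-1.0, 0.0]}
speaker_refs = {
    "f1": [1.0, 0.5], "f2": [1.0, -0.5],
    "m1": [-1.0, 0.5], "m2": [-1.0, -0.5],
}
speaker_gender = {"f1": "female", "f2": "female", "m1": "male", "m2": "male"}

gender, speaker, searched = identify_two_level(
    [0.9, 0.4], gender_refs, speaker_refs, speaker_gender
)
```

With a balanced set of speakers, `searched` is half the size of the full speaker set, which is the source of the time saving the abstract reports. A gender error at level 1 cannot be recovered at level 2, so the scheme depends on an accurate first stage.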


Author(s):  
Ergün Yücesoy

In this study, the classification of speakers according to age and gender is discussed. Age and gender classes were first examined separately, and then combined into a classification with a total of 7 classes. Speech signals represented by Mel-Frequency Cepstral Coefficients (MFCC) and delta parameters were converted into Gaussian Mixture Model (GMM) mean supervectors and classified with a Support Vector Machine (SVM). The GMM mean supervectors were formed according to the Maximum-a-posteriori (MAP) adapted GMM-Universal Background Model (UBM) configuration; the number of components was varied from 16 to 512 to determine the optimum. Using the aGender dataset, the gender classification accuracy of the system was measured as 99.02% for two classes and 92.58% for three classes, and the age group classification accuracy was measured as 67.03% for females and 63.79% for males. When age and gender classes were classified together in one step, an accuracy of 61.46% was obtained. The study also proposes a two-level approach for classifying age and gender classes together. In this approach, the speakers are first divided into three classes (child, male, and female), and then males and females are classified according to their age groups, yielding a 7-class classification. This two-level approach increased the classification accuracy in all cases except when 32-component GMMs were used. The highest improvement, 2.45%, was achieved with 64-component GMMs, while an improvement of 0.79 was achieved with 256-component GMMs.
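
As a rough illustration of how a GMM mean supervector can be built by MAP adaptation, the sketch below adapts the means of a diagonal-covariance UBM to a set of frames and concatenates them. It is a simplified sketch under stated assumptions, not the study's code: the relevance factor of 16, the mean-only adaptation, and every name here are choices made for the example.

```python
import math


def gaussian_diag(x, mean, var):
    """Density of a diagonal-covariance Gaussian evaluated at x."""
    log_p = 0.0
    for xi, mi, vi in zip(x, mean, var):
        log_p += -0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
    return math.exp(log_p)


def map_mean_supervector(frames, weights, means, variances, relevance=16.0):
    """MAP-adapt the UBM means to the frames and concatenate the adapted means."""
    k, d = len(means), len(means[0])
    n = [0.0] * k                          # soft frame counts per component
    first = [[0.0] * d for _ in range(k)]  # posterior-weighted sums of frames
    for x in frames:
        lik = [w * gaussian_diag(x, m, v) for w, m, v in zip(weights, means, variances)]
        total = sum(lik)
        if total == 0:
            continue  # frame too far from every component to contribute
        for c in range(k):
            post = lik[c] / total
            n[c] += post
            for j in range(d):
                first[c][j] += post * x[j]
    supervector = []
    for c in range(k):
        alpha = n[c] / (n[c] + relevance)  # more data moves the mean further from the UBM
        for j in range(d):
            ex = first[c][j] / n[c] if n[c] > 0 else means[c][j]
            supervector.append(alpha * ex + (1 - alpha) * means[c][j])
    return supervector


# Hypothetical 1-D UBM with two components and sixteen identical frames.
sv = map_mean_supervector(
    frames=[[1.0]] * 16,
    weights=[0.5, 0.5],
    means=[[0.0], [10.0]],
    variances=[[1.0], [1.0]],
)
```

In this toy case the first component absorbs essentially all sixteen frames, so its mean moves halfway from 0 toward 1, while the second component stays at its UBM value. The resulting fixed-length vector is what a classifier such as an SVM would consume.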


2020 ◽  
Vol 1 (1) ◽  
pp. 14-17
Author(s):  
Nur Aini Zakaria ◽  
Zuraini Ali Shah ◽  
Shahreen Kasim

Bioinformatics exists to deepen our understanding of biological processes. Protein structure is one of the major challenges in structural bioinformatics. With prior knowledge of the structure, the quality of secondary structure prediction, tertiary structure prediction, and prediction of amino acid function from sequence increases significantly. Recently, the gap between the number of proteins with known sequence and those with known structure has increased dramatically. It is therefore essential to understand protein structure so that further functional analysis becomes easier. This research applies the RPCA algorithm to extract the essential features from the original high-dimensional input vectors, followed by experiments with an SVM with an RBF kernel. The proposed method obtains an accuracy of 84.41% on the training dataset and 89.09% on the testing dataset. The result is then compared with the same method using PCA for feature extraction. The prediction is assessed by analyzing the accuracy and the number of principal components selected. The results show that the combination of RPCA and SVM produces a high-quality classification of protein structure.


Foods ◽  
2020 ◽  
Vol 9 (11) ◽  
pp. 1579
Author(s):  
Huanhuan Feng ◽  
Mengjie Zhang ◽  
Pengfei Liu ◽  
Yiliu Liu ◽  
Xiaoshuan Zhang

Salmon is a highly perishable food owing to temperature, pH, odor, and texture changes during cold storage. Intelligent monitoring and rapid spoilage detection are effective approaches to improve freshness. The aim of this work was to evaluate an IoT-enabled monitoring system (IoTMS) and electronic nose spoilage detection for changes in quality parameters and freshness under cold storage conditions. The salmon samples were divided into three groups, stored in an incubator set at 0 °C, 4 °C, and 6 °C, and analyzed. The quality parameters, i.e., texture, color, sensory, and pH changes, were measured and evaluated at the different temperatures after 0, 3, 6, 9, 12, and 14 days of cold storage. The principal component analysis (PCA) algorithm can be used to cluster the electronic nose information. Furthermore, an algorithm based on Convolutional Neural Networks and a Support Vector Machine (CNN-SVM) is used to cluster the freshness level of salmon samples stored in a specific storage condition. For the tested samples, the accuracy rate for freshness is about 95.6% on the training dataset and 93.8% on the test dataset. For spoilage, the accuracy rate is about 91.4% on the training dataset and 90.5% on the test dataset. The overall accuracy rate is more than 90%. This work could help to reduce quality loss during salmon cold storage.


2019 ◽  
Vol 8 (3) ◽  
pp. 8342-8348

This paper investigates various spectral features, namely MFCC, pitch chroma, skewness, and centroid, for emotion recognition. The emotions considered in the experiments are Fear, Anger, Neutral, and Happy. The system is evaluated for different combinations of spectral features, and the combination of MFCC and skewness is found to give better recognition performance than the other combinations. The features are examined using Gaussian Mixture Models (GMMs) and Support Vector Machines (SVMs). To improve system performance and remove irrelevant information from the newly derived robust features, Principal Component Analysis (PCA) is used to reduce the high-dimensional data. Recognition performance for the feature sets increased after applying PCA with both classification models, GMMs and SVMs. Overall recognition performance was 35% before PCA and 58.3% after PCA using GMMs, and 28% before PCA and 50.5% after PCA using SVMs. The database used in this study is the Telugu emotional speech corpus (IIT-KGP).
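
The dimensionality reduction step mentioned above can be sketched for the simplest case, the first principal component, using power iteration on the sample covariance matrix. This is an illustrative sketch only: the function names, the starting vector, and the iteration count are assumptions, and a practical system would use a linear algebra library and keep several components.

```python
import math


def first_principal_component(rows, iters=200):
    """Leading PCA direction of the rows, estimated by power iteration."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    cov = [
        [sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
        for a in range(d)
    ]
    v = [1.0 + j for j in range(d)]  # arbitrary non-zero starting vector
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break  # start vector is orthogonal to all variance; give up
        v = [x / norm for x in w]
    return mean, v


def project(row, mean, direction):
    """Coordinate of a row along the principal direction."""
    return sum((x - m) * u for x, m, u in zip(row, mean, direction))


# Hypothetical feature rows lying on the line y = x.
rows = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
mean, direction = first_principal_component(rows)
score = project([3.0, 3.0], mean, direction)
```

Projecting each feature vector onto a few such directions replaces correlated, high-dimensional features with a shorter vector before it reaches the classifier.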


Author(s):  
AMITA PAL ◽  
SMARAJIT BOSE ◽  
GOPAL K. BASAK ◽  
AMITAVA MUKHOPADHYAY

For solving speaker identification problems, the approach proposed by Reynolds [IEEE Signal Process. Lett. 2 (1995) 46–48], using Gaussian Mixture Models (GMMs) based on Mel Frequency Cepstral Coefficients (MFCCs) as features, is one of the most effective available in the literature. The use of GMMs for modeling speaker identity is motivated by the interpretation that the Gaussian components represent some general speaker-dependent spectral shapes, and also by the capability of Gaussian mixtures to model arbitrary densities. In this work, we have initially illustrated, with the help of a new bilingual speech corpus, how the well-known principal component transformation, in conjunction with the principle of classifier combination, can be used to enhance the performance of MFCC-GMM speaker recognition systems significantly. Subsequently, we have emphatically and rigorously established the same using the benchmark speech corpus NTIMIT. A significant outcome of this work is that the proposed approach has the potential to enhance the performance of any speaker recognition system based on correlated features.


Author(s):  
Arijit Ghosal ◽  
Suchibrota Dutta ◽  
Debanjan Banerjee

Automatic recognition of instrument types from an audio signal is a challenging and promising research topic. It is challenging because little work has been performed in this domain, and promising because of its applications in the music industry. Broad categories of instruments, such as strings and woodwinds, have already been identified, but very little work has been done on sub-categorization within those categories. Mel Frequency Cepstral Coefficients (MFCC) are a frequently used acoustic feature. In this work, a hierarchical scheme is proposed to classify string instruments without using MFCC-based features. Chroma reflects the strength of notes in the Western 12-note scale. At the first level, chroma-based features are able to differentiate between the broad categories of string instruments. The identity of an instrument can be traced through the sound envelope produced by a note of a certain pitch. At the second level, pitch-based features are used to further sub-classify string instruments. For classification, a neural network, k-NN, Naïve Bayes, and a Support Vector Machine have been used.
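
To make the chroma idea concrete, the sketch below folds spectral peaks into the 12 pitch classes of the Western scale. It assumes equal temperament with A4 tuned to 440 Hz and takes precomputed (frequency, magnitude) pairs as input; the function name and these conventions are assumptions for illustration, not details taken from the paper.

```python
import math


def chroma_from_peaks(peaks, a4_hz=440.0):
    """Fold (frequency_hz, magnitude) pairs into a normalized 12-bin chroma vector.

    Bin 0 corresponds to C and bin 9 to A.
    """
    chroma = [0.0] * 12
    for freq, mag in peaks:
        if freq <= 0:
            continue  # skip DC or invalid entries
        midi = round(69 + 12 * math.log2(freq / a4_hz))  # nearest MIDI note number
        chroma[midi % 12] += mag
    total = sum(chroma)
    return [c / total for c in chroma] if total > 0 else chroma


# Two A notes an octave apart plus a stronger middle C.
chroma = chroma_from_peaks([(440.0, 1.0), (880.0, 1.0), (261.63, 2.0)])
```

Because octaves collapse into the same bin, the vector describes which notes are present and how strongly, independent of register.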


Author(s):  
Samuel Kim ◽  
Panayiotis Georgiou ◽  
Shrikanth Narayanan

We propose the notion of latent acoustic topics to capture contextual information embedded within a collection of audio signals. The central idea is to learn a probability distribution over a set of latent topics of a given audio clip in an unsupervised manner, assuming that there exist latent acoustic topics and that each audio clip can be described in terms of those latent acoustic topics. In this regard, we use latent Dirichlet allocation (LDA) to implement the acoustic topic models over elemental acoustic units, referred to as acoustic words, and perform text-like audio signal processing. Experiments on audio tag classification with the BBC sound effects library demonstrate the usefulness of the proposed latent audio context modeling schemes. In particular, the proposed method is shown to be superior to other latent structure analysis methods, such as latent semantic analysis and probabilistic latent semantic analysis. We also demonstrate that topic models can be used as complementary features to content-based features and offer about 9% relative improvement in audio classification when combined with the traditional Gaussian mixture model (GMM)–Support Vector Machine (SVM) technique.
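
One common way to obtain word-like units from audio is to quantize each frame to its nearest entry in a codebook and count the occurrences per clip, giving a "document" that a topic model such as LDA can consume. The sketch below assumes that construction; the abstract does not say how the acoustic words are derived, so the vector quantization, the codebook, and the names are all hypothetical.

```python
import math
from collections import Counter


def acoustic_word_counts(frames, codebook):
    """Map each frame to its nearest codeword index and count occurrences per index."""
    words = [
        min(range(len(codebook)), key=lambda i: math.dist(frame, codebook[i]))
        for frame in frames
    ]
    counts = Counter(words)
    return [counts.get(i, 0) for i in range(len(codebook))]


# Hypothetical two-word codebook and three feature frames from one clip.
codebook = [[0.0, 0.0], [10.0, 10.0]]
counts = acoustic_word_counts([[1.0, 0.0], [0.0, 1.0], [9.0, 9.0]], codebook)
```

The count vector plays the same role as a word-frequency vector for a text document.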


Author(s):  
Amara Fethi ◽  
Fezari Mohamed

In this paper, we investigate the properties of automatic speaker recognition (ASR) to develop a system for detecting voice pathologies, where a model corresponds not to a speaker but to a group of patients who share the same diagnosis. An essential part of this work is the database (described later): the voice samples (healthy and pathological) are taken from a German database that covers many diseases, of which spasmodic dysphonia is chosen for this study. The problem can be addressed with statistical pattern recognition techniques: we model mel-frequency cepstral coefficients (MFCCs) first with a Gaussian mixture model (GMM), which is widely used in ASR, and then with a support vector machine (SVM). The results are compared to determine the better-performing classifier. The performance of each method is evaluated in terms of accuracy, sensitivity, and specificity. The best performance is obtained with 12 MFCC coefficients, energy, and second derivatives, using an SVM with a polynomial kernel function; the classification rate is 90% for the normal class and 93% for the pathological class. This work was developed in MATLAB.
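
The three evaluation measures named above follow directly from the confusion counts of a binary classifier. The sketch below uses Python rather than the MATLAB environment mentioned in the abstract, treats "pathological" as the positive class, and uses made-up labels; all of these are assumptions for illustration.

```python
def binary_metrics(y_true, y_pred, positive="pathological"):
    """Accuracy, sensitivity, and specificity from true and predicted labels."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,  # true positive rate
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,  # true negative rate
    }


# Hypothetical labels: one pathological sample is missed, all normals are correct.
y_true = ["pathological"] * 4 + ["normal"] * 4
y_pred = ["pathological"] * 3 + ["normal"] + ["normal"] * 4
metrics = binary_metrics(y_true, y_pred)
```

Sensitivity and specificity correspond to the per-class rates for the pathological and normal classes, respectively.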

