Feature Extraction of Speech Signal by Genetic Algorithms-Simulated Annealing and Comparison with Linear Predictive Coding Based Methods

Author(s):  
Melih İnal

Electrocardiogram (ECG) examination via computer techniques that involve pre-processing, feature extraction, and post-processing has been widely adopted because of its significant advantages. Extracting the standard features of the ECG signal, which requires a high level of processing, has been the main focus of many studies. In this paper, up to six different ECG signal classes are accurately predicted without standard ECG feature extraction. The cornerstone of the proposed technique is linear predictive coding (LPC), which regresses and normalizes the signal during the pre-processing phase. Prior to feature extraction using wavelet energy (WE), a discrete wavelet transform (DWT) converts the ECG signal to the frequency domain. The dataset was divided into two parts, one for training and the other for testing, and classified with a support vector machine (SVM). Moreover, using the MIT AI2 Companion developed by the MIT Center for Mobile Learning, the classification result is sent to the patient's mobile phone, which can call an ambulance and send the location in case of a serious emergency. Finally, confusion matrix values are used to measure the classification performance. For six ECG classes, an accuracy of about 98.15% was recorded; the accuracy reaches 100% for three ECG signal classes and decreases to 97.95% when the number of classes is increased to seven.
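As a rough illustration of the pipeline described above (not the authors' code), the sketch below decomposes pre-processed ECG beats with a discrete wavelet transform, uses per-sub-band wavelet energy as features, and trains an SVM. The random placeholder data, the db4 wavelet, and the decomposition level are assumptions made only for the example.

```python
# Minimal sketch of a DWT -> wavelet-energy -> SVM pipeline (illustrative only).
import numpy as np
import pywt                               # PyWavelets: discrete wavelet transform
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

def wavelet_energy_features(beat, wavelet="db4", level=4):
    """Decompose one ECG beat and return the energy of each sub-band."""
    coeffs = pywt.wavedec(beat, wavelet, level=level)
    return np.array([np.sum(c ** 2) for c in coeffs])

# Placeholder data stands in for a real annotated ECG set:
# X_raw holds (n_beats, n_samples) pre-processed beats, y holds class labels.
rng = np.random.default_rng(0)
X_raw = rng.standard_normal((120, 360))
y = rng.integers(0, 6, size=120)

X = np.vstack([wavelet_energy_features(beat) for beat in X_raw])
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

clf = SVC(kernel="rbf").fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))
```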


Organon ◽  
2011 ◽  
Vol 26 (51) ◽  
Author(s):  
Ana Cristina Cunha

This study aimed at investigating how Brazilian learners of English organize their knowledge about lexical stress of a specific word category at an early stage of L2 acquisition with the help of an unsupervised neural network, a self-organizing map (SOM), also called Kohonen network. The basic hypothesis tested was whether the parameterization of the speech signal from learners' utterances through processing techniques such as Linear Predictive Coding (LPC), which constituted the input of the network, would be effective in the classification of learners and their utterances. The study consisted of an empirical part and a computational one. The participants were beginner students aged between 18 and 25. Preliminary results indicate that the combination of LPC+SOM allowed the creation of well-defined category clusters, which is an important step in data classification to aid language proficiency level determination and computer-assisted pronunciation teaching.
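To make the LPC+SOM combination concrete, the following sketch (not the study's implementation) extracts one LPC vector per utterance with librosa and trains a small hand-rolled Kohonen map on those vectors. The sampling rate, LPC order, map size, training schedule, and the wav_paths list are all assumptions.

```python
# Illustrative LPC feature extraction followed by a tiny self-organizing map.
import numpy as np
import librosa                         # librosa.lpc estimates LPC coefficients

def lpc_features(path, order=12):
    """Load an utterance and return its LPC coefficients (dropping the leading 1)."""
    y, sr = librosa.load(path, sr=16000)
    return librosa.lpc(y, order=order)[1:]

def train_som(data, rows=6, cols=6, epochs=200, lr0=0.5, sigma0=2.0, seed=0):
    """Very small Kohonen map: returns a (rows, cols, dim) grid of prototype vectors."""
    rng = np.random.default_rng(seed)
    dim = data.shape[1]
    w = rng.standard_normal((rows, cols, dim))
    grid = np.dstack(np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"))
    for t in range(epochs):
        lr = lr0 * (1 - t / epochs)                     # decaying learning rate
        sigma = sigma0 * (1 - t / epochs) + 1e-3        # shrinking neighbourhood
        for x in data[rng.permutation(len(data))]:
            # best-matching unit and Gaussian neighbourhood update
            bmu = np.unravel_index(np.argmin(((w - x) ** 2).sum(-1)), (rows, cols))
            h = np.exp(-((grid - np.array(bmu)) ** 2).sum(-1) / (2 * sigma ** 2))
            w += lr * h[..., None] * (x - w)
    return w

# utterances = np.vstack([lpc_features(p) for p in wav_paths])   # wav_paths assumed
# som = train_som(utterances)   # clusters of similar stress patterns emerge on the map
```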


2020 ◽  
Vol 7 (6) ◽  
pp. 1177
Author(s):  
Siti Helmiyah ◽  
Imam Riadi ◽  
Rusydi Umar ◽  
Abdullah Hanif ◽  
Anton Yudhana ◽  
...  

Speech is a signal of high complexity that carries various kinds of information. The information captured from speech can include the message addressed to the interlocutor, the speaker, the language, and even the speaker's emotions, without the speaker being aware of it. Speech processing is a branch of digital signal processing aimed at realizing natural interaction between humans and machines. Emotional characteristics are features contained in speech that carry the traits of the speaker's emotions. Linear Predictive Coding (LPC) is a method for extracting features in signal processing. This research uses LPC for feature extraction and the Euclidean distance method to identify emotions based on the features obtained from LPC. The study uses data for the emotions anger, sadness, happiness, neutrality, and boredom, taken from the Berlin Emo-DB, using three different sentences and different actors. The research obtained an accuracy of 58.33% for sad emotions, 50% for neutral emotions, 41.67% for angry emotions, and 8.33% for happy emotions, while bored emotions could not be recognized. Using LPC for feature extraction gave unfavorable results in this study, as the average accuracy was only 31.67% across all emotions. Voice data recorded with different sentences, actors, ages, and accents can influence emotion recognition, so feature extraction is crucial in recognizing the speech patterns of human emotions. The accuracy achieved in this study is still very low and could be improved by using other feature extraction methods such as prosodic, spectral, and voice-quality features, and parameters such as max, min, mean, median, kurtosis, and skewness. The choice of classification method can also affect the results of emotion recognition.
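A minimal sketch of the described approach, assuming lists of training files per emotion: one LPC vector per utterance and nearest-template matching by Euclidean distance. The LPC order, sampling rate, and the template-averaging step are illustrative choices, not taken from the paper.

```python
# Illustrative LPC + Euclidean-distance emotion identification (file lists assumed).
import numpy as np
import librosa

def lpc_vector(path, order=12, sr=16000):
    """One fixed-length LPC feature vector per utterance."""
    y, _ = librosa.load(path, sr=sr)
    return librosa.lpc(y, order=order)[1:]          # drop the leading 1.0

def build_templates(files_by_emotion):
    """Average the LPC vectors of each emotion's training files into one template."""
    return {emo: np.mean([lpc_vector(f) for f in files], axis=0)
            for emo, files in files_by_emotion.items()}

def identify(path, templates):
    """Return the emotion whose template is closest in Euclidean distance."""
    v = lpc_vector(path)
    return min(templates, key=lambda emo: np.linalg.norm(v - templates[emo]))

# templates = build_templates({"angry": angry_files, "sad": sad_files, ...})
# print(identify("test.wav", templates))
```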


Author(s):  
Nsiri Benayad ◽  
Zayrit Soumaya ◽  
Belhoussine Drissi Taoufiq ◽  
Ammoumou Abdelkrim

Among the several approaches to detecting Parkinson's disease, one is based on the speech signal, which is a symptom of the disease. This paper focuses on signal analysis using a dataset of voice recordings in which the patients were asked to utter the vowels "a", "o", and "u". Discrete wavelet transforms (DWT) are applied to the speech signal to obtain the variable resolution that may hide the most important information about the patients. From the approximation a3 obtained with a Daubechies wavelet at scale 2, level 3, 21 features are extracted: linear predictive coding (LPC) coefficients, energy, zero-crossing rate (ZCR), mel-frequency cepstral coefficients (MFCC), and wavelet Shannon entropy. For classification, the k-nearest neighbour (KNN) algorithm is used. KNN is a type of instance-based learning that makes decisions from locally approximated functions and is also used in ensemble learning. However, during the learning process the choice of training features can have a significant impact on the overall result; this is where the genetic algorithm (GA) comes in, selecting the training features that give the most accurate classification.
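The sketch below illustrates a pipeline of this shape, under assumed file lists and labels: a Daubechies approximation at level 3, then LPC, energy, ZCR, MFCC, and Shannon-entropy features fed to a KNN classifier. The GA feature-selection step is only indicated by a placeholder mask, and treating the approximation coefficients as a time signal at the original sampling rate is a simplification.

```python
# Illustrative DWT-approximation feature pipeline with KNN (GA selection not shown).
import numpy as np
import pywt
import librosa
from sklearn.neighbors import KNeighborsClassifier

def features(path, sr=16000, lpc_order=12, n_mfcc=5):
    y, _ = librosa.load(path, sr=sr)
    a3 = pywt.wavedec(y, "db2", level=3)[0]          # level-3 approximation coefficients
    lpc = librosa.lpc(a3, order=lpc_order)[1:]
    energy = np.sum(a3 ** 2)
    zcr = np.mean(librosa.feature.zero_crossing_rate(a3))
    # Simplification: a3 is treated as a signal at the original sr for the MFCCs.
    mfcc = np.mean(librosa.feature.mfcc(y=a3, sr=sr, n_mfcc=n_mfcc), axis=1)
    p = a3 ** 2 / np.sum(a3 ** 2)
    entropy = -np.sum(p * np.log2(p + 1e-12))        # wavelet Shannon entropy
    return np.hstack([lpc, energy, zcr, mfcc, entropy])

# X = np.vstack([features(p) for p in wav_paths]); y = labels      # both assumed
# mask = genetic_feature_selection(X, y)      # hypothetical GA step, not implemented here
# knn = KNeighborsClassifier(n_neighbors=3).fit(X[:, mask], y)
```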


2021 ◽  
Vol 39 (1B) ◽  
pp. 30-40
Author(s):  
Ahmed M. Ahmed ◽  
Aliaa K. Hassan

Speaker recognition is the process of recognizing a person by his or her voice through specific features extracted from the voice signal. Automatic speaker recognition (ASP) is a biometric authentication system. In the last decade, many advances have been made in the speaker recognition field, along with many techniques for the feature extraction and modeling phases. In this paper, we present an overview of the most recent work in ASP technology. The study discusses several ASP modeling techniques, such as the Gaussian mixture model (GMM), vector quantization (VQ), and clustering algorithms. Several feature extraction techniques, such as linear predictive coding (LPC) and mel-frequency cepstral coefficients (MFCC), are also examined. Finally, as a result of this study, we find that MFCC and GMM can be considered the most successful techniques in the field of speaker recognition so far.
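As an illustration of the MFCC+GMM combination the survey singles out (not code from any of the surveyed works), the sketch below fits one Gaussian mixture model per enrolled speaker on MFCC frames and picks the speaker with the highest average log-likelihood at test time; the speaker file lists, number of mixture components, and MFCC settings are assumptions.

```python
# Illustrative MFCC + per-speaker GMM recognition (enrollment file lists assumed).
import numpy as np
import librosa
from sklearn.mixture import GaussianMixture

def mfcc_frames(path, sr=16000, n_mfcc=13):
    y, _ = librosa.load(path, sr=sr)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T    # (frames, n_mfcc)

def enroll(files_by_speaker, n_components=16):
    """Fit one diagonal-covariance GMM on each speaker's training frames."""
    return {spk: GaussianMixture(n_components=n_components, covariance_type="diag",
                                 random_state=0)
                 .fit(np.vstack([mfcc_frames(f) for f in files]))
            for spk, files in files_by_speaker.items()}

def recognize(path, models):
    """Pick the speaker whose GMM gives the highest average log-likelihood."""
    frames = mfcc_frames(path)
    return max(models, key=lambda spk: models[spk].score(frames))

# models = enroll({"speaker_a": speaker_a_wavs, "speaker_b": speaker_b_wavs})
# print(recognize("unknown.wav", models))
```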


2018 ◽  
Vol 7 (3) ◽  
pp. 1531
Author(s):  
Mandeep Singh ◽  
Gurpreet Singh

This paper presents a technique for isolated word recognition from a speech signal using spectrum analysis and linear predictive coding (LPC). In the present study, only those words have been analyzed which are commonly used during telephonic conversations by criminals. Since each word is characterized by a unique frequency-spectrum signature, the spectrum analysis of a speech signal has been done using certain statistical parameters. These parameters help in recognizing a particular word from a speech signal, as each word has a unique feature value that distinguishes it from the others. The second method used is based on LPC coefficients; analysis of the features extracted from the LPC coefficients helps in identifying a specific word from the input speech signal. Finally, a hybrid technique is proposed that combines the best features of these two methods. An accuracy of 94% has been achieved for a sample size of 400 speech words.
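A hedged sketch of a comparable hybrid feature vector (the paper's exact statistical parameters are not specified here): a few spectral statistics plus LPC coefficients per isolated word, matched with a nearest-neighbour classifier. File lists, sampling rate, and LPC order are assumptions.

```python
# Illustrative spectral-statistics + LPC hybrid features for isolated words.
import numpy as np
import librosa
from sklearn.neighbors import KNeighborsClassifier

def hybrid_features(path, sr=8000, lpc_order=10):
    y, _ = librosa.load(path, sr=sr)
    spec = np.abs(np.fft.rfft(y))
    freqs = np.fft.rfftfreq(len(y), d=1 / sr)
    centroid = np.sum(freqs * spec) / np.sum(spec)               # spectral centroid
    spread = np.sqrt(np.sum(((freqs - centroid) ** 2) * spec) / np.sum(spec))
    peak = freqs[np.argmax(spec)]                                # dominant frequency
    lpc = librosa.lpc(y, order=lpc_order)[1:]
    return np.hstack([centroid, spread, peak, lpc])

# X = np.vstack([hybrid_features(p) for p in word_wavs]); y = word_labels   # assumed
# clf = KNeighborsClassifier(n_neighbors=1).fit(X, y)
```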


2020 ◽  
Vol 9 (1) ◽  
pp. 2431-2435

ASR is the use of software- and hardware-based techniques to identify and process the human voice. In this research, Tamil words are analyzed and segmented into syllables, followed by feature extraction and recognition. Syllables are segmented using short-term energy (STE), and segmentation is performed to minimize the corpus size. The syllable segmentation algorithm works by computing the STE function of the continuous speech signal. The proposed approach to speech recognition uses a combination of mel-frequency cepstral coefficients (MFCC) and linear predictive coding (LPC). MFCC features are used to extract a feature vector containing all the information about the linguistic message. LPC provides a robust, dependable, and accurate technique for estimating the parameters that characterize the vocal tract system, and LPC features can reduce the bit rate of speech (i.e., reduce the size of the transmitted signal). The combined feature extraction technique minimizes the size of the transmitted signal. The proposed feature extraction algorithm is then evaluated on the speech corpus using the random forest approach. Random forest is an effective algorithm that can build a reliable training model with a short training time, because each classifier works on a subset of the features alone.
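The sketch below, under an assumed corpus, follows the described chain: short-term-energy syllable segmentation, combined MFCC+LPC features per segment, and a random forest classifier. The frame sizes, energy threshold, and feature dimensions are illustrative choices rather than the paper's settings.

```python
# Illustrative STE segmentation + MFCC/LPC features + random forest (corpus assumed).
import numpy as np
import librosa
from sklearn.ensemble import RandomForestClassifier

def ste_segments(y, sr, frame_ms=25, hop_ms=10, thresh_ratio=0.1):
    """Return (start, end) sample indices of regions whose short-term energy
    exceeds a fraction of the maximum frame energy (a crude syllable cut)."""
    frame = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    ste = np.array([np.sum(y[i:i + frame] ** 2)
                    for i in range(0, len(y) - frame, hop)])
    active = ste > thresh_ratio * ste.max()
    segments, start = [], None
    for k, a in enumerate(active):
        if a and start is None:
            start = k * hop
        elif not a and start is not None:
            segments.append((start, k * hop + frame))
            start = None
    if start is not None:
        segments.append((start, len(y)))
    return segments

def mfcc_lpc_features(seg, sr, n_mfcc=13, lpc_order=12):
    """Combined MFCC (frame-averaged) and LPC feature vector for one syllable."""
    mfcc = np.mean(librosa.feature.mfcc(y=seg, sr=sr, n_mfcc=n_mfcc), axis=1)
    lpc = librosa.lpc(seg, order=lpc_order)[1:]
    return np.hstack([mfcc, lpc])

# y, sr = librosa.load("utterance.wav", sr=16000)                 # assumed file
# X = np.vstack([mfcc_lpc_features(y[s:e], sr) for s, e in ste_segments(y, sr)])
# clf = RandomForestClassifier(n_estimators=100).fit(X_train, y_train)  # labels assumed
```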

