Analysis of Spectral Features for Speaker Clustering

In this paper, spectral features of speech audio signals, namely spectral roll-off, spectral centroid, RMS (root mean square) energy, zero-crossing rate, spectral irregularity, and brightness, are extracted and analyzed. Prominent features are selected on the basis of this analysis and then used for speaker identification. For the feature analysis, a database of seven speakers was created. Using the selected features, the speakers are divided into two groups, or clusters.
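Three of the features named above can be sketched for a single frame as follows. This is a minimal illustration, not the authors' implementation: the 85% roll-off fraction, the 1500 Hz brightness cutoff, and all function names are assumptions chosen for the example, and a naive DFT is used so that only the Python standard library is needed.

```python
import cmath
import math

def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2 (adequate for short frames)."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        mags.append(abs(acc))
    return mags

def spectral_centroid(mags, sample_rate, n):
    """Magnitude-weighted mean frequency in Hz."""
    total = sum(mags)
    if total == 0:
        return 0.0
    return sum(k * sample_rate / n * m for k, m in enumerate(mags)) / total

def spectral_rolloff(mags, sample_rate, n, fraction=0.85):
    """Lowest bin frequency at which `fraction` of the spectral energy is reached.
    The 0.85 default is an assumed, commonly used value."""
    energy = [m * m for m in mags]
    target = fraction * sum(energy)
    running = 0.0
    for k, e in enumerate(energy):
        running += e
        if running >= target:
            return k * sample_rate / n
    return (len(mags) - 1) * sample_rate / n

def brightness(mags, sample_rate, n, cutoff_hz=1500.0):
    """Share of spectral energy at or above `cutoff_hz` (assumed cutoff)."""
    energy = [m * m for m in mags]
    total = sum(energy)
    if total == 0:
        return 0.0
    high = sum(e for k, e in enumerate(energy) if k * sample_rate / n >= cutoff_hz)
    return high / total

# Usage: a 1000 Hz sine sampled at 8000 Hz, 64-sample frame.
sample_rate, n = 8000, 64
frame = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(n)]
mags = magnitude_spectrum(frame)
print(spectral_centroid(mags, sample_rate, n))
print(spectral_rolloff(mags, sample_rate, n))
print(brightness(mags, sample_rate, n))
```

For a pure tone that falls exactly on a DFT bin, the centroid and roll-off both sit at the tone frequency and the brightness above the cutoff is close to zero, which makes the sine a convenient sanity check.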

2017 ◽  
Vol 3 (2) ◽  
pp. 94
Author(s):  
Prisca Pakan ◽  
Rocky Yefrenes Dillak

This study aims to develop a method for classifying music genres from audio files in WAV format using the Ridge Polynomial Neural Network (RPNN) algorithm. Classifying an audio file into a group or class requires characteristics, or features, of that file. The feature extraction methods used to obtain these features are Spectral Centroid (SC), Short Time Energy (STE), and Zero Crossing Rate (ZCR), which are derived in the time domain, one of the components of audio data. The results show that the proposed approach can classify music genres from WAV audio files with an accuracy of 90%.
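The two time-domain features mentioned in this abstract, Short Time Energy and Zero Crossing Rate, can be sketched frame by frame as below. This is an illustrative sketch only: the frame length, hop size, function names, and the choice of mean (rather than summed) energy are assumptions, not details taken from the study.

```python
def split_frames(signal, frame_len, hop):
    """Cut a signal into fixed-length frames; a trailing partial frame is dropped."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]

def short_time_energy(frame):
    """Mean squared amplitude of one frame (some texts use the plain sum)."""
    return sum(x * x for x in frame) / len(frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

# Usage with a toy signal (hypothetical data).
signal = [1.0, -1.0, 1.0, -1.0, 0.5, 0.5, 0.5, 0.5]
for frame in split_frames(signal, frame_len=4, hop=4):
    print(short_time_energy(frame), zero_crossing_rate(frame))
```

The per-frame values would then be summarized (for example by their mean) to form the feature vector passed to a classifier.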


2012 ◽  
Vol 4 (1) ◽  
Author(s):  
David David

Abstract. Voice recognition technology is currently growing, especially in speech processing. Speech processing is a way to extract the desired information from a voice signal. This study discusses a system for classifying human voices as male or female. Extracting features from each frame of the voice signal in the time domain and the frequency domain helps simplify and speed up the calculations. Features for voice or other audio include Short Time Energy, Zero Crossing Rate, Spectral Centroid, and others. Test results show that classifying human voices with a backpropagation neural network, using the Levenberg-Marquardt algorithm to update the weight matrix, works very well and is fast because the computational complexity is not too high. The voice sample database contains 40 voices, with 5 voices used as test data. The output of the system is a classification result with a similarity value >= 0.5 for male and < 0.5 for female. Testing with the artificial neural network produced an average success rate of 91% in voice classification.
Keywords: Feature Extraction, Classification, Backpropagation, Levenberg-Marquardt Algorithm, Human Voice
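The decision rule stated in the abstract (a similarity value of at least 0.5 means male, otherwise female) can be written as a one-line threshold. The sketch below assumes the network emits a single score in [0, 1]; the function name and the sample scores are hypothetical.

```python
def classify_voice(score, threshold=0.5):
    """Map a network output in [0, 1] to a label using the abstract's rule."""
    return "male" if score >= threshold else "female"

# Usage with made-up network outputs.
scores = [0.91, 0.50, 0.12]
print([classify_voice(s) for s in scores])
```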


Author(s):  
Vaishali Nandedkar

A content-based audio retrieval system helps users find target audio materials. Audio signals are classified into speech, music, several types of environmental sounds, and silence based on audio content analysis. The extracted audio features include temporal curves of the average zero-crossing rate, the spectral centroid, the spectral flux, and the spectral roll-off. In this dissertation, these four features are used to retrieve audio from the database; using multiple features increases the accuracy of retrieval from the audio database.
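Of the four features listed, spectral flux is the one that depends on two frames at once, so a short sketch may help. The version below uses one common definition, the sum of squared differences between the magnitude spectra of consecutive frames; the dissertation's exact definition is not given here, and the function names are assumptions.

```python
import cmath
import math

def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2 (adequate for short frames)."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame)))
            for k in range(n // 2 + 1)]

def spectral_flux(prev_mags, cur_mags):
    """Sum of squared bin-wise differences between two magnitude spectra."""
    return sum((c - p) ** 2 for p, c in zip(prev_mags, cur_mags))

def flux_curve(frame_list):
    """Temporal curve of flux values, one per pair of consecutive frames."""
    spectra = [magnitude_spectrum(f) for f in frame_list]
    return [spectral_flux(a, b) for a, b in zip(spectra, spectra[1:])]

# Usage: two identical frames followed by a spectrally different one.
slow = [0.0, 1.0, 0.0, -1.0] * 4
fast = [1.0, -1.0] * 8
print(flux_curve([slow, slow, fast]))
```

The curve stays at zero while the spectrum is unchanged and jumps when the spectral content changes, which is why flux is often used to separate steady sounds from rapidly varying ones.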


2021 ◽  
Vol 39 (1B) ◽  
pp. 1-10
Author(s):  
Iman H. Hadi ◽  
Alia K. Abdul-Hassan

Speaker recognition depends on specific predefined steps, the most important of which are feature extraction and feature matching. In addition, the category of the speaker's voice features affects the recognition process. The proposed speaker recognition method uses biometric (voice) attributes to recognize the identity of the speaker. Long-term features such as maximum frequency, pitch, and zero crossing rate (ZCR) were used. In the feature matching step, the fuzzy inner product between feature vectors was used to compute the matching value between a claimed speaker's voice utterance and test voice utterances. The experiments were carried out on the ELSDSR data set and showed a recognition accuracy of 100% for text-dependent speaker recognition.
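The abstract does not define the fuzzy inner product it uses. As a hedged illustration only, the sketch below uses the max-min form common in fuzzy set theory (the maximum over components of the pairwise minimum), and assumes the feature vectors have already been normalized to [0, 1]. The function name and the sample vectors are hypothetical, and the paper's actual formula may differ.

```python
def fuzzy_inner_product(a, b):
    """Max-min inner product of two membership vectors with values in [0, 1].
    This definition is an assumption; the source does not state its formula."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return max(min(x, y) for x, y in zip(a, b))

# Usage with made-up, normalized (max frequency, pitch, ZCR) vectors.
claimed = [0.2, 0.9, 0.4]
test_utterance = [0.8, 0.3, 0.5]
print(fuzzy_inner_product(claimed, test_utterance))
```

A higher value indicates that at least one feature has high membership in both vectors; a matching stage would compare this value across enrolled speakers or against a threshold.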


Circulation ◽  
2021 ◽  
Vol 143 (Suppl_1) ◽  
Author(s):  
Meghana Gadgil ◽  
Alexis F Wood ◽  
Ibrahim Karaman ◽  
Goncalo Gomes Da Graca ◽  
Ioanna Tzoulaki ◽  
...  

Introduction: Poor dietary quality is a well-known risk factor for diabetes and cardiovascular disease (CVD); however, metabolites marking adherence to U.S. dietary guidelines are unknown. Our goal was to determine a pattern of metabolites associated with the Healthy Eating Index-2015 (HEI-2015). We hypothesize that there will be metabolites positively and negatively associated with the HEI-2015 score, including those previously linked to diabetes and CVD. Methods: Sample: 2269 adult men and women from the Multi-Ethnic Study of Atherosclerosis (MESA) longitudinal cohort study without known cardiovascular disease or diabetes. Data/specimens: Fasting serum specimens, and diet and demographic questionnaires, at baseline. Metabolomics: Untargeted 1H NMR CPMG spectroscopy (600 MHz) annotated by internal and external reference data sets. Statistical analysis: Metabolome-wide association study (MWAS) using linear regression models specifying each spectral feature as the outcome in separate models, HEI-2015 score as the predictor, and adjustment for age, sex, race, and study site, accounting for multiple comparisons. Elastic net regularized regression was used to select an optimal subset of features associated with HEI-2015 score. Separately, hierarchical clustering defined discrete groups of correlated NMR features, which were also tested for association with HEI-2015 score. Results: MWAS identified 1914 spectral features significantly associated with the HEI-2015 diet score. After elastic net regression, 35 metabolomic spectral features remained associated with HEI-2015 diet score. Cluster analysis identified seven clusters, three of which were significantly associated with HEI-2015 score after Bonferroni correction. (Table) Conclusions: Cholesterol moieties, proline betaine, proline/glutamate, and fatty acyl chains were significantly associated with higher diet quality in the MESA cohort. Further analysis may clarify the link between dietary quality, metabolites, and pathogenesis of diabetes and CVD.
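The Bonferroni step applied to the seven clusters can be sketched as follows: with m tests and a family-wise level alpha, each test is compared against alpha / m. The p-values below are invented purely for illustration and are not results from the study; the 0.05 level and the function name are also assumptions.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Flag each test whose p-value falls below alpha divided by the number of tests."""
    threshold = alpha / len(p_values)
    return [p < threshold for p in p_values]

# Usage: seven hypothetical cluster-level p-values (not from the abstract).
cluster_p = [0.001, 0.004, 0.006, 0.02, 0.03, 0.2, 0.5]
print(bonferroni_significant(cluster_p))
```

With seven tests, the per-test threshold is 0.05 / 7, roughly 0.0071, so p-values such as 0.02 that would pass an uncorrected 0.05 cutoff are no longer flagged.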


2021 ◽  
Author(s):  
Talieh Seyed Tabtabae

Automatic Emotion Recognition (AER) is an emerging research area in the Human-Computer Interaction (HCI) field. As computers become more popular every day, the study of interaction between humans (users) and computers is attracting more attention. For a more natural and friendly interface between humans and computers, it would be beneficial to give computers the ability to recognize situations the way a human does. Equipped with an emotion recognition system, computers will be able to recognize their users' emotional state and react appropriately. In today's HCI systems, machines can recognize the speaker and the content of the speech using speaker identification and speech recognition techniques. If machines are also equipped with emotion recognition techniques, they can know "how it is said", react more appropriately, and make the interaction more natural. One of the most important human communication channels is the auditory channel, which carries speech and vocal intonation. In fact, people can perceive each other's emotional state from the way they talk. Therefore, in this work speech signals are analyzed in order to set up an automatic system that recognizes the human emotional state. Six discrete emotional states are considered and categorized in this research: anger, happiness, fear, surprise, sadness, and disgust. A set of novel spectral features is proposed in this contribution. Two approaches are applied and the results are compared. In the first approach, all the acoustic features are extracted from consecutive frames along the speech signals, and statistical values of the features constitute the feature vectors. A Support Vector Machine (SVM), a relatively new approach in the field of machine learning, is used to classify the emotional states. In the second approach, spectral features are extracted from non-overlapping, logarithmically spaced frequency sub-bands. To make use of all the extracted information, sequence discriminant SVMs are adopted. The empirical results show that the employed techniques are very promising.
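The idea of non-overlapping, logarithmically spaced frequency sub-bands can be sketched as below: band edges grow by a constant ratio, and each spectral bin contributes its energy to exactly one band. The band limits, the number of bands, the function names, and the toy spectrum are assumptions for illustration; the thesis's actual band layout and features are not given in the abstract.

```python
def log_band_edges(f_low, f_high, n_bands):
    """Band edges in Hz spaced by a constant ratio between f_low and f_high."""
    ratio = (f_high / f_low) ** (1.0 / n_bands)
    return [f_low * ratio ** i for i in range(n_bands + 1)]

def band_energies(mags, sample_rate, n_fft, edges):
    """Sum squared magnitudes of the bins falling in each half-open band [lo, hi)."""
    energies = [0.0] * (len(edges) - 1)
    for k, m in enumerate(mags):
        freq = k * sample_rate / n_fft
        for b in range(len(energies)):
            if edges[b] <= freq < edges[b + 1]:
                energies[b] += m * m
                break
    return energies

# Usage: four bands between 100 Hz and 1600 Hz, with a toy magnitude
# spectrum (bins 50 Hz apart) that has energy at 150, 300 and 600 Hz.
edges = log_band_edges(100.0, 1600.0, 4)
mags = [0.0] * 17
mags[3], mags[6], mags[12] = 1.0, 2.0, 3.0
print(edges)
print(band_energies(mags, sample_rate=1600, n_fft=32, edges=edges))
```

Because the bands are half-open intervals that share edges, no bin is counted twice, which is what makes the sub-bands non-overlapping.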

