mel frequency cepstral coefficient
Recently Published Documents


TOTAL DOCUMENTS

132
(FIVE YEARS 73)

H-INDEX

8
(FIVE YEARS 2)

2022 ◽  
Vol 23 (1) ◽  
pp. 68-81
Author(s):  
Syahroni Hidayat ◽  
Muhammad Tajuddin ◽  
Siti Agrippina Alodia Yusuf ◽  
Jihadil Qudsi ◽  
Nenet Natasudian Jaya

Speaker recognition is the process of identifying a speaker from his or her speech. It can be applied in many areas, such as remote access to personal devices, securing voice-controlled access, and forensic investigation. In speaker recognition, extracting features from the speech signal is the most critical step: the features represent the speech as a unique signature that distinguishes one sample from another. In this research, we propose a combination of Wavelet and Mel Frequency Cepstral Coefficient (MFCC) features, Wavelet-MFCC, for feature extraction, with a Hidden Markov Model (HMM) for classification. The speech signal is first decomposed by a one-level Wavelet transform, and only the detail sub-band coefficients are passed to the MFCC stage. The system was evaluated on 300 speech recordings of 30 speakers uttering "HADIR" in the Indonesian language, using five-fold cross-validation: in each fold, 80% of the data was used for training and the rest for testing. Based on the testing, the Wavelet-MFCC combination achieved an accuracy of 96.67%.
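The first stage of the pipeline above, a one-level wavelet decomposition that keeps only the detail sub-band, can be sketched as follows. This is a minimal illustration using the Haar wavelet (the abstract does not name the wavelet family, so Haar is an assumption), with a synthetic sine wave standing in for a real speech frame:

```python
import numpy as np

def haar_dwt_level1(signal):
    """One level of the Haar discrete wavelet transform.

    Returns (approximation, detail) coefficient arrays; per the paper,
    only the detail sub-band is kept for the MFCC stage.
    """
    x = np.asarray(signal, dtype=float)
    if len(x) % 2:                       # pad to even length
        x = np.append(x, 0.0)
    even, odd = x[0::2], x[1::2]
    approx = (even + odd) / np.sqrt(2)   # low-pass sub-band
    detail = (even - odd) / np.sqrt(2)   # high-pass (detail) sub-band
    return approx, detail

t = np.linspace(0, 1, 1024, endpoint=False)
speech = np.sin(2 * np.pi * 120 * t)     # toy stand-in for a speech frame
approx, detail = haar_dwt_level1(speech)
mfcc_input = detail                      # detail coefficients feed the MFCC stage
```

Each sub-band has half the length of the input, so the MFCC stage operates on a signal of half the original sample count.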


2021 ◽  
Vol 1 (1) ◽  
pp. 453-478
Author(s):  
Heriyanto Heriyanto ◽  
Herlina Jayadianti ◽  
Juwairiah Juwairiah

There are two approaches to Qur'an recitation, namely talaqqi and qira'ati. Both draw on the science of recitation, which contains the rules and procedures for reading the Qur'an properly. In talaqqi the teacher and students sit facing each other, while qira'ati is the recitation of the Qur'an with rhythms and tones. Many studies have developed automatic speech recognition systems for Qur'an recitation to support the learning process, with feature extraction models based on the Mel Frequency Cepstral Coefficient (MFCC) and Linear Predictive Coding (LPC). The MFCC method reaches an accuracy of 50% to 60%, while LPC reaches only 45% to 50%, so the non-linear MFCC method is more accurate than the linear approach. The cepstral coefficient features used range from index 0 to 23 (24 coefficients), and the frames taken range from index 0 to 10 (11 frames). Three hundred recorded voice samples were tested against 200 voice recordings of both male and female voices, sampled at 44.1 kHz, 16-bit stereo. This study aims to obtain good accuracy by selecting the right cepstral coefficient feature from the MFCC extraction, using Dominant Weight Normalization (NBD), at TPA Nurul Huda Plus Purbayan. The results show that the MFCC method with the 23rd cepstral coefficient selected achieves the highest accuracy, 90.2%. It can be concluded that selecting the right feature, the 23rd cepstral coefficient, affects the recognition accuracy for the voice of Qur'an recitation.
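The 24 cepstral coefficients discussed above are obtained by taking a DCT-II of the log filter-bank energies; the study then selects a single coefficient (index 23) as the feature. A minimal numpy sketch of that computation, with randomly generated log energies standing in for a real 26-filter mel filter bank (the filter count is an assumption, not stated in the abstract):

```python
import numpy as np

def cepstral_coefficients(log_energies, n_coeffs=24):
    """DCT-II of log filter-bank energies -> cepstral coefficients c0..c23."""
    M = len(log_energies)
    n = np.arange(M)
    return np.array([
        np.sum(log_energies * np.cos(np.pi * k * (2 * n + 1) / (2 * M)))
        for k in range(n_coeffs)
    ])

# Toy log energies from 26 mel filters (values are illustrative only)
rng = np.random.default_rng(0)
log_e = np.log(rng.uniform(1.0, 10.0, size=26))
ceps = cepstral_coefficients(log_e)   # coefficients c0 .. c23
selected = ceps[23]                   # the single coefficient the study selects
```

Coefficient c0 is simply the sum of the log energies (overall loudness); higher-index coefficients capture progressively finer spectral-envelope detail.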


2021 ◽  
Vol 1 (1) ◽  
pp. 335-354
Author(s):  
Heriyanto Heriyanto ◽  
Dyah Ayu Irawati

This research concerns voice recognition with MFCC feature extraction as the first step to obtain features, followed by feature selection. Feature selection here used the Dominant Weight feature for the shahada voice, with frames and cepstral coefficients as the extracted features. The cepstral coefficients used range from index 0 to 23 (24 coefficients), and the frames taken range from index 0 to 10 (11 frames). Three hundred recorded voice samples were tested against 200 voice recordings of both male and female voices, sampled at 44.1 kHz, 16-bit stereo. This research aimed to gain accuracy by selecting the right frame from the MFCC feature extraction, using Dominant Weight Normalization (NBD). The results show that the MFCC method with the 9th frame selected achieves a higher accuracy, 86%, than the other frames, while MFCC without feature selection averaged 60%. The conclusion is that selecting the right feature, the 9th frame, affects the recognition accuracy for the voice of shahada recitation.
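The frame-selection step described above (eleven frames, of which the 9th is kept) can be sketched minimally as splitting a signal into eleven equal segments and picking one. The non-overlapping, equal-length split is an assumption for illustration; the study's actual framing parameters are not given in the abstract:

```python
import numpy as np

def frame_signal(x, n_frames=11):
    """Split a signal into n_frames equal, non-overlapping frames."""
    hop = len(x) // n_frames
    return np.stack([x[i * hop:(i + 1) * hop] for i in range(n_frames)])

x = np.arange(1100, dtype=float)   # toy signal, 1100 samples
frames = frame_signal(x)           # shape (11, 100): frames 0..10
ninth = frames[9]                  # frame index 9, the best frame in the study
```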


Author(s):  
Neha Kumari

Abstract: Due to the enormous expansion in the accessibility of music data, music genre classification has taken on new significance in recent years. To access large collections efficiently, we need to index them correctly, so automatic music genre classification is essential when working with a large collection of music. For most contemporary music genre classification methodologies, researchers have favoured machine learning techniques. In this study, we employed two datasets with different genres. A deep learning approach, a convolutional neural network (CNN), is used for training and classification. In audio analysis, the most crucial task is feature extraction; the Mel Frequency Cepstral Coefficient (MFCC) is used as the main audio feature extraction technique. By extracting the feature vector, the suggested method classifies music into several genres. Our findings suggest that the system reaches an accuracy of 80%, which should improve substantially with further training and facilitate music genre classification. Keywords: Music Genre Classification, CNN, KNN, Music information retrieval, feature extraction, spectrogram, GTZAN dataset, Indian music genre dataset.
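Feeding MFCCs to a CNN typically requires reshaping the per-frame coefficient vectors into a fixed-size, image-like 2D input. A minimal sketch of that preprocessing step; the target length of 128 frames and 13 coefficients are assumptions for illustration, not values from the study:

```python
import numpy as np

def to_cnn_input(mfcc_frames, target_frames=128):
    """Pad or truncate a (time, n_mfcc) MFCC matrix to a fixed-size 2D input."""
    T, F = mfcc_frames.shape
    if T >= target_frames:
        fixed = mfcc_frames[:target_frames]          # truncate long clips
    else:                                            # zero-pad short clips
        fixed = np.vstack([mfcc_frames, np.zeros((target_frames - T, F))])
    return fixed[np.newaxis, :, :]                   # channel axis: (1, time, n_mfcc)

clip = np.random.default_rng(1).normal(size=(97, 13))  # toy MFCC matrix
x = to_cnn_input(clip)                                 # ready for a conv layer
```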


2021 ◽  
Vol 6 (1) ◽  
pp. 35-40
Author(s):  
Rian Adam Rajagede ◽  
Rochana Prih Hastuti

In the process of verifying Al-Quran memorization, a person is usually asked to recite a verse without looking at the text, generally together with a partner who verifies the reading. This paper proposes a model using a Siamese LSTM network to help users check their Al-Quran memorization alone. The Siamese LSTM network verifies the recitation by matching the input against existing recordings of the recited verse. This study evaluates two Siamese LSTM architectures, the Manhattan LSTM and the Siamese-Classifier. The Manhattan LSTM outputs a single numerical value that represents the similarity, while the Siamese-Classifier uses a binary classification approach. We also compare Mel-Frequency Cepstral Coefficient (MFCC), Mel-Frequency Spectral Coefficient (MFSC), and delta features against model performance. We use the public dataset from the Every Ayah website and provide the usage information for future comparison. Our best model, using MFCC with delta features and the Manhattan LSTM, produces an F1-score of 77.35%.
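The single similarity value produced by a Manhattan LSTM is conventionally exp(-||h1 - h2||_1), where h1 and h2 are the final hidden states of the two LSTM branches. A minimal sketch of that output layer, with small hand-written vectors standing in for real LSTM hidden states:

```python
import numpy as np

def manhattan_similarity(h1, h2):
    """exp(-||h1 - h2||_1): 1.0 for identical embeddings, -> 0 as they diverge."""
    return np.exp(-np.sum(np.abs(h1 - h2)))

a = np.array([0.2, -0.5, 1.0])      # toy hidden state of branch 1
b = np.array([0.2, -0.5, 1.0])      # identical recitation -> similarity 1.0
c = np.array([5.0, 5.0, 5.0])       # very different recitation -> near 0
same = manhattan_similarity(a, b)
diff = manhattan_similarity(a, c)
```

The exponential squashes the unbounded L1 distance into (0, 1], which makes the output directly usable as a match score against a verification threshold.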


2021 ◽  
Vol 7 (1) ◽  
pp. 1-6
Author(s):  
Ahmad rio Adriansyah ◽  
Kurniawan Dwi Prasetyo ◽  
Hamdan Ainul Atmam Al Faruqi

Phonemes are the units that make up all spoken languages; every word and sentence uttered consists of one or more phonemes. To improve the accuracy of acoustic models, we identify the patterns of vowel phonemes in the Indonesian language using STFT and MFCC features. In this study, we analyzed 398 voice files collected from 51 participants and explored the differences in the patterns of the vowel phonemes a, i, u, e, and o. The features were classified and tested using an SVM and an artificial neural network (ANN). The tests yielded an accuracy of 93.8% using an SVM with a radial kernel.
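The radial kernel behind the best-performing SVM above compares two feature vectors through exp(-gamma * ||x - y||^2). A minimal numpy sketch of that kernel function, with toy two-dimensional feature vectors and a gamma of 0.5 chosen purely for illustration:

```python
import numpy as np

def rbf_kernel(x, y, gamma=0.5):
    """Radial-basis-function (RBF) kernel: exp(-gamma * squared distance)."""
    return np.exp(-gamma * np.sum((x - y) ** 2))

f1 = np.array([1.0, 2.0])           # toy MFCC-derived feature vector
f2 = np.array([1.0, 2.0])           # identical vector -> kernel value 1.0
f3 = np.array([2.0, 2.0])           # nearby vector -> value between 0 and 1
k_same = rbf_kernel(f1, f2)
k_near = rbf_kernel(f1, f3)
```

The kernel value decays smoothly with distance, which is what lets the SVM draw the non-linear decision boundaries needed to separate the five vowel classes.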


2021 ◽  
Vol 150 (1) ◽  
pp. 193-201
Author(s):  
Asith Abeysinghe ◽  
Mohammad Fard ◽  
Reza Jazar ◽  
Fabio Zambetta ◽  
John Davy

Information ◽  
2021 ◽  
Vol 12 (7) ◽  
pp. 263
Author(s):  
Tianyun Liu ◽  
Diqun Yan ◽  
Rangding Wang ◽  
Nan Yan ◽  
Gang Chen

The number of channels is one of the important criteria of digital audio quality. Generally, stereo audio with two channels provides better perceptual quality than mono audio. To seek illegal commercial benefit, one might convert mono audio to stereo and pass it off as genuine. Identifying such stereo-faked audio is a little-investigated audio forensic issue. In this paper, a stereo-faking corpus is first presented, created using the Haas effect technique. Two identification algorithms for fake stereo audio are then proposed: one based on Mel-frequency cepstral coefficient features and support vector machines, the other based on a specially designed five-layer convolutional neural network. Experimental results on two datasets with five different cut-off frequencies show that the proposed algorithms can effectively detect stereo-faked audio and are robust.
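The Haas-effect forgery described above duplicates the mono signal into two channels and delays one copy by a few tens of milliseconds, so the ear fuses the two into a single widened source. A minimal sketch of how such a fake stereo file could be constructed; the 15 ms delay is an illustrative assumption, not a value from the paper:

```python
import numpy as np

def haas_fake_stereo(mono, sr=44100, delay_ms=15.0):
    """Fake stereo via the Haas effect: duplicate mono, delay one channel.

    Delays below roughly 40 ms are perceived as one widened source
    rather than an echo, which is what makes the forgery convincing.
    """
    delay = int(sr * delay_ms / 1000.0)
    left = mono
    right = np.concatenate([np.zeros(delay), mono[:len(mono) - delay]])
    return np.stack([left, right])   # shape (2, n_samples)

mono = np.sin(2 * np.pi * 440 * np.arange(44100) / 44100)  # 1 s toy tone
stereo = haas_fake_stereo(mono)
```

The forensic task is the inverse: detecting that the two channels are a delayed copy of one another rather than a genuine two-microphone recording.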

