Identifikasi Emosi Manusia Berdasarkan Ucapan Menggunakan Metode Ekstraksi Ciri LPC dan Metode Euclidean Distance

Siti Helmiyah; Imam Riadi; Rusydi Umar; Abdullah Hanif; Anton Yudhana; Abdul Fadlil

doi:10.25126/jtiik.2020722693

Identifikasi Emosi Manusia Berdasarkan Ucapan Menggunakan Metode Ekstraksi Ciri LPC dan Metode Euclidean Distance

Jurnal Teknologi Informasi dan Ilmu Komputer ◽

10.25126/jtiik.2020722693 ◽

2020 ◽

Vol 7 (6) ◽

pp. 1177

Author(s):

Siti Helmiyah ◽

Imam Riadi ◽

Rusydi Umar ◽

Abdullah Hanif ◽

Anton Yudhana ◽

...

Keyword(s):

Signal Processing ◽

Feature Extraction ◽

Speech Processing ◽

Euclidean Distance ◽

Predictive Coding ◽

Digital Signal ◽

Linear Predictive Coding ◽

Distance Method ◽

Average Accuracy ◽

Voice Data

Speech is a highly complex signal that carries many kinds of information. The information that can be captured from speech includes the message to the interlocutor, the speaker, the language, and even the speaker's emotions, conveyed without the speaker realizing it. Speech processing is a branch of digital signal processing that aims at natural interaction between humans and machines. Emotional characteristics are features in speech that carry traits of the speaker's emotions. Linear Predictive Coding (LPC) is a method for extracting features in signal processing. This study uses LPC for feature extraction and the Euclidean distance method to identify emotions from the features obtained with LPC. The data cover five emotions: anger, sadness, happiness, neutrality, and boredom. The data were taken from Berlin Emo DB, using three different sentences and different actors. The study achieved an accuracy of 58.33% for sadness, 50% for neutral, 41.67% for anger, and 8.33% for happiness, while boredom could not be recognized. Using LPC for feature extraction gave poor results in this study: the average accuracy across all emotions was only 31.67%. Voice data with different sentences, actors, ages, and accents can affect emotion recognition, so feature extraction is very important in recognizing the speech patterns of human emotions. The accuracy in this study is still very low and could be improved by using other features, such as prosodic, spectral, and voice quality features, together with parameters such as max, min, mean, median, kurtosis, and skewness. The choice of classification method can also affect emotion recognition results.
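The Euclidean-distance identification step described in the abstract can be sketched as a nearest-reference classifier. This is a minimal sketch, not the authors' implementation: the function names `euclidean_distance` and `classify`, the three-coefficient reference vectors, and the query vector are all hypothetical. Only the per-emotion accuracies at the end come from the abstract, and they are used to check the reported 31.67% average.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify(features, references):
    """Return the label whose reference vector is closest to `features`."""
    return min(
        references,
        key=lambda label: euclidean_distance(features, references[label]),
    )


# Hypothetical reference vectors (made-up numbers, not taken from the paper).
references = {
    "anger": [1.0, -0.8, 0.3],
    "sadness": [0.2, 0.1, -0.4],
    "neutral": [0.5, -0.2, 0.0],
}

# Hypothetical feature vector extracted from an unknown utterance.
predicted = classify([0.3, 0.0, -0.3], references)

# Per-emotion accuracies reported in the abstract, in percent.
# Boredom could not be recognized, so it counts as 0.
reported = {
    "sadness": 58.33,
    "neutral": 50.0,
    "anger": 41.67,
    "happiness": 8.33,
    "boredom": 0.0,
}
average_accuracy = sum(reported.values()) / len(reported)
```

With these made-up vectors the query is closest to the "sadness" reference, and averaging the five reported accuracies reproduces the 31.67% figure after rounding to two decimals.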

Perancangan Sistem Kontrol Otomatis Lampu Menggunakan Metode Pengenalan Suara Berbasis Arduino

TELKA - Telekomunikasi Elektronika Komputasi dan Kontrol ◽

10.15575/telka.v2n2.106-117 ◽

2016 ◽

Vol 2 (2) ◽

pp. 106-117

Author(s):

Adam Faroqi ◽

Mada Sanjaya WS ◽

Riyan Nugraha

Keyword(s):

Real Time ◽

Speech Processing ◽

Predictive Coding ◽

Linear Predictive Coding ◽

Fuzzy Inference System ◽

Inference System ◽

Neuro Fuzzy

Current technological developments benefit many people's lives. Every aspect of life can make use of technology suited to its needs, including home control. Previous research has shown that voice signals can also be used to interact with computers, making the interaction more natural. Research that uses voice signal data is generally called speech processing. This study aims to build a system that can recognize speech in the form of sentences so that it can later be used in electrical technology. Speech processing involves several steps, such as sampling, extraction, and learning. The extraction step reveals the characteristics of a speech signal. Several feature extraction methods are in common use; this study uses Linear Predictive Coding (LPC). LPC is used because its extraction scheme adopts the human auditory system as a filter for capturing information. Learning and speech recognition are then performed by an Adaptive Neuro-Fuzzy Inference System (ANFIS) because of its ability to perform probability analysis and then produce a response according to the parameters. Speech recognition for sentences begins with recording 20 samples to serve as training data. In the tests, extraction with 4 features gave the lowest accuracy, 60% to 70%, while 5 features gave 60% to 80% and 6 features gave 70% to 80%. Real-time identification tested with 2 people, using 4 features, gave an accuracy of 60% for the first person and 70% for the second. The response-time analysis shows that fewer features speed up the MATLAB response, while more features slow it down.

Speech to Text Processing for Interactive Agent of Virtual Tour Navigation

International Journal of Artificial Intelligence & Robotics (IJAIR) ◽

10.25139/ijair.v1i1.2030 ◽

2019 ◽

Vol 1 (1) ◽

pp. 31

Author(s):

Dian Ahkam Sani ◽

Muchammad Saifulloh

Keyword(s):

Speech Recognition ◽

Text Processing ◽

Predictive Coding ◽

Digital Signal ◽

Recognition System ◽

Human Interaction ◽

Linear Predictive Coding ◽

Voice Input ◽

Human Voice ◽

Backpropagation Method

The development of science and technology is one way to replace the method of human interaction with computers, one of which is to provide voice input. Conversion of sound into text with the Backpropagation method can be realized through feature extraction, including the use of Linear Predictive Coding (LPC). Linear Predictive Coding is one way to represent the signal in order to obtain the features of each sound pattern. In brief, this speech recognition system worked by taking human voice input through a microphone (analog signal), which was then sampled at 8000 Hz so that it became a digital signal with the assistance of the sound card on the computer. The digital signal then entered the initial process using LPC, so that several LPC coefficients were obtained. The LPC outputs were then trained using the Backpropagation learning method. The results of the learning were classified with a word and stored in a database afterwards. The result of the test was a recognition program that is able to display the voice plots. The speech recognition rate for respondents in the database was 80% of the 100 data items tested in real time.
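The abstract does not say how the LPC coefficients were computed. As an illustration only, the sketch below assumes the widely used autocorrelation method with the Levinson-Durbin recursion. The function names `autocorrelation` and `lpc` and the synthetic test frame are hypothetical, and nothing here reproduces the paper's settings apart from the general idea of turning a sampled frame into a few predictor coefficients.

```python
def autocorrelation(frame, max_lag):
    """Autocorrelation values r[0..max_lag] of a frame of samples."""
    n = len(frame)
    return [
        sum(frame[i] * frame[i + lag] for i in range(n - lag))
        for lag in range(max_lag + 1)
    ]


def lpc(frame, order):
    """Predictor coefficients a[1..order] with frame[n] ~ sum_k a[k] * frame[n-k].

    Assumption: autocorrelation method solved by Levinson-Durbin recursion.
    """
    r = autocorrelation(frame, order)
    a = [0.0] * (order + 1)
    error = r[0]
    for i in range(1, order + 1):
        if error <= 0:
            break  # frame is silent or already perfectly predicted
        # Reflection coefficient for this recursion step.
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / error
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        error *= 1.0 - k * k
    return a[1:]


# Synthetic frame obeying x[n] = 0.9 * x[n-1], so the ideal second-order
# predictor has a1 = 0.9 and a2 = 0.
frame = [0.9 ** n for n in range(200)]
coefficients = lpc(frame, order=2)
```

On this synthetic frame the recursion recovers a first coefficient very close to 0.9 and a second very close to 0, which is a quick sanity check before feeding real 8000 Hz speech frames into a classifier.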

VLSI arrays for speech processing with linear predictive coding

Proceedings of the 12th IAPR International Conference on Pattern Recognition (Cat. No.94CH3440-5) ◽

10.1109/icpr.1994.577201 ◽

2002 ◽

Cited By ~ 2

Author(s):

Y.Y. Tang ◽

Tao Li ◽

C.Y. Suen

Keyword(s):

Speech Processing ◽

Predictive Coding ◽

Linear Predictive Coding

Random-Walk Laplacian for Frequency Analysis in Periodic Graphs

Sensors ◽

10.3390/s21041275 ◽

2021 ◽

Vol 21 (4) ◽

pp. 1275

Author(s):

Rachid Boukrab ◽

Alba Pagès-Zamora

Keyword(s):

Signal Processing ◽

Random Walk ◽

Transition Matrix ◽

Euclidean Distance ◽

Shift Operator ◽

Digital Signal ◽

Laplacian Matrix ◽

Zero Padding ◽

Periodic Graph ◽

Basic Graph

This paper presents the benefits of using the random-walk normalized Laplacian matrix as a graph-shift operator and defines the frequencies of a graph by the eigenvalues of this matrix. A criterion to order these frequencies is proposed based on the Euclidean distance between a graph signal and its shifted version with the transition matrix as shift operator. Further, the frequencies of a periodic graph built through the repeated concatenation of a basic graph are studied. We show that when a graph is replicated, the graph frequency domain is interpolated by an upsampling factor equal to the number of replicas of the basic graph, similarly to the effect of zero-padding in digital signal processing.
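The ordering idea, measuring the Euclidean distance between a graph signal and its shifted version under the transition matrix, can be illustrated on a small ring graph. This is a minimal sketch under stated assumptions, not the paper's procedure: the ring graph, the cosine test signals, and all function names are hypothetical choices made for illustration. For a unit-norm eigenvector of the transition matrix P, the distance ||x - Px|| equals the magnitude of the corresponding eigenvalue of the random-walk Laplacian I - P.

```python
import math


def ring_adjacency(n):
    """Adjacency matrix of an undirected cycle with n >= 3 nodes."""
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        adj[i][(i + 1) % n] = 1.0
        adj[i][(i - 1) % n] = 1.0
    return adj


def transition_matrix(adj):
    """Random-walk transition matrix P = D^-1 A (each row sums to 1)."""
    return [[value / sum(row) for value in row] for row in adj]


def shift(matrix, signal):
    """Apply the shift operator: one matrix-vector product."""
    return [sum(m * s for m, s in zip(row, signal)) for row in matrix]


def shift_distance(matrix, signal):
    """Euclidean distance between a signal and its shifted version."""
    shifted = shift(matrix, signal)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(signal, shifted)))


n = 8
P = transition_matrix(ring_adjacency(n))

distances = []
for k in range(n // 2 + 1):
    # Cosine with k cycles around the ring: an eigenvector of P on this graph.
    x = [math.cos(2 * math.pi * k * i / n) for i in range(n)]
    norm = math.sqrt(sum(v * v for v in x))
    x = [v / norm for v in x]
    distances.append(shift_distance(P, x))
```

On the ring, the distances grow with k and match 1 - cos(2πk/n), so smoother signals sit at lower graph frequencies, which is the intuition behind ordering frequencies by this distance.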

Classification of Multi Heart Diseases With Android Based Monitoring System

Iraqi Journal of Computer Communication Control and System Engineering ◽

10.33103/uot.ijccce.20.2.3 ◽

2020 ◽

pp. 14-22

Keyword(s):

Feature Extraction ◽

Heart Diseases ◽

Confusion Matrix ◽

Predictive Coding ◽

Classification Performance ◽

Support Vector ◽

Ecg Signal ◽

Linear Predictive Coding ◽

Electrocardiogram Ecg

Electrocardiogram (ECG) examination via computer techniques that involve feature extraction, pre-processing, and post-processing has been implemented because of its significant advantages. Extracting standard ECG signal features, which requires a high level of processing, has been the main focus of many studies. In this paper, up to 6 different ECG signal classes are accurately predicted in the absence of ECG feature extraction. The cornerstone of the proposed technique is Linear Predictive Coding (LPC), which regresses and normalizes the signal during the pre-processing phase. Prior to feature extraction using Wavelet energy (WE), a discrete Wavelet transform (DWT) is applied to convert the ECG signal to the frequency domain. In addition, the dataset was divided into two parts, one for training and the other for testing, which are classified in the proposed algorithm using a support vector machine (SVM). Moreover, using the MIT AI2 Companion, developed by the MIT Center for Mobile Learning, the classification result is shared with the patient's mobile phone, which can call an ambulance and send the location in case of a serious emergency. Finally, the confusion matrix values are used to measure the performance of the proposed classification. For 6 different ECG classes, an accuracy of about 98.15% was recorded. This figure became 100% for 3 ECG signal classes and decreased to 97.95% when the number of ECG signal classes was increased to 7.
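Measuring classification performance from a confusion matrix usually comes down to the share of samples on the diagonal. The sketch below shows that calculation only. The function name `accuracy_from_confusion` and the 3-class matrix are hypothetical, and the counts are made up, not taken from the paper.

```python
def accuracy_from_confusion(matrix):
    """Overall accuracy: correctly classified samples divided by all samples.

    matrix[i][j] is the number of samples of true class i predicted as class j.
    """
    total = sum(sum(row) for row in matrix)
    if total == 0:
        raise ValueError("confusion matrix contains no samples")
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


# Hypothetical confusion matrix for 3 classes (made-up counts).
confusion = [
    [48, 1, 1],
    [2, 47, 1],
    [0, 3, 47],
]
accuracy = accuracy_from_confusion(confusion)
```

Here 142 of the 150 hypothetical samples lie on the diagonal, so the overall accuracy is 142/150.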

IMPLEMENTATION OF SPEECH RECOGNITION SYSTEM USING DSP PROCESSOR ADSP2181

International Journal of Electronics Signals and Systems ◽

10.47893/ijess.2012.1056 ◽

2012 ◽

pp. 1-6

Author(s):

KALPANA JOSHI ◽

NILIMA KOLHARE ◽

V.M. PANDHARIPANDE

Keyword(s):

Signal Processing ◽

Speech Recognition ◽

Digital Signal Processing ◽

Speech Processing ◽

Speaker Recognition ◽

Digital Signal ◽

Recognition System ◽

Theory And Practice ◽

Clock Frequency ◽

Application Development

While many Automatic Speech Recognition applications employ powerful computers to handle the complex recognition algorithms, there is a clear demand for effective solutions on embedded platforms. Digital Signal Processing (DSP) is one of the most commonly used hardware platforms; it provides good development flexibility and requires a relatively short application development cycle. DSP techniques have been at the heart of progress in speech processing during the last 25 years. At the same time, speech processing has been an important catalyst for the development of DSP theory and practice. Today DSP methods are used in speech analysis, synthesis, coding, recognition, and enhancement, as well as in voice modification, speaker recognition, and language identification. Speech recognition is generally a computationally intensive task and involves many digital signal processing algorithms. In real-time, real-environment speech recognizer applications, it is often necessary to use embedded, resource-limited hardware. Less memory, lower clock frequency, and less space and cost compared with a common PC architecture (x86) must be balanced by more efficient computation.

INDIVIDUAL IDENTIFICATION SYSTEM DESIGN THROUGH VOICE USING LINEAR PREDICTIVE CODING METHOD AND K-NEAREST NEIGHBOR

Jurnal Teknik Informatika (Jutif) ◽

10.20884/1.jutif.2021.2.2.71 ◽

2021 ◽

Vol 2 (2) ◽

pp. 95-100

Author(s):

Davita Nadia Fadhilah ◽

Rita Magdalena ◽

Sofia Sa’idah

Keyword(s):

Speech Recognition ◽

Nearest Neighbor ◽

Predictive Coding ◽

Voice Recognition ◽

Individual Identification ◽

Identification System ◽

K Nearest Neighbor ◽

Linear Predictive Coding ◽

Distance Method ◽

K Value

Humans have a variety of characteristics that differ from one person to another. These characteristics are genuine and can be used to distinguish one individual from another; one of them is the voice. Voice recognition is called speech recognition. In this study, an individual voice recognition system was developed using a combination of Linear Predictive Coding (LPC) for feature extraction and K-Nearest Neighbor (K-NN) classification in the speech recognition process. Testing was done by varying several parameters, namely the LPC order, the number of frames, the K value, and the distance method. The parameter combination test showed a fairly good percentage of 73.56321839% with LPC order 8, 480 frames, K = 5, and the Chebyshev distance method.
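The classification stage, K-NN with the Chebyshev distance and K = 5, can be sketched as follows. This is a minimal sketch, not the authors' code: the function names, the two-dimensional feature vectors, and the speaker labels are hypothetical stand-ins for LPC feature vectors, and only the K value and the distance method come from the abstract.

```python
from collections import Counter


def chebyshev_distance(a, b):
    """Chebyshev distance: the largest absolute coordinate difference."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return max(abs(x - y) for x, y in zip(a, b))


def knn_predict(query, training, k):
    """Majority vote among the k training samples nearest to `query`.

    training is a list of (feature_vector, label) pairs.
    """
    nearest = sorted(
        training, key=lambda item: chebyshev_distance(query, item[0])
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical 2-D feature vectors (made-up numbers) for two speakers.
training = [
    ([0.10, 0.20], "speaker_a"),
    ([0.20, 0.10], "speaker_a"),
    ([0.15, 0.25], "speaker_a"),
    ([0.90, 0.80], "speaker_b"),
    ([0.80, 0.90], "speaker_b"),
    ([0.85, 0.75], "speaker_b"),
]

predicted_speaker = knn_predict([0.2, 0.2], training, k=5)
```

With K = 5 the query's five nearest neighbours are three "speaker_a" samples and two "speaker_b" samples, so the vote goes to "speaker_a". An odd K avoids ties in a two-class vote like this one.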

A Novel Text to Speech Technique for Tamil Language using Hidden Markov Models (HMM)

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.i8589.0881019 ◽

2019 ◽

Vol 8 (10) ◽

pp. 38-47

Keyword(s):

Signal Processing ◽

Markov Model ◽

Speech Processing ◽

Major Part ◽

Markov Models ◽

Hidden Markov ◽

Digital Signal ◽

Text To Speech ◽

Traditional System ◽

Local Languages

The application of digital signal processing in speech processing plays a major part in our everyday life. A text-to-speech system lets people see text and hear it read aloud. Text-to-speech synthesizers use synthesis techniques that require good-quality speech. Text-to-speech (TTS) conversion applies to many applications, such as automation, audio recording, and audio-based assistance systems. It can be applied to various multinational languages as well as to a number of local languages. This work proposes an efficient, highly accurate text-to-speech conversion for the Tamil language. Multiple features with a Hidden Markov Model (HMM) predictor are used to convert text to speech efficiently. With the proposed method, the precision of the framework improves by 6% compared with the traditional system.

Identifikasi Tulisan Tangan Huruf Katakana Jepang Dengan Metode Euclidean

J-SAKTI (Jurnal Sains Komputer dan Informatika) ◽

10.30645/j-sakti.v4i1.184 ◽

2020 ◽

Vol 4 (1) ◽

pp. 29

Author(s):

Imam Riadi ◽

Abdul Fadlil ◽

Putri Annisa

Keyword(s):

Image Processing ◽

Feature Extraction ◽

Learning Process ◽

Euclidean Distance ◽

Image Data ◽

Gray Level ◽

Accuracy Rate ◽

Distance Method ◽

Object A ◽

Data Acquisition Process

Katakana is one of the traditional Japanese scripts, used to write words absorbed from other languages. Recognizing an object requires a learning process, which is obtained through the characteristics and experience of observing similar objects after they have been acquired. Manually, however, it is quite difficult to distinguish between the 5 hiragana vowels. The process starts with image data acquisition, followed by image processing and feature extraction using the Gray Level Co-occurrence Matrix (GLCM), while classification uses the Euclidean distance method. The tests showed an accuracy rate of around 78% using the Euclidean method.

Speaker Recognition Systems in the Last Decade – A Survey

Engineering and Technology Journal ◽

10.30684/etj.v39i1b.1589 ◽

2021 ◽

Vol 39 (1B) ◽

pp. 30-40

Author(s):

Ahmed M. Ahmed ◽

Aliaa K. Hassan

Keyword(s):

Feature Extraction ◽

Speaker Recognition ◽

Clustering Algorithms ◽

Predictive Coding ◽

Gaussian Mixture ◽

Linear Predictive Coding ◽

Mel Frequency Cepstral Coefficients ◽

Voice Signal ◽

Automatic Speaker Recognition ◽

Authentication System

Speaker recognition is defined as the process of recognizing a person by his/her voice through specific features extracted from his/her voice signal. Automatic Speaker Recognition (ASP) is a biometric authentication system. In the last decade, many advances have been made in the speaker recognition field, along with many techniques in the feature extraction and modeling phases. In this paper, we present an overview of the most recent work in ASP technology. The study discusses several ASP modeling techniques, such as the Gaussian Mixture Model (GMM), Vector Quantization (VQ), and clustering algorithms. Several feature extraction techniques, such as Linear Predictive Coding (LPC) and Mel frequency cepstral coefficients (MFCC), are also examined. Finally, as a result of this study, we found that the MFCC and GMM methods can be considered the most successful techniques in the field of speaker recognition so far.
