The cortical organization of speech processing: Feedback control and predictive coding the context of a dual-stream model

Current technological developments are of great benefit to many people's lives. Every aspect of life can use technology suited to its field, including home control. Previous research has shown that speech signals can also be used to interact with computers, which makes the interaction more natural. Research that works with speech-signal data is generally called speech processing. This study aims to build a system that can recognize spoken sentences, so that it can later be used in electrical technology. Speech processing passes through several stages: sampling, feature extraction, and learning. Feature extraction reveals the characteristics of a speech signal. Several feature-extraction methods are commonly used, and this study uses Linear Predictive Coding (LPC). LPC was chosen because its extraction adopts the human auditory system as a filter for capturing information. Learning and recognition are performed by an Adaptive Neuro-Fuzzy Inference System (ANFIS), chosen for its ability to perform probabilistic analysis and then produce a response according to its parameters. Sentence recognition begins with recording 20 utterances that serve as training data. In the experiments, extraction with 4 features gave the lowest accuracy (60% to 70%), 5 features gave 60% to 80%, and 6 features gave 70% to 80%. Real-time identification tested with 2 speakers, using 4 features, achieved 60% accuracy for the first speaker and 70% for the second.
Response-time analysis showed that fewer features shorten the MATLAB response time, whereas more features lengthen it.
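
The abstract does not say how the LPC features are computed. As a minimal sketch, assuming each "feature" is one LPC coefficient of a single speech frame (so 4, 5, or 6 features correspond to prediction orders 4 to 6), the coefficients can be obtained with the autocorrelation method and the Levinson-Durbin recursion. The function name `lpc` is hypothetical, and framing, windowing, and the ANFIS stage are not shown.

```python
import numpy as np


def lpc(frame, order):
    """LPC coefficients [1, a1, ..., a_order] via autocorrelation + Levinson-Durbin.

    The prediction error is e[n] = x[n] + a1*x[n-1] + ... + a_order*x[n-order].
    """
    x = np.asarray(frame, dtype=float)
    n = len(x)
    # Biased autocorrelation r[0..order]
    r = np.array([np.dot(x[: n - k], x[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]  # prediction-error energy of the order-0 predictor
    for i in range(1, order + 1):
        # Reflection coefficient for stage i
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err
        prev = a.copy()
        # Update a1..a_{i-1} using the previous stage's coefficients in reverse order
        a[1:i] = prev[1:i] + k * prev[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame = rng.standard_normal(400)  # stand-in for one windowed speech frame
    for p in (4, 5, 6):
        features = lpc(frame, p)[1:]  # drop the leading 1
        print(p, np.round(features, 3))
```

Under this assumption, raising the order from 4 to 6 adds two coefficients per frame, which is consistent with the abstract's remark that more features mean more computation per utterance.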

Segregation, connectivity, and gradients of deactivation in neural correlates of evidence in social decision making

10.1101/2020.02.04.934836

2020

Cited By ~ 1

Author(s): Roberto Viviani, Lisa Dommes, Julia E. Bosch, Karin Labek

Keyword(s): Decision Making, Social Cognition, Functional Imaging, Predictive Coding, Imaging Study, Neural Correlates, Laboratory Animals, Cortical Function, Cortical Organization, Associative Cortex

Functional imaging studies of sensory decision making have detected a signal associated with evidence for decisions that is consistent with data from single-cell recordings in laboratory animals. However, the generality of this finding and its implications for our understanding of the organization of the fMRI signal are not clear. In the present functional imaging study, we investigated decisions in an elementary social cognition domain to identify the neural correlates of evidence, their segregation and connectivity, and their relationship to task deactivations. Besides providing data in support of an evidence-related signal in a social cognition task, we were interested in embedding these neural correlates in models of supramodal associative cortex placed at the top of a hierarchy of processing areas. Participants were asked to decide which of two depicted individuals was sadder, based either on information rich in sensory features (facial expressions) or on contextual cues suggesting the mental state of others (stylized drawings of mourning individuals). The signal associated with evidence for the decision was located in two distinct networks, differentially recruited depending on the information type. Using the largest peaks of the evidence-related signal as seeds in a database of connectivity data, we retrieved these two networks. Furthermore, the hubs of these networks were located near or along a ribbon of cortex lying between task activations and deactivations, that is, between areas affected by perceptual priming and the deactivated areas of the default network system. In associative cortex, these findings suggest gradients of progressive relative deactivation as a possible neural correlate of the cortical organization envisaged by structural models of cortical organization and by predictive coding theories of cortical function.

Expectations boost the reconstruction of auditory features from electrophysiological responses to noisy speech

10.1101/2021.09.06.459160

2021

Author(s): Andrew W Corcoran, Ricardo Perera, Matthieu Koroma, Sid Kouider, Jakob Hohwy, ...

Keyword(s): Sentence Processing, Speech Processing, Sentence Comprehension, Predictive Coding, Acoustic Stimulus, Visual Presentation, Active Listening, Dramatic Improvement, Written Information, Band Power

Online speech processing imposes significant computational demands on the listening brain. Predictive coding provides an elegant account of the way this challenge is met through the exploitation of prior knowledge. While such accounts have accrued considerable evidence at the sublexical- and word-levels, relatively little is known about the predictive mechanisms that support sentence-level processing. Here, we exploit the 'pop-out' phenomenon (i.e. dramatic improvement in the intelligibility of degraded speech following prior information) to investigate the psychophysiological correlates of sentence comprehension. We recorded electroencephalography and pupillometry from 21 humans (10 females) while they rated the clarity of full sentences that had been degraded via noise-vocoding or sine-wave synthesis. Sentence pop-out was reliably elicited following visual presentation of the corresponding written sentence, despite never hearing the undistorted speech. No such effect was observed following incongruent or no written information. Pop-out was associated with improved reconstruction of the acoustic stimulus envelope from low-frequency EEG activity, implying that pop-out is mediated via top-down signals that enhance the precision of cortical speech representations. Spectral analysis revealed that pop-out was accompanied by a reduction in theta-band power, consistent with predictive coding accounts of acoustic filling-in and incremental sentence processing. Moreover, delta- and alpha-band power, as well as pupil diameter, were increased following the provision of any written information. We interpret these findings as evidence of a transition to a state of active listening, whereby participants selectively engage attentional and working memory processes to evaluate the congruence between expected and actual sensory input.
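
Envelope reconstruction of this kind is commonly done with a linear backward (decoding) model that maps time-lagged EEG onto the speech envelope. The abstract does not give the authors' pipeline, so the following is a sketch under assumptions: ridge regression, lags counted in samples, and hypothetical helper names (`lagged`, `fit_decoder`, `reconstruct`, `reconstruction_accuracy`). Reconstruction accuracy is taken as the Pearson correlation between reconstructed and actual envelopes on held-out data. Low-pass filtering of the EEG and envelope extraction from audio are not shown.

```python
import numpy as np


def lagged(eeg, max_lag):
    """Stack EEG at lags 0..max_lag samples: row t holds eeg[t], eeg[t+1], ..., eeg[t+max_lag].

    Forward lags are used because the neural response follows the stimulus.
    """
    T, C = eeg.shape
    X = np.zeros((T, C * (max_lag + 1)))
    for lag in range(max_lag + 1):
        X[: T - lag, lag * C:(lag + 1) * C] = eeg[lag:]
    return X


def fit_decoder(eeg, envelope, max_lag, ridge=1.0):
    """Ridge-regression weights mapping lagged EEG (T x C) to the envelope (T,)."""
    X = lagged(eeg, max_lag)
    gram = X.T @ X + ridge * np.eye(X.shape[1])
    return np.linalg.solve(gram, X.T @ envelope)


def reconstruct(eeg, weights, max_lag):
    return lagged(eeg, max_lag) @ weights


def reconstruction_accuracy(eeg, envelope, weights, max_lag):
    """Pearson r between reconstructed and actual envelope."""
    return np.corrcoef(reconstruct(eeg, weights, max_lag), envelope)[0, 1]
```

In a pop-out analysis under these assumptions, the decoder would be fitted once and the resulting correlation compared between conditions (congruent, incongruent, or no written prior).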

VLSI arrays for speech processing with linear predictive coding

Proceedings of the 12th IAPR International Conference on Pattern Recognition (Cat. No.94CH3440-5)

10.1109/icpr.1994.577201

2002

Cited By ~ 2

Author(s): Y.Y. Tang, Tao Li, C.Y. Suen

Keyword(s): Speech Processing, Predictive Coding, Linear Predictive Coding

Predictive coding with neural transmission delays: a real-time temporal alignment hypothesis

10.1101/453183

2018

Cited By ~ 2

Author(s): Hinze Hogendoorn, Anthony N Burkitt

Keyword(s): Real Time, Visual Motion, Predictive Coding, Extended Model, Prediction Errors, Transmission Delays, Cortical Organization, Visual Hierarchy, Neural Transmission, Feedback Connections

Hierarchical predictive coding is an influential model of cortical organization in which sequential hierarchical layers are connected by feedback connections carrying predictions and by feedforward connections carrying prediction errors. To date, however, predictive coding models have not taken into account that neural transmission itself takes time. For a time-varying stimulus, such as a moving object, this means that feedback predictions become misaligned with new sensory input. We present an extended model implementing both feedforward and feedback extrapolation mechanisms that realigns feedback predictions to minimize prediction error. As a consequence of this realignment, neural representations across all hierarchical stages become aligned in real time. Using visual motion as an example, we show that the model is neurally plausible, that it is consistent with evidence of extrapolation mechanisms throughout the visual hierarchy, that it predicts several known motion-position illusions, and that it provides a solution to the temporal binding problem.
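
The misalignment and its correction can be illustrated with a toy calculation. Assume an object moving at constant velocity `v`, a fixed transmission delay `delay` between adjacent levels, and scalar position codes. Under those assumptions, each feedforward step extrapolates over its own delay and the feedback step does the same. This is a hypothetical sketch of the idea, not the authors' model equations, and the function names are illustrative.

```python
def representation(t, level, x0, v, delay, forward_extrap):
    """Position encoded at hierarchical `level` (0 = input) at time t."""
    # The signal reaching this level left the input level * delay seconds ago.
    pos = x0 + v * (t - level * delay)
    if forward_extrap:
        # Each feedforward step extrapolates over its own transmission delay.
        pos += v * delay * level
    return pos


def prediction_error(t, level, x0, v, delay, forward_extrap, backward_extrap):
    """Current representation at `level` minus the feedback prediction from level + 1."""
    # The prediction arriving now was sent by level + 1 one delay ago.
    prediction = representation(t - delay, level + 1, x0, v, delay, forward_extrap)
    if backward_extrap:
        # The feedback step extrapolates over its own transmission delay.
        prediction += v * delay
    return representation(t, level, x0, v, delay, forward_extrap) - prediction


if __name__ == "__main__":
    for fwd, bwd in [(False, False), (True, False), (True, True)]:
        err = prediction_error(1.0, 1, 0.0, 2.0, 0.05, fwd, bwd)
        print(f"forward={fwd} backward={bwd} error={err:.3f}")
```

In this toy model the error is 2·v·delay with no extrapolation and v·delay with feedforward extrapolation only. With both kinds of extrapolation it is zero, and every level then encodes the current position x0 + v·t.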

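Identifikasi Emosi Manusia Berdasarkan Ucapan Menggunakan Metode Ekstraksi Ciri LPC dan Metode Euclidean Distance (Identification of Human Emotions from Speech Using LPC Feature Extraction and the Euclidean Distance Method)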

Jurnal Teknologi Informasi dan Ilmu Komputer

10.25126/jtiik.2020722693

2020

Vol 7 (6)

pp. 1177

Author(s): Siti Helmiyah, Imam Riadi, Rusydi Umar, Abdullah Hanif, Anton Yudhana, ...

Keyword(s): Signal Processing, Feature Extraction, Speech Processing, Euclidean Distance, Predictive Coding, Digital Signal, Linear Predictive Coding, Distance Method, Average Accuracy, Voice Data

Speech is a highly complex signal that carries many kinds of information. The information that can be captured from speech includes the message to the interlocutor, the speaker, the language, and even the speaker's emotions, without the speaker being aware of it. Speech processing is a branch of digital signal processing that aims to achieve natural interaction between humans and machines. Emotional characteristics are features contained in speech that carry the characteristics of the speaker's emotions. Linear Predictive Coding (LPC) is a method for extracting features in signal processing. This research uses LPC for feature extraction and the Euclidean distance method to identify emotions from the features obtained with LPC. The study uses data for the emotions anger, sadness, happiness, neutral, and boredom, taken from Berlin Emo DB, with three different sentences and different actors. The resulting accuracy was 58.33% for sadness, 50% for neutral, 41.67% for anger, and 8.33% for happiness, and boredom could not be recognized. LPC feature extraction therefore gave unfavorable results in this study, with an average accuracy of only 31.67% across all emotions. Differences in sentence, actor, age, and accent in the voice data can affect emotion recognition, which makes feature extraction very important for recognizing the speech patterns of human emotions. The accuracy obtained in this study is still low and could be improved by using other features, such as prosodic, spectral, and voice-quality features, together with statistics such as max, min, mean, median, kurtosis, and skewness. The choice of classification method can also affect emotion-recognition results.
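
The abstract does not specify whether each test utterance is compared with every training vector or with one template per emotion. As a minimal sketch assuming per-emotion mean templates, Euclidean-distance classification of LPC feature vectors can look like the code below. `train_templates` and `classify` are hypothetical names, and the feature vectors would come from an LPC routine that is not shown. The last lines check that the reported 31.67% is the unweighted mean of the five per-emotion accuracies.

```python
import numpy as np


def train_templates(features, labels):
    """Average the training feature vectors of each emotion into one template."""
    feats = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    return {c: feats[labels == c].mean(axis=0) for c in np.unique(labels)}


def classify(x, templates):
    """Return the emotion whose template has the smallest Euclidean distance to x."""
    x = np.asarray(x, dtype=float)
    return min(templates, key=lambda c: np.linalg.norm(x - templates[c]))


# Per-emotion accuracies reported in the abstract (%); boredom was never recognized.
per_emotion = {"sad": 58.33, "neutral": 50.0, "angry": 41.67, "happy": 8.33, "bored": 0.0}
average = sum(per_emotion.values()) / len(per_emotion)
print(round(average, 2))  # 31.67
```

A per-utterance nearest-neighbour variant would skip the averaging and compare against every training vector. Which variant the study used is not stated.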

A New Approach To Speech Coding: the Neural Predictive Coding

Journal of Advanced Computational Intelligence and Intelligent Informatics

10.20965/jaciii.2000.p0120

2000

Vol 4 (1)

pp. 120-127

Cited By ~ 5

Author(s): Bruno Gas, Jean Luc Zarader, Cyril Chavy

Keyword(s): Discriminant Analysis, Speech Processing, Speech Signal, Predictive Coding, Class Membership, New Approach, Coding Systems, Signal Coding, Membership Information, Connectionist Methods

In this article we propose a new speech-signal coding model applied to phoneme recognition. The model extends the adaptive coding systems used in speech processing to the nonlinear domain, using predictive connectionist methods. We show that class-membership information about the phonemes can be taken into account already at the coding stage. To evaluate the Neural Predictive Coding (NPC) encoder, we carry out a discriminant analysis of a phoneme database and a phoneme-recognition application. The simulations presented here show that classification is clearly improved compared with commonly used types of coding.
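
The core of a nonlinear predictive coder is a small neural network that predicts each sample from the preceding ones, in place of the linear predictor used in LPC. The sketch below assumes a one-hidden-layer tanh network trained by full-batch gradient descent on the mean squared prediction error. It shows only that nonlinear prediction step. How the NPC encoder forms its code vector and injects phoneme class-membership information is not reproduced, and all names and hyperparameters are hypothetical.

```python
import numpy as np


def make_windows(x, p):
    """Inputs are the p previous samples; the target is the next sample."""
    X = np.stack([x[i:i + p] for i in range(len(x) - p)])
    return X, x[p:]


def predict(params, X):
    W1, b1, w2, b2 = params
    return np.tanh(X @ W1 + b1) @ w2 + b2


def train_predictor(x, p=4, hidden=8, lr=0.05, epochs=2000, seed=0):
    """Fit a one-hidden-layer tanh predictor by gradient descent on the MSE."""
    rng = np.random.default_rng(seed)
    X, y = make_windows(np.asarray(x, dtype=float), p)
    W1 = rng.normal(0.0, 0.5, (p, hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(0.0, 0.5, hidden)
    b2 = 0.0
    for _ in range(epochs):
        h = np.tanh(X @ W1 + b1)
        err = h @ w2 + b2 - y
        g_out = 2.0 * err / len(y)                    # dMSE / dprediction
        g_h = np.outer(g_out, w2) * (1.0 - h ** 2)    # backpropagate through tanh
        w2 -= lr * (h.T @ g_out)
        b2 -= lr * g_out.sum()
        W1 -= lr * (X.T @ g_h)
        b1 -= lr * g_h.sum(axis=0)
    return W1, b1, w2, b2
```

Setting `hidden` units to identity activations would reduce this to a linear predictor, which is the sense in which the nonlinear model extends LPC-style coding.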

Faculty Opinions recommendation of The hierarchical cortical organization of human speech processing.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature

10.3410/f.727692323.793550757

2018

Author(s): Andrew King

Keyword(s): Speech Processing, Human Speech, Cortical Organization

Fast Computation of LSP Frequencies Using the Bairstow Method

Electronics

10.3390/electronics9030387

2020

Vol 9 (3)

pp. 387

Cited By ~ 1

Author(s): Yuqun Xue, Zhijiu Zhu, Jianhua Jiang, Yi Zhan, Zenghui Yu, ...

Keyword(s): Speech Processing, Linear Prediction, Predictive Coding, Computation Time, Fast Computation, Linear Predictive Coding, Polynomial Roots, Alternative Representation, Perceptual Evaluation, Initial Method

Linear prediction is the core technology of speech processing. It is widely applied in speech recognition, synthesis, and coding, and can represent the speech spectrum efficiently and accurately with only a few parameters. Line spectrum pair (LSP) frequencies, an alternative representation of Linear Predictive Coding (LPC) coefficients, offer good quantization accuracy and low spectral sensitivity. However, computing LSP frequencies is time-consuming. To address this issue, this paper proposes a fast algorithm based on the Bairstow method for computing LSP frequencies from linear prediction coefficients. The algorithm first transforms the symmetric and antisymmetric polynomials into general polynomials and then extracts their roots. Exploiting the short-term stationarity of speech signals, an adaptive initialization reduces the average number of iterations by 26% compared with static initialization, with a Perceptual Evaluation of Speech Quality (PESQ) score reaching 3.46. Experimental results show that the proposed method extracts the polynomial roots efficiently and accurately with significantly reduced computational complexity. Compared with previous work, the proposed method is 17 times faster than the Tschirnhaus transform and achieves a 22% PESQ improvement over the Birge-Vieta method with almost comparable computation time.
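
The LSP construction itself is standard. Start from the order-p LPC polynomial A(z) = 1 + a1 z^-1 + ... + ap z^-p. Form the symmetric polynomial P(z) = A(z) + z^-(p+1) A(z^-1) and the antisymmetric polynomial Q(z) = A(z) - z^-(p+1) A(z^-1). For a stable A(z), their nontrivial roots lie on the unit circle, and the root angles in (0, π) are the LSP frequencies.

The sketch below finds those roots with `numpy.roots`, which computes companion-matrix eigenvalues. This is a generic baseline. It does not implement the Bairstow method or the adaptive initialization proposed in the paper. The function name `lpc_to_lsf` is hypothetical.

```python
import numpy as np


def lpc_to_lsf(a):
    """LSP frequencies (radians, ascending) from LPC coefficients [1, a1, ..., ap]."""
    a = np.asarray(a, dtype=float)
    a = a / a[0]
    ext = np.concatenate([a, [0.0]])    # A(z) padded to degree p + 1 in z^-1
    P = ext + ext[::-1]                 # symmetric: A(z) + z^-(p+1) A(z^-1)
    Q = ext - ext[::-1]                 # antisymmetric: A(z) - z^-(p+1) A(z^-1)

    def upper_angles(poly):
        # np.roots expects the highest power first, which matches this ordering.
        w = np.angle(np.roots(poly))
        # Keep one root of each conjugate pair and drop the trivial roots at z = +1, -1.
        return w[(w > 1e-7) & (w < np.pi - 1e-7)]

    return np.sort(np.concatenate([upper_angles(P), upper_angles(Q)]))
```

For a = [1, -1.3, 0.6], P(z) factors as (z + 1)(z^2 - 1.7z + 1) and Q(z) as (z - 1)(z^2 - 0.9z + 1). The two LSP frequencies are therefore arccos(0.85) and arccos(0.45). The root-finding step inside `upper_angles` is the part that a Bairstow-based method would replace.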
