Performance evaluation and implementations of MFCC, SVM and MLP algorithms in the FPGA board

One of the most difficult speech recognition tasks is accurate recognition of human-to-human communication. Advances in deep learning over the last few years have produced major improvements in speech recognition on the representative Switchboard conversational corpus. Word error rates that just a few years ago were 14% have dropped to 8.0%, then 6.6% and most recently 5.8%, and are now believed to be within striking range of human performance. This raises two questions: what is human performance, and how far down can we still drive speech recognition error rates? The main objective of this article is a comparative study of the performance of Automatic Speech Recognition (ASR) algorithms, using a database of signals recorded by female and male speakers of different ages. We also develop techniques for the software and hardware implementation of these algorithms and test them on an embedded electronic card based on a reconfigurable circuit (Field Programmable Gate Array, FPGA). We present an analysis of the classification results for the best Support Vector Machine (SVM) architectures and Multi-Layer Perceptron (MLP) artificial neural networks. Following this analysis, we created Nios II processors and tested their operation and characteristics. The characteristics of each processor (cost, size, speed, power consumption and complexity) are specified in this article. At the end of this work, we physically implemented the architecture of the Mel Frequency Cepstral Coefficients (MFCC) extraction algorithm as well as the classification algorithm that provided the best results.
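The word error rates quoted in this abstract are conventionally computed as the number of word substitutions, deletions and insertions needed to turn the recogniser output into the reference transcript, divided by the number of reference words. The following is a minimal sketch of that standard definition; the function name and the sample sentences are illustrative assumptions and are not taken from the article.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one substitution ("sat" -> "sit") and one deletion ("the").
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(f"WER = {wer:.3f}")
```

Because insertions are counted, this measure can exceed 100% when the hypothesis is much longer than the reference.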

Environmental effects on reliability and accuracy of MFCC based voice recognition for industrial human-robot-interaction

Proceedings of the Institution of Mechanical Engineers Part B Journal of Engineering Manufacture ◽

10.1177/09544054211014492 ◽

2021 ◽

pp. 095440542110144

Author(s):

B Birch ◽

CA Griffiths ◽

A Morgan

Keyword(s):

Speech Recognition ◽

Voice Recognition ◽

Human Robot Interaction ◽

Hole Drilling ◽

Time Warping ◽

Mel Frequency Cepstral Coefficients ◽

Robot Interaction ◽

Extraction Algorithm ◽

Dynamic Time ◽

Manufacturing Environments

Collaborative robots are becoming increasingly important for advanced manufacturing processes. The purpose of this paper is to determine the capability of a novel Human-Robot-Interface to be used for machine hole drilling. Using a developed voice activation system, the effect of environmental factors on speech recognition accuracy is considered. The research investigates the accuracy of a Mel Frequency Cepstral Coefficients-based feature extraction algorithm which uses Dynamic Time Warping to compare an utterance to a limited, user-dependent dictionary. The developed Speech Recognition method allows for Human-Robot-Interaction using a novel integration method between the voice recognition and the robot. The system can be utilised in many manufacturing environments where robot motions can be coupled to voice inputs rather than to time-consuming physical interfaces. However, there are limitations to uptake in industries where the volume of background machine noise is high.
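Comparing an utterance to a small dictionary with Dynamic Time Warping can be sketched as follows. This is a textbook DTW with Euclidean frame distance, not the authors' implementation: the one-dimensional "feature vectors", the command words and the function names are all hypothetical stand-ins for real MFCC frames and the paper's dictionary.

```python
import math


def dtw_distance(a, b):
    """Accumulated DTW cost between two sequences of feature vectors."""
    n, m = len(a), len(b)
    # cost[i][j] = best accumulated cost aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            frame_dist = math.dist(a[i - 1], b[j - 1])  # Euclidean distance
            cost[i][j] = frame_dist + min(
                cost[i - 1][j],      # stretch b: a advances, b repeats a frame
                cost[i][j - 1],      # stretch a: b advances, a repeats a frame
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]


def recognise(utterance, dictionary):
    """Return the dictionary word whose template has the lowest DTW cost."""
    return min(dictionary, key=lambda word: dtw_distance(utterance, dictionary[word]))


# Hypothetical user-dependent dictionary of per-frame feature vectors.
templates = {
    "drill": [[0.0], [1.0], [2.0], [1.0]],
    "stop": [[3.0], [3.0], [0.0]],
}
# The same shape as "drill", spoken more slowly (one frame repeated).
spoken = [[0.0], [1.0], [1.0], [2.0], [1.0]]
print(recognise(spoken, templates))
```

The warping step is what lets a slower or faster repetition of the same word still match its template, which suits a limited, user-dependent vocabulary.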

A Prototyping Environment for Hardware/Software Codesign of OFDM Systems

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.380-384.2803 ◽

2013 ◽

Vol 380-384 ◽

pp. 2803-2806

Author(s):

Xu Ming Lu ◽

Wei Jie Wen ◽

Hong Zhou Tan

Keyword(s):

Integrated Circuits ◽

Data Transmission ◽

Hardware Implementation ◽

Ofdm Systems ◽

Ofdm System ◽

Field Programmable ◽

Software And Hardware ◽

Front End Module ◽

Application Specific ◽

Insight Into

Rapid implementation and verification of systems is becoming important in front-end Application Specific Integrated Circuit design. Therefore, a field programmable gate array based hardware/software codesign prototyping environment is proposed to simulate the software implementation and verify the hardware implementation of a baseband OFDM system. The system is implemented as separate software and hardware partitions. The analog radio frequency front-end module gives full insight into the actual baseband system performance. User datagram protocol is used for data transmission between the two partitions, which together form a complete baseband system. With the proposed codesign environment, the software simulation runs over real wireless channels and the hardware implementation results can be flexibly processed in real time, which enhances design efficiency.

Speech Recognition Based on Feature Extraction with the Aid of Multi Support Vector Machine

Journal of Computational and Theoretical Nanoscience ◽

10.1166/jctn.2016.5607 ◽

2016 ◽

Vol 13 (10) ◽

pp. 6616-6627

Author(s):

B Kanisha ◽

G Balakrishnan

Keyword(s):

Support Vector Machine ◽

Feature Extraction ◽

Speech Recognition ◽

Back Propagation ◽

Spectral Feature ◽

Signal Frequency ◽

Support Vector ◽

Mel Frequency Cepstral Coefficients ◽

Proper Form ◽

Feed Forward Back Propagation

Speech recognition applications are emerging as fast-growing and efficient mechanisms in high-technology settings, and a host of diverse interactive speech-aware applications is on the market. With the rapidly growing requirement for embedded platforms and the rising demand for embedded computing, speech recognition systems (SRS) must be put in place at the right time and in the proper form so that multimedia tasks can easily be performed on them. In this work, the speech signal is first preprocessed to remove noise. It then enters feature extraction, where the peak signal frequency is obtained and compared with a standard signal for recognition. Features are extracted from the noise-free signal as Mel frequency cepstral coefficients (MFCC), tri-spectral features, and the discrete wave transform (DWT). These features are given as input to a multi-class Support Vector Machine (SVM), which converts the processed signal into text. Comparison with the existing feed forward back propagation (FFBN) technique shows that the proposed technique performs better. The proposed technique is implemented in MATLAB.

HARDWARE IMPLEMENTATION OF A NEW ARTIFICIAL NEURON

International Journal of Neural Systems ◽

10.1142/s0129065705000402 ◽

2005 ◽

Vol 15 (06) ◽

pp. 427-433 ◽

Cited By ~ 1

Author(s):

RICHARD LABIB ◽

FRANCIS AUDETTE ◽

ALEXANDRE FORTIN ◽

REZA ASSADI

Keyword(s):

Neural Network ◽

Hardware Implementation ◽

Field Programmable Gate Arrays ◽

External Temperature ◽

Artificial Neuron ◽

Gate Arrays ◽

Field Programmable ◽

Programmable Gate Arrays ◽

New Type ◽

Software And Hardware

This paper describes an FPGA (Field Programmable Gate Arrays) implementation of a new type of neuron, the Quantron. The goal is to demonstrate the capability of current technology to closely recreate the human body's reaction to a change of temperature. This is accomplished by creating a function that adds a number of kernels at different frequencies depending on the external temperature. Once the sum of the kernels reaches a certain threshold, the artificial neural network, equivalent to its biological counterpart, "reacts" by sending a specific output signal designed to trigger a response. The various elements of each subsystem are discussed and implemented in software and hardware. The results are analyzed in terms of accuracy and efficiency compared to the biological equivalent.
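The mechanism described here (kernels emitted at a temperature-dependent frequency, summed, and compared with a threshold) can be illustrated with a toy model. This sketch is not the Quantron model itself: the triangular kernel shape, the linear temperature-to-rate mapping and every numeric value are assumptions chosen only to make the idea concrete.

```python
def kernel(t, start, width):
    """Triangular pulse: zero outside (start, start + width), peak 1.0 at the middle."""
    x = (t - start) / width
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return 1.0 - abs(2.0 * x - 1.0)


def pulse_rate(temperature_c):
    """Hypothetical mapping: pulses per second grow linearly above 20 degrees C."""
    return max(0.0, 0.5 * (temperature_c - 20.0))


def reacts(temperature_c, threshold=1.5, width=0.4, duration=2.0, dt=0.01):
    """True if the summed kernels reach the threshold at any sampled time."""
    rate = pulse_rate(temperature_c)
    if rate == 0.0:
        return False  # no pulses are emitted at or below 20 degrees C
    period = 1.0 / rate

    # Start times of all pulses emitted within the observation window.
    starts = []
    k = 0
    while k * period < duration:
        starts.append(k * period)
        k += 1

    for step in range(int(duration / dt) + 1):
        t = step * dt
        total = sum(kernel(t, s, width) for s in starts)
        if total >= threshold:
            return True
    return False


for temp in (20.0, 30.0, 40.0):
    print(temp, reacts(temp))
```

In this toy setting a higher temperature packs the pulses closer together, so more kernels overlap and their sum is more likely to cross the threshold.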

Efficient Feature Extraction Algorithms to Develop an Arabic Speech Recognition System

Engineering, Technology & Applied Science Research ◽

10.48084/etasr.3465 ◽

2020 ◽

Vol 10 (2) ◽

pp. 5547-5553

Author(s):

A. A. Alasadi ◽

T. H. Aldhayni ◽

R. R. Deshmukh ◽

A. H. Alahmadi ◽

A. S. Alshebami

Keyword(s):

Feature Extraction ◽

Speech Recognition ◽

Group Delay ◽

Recognition System ◽

Support Vector ◽

Speech Recognition System ◽

Mel Frequency Cepstral Coefficients ◽

Delay Function ◽

Cepstral Coefficients ◽

Arabic Speech Recognition

This paper studies three feature extraction methods, Mel-Frequency Cepstral Coefficients (MFCC), Power-Normalized Cepstral Coefficients (PNCC), and Modified Group Delay Function (ModGDF) for the development of an Automated Speech Recognition System (ASR) in Arabic. The Support Vector Machine (SVM) algorithm processed the obtained features. These feature extraction algorithms extract speech or voice characteristics and process the group delay functionality calculated straight from the voice signal. These algorithms were deployed to extract audio forms from Arabic speakers. PNCC provided the best recognition results in Arabic speech in comparison with the other methods. Simulation results showed that PNCC and ModGDF were more accurate than MFCC in Arabic speech recognition.
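As background for the MFCC baseline, the step that gives the method its name is a warping of the frequency axis onto the mel scale before the filter bank is applied. The sketch below shows only that step, using one widely used formulation of the scale (constants 2595 and 700); the abstract does not state which formulation the paper used, and the function names and the filter count are illustrative assumptions.

```python
import math


def hz_to_mel(f_hz):
    """Convert frequency in Hz to mels (a common O'Shaughnessy-style formula)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_centres(n_filters, f_min, f_max):
    """Centre frequencies (Hz) of n_filters filters equally spaced in mels."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + step * k) for k in range(1, n_filters + 1)]


# Hypothetical configuration: 10 filters between 0 Hz and 8 kHz.
for centre in mel_centres(10, 0.0, 8000.0):
    print(f"{centre:8.1f} Hz")
```

The centres are densely packed at low frequencies and spread out at high frequencies, which roughly mirrors the resolution of human hearing.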

GESTURE RECOGNITION SYSTEM FOR NIGERIAN TRIBAL GREETING POSTURES USING SUPPORT VECTOR MACHINE

MALAYSIAN JOURNAL OF COMPUTING ◽

10.24191/mjoc.v5i2.10347 ◽

2020 ◽

Vol 5 (2) ◽

pp. 609

Author(s):

Segun Aina ◽

Kofoworola V. Sholesi ◽

Aderonke R. Lawal ◽

Samuel D. Okegbile ◽

Adeniran I. Oluwaranti

Keyword(s):

Support Vector Machine ◽

Gesture Recognition ◽

Recognition Rate ◽

Recognition Task ◽

Recognition System ◽

Human Interaction ◽

Support Vector ◽

System A ◽

Extraction Algorithm ◽

Gaussian Blur

This paper presents the application of Gaussian blur filters and Support Vector Machine (SVM) techniques for greeting recognition among the Yoruba tribe of Nigeria. Existing efforts have considered different recognition gestures. However, recognition of tribal greeting postures or gestures for the Nigerian geographical space has not been studied before. Some cultural gestures are not correctly identified by people of the same tribe, let alone by people from different tribes, which poses a challenge of misinterpretation of meaning. Also, some cultural gestures are unknown to most people outside a tribe, which could hinder human interaction; hence there is a need to automate the recognition of Nigerian tribal greeting gestures. This work therefore develops a Gaussian blur and SVM based system capable of recognizing the Yoruba tribe greeting postures for men and women. Videos of individuals performing various greeting gestures were collected and processed into image frames. The images were resized and a Gaussian blur filter was used to remove noise from them. This research used a moment-based feature extraction algorithm to extract shape features that were passed as input to the SVM. The SVM was trained to perform the greeting gesture recognition task for two Nigerian tribe greeting postures. To confirm the robustness of the system, 20%, 25% and 30% of the dataset acquired from the preprocessed images were used to test the system. The results show a recognition rate of 94% with the SVM, which indicates that the proposed method is effective.
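The preprocessing and feature steps described above (Gaussian blur, then moment-based shape features) can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' pipeline: the kernel radius and sigma, the clamped-edge handling, the tiny test image and the choice of the centroid as the only moment feature are all hypothetical, and the SVM stage is omitted.

```python
import math


def gaussian_kernel(radius, sigma):
    """Normalised 1-D Gaussian weights for offsets -radius..radius."""
    weights = [math.exp(-(x * x) / (2.0 * sigma * sigma))
               for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_rows(image, kernel):
    """Convolve every row with the kernel, clamping indices at the borders."""
    radius = len(kernel) // 2
    out = []
    for row in image:
        n = len(row)
        out.append([
            sum(k * row[min(max(x + dx, 0), n - 1)]
                for dx, k in zip(range(-radius, radius + 1), kernel))
            for x in range(n)
        ])
    return out


def transpose(image):
    return [list(col) for col in zip(*image)]


def gaussian_blur(image, radius=1, sigma=1.0):
    """Separable Gaussian blur: filter rows, then columns."""
    kernel = gaussian_kernel(radius, sigma)
    return transpose(blur_rows(transpose(blur_rows(image, kernel)), kernel))


def centroid(image):
    """Centroid (x, y) from the raw image moments M10/M00 and M01/M00."""
    m00 = sum(sum(row) for row in image)
    m10 = sum(x * v for row in image for x, v in enumerate(row))
    m01 = sum(y * v for y, row in enumerate(image) for v in row)
    return m10 / m00, m01 / m00


# Hypothetical 5x5 frame with a single bright pixel in the middle.
frame = [[0.0] * 5 for _ in range(5)]
frame[2][2] = 1.0
smoothed = gaussian_blur(frame)
print(centroid(smoothed))
```

In a fuller system, a vector of such moment features per frame would be the input to the classifier.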

Cognitive-Affective Emotion Classification: Comparing Features Extraction Algorithm Classified by Multi-Class Support Vector Machine

SSRN Electronic Journal ◽

10.2139/ssrn.3417178 ◽

2016 ◽

Author(s):

Nova Diana ◽

Ahmad Sabiq

Keyword(s):

Support Vector Machine ◽

Features Extraction ◽

Support Vector ◽

Emotion Classification ◽

Extraction Algorithm

Comments on “Towards increasing speech recognition error rates” by H. Bourlard, H. Hermansky, and N. Morgan

Speech Communication ◽

10.1016/0167-6393(96)00008-8 ◽

1996 ◽

Vol 18 (3) ◽

pp. 238

Author(s):

Sadaoki Furui

Keyword(s):

Speech Recognition ◽

Error Rates ◽

Recognition Error

Recurrent support vector machines for speech recognition

2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) ◽

10.1109/icassp.2016.7472806 ◽

2016 ◽

Cited By ~ 3

Author(s):

Shi-Xiong Zhang ◽

Rui Zhao ◽

Chaojun Liu ◽

Jinyu Li ◽

Yifan Gong

Keyword(s):

Speech Recognition ◽

Support Vector Machines ◽

Support Vector ◽

Vector Machines

The Design and Evaluation of a Speech Interface for Generation of Air Tasking Orders

Proceedings of the Human Factors and Ergonomics Society Annual Meeting ◽

10.1177/154193120004402266 ◽

2000 ◽

Vol 44 (22) ◽

pp. 750-753 ◽

Cited By ~ 1

Author(s):

David T. Williamson ◽

Timothy P. Barry

Keyword(s):

Speech Recognition ◽

Planning Process ◽

Data Entry ◽

Command And Control ◽

Error Rates ◽

Subject Matter Experts ◽

Speech Interface ◽

System Selection ◽

And Control ◽

Operations Center

This paper discusses the design, implementation, and evaluation of a prototype speech recognition interface to the Theater Air Planning (TAP) module of Theater Battle Management Core Systems (TBMCS). This effort was in support of a Kenney Battlelab Initiative proposal submitted to the Command and Control Battlelab at Hurlburt Field, FL to assess the operational benefits of speech recognition for data entry applications in a Joint Air Operations Center environment. Several factors contributing to the design of the “TAPTalk” speech interface included interviews with subject matter experts, speech system selection, grammar development, and integration into TAP, which required only minor modification of existing software. Results from the two-week operational assessment with sixteen subjects from the Command and Control Training and Innovation Group, numbered Air Forces, Navy, and Marine Corps indicated that the Theater Air Planning process could be accomplished significantly faster with no increase in error rates. Subjectively, the sixteen planners unanimously agreed that the TAPTalk speech interface was a valuable addition to TAP and would recommend its inclusion in a future upgrade. Recommendations for further improving the TAPTalk system are discussed.
