Performance evaluation and implementations of MFCC, SVM and MLP algorithms in the FPGA board

Author(s): Salaheddine Khamlich, Fathallah Khamlich, Issam Atouf, Mohamed Benrabh

One of the most difficult speech recognition tasks is the accurate recognition of human-to-human communication. Advances in deep learning over the last few years have produced major improvements in speech recognition on the representative Switchboard conversational corpus. Word error rates that were 14% just a few years ago have dropped to 8.0%, then 6.6% and most recently 5.8%, and are now believed to be within striking range of human performance. This raises two questions: what is human performance, and how far can speech recognition error rates still be driven down? The main objective of this article is a comparative study of the performance of Automatic Speech Recognition (ASR) algorithms on a database of signals recorded by female and male speakers of different ages. We also develop techniques for the software and hardware implementation of these algorithms and test them on an embedded electronic board based on a reconfigurable circuit (Field Programmable Gate Array, FPGA). We present an analysis of the classification results for the best Support Vector Machine (SVM) and Multi-Layer Perceptron (MLP) artificial neural network architectures. Following this analysis, we created Nios II processors and tested their operation and characteristics. The characteristics of each processor (cost, size, speed, power consumption and complexity) are specified in this article. Finally, we physically implemented the architecture of the Mel Frequency Cepstral Coefficients (MFCC) extraction algorithm together with the classification algorithm that gave the best results.
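
The abstract does not publish the MFCC extraction code or the FPGA architecture. As a rough software reference for the stages such a hardware block typically pipelines (pre-emphasis, framing, Hamming window, power spectrum, mel filterbank, log, DCT), here is a minimal pure-Python sketch. The parameter values (8 kHz sample rate, 25 ms frames with a 10 ms hop, 20 filters, 13 coefficients) are assumptions rather than the authors' settings, and a naive DFT stands in for the FFT a real design would use.

```python
import math


def hz_to_mel(f):
    # Common (HTK-style) mel-scale formula.
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def power_spectrum(frame, n_fft):
    # Naive DFT power spectrum for bins 0..n_fft//2; the frame is implicitly
    # zero-padded to n_fft samples. A hardware design would use an FFT core.
    spectrum = []
    for k in range(n_fft // 2 + 1):
        re = sum(v * math.cos(2.0 * math.pi * k * t / n_fft) for t, v in enumerate(frame))
        im = -sum(v * math.sin(2.0 * math.pi * k * t / n_fft) for t, v in enumerate(frame))
        spectrum.append((re * re + im * im) / n_fft)
    return spectrum


def mel_filterbank(n_filters, n_fft, sample_rate):
    # Triangular filters spaced evenly on the mel scale from 0 Hz to Nyquist.
    low, high = hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0)
    mels = [low + i * (high - low) / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int((n_fft + 1) * mel_to_hz(m) / sample_rate) for m in mels]
    bank = []
    for j in range(1, n_filters + 1):
        left, center, right = bins[j - 1], bins[j], bins[j + 1]
        weights = [0.0] * (n_fft // 2 + 1)
        for k in range(left, center):
            weights[k] = (k - left) / (center - left)
        for k in range(center, right):
            weights[k] = (right - k) / (right - center)
        bank.append(weights)
    return bank


def dct_ii(x, n_out):
    # Unnormalised DCT-II: decorrelates the log filterbank energies.
    n = len(x)
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(x))
            for k in range(n_out)]


def mfcc(signal, sample_rate=8000, frame_len=200, hop=80, n_fft=256,
         n_filters=20, n_ceps=13, pre_emphasis=0.97):
    # All default parameter values are assumptions for illustration.
    # 1) Pre-emphasis boosts high frequencies.
    x = [signal[0]] + [signal[i] - pre_emphasis * signal[i - 1] for i in range(1, len(signal))]
    # 2) Hamming window applied to each frame.
    window = [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (frame_len - 1)) for i in range(frame_len)]
    bank = mel_filterbank(n_filters, n_fft, sample_rate)
    features = []
    for start in range(0, len(x) - frame_len + 1, hop):
        frame = [x[start + i] * window[i] for i in range(frame_len)]
        spectrum = power_spectrum(frame, n_fft)
        # 3) Log mel filterbank energies (floored to avoid log(0)).
        energies = [math.log(max(sum(w * p for w, p in zip(weights, spectrum)), 1e-10))
                    for weights in bank]
        # 4) DCT keeps the first n_ceps cepstral coefficients.
        features.append(dct_ii(energies, n_ceps))
    return features
```

For example, 800 samples of a 440 Hz tone (0.1 s at 8 kHz) yield 8 frames of 13 coefficients each. The fixed frame/filter structure is what makes MFCC extraction a natural candidate for a pipelined FPGA implementation.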

Author(s): B Birch, CA Griffiths, A Morgan

Collaborative robots are becoming increasingly important for advanced manufacturing processes. The purpose of this paper is to assess whether a novel Human-Robot Interface can be used for machine hole drilling. Using a purpose-built voice activation system, the effect of environmental factors on speech recognition accuracy is considered. The research investigates the accuracy of a feature extraction algorithm based on Mel Frequency Cepstral Coefficients, which uses Dynamic Time Warping to compare an utterance against a limited, user-dependent dictionary. The speech recognition method enables Human-Robot Interaction through a novel method of integrating the voice recognition system with the robot. The system can be utilised in many manufacturing environments where robot motions can be coupled to voice inputs instead of time-consuming physical interfaces. However, uptake is limited in industries where background machine noise is high.
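
Dynamic Time Warping lets an utterance spoken faster or slower than its stored template still match it. The minimal sketch below, assuming each utterance is already a sequence of feature vectors (for example MFCC frames) and using Euclidean frame distance, picks the dictionary word with the lowest DTW cost. The command words and toy two-dimensional features are hypothetical, not the paper's dictionary.

```python
import math


def euclidean(a, b):
    # Distance between two feature frames (e.g. MFCC vectors).
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(seq_a, seq_b):
    # Classic DTW: cost[i][j] is the cheapest alignment of the first i frames
    # of seq_a with the first j frames of seq_b.
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = euclidean(seq_a[i - 1], seq_b[j - 1])
            # Allowed steps: insertion, deletion, match.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def recognize(utterance, dictionary):
    # Return the word whose user-recorded template aligns most cheaply.
    return min(dictionary, key=lambda word: dtw_distance(utterance, dictionary[word]))


# Hypothetical user-dependent dictionary of toy 2-D feature templates.
templates = {
    "drill": [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]],
    "stop": [[5.0, 5.0], [5.0, 5.0], [0.0, 0.0]],
}
```

A slower rendition such as `[[0, 0], [0, 0], [1, 1], [2, 2]]` aligns with the "drill" template at zero cost, so `recognize` returns `"drill"`. Because every word needs its own recorded template, this approach suits the small, speaker-dependent vocabulary the abstract describes.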


2013, Vol. 380-384, pp. 2803-2806
Author(s): Xu Ming Lu, Wei Jie Wen, Hong Zhou Tan

Rapid implementation and verification of systems has become important in front-end Application Specific Integrated Circuit (ASIC) design. Therefore, a field programmable gate array (FPGA) based hardware/software codesign prototyping environment is proposed to simulate the software implementation and verify the hardware implementation of a baseband OFDM system. The system is divided into a software partition and a hardware partition. An analog radio frequency front-end module gives full insight into the actual performance of the baseband system. The User Datagram Protocol (UDP) is used for data transmission between the two partitions, which together form a complete baseband system. With the proposed codesign environment, the software simulation runs over real wireless channels, and the hardware implementation results can be processed flexibly in real time, which improves design efficiency.
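
For readers less familiar with the baseband OFDM chain being prototyped, the sketch below shows one symbol end to end: QPSK mapping, IDFT, cyclic prefix, a short multipath channel, prefix removal, DFT and one-tap equalisation. The 16 subcarriers, 4-sample prefix, Gray-coded QPSK and the two-tap channel are all assumptions for illustration; the paper's actual parameters and its hardware/software split are not described in the abstract.

```python
import cmath
import math


def qpsk_map(bits):
    # Gray-coded QPSK: first bit sets the sign of I, second bit the sign of Q.
    if len(bits) % 2:
        raise ValueError("QPSK needs an even number of bits")
    scale = 1.0 / math.sqrt(2.0)
    return [complex(1 - 2 * bits[i], 1 - 2 * bits[i + 1]) * scale
            for i in range(0, len(bits), 2)]


def qpsk_demap(symbols):
    bits = []
    for s in symbols:
        bits.extend([0 if s.real > 0 else 1, 0 if s.imag > 0 else 1])
    return bits


def dft(x, inverse=False):
    # Naive O(N^2) DFT; the inverse includes the 1/N factor.
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out


def ofdm_modulate(bits, n_sub=16, cp_len=4):
    symbols = qpsk_map(bits)
    if len(symbols) != n_sub:
        raise ValueError("expected exactly one OFDM symbol of data")
    time = dft(symbols, inverse=True)
    # Cyclic prefix: copy the last cp_len samples to the front.
    return time[-cp_len:] + time


def multipath(samples, taps):
    # Hypothetical channel: linear convolution with a short impulse response.
    return [sum(h * samples[n - l] for l, h in enumerate(taps) if n - l >= 0)
            for n in range(len(samples))]


def ofdm_demodulate(samples, taps=(1.0,), n_sub=16, cp_len=4):
    # Drop the prefix, transform back, then divide each subcarrier by the
    # channel's frequency response (zero-forcing, assuming taps are known).
    received = dft(samples[cp_len:cp_len + n_sub])
    response = dft(list(taps) + [0.0] * (n_sub - len(taps)))
    return qpsk_demap([r / h for r, h in zip(received, response)])
```

Because the prefix is at least as long as the channel memory, the channel acts as a circular convolution on each block, so a single complex division per subcarrier undoes it. In a codesign flow like the one described, any of these stages could sit on either side of the software/hardware boundary.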


2016, Vol. 13 (10), pp. 6616-6627
Author(s): B Kanisha, G Balakrishnan

Speech recognition applications are emerging as rapidly growing and efficient mechanisms in high-tech products, and a wide range of interactive speech-aware applications is on the market. With the rapidly growing need for new embedded platforms and the sharp increase in demand for embedded computing, it is essential that speech recognition systems (SRS) be deployed in a suitable form so that multimedia tasks can easily be performed on these devices. In this work, the speech signal is first preprocessed to remove noise. It then passes to feature extraction, where the peak signal frequency is extracted, compared with the reference signal and recognized. The noise-free signal is represented by Mel frequency cepstral coefficients (MFCC), tri-spectral features and discrete wavelet transform (DWT) features. These features are fed to a multi-class Support Vector Machine (SVM), which converts the processed signal into text. A comparison with an existing technique, the feed-forward back-propagation network (FFBN), shows that the proposed technique performs better. The proposed technique is implemented in MATLAB.
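
The abstract does not specify the wavelet family or decomposition depth used for the DWT features. The sketch below assumes an orthonormal Haar wavelet and uses the energy of each sub-band as a compact feature vector, one common way to turn a DWT into a fixed-length SVM input. The SVM stage itself is not shown.

```python
import math


def haar_step(x):
    # One level of the orthonormal Haar DWT: approximation and detail halves.
    # A trailing sample is dropped when the length is odd.
    s = 1.0 / math.sqrt(2.0)
    n = len(x) - len(x) % 2
    approx = [(x[i] + x[i + 1]) * s for i in range(0, n, 2)]
    detail = [(x[i] - x[i + 1]) * s for i in range(0, n, 2)]
    return approx, detail


def dwt_energy_features(signal, levels=3):
    # Energy of each detail sub-band (finest first) plus the final approximation.
    # Haar wavelet and sub-band energies are assumptions, not the paper's setup.
    features = []
    approx = list(signal)
    for _ in range(levels):
        if len(approx) < 2:
            raise ValueError("signal too short for the requested number of levels")
        approx, detail = haar_step(approx)
        features.append(sum(d * d for d in detail))
    features.append(sum(a * a for a in approx))
    return features
```

Because the Haar transform is orthonormal, the sub-band energies of an even-length signal sum to the signal's total energy. For `[1, 2, ..., 8]` the four features add up to 204, and a constant signal puts all of its energy in the final approximation band. Such vectors could then be concatenated with MFCC and tri-spectral features before classification.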


2005, Vol. 15 (06), pp. 427-433
Author(s): Richard Labib, Francis Audette, Alexandre Fortin, Reza Assadi

This paper describes an FPGA (Field Programmable Gate Array) implementation of a new type of neuron, the Quantron. The goal is to demonstrate the capability of current technology to closely recreate the human body's reaction to a change in temperature. This is accomplished with a function that sums a number of kernels at different frequencies depending on the external temperature. Once the sum of the kernels reaches a certain threshold, the artificial neural network, like its biological counterpart, "reacts" by sending a specific output signal designed to trigger a response. The various elements of each subsystem are discussed and implemented in software and hardware. The results are analyzed in terms of accuracy and efficiency relative to the biological equivalent.
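
The sum-of-kernels-then-threshold mechanism can be illustrated in software. The sketch below is not the Quantron's published model. The linear temperature-to-rate encoding, the exponentially decaying kernel, the threshold and the time step are all hypothetical choices, made only to show how a higher kernel rate drives the summed potential over threshold sooner.

```python
import math


def temperature_to_frequency(temp_c, base_hz=5.0, gain_hz_per_c=2.0, ref_c=37.0):
    # Hypothetical linear encoding: hotter input -> higher kernel rate.
    return max(0.0, base_hz + gain_hz_per_c * (temp_c - ref_c))


def kernel(t, tau=0.05):
    # Hypothetical causal, exponentially decaying kernel (tau in seconds).
    return math.exp(-t / tau) if t >= 0.0 else 0.0


def first_firing_time(temp_c, threshold=3.0, duration_s=1.0, dt=0.001):
    """Return the first time (s) the summed kernels reach threshold, or None."""
    rate = temperature_to_frequency(temp_c)
    # Kernel onsets arrive periodically at the temperature-dependent rate.
    onsets = [i / rate for i in range(int(duration_s * rate))] if rate > 0 else []
    for n in range(int(duration_s / dt)):
        t = n * dt
        potential = sum(kernel(t - s) for s in onsets)
        if potential >= threshold:
            return t  # the neuron "reacts"
    return None
```

With these assumed values, body temperature (37 °C) never reaches threshold, while 80 °C triggers a response after roughly 44 ms and 100 °C after roughly 23 ms. Hotter stimuli pack kernels closer together, so they overlap more before decaying. In hardware, the per-step sum would naturally map to an accumulator with a comparator.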


2020, Vol. 10 (2), pp. 5547-5553
Author(s): A. A. Alasadi, T. H. Aldhayni, R. R. Deshmukh, A. H. Alahmadi, A. S. Alshebami

This paper studies three feature extraction methods, Mel-Frequency Cepstral Coefficients (MFCC), Power-Normalized Cepstral Coefficients (PNCC) and the Modified Group Delay Function (ModGDF), for the development of an Arabic Automatic Speech Recognition (ASR) system. The extracted features were classified with the Support Vector Machine (SVM) algorithm. These algorithms extract speech characteristics; ModGDF works on the group delay function computed directly from the speech signal. The algorithms were applied to speech from Arabic speakers. PNCC gave the best recognition results for Arabic speech, and simulation results showed that both PNCC and ModGDF were more accurate than MFCC.
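
The group delay can be computed without phase unwrapping through the identity τ(k) = (X_R·Y_R + X_I·Y_I) / |X(k)|², where X is the DFT of x[n] and Y is the DFT of n·x[n]. ModGDF replaces |X|² with a smoothed spectrum raised to 2γ and compresses the result with an exponent α. The sketch below follows that outline, but swaps the cepstral smoothing used in the published ModGDF for a simple moving average. The parameter values (α = 0.4, γ = 0.9, a 32-point DFT) are assumptions, and PNCC is not shown.

```python
import cmath
import math


def half_spectrum(x, n_fft):
    # Naive DFT of x zero-padded to n_fft samples, bins 0..n_fft//2.
    if len(x) > n_fft:
        raise ValueError("signal longer than n_fft")
    return [sum(v * cmath.exp(-2j * math.pi * k * t / n_fft) for t, v in enumerate(x))
            for k in range(n_fft // 2 + 1)]


def group_delay(x, n_fft=32, eps=1e-12):
    # tau(k) = (X_R Y_R + X_I Y_I) / |X|^2, where Y is the DFT of n * x[n].
    X = half_spectrum(x, n_fft)
    Y = half_spectrum([n * v for n, v in enumerate(x)], n_fft)
    return [(a.real * b.real + a.imag * b.imag) / max(abs(a) ** 2, eps) for a, b in zip(X, Y)]


def moving_average(values, width=3):
    # Stand-in smoother; the published ModGDF uses cepstral smoothing instead.
    half = width // 2
    out = []
    for i in range(len(values)):
        window = values[max(0, i - half):i + half + 1]
        out.append(sum(window) / len(window))
    return out


def modified_group_delay(x, n_fft=32, alpha=0.4, gamma=0.9, width=3, eps=1e-12):
    X = half_spectrum(x, n_fft)
    Y = half_spectrum([n * v for n, v in enumerate(x)], n_fft)
    S = moving_average([abs(a) for a in X], width)
    out = []
    for a, b, s in zip(X, Y, S):
        # Smoothed magnitude in the denominator tames spikes near spectral zeros.
        tau = (a.real * b.real + a.imag * b.imag) / max(s ** (2.0 * gamma), eps)
        # Compress the dynamic range while keeping the sign.
        out.append(math.copysign(abs(tau) ** alpha, tau))
    return out
```

As a sanity check, an impulse delayed by three samples has a group delay of exactly 3 at every frequency bin. With a flat magnitude spectrum, the modified version reduces to 3^0.4 everywhere.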


2020, Vol. 5 (2), pp. 609
Author(s): Segun Aina, Kofoworola V. Sholesi, Aderonke R. Lawal, Samuel D. Okegbile, Adeniran I. Oluwaranti

This paper presents the application of Gaussian blur filters and Support Vector Machine (SVM) techniques to greeting recognition among the Yoruba tribe of Nigeria. Existing work has addressed the recognition of various gestures. However, the recognition of tribal greeting postures or gestures in Nigeria has not been studied before. Some cultural gestures are misidentified even by members of the same tribe, let alone by people from other tribes, which leads to misinterpretation of their meaning. Also, some cultural gestures are unknown to most people outside a tribe, which can hinder human interaction; hence there is a need to automate the recognition of Nigerian tribal greeting gestures. This work therefore develops a system based on Gaussian blur and SVM that recognizes the Yoruba greeting postures of men and women. Videos of individuals performing various greeting gestures were collected and processed into image frames. The images were resized and a Gaussian blur filter was used to remove noise from them. A moment-based feature extraction algorithm was used to extract shape features, which were passed as input to the SVM. The SVM was trained to recognize two Nigerian tribal greeting postures. To assess the robustness of the system, 20%, 25% and 30% of the preprocessed image dataset were used for testing. The SVM achieved a recognition rate of 94%, which the authors take as evidence that the proposed method is efficient.
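
The abstract names the two preprocessing and feature steps but not their parameters or which moments were used. The sketch below pairs a separable Gaussian blur with the first two Hu moment invariants, a standard moment-based shape descriptor. σ = 1, clamped edges and the choice of Hu's φ1 and φ2 are assumptions, and the SVM stage is not shown.

```python
import math


def gaussian_kernel(sigma):
    # 1-D Gaussian weights out to 3 sigma, normalised to sum to 1.
    radius = max(1, int(math.ceil(3.0 * sigma)))
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def _clamp(i, size):
    return min(max(i, 0), size - 1)


def gaussian_blur(image, sigma=1.0):
    # Separable blur of a grayscale image (list of rows): rows, then columns,
    # with edge pixels clamped (an assumed border policy).
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    h, w = len(image), len(image[0])
    rows = [[sum(k[j] * image[y][_clamp(x + j - r, w)] for j in range(len(k)))
             for x in range(w)] for y in range(h)]
    return [[sum(k[j] * rows[_clamp(y + j - r, h)][x] for j in range(len(k)))
             for x in range(w)] for y in range(h)]


def hu_phi1_phi2(image):
    # First two Hu invariants: unchanged by translation and scaling, and
    # approximately unchanged by rotation on a pixel grid.
    pixels = [(x, y, v) for y, row in enumerate(image) for x, v in enumerate(row)]
    m00 = sum(v for _, _, v in pixels)
    xc = sum(x * v for x, _, v in pixels) / m00
    yc = sum(y * v for _, y, v in pixels) / m00

    def eta(p, q):
        # Normalised central moment.
        mu = sum(((x - xc) ** p) * ((y - yc) ** q) * v for x, y, v in pixels)
        return mu / m00 ** (1 + (p + q) / 2)

    phi1 = eta(2, 0) + eta(0, 2)
    phi2 = (eta(2, 0) - eta(0, 2)) ** 2 + 4.0 * eta(1, 1) ** 2
    return phi1, phi2
```

The same silhouette placed at two different positions in a frame yields identical invariants. This property makes moment features attractive when the person is not centred in every video frame. The (φ1, φ2) pairs would then form part of the SVM input vector.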


Author(s): David T. Williamson, Timothy P. Barry

This paper discusses the design, implementation, and evaluation of a prototype speech recognition interface to the Theater Air Planning (TAP) module of Theater Battle Management Core Systems (TBMCS). This effort supported a Kenney Battlelab Initiative proposal submitted to the Command and Control Battlelab at Hurlburt Field, FL, to assess the operational benefits of speech recognition for data entry applications in a Joint Air Operations Center environment. Factors in the design of the “TAPTalk” speech interface included interviews with subject matter experts, speech system selection, grammar development, and integration into TAP, which required only minor modification of existing software. Results from the two-week operational assessment with sixteen subjects from the Command and Control Training and Innovation Group, numbered Air Forces, Navy, and Marine Corps indicated that the Theater Air Planning process could be accomplished significantly faster with no increase in error rates. Subjectively, the sixteen planners unanimously agreed that the TAPTalk speech interface was a valuable addition to TAP and would recommend its inclusion in a future upgrade. Recommendations for further improving the TAPTalk system are discussed.

