A speech and character combined recognition engine for mobile devices

Author(s):  
Soo‐Young Suk ◽  
Hyun‐Yeol Chung

Purpose
The purpose of this paper is to describe a speech and character combined recognition engine (SCCRE) developed for working on personal digital assistants (PDAs) or on mobile devices. The architecture of a distributed recognition system for providing a more convenient user interface is also discussed.

Design/methodology/approach
In SCCRE, feature extraction for speech and for character is carried out separately, but recognition is performed in a single engine. The client recognition engine essentially employs a continuous hidden Markov model (CHMM) structure, and this CHMM structure uses a variable parameter topology in order to minimize the number of model parameters and to reduce recognition time. The model also adopts the proposed successive state and mixture splitting (SSMS) method for generating context-independent models. SSMS optimizes the number of mixtures through splitting in the mixture domain and the number of states through splitting in the time domain.

Findings
The recognition results show that the developed engine can reduce the total number of Gaussians by up to 40 per cent compared with fixed parameter models at the same recognition performance when applied to speech recognition for mobile devices. SSMS can reduce the memory size for models to 65 per cent and that for processing to 82 per cent. Moreover, the recognition time decreases by 17 per cent with the SSMS model while maintaining the recognition rate.

Originality/value
The proposed system will be very useful for many on-line multimodal interfaces such as PDAs and mobile applications.
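As a rough illustration of how SSMS might grow a model, the sketch below alternates splits in the mixture domain and the time domain for a toy one-dimensional Gaussian-mixture state sequence; the splitting criteria, the per-state mixture cap and the data structures are assumptions, not the paper's actual CHMM implementation.

```python
# Illustrative sketch of successive state and mixture splitting (SSMS),
# assuming a simplified 1-D Gaussian-mixture model; the paper's actual
# likelihood-based splitting criteria are only approximated here.
import numpy as np

def split_mixture(state):
    """Split the widest Gaussian in a state into two shifted copies."""
    i = int(np.argmax([g["var"] for g in state]))
    g = state.pop(i)
    d = np.sqrt(g["var"]) * 0.5
    for shift in (-d, +d):
        state.append({"mean": g["mean"] + shift,
                      "var": g["var"] / 2.0,
                      "weight": g["weight"] / 2.0})

def split_state(states, j):
    """Split state j in the time domain by duplicating it."""
    states.insert(j + 1, [dict(g) for g in states[j]])

def ssms(states, max_gaussians):
    """Grow the model until the Gaussian budget is reached,
    alternating mixture-domain and time-domain splits."""
    while sum(len(s) for s in states) < max_gaussians:
        # pick the state whose widest component has the largest variance
        j = int(np.argmax([max(g["var"] for g in s) for s in states]))
        if len(states[j]) < 4:          # heuristic cap per state (assumption)
            split_mixture(states[j])
        else:
            split_state(states, j)
    return states

# toy usage: start from a single-state, single-Gaussian model
model = [[{"mean": 0.0, "var": 1.0, "weight": 1.0}]]
model = ssms(model, max_gaussians=8)
print(len(model), "states,", sum(len(s) for s in model), "Gaussians")
```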

Circuit World ◽  
2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Neethu P.S. ◽  
Suguna R. ◽  
Palanivel Rajan S.

Purpose
This paper aims to propose a novel methodology for classifying gestures using the support vector machine (SVM) classification method. Initially, the red-green-blue color hand gesture image is converted into a YCbCr image in the preprocessing stage, and the palm-with-finger region is then segmented by a thresholding process. A distance transformation is then applied to the segmented palm-with-finger image. Further, the center point (centroid) of the palm region is detected, and the fingertips are detected using the SVM classification algorithm based on the detected centroids of the palm region.

Design/methodology/approach
A gesture is a physical indication of the body to convey information. Though any bodily movement can be considered a gesture, it generally originates from the movement of the hand or face or a combination of both. Combined gestures are quite complex and difficult for a machine to classify. This paper proposes a novel methodology for classifying such gestures with the SVM classification method: the color hand gesture image is converted into a YCbCr image, the palm-with-finger region is segmented by thresholding, a distance transformation is applied to the segmented image, the center point of the palm region is detected, and the fingertips are detected using the SVM classification algorithm. The proposed hand gesture image classification system is applied and tested on the "Jochen Triesch", "Sebastien Marcel" and "11Khands" hand gesture data sets to evaluate its efficiency. The performance of the proposed system is analyzed with respect to sensitivity, specificity, accuracy and recognition rate, and the simulation results on these data sets are compared with conventional methods.

Findings
A distance transform is used to detect the center point of the segmented palm region. The proposed hand gesture detection methodology achieves 96.5% sensitivity, 97.1% specificity, 96.9% accuracy and a 99.3% recognition rate on the "Jochen Triesch" data set; 94.6% sensitivity, 95.4% specificity, 95.3% accuracy and a 97.8% recognition rate on the "Sebastien Marcel" data set; and 97% sensitivity, 98% specificity, 98.1% accuracy and a 98.8% recognition rate on the "11Khands" data set. The recognition time is 0.52 s on "Jochen Triesch" images, 0.71 s on "Sebastien Marcel" images and 0.22 s on "11Khands" images. The methodology clearly requires the least recognition time on the "11Khands" data set, so this data set is very suitable for real-time hand gesture applications with multi-background environments.

Originality/value
The modern world requires more automated systems to carry out daily routine activities efficiently. Present-day technology offers touch-screen methods for operating many devices or machines, with or without wired connections. This also has an impact on automated vehicles, where the vehicle can be operated without direct interaction with the driver. This is possible through a hand gesture recognition system, which captures real-time hand gestures, the physical movement of the human hand, as digital images and recognizes them against a pre-stored set of hand gestures.
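A minimal sketch of the preprocessing chain described above (YCbCr conversion, thresholding of the palm-with-finger region, distance transformation and centroid detection), assuming OpenCV and a hypothetical input file; the skin-color bounds are illustrative, and the fingertip SVM would be trained separately on labelled centroid-relative features.

```python
# Hedged sketch: YCbCr conversion -> palm thresholding -> distance
# transform -> palm centroid.  The input file name and the skin-color
# bounds are assumptions; the SVM fingertip stage is not shown.
import cv2
import numpy as np

img = cv2.imread("gesture.png")                      # hypothetical input image
ycrcb = cv2.cvtColor(img, cv2.COLOR_BGR2YCrCb)       # OpenCV stores Y, Cr, Cb

# crude skin segmentation in the Cr/Cb plane (illustrative bounds only)
mask = cv2.inRange(ycrcb, (0, 133, 77), (255, 173, 127))
_, binary = cv2.threshold(mask, 127, 255, cv2.THRESH_BINARY)

# distance transform peaks near the palm center, far from any contour edge
dist = cv2.distanceTransform(binary, cv2.DIST_L2, 5)

# centroid of the segmented palm-with-finger region via image moments
m = cv2.moments(binary, binaryImage=True)
cx, cy = int(m["m10"] / m["m00"]), int(m["m01"] / m["m00"])
print("palm centroid:", (cx, cy), "max palm radius:", float(dist.max()))
```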


2014 ◽  
Vol 2 (2) ◽  
pp. 43-53 ◽  
Author(s):  
S. Rojathai ◽  
M. Venkatesulu

In speech word recognition systems, feature extraction and recognition play the most significant roles. Numerous feature extraction and recognition methods are available in existing speech word recognition systems. The most recent Tamil speech word recognition system has given high recognition performance with PAC-ANFIS compared with earlier Tamil speech word recognition systems. Hence, an investigation of speech word recognition with various recognition methods is needed to establish their relative performance. This paper presents such an investigation with two well-known artificial intelligence methods: the Feed Forward Back Propagation Neural Network (FFBNN) and the Adaptive Neuro Fuzzy Inference System (ANFIS). The performance of the Tamil speech word recognition system with PAC-FFBNN is analyzed in terms of statistical measures and Word Recognition Rate (WRR) and compared with PAC-ANFIS and other existing Tamil speech word recognition systems.
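As a small illustration of the FFBNN stage, the sketch below trains a feed-forward network with back-propagation on synthetic fixed-length feature vectors standing in for per-word acoustic features; the network size, feature dimensionality and data are assumptions, not the paper's PAC front end or Tamil corpus.

```python
# Minimal FFBNN sketch on synthetic stand-in data (not the Tamil corpus).
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 13))          # 200 utterances, 13-dim features (toy)
y = rng.integers(0, 5, size=200)        # 5 hypothetical word classes

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# a feed-forward network trained with back-propagation
ffbnn = MLPClassifier(hidden_layer_sizes=(32, 16), activation="logistic",
                      solver="adam", max_iter=500, random_state=0)
ffbnn.fit(X_tr, y_tr)
print("word recognition rate (WRR): %.2f" % ffbnn.score(X_te, y_te))
```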


2020 ◽  
Vol 39 (4) ◽  
pp. 5749-5760
Author(s):  
Yanfei Hai

The purpose of this paper is to use English-specific syllables and prosodic features in spoken speech data to carry out spoken English recognition, and to explore effective methods for the design and application of English speech detection and automatic recognition systems. The method proposed in this study is a combination of an SVM_FF-based classifier, an SVM_IER-based classifier and a syllable classifier. Compared with methods that combine other phonological characteristics, such as phonological rate, intensity, formant and energy statistics and pronunciation rate, and with syllable-based classifiers trained on specific syllables, a better recognition rate is obtained. In addition, this study conducts simulation experiments on the proposed English recognition and identification method based on specific syllables and prosodic features and analyzes the experimental results. The results show that the recognition performance of the spoken English recognition system constructed in this study is significantly better than that of the traditional model.
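The classifier combination can be pictured as a simple majority vote over several SVMs, as in the hedged sketch below; the feature groups, kernels and synthetic data are assumptions and do not reproduce the paper's SVM_FF, SVM_IER or syllable classifiers.

```python
# Hedged sketch of combining several SVM classifiers by majority voting,
# in the spirit of the SVM_FF + SVM_IER + syllable-classifier combination.
import numpy as np
from sklearn.svm import SVC
from sklearn.ensemble import VotingClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 20))          # prosodic + syllable features (toy)
y = rng.integers(0, 2, size=300)        # two hypothetical pronunciation classes

combined = VotingClassifier(
    estimators=[("svm_ff", SVC(kernel="rbf")),       # formant-feature SVM (assumed)
                ("svm_ier", SVC(kernel="rbf")),       # intensity/energy-rate SVM (assumed)
                ("syllable", SVC(kernel="linear"))],  # specific-syllable classifier (assumed)
    voting="hard")
combined.fit(X, y)
print("combined recognition rate:", combined.score(X, y))
```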


Author(s):  
Y. S. Huang ◽  
K. Liu ◽  
C. Y. Suen ◽  
Y. Y. Tang

This paper proposes a novel method which enables a Chinese character recognition system to obtain reliable recognition. In this method, two thresholds, i.e. a class region threshold Rk and a disambiguity threshold Ak, are used for each Chinese character k when the classifier is designed based on the nearest neighbor rule, where Rk defines the pattern distribution region of character k, and Ak prevents samples not belonging to character k from being ambiguously recognized as character k. A novel algorithm to derive the appropriate thresholds Ak and Rk is developed so that better recognition reliability can be obtained through iterative learning. Experiments performed on the ITRI printed Chinese character database have achieved highly reliable recognition performance (such as 0.999 reliability with a 95.14% recognition rate), which shows the feasibility and effectiveness of the proposed method.
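A minimal sketch of how such per-class thresholds might be applied at recognition time is shown below: a nearest-neighbor decision is accepted only if the distance to the winning class prototype is within Rk and the margin to the runner-up exceeds Ak. The prototypes and threshold values are illustrative; the paper's iterative learning of Ak and Rk is not shown.

```python
# Sketch of nearest-neighbor classification with per-class thresholds:
# Rk bounds how far a sample may lie from class k's prototype, and Ak
# rejects samples whose best and second-best distances are too close.
import numpy as np

prototypes = {"a": np.array([0.0, 0.0]),
              "b": np.array([3.0, 3.0])}
R = {"a": 1.5, "b": 1.5}          # class region thresholds (assumed values)
A = {"a": 0.5, "b": 0.5}          # disambiguity thresholds (assumed values)

def classify(x):
    dists = sorted((np.linalg.norm(x - p), k) for k, p in prototypes.items())
    (d1, k1), (d2, _) = dists[0], dists[1]
    if d1 > R[k1]:                # outside the class region -> reject
        return None
    if d2 - d1 < A[k1]:           # too close to another class -> ambiguous
        return None
    return k1

print(classify(np.array([0.2, 0.1])))   # -> 'a'
print(classify(np.array([1.5, 1.5])))   # -> None (rejected)
```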


Sensors ◽  
2020 ◽  
Vol 20 (18) ◽  
pp. 5185
Author(s):  
Yu Zhai ◽  
Jieyu Lei ◽  
Wenze Xia ◽  
Shaokun Han ◽  
Fei Liu ◽  
...  

This work introduces a super-resolution (SR) algorithm for range images on the basis of self-guided joint filtering (SGJF), adding the range information of the range image as a coefficient of the filter to reduce the influence of the intensity image texture on the super-resolved image. A range image SR recognition system is constructed to study the effect of four SR algorithms including the SGJF algorithm on the recognition of the laser radar (ladar) range image. The effects of different model library sizes, SR algorithms, SR factors and noise conditions on the recognition are tested via experiments. Results demonstrate that all tested SR algorithms can improve the recognition rate of low-resolution (low-res) range images to varying degrees and the proposed SGJF algorithm has a very good comprehensive recognition performance. Finally, suggestions for the use of SR algorithms in actual scene recognition are proposed on the basis of the experimental results.
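As a rough, heavily simplified sketch of the joint-filtering idea, the code below upsamples a range image while weighting each neighborhood by a spatial kernel, a guidance (intensity) kernel and a kernel on the range values themselves, the latter standing in for the self-guidance term; kernel widths, window size and the toy inputs are assumptions, not the SGJF algorithm's actual formulation.

```python
# Very simplified joint-filtering sketch for range-image upsampling;
# the extra range-value kernel plays the role of the self-guidance term.
import numpy as np

def self_guided_joint_filter(range_up, intensity, sigma_s=2.0,
                             sigma_i=0.1, sigma_r=0.1, radius=3):
    h, w = range_up.shape
    out = np.zeros_like(range_up)
    for y in range(h):
        for x in range(w):
            y0, y1 = max(0, y - radius), min(h, y + radius + 1)
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            yy, xx = np.mgrid[y0:y1, x0:x1]
            w_s = np.exp(-((yy - y) ** 2 + (xx - x) ** 2) / (2 * sigma_s ** 2))
            w_i = np.exp(-(intensity[y0:y1, x0:x1] - intensity[y, x]) ** 2
                         / (2 * sigma_i ** 2))
            w_r = np.exp(-(range_up[y0:y1, x0:x1] - range_up[y, x]) ** 2
                         / (2 * sigma_r ** 2))
            wgt = w_s * w_i * w_r
            out[y, x] = np.sum(wgt * range_up[y0:y1, x0:x1]) / np.sum(wgt)
    return out

# toy usage: a 16x16 intensity image guiding a nearest-upsampled range image
rng = np.random.default_rng(0)
intensity = rng.random((16, 16))
range_up = np.repeat(np.repeat(rng.random((8, 8)), 2, axis=0), 2, axis=1)
print(self_guided_joint_filter(range_up, intensity).shape)
```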


Author(s):  
PEI-YIH TING ◽  
CHIU-YU TSENG ◽  
LIN-SHAN LEE

In a long-term research project, the recognition of Mandarin speech for very large vocabulary and unlimited text is considered. Its first-stage goal is to recognize the Mandarin syllables. In a previous paper, an initial/final two-phase recognition approach to recognize these very confusing syllables was proposed, in which each syllable is divided into initial and final parts and recognized separately, and efficient recognition techniques for the finals were proposed and discussed. This paper serves as a continuation and proposes an efficient system to recognize the Mandarin initials. In this system, a classification procedure is first used to categorize the unknown initials into two groups, C1 and C2; different approaches are then separately applied and independently optimized to recognize C1 and C2. Finite State Vector Quantization (FSVQ) is found to be very useful; its two modified versions, the Modified FSVQ (MFSVQ) and the Second Order FSVQ (SOFSVQ), provide the best recognition performance for C1 and C2 when a design parameter called the characteristic interval is carefully adjusted. Experimental results show that a recognition rate of 94.1% to 94.7% can be achieved using this system. Such a design is accomplished by carefully considering the special characteristics of Mandarin syllables and initials.
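For orientation, the sketch below shows plain codebook-based vector quantization with classification by minimum distortion, which is the building block underneath FSVQ; the finite-state transitions, the MFSVQ/SOFSVQ modifications and the characteristic-interval tuning are not modelled, and the toy data are assumptions.

```python
# Hedged sketch of VQ-based recognition: one codebook per class,
# classification by minimum mean quantization distortion.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
train = {"b": rng.normal(0.0, 1.0, size=(100, 12)),   # toy initial class "b"
         "p": rng.normal(2.0, 1.0, size=(100, 12))}   # toy initial class "p"

codebooks = {k: KMeans(n_clusters=8, n_init=10, random_state=0).fit(v)
             for k, v in train.items()}

def recognise(frames):
    """Pick the class whose codebook gives the lowest mean distortion."""
    def distortion(cb):
        centers = cb.cluster_centers_
        d = np.linalg.norm(frames[:, None, :] - centers[None, :, :], axis=2)
        return d.min(axis=1).mean()
    return min(codebooks, key=lambda k: distortion(codebooks[k]))

print(recognise(rng.normal(2.0, 1.0, size=(30, 12))))   # expected: 'p'
```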


2016 ◽  
Vol 2016 ◽  
pp. 1-15 ◽  
Author(s):  
Yibing Li ◽  
Jie Chen ◽  
Fang Ye ◽  
Dandan Liu

Automatic target recognition (ATR) systems have broad application prospects in the military field, especially in modern defense technology. When paradoxes exist in an ATR system due to an adverse battlefield environment, integration cannot be carried out effectively and reliably by traditional DS (Dempster-Shafer) evidence theory alone. In this paper, a modified DS evidence theory is presented and applied in an IR/MMW target recognition system. The improvement of DS evidence theory consists of three parts: the introduction of sensor priority and evidence credibility to realize discount processing of the evidence, the modification of the DS combination rule to enhance the accuracy of the synthesis results, and a compound decision-making rule. The application of the modified algorithm in the IR/MMW system is designed to deal with paradoxes, improve the target recognition rate and ensure the reliability of the target recognition system. Experiments illustrate that introducing the modified DS evidence theory in the IR/MMW system realizes more satisfactory target recognition performance through multisensor information fusion than any single-mode system.
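The discounting and combination steps can be illustrated with the classical Dempster-Shafer operations, as in the sketch below; the frame of discernment, mass values and reliabilities are toy assumptions, and the paper's modified combination rule and compound decision-making rule are not reproduced.

```python
# Minimal sketch of evidence discounting followed by Dempster's rule of
# combination; all values below are illustrative, not the paper's data.
THETA = frozenset({"target", "decoy"})        # toy frame of discernment

def discount(m, alpha):
    """Discount a mass function by reliability alpha; the removed mass
    is transferred to total ignorance (the whole frame THETA)."""
    out = {A: alpha * v for A, v in m.items() if A != THETA}
    out[THETA] = alpha * m.get(THETA, 0.0) + (1.0 - alpha)
    return out

def combine(m1, m2):
    """Dempster's rule: conjunctive combination with conflict normalization."""
    combined, conflict = {}, 0.0
    for A, v1 in m1.items():
        for B, v2 in m2.items():
            C = A & B
            if C:
                combined[C] = combined.get(C, 0.0) + v1 * v2
            else:
                conflict += v1 * v2
    return {A: v / (1.0 - conflict) for A, v in combined.items()}

m_ir  = discount({frozenset({"target"}): 0.8, THETA: 0.2}, alpha=0.9)   # IR sensor (toy)
m_mmw = discount({frozenset({"target"}): 0.6, THETA: 0.4}, alpha=0.7)   # MMW sensor (toy)
print(combine(m_ir, m_mmw))
```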


Author(s):  
Brendan McCane ◽  
Terry Caelli ◽  
Olivier de Vel

In this paper we further explore the use of machine learning (ML) for the recognition of 3D objects in isolation or embedded in scenes. Of particular interest is the use of a recent ML technique (specifically CRG, Conditional Rule Generation) which generates descriptions of objects in terms of object parts and part-relational attribute bounds. We show how this technique can be combined with intensity-based model- and scene-views to locate objects and their pose. The major contributions of this paper are: the extension of the CRG classifier to incorporate fuzzy decisions (FCRG), the application of the FCRG classifier to the problem of learning 3D objects from 2D intensity images, the study of the usefulness of sparse depth data with respect to recognition performance, and the implementation of a complete object recognition system that does not rely on perfect or synthetic data. We report a recognition rate of 80% for unseen single-object scenes in a database of 18 non-trivial objects.
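As a loose illustration (not the CRG/FCRG algorithm itself) of how crisp attribute-bound rules can be given fuzzy decisions, the sketch below scores a part-attribute sample against a rule with trapezoidal memberships and takes the minimum over attributes; the rule, attributes and margin are hypothetical.

```python
# Illustrative fuzzification of attribute-bound rules: each attribute gets
# a trapezoidal membership over its learned bounds, and a rule's overall
# support is the minimum membership over its attributes.
def trapezoid(x, lo, hi, margin):
    """1 inside [lo, hi], falling linearly to 0 within `margin` outside."""
    if lo <= x <= hi:
        return 1.0
    if x < lo:
        return max(0.0, 1.0 - (lo - x) / margin)
    return max(0.0, 1.0 - (x - hi) / margin)

def rule_support(sample, bounds, margin=0.2):
    """Fuzzy support of one part-relational rule for a feature sample."""
    return min(trapezoid(sample[a], lo, hi, margin)
               for a, (lo, hi) in bounds.items())

# hypothetical rule learned for one object part: area and elongation bounds
rule = {"area": (0.4, 0.7), "elongation": (1.0, 2.5)}
print(rule_support({"area": 0.65, "elongation": 2.4}, rule))   # inside -> 1.0
print(rule_support({"area": 0.75, "elongation": 2.4}, rule))   # slightly out -> 0.75
```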


2020 ◽  
Vol 13 (4) ◽  
pp. 527-543
Author(s):  
Wenjuan Shen ◽  
Xiaoling Li

Purpose
In recent years, facial expression recognition has been widely used in human-machine interaction, clinical medicine and safe driving. However, there is a limitation that conventional recurrent neural networks can only learn the time-series characteristics of expressions based on one-way propagation of information.

Design/methodology/approach
To solve this limitation, this paper proposes a novel model based on bidirectional gated recurrent unit networks (Bi-GRUs) with two-way propagation, and the theory of identity-mapping residuals is adopted to effectively prevent the problem of gradient disappearance caused by the depth of the introduced network. Since the Inception-V3 network model for spatial feature extraction has too many parameters, it is prone to overfitting during training. This paper therefore proposes a novel facial expression recognition model that adds two reduction modules to reduce parameters, so as to obtain an Inception-W network with better generalization.

Findings
The proposed model is pretrained to determine the best settings and selections. Then, the pretrained model is evaluated on two facial expression data sets, CK+ and Oulu-CASIA, and the recognition performance and efficiency are compared with existing methods. The highest recognition rate is 99.6%, which shows that the method has good recognition accuracy in a certain range.

Originality/value
By using the proposed model for facial expression applications, the high recognition accuracy and robust recognition results with lower time consumption will help to build more sophisticated applications in the real world.
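A rough sketch of the recurrent part of such a model is given below: a bidirectional GRU over per-frame feature vectors with an identity-mapping residual connection around the recurrent block; the layer sizes, the number of expression classes and the CNN front end (the Inception-W network in the paper) are assumptions.

```python
# Rough sketch of a Bi-GRU classifier with an identity-mapping residual
# connection; dimensions and the 7 expression classes are assumptions.
import torch
import torch.nn as nn

class ResidualBiGRU(nn.Module):
    def __init__(self, feat_dim=256, hidden=128, n_classes=7):
        super().__init__()
        self.bigru = nn.GRU(feat_dim, hidden, batch_first=True,
                            bidirectional=True)
        self.proj = nn.Linear(2 * hidden, feat_dim)   # match dims for residual
        self.head = nn.Linear(feat_dim, n_classes)

    def forward(self, x):                 # x: (batch, frames, feat_dim)
        h, _ = self.bigru(x)
        h = self.proj(h) + x              # identity-mapping residual
        return self.head(h.mean(dim=1))   # average over frames, then classify

model = ResidualBiGRU()
dummy = torch.randn(4, 10, 256)           # 4 clips, 10 frames of CNN features
print(model(dummy).shape)                  # -> torch.Size([4, 7])
```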


2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Haixia Yang ◽  
Zhaohui Ji ◽  
Jun Sun ◽  
Fanan Xing ◽  
Yixian Shen ◽  
...  

Human gestures have been considered one of the important human-computer interaction modes. With the fast development of wireless technology in the urban Internet of Things (IoT) environment, Wi-Fi can not only provide high-speed network communication but also has great development potential in the field of environmental perception. This paper proposes a gesture recognition system based on the channel state information (CSI) within the physical layer of Wi-Fi transmission. To solve the problems of noise interference and phase offset in the CSI, we adopt a model based on the CSI quotient. Then, the amplitude and phase curves of the CSI are smoothed using a Savitzky-Golay filter, and a one-dimensional convolutional neural network (1D-CNN) is used to extract the gesture features. A support vector machine (SVM) classifier is then adopted to recognize the gestures. The experimental results show that our system can achieve a recognition rate of about 90% for three common gestures: pushing forward, left stroke and waving. Meanwhile, the effects of different human orientations and model parameters on the recognition results are analyzed as well.
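A hedged sketch of the signal path is shown below: a CSI quotient between two receive antennas to suppress phase offsets, Savitzky-Golay smoothing of the amplitude, a small 1D-CNN used as a feature extractor and an SVM on the extracted features. The array shapes, filter settings and the untrained network are assumptions, not the paper's configuration.

```python
# Hedged sketch of the CSI-quotient -> smoothing -> 1D-CNN -> SVM pipeline
# on synthetic data; all shapes and settings are assumptions.
import numpy as np
import torch
import torch.nn as nn
from scipy.signal import savgol_filter
from sklearn.svm import SVC

rng = np.random.default_rng(0)
csi_ant1 = rng.normal(size=(100, 256)) + 1j * rng.normal(size=(100, 256))
csi_ant2 = rng.normal(size=(100, 256)) + 1j * rng.normal(size=(100, 256))
labels = rng.integers(0, 3, size=100)          # push / left stroke / wave (toy)

quotient = csi_ant1 / csi_ant2                  # CSI quotient cancels offsets
amplitude = savgol_filter(np.abs(quotient), window_length=11, polyorder=3,
                          axis=1)               # Savitzky-Golay smoothing

cnn = nn.Sequential(                            # untrained 1-D CNN feature extractor
    nn.Conv1d(1, 8, kernel_size=5, padding=2), nn.ReLU(),
    nn.AdaptiveAvgPool1d(16), nn.Flatten())
with torch.no_grad():
    feats = cnn(torch.tensor(amplitude, dtype=torch.float32).unsqueeze(1)).numpy()

svm = SVC(kernel="rbf").fit(feats, labels)      # gesture classifier
print("training accuracy:", svm.score(feats, labels))
```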

