speaker classification
Recently Published Documents



Author(s):  
Yanxia Zhao ◽  
Wei Ren ◽  
Zheng Li

Existing accent marking algorithms for English conversion systems cannot acquire the morphological rules of English, which makes accent marking inaccurate, inefficient, and time-consuming. To solve these problems, this paper proposes an accent marking algorithm for English conversion systems based on morphological rules. Specifically, the English audio recordings in a self-developed English corpus were classified using speaker classification software based on a hidden Markov model, together with audio classification technology, to produce the morphological rules of English. The English accents were then marked by the maximum entropy model in the English conversion system. Experiments showed the proposed method to be accurate and efficient in accent marking. The results provide a reference for marking accents in English conversion systems.


2020 ◽  
Vol 64 (4) ◽  
pp. 40404-1-40404-16
Author(s):  
I.-J. Ding ◽  
C.-M. Ruan

With rapid developments in Internet of Things techniques, smart service applications such as voice-command-based speech recognition and smart care applications such as context-aware emotion recognition will gain attention and may become a requirement in smart home or office environments. In such intelligent applications, recognizing the identity of a specific member in an indoor space is a crucial issue. In this study, a combined audio-visual identity recognition approach was developed. In this approach, visual information obtained from face detection was incorporated into acoustic Gaussian likelihood calculations to construct speaker classification trees that significantly enhance the Gaussian mixture model (GMM)-based speaker recognition method. This study considered the privacy of the monitored person and reduced the degree of surveillance. Moreover, the popular Kinect sensor device, which contains a microphone array, was adopted to obtain acoustic voice data from the person. The proposed approach deploys only two cameras in a specific indoor space to perform face detection conveniently and to quickly determine the total number of people in the space. This people count was used to regulate the design of the GMM speaker classification tree. Two face-detection-regulated speaker classification tree schemes are presented for the GMM speaker recognition method: the binary speaker classification tree (GMM-BT) and the non-binary speaker classification tree (GMM-NBT). The proposed GMM-BT and GMM-NBT methods achieve identity recognition rates of 84.28% and 83%, respectively; both are higher than the rate of the conventional GMM approach (80.5%). Moreover, because the extremely complex calculations of face recognition used in general audio-visual speaker recognition tasks are not required, the proposed approach is rapid and efficient, adding only 0.051 s to the average recognition time.
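
As an illustration of the conventional GMM baseline mentioned in this abstract, the following minimal sketch scores an utterance under each enrolled speaker's diagonal-covariance GMM and picks the speaker with the highest average log-likelihood. It does not reproduce the GMM-BT or GMM-NBT tree designs or the face-detection step, which the abstract does not specify. The speaker names, model parameters, and feature vectors are invented toy values, and the function names are hypothetical.

```python
import math

def diag_gaussian_logpdf(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_frame_loglik(x, gmm):
    """Log-likelihood of one frame under a GMM given as (weight, mean, var) tuples."""
    terms = [math.log(w) + diag_gaussian_logpdf(x, m, v) for w, m, v in gmm]
    peak = max(terms)
    # log-sum-exp keeps the mixture sum numerically stable
    return peak + math.log(sum(math.exp(t - peak) for t in terms))

def identify_speaker(frames, speaker_gmms):
    """Return (best speaker, scores), scoring by average per-frame log-likelihood."""
    scores = {
        name: sum(gmm_frame_loglik(x, gmm) for x in frames) / len(frames)
        for name, gmm in speaker_gmms.items()
    }
    return max(scores, key=scores.get), scores

# Toy two-component models (assumed values, not from the paper).
SPEAKER_GMMS = {
    "speaker_a": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [1.0, 1.0], [1.0, 1.0])],
    "speaker_b": [(0.6, [4.0, 4.0], [1.0, 1.0]), (0.4, [5.0, 3.0], [1.0, 1.0])],
}

# Toy feature frames lying close to speaker_a's components.
frames = [[0.2, -0.1], [0.9, 1.1], [0.4, 0.6]]
best, scores = identify_speaker(frames, SPEAKER_GMMS)
print(best, scores)
```

In a real system the frames would be acoustic features extracted from speech and the GMM parameters would be trained per speaker; both are stand-ins here.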


Author(s):  
Rupinderdeep Kaur ◽  
R. K. Sharma ◽  
Parteek Kumar

Speech is the most natural means of communication between humans. Human beings start speaking without any tool or explicit education; the environment surrounding them helps them learn the art of speaking. The literature shows that existing speaker classification techniques suffer from over-fitting and parameter tuning issues, and that efficient tuning of machine learning techniques can improve speaker classification accuracy. To overcome these issues, this paper proposes an efficient particle swarm optimization-based support vector machine. The proposed technique and competing speaker classification techniques are tested on speaker classification data from Punjabi speakers. The comparative analysis shows that the proposed technique outperforms existing techniques in terms of accuracy, F-measure, specificity, and sensitivity.
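
The tuning idea in this abstract can be sketched with a basic particle swarm optimizer. This is a generic textbook PSO, not the authors' implementation. The abstract does not say which SVM hyperparameters are tuned, so the two search dimensions (a log-scaled penalty and a log-scaled kernel width) are assumptions, and `stand_in_validation_error` is a hypothetical smooth function used in place of a real cross-validated SVM error.

```python
import random

def pso_minimize(objective, bounds, num_particles=20, iterations=100,
                 inertia=0.7, cognitive=1.5, social=1.5, seed=0):
    """Minimize objective over box bounds with a basic global-best PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(num_particles)]
    vel = [[0.0] * dim for _ in range(num_particles)]
    best_pos = [p[:] for p in pos]              # each particle's personal best
    best_val = [objective(p) for p in pos]
    g = min(range(num_particles), key=best_val.__getitem__)
    gbest_pos, gbest_val = best_pos[g][:], best_val[g]

    for _ in range(iterations):
        for i in range(num_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                # Velocity blends momentum, pull to personal best, pull to global best.
                vel[i][d] = (inertia * vel[i][d]
                             + cognitive * r1 * (best_pos[i][d] - pos[i][d])
                             + social * r2 * (gbest_pos[d] - pos[i][d]))
                # Clamp the new position to the search box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest_pos, gbest_val = pos[i][:], val
    return gbest_pos, gbest_val

def stand_in_validation_error(params):
    """Hypothetical stand-in for SVM validation error; minimum at (1.0, -2.0)."""
    log_c, log_gamma = params
    return (log_c - 1.0) ** 2 + (log_gamma + 2.0) ** 2

# Assumed search ranges for the two log-scaled hyperparameters.
best_params, best_error = pso_minimize(stand_in_validation_error,
                                       bounds=[(-3.0, 3.0), (-5.0, 1.0)])
print(best_params, best_error)
```

To tune an actual classifier, the stand-in objective would be replaced by a function that trains the SVM with the candidate hyperparameters and returns its validation error.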

