Speaker Adaptation for Lip-Reading Using Visual Identity Vectors

Author(s):  
Pujitha Appan Kandala ◽  
Abhinav Thanda ◽  
Dilip Kumar Margam ◽  
Rohith Chandrashekar Aralikatti ◽  
Tanay Sharma ◽  
...  
2020 ◽  
Vol 64 (4) ◽  
pp. 40404-1-40404-16
Author(s):  
I.-J. Ding ◽  
C.-M. Ruan

Abstract: With the rapid development of internet-of-things techniques, smart service applications such as voice-command-based speech recognition, and smart care applications such as context-aware emotion recognition, are expected to gain attention and may become a requirement in smart home or office environments. In such intelligent applications, recognizing the identity of a specific member in an indoor space is a crucial issue. This study develops a combined audio-visual identity recognition approach. Visual information obtained from face detection is incorporated into the acoustic Gaussian likelihood calculations used to construct speaker classification trees, which enhances the Gaussian mixture model (GMM)-based speaker recognition method. The study takes the privacy of the monitored person into account and reduces the degree of surveillance. The Kinect sensor device, which contains a microphone array, is used to acquire acoustic voice data from the person. The approach deploys only two cameras in the indoor space to perform face detection and quickly determine the total number of people present. This head count is then used to regulate the design of the GMM speaker classification tree. Two face-detection-regulated speaker classification tree schemes are presented for GMM speaker recognition: the binary speaker classification tree (GMM-BT) and the non-binary speaker classification tree (GMM-NBT). The GMM-BT and GMM-NBT methods achieve identity recognition rates of 84.28% and 83%, respectively, both higher than the rate of the conventional GMM approach (80.5%). Moreover, because the complex face recognition computations typical of audio-visual speaker recognition are not required, the approach is efficient, adding only 0.051 s to the average recognition time.
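For orientation, the conventional GMM baseline that the abstract compares against can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes diagonal-covariance GMMs, and the names `gmm_log_likelihood`, `identify_speaker`, and the optional `candidates` argument are hypothetical. The abstract does not describe how the GMM-BT or GMM-NBT trees are built, so no tree construction is attempted here.

```python
import numpy as np

def gmm_log_likelihood(frames, weights, means, variances):
    """Total log-likelihood of feature frames (T, D) under a
    diagonal-covariance GMM with K components (assumed model form)."""
    frames = np.asarray(frames, dtype=float)
    weights = np.asarray(weights, dtype=float)      # shape (K,)
    means = np.asarray(means, dtype=float)          # shape (K, D)
    variances = np.asarray(variances, dtype=float)  # shape (K, D)

    diff = frames[:, None, :] - means[None, :, :]   # shape (T, K, D)
    log_norm = -0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)
    log_comp = log_norm[None, :] - 0.5 * np.sum(diff ** 2 / variances[None, :, :], axis=2)
    log_weighted = np.log(weights)[None, :] + log_comp  # shape (T, K)

    # Log-sum-exp over components, computed stably per frame.
    peak = log_weighted.max(axis=1, keepdims=True)
    per_frame = peak[:, 0] + np.log(np.exp(log_weighted - peak).sum(axis=1))
    return float(per_frame.sum())

def identify_speaker(frames, speaker_models, candidates=None):
    """Return the best-scoring speaker and all scores.

    speaker_models maps a speaker name to (weights, means, variances).
    candidates is a hypothetical hook for restricting the search,
    for example to a subset suggested by external information.
    """
    names = list(speaker_models) if candidates is None else list(candidates)
    scores = {n: gmm_log_likelihood(frames, *speaker_models[n]) for n in names}
    return max(scores, key=scores.get), scores
```

Each enrolled speaker has one GMM, and the utterance is assigned to the speaker whose model gives the highest summed log-likelihood over all frames. The face-detection-regulated schemes in the abstract change how this comparison is organized into a classification tree, which the sketch leaves out.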


Author(s):  
Jan Borowicz

The author examines body politics in Nazi cinema and propaganda movies (medical short films and materials filmed in the Polish Ghettos) in terms of constructing the visual identity of a nation in opposition to the allegedly non-normative bodies of Jews and mentally ill persons. The author connects the visual material with notions of biopolitics (Foucault, Agamben, Esposito).


Author(s):  
Lin Han ◽  
Lu Han

With the rapid development of China’s market economy, brand image is becoming increasingly important for an enterprise seeking to enhance its market competitiveness and secure a favorable market share. However, the brand image of many established companies gradually loses its appeal as society develops and people’s aesthetic expectations rise, which forces these companies to change their corporate brand image to regain the favor of the market. Against this background, this article draws on the concepts of fuzzy theory and, from the perspective of visual identity design, explores the development of an intelligent visual identity system for corporate brand image. The aim is to design a visual identity system that differs from those of competitors, in order to shape a distinctive brand image for the enterprise and improve its market competitiveness. The article first gathers a large amount of information through a literature survey and gives a systematic introduction to fuzzy theory, visual recognition technology, and the theoretical concepts of brand image, laying the theoretical foundation for the later discussion of applying fuzzy theory to the design of the intelligent brand image visual recognition system. It then describes the fuzzy theory algorithm in detail, proposes a fuzzy neural network, applies it to the design of the system, and carries out a design experiment. Finally, the system is tested on the specific case of the KFC brand logo, where it achieves an overall accuracy of 96.08%. Color recognition has the highest accuracy, at 96.62%; comparing the output values of the training and test samples shows that the color network converges best; and a comparison test against a BP neural network shows that the fuzzy neural network achieves the better recognition performance.

