Speech2Talking-Face: Inferring and Driving a Face with Synchronized Audio-Visual Representation

Author(s):  
Yasheng Sun ◽  
Hang Zhou ◽  
Ziwei Liu ◽  
Hideki Koike

What can we picture solely from a clip of speech? Previous research has shown the possibility of directly inferring the appearance of a person's face by listening to a voice. However, human speech carries not only a biometric identity signal but also identity-irrelevant information such as the talking content. Our goal is to extract as much information from a clip of speech as possible. In particular, we aim not only to infer the face of a person but also to animate it. Our key insight is to synchronize audio and visual representations from two perspectives in a style-based generative framework. Specifically, contrastive learning is leveraged to map both the identity and the speech content information within the speech to visual representation spaces. Furthermore, the identity space is strengthened with class centroids. Through curriculum learning, the style-based generator is capable of automatically balancing the information from the two latent spaces. Extensive experiments show that our approach encourages better speech-identity correlation learning while generating vivid faces whose identities are consistent with given speech samples. Moreover, by leveraging the same model, these inferred faces can be driven to talk by the audio.
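The contrastive step mentioned in this abstract can be illustrated with a generic InfoNCE-style loss. This is a minimal sketch and not the authors' implementation: the function names, the temperature value, and the toy two-dimensional embeddings are all assumptions made for illustration. It shows only the general idea that matched audio and visual embeddings are pulled together while mismatched pairs in the batch are pushed apart.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def info_nce(audio_embs, visual_embs, temperature=0.1):
    """Mean InfoNCE loss over a batch (hypothetical helper).

    audio_embs[i] and visual_embs[i] are assumed to come from the same clip;
    every other visual embedding in the batch acts as a negative.
    """
    audio = [l2_normalize(v) for v in audio_embs]
    visual = [l2_normalize(v) for v in visual_embs]
    total = 0.0
    for i, a in enumerate(audio):
        # Temperature-scaled cosine similarity to every visual embedding.
        logits = [sum(x * y for x, y in zip(a, v)) / temperature for v in visual]
        # Numerically stable log-sum-exp for the softmax denominator.
        peak = max(logits)
        log_denom = peak + math.log(sum(math.exp(l - peak) for l in logits))
        # Cross-entropy with the matched pair as the correct class.
        total += log_denom - logits[i]
    return total / len(audio)


# Toy batch: two clips with perfectly aligned vs. swapped embeddings.
aligned_loss = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped_loss = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

In this toy batch the aligned pairing gives a loss close to zero and the swapped pairing gives a much larger one, which is the signal a contrastive objective uses to bring the two representation spaces into correspondence.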

2013 ◽  
Vol 17 (5-6) ◽  
pp. 525-559 ◽  
Author(s):  
Opher Mansour

This article examines the progress of a series of ambassadorial visits to Rome by emissaries from the Kongo, Japan, and Safavid Persia as they unfolded over the reign of Pope Paul V. Close attention is paid to the visual representation of the ambassadors, and of their actions, in engravings and in the decoration of the Quirinal Palace. The author argues that the public aspects of diplomacy, and of the visual representations based on it, played a significant role in articulating the Papacy’s missionary ambitions and sense of its global position. Furthermore, it is argued that the diplomatic and courtly practices of the papal court played a significant role in mediating the representation of “other” cultures in early modern Europe.


2018 ◽  
Author(s):  
Dave F Kleinschmidt

One of the persistent puzzles in understanding human speech perception is how listeners cope with talker variability. One thing that might help listeners is structure in talker variability: rather than varying randomly, talkers of the same gender, dialect, age, etc. tend to produce language in similar ways. Sociolinguistic research has shown that listeners are sensitive to this covariation between linguistic variation and socio-indexical variables. In this paper I present new techniques based on ideal observer models to quantify 1) the amount and type of structure in talker variation, and 2) how useful such structure can be for robust speech recognition in the face of talker variability. I demonstrate these techniques in two phonetic domains (word-initial stop voicing and vowel identity) and show that these domains have different amounts and types of talker variability, consistent with previous, impressionistic findings. An `R` package accompanies this paper, enabling researchers to apply these techniques to their own data.
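To make the notion of an ideal observer concrete, the sketch below classifies a voice onset time (VOT) value as /b/ or /p/ with Bayes' rule under Gaussian cue distributions. It is an illustration only and is unrelated to the accompanying `R` package: the two talkers, their means and standard deviations, and the equal category priors are all hypothetical values chosen for the example.

```python
import math


def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and SD at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior_voiceless(vot_ms, talker):
    """Posterior probability of /p/ given a VOT, assuming equal priors."""
    like_b = gaussian_pdf(vot_ms, *talker["b"])
    like_p = gaussian_pdf(vot_ms, *talker["p"])
    return like_p / (like_b + like_p)


# Hypothetical talkers: (mean VOT in ms, SD in ms) for each category.
TALKER_A = {"b": (0.0, 10.0), "p": (50.0, 15.0)}
TALKER_B = {"b": (15.0, 10.0), "p": (80.0, 15.0)}

# The same 30 ms token is interpreted differently for each talker.
p_given_a = posterior_voiceless(30.0, TALKER_A)
p_given_b = posterior_voiceless(30.0, TALKER_B)
```

With these made-up parameters, the same 30 ms token is most likely /p/ for the first talker and most likely /b/ for the second, which is why knowing how talkers' cue distributions differ can matter for recognition.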


2019 ◽  
Vol 35 (12) ◽  
Author(s):  
Mayara Sanay da Silva Oliveira ◽  
Mark Anthony Arceño ◽  
Priscila de Morais Sato ◽  
Fernanda Baeza Scagliusi

Visual representations of food-based dietary guidelines (FBDG) express diverse dietary and sociocultural norms, especially as they relate to healthy eating habits. This article investigates government recommendations for healthy eating habits expressed in the visual representation of Latin American FBDGs. Drawing on 15 images published between 1991 and 2017, we conducted an anthropological visual analysis guided by the methodology proposed by James Collier and Malcolm Collier: unstructured analyses, open viewing analyses, structured analyses and microanalyses. Here, we explore government recommendations based on visual representation shapes, food classification systems, lifestyle recommendations and embedded sociocultural elements. Our main findings relate to how dietary and sociocultural norms are used to promote eating practices considered healthy. Dietary norms focus on variety, proportionality, and moderation, as expressed in terms of food classification and food standards considered healthy. Sociocultural norms are referenced by the use of cultural symbols as strategies to promote traditional foods, cooking practices, commensality, water consumption and physical activity. Ultimately, we argue that FBDG visual representations contain embedded messages that counsel individuals to plan, buy, prepare and consume food with family; to consume foods considered healthy; to pay full attention to their meals, without distractions, such as television and cell phones; and to celebrate traditional, local and/or native foods and culinary preparations.


Literator ◽  
2015 ◽  
Vol 36 (1) ◽  
Author(s):  
M.R. Masubelele

People have an inherent need to communicate. They communicate out of need as well as for leisure. Human speech abounds with unpleasant and undesirable statements that could embarrass and even humiliate those spoken to or oneself. Brown and Levinson assert that unpleasant and undesirable statements have the potential to threaten the ‘face’ or self-esteem of the other person or persons. They define ‘face’ as the public self-image that every member of society wants to claim for themself. Simply put, ‘facework’ refers to ways people cooperatively attempt to promote both the other’s and their own sense of self-esteem in a conversation. As linguistic speech forms, idioms perform a variety of functions in a language. Not only do they make speech more colourful, but they also perform a communicative function in that they tend to soften the embarrassment and humiliation that often accompany unpleasant and undesirable statements in speech. IsiZulu idioms will be examined in this article to establish to what extent they could contribute to managing ‘face’ issues. Examples of idioms will be drawn from C.L.S. Nyembezi and O.E.H. Nxumalo’s work Inqolobane Yesizwe. The facework theory as espoused by Brown and Levinson will underpin this discussion on isiZulu idioms.


2011 ◽  
Vol 41 (4) ◽  
pp. 365-404 ◽  
Author(s):  
Isabelle Charmantier

The Swedish naturalist Carl Linnaeus (1707–1778) is reputed to have transformed botanical practice by shunning the process of illustrating plants and relying on the primacy of literary descriptions of plant specimens. Botanists and historians have long debated Linnaeus's capacities as a draftsman. While some of his detailed sketches of plants and insects reveal a sure hand, his more general drawings of landscapes and people seem ill-executed. The overwhelming consensus, based mostly on his Lapland diary (1732), is that Linnaeus could not draw. Little has been said, however, on the role of drawing and other visual representations in Linnaeus's daily work as seen in his other numerous manuscripts. These manuscripts, held mostly at the Linnean Society of London, are peppered with sketches, maps, tables, and diagrams. Reassessing these manuscripts, along with the printed works that also contain illustrations of plant species, shows that Linnaeus's thinking was profoundly visual and that he routinely used visual representational devices in his various publications. This paper aims to explore the full range of visual representations Linnaeus used through his working life, and to reevaluate the epistemological value of visualization in the making of natural knowledge. By analyzing Linnaeus's use of drawings, maps, tables, and diagrams, I will show that he did not, as has been asserted, reduce the discipline of botany to text, and that his visual thinking played a fundamental role in his construction of new systems of classification.


2002 ◽  
Vol 32 (4) ◽  
pp. 403-421 ◽  
Author(s):  
Steven Kaplan

The prevailing image of Zär'a Ya'eqob has tended to emphasize the intellectual at the expense of the experiential and political power at the expense of religious power. It is to these relatively neglected aspects of religious life that this article is devoted. It is our purpose here to emphasize the importance of the Cross, the image of the Virgin, the construction of churches and other visual aspects of religious life in Zär'a Ya'eqob's Ethiopia. No other Ethiopian ruler confronted the religious challenges presented by a divided Church and a largely unChristianized empire as systematically and as successfully as Zär'a Ya'eqob. Moreover, he was as sensitive to the daily unspoken truths of religious life as he was to great theological debates and controversies. He understood power in all its manifestations and sought to protect his state, his church, and his people with every means at his disposal. By promoting devotion to both the Cross and the Virgin Mary, he built on the foundations prepared by his parents, especially his father Dawit. He also mobilized Christian symbols which transcended local rivalries and regional loyalties. These symbols, as well as the churches he built, were also particularly suited to visual representation and hence comparatively easy to propagate among Ethiopia's largely illiterate population. They were, moreover, effective instruments of divine power, which brought home not only the message of Christianity's truth, but also its efficacy in the face of the numerous threats that Christians faced on a daily basis.


2019 ◽  
Author(s):  
Remington Mallett ◽  
Anurima Mummaneni ◽  
Jarrod Lewis-Peacock

Working memory persists in the face of distraction, yet not without consequence. Previous research has shown that memory for low-level visual features is systematically influenced by the maintenance or presentation of a similar distractor stimulus. Responses are frequently biased in stimulus space towards a perceptual distractor, though this has yet to be determined for high-level stimuli. We investigated whether these influences are shared for complex visual stimuli such as faces. To quantify response accuracies for these stimuli, we used a delayed-estimation task with a computer-generated “face space” consisting of eighty faces that varied continuously as a function of age and sex. In a set of three experiments, we found that responses for a target face held in working memory were biased towards a distractor face presented during the maintenance period. The amount of response bias did not vary as a function of distance between target and distractor. Our data suggest that, similar to low-level visual features, high-level face representations in working memory are biased by the processing of related but task-irrelevant information.


Author(s):  
Stacy Costa

Mathematical understanding goes beyond grasping numerical values and problem solving. By incorporating visual representation, students can grasp how math can be understood in terms of geometry, which is essentially a visual device. It is important that students be able to incorporate visual representations alongside numerical values to gain meaning from their own knowledge. However, it is also vital that students understand mathematical terminology, via a dialogical-rhetorical pedagogy that now comes under the rubric of “Math Talk,” which in turn is part of a system of teaching known as knowledge building, both of which aim to recapture, in a new way, the Socratic method of dialogical interaction. This chapter explores how knowledge building, as a methodology, can assist in furthering student understanding and how math talk leads to a deeper understanding of mathematical principles.


2015 ◽  
Vol 75 (3) ◽  
Author(s):  
Rozita Ismail ◽  
Azizah Jaafar

The use of information and communication technology in learning is an essential part of the National Curriculum in Malaysia. Dyslexic children exhibit different skills and motivations. They dislike reading and find reading to be a painful activity. This paper presents an exploration of the visual representation used by dyslexic children in learning Bahasa Melayu. Visual representations, such as the use of icons and images, in interface design are remembered easily and enhance the level of concentration and experience in learning. The experiment used an educational multimedia courseware to investigate how dyslexic children react towards the use of icons and images. This study involved 12 dyslexic children in a primary school in Malaysia. Findings from the experiment were examined to understand how children with dyslexia perceive visual representations. The findings will help designers to carefully design and choose icons and images that can easily be interpreted by these children. Well-designed icons and images allow dyslexic children to recognise the meaning of those visual representations without the need for additional support.


10.5772/6226 ◽  
2008 ◽  
Vol 5 (4) ◽  
pp. 42
Author(s):  
Omid Banyasad ◽  
Philip T. Cox

General purpose visual programming languages (VPLs) promote the construction of programs that are more comprehensible, robust, and maintainable by enabling programmers to directly observe and manipulate algorithms and data. However, they usually do not exploit the visual representation of entities in the problem domain, even if those entities and their interactions have obvious visual representations, as is the case in the robot control domain. We present a formal control model for autonomous robots, based on subsumption, and use it as the basis for a VPL in which reactive behaviour is programmed via interactions with a simulation.

