Speech2Talking-Face: Inferring and Driving a Face with Synchronized Audio-Visual Representation

What can we picture solely from a clip of speech? Previous research has shown the possibility of directly inferring the appearance of a person's face by listening to a voice. However, within human speech lies not only the biometric identity signal but also the identity-irrelevant information such as the talking content. Our goal is to extract as much information from a clip of speech as possible. In particular, we aim at not only inferring the face of a person but also animating it. Our key insight is to synchronize audio and visual representations from two perspectives in a style-based generative framework. Specifically, contrastive learning is leveraged to map both the identity and speech content information within the speech to visual representation spaces. Furthermore, the identity space is strengthened with class centroids. Through curriculum learning, the style-based generator is capable of automatically balancing the information from the two latent spaces. Extensive experiments show that our approach encourages better speech-identity correlation learning while generating vivid faces whose identities are consistent with given speech samples. Moreover, by leveraging the same model, these inferred faces can be driven to talk by the audio.
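The contrastive step mentioned in the abstract can be illustrated with a minimal sketch. It assumes an InfoNCE-style objective over paired audio and visual embeddings; the abstract does not specify the exact loss, so the function name `info_nce`, the `temperature` value, and the toy vectors are all hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    # Scale a vector to unit length so dot products become cosine similarities.
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def info_nce(audio_embs, visual_embs, temperature=0.1):
    """Mean InfoNCE loss (an assumed objective, not confirmed by the abstract).

    audio_embs[i] and visual_embs[i] form a positive pair; every other
    visual embedding in the batch serves as a negative for audio i.
    """
    audio = [normalize(v) for v in audio_embs]
    visual = [normalize(v) for v in visual_embs]
    total = 0.0
    for i, a in enumerate(audio):
        logits = [dot(a, f) / temperature for f in visual]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / len(audio)

# Toy batch: aligned pairs should give a lower loss than mismatched pairs.
aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
mismatched = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

In a setup like the one described, one such loss could be applied per latent space (identity and speech content), though how the two are combined is not stated in the abstract.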

Picturing Global Conversion: Art and Diplomacy at the Court of Paul V (1605-1621)

Journal of Early Modern History ◽

10.1163/15700658-12342380 ◽

2013 ◽

Vol 17 (5-6) ◽

pp. 525-559 ◽

Cited By ~ 2

Author(s):

Opher Mansour

Keyword(s):

Early Modern ◽

Significant Role ◽

Visual Representation ◽

Visual Representations ◽

Early Modern Europe ◽

Close Attention ◽

The Public ◽

Modern Europe ◽

Papal Court

Abstract This article examines the progress of a series of ambassadorial visits to Rome by emissaries from the Kongo, Japan, and Safavid Persia as they unfolded over the reign of Pope Paul V. Close attention is paid to the visual representation of the ambassadors, and of their actions, in engravings and in the decoration of the Quirinal Palace. The author argues that the public aspects of diplomacy, and of the visual representations based on it, played a significant role in articulating the Papacy’s missionary ambitions and sense of its global position. Furthermore, it is argued that the diplomatic and courtly practices of the papal court played a significant role in mediating the representation of “other” cultures in early modern Europe.

Structure in talker variability: How much is there and how much can it help?

10.31234/osf.io/a4tkn ◽

2018 ◽

Author(s):

Dave F Kleinschmidt

Keyword(s):

Speech Recognition ◽

Speech Perception ◽

R Package ◽

Ideal Observer ◽

Linguistic Variation ◽

Robust Speech Recognition ◽

Talker Variability ◽

New Techniques ◽

Human Speech ◽

The Face

One of the persistent puzzles in understanding human speech perception is how listeners cope with talker variability. One thing that might help listeners is structure in talker variability: rather than varying randomly, talkers of the same gender, dialect, age, etc. tend to produce language in similar ways. Sociolinguistic research has shown that listeners are sensitive to this covariation between linguistic variation and socio-indexical variables. In this paper I present new techniques based on ideal observer models to quantify 1) the amount and type of structure in talker variation, and 2) how useful such structure can be for robust speech recognition in the face of talker variability. I demonstrate these techniques in two phonetic domains---word-initial stop voicing and vowel identity---and show that these domains have different amounts and types of talker variability, consistent with previous, impressionistic findings. An `R` package accompanies this paper, enabling researchers to apply these techniques to their own data.
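As a rough illustration of the ideal observer idea for word-initial stop voicing, the sketch below classifies a voice onset time (VOT) value using Bayes' rule over two Gaussian category distributions. It is written in Python rather than with the accompanying `R` package, whose API is not described here; the function names and all means and standard deviations are hypothetical values chosen for illustration, not estimates from the paper.

```python
import math

def gaussian_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))

def posterior_voiceless(vot_ms, dists, prior_voiceless=0.5):
    """Posterior probability that a stop with this VOT is voiceless.

    dists maps "voiced" and "voiceless" to (mean, sd) pairs in milliseconds.
    """
    like_voiceless = gaussian_pdf(vot_ms, *dists["voiceless"])
    like_voiced = gaussian_pdf(vot_ms, *dists["voiced"])
    numerator = like_voiceless * prior_voiceless
    return numerator / (numerator + like_voiced * (1 - prior_voiceless))

# Hypothetical cue distributions: one pooled over all talkers, one for a
# single talker whose categories are tighter and shifted.
pooled = {"voiced": (10.0, 10.0), "voiceless": (60.0, 20.0)}
one_talker = {"voiced": (0.0, 5.0), "voiceless": (40.0, 10.0)}

# An ambiguous 30 ms token is classified far more confidently when the
# observer knows the talker-specific distributions.
p_pooled = posterior_voiceless(30.0, pooled)
p_talker = posterior_voiceless(30.0, one_talker)
```

Comparing how well observers with pooled, group-level, or talker-specific distributions classify the same tokens is one way to express how useful structure in talker variability could be.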

Comparison of government recommendations for healthy eating habits in visual representations of food-based dietary guidelines in Latin America

Cadernos de Saúde Pública ◽

10.1590/0102-311x00177418 ◽

2019 ◽

Vol 35 (12) ◽

Author(s):

Mayara Sanay da Silva Oliveira ◽

Mark Anthony Arceño ◽

Priscila de Morais Sato ◽

Fernanda Baeza Scagliusi

Keyword(s):

Latin American ◽

Healthy Eating ◽

Visual Analysis ◽

Eating Habits ◽

Visual Representation ◽

Visual Representations ◽

Dietary Guidelines ◽

Classification Systems ◽

Healthy Eating Habits ◽

Native Foods

Visual representations of food-based dietary guidelines (FBDG) express diverse dietary and sociocultural norms, especially as they relate to healthy eating habits. This article investigates government recommendations for healthy eating habits expressed in the visual representation of Latin American FBDGs. Drawing on 15 images published between 1991 and 2017, we conducted an anthropological visual analysis guided by the methodology proposed by James Collier and Malcolm Collier: unstructured analyses, open viewing analyses, structured analyses and microanalyses. Here, we explore government recommendations based on visual representation shapes, food classification systems, lifestyle recommendations and embedded sociocultural elements. Our main findings relate to how dietary and sociocultural norms are used to promote eating practices considered healthy. Dietary norms focus on variety, proportionality, and moderation, as expressed in terms of food classification and food standards considered healthy. Sociocultural norms are referenced by the use of cultural symbols as strategies to promote traditional foods, cooking practices, commensality, water consumption and physical activity. Ultimately, we argue that FBDG visual representations contain embedded messages that counsel individuals to plan, buy, prepare and consume food with family; to consume foods considered healthy; to pay full attention to their meals, without distractions, such as television and cell phones; and to celebrate traditional, local and/or native foods and culinary preparations.

A critical analysis of ‘face’-managing factors in isiZulu idioms

Literator ◽

10.4102/lit.v36i1.1150 ◽

2015 ◽

Vol 36 (1) ◽

Author(s):

M.R. Masubelele

Keyword(s):

Critical Analysis ◽

Sense Of Self ◽

Self Esteem ◽

The Other ◽

Communicative Function ◽

Human Speech ◽

The Public ◽

Self Image ◽

The Face

People have an inherent need to communicate. They communicate out of need as well as for leisure. Human speech abounds with unpleasant and undesirable statements that could embarrass and even humiliate those spoken to or oneself. Brown and Levinson assert that unpleasant and undesirable statements have the potential to threaten the ‘face’ or self-esteem of the other person or persons. They define ‘face’ as the public self-image that every member of society wants to claim for themselves. Simply put, ‘facework’ refers to ways people cooperatively attempt to promote both the other’s and their own sense of self-esteem in a conversation. As linguistic speech forms, idioms perform a variety of functions in a language. Not only do they make speech more colourful, but they also perform a communicative function in that they tend to soften the embarrassment and humiliation that often accompany unpleasant and undesirable statements in speech. IsiZulu idioms will be examined in this article to establish to what extent they could contribute to managing ‘face’ issues. Examples of idioms will be drawn from C.L.S. Nyembezi and O.E.H. Nxumalo’s work Inqolobane Yesizwe. The facework theory as espoused by Brown and Levinson will underpin this discussion on isiZulu idioms.

Carl Linnaeus and the Visual Representation of Nature

Historical Studies in the Natural Sciences ◽

10.1525/hsns.2011.41.4.365 ◽

2011 ◽

Vol 41 (4) ◽

pp. 365-404 ◽

Cited By ~ 11

Author(s):

Isabelle Charmantier

Keyword(s):

Plant Species ◽

Visual Representation ◽

Full Range ◽

Visual Representations ◽

Working Life ◽

Daily Work ◽

Natural Knowledge ◽

Linnean Society ◽

Carl Linnaeus

Abstract The Swedish naturalist Carl Linnaeus (1707–1778) is reputed to have transformed botanical practice by shunning the process of illustrating plants and relying on the primacy of literary descriptions of plant specimens. Botanists and historians have long debated Linnaeus's capacities as a draftsman. While some of his detailed sketches of plants and insects reveal a sure hand, his more general drawings of landscapes and people seem ill-executed. The overwhelming consensus, based mostly on his Lapland diary (1732), is that Linnaeus could not draw. Little has been said, however, on the role of drawing and other visual representations in Linnaeus's daily work as seen in his other numerous manuscripts. These manuscripts, held mostly at the Linnean Society of London, are peppered with sketches, maps, tables, and diagrams. Reassessing these manuscripts, along with the printed works that also contain illustrations of plant species, shows that Linnaeus's thinking was profoundly visual and that he routinely used visual representational devices in his various publications. This paper aims to explore the full range of visual representations Linnaeus used through his working life, and to reevaluate the epistemological value of visualization in the making of natural knowledge. By analyzing Linnaeus's use of drawings, maps, tables, and diagrams, I will show that he did not, as has been asserted, reduce the discipline of botany to text, and that his visual thinking played a fundamental role in his construction of new systems of classification.

SEEING IS BELIEVING: THE POWER OF VISUAL CULTURE IN THE RELIGIOUS WORLD OF AŞE ZÄR'A YA'EQOB OF ETHIOPIA (1434-1468)

Journal of Religion in Africa ◽

10.1163/157006602321107621 ◽

2002 ◽

Vol 32 (4) ◽

pp. 403-421 ◽

Cited By ~ 3

Author(s):

Steven Kaplan

Keyword(s):

Visual Culture ◽

Political Power ◽

Visual Representation ◽

Religious Life ◽

Daily Basis ◽

Virgin Mary ◽

Divine Power ◽

The Face ◽

The Cross

Abstract The prevailing image of Zär'a Ya'eqob has tended to emphasize the intellectual at the expense of the experiential and political power at the expense of religious power. It is to these relatively neglected aspects of religious life that this article is devoted. It is our purpose here to emphasize the importance of the Cross, the image of the Virgin, the construction of churches and other visual aspects of religious life in Zär'a Ya'eqob's Ethiopia. No other Ethiopian ruler confronted the religious challenges presented by a divided Church and a largely unChristianized empire as systematically and as successfully as Zär'a Ya'eqob. Moreover, he was as sensitive to the daily unspoken truths of religious life as he was to great theological debates and controversies. He understood power in all its manifestations and sought to protect his state, his church, and his people with every means at his disposal. By promoting devotion to both the Cross and the Virgin Mary, he built on the foundations prepared by his parents, especially his father Dawit. He also mobilized Christian symbols which transcended local rivalries and regional loyalties. These symbols, as well as the churches he built, were also particularly suited to visual representation and hence comparatively easy to propagate among Ethiopia's largely illiterate population. They were, moreover, effective instruments of divine power, which brought home not only the message of Christianity's truth, but also its efficacy in the face of the numerous threats that Christians faced on a daily basis.

Distraction biases working memory for faces

10.31219/osf.io/qvez5 ◽

2019 ◽

Author(s):

Remington Mallett ◽

Anurima Mummaneni ◽

Jarrod Lewis-Peacock

Keyword(s):

Working Memory ◽

Irrelevant Information ◽

Visual Features ◽

Maintenance Period ◽

Stimulus Space ◽

Estimation Task ◽

Low Level ◽

The Face ◽

High Level ◽

Task Irrelevant

Working memory persists in the face of distraction, yet not without consequence. Previous research has shown that memory for low-level visual features is systematically influenced by the maintenance or presentation of a similar distractor stimulus. Responses are frequently biased in stimulus space towards a perceptual distractor, though this has yet to be determined for high-level stimuli. We investigated whether these influences are shared for complex visual stimuli such as faces. To quantify response accuracies for these stimuli, we used a delayed-estimation task with a computer-generated “face space” consisting of eighty faces that varied continuously as a function of age and sex. In a set of three experiments, we found that responses for a target face held in working memory were biased towards a distractor face presented during the maintenance period. The amount of response bias did not vary as a function of distance between target and distractor. Our data suggest that, similar to low-level visual features, high-level face representations in working memory are biased by the processing of related but task-irrelevant information.

Math Talk as Discourse Strategy

Advances in Multimedia and Interactive Technologies - Empirical Research on Semiotics and Visual Rhetoric ◽

10.4018/978-1-5225-5622-0.ch010 ◽

2018 ◽

pp. 205-220

Author(s):

Stacy Costa

Keyword(s):

Problem Solving ◽

Knowledge Building ◽

Visual Representation ◽

Visual Representations ◽

Student Understanding ◽

Mathematical Understanding ◽

Socratic Method ◽

Math Talk ◽

Mathematical Terminology ◽

Discourse Strategy

Mathematical understanding goes beyond grasping numerical values and problem solving. By incorporating visual representation, students can grasp how math can be understood in terms of geometry, which is essentially a visual device. It is important that students be able to incorporate visual representations alongside numerical values to gain meaning from their own knowledge. However, it is also vital that students understand mathematical terminology, via a dialogical-rhetorical pedagogy that now comes under the rubric of “Math Talk,” which in turn is part of a system of teaching known as knowledge building, both of which aim to recapture, in a new way, the Socratic method of dialogical interaction. This chapter explores how knowledge building, as a methodology, can assist in furthering student understanding and how math talk leads to a deeper understanding of mathematical principles.

INTERFACE DESIGN FOR YOUNG DYSLEXICS: A SURVEY ON VISUAL REPRESENTATION

Jurnal Teknologi ◽

10.11113/jt.v75.5050 ◽

2015 ◽

Vol 75 (3) ◽

Author(s):

Rozita Ismail ◽

Azizah Jaafar

Keyword(s):

Primary School ◽

Information And Communication Technology ◽

Communication Technology ◽

Interface Design ◽

Visual Representation ◽

National Curriculum ◽

Visual Representations ◽

Educational Multimedia ◽

Information And Communication ◽

Additional Support

The use of information and communication technology in learning is an essential part of the National Curriculum in Malaysia. Dyslexic children exhibit different skills and motivations. They dislike reading and find reading to be a painful activity. This paper presents an exploration of the visual representation used by dyslexic children in learning Bahasa Melayu. Visual representations, such as the use of icons and images, in interface design are remembered easily and enhance the level of concentration and experience in learning. The experiment used an educational multimedia courseware to investigate how dyslexic children react towards the use of icons and images. This study involved 12 dyslexic children in a primary school in Malaysia. Findings from the experiment were examined to understand how children with dyslexia perceive visual representations. The findings will help designers to carefully design and choose icons and images that can easily be interpreted by these children. Well-designed icons and images allow dyslexic children to recognise the meaning of those visual representations without the need for additional support.

Visual Programming of Subsumption-Based Reactive Behaviour

International Journal of Advanced Robotic Systems ◽

10.5772/6226 ◽

2008 ◽

Vol 5 (4) ◽

pp. 42

Author(s):

Omid Banyasad ◽

Philip T. Cox

Keyword(s):

Programming Languages ◽

Robot Control ◽

Autonomous Robots ◽

Visual Representation ◽

Visual Representations ◽

Visual Programming ◽

Control Model ◽

General Purpose ◽

Problem Domain ◽

Formal Control

General purpose visual programming languages (VPLs) promote the construction of programs that are more comprehensible, robust, and maintainable by enabling programmers to directly observe and manipulate algorithms and data. However, they usually do not exploit the visual representation of entities in the problem domain, even if those entities and their interactions have obvious visual representations, as is the case in the robot control domain. We present a formal control model for autonomous robots, based on subsumption, and use it as the basis for a VPL in which reactive behaviour is programmed via interactions with a simulation.
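To make the idea of subsumption-based reactive behaviour concrete, here is a minimal textual sketch using fixed-priority arbitration, a common simplification of subsumption in which the first behaviour that fires overrides those below it. It is not the authors' formal control model or their visual language; the behaviour names, sensor keys, thresholds, and command strings are all hypothetical.

```python
from typing import Callable, Dict, List, Optional

Sensors = Dict[str, float]
Behaviour = Callable[[Sensors], Optional[str]]

def avoid_obstacle(sensors: Sensors) -> Optional[str]:
    # Fires only when something is closer than 0.3 m in front (assumed unit).
    if sensors.get("front_distance", float("inf")) < 0.3:
        return "turn_left"
    return None

def seek_light(sensors: Sensors) -> Optional[str]:
    # Fires when the (assumed) normalised light reading is strong enough.
    if sensors.get("light", 0.0) > 0.5:
        return "turn_towards_light"
    return None

def wander(sensors: Sensors) -> Optional[str]:
    # Default behaviour: always produces a command.
    return "forward"

def arbitrate(layers: List[Behaviour], sensors: Sensors) -> str:
    """Return the command of the first behaviour that fires.

    Layers are ordered from highest to lowest priority, so an earlier
    layer suppresses the output of every later one.
    """
    for behaviour in layers:
        command = behaviour(sensors)
        if command is not None:
            return command
    return "stop"

layers = [avoid_obstacle, seek_light, wander]
command = arbitrate(layers, {"front_distance": 0.2, "light": 0.9})
```

With the readings above, the obstacle layer fires first and its command is the one sent to the robot; the light-seeking and wandering layers are only consulted when no higher-priority layer responds.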
