Speaker Detection
Recently Published Documents

TOTAL DOCUMENTS: 94 (FIVE YEARS: 19)
H-INDEX: 13 (FIVE YEARS: 2)

2021
Author(s): Christopher Birmingham, Maja Mataric, Kalin Stefanov

2021
Author(s): Yuanhang Zhang, Susan Liang, Shuang Yang, Xiao Liu, Zhongqin Wu, et al.

2021
Author(s): You Jin Kim, Hee-Soo Heo, Soyeon Choe, Soo-Whan Chung, Yoohwan Kwon, et al.

2021
Author(s): Baptiste Pouthier, Laurent Pilati, Leela K. Gudupudi, Charles Bouveyron, Frederic Precioso

Author(s): Xin Yang, Zongliang Ma, Letian Yu, Ying Cao, Baocai Yin, et al.

In this article, we propose a fully automatic system for generating comic books from videos without any human intervention. Given an input video along with its subtitles, our approach first extracts informative keyframes by analyzing the subtitles and stylizes them into comic-style images. Then, we propose a novel automatic multi-page layout framework that allocates the images across multiple pages and synthesizes visually interesting layouts based on the rich semantics of the images (e.g., importance and inter-image relations). Finally, as opposed to using a single balloon type as in previous works, we propose an emotion-aware balloon generation method that creates different types of word balloons by analyzing the emotion of the subtitles and audio. Our method varies balloon shapes and word sizes within balloons in response to different emotions, leading to a richer reading experience. Once the balloons are generated, they are placed adjacent to their corresponding speakers via speaker detection. Our results show that our method, without requiring any user input, can generate high-quality comic pages with visually rich layouts and balloons. Our user studies also demonstrate that users prefer our generated results over those of state-of-the-art comic generation systems.
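
A minimal, hypothetical Python sketch of two of the stages described above: keyframe selection from subtitle timing, and emotion-aware balloon generation. The function names, the midpoint heuristic, and the emotion-to-shape mapping are illustrative assumptions, not the authors' actual implementation.

```python
# Sketch of two pipeline stages: keyframe selection and emotion-aware
# balloons. All names and heuristics here are assumptions for illustration.
from dataclasses import dataclass

@dataclass
class Subtitle:
    start: float   # start time in seconds
    end: float     # end time in seconds
    text: str
    emotion: str   # e.g. "neutral", "angry", "surprised"

# Assumed emotion -> balloon-shape mapping (illustrative only).
BALLOON_SHAPES = {
    "neutral": "ellipse",
    "angry": "spiked",
    "surprised": "burst",
}

def select_keyframes(subtitles):
    """Pick one representative timestamp per subtitle line
    (toy heuristic: the midpoint of its display interval)."""
    return [(s.start + s.end) / 2.0 for s in subtitles]

def make_balloon(subtitle):
    """Choose a balloon shape and word size from the subtitle's emotion."""
    shape = BALLOON_SHAPES.get(subtitle.emotion, "ellipse")
    font_size = 14 if subtitle.emotion == "neutral" else 18  # emphasize emotion
    return {"shape": shape, "font_size": font_size, "text": subtitle.text}

if __name__ == "__main__":
    subs = [Subtitle(0.0, 2.0, "Hello there.", "neutral"),
            Subtitle(2.5, 4.0, "Watch out!", "angry")]
    print(select_keyframes(subs))           # [1.0, 3.25]
    print([make_balloon(s) for s in subs])
```

In the full system, each generated balloon would then be anchored next to the face that the speaker detector associates with that subtitle's time span.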


2021, Vol. 11 (8), pp. 3397
Author(s): Gustavo Assunção, Nuno Gonçalves, Paulo Menezes

Human beings have developed remarkable abilities to integrate information from various sensory sources by exploiting their inherent complementarity. Perceptual capabilities are thereby heightened, enabling, for instance, the well-known "cocktail party" and McGurk effects, i.e., speech disambiguation from a panoply of sound signals. This fusion ability is also key in refining the perception of sound source location, as in distinguishing whose voice is being heard in a group conversation. Furthermore, neuroscience has identified the superior colliculus region of the brain as responsible for this modality fusion, and a handful of biological models have been proposed to approximate its underlying neurophysiological process. Drawing inspiration from one of these models, this paper presents a methodology for effectively fusing correlated auditory and visual information for active speaker detection. Such an ability has a wide range of applications, from teleconferencing systems to social robotics. The detection approach initially routes auditory and visual information through two specialized neural network structures. The resulting embeddings are fused via a novel layer based on the superior colliculus, whose topological structure emulates the spatial cross-mapping of neurons across unimodal perceptual fields. The validation process employed two publicly available datasets, with the achieved results meeting and substantially surpassing initial expectations.
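
As a rough illustration of the two-stream design just described, the following PyTorch sketch routes audio and visual features through separate encoders and fuses the resulting embeddings into an active-speaker score. The layer sizes, the concatenation-based fusion, and all names are assumptions; the paper's superior-colliculus-inspired layer is considerably more structured than this stand-in.

```python
# Minimal two-stream audio-visual fusion sketch (assumed dimensions).
import torch
import torch.nn as nn

class TwoStreamFusion(nn.Module):
    def __init__(self, audio_dim=40, visual_dim=512, embed_dim=128):
        super().__init__()
        # Specialized unimodal encoders (stand-ins for the paper's networks).
        self.audio_net = nn.Sequential(nn.Linear(audio_dim, embed_dim), nn.ReLU())
        self.visual_net = nn.Sequential(nn.Linear(visual_dim, embed_dim), nn.ReLU())
        # Fusion layer: learned mixing of the two unimodal embeddings,
        # loosely emulating cross-mapped unimodal perceptual fields.
        self.fusion = nn.Linear(2 * embed_dim, embed_dim)
        self.classifier = nn.Linear(embed_dim, 1)  # active-speaker logit

    def forward(self, audio, visual):
        a = self.audio_net(audio)
        v = self.visual_net(visual)
        fused = torch.relu(self.fusion(torch.cat([a, v], dim=-1)))
        return torch.sigmoid(self.classifier(fused))  # speaking probability

if __name__ == "__main__":
    model = TwoStreamFusion()
    score = model(torch.randn(4, 40), torch.randn(4, 512))
    print(score.shape)  # torch.Size([4, 1])
```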


Author(s): Yingbo Ma, Joseph B. Wiggins, Mehmet Celepkolu, Kristy Elizabeth Boyer, Collin Lynch, et al.

2021, pp. 439-450
Author(s): Hugo Carneiro, Cornelius Weber, Stefan Wermter

Author(s): Leibny Paola Garcia Perera, Jesus Villalba, Herve Bredin, Jun Du, Diego Castan, et al.
