Adding Crowd Noise to Sports Commentary using Generative Models

2021 ◽  
Author(s):  
Neil Shah ◽  
Dharmeshkumar M. Agrawal ◽  
Niranjan Pedanekar

Crowd noise forms an integral part of a live sports experience. In the post-COVID era, when live audiences are absent, crowd noise needs to be added to the live commentary. This paper exploits the correlation between the commentary and the crowd noise of a live sports event and presents a method for audio-stylizing sports commentary by generating live stadium-like sound using neural generative models. We use Generative Adversarial Network (GAN)-based architectures, such as Cycle-consistent GANs (Cycle-GANs) and Mel-GANs, to generate live stadium-like sound samples given the live commentary. Due to the unavailability of raw commentary sound samples, we use end-to-end time-domain source separation models (SEGAN and Wave-U-Net) to extract the commentary sound from combined recordings of live sound acquired from YouTube highlights of soccer videos. We present a qualitative and a subjective user evaluation of the similarity of the generated live sound to the reference live sound.
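As an illustration of the cycle-consistency idea behind the Cycle-GAN component, the following is a minimal sketch. It assumes two generator networks mapping between a commentary domain and a stadium-sound domain and batches of audio feature tensors; neither the network design nor the feature choice is taken from the paper, and the full CycleGAN objective also includes adversarial terms.

```python
import torch

def cycle_consistency_loss(real_commentary, real_crowd, G_c2s, G_s2c):
    """Cycle-consistency term of a CycleGAN-style setup between a commentary
    domain and a stadium-sound domain: mapping a sample into the other domain
    and back should recover the original input.

    real_commentary, real_crowd: batches of audio features (torch tensors)
    G_c2s, G_s2c: generators mapping commentary -> stadium sound and back
    """
    rec_commentary = G_s2c(G_c2s(real_commentary))   # commentary -> crowd -> commentary
    rec_crowd = G_c2s(G_s2c(real_crowd))             # crowd -> commentary -> crowd
    return torch.mean(torch.abs(rec_commentary - real_commentary)) + \
           torch.mean(torch.abs(rec_crowd - real_crowd))
```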

Author(s):  
Amey Thakur ◽  
Hasan Rizvi ◽  
Mega Satish

In the present study, we propose a new framework for estimating generative models via an adversarial process, extending an existing GAN framework to develop white-box, controllable image cartoonization that can generate high-quality cartoonized images and videos from real-world photos and videos. The learning objectives of our system are based on three distinct representations: the surface representation, the structure representation, and the texture representation. The surface representation refers to the smooth surface of the images. The structure representation relates to the sparse colour blocks and compresses generic content. The texture representation captures the textures, curves, and fine features of cartoon images. The Generative Adversarial Network (GAN) framework decomposes the images into these representations and learns from them to generate cartoon images. This decomposition makes the framework more controllable and flexible, allowing users to make changes based on the required output. This approach outperforms previous systems in maintaining the clarity, colours, textures, and shapes of images while still exhibiting the characteristics of cartoon images.
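To make the decomposition concrete, here is a rough sketch of how a smooth "surface" input and a colour-free "texture" input could be extracted from a photo before being fed to the respective branches. A simple box blur stands in for the edge-preserving filtering used in practice; both operators are illustrative choices, not the paper's exact ones.

```python
import torch
import torch.nn.functional as F

def surface_representation(img, kernel_size=7):
    """Crude stand-in for surface extraction: a depthwise box blur that keeps
    the smooth colour surface of the image while suppressing fine detail.

    img: (batch, 3, H, W) tensor with values in [0, 1]
    """
    pad = kernel_size // 2
    kernel = torch.ones(3, 1, kernel_size, kernel_size) / kernel_size ** 2
    return F.conv2d(img, kernel, padding=pad, groups=3)

def texture_representation(img):
    """Single-channel luminance map used as a colour-free texture input, so a
    texture branch can learn edges and strokes independently of colour."""
    r, g, b = img[:, 0:1], img[:, 1:2], img[:, 2:3]
    return 0.299 * r + 0.587 * g + 0.114 * b
```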


2021 ◽  
Vol 263 (5) ◽  
pp. 1527-1538
Author(s):  
Xenofon Karakonstantis ◽  
Efren Fernandez Grande

The characterization of Room Impulse Responses (RIR) over an extended region in a room by means of measurements requires dense spatial sampling with many microphones. This can often become intractable and time consuming in practice. Well established reconstruction methods such as plane wave regression show that the sound field in a room can be reconstructed from sparsely distributed measurements. However, these reconstructions usually rely on assuming physical sparsity (i.e. that few waves compose the sound field) or some other specific trait of the measured sound field, making the models less generalizable and problem specific. In this paper we introduce a method to reconstruct a sound field in an enclosure with the use of a Generative Adversarial Network (GAN), which generates new variants of the data distributions that it is trained upon. The goal of the proposed GAN model is to estimate the underlying distribution of plane waves in any source-free region, and to map these distributions from a stochastic, latent representation. A GAN is trained on a large number of synthesized sound fields represented by random wave fields and then tested on both simulated and real data sets of lightly damped and reverberant rooms.
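A random wave field of this kind can be synthesized as a superposition of plane waves with random directions, amplitudes, and phases, evaluated at the measurement positions. The sketch below shows one plausible construction; the frequency, number of waves, and sampling scheme are placeholders, not the authors' settings.

```python
import numpy as np

def random_plane_wave_field(mic_positions, n_waves=500, freq=500.0, c=343.0, seed=0):
    """Synthesize one random wave field: a sum of plane waves with random
    directions, amplitudes and phases, evaluated at microphone positions.

    mic_positions: (M, 3) array of measurement positions in metres
    Returns the complex pressure at each microphone.
    """
    rng = np.random.default_rng(seed)
    k = 2 * np.pi * freq / c                              # wavenumber
    dirs = rng.normal(size=(n_waves, 3))                  # random propagation directions
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    amps = rng.normal(size=n_waves) * np.exp(1j * rng.uniform(0, 2 * np.pi, n_waves))
    phase = mic_positions @ dirs.T                        # (M, n_waves) dot products d_n . r_m
    return (np.exp(-1j * k * phase) * amps).sum(axis=1)   # p(r_m) = sum_n a_n e^{-jk d_n.r_m}
```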


PLoS ONE ◽  
2021 ◽  
Vol 16 (7) ◽  
pp. e0253868
Author(s):  
Luca Rossi ◽  
Andrea Ajmar ◽  
Marina Paolanti ◽  
Roberto Pierdicca

Vehicle trajectory prediction is a topic of growing interest in recent years, with applications in several domains ranging from autonomous driving to traffic congestion prediction and urban planning. Predicting trajectories starting from Floating Car Data (FCD) is a complex task that comes with several challenges, namely Vehicle to Infrastructure (V2I) interaction, Vehicle to Vehicle (V2V) interaction, multimodality, and generalizability. These challenges have not been completely explored by state-of-the-art works; in particular, multimodality and generalizability have been neglected the most, and this work attempts to fill this gap by proposing and defining new datasets, metrics, and methods to help understand and predict vehicle trajectories. We propose and compare Deep Learning models based on Long Short-Term Memory and Generative Adversarial Network architectures; in particular, our GAN-3 model can be used to generate multiple predictions in multimodal scenarios. These approaches are evaluated with our newly proposed error metrics N-ADE and N-FDE, which normalize some biases in the standard Average Displacement Error (ADE) and Final Displacement Error (FDE) metrics. Experiments have been conducted using newly collected datasets in four large Italian cities (Rome, Milan, Naples, and Turin), considering different trajectory lengths to analyze error growth over a larger number of time-steps. The results show that, although LSTM-based models are superior in unimodal scenarios, generative models perform best where the effects of multimodality are stronger. Space-time and geographical analyses are performed to demonstrate the suitability of the proposed methodology for real cases and management services.
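For reference, the standard ADE and FDE metrics that N-ADE and N-FDE build on can be computed as in the sketch below; the normalization introduced by the paper is not reproduced here.

```python
import numpy as np

def ade_fde(pred, gt):
    """Average and Final Displacement Errors for a batch of predicted trajectories.

    pred, gt: (batch, timesteps, 2) arrays of predicted and ground-truth x/y positions
    """
    dists = np.linalg.norm(pred - gt, axis=-1)   # (batch, timesteps) per-step errors
    ade = dists.mean()                           # average displacement over all steps
    fde = dists[:, -1].mean()                    # displacement at the final step
    return ade, fde
```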


2021 ◽  
Vol 8 ◽  
Author(s):  
Rodrigo F. Cádiz ◽  
Agustín Macaya ◽  
Manuel Cartagena ◽  
Denis Parra

Deep learning, one of the fastest-growing branches of artificial intelligence, has become one of the most relevant research and development areas of recent years, especially since 2012, when a neural network surpassed the most advanced image classification techniques of the time. This spectacular development has not been alien to the world of the arts, as recent advances in generative networks have made possible the artificial creation of high-quality content such as images, movies, or music. We believe that these novel generative models pose a great challenge to our current understanding of computational creativity. If a robot can now create music that an expert cannot distinguish from music composed by a human, or create novel musical entities that were not known at training time, or exhibit conceptual leaps, does it mean that the machine is creative? We believe that the emergence of these generative models clearly signals that much more research needs to be done in this area. We would like to contribute to this debate with two case studies of our own: TimbreNet, a variational auto-encoder network trained to generate audio-based musical chords, and StyleGAN Pianorolls, a generative adversarial network capable of creating short musical excerpts, despite the fact that it was trained with images and not musical data. We discuss and assess these generative models in terms of their creativity, and we show that they are in practice capable of learning musical concepts that are not obvious from the training data. Based on our current understanding of creativity in robots and machines, we hypothesize that these deep models can, in fact, be considered creative.
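For readers unfamiliar with the variational auto-encoder setup underlying a model such as TimbreNet, a generic VAE training objective looks like the sketch below; the actual TimbreNet architecture, data representation, and loss weighting are not reproduced here.

```python
import torch
import torch.nn.functional as F

def vae_loss(x, x_recon, mu, log_var, beta=1.0):
    """Generic VAE objective: reconstruction error plus a KL divergence that
    keeps the latent code close to a standard normal prior.

    x, x_recon: original and reconstructed inputs
    mu, log_var: parameters of the approximate posterior q(z | x)
    """
    recon = F.mse_loss(x_recon, x, reduction="mean")
    kl = -0.5 * torch.mean(1 + log_var - mu.pow(2) - log_var.exp())
    return recon + beta * kl
```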


2020 ◽  
Vol 34 (07) ◽  
pp. 10745-10753
Author(s):  
Marzieh Edraki ◽  
Nazanin Rahnavard ◽  
Mubarak Shah

Convolutional neural networks (CNNs) have become a key asset in most fields of AI. Despite their successful performance, CNNs suffer from a major drawback: they fail to capture the hierarchy of spatial relations among different parts of an entity. As a remedy to this problem, the idea of capsules was proposed by Hinton. In this paper, we propose the SubSpace Capsule Network (SCN), which exploits the idea of capsule networks to model possible variations in the appearance or implicitly-defined properties of an entity through a group of capsule subspaces, instead of simply grouping neurons to create capsules. A capsule is created by projecting an input feature vector from a lower layer onto the capsule subspace using a learnable transformation. This transformation finds the degree of alignment of the input with the properties modeled by the capsule subspace. We show that SCN is a general capsule network that can successfully be applied to both discriminative and generative models without incurring computational overhead compared to CNNs during test time. The effectiveness of SCN is evaluated through a comprehensive set of experiments on supervised image classification, semi-supervised image classification, and high-resolution image generation tasks using the generative adversarial network (GAN) framework. SCN significantly improves the performance of the baseline models in all three tasks.
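The subspace projection described above can be illustrated with a small sketch: an input vector is projected onto the span of a learnable basis, and the length of the projection serves as the capsule's degree of alignment. This is a simplified reading of the idea, not the full SCN layer.

```python
import torch

def capsule_subspace_projection(x, W, eps=1e-8):
    """Project inputs onto a capsule subspace spanned by the columns of W.

    x: (batch, d) input feature vectors from a lower layer
    W: (d, c) learnable basis of a c-dimensional capsule subspace
    Returns the projected capsule vectors and their norms (degree of alignment).
    """
    gram = W.T @ W                                        # (c, c)
    # Orthogonal projection matrix P = W (W^T W)^{-1} W^T (regularized inverse)
    P = W @ torch.linalg.inv(gram + eps * torch.eye(W.shape[1])) @ W.T
    v = x @ P                                             # (batch, d); P is symmetric
    alignment = v.norm(dim=1)                             # length of the projection
    return v, alignment
```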


Author(s):  
Shuaitao Zhang ◽  
Yuliang Liu ◽  
Lianwen Jin ◽  
Yaoxiong Huang ◽  
Songxuan Lai

A new method is proposed for removing text from natural images. The challenge is to first accurately localize text at the stroke level and then replace it with a visually plausible background. Unlike previous methods that require image patches to erase scene text, our method, namely the ensconce network (EnsNet), can operate end-to-end on a single image without any prior knowledge. The overall structure is an end-to-end trainable FCN-ResNet-18 network with a conditional generative adversarial network (cGAN). The features of the former are first enhanced by a novel lateral connection structure and then refined by four carefully designed losses: a multiscale regression loss and a content loss, which capture the global discrepancy of features at different levels, and a texture loss and a total variation loss, which primarily target filling the text regions and preserving the realism of the background. The latter is a novel local-sensitive GAN, which attentively assesses the local consistency of the text-erased regions. Both qualitative and quantitative sensitivity experiments on synthetic images and the ICDAR 2013 dataset demonstrate that each component of EnsNet is essential to achieving good performance. Moreover, EnsNet significantly outperforms previous state-of-the-art methods in terms of all metrics. In addition, a qualitative experiment conducted on the SBMNet dataset further demonstrates that the proposed method also performs well on general object removal tasks (such as pedestrians). EnsNet is extremely fast, running at 333 fps on an i5-8600 CPU device.
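As an example of one of the four losses named above, a common anisotropic form of the total variation loss is sketched here; EnsNet's exact formulation and weighting of the losses are given in the paper.

```python
import torch

def total_variation_loss(img):
    """Anisotropic total variation: mean absolute difference between
    neighbouring pixels, encouraging locally smooth inpainted regions.

    img: (batch, channels, height, width) tensor
    """
    dh = (img[:, :, 1:, :] - img[:, :, :-1, :]).abs().mean()  # vertical neighbours
    dw = (img[:, :, :, 1:] - img[:, :, :, :-1]).abs().mean()  # horizontal neighbours
    return dh + dw
```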


2020 ◽  
Vol 143 (3) ◽  
Author(s):  
Wei Chen ◽  
Faez Ahmed

Deep generative models have proven to be a useful tool for automatic design synthesis and design space exploration. When applied in engineering design, existing generative models face three challenges: (1) generated designs lack diversity and do not cover all areas of the design space, (2) it is difficult to explicitly improve the overall performance or quality of generated designs, and (3) existing models generally do not generate novel designs outside the domain of the training data. In this article, we simultaneously address these challenges by proposing a new determinantal point process-based loss function for probabilistic modeling of diversity and quality. With this new loss function, we develop a variant of the generative adversarial network, named the "performance augmented diverse generative adversarial network" (PaDGAN), which can generate novel high-quality designs with good coverage of the design space. Using three synthetic examples and one real-world airfoil design example, we demonstrate that PaDGAN can generate diverse and high-quality designs. In comparison to a vanilla generative adversarial network, on average, it generates samples with a 28% higher mean quality score, with greater diversity, and without the mode collapse issue. Unlike typical generative models that usually generate new designs by interpolating within the boundary of the training data, we show that PaDGAN expands the design space boundary outside the training data towards high-quality regions. The proposed method is broadly applicable to many tasks, including design space exploration, design optimization, and creative solution recommendation.
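To give a flavour of a determinantal point process-based quality/diversity term, the sketch below builds a quality-weighted similarity kernel over a batch of generated designs and penalizes a small log-determinant. This is only a generic illustration; the kernel, similarity measure, quality estimator, and weighting used by PaDGAN may differ.

```python
import torch

def dpp_quality_diversity_loss(samples, quality, sigma=1.0, eps=1e-6):
    """Illustrative DPP-style loss: the determinant of the kernel L grows with
    both the diversity (RBF similarity) and the quality of the batch, so
    minimizing -log det(L) rewards diverse, high-quality samples.

    samples: (n, d) generated designs
    quality: (n,) quality scores, e.g. from a performance estimator
    """
    dists = torch.cdist(samples, samples) ** 2            # pairwise squared distances
    S = torch.exp(-dists / (2 * sigma ** 2))               # RBF similarity matrix
    L = quality.unsqueeze(1) * S * quality.unsqueeze(0)     # L_ij = q_i * S_ij * q_j
    L = L + eps * torch.eye(samples.shape[0])               # numerical stability
    sign, logdet = torch.linalg.slogdet(L)
    return -logdet / samples.shape[0]
```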

