Sound Field Reconstruction in Rooms with Deep Generative Models

2021 ◽  
Vol 263 (5) ◽  
pp. 1527-1538
Author(s):  
Xenofon Karakonstantis ◽  
Efren Fernandez Grande

The characterization of Room Impulse Responses (RIRs) over an extended region in a room by means of measurements requires dense spatial sampling with many microphones, which can often become intractable and time consuming in practice. Well-established reconstruction methods such as plane wave regression show that the sound field in a room can be reconstructed from sparsely distributed measurements. However, these reconstructions usually rely on assuming physical sparsity (i.e. few waves compose the sound field) or other traits of the measured sound field, making the models less generalizable and problem specific. In this paper we introduce a method to reconstruct a sound field in an enclosure with the use of a Generative Adversarial Network (GAN), which synthesizes new variants of the data distributions that it is trained upon. The goal of the proposed GAN model is to estimate the underlying distribution of plane waves in any source-free region, and to map these distributions from a stochastic, latent representation. The GAN is trained on a large number of synthesized sound fields, each represented by a random wave field, and then tested on both simulated and real data sets of lightly damped and reverberant rooms.
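
The random wave fields used for training can be synthesized as a superposition of plane waves with random directions and complex amplitudes. Below is a minimal sketch of such a synthesis; the wave count, frequency, and measurement grid are illustrative assumptions, not the paper's settings.

```python
# A minimal sketch (not the authors' code) of synthesizing a random wave
# field as a superposition of plane waves, the kind of training sample
# the abstract describes.
import numpy as np

def random_wave_field(grid, k, n_waves=500, rng=None):
    """Pressure of a diffuse field: sum of plane waves with random
    directions and complex amplitudes, evaluated on grid (N, 3)."""
    rng = np.random.default_rng(rng)
    # Random unit propagation directions on the sphere
    d = rng.normal(size=(n_waves, 3))
    d /= np.linalg.norm(d, axis=1, keepdims=True)
    # Random complex amplitudes (Gaussian real/imag parts)
    a = rng.normal(size=n_waves) + 1j * rng.normal(size=n_waves)
    # p(r) = sum_n a_n * exp(-j * k * d_n . r)
    return np.exp(-1j * k * grid @ d.T) @ a

# Example: a 32x32 plane of points at 1 kHz (c = 343 m/s)
x = np.linspace(0.0, 1.0, 32)
X, Y = np.meshgrid(x, x)
grid = np.stack([X.ravel(), Y.ravel(), np.zeros(X.size)], axis=1)
p = random_wave_field(grid, k=2 * np.pi * 1000 / 343)
```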

2019 ◽  
Vol 11 (22) ◽  
pp. 2671
Author(s):  
Simon Leminen Madsen ◽  
Anders Krogh Mortensen ◽  
Rasmus Nyholm Jørgensen ◽  
Henrik Karstoft

Lack of annotated data for training of deep learning systems is a challenge for many visual recognition tasks. This is especially true for domain-specific applications, such as plant detection and recognition, where the annotation process can be both time-consuming and error-prone. Generative models can be used to alleviate this issue by producing artificial data that mimic properties of real data. This work presents a semi-supervised generative adversarial network (GAN) model to produce artificial samples of plant seedlings. By applying the semi-supervised approach, we are able to produce visually distinct samples for nine unique plant species using a single GAN model, while still maintaining a relatively high visual variance in the produced samples for each species. Additionally, we are able to control the appearance of the generated samples with respect to rotation and size through a set of latent variables, despite these not being annotated features in the training data. The generated samples resemble the intended species with an average recognition accuracy of ∼64.3%, evaluated using an external state-of-the-art plant seedling classification model. Additionally, we explore the potential of using the GAN model’s discriminator as a quality assessment tool to remove poor representations of plant seedlings from the artificial samples.
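
A common way to realize such a semi-supervised GAN (following the standard K+1-class scheme of Salimans et al., 2016, rather than the authors' exact design) is to have the discriminator classify the nine species plus one extra class for generated samples, as in this minimal sketch; the architecture and layer sizes are assumptions.

```python
# A minimal sketch of a semi-supervised GAN discriminator: K real
# classes plus one "fake" class, so a single model covers all species.
import torch
import torch.nn as nn

N_SPECIES = 9  # nine plant species; one extra class for generated samples

class SemiSupervisedD(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # K real classes + 1 "fake" class
        self.head = nn.Linear(128, N_SPECIES + 1)

    def forward(self, x):
        return self.head(self.features(x))

logits = SemiSupervisedD()(torch.randn(4, 3, 64, 64))  # (4, 10)
```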


Author(s):  
Amey Thakur ◽  
Hasan Rizvi ◽  
Mega Satish

In the present study, we propose to implement a new framework for estimating generative models via an adversarial process, extending an existing GAN framework to develop white-box controllable image cartoonization, which can generate high-quality cartooned images and videos from real-world photos and videos. The learning objectives of our system are based on three distinct representations: surface representation, structure representation, and texture representation. The surface representation refers to the smooth surface of the images. The structure representation relates to the sparse colour blocks and compresses generic content. The texture representation captures the texture, curves, and features in cartoon images. The Generative Adversarial Network (GAN) framework decomposes the images into these representations and learns from them to generate cartoon images. This decomposition makes the framework more controllable and flexible, allowing users to make changes based on the required output. This approach surpasses previous systems in maintaining the clarity, colours, textures, and shapes of images while still exhibiting the characteristics of cartoon images.
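
As a rough illustration of the white-box controllability, the three representation losses can be combined with the adversarial term through user-adjustable weights; the weighting scheme below is a hedged sketch, not the authors' tuned objective.

```python
# A minimal sketch of combining per-representation losses; the weights
# and the form of each loss are placeholders, not the paper's values.
def total_loss(l_adv, l_surface, l_structure, l_texture,
               w_surface=1.0, w_structure=1.0, w_texture=1.0):
    # Each representation contributes its own objective; exposing the
    # weights is what makes the framework "white-box": users can trade
    # off surface smoothness, colour abstraction, and texture detail.
    return (l_adv + w_surface * l_surface
            + w_structure * l_structure
            + w_texture * l_texture)
```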


Author(s):  
Cara Murphy ◽  
John Kerekes

The classification of trace chemical residues through active spectroscopic sensing is challenging due to the lack of physics-based models that can accurately predict spectra. To overcome this challenge, we leveraged the field of domain adaptation to translate data from the simulated to the measured domain for training a classifier. We developed the first 1D conditional generative adversarial network (GAN) to perform spectrum-to-spectrum translation of reflectance signatures. We applied the 1D conditional GAN to a library of simulated spectra and quantified the improvement in classification accuracy on real data using the translated spectra for training the classifier. Using the GAN-translated library, the average classification accuracy increased from 0.622 to 0.723 on real chemical reflectance data, including data from chemicals not included in the GAN training set.
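
A minimal sketch of what a 1D spectrum-to-spectrum generator could look like appears below; the residual Conv1d architecture and layer sizes are assumptions for illustration, not the paper's network.

```python
# A minimal sketch of a 1D generator that maps a simulated reflectance
# spectrum to a "measured-style" spectrum; architecture is an assumption.
import torch
import torch.nn as nn

class SpectrumTranslator(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, channels, kernel_size=9, padding=4), nn.ReLU(),
            nn.Conv1d(channels, channels, kernel_size=9, padding=4), nn.ReLU(),
            nn.Conv1d(channels, 1, kernel_size=9, padding=4),
        )

    def forward(self, simulated):  # (batch, 1, n_bands)
        # Residual connection: learn the simulated-to-measured correction
        return simulated + self.net(simulated)

fake_measured = SpectrumTranslator()(torch.randn(8, 1, 256))
```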


Author(s):  
Liang Yang ◽  
Yuexue Wang ◽  
Junhua Gu ◽  
Chuan Wang ◽  
Xiaochun Cao ◽  
...  

Motivated by the capability of Generative Adversarial Networks to explore the latent semantic space and capture semantic variations in the data distribution, adversarial learning has been adopted in network embedding to improve robustness. However, this important ability is lost in existing adversarially regularized network embedding methods, because their embedding results are directly compared to samples drawn from a perturbation (Gaussian) distribution without any rectification from real data. To overcome this vital issue, a novel Joint Adversarial Network Embedding (JANE) framework is proposed to jointly distinguish real and fake combinations of embeddings, topology information, and node features. JANE contains three pluggable components: an Embedding module, a Generator module, and a Discriminator module. The overall objective function of JANE is defined in a min-max form, which can be optimized via alternating stochastic gradient descent. Extensive experiments demonstrate the remarkable superiority of the proposed JANE on link prediction (3% gains in both AUC and AP) and node clustering (5% gain in F1 score).
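
Alternating optimization of a min-max objective typically interleaves a discriminator ascent step with a generator/embedding descent step. The sketch below illustrates this generic scheme; the loss callables stand in for JANE's actual objectives and are assumptions here.

```python
# A minimal sketch of alternating stochastic-gradient optimization of a
# min-max objective; the modules and losses are placeholders, not JANE's.
import torch

def train_step(embed_opt, gen_opt, disc_opt, disc_loss_fn, gen_loss_fn):
    # 1) Maximize over the discriminator: tell real (embedding, topology,
    #    features) combinations from fake ones.
    disc_opt.zero_grad()
    d_loss = disc_loss_fn()
    d_loss.backward()
    disc_opt.step()
    # 2) Minimize over embedding + generator: fool the discriminator.
    embed_opt.zero_grad()
    gen_opt.zero_grad()
    g_loss = gen_loss_fn()
    g_loss.backward()
    embed_opt.step()
    gen_opt.step()
```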


2021 ◽  
Vol 2021 ◽  
pp. 1-13
Author(s):  
Bin Huang ◽  
Jiaqi Lin ◽  
Jinming Liu ◽  
Jie Chen ◽  
Jiemin Zhang ◽  
...  

Separating printed or handwritten characters from a noisy background is valuable for many applications, including test paper autoscoring. The complex structure of Chinese characters makes this goal difficult to achieve, because fine details and the overall structure are easily lost in the reconstructed characters. This paper proposes a method for separating Chinese characters based on a generative adversarial network (GAN). We used ESRGAN as the basic network structure and applied dilated convolutions and a novel loss function to improve the quality of the reconstructed characters. Four popular Chinese fonts (Hei, Song, Kai, and Imitation Song) were tested on real data, and the proposed design was compared with other semantic segmentation approaches. The experimental results showed that the proposed method effectively separates Chinese characters from noisy backgrounds. In particular, our method achieves better results in terms of Intersection over Union (IoU) and optical character recognition (OCR) accuracy.
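
A dilated-convolution block of the kind the method adds to ESRGAN might look like the following sketch; the channel counts and dilation rates are assumptions chosen for illustration.

```python
# A minimal sketch of a dilated-convolution block; padding equals the
# dilation rate so a 3x3 kernel preserves spatial size.
import torch.nn as nn

def dilated_block(channels=64):
    # Stacked dilations widen the receptive field without downsampling,
    # helping preserve both fine strokes and the overall character shape.
    return nn.Sequential(
        nn.Conv2d(channels, channels, 3, padding=1, dilation=1), nn.LeakyReLU(0.2),
        nn.Conv2d(channels, channels, 3, padding=2, dilation=2), nn.LeakyReLU(0.2),
        nn.Conv2d(channels, channels, 3, padding=4, dilation=4), nn.LeakyReLU(0.2),
    )
```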


PLoS ONE ◽  
2021 ◽  
Vol 16 (7) ◽  
pp. e0253868
Author(s):  
Luca Rossi ◽  
Andrea Ajmar ◽  
Marina Paolanti ◽  
Roberto Pierdicca

Vehicle trajectory prediction is a topic of growing interest in recent years, with applications in several domains ranging from autonomous driving to traffic congestion prediction and urban planning. Predicting trajectories starting from Floating Car Data (FCD) is a complex task that comes with different challenges, namely Vehicle to Infrastructure (V2I) interaction, Vehicle to Vehicle (V2V) interaction, multimodality, and generalizability. These challenges have not been fully explored by state-of-the-art works. In particular, multimodality and generalizability have been neglected the most, and this work attempts to fill this gap by proposing and defining new datasets, metrics, and methods to help understand and predict vehicle trajectories. We propose and compare Deep Learning models based on Long Short-Term Memory and Generative Adversarial Network architectures; in particular, our GAN-3 model can be used to generate multiple predictions in multimodal scenarios. These approaches are evaluated with our newly proposed error metrics, N-ADE and N-FDE, which normalize some biases in the standard Average Displacement Error (ADE) and Final Displacement Error (FDE) metrics. Experiments were conducted using newly collected datasets in four large Italian cities (Rome, Milan, Naples, and Turin), considering different trajectory lengths to analyze error growth over a larger number of time-steps. The results show that, although LSTM-based models are superior in unimodal scenarios, generative models perform best where the effects of multimodality are higher. Space-time and geographical analyses are performed to demonstrate the suitability of the proposed methodology for real cases and management services.
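
For reference, the standard ADE and FDE metrics that N-ADE and N-FDE normalize can be computed as below; the paper's specific normalization is its own contribution and is not reproduced here.

```python
# A minimal sketch of the standard trajectory-prediction metrics.
import numpy as np

def ade(pred, truth):
    """Average Displacement Error: mean Euclidean distance between the
    predicted and true positions over all time-steps. Shapes: (T, 2)."""
    return np.linalg.norm(pred - truth, axis=1).mean()

def fde(pred, truth):
    """Final Displacement Error: Euclidean distance at the last step."""
    return np.linalg.norm(pred[-1] - truth[-1])
```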


2021 ◽  
Vol 8 ◽  
Author(s):  
Rodrigo F. Cádiz ◽  
Agustín Macaya ◽  
Manuel Cartagena ◽  
Denis Parra

Deep learning, one of the fastest-growing branches of artificial intelligence, has become one of the most relevant research and development areas of recent years, especially since 2012, when a neural network surpassed the most advanced image classification techniques of the time. This spectacular development has not been alien to the world of the arts, as recent advances in generative networks have made possible the artificial creation of high-quality content such as images, movies or music. We believe that these novel generative models pose a great challenge to our current understanding of computational creativity. If a robot can now create music that an expert cannot distinguish from music composed by a human, or create novel musical entities that were not known at training time, or exhibit conceptual leaps, does it mean that the machine is creative? We believe that the emergence of these generative models clearly signals that much more research needs to be done in this area. We would like to contribute to this debate with two case studies of our own: TimbreNet, a variational auto-encoder network trained to generate audio-based musical chords, and StyleGAN Pianorolls, a generative adversarial network capable of creating short musical excerpts, despite the fact that it was trained with images and not musical data. We discuss and assess these generative models in terms of their creativity, and we show that they are in practice capable of learning musical concepts that are not obvious from the training data. We hypothesize that these deep models, based on our current understanding of creativity in robots and machines, can in fact be considered creative.


2021 ◽  
Vol 11 (19) ◽  
pp. 9065
Author(s):  
Myungjin Choi ◽  
Jee-Hyeok Park ◽  
Qimeng Zhang ◽  
Byeung-Sun Hong ◽  
Chang-Hun Kim

We propose a novel method for efficiently generating a highly refined normal map for screen-space fluid rendering. Because filtering the normal map is crucial to the quality of the final screen-space fluid rendering, we employ a conditional generative adversarial network (cGAN) as a filter that learns a deep normal map representation, thereby refining the low-quality normal map. In particular, we designed a novel loss function dedicated to refining the normal map information, and we use a specific set of auxiliary features to train the cGAN generator to learn features that are more robust with respect to edge details. Additionally, we constructed a dataset of six different typical scenes to enable effective demonstrations of multitype fluid simulation. Experiments indicated that our generator was able to infer clearer and more detailed features for this dataset than a basic screen-space fluid rendering method. Moreover, in some cases, the results generated by our method were even smoother than those generated by the conventional surface reconstruction method. Our method improves the fluid rendering results via the high-quality normal map while preserving the advantages of both the screen-space fluid rendering methods and the traditional surface reconstruction methods, namely that the computation time is independent of the number of simulation particles and the spatial resolution depends only on the image resolution.
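
Conditioning the generator on the low-quality normal map together with auxiliary feature channels is commonly done by channel concatenation; the following is a minimal sketch under that assumption, with the network body and auxiliary channel count invented for illustration.

```python
# A minimal sketch of a conditional normal-map refiner; the channel
# layout, network body, and auxiliary features (e.g. depth) are
# assumptions, not the paper's design.
import torch
import torch.nn as nn

class NormalMapRefiner(nn.Module):
    def __init__(self, aux_channels=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3 + aux_channels, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3, 3, padding=1),
        )

    def forward(self, normals, aux):
        # Concatenate the condition and auxiliary features along channels
        x = torch.cat([normals, aux], dim=1)
        # Predict a residual refinement and renormalize to unit length
        out = normals + self.net(x)
        return out / out.norm(dim=1, keepdim=True).clamp_min(1e-6)
```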


2020 ◽  
Vol 34 (07) ◽  
pp. 10745-10753
Author(s):  
Marzieh Edraki ◽  
Nazanin Rahnavard ◽  
Mubarak Shah

Convolutional neural networks (CNNs) have become a key asset to most fields of AI. Despite their successful performance, CNNs suffer from a major drawback: they fail to capture the hierarchy of spatial relations among the different parts of an entity. As a remedy to this problem, the idea of capsules was proposed by Hinton. In this paper, we propose the SubSpace Capsule Network (SCN), which exploits the idea of capsule networks to model possible variations in the appearance or implicitly defined properties of an entity through a group of capsule subspaces, instead of simply grouping neurons to create capsules. A capsule is created by projecting an input feature vector from a lower layer onto the capsule subspace using a learnable transformation. This transformation finds the degree of alignment of the input with the properties modeled by the capsule subspace. We show that SCN is a general capsule network that can successfully be applied to both discriminative and generative models without incurring computational overhead compared to CNNs at test time. The effectiveness of SCN is evaluated through a comprehensive set of experiments on supervised image classification, semi-supervised image classification, and high-resolution image generation tasks using the generative adversarial network (GAN) framework. SCN significantly improves the performance of the baseline models in all three tasks.
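
The subspace projection can be made concrete as follows: project the input feature vector onto a learnable basis and take the projection's norm as the degree of alignment. This sketch uses QR orthonormalization as an illustrative choice, not necessarily the paper's exact transformation.

```python
# A minimal sketch of the capsule-subspace idea: an orthogonal projection
# of an input feature vector onto a learnable subspace.
import torch
import torch.nn as nn

class CapsuleSubspace(nn.Module):
    def __init__(self, in_dim=128, sub_dim=8):
        super().__init__()
        # Learnable basis spanning one capsule's subspace
        self.W = nn.Parameter(torch.randn(in_dim, sub_dim) * 0.01)

    def forward(self, x):  # x: (batch, in_dim)
        # Orthogonal projection onto span(W); the projection's norm
        # measures how well x aligns with the capsule's properties.
        Q, _ = torch.linalg.qr(self.W)       # orthonormal basis
        capsule = (x @ Q) @ Q.T              # projection of x
        return capsule, capsule.norm(dim=1)  # capsule and alignment degree
```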


2020 ◽  
Vol 143 (3) ◽  
Author(s):  
Wei Chen ◽  
Faez Ahmed

Abstract Deep generative models have proven to be a useful tool for automatic design synthesis and design space exploration. When applied in engineering design, existing generative models face three challenges: (1) generated designs lack diversity and do not cover all areas of the design space, (2) it is difficult to explicitly improve the overall performance or quality of generated designs, and (3) existing models generally do not generate novel designs outside the domain of the training data. In this article, we simultaneously address these challenges by proposing a new determinantal point process-based loss function for the probabilistic modeling of diversity and quality. With this new loss function, we develop a variant of the generative adversarial network, named "performance augmented diverse generative adversarial network" (PaDGAN), which can generate novel high-quality designs with good coverage of the design space. Using three synthetic examples and one real-world airfoil design example, we demonstrate that PaDGAN can generate diverse and high-quality designs. In comparison to a vanilla generative adversarial network, on average, it generates samples with a 28% higher mean quality score, with larger diversity and without the mode collapse issue. Unlike typical generative models, which usually generate new designs by interpolating within the boundary of the training data, we show that PaDGAN expands the design space boundary outside the training data towards high-quality regions. The proposed method is broadly applicable to many tasks, including design space exploration, design optimization, and creative solution recommendation.
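
A determinantal point process (DPP) loss of the general kind described here can be built from a quality-weighted similarity kernel whose log-determinant rewards batches that are both diverse and high quality; the RBF kernel and quality weighting below are illustrative assumptions rather than PaDGAN's exact formulation.

```python
# A minimal sketch of a DPP-based diversity-and-quality loss over a
# batch of generated designs; kernel and quality model are placeholders.
import torch

def dpp_loss(designs, quality, gamma=1.0):
    """designs: (B, D) flattened samples; quality: (B,) predicted scores.
    Builds L_ij = q_i * k(x_i, x_j) * q_j and penalizes a small
    determinant, which rewards diverse AND high-quality batches."""
    # RBF similarity kernel between samples
    sq_dists = torch.cdist(designs, designs).pow(2)
    S = torch.exp(-gamma * sq_dists)
    q = quality.clamp_min(1e-6)
    L = q[:, None] * S * q[None, :]
    # log-determinant of a PSD matrix; jitter keeps it well-conditioned
    eye = torch.eye(len(q), device=designs.device)
    return -torch.logdet(L + 1e-4 * eye)
```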

