Analysis and Assessment of Controllability of an Expressive Deep Learning-Based TTS System

Informatics ◽  
2021 ◽  
Vol 8 (4) ◽  
pp. 84
Author(s):  
Noé Tits ◽  
Kevin El Haddad ◽  
Thierry Dutoit

In this paper, we study the controllability of an Expressive TTS system trained on a dataset for continuous control. The dataset is the Blizzard 2013 dataset, based on audiobooks read by a female speaker and containing great variability in style and expressiveness. Controllability is evaluated with both an objective and a subjective experiment. The objective assessment is based on a measure of correlation between acoustic features and the dimensions of the latent space representing expressiveness. The subjective assessment is based on a perceptual experiment in which users are shown an interface for Controllable Expressive TTS and asked to retrieve a synthetic utterance whose expressiveness subjectively corresponds to that of a reference utterance.
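The objective assessment described above can be illustrated with a minimal sketch. It assumes the Pearson coefficient as the correlation measure, which the abstract does not specify, and it uses hypothetical toy values; the names `z0`, `z1`, and `mean_f0_hz` are illustrative and do not come from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def correlation_table(latents, features):
    """Correlate every latent dimension with every acoustic feature.

    Both arguments map a name to a list holding one value per utterance.
    """
    return {
        (latent_name, feature_name): pearson(latent_values, feature_values)
        for latent_name, latent_values in latents.items()
        for feature_name, feature_values in features.items()
    }


# Hypothetical toy data: four utterances, two latent dimensions, one feature.
latents = {"z0": [0.0, 1.0, 2.0, 3.0], "z1": [1.0, -1.0, 1.0, -1.0]}
features = {"mean_f0_hz": [180.0, 200.0, 220.0, 240.0]}

table = correlation_table(latents, features)
for pair, r in table.items():
    print(pair, round(r, 3))
```

In this toy data, a latent dimension that moves linearly with the feature yields a coefficient of 1, which is the kind of relation a controllable dimension would be expected to show.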

2021 ◽  
Vol 5 (1) ◽  
pp. 76
Author(s):  
Cahyo Adhi Hartanto ◽  
Laksmita Rahadianti

Many real-world situations, such as bad weather, may result in hazy environments. Images captured in these hazy conditions have low image quality due to microparticles in the air. The microparticles cause light to scatter and be absorbed, resulting in hazy images with various effects. In recent years, image dehazing has been researched in depth to handle images captured in these conditions. Various methods have been developed, from traditional methods to deep learning methods. Traditional methods focus more on the use of statistical priors, which have weaknesses in certain conditions. This paper proposes a novel architecture based on PDR-Net that uses pyramid dilated convolution together with pre-processing, processing, and post-processing modules and the application of attention. The proposed network is trained to minimize L1 loss and perceptual loss on the O-Haze dataset. To evaluate our architecture's results, we used the structural similarity index measure (SSIM), peak signal-to-noise ratio (PSNR), and color difference as an objective assessment and a psychovisual experiment as a subjective assessment. On the O-Haze dataset, our architecture obtained better results than the previous method, with an SSIM of 0.798 and a PSNR of 25.39, but did not improve on the color difference. The SSIM and PSNR results were supported by a subjective assessment with 65 respondents, most of whom chose the restored images produced by our architecture.
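Two of the quantities named above, the L1 loss and PSNR, have standard definitions that can be sketched briefly. This is an illustration on flattened 8-bit pixel lists with hypothetical values, not the authors' training or evaluation code; SSIM, the perceptual loss, and the color difference are omitted.

```python
import math


def l1_loss(reference, restored):
    """Mean absolute error between two equal-length pixel sequences."""
    if len(reference) != len(restored) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(r - p) for r, p in zip(reference, restored)) / len(reference)


def mse(reference, restored):
    """Mean squared error between two equal-length pixel sequences."""
    if len(reference) != len(restored) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((r - p) ** 2 for r, p in zip(reference, restored)) / len(reference)


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE)."""
    error = mse(reference, restored)
    if error == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / error)


# Hypothetical flattened pixel values (haze-free reference vs. dehazed output).
reference = [0, 64, 128, 255]
restored = [0, 60, 130, 250]

print("L1 loss:", l1_loss(reference, restored))
print("PSNR (dB):", round(psnr(reference, restored), 2))
```

A higher PSNR means the restored image is closer to the reference in the mean-squared sense, which is why it is reported alongside SSIM rather than instead of it.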


2020 ◽  
Vol 2020 (1) ◽  
Author(s):  
Xinyi Ding ◽  
Zohreh Raziei ◽  
Eric C. Larson ◽  
Eli V. Olinick ◽  
Paul Krueger ◽  
...  

BioChem ◽  
2021 ◽  
Vol 1 (1) ◽  
pp. 36-48
Author(s):  
Ivan Jacobs ◽  
Manolis Maragoudakis

Computer-assisted de novo design of natural product mimetics offers a viable strategy to reduce synthetic efforts and obtain natural-product-inspired bioactive small molecules, but suffers from several limitations. Deep learning techniques can help address these shortcomings. We propose the generation of synthetic molecule structures that optimize the binding affinity to a target. To achieve this, we leverage important advancements in deep learning. Our approach generalizes to systems beyond the source system and achieves the generation of complete structures that optimize the binding to a target unseen during training. Translating the input sub-systems into the latent space makes it possible to search for similar structures and to sample from the latent space for generation.


Cancers ◽  
2021 ◽  
Vol 13 (4) ◽  
pp. 702
Author(s):  
Nalee Kim ◽  
Jaehee Chun ◽  
Jee Suk Chang ◽  
Chang Geol Lee ◽  
Ki Chang Keum ◽  
...  

This study investigated the feasibility of deep learning-based segmentation (DLS) and continual training for adaptive radiotherapy (RT) of head and neck (H&N) cancer. One hundred patients treated with definitive RT were included. Based on 23 organs-at-risk (OARs) manually segmented in initial planning computed tomography (CT), a modified FC-DenseNet was trained for DLS: (i) using data obtained from 60 patients, with 20 matched patients in the test set (DLSm); (ii) using data obtained from 60 identical patients, with 20 unmatched patients in the test set (DLSu). Manually contoured OARs in adaptive planning CT for 20 independent patients were provided as test sets. Deformable image registration (DIR) was also performed. All 23 OARs were compared using quantitative measurements, and nine OARs were also evaluated via subjective assessment by 26 observers using the Turing test. DLSm achieved better performance than both DLSu and DIR (mean Dice similarity coefficient: 0.83 vs. 0.80 vs. 0.70), mainly for glandular structures, whose volume decreased significantly during RT. Based on subjective measurements, DLS was often perceived as human (49.2%). Furthermore, DLSm was preferred over DLSu (67.2%) and DIR (96.7%), with a rate of required revision similar to that of manual segmentation (28.0% vs. 29.7%). In conclusion, DLS was effective and preferred over DIR. Additionally, continual DLS training is required for effective optimization and robustness in personalized adaptive RT.
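The Dice similarity coefficient reported above is defined as 2|A ∩ B| / (|A| + |B|) for two segmentation masks A and B. A minimal sketch on flattened binary masks follows; the mask values are hypothetical, and returning 1.0 when both masks are empty is a convention assumed here, not something stated in the abstract.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice similarity coefficient between two flattened binary masks."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    total = sum(1 for a in mask_a if a) + sum(1 for b in mask_b if b)
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * intersection / total


# Hypothetical masks: a manual contour and an automatic segmentation.
manual = [1, 1, 1, 0, 0, 0]
automatic = [0, 1, 1, 1, 0, 0]

score = dice_coefficient(manual, automatic)
print(round(score, 3))
```

The score ranges from 0 (no overlap) to 1 (identical masks), so the reported means of 0.83, 0.80, and 0.70 can be read directly as degrees of overlap with the manual contours.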


Algorithms ◽  
2021 ◽  
Vol 14 (2) ◽  
pp. 39
Author(s):  
Carlos Lassance ◽  
Vincent Gripon ◽  
Antonio Ortega

Deep Learning (DL) has attracted a lot of attention for its ability to reach state-of-the-art performance in many machine learning tasks. The core principle of DL methods consists of training composite architectures in an end-to-end fashion, where inputs are associated with outputs trained to optimize an objective function. Because of their compositional nature, DL architectures naturally exhibit several intermediate representations of the inputs, which belong to so-called latent spaces. When treated individually, these intermediate representations are most of the time unconstrained during the learning process, as it is unclear which properties should be favored. However, when processing a batch of inputs concurrently, the corresponding set of intermediate representations exhibits relations (what we call a geometry) on which desired properties can be sought. In this work, we show that it is possible to introduce constraints on these latent geometries to address various problems. In more detail, we propose to represent geometries by constructing similarity graphs from the intermediate representations obtained when processing a batch of inputs. By constraining these Latent Geometry Graphs (LGGs), we address the following three problems: (i) reproducing the behavior of a teacher architecture is achieved by mimicking its geometry, (ii) designing efficient embeddings for classification is achieved by targeting specific geometries, and (iii) robustness to deviations on inputs is achieved by enforcing smooth variation of geometry between consecutive latent spaces. Using standard vision benchmarks, we demonstrate the ability of the proposed geometry-based methods to solve the considered problems.
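The idea of building a similarity graph from a batch of intermediate representations, and comparing a student's graph with a teacher's, can be sketched as follows. The choice of cosine similarity and of a squared-difference distance between adjacency matrices are assumptions made for illustration; the abstract does not state which similarity or loss the authors use, and the function names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_graph(batch):
    """Adjacency matrix over the batch: entry (i, j) is the similarity
    between the representations of inputs i and j, with no self-loops."""
    n = len(batch)
    return [
        [0.0 if i == j else cosine_similarity(batch[i], batch[j]) for j in range(n)]
        for i in range(n)
    ]


def graph_distance(graph_a, graph_b):
    """Sum of squared differences between two adjacency matrices."""
    return sum(
        (a - b) ** 2
        for row_a, row_b in zip(graph_a, graph_b)
        for a, b in zip(row_a, row_b)
    )


# Hypothetical representations of the same three inputs in two networks.
# The latent dimensions differ (2 vs. 3), but both graphs are 3 x 3.
teacher_batch = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
student_batch = [[2.0, 0.0, 0.0], [0.0, 3.0, 0.0], [5.0, 5.0, 0.0]]

teacher_graph = similarity_graph(teacher_batch)
student_graph = similarity_graph(student_batch)
print(graph_distance(teacher_graph, student_graph))
```

Because each graph has one node per input in the batch, two networks can be compared through their graphs even when their latent spaces have different dimensions, which is what makes a geometry-based teacher-student comparison possible in this sketch.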


1998 ◽  
Vol 112 (10) ◽  
pp. 934-939 ◽  
Author(s):  
Dipak Ranjan Nayak ◽  
R. Balakrishnan ◽  
K. Deepak Murthy

The authors have used the nasal endoscope for the precise identification of pathological abnormalities of the nasal septum in relation to the lateral nasal wall, including the osteo-meatal complex, and in its ultraconservative management. The aim of the study was to compare the efficacy of endoscope-aided septoplasty (EAS) with that of traditional septoplasty (TS) in treating the pathological septum and turbinates, performed in 30 cases each. The subjective assessment was carried out by visual analogue scores and the objective assessment by nasal endoscopy. This study demonstrates the superiority and limitations of the endoscopic approach in managing a deviated nasal septum and the turbinates. The endoscope-aided technique was found to be more effective in relieving the contact areas and nasal obstruction (p ≤ 0.05). The authors advocate a combined approach: an endoscopic approach for inaccessible posterior deviation and the conservative traditional technique for accessible anterior deviation of the nasal septum.


2013 ◽  
Vol 411-414 ◽  
pp. 1362-1367 ◽  
Author(s):  
Qing Lan Wei ◽  
Yuan Zhang

This paper discusses the application of saliency maps to an objective video quality evaluation system. It computes SMSE and SPSNR values as objective assessment scores according to the saliency map, and compares them with conventional objective evaluation methods such as PSNR and MSE. Experimental results demonstrate that this method fits the subjective assessment results well.
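The abstract does not give formulas for SMSE and SPSNR. One plausible reading, assumed here purely for illustration, is a mean squared error in which each pixel's error is weighted by its saliency, followed by the usual PSNR formula applied to that weighted error. The function names and pixel values below are hypothetical.

```python
import math


def saliency_weighted_mse(reference, distorted, saliency):
    """Assumed SMSE: squared errors weighted by per-pixel saliency,
    normalized by the total saliency."""
    if not (len(reference) == len(distorted) == len(saliency)):
        raise ValueError("all inputs must have the same length")
    total_weight = sum(saliency)
    if total_weight <= 0:
        raise ValueError("saliency map must contain positive weight")
    weighted = sum(w * (r - d) ** 2 for r, d, w in zip(reference, distorted, saliency))
    return weighted / total_weight


def saliency_weighted_psnr(reference, distorted, saliency, max_value=255.0):
    """Assumed SPSNR: 10 * log10(MAX^2 / SMSE), in dB."""
    error = saliency_weighted_mse(reference, distorted, saliency)
    if error == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / error)


# Hypothetical frame: the only distorted pixel is also the most salient one.
reference = [100, 100, 100, 100]
distorted = [110, 100, 100, 100]
saliency = [0.7, 0.1, 0.1, 0.1]

plain_mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
smse = saliency_weighted_mse(reference, distorted, saliency)
spsnr = saliency_weighted_psnr(reference, distorted, saliency)
print(plain_mse, round(smse, 2), round(spsnr, 2))
```

Under this assumed definition, an error located in a salient region is penalized more heavily than the plain MSE would penalize it, which is the intuition behind expecting a closer fit to subjective scores.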

