Speaker/Style-Dependent Neural Network Speech Synthesis Based on Speaker/Style Embedding

2020 ◽  
Vol 26 (4) ◽  
pp. 434-453
Author(s):  
Milan Sečujski ◽  
Darko Pekar ◽  
Siniša Suzić ◽  
Anton Smirnov ◽  
Tijana Nosek

The paper presents a novel architecture and method for training neural networks to produce synthesized speech in a particular voice and speaking style, based on a small quantity of target speaker/style training data. The method is based on neural network embedding, i.e. mapping of discrete variables into continuous vectors in a low-dimensional space, which has been shown to be a very successful universal deep learning technique. In this particular case, different speaker/style combinations are mapped into different points in a low-dimensional space, which enables the network to capture the similarities and differences between speakers and speaking styles more efficiently. The initial model from which speaker/style adaptation was carried out was a multi-speaker/multi-style model based on 8.5 hours of American English speech data, which corresponds to 16 different speaker/style combinations. The results of the experiments show that both versions of the obtained system, one using 10 minutes and the other as little as 30 seconds of target data, outperform the state of the art in parametric speaker/style-dependent speech synthesis. This opens a wide range of applications of speaker/style-dependent speech synthesis based on small quantities of training data, in domains ranging from customer interaction in call centers to robot-assisted medical therapy.
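
The embedding idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' architecture: the embedding dimension (3), the random initialisation, the four-element linguistic feature vector, and the choice to concatenate the embedding with the linguistic features are all assumptions made for illustration. Only the count of 16 speaker/style combinations comes from the abstract.

```python
import random


def make_embedding_table(num_ids, dim, seed=0):
    """Create one small random vector per discrete speaker/style ID.

    In a real system these vectors would be trained jointly with the
    network; random values stand in for trained ones here (assumption).
    """
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(num_ids)]


def network_input(linguistic_features, table, combo_id):
    """Append the speaker/style embedding to the linguistic features.

    Concatenation is an illustrative choice, not taken from the paper.
    """
    return list(linguistic_features) + table[combo_id]


# 16 speaker/style combinations, as in the abstract; dim=3 is hypothetical.
table = make_embedding_table(num_ids=16, dim=3)
x = network_input([0.2, 0.7, 1.0, 0.0], table, combo_id=5)
print(len(x))
```

Under these assumptions, adapting to a new speaker/style amounts to learning one additional short vector, which is one way to see why a small amount of target data can suffice.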

2008 ◽  
Vol 18 (03) ◽  
pp. 195-205 ◽  
Author(s):  
WEIBAO ZOU ◽  
ZHERU CHI ◽  
KING CHUEN LO

Image classification is a challenging problem in organizing a large image database. However, an effective method for such an objective is still under investigation. A method based on wavelet analysis to extract features for image classification is presented in this paper. After an image is decomposed by a wavelet transform, the statistics of its features can be obtained from the distribution of histograms of wavelet coefficients, which are respectively projected onto two orthogonal axes, i.e., the x and y directions. Therefore, the nodes of the tree representation of images can be represented by the distribution. The high-level features are described in a low-dimensional space of 16 attributes, so that the computational complexity is significantly decreased. 2800 images derived from seven categories are used in experiments. Half of the images were used for training a neural network and the other half for testing. The features extracted by wavelet analysis and the conventional features are used in the experiments to demonstrate the efficacy of the proposed method. The classification rate on the training data set with wavelet analysis is up to 91%, and the classification rate on the testing data set reaches 89%. Experimental results show that our proposed approach for image classification is more effective.


Author(s):  
Yang Fang ◽  
Xiang Zhao ◽  
Zhen Tan

Network Embedding (NE) is an important method to learn the representations of a network via a low-dimensional space. Conventional NE models focus on capturing the structure information and semantic information of vertices while neglecting such information for edges. In this work, we propose a novel NE model named BimoNet to capture both the structure and semantic information of edges. BimoNet is composed of two parts, i.e., the bi-mode embedding part and the deep neural network part. For the bi-mode embedding part, the first mode, named add-mode, is used to express the entity-shared features of edges and the second mode, named subtract-mode, is employed to represent the entity-specific features of edges. These features reflect the semantic information. For the deep neural network part, we first regard the edges in a network as nodes, and the vertices as links, which does not change the overall structure of the whole network. Then we take the nodes' adjacency matrix as the input of the deep neural network, as it can obtain similar representations for nodes with similar structure. Afterwards, by jointly optimizing the objective function of these two parts, BimoNet can preserve both the semantic and structure information of edges. In experiments, we evaluate BimoNet on three real-world datasets and the task of relation extraction, and BimoNet is demonstrated to outperform state-of-the-art baseline models consistently and significantly.
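
The step of treating edges as nodes and vertices as links can be sketched as follows. This is a hedged illustration, not BimoNet's code: it assumes that two edges become neighbours when they share a vertex (the usual line-graph construction), a rule the abstract does not spell out. The function name `edge_adjacency` and the toy graph are hypothetical.

```python
from itertools import combinations


def edge_adjacency(edges):
    """Build an adjacency matrix whose rows and columns are the edges.

    Entry [i][j] is 1 when edge i and edge j share at least one vertex
    (assumed rule), and 0 otherwise.
    """
    n = len(edges)
    adj = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        if set(edges[i]) & set(edges[j]):
            adj[i][j] = adj[j][i] = 1
    return adj


# A path a-b-c-d: the middle edge touches both outer edges.
edges = [("a", "b"), ("b", "c"), ("c", "d")]
adj = edge_adjacency(edges)
for row in adj:
    print(row)
```

Each row of this matrix is the kind of structural input a deep network could take for one edge, so that edges with similar neighbourhoods start from similar inputs.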


Aviation ◽  
2013 ◽  
Vol 17 (2) ◽  
pp. 52-56 ◽  
Author(s):  
Mykola Kulyk ◽  
Sergiy Dmitriev ◽  
Oleksandr Yakushenko ◽  
Oleksandr Popov

A method of obtaining test and training data sets has been developed. These sets are intended for training a static neural network to recognise individual and double defects in the air-gas path units of a gas-turbine engine. These data are obtained by using operational process parameters of the air-gas path of a bypass turbofan engine. The method makes it possible to obtain sets that represent changes in the technical condition of a gas-turbine engine while taking into account errors that occur in the measurement of the gas-dynamic parameters of the air-gas path. The operation of the engine in a wide range of modes should also be taken into account.


Author(s):  
Bolin Chen ◽  
Yourui Han ◽  
Xuequn Shang ◽  
Shenggui Zhang

The identification of disease-related genes plays an essential role in bioinformatics. To achieve this, many powerful machine learning methods have been proposed from various computational aspects, such as biological network analysis, classification, regression, deep learning, etc. Among them, deep learning based methods have had great success in identifying disease-related genes in terms of higher accuracy and efficiency. However, these methods rarely handle the following two issues well: (1) the multifunctions of many genes; and (2) the scale-free property of biological networks. To overcome these, we propose a novel network representation method to transform individual vertices together with their surrounding topological structures into image-like datasets. It takes each node-induced sub-network as a represented candidate, and adds its environmental characteristics to generate a low-dimensional space as its representation. These image-like datasets can be applied directly in a Convolutional Neural Network-based method for identifying cancer-related genes. The numerical experiments show that the proposed method can achieve an AUC value of 0.9256 in a single network and 0.9452 in multiple networks, which outperforms many existing methods.


2020 ◽  
Author(s):  
Yarden Cohen ◽  
David Nicholson ◽  
Alexa Sanchioni ◽  
Emily K. Mallaber ◽  
Viktoriya Skidanova ◽  
...  

Songbirds have long been studied as a model system of sensory-motor learning. Many analyses of birdsong require time-consuming manual annotation of the individual elements of song, known as syllables or notes. Here we describe the first automated algorithm for birdsong annotation that is applicable to complex song such as canary song. We developed a neural network architecture, “TweetyNet”, that is trained with a small amount of hand-labeled data using supervised learning methods. We first show TweetyNet achieves significantly lower error on Bengalese finch song than a similar method, using less training data, and maintains low error rates across days. Applied to canary song, TweetyNet achieves fully automated annotation of canary song, accurately capturing the complex statistical structure previously discovered in a manually annotated dataset. We conclude that TweetyNet will make it possible to ask a wide range of new questions focused on complex songs where manual annotation was impractical.


Author(s):  
Felix Jimenez ◽  
Amanda Koepke ◽  
Mary Gregg ◽  
Michael Frey

A generative adversarial network (GAN) is an artificial neural network with a distinctive training architecture, designed to create examples that faithfully reproduce a target distribution. GANs have recently had particular success in applications involving high-dimensional distributions in areas such as image processing. Little work has been reported for low dimensions, where properties of GANs may be better identified and understood. We studied GAN performance in simulated low-dimensional settings, allowing us to transparently assess effects of target distribution complexity and training data sample size on GAN performance in a simple experiment. This experiment revealed two important forms of GAN error, tail underfilling and bridge bias, where the latter is analogous to the tunneling observed in high-dimensional GANs.


Author(s):  
Ulas Isildak ◽  
Alessandro Stella ◽  
Matteo Fumagalli

Balancing selection is an important adaptive mechanism underpinning a wide range of phenotypes. Despite its relevance, the detection of recent balancing selection from genomic data is challenging as its signatures are qualitatively similar to those left by ongoing positive selection. In this study we developed and implemented two deep neural networks and tested their performance to predict loci under recent selection, either due to balancing selection or incomplete sweep, from population genomic data. Specifically, we generated forward-in-time simulations to train and test an artificial neural network (ANN) and a convolutional neural network (CNN). ANN received as input multiple summary statistics calculated on the locus of interest, while CNN was applied directly on the matrix of haplotypes. We found that both architectures have high accuracy to identify loci under recent selection. CNN generally outperformed ANN to distinguish between signals of balancing selection and incomplete sweep and was less affected by incorrect training data. We deployed both trained networks on neutral genomic regions in European populations and demonstrated a lower false positive rate for CNN than ANN. We finally deployed CNN within the MEFV gene region and identified several common variants predicted to be under incomplete sweep in a European population. Notably, two of these variants are functional changes and could modulate susceptibility to Familial Mediterranean Fever, possibly as a consequence of past adaptation to pathogens. In conclusion, deep neural networks were able to characterise signals of selection on intermediate-frequency variants, an analysis currently inaccessible by commonly used strategies.


2020 ◽  
Vol 11 (1) ◽  
pp. 162
Author(s):  
Masoud Reyhani Hamedani ◽  
Sang-Wook Kim

One of the important tasks in a graph is to compute the similarity between two nodes; link-based similarity measures (in short, similarity measures) are well-known and conventional techniques for this task that exploit the relations between nodes (i.e., links) in the graph. Graph embedding methods (in short, embedding methods) convert nodes in a graph into vectors in a low-dimensional space by preserving social relations among nodes in the original graph. Instead of applying a similarity measure to the graph to compute the similarity between nodes a and b, we can consider the proximity between corresponding vectors of a and b obtained by an embedding method as the similarity between a and b. Although embedding methods have been analyzed in a wide range of machine learning tasks such as link prediction and node classification, they have not been investigated in terms of similarity computation of nodes. In this paper, we investigate both effectiveness and efficiency of embedding methods in the task of similarity computation of nodes by comparing them with those of similarity measures. To the best of our knowledge, this is the first work that examines the application of embedding methods in this special task. Based on the results of our extensive experiments with five well-known and publicly available datasets, we made the following observations for embedding methods: (1) with all datasets, they show less effectiveness than similarity measures except for one dataset, (2) they underperform similarity measures with all datasets in terms of efficiency except for one dataset, (3) they have more parameters than similarity measures, thereby leading to a time-consuming parameter tuning process, (4) increasing the number of dimensions does not necessarily improve their effectiveness in computing the similarity of nodes.
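
The idea of using the proximity between two embedding vectors as the similarity between nodes a and b can be sketched as below. Cosine similarity is assumed here as the proximity function; the abstract does not say which one the authors used, and the two-dimensional vectors are made-up values rather than the output of any embedding method.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors.

    Returns 0.0 when either vector has zero length, so the function
    never divides by zero.
    """
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical node embeddings (illustrative values only).
embeddings = {"a": [1.0, 0.0], "b": [1.0, 1.0], "c": [-1.0, 0.0]}

sim_ab = cosine_similarity(embeddings["a"], embeddings["b"])
sim_ac = cosine_similarity(embeddings["a"], embeddings["c"])
print(sim_ab, sim_ac)
```

With these vectors, node b scores as more similar to a than node c does, since c points in the opposite direction.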


2021 ◽  
Vol 2 (1) ◽  
Author(s):  
Thomas Martynec ◽  
Christos Karapanagiotis ◽  
Sabine H. L. Klapp ◽  
Stefan Kowarik

Machine learning is playing an increasing role in the discovery of new materials and may also facilitate the search for optimum growth conditions for crystals and thin films. Here, we perform kinetic Monte-Carlo simulations of sub-monolayer growth. We consider a generic homoepitaxial growth scenario that covers a wide range of conditions with different diffusion barriers (0.4–0.55 eV) and lateral binding energies (0.1–0.4 eV). These simulations are used as a training data set for a convolutional neural network that can predict diffusion barriers and binding energies. Specifically, a single Monte-Carlo image of the morphology is sufficient to determine the energy barriers with an accuracy of approximately 10 meV and the neural network is tolerant to images with noise and lower than atomic-scale resolution. We believe this new machine learning method will be useful for fundamental studies of growth kinetics and growth optimization through better knowledge of microscopic parameters.


Fluids ◽  
2019 ◽  
Vol 4 (2) ◽  
pp. 109 ◽  
Author(s):  
Balaji Jayaraman ◽  
S M Abdullah Al Mamun ◽  
Chen Lu

Sparse linear estimation of fluid flows using a data-driven proper orthogonal decomposition (POD) basis is systematically explored in this work. Fluid flows are manifestations of nonlinear multiscale partial differential equations (PDE) dynamical systems with inherent scale separation that impact the system dimensionality. Given that sparse reconstruction is inherently an ill-posed problem, the most successful approaches require the knowledge of the underlying low-dimensional space spanning the manifold in which the system resides. In this paper, we adopt an approach that learns a basis from singular value decomposition (SVD) of training data to recover sparse information. This results in a set of four design parameters for sparse recovery, namely, the choice of basis, system dimension required for sufficiently accurate reconstruction, sensor budget and their placement. The choice of design parameters implicitly determines the choice of algorithm as either ℓ2 minimization reconstruction or sparsity-promoting ℓ1 minimization reconstruction. In this work, we systematically explore the implications of these design parameters on reconstruction accuracy so that practical recommendations can be identified. We observe that greedy-smart sensor placement, particularly interpolation points from the discrete empirical interpolation method (DEIM), provides the best balance of computational complexity and accurate reconstruction.
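
The least-squares branch of this reconstruction can be illustrated with a small sketch. It rests on several assumptions: the basis is taken as given (in the paper's setting it would come from an SVD of training data, which is not shown), only two modes are used so the normal equations can be solved in closed form, and the sensor locations are picked by hand rather than by DEIM. The function name `l2_reconstruct` and all numbers are hypothetical.

```python
def l2_reconstruct(basis, sensor_idx, measurements):
    """Reconstruct a full state from a few point sensors.

    basis: two modes, each a list of n values (assumed already known).
    sensor_idx: grid indices where the state is measured.
    measurements: measured state values at those indices.
    Solves min ||M c - y||_2 for the two coefficients c via the
    2x2 normal equations, then returns c1*mode1 + c2*mode2.
    """
    m1 = [basis[0][i] for i in sensor_idx]
    m2 = [basis[1][i] for i in sensor_idx]
    g11 = sum(a * a for a in m1)
    g12 = sum(a * b for a, b in zip(m1, m2))
    g22 = sum(b * b for b in m2)
    r1 = sum(a * y for a, y in zip(m1, measurements))
    r2 = sum(b * y for b, y in zip(m2, measurements))
    det = g11 * g22 - g12 * g12
    if det == 0:
        raise ValueError("sensor placement cannot distinguish the two modes")
    c1 = (r1 * g22 - g12 * r2) / det
    c2 = (g11 * r2 - g12 * r1) / det
    return [c1 * p + c2 * q for p, q in zip(basis[0], basis[1])]


# Two orthonormal modes on a 4-point grid (illustrative values).
basis = [[0.5, 0.5, 0.5, 0.5], [0.5, -0.5, 0.5, -0.5]]
true_state = [2.0 * p + 1.0 * q for p, q in zip(basis[0], basis[1])]

sensors = [0, 1, 2]  # three sensors for two unknown coefficients
y = [true_state[i] for i in sensors]
estimate = l2_reconstruct(basis, sensors, y)
print(estimate)
```

The `det == 0` check mirrors the role of sensor placement: sensors at indices 0 and 2 alone would see identical values from both modes here, and the system would not be solvable.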

