Effects of Sinusoidal Model on Non-Parallel Voice Conversion with Adversarial Learning

2021 ◽  
Vol 11 (16) ◽  
pp. 7489
Author(s):  
Mohammed Salah Al-Radhi ◽  
Tamás Gábor Csapó ◽  
Géza Németh

Voice conversion (VC) transforms the speaking style of a source speaker into that of a target speaker while keeping the linguistic information unchanged. Traditional VC techniques rely on parallel recordings of multiple speakers uttering the same sentences: earlier approaches mainly learn a mapping between a given source–target speaker pair from similar utterances spoken by the two speakers. However, parallel data are expensive and difficult to collect, and non-parallel VC remains an interesting but challenging speech processing task. To address this limitation, we propose a method that allows non-parallel many-to-many voice conversion using a generative adversarial network. To the best of the authors' knowledge, this is the first study to employ a sinusoidal model with continuous parameters to generate the converted speech signals. Our method requires only a few minutes of training examples, without parallel utterances or time-alignment procedures, and the source–target speakers are entirely unseen in the training dataset. An empirical study was carried out on the publicly available CSTR VCTK corpus. Our results indicate that the proposed method reaches state-of-the-art speaker similarity to the utterances produced by the target speaker, while highlighting structural aspects that merit further expert analysis.
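The sinusoidal model the abstract refers to represents speech as a sum of sinusoids with per-component amplitudes, frequencies, and phases. A minimal frame-level synthesis sketch, with illustrative parameter names and sampling rate not taken from the paper:

```python
import numpy as np

def sinusoidal_synthesis(amps, freqs, phases, n_samples, fs=16000):
    """Synthesize one frame as a sum of sinusoids:
    s[n] = sum_k a_k * cos(2*pi*f_k*n/fs + phi_k),
    where amps, freqs (Hz) and phases (rad) describe each component."""
    t = np.arange(n_samples) / fs
    signal = np.zeros(n_samples)
    for a, f, phi in zip(amps, freqs, phases):
        signal += a * np.cos(2.0 * np.pi * f * t + phi)
    return signal
```

In a continuous-parameter vocoder of this kind, the conversion network predicts the per-frame sinusoidal parameters for the target speaker, and synthesis like the above turns them back into a waveform.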

Author(s):  
Xinyi Li ◽  
Liqiong Chang ◽  
Fangfang Song ◽  
Ju Wang ◽  
Xiaojiang Chen ◽  
...  

This paper focuses on a fundamental question in Wi-Fi-based gesture recognition: "Can we use the knowledge learned from some users to perform gesture recognition for others?" This problem is also known as cross-target recognition. It arises in many practical deployments of Wi-Fi-based gesture recognition, where it is prohibitively expensive to collect training data from every single user. We present CrossGR, a low-cost cross-target gesture recognition system. As a departure from existing approaches, CrossGR does not require prior knowledge (such as who is currently performing a gesture) of the target user. Instead, CrossGR employs a deep neural network to extract user-agnostic but gesture-related Wi-Fi signal characteristics to perform gesture recognition. To provide sufficient training data for an effective deep learning model, CrossGR employs a generative adversarial network to automatically generate a large volume of synthetic training data from a small set of real-world examples collected from a small number of users. This strategy allows CrossGR to minimize user involvement and the associated cost of collecting training examples for building an accurate gesture recognition system. We evaluate CrossGR by applying it to recognize 15 gestures across 10 users. Experimental results show that CrossGR achieves an accuracy of over 82.6% (up to 99.75%), and that it delivers recognition accuracy comparable to state-of-the-art systems while using an order of magnitude fewer training samples collected from end-users.


2020 ◽  
Vol 11 ◽  
Author(s):  
Luning Bi ◽  
Guiping Hu

Traditionally, plant disease recognition has mainly been performed visually by humans, which is often biased, time-consuming, and laborious. Machine learning methods based on plant leaf images have been proposed to improve the disease recognition process, and convolutional neural networks (CNNs) in particular have proven very effective. Despite the good classification accuracy achieved by CNNs, the issue of limited training data remains: in most cases the training dataset is small because data collection and annotation require significant effort, and CNN methods then tend to overfit. In this paper, a Wasserstein generative adversarial network with gradient penalty (WGAN-GP) is combined with label smoothing regularization (LSR) to improve prediction accuracy and address the overfitting problem under limited training data. Experiments show that the proposed WGAN-GP-enhanced classification method can improve the overall classification accuracy of plant diseases by 24.4%, compared to 20.2% using classic data augmentation and 22% using synthetic samples without LSR.
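The label smoothing regularization combined with WGAN-GP here softens one-hot targets so the classifier does not become over-confident on a small, partly synthetic training set. A minimal sketch; the smoothing factor 0.1 is a common default, not the setting reported in the paper:

```python
import numpy as np

def smooth_labels(one_hot, epsilon=0.1):
    """Label smoothing regularization (LSR): redistribute a fraction
    epsilon of each one-hot target uniformly over all K classes, so the
    true class receives 1 - epsilon + epsilon/K and every other class
    receives epsilon/K."""
    k = one_hot.shape[-1]
    return one_hot * (1.0 - epsilon) + epsilon / k
```

The smoothed targets still sum to 1 per sample and plug directly into a standard cross-entropy loss in place of the hard labels.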


Diagnostics ◽  
2021 ◽  
Vol 11 (9) ◽  
pp. 1542
Author(s):  
Johannes Haubold ◽  
Aydin Demircioglu ◽  
Jens Matthias Theysohn ◽  
Axel Wetter ◽  
Alexander Radbruch ◽  
...  

Short tau inversion recovery (STIR) sequences are frequently used in magnetic resonance imaging (MRI) of the spine. However, STIR sequences require a significant amount of scanning time. The purpose of the present study was to generate virtual STIR (vSTIR) images from non-contrast, non-fat-suppressed T1- and T2-weighted images using a conditional generative adversarial network (cGAN). The training dataset comprised 612 studies from 514 patients, and the validation dataset comprised 141 studies from 133 patients. For validation, 100 original STIR and corresponding vSTIR series were presented to six senior radiologists (blinded to the STIR type) in independent A/B-testing sessions. Additionally, for 141 real or vSTIR sequences, the testers were required to produce a structured report of 15 different findings. In the A/B-test, most testers could not reliably identify the real STIR (mean error of testers 1–6: 41%; 44%; 58%; 48%; 39%; 45%). In the evaluation of the structured reports, vSTIR was equivalent to real STIR in 13 of 15 categories; in the number of STIR-hyperintense vertebral bodies (p = 0.08) and in the diagnosis of bone metastases (p = 0.055), the vSTIR fell only narrowly short of equivalence. By virtually generating STIR images of diagnostic quality from T1- and T2-weighted images using a cGAN, one can shorten examination times and increase throughput.
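For image-to-image translation of this kind, a common cGAN generator objective (the pix2pix formulation; the paper does not state its exact loss or weights, so this is an assumption) combines an adversarial term with an L1 reconstruction term. A numpy sketch over precomputed discriminator scores:

```python
import numpy as np

def cgan_generator_loss(d_fake, generated, target, lam=100.0):
    """Pix2pix-style cGAN generator objective (assumed, not the paper's
    stated loss):
    - adversarial term -log D(fake), pushing the discriminator's score
      on generated images toward 1;
    - L1 term, the pixel-wise error against the real target image,
      weighted by lam (100 is the pix2pix default)."""
    adversarial = -np.mean(np.log(d_fake + 1e-8))
    l1 = np.mean(np.abs(generated - target))
    return adversarial + lam * l1
```

The heavy L1 weight is what keeps the synthesized STIR anatomically faithful to the T1/T2 inputs rather than merely plausible to the discriminator.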


Author(s):  
Arash Shilandari ◽  
Hossein Marvi ◽  
Hossein Khosravi

With the increasing mechanization of everyday life, speech processing has become crucial for interaction between humans and machines. Deep neural networks require a database with enough data for training: the more features are extracted from the speech signal, the more samples are needed, and adequate training can only be ensured when sufficient and varied data are available for each class. When data are scarce, data augmentation methods can be used to obtain a database with enough samples. One of the obstacles to developing speech emotion recognition systems is precisely this data sparsity problem in each class. The current study focuses on a cycle-consistent generative adversarial network for data augmentation in a speech emotion recognition system. For each of the five emotions employed, a generative adversarial network is designed to generate data that are very similar to the real data of that class while remaining distinguishable from the other emotion classes. These networks are trained adversarially to produce feature vectors resembling each class in the original feature space, which are then added to the training sets in the database to train the classifier network. Instead of the common cross-entropy loss for training the generative adversarial networks, Wasserstein divergence is used to avoid the vanishing gradient problem and to produce high-quality artificial samples. The proposed network was tested on speech emotion recognition using EMODB for the training, testing, and evaluation sets, and the quality of the artificial data was evaluated with Support Vector Machine (SVM) and Deep Neural Network (DNN) classifiers. The results show that, by extracting and reproducing high-level features from acoustic features, speech emotion recognition separating five primary emotions is achieved with acceptable accuracy.
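The cycle constraint underlying this kind of training can be written as an L1 reconstruction penalty: a feature vector mapped to another emotion class and back should return to its starting point. A minimal sketch, with illustrative generator names not taken from the paper:

```python
import numpy as np

def cycle_consistency_loss(x, g_ab, g_ba):
    """L1 cycle-consistency term of CycleGAN-style training:
    ||G_BA(G_AB(x)) - x||_1, where g_ab maps class-A feature vectors
    toward class B and g_ba maps them back toward class A."""
    reconstructed = g_ba(g_ab(x))
    return np.mean(np.abs(reconstructed - x))
```

This term is added to the adversarial (here, Wasserstein-divergence) objective so the generators preserve the content of the feature vector while changing only its emotion characteristics.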


2021 ◽  
Vol 59 (11) ◽  
pp. 838-847
Author(s):  
In-Kyu Hwang ◽  
Hyun-Ji Lee ◽  
Sang-Jun Jeong ◽  
In-Sung Cho ◽  
Hee-Soo Kim

In this study, we constructed a deep convolutional generative adversarial network (DCGAN) to generate microstructural images that imitate the real microstructures of binary Al-Si cast alloys. We prepared four alloy compositions, Al-6wt%Si, Al-9wt%Si, Al-12wt%Si and Al-15wt%Si, for machine learning. A DCGAN is composed of a generator and a discriminator: the discriminator is a typical convolutional neural network (CNN), and the generator an inverse-shaped CNN. The fake images generated with the DCGAN were similar to real microstructural images, although they showed some strange morphology, including dendrites without directionality and deformed Si crystals. Verification with Inception V3 revealed that the fake images were well classified into the target categories. Even the visually imperfect images from the initial training iterations showed high similarity to the target; it seems that these imperfect images carried enough microstructural characteristics to satisfy the classifier, even though humans cannot recognize them. Cross-validation was carried out using real, fake and other test images. When the training dataset contained fake images only, the real and test images showed high similarities to the target categories; when it contained both real and fake images, the similarities to the target categories were high enough to yield the correct answers. We conclude that the DCGAN developed for microstructural images in this study is highly useful for data augmentation of rare microstructures.
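The generator's "inverse-shaped CNN" is built from fractionally-strided (transposed) convolutions, whose upsampling can be pictured as inserting zeros between feature-map entries before applying an ordinary convolution. A numpy sketch of that upsampling step alone, as an illustration rather than the authors' implementation:

```python
import numpy as np

def zero_insertion_upsample(feature_map, stride=2):
    """Core upsampling step of a transposed convolution, as used in a
    DCGAN generator: insert stride-1 zeros between neighbouring entries,
    growing an (h, w) feature map to (h*stride, w*stride). A learned
    kernel would then be convolved over the result to fill in detail."""
    h, w = feature_map.shape
    out = np.zeros((h * stride, w * stride), dtype=feature_map.dtype)
    out[::stride, ::stride] = feature_map
    return out
```

Stacking several such stride-2 stages is what lets the generator grow a small latent tensor into a full-resolution microstructural image.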


2020 ◽  
Vol 22 (1) ◽  
Author(s):  
Saeed Karimi-Bidhendi ◽  
Arghavan Arafati ◽  
Andrew L. Cheng ◽  
Yilei Wu ◽  
Arash Kheradvar ◽  
...  

Background For the growing patient population with congenital heart disease (CHD), improving clinical workflow, accuracy of diagnosis, and efficiency of analyses are considered unmet clinical needs. Cardiovascular magnetic resonance (CMR) imaging offers non-invasive and non-ionizing assessment of CHD patients. However, although CMR data facilitate reliable analysis of cardiac function and anatomy, the clinical workflow mostly relies on manual analysis of CMR images, which is time consuming. Thus, an automated and accurate segmentation platform exclusively dedicated to pediatric CMR images can significantly improve the clinical workflow, as the present work aims to establish. Methods Training artificial intelligence (AI) algorithms for CMR analysis requires large annotated datasets, which are not readily available for pediatric subjects, particularly CHD patients. To mitigate this issue, we devised a novel method that uses a generative adversarial network (GAN) to synthetically augment the training dataset by generating synthetic CMR images and their corresponding chamber segmentations. In addition, we trained and validated a deep fully convolutional network (FCN) on a dataset consisting of 64 pediatric subjects with complex CHD, which we made publicly available. The Dice metric, Jaccard index and Hausdorff distance, as well as clinically relevant volumetric indices, are reported to assess and compare our platform with other algorithms, including U-Net and the clinically used cvi42. Results On the congenital CMR dataset, our FCN model yields average Dice metrics of 91.0% and 86.8% for the LV at end-diastole and end-systole, respectively, and 84.7% and 80.6% for the RV at end-diastole and end-systole, respectively. Using the same dataset, cvi42 resulted in 73.2%, 71.0%, 54.3% and 53.7%, and the U-Net architecture in 87.4%, 83.9%, 81.8% and 74.8%, for the LV and RV at end-diastole and end-systole, respectively. Conclusions The chamber segmentation results from our fully automated method showed strong agreement with manual segmentation, and no significant statistical difference was found by two independent statistical analyses, whereas the cvi42 and U-Net segmentation results failed to pass the t-test. From these outcomes, it can be inferred that by taking advantage of GANs, our method is clinically relevant and can be used for pediatric and congenital CMR segmentation and analysis.
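The Dice metric reported above measures the overlap between a predicted and a manual segmentation mask. A minimal sketch for binary masks:

```python
import numpy as np

def dice_coefficient(pred, truth):
    """Dice similarity coefficient 2|A∩B| / (|A| + |B|) between two
    binary masks: 1.0 means perfect overlap, 0.0 means none. Two empty
    masks are treated here as a perfect match."""
    pred = pred.astype(bool)
    truth = truth.astype(bool)
    total = pred.sum() + truth.sum()
    if total == 0:
        return 1.0
    return 2.0 * np.logical_and(pred, truth).sum() / total
```

The related Jaccard index |A∩B| / |A∪B| can be recovered from Dice as J = D / (2 - D), which is why the two are usually reported together.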

