Noise-Robust Voice Conversion Using High-Quefrency Boosting via Sub-Band Cepstrum Conversion and Fusion

2019 ◽  
Vol 10 (1) ◽  
pp. 151
Author(s):  
Xiaokong Miao ◽  
Meng Sun ◽  
Xiongwei Zhang ◽  
Yimin Wang

This paper presents a noise-robust voice conversion method with high-quefrency boosting via sub-band cepstrum conversion and fusion, based on bidirectional long short-term memory (BLSTM) neural networks that convert the vocal tract parameters of a source speaker into those of a target speaker. With the implementation of state-of-the-art machine learning methods, voice conversion has achieved good performance given abundant clean training data. However, the quality and similarity of the converted voice are significantly degraded compared to those of a natural target voice due to various factors, such as limited training data and noisy input speech from the source speaker. To address the problem of noisy input speech, an architecture of voice conversion with statistical filtering and sub-band cepstrum conversion and fusion is introduced. The impact of noise on the converted voice is reduced by the accurate reconstruction of the sub-band cepstrum and the subsequent statistical filtering. By normalizing the mean and variance of the converted cepstrum to those of the target cepstrum in the training phase, a cepstrum filter is constructed to further improve the quality of the converted voice. The experimental results showed that the proposed method significantly improved the naturalness and similarity of the converted voice compared to the baselines, even with noisy inputs from source speakers.
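The mean and variance normalization mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formula, so this assumes a standard per-dimension mean-variance normalization; the function names `fit_stats` and `make_cepstrum_filter` and the `eps` parameter are hypothetical and not taken from the paper.

```python
from statistics import fmean, pstdev


def fit_stats(frames):
    """Return per-dimension mean and population std over a list of cepstral frames."""
    dims = list(zip(*frames))  # transpose: one tuple per cepstral dimension
    means = [fmean(d) for d in dims]
    stds = [pstdev(d) for d in dims]
    return means, stds


def make_cepstrum_filter(converted_train, target_train, eps=1e-8):
    """Build a filter from training-phase statistics (assumed form, not the paper's).

    The filter shifts and scales each dimension of a converted cepstrum so that
    its mean and variance match those of the target cepstrum.
    """
    conv_mean, conv_std = fit_stats(converted_train)
    tgt_mean, tgt_std = fit_stats(target_train)

    def apply(frames):
        return [
            [
                (x - cm) / (cs + eps) * ts + tm  # standardize, then rescale to target
                for x, cm, cs, tm, ts in zip(frame, conv_mean, conv_std, tgt_mean, tgt_std)
            ]
            for frame in frames
        ]

    return apply
```

Applying the returned filter to the converted training frames themselves yields frames whose per-dimension statistics match the target statistics, up to the small `eps` guard against division by zero.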

Sensors ◽  
2022 ◽  
Vol 22 (2) ◽  
pp. 545
Author(s):  
Bor-Jiunn Hwang ◽  
Hui-Hui Chen ◽  
Chaur-Heh Hsieh ◽  
Deng-Yu Huang

Based on experimental observations, there is a correlation between time and consecutive gaze positions in visual behaviors. Previous studies on gaze point estimation usually use images as the input for model training without taking into account the sequential relationship between image data. In this paper, temporal features are considered in addition to spatial features to improve accuracy, by using videos instead of images as the input data. To capture spatial and temporal features at the same time, a convolutional neural network (CNN) and a long short-term memory (LSTM) network are combined to build a training model: the CNN extracts the spatial features, and the LSTM correlates the temporal features. This paper presents a CNN Concatenating LSTM network (CCLN) that concatenates spatial and temporal features to improve the performance of gaze estimation when time-series videos are the input training data. In addition, the proposed model can be optimized by exploring the number of LSTM layers and the influence of batch normalization (BN) and global average pooling (GAP) layers on CCLN. It is generally believed that larger amounts of training data lead to better models. To provide data for training and prediction, we propose a method for constructing video datasets for gaze point estimation. Further issues are studied, including the effectiveness of different commonly used general models and the impact of transfer learning. Exhaustive evaluation shows that the proposed method achieves better prediction accuracy than existing CNN-based methods. Finally, 93.1% is obtained for the best model and 92.6% for the general model MobileNet.


Author(s):  
J. Becker ◽  
P. Böhme ◽  
A. Reckert ◽  
S. B. Eickhoff ◽  
B. E. Koop ◽  
...  

As a contribution to the discussion about the possible effects of ethnicity/ancestry on age estimation based on DNA methylation (DNAm) patterns, we directly compared age-associated DNAm in German and Japanese donors in one laboratory under identical conditions. DNAm was analyzed by pyrosequencing for 22 CpG sites (CpGs) in the genes PDE4C, RPA2, ELOVL2, DDO, and EDARADD in buccal mucosa samples from German and Japanese donors (N = 368 and N = 89, respectively). Twenty of these CpGs revealed a very high correlation with age and were subsequently tested for differences between German and Japanese donors aged between 10 and 65 years (N = 287 and N = 83, respectively). ANCOVA was performed by testing the Japanese samples against age- and sex-matched German subsamples (N = 83 each; extracted 500 times from the German total sample). The median p values suggest strong evidence for significant differences (p < 0.05) for at least two CpGs (EDARADD, CpG 2, and PDE4C, CpG 2) and no differences for 11 CpGs (p > 0.3). Age prediction models based on DNAm data from all 20 CpGs from German training data did not reveal relevant differences between the Japanese test samples and German subsamples. Evidently, the high number of included “robust CpGs” prevented relevant effects of differences in DNAm at two CpGs. Nevertheless, the presented data demonstrate the need for further research regarding the impact of confounding factors on DNAm in the context of ethnicity/ancestry to ensure a high quality of age estimation. One approach may be the search for “robust” CpG markers, which requires the targeted investigation of different populations, at best by collaborative research with coordinated research strategies.


2020 ◽  
Vol 25 (2) ◽  
pp. 145-152
Author(s):  
Yan Kuchin ◽  
Ravil Mukhamediev ◽  
Kirill Yakunin ◽  
Janis Grundspenkis ◽  
Adilkhan Symagulov

Machine learning (ML) methods are nowadays widely used to automate geophysical studies. Some ML algorithms are used to solve lithological classification problems during the uranium mining process. One of the key aspects of using classical ML methods is selecting data features and estimating their influence on the classification. This paper presents a quantitative assessment of the impact of expert opinions on the classification process. In other words, we have prepared the data, identified the experts, and performed a series of experiments with and without supplying the expert identifier to the input of the automatic classifier during training and testing. A feedforward artificial neural network (ANN) has been used as the classifier. The results of the experiments show that the ANN's “knowledge” of which expert interpreted the data improves the quality of the automatic classification in terms of accuracy (by 5 %) and recall (by 20 %). However, because the input parameters of the model may depend on each other, the SHapley Additive exPlanations (SHAP) method has been used to further assess the impact of the expert identifier. SHAP has allowed assessing the degree of parameter influence. It has revealed that the expert ID is at least two times more influential than any of the other input parameters of the neural network. This circumstance imposes significant restrictions on the application of ANNs to the task of lithological classification at uranium deposits.
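To make the idea of attributing influence to individual inputs concrete, the following is a minimal sketch of exact Shapley values computed by enumerating feature subsets against a single baseline input. It is not the SHAP library and not the authors' setup: the function `shapley_values`, the toy model, and the choice of a single baseline are illustrative assumptions, and exact enumeration is only practical for a handful of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of each feature of x relative to a baseline input.

    Features outside a coalition are replaced by their baseline value.
    Cost grows as 2**n, so this is only suitable for small n.
    """
    n = len(x)

    def value(subset):
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for s in combinations(others, k):
                coalition = set(s)
                total += weight * (value(coalition | {i}) - value(coalition))
        phi.append(total)
    return phi


# Hypothetical toy model: feature 0 stands in for an "expert ID" input and
# interacts with feature 1, so the inputs are not independent in their effect.
def toy_model(z):
    return 3.0 * z[0] + 1.0 * z[1] + 0.5 * z[2] + z[0] * z[1]


contributions = shapley_values(toy_model, [1.0, 1.0, 1.0], [0.0, 0.0, 0.0])
```

The contributions sum to the difference between the model output at `x` and at the baseline, which is the property that makes such values comparable across input parameters.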


Author(s):  
Yujin Yuan ◽  
Liyuan Liu ◽  
Siliang Tang ◽  
Zhongfei Zhang ◽  
Yueting Zhuang ◽  
...  

Distant supervision leverages knowledge bases to automatically label instances, thus allowing us to train relation extractors without human annotations. However, the generated training data typically contain massive noise, which may result in poor performance with vanilla supervised learning. In this paper, we propose to conduct multi-instance learning with a novel Cross-relation Cross-bag Selective Attention (C2SA), which leads to noise-robust training for distantly supervised relation extractors. Specifically, we employ sentence-level selective attention to reduce the effect of noisy or mismatched sentences, while the correlation among relations is captured to improve the quality of the attention weights. Moreover, instead of treating all entity-pairs equally, we try to pay more attention to entity-pairs of higher quality. Similarly, we adopt the selective attention mechanism to achieve this goal. Experiments with two types of relation extractor demonstrate the superiority of the proposed approach over the state of the art, while further ablation studies verify our intuitions and demonstrate the effectiveness of our two proposed techniques.
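Sentence-level selective attention over a bag of sentences can be sketched as a softmax-weighted sum. This is a generic illustration, not the C2SA model: the dot-product scoring against a relation query vector, and the names `selective_attention` and `relation_query`, are assumptions, since the abstract does not specify the scoring function.

```python
from math import exp


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def selective_attention(sentence_vecs, relation_query):
    """Weight each sentence in a bag by its match to a relation query.

    Returns the attention weights and the weighted bag representation.
    Sentences that match the query poorly (e.g. noisy labels) get low weight.
    """
    scores = [sum(a * b for a, b in zip(vec, relation_query)) for vec in sentence_vecs]
    weights = softmax(scores)
    dim = len(relation_query)
    bag = [sum(w * vec[d] for w, vec in zip(weights, sentence_vecs)) for d in range(dim)]
    return weights, bag


# Hypothetical bag of three sentence vectors for one entity pair.
weights, bag = selective_attention(
    [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]],
    relation_query=[2.0, 0.0],
)
```

Here the first sentence aligns best with the query and receives the largest weight, while the second contributes least to the bag representation.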


Biomimetics ◽  
2021 ◽  
Vol 6 (1) ◽  
pp. 12
Author(s):  
Marvin Coto-Jiménez

Statistical parametric speech synthesis based on Hidden Markov Models (HMM) has been an important technique for the production of artificial voices, due to its ability to produce results with high intelligibility and sophisticated features such as voice conversion and accent modification with a small footprint, particularly for low-resource languages where deep learning-based techniques remain unexplored. Despite the progress, the quality of HMM-based results does not reach that of the predominant approaches, based on unit selection of speech segments or deep learning. One of the proposals to improve the quality of HMM-based speech has been incorporating postfiltering stages, which aim to increase the quality while preserving the advantages of the process. In this paper, we present a new approach to postfiltering synthesized voices with the application of discriminative postfilters, with several long short-term memory (LSTM) deep neural networks. Our motivation stems from modeling specific mappings from synthesized to natural speech on those segments corresponding to voiced or unvoiced sounds, due to the different qualities of those sounds and how HMM-based voices can present distinct degradation on each one. The paper analyses the discriminative postfilters obtained using five voices, evaluated using three objective measures, Mel cepstral distance and subjective tests. The results indicate the advantages of the discriminative postfilters in comparison with the HTS voice and the non-discriminative postfilters.


2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Kejia Zhang ◽  
Xu Zhang ◽  
Hongtao Song ◽  
Haiwei Pan ◽  
Bangju Wang

With the continuous improvement of people’s quality of life, air quality has become a topic of daily concern. Achieving accurate predictions of air quality in a variety of complex situations is the key to the rapid response of local governments. This paper studies two problems: (1) how to predict the air quality of any monitoring station based on the existing weather and environmental data while considering the spatiotemporal correlation among monitoring stations and (2) how to maintain the accuracy and stability of the forecast even when the available data are severely insufficient. A prediction model combining Long Short-Term Memory (LSTM) networks and the Graph Attention (GAT) mechanism is proposed to solve the first problem. A metalearning algorithm for the prediction model is proposed to solve the second problem. LSTM is used to characterize the temporal correlation of historical data and GAT is used to characterize the spatial correlation among all the monitoring stations in the target city. In the case of insufficient training data, the proposed metalearning algorithm can be used to transfer knowledge from other cities with abundant training data. In tests on public data sets, the proposed model shows clear advantages in accuracy over baseline models. Combined with the metalearning algorithm, it gives much better performance in the case of insufficient training data.


2008 ◽  
Vol 16 (6) ◽  
pp. 433-437
Author(s):  
Bethany Smith ◽  
Anna Chur-Hansen ◽  
Alice Neale ◽  
Jonathon Symon

Objectives: Cholinesterase inhibitors’ (ChEIs) impact on cognitive functioning in Alzheimer's disease has been extensively researched. The effect of ChEIs on improving day-to-day living and quality of life in conjunction with level of functioning for patients or their carers has not been investigated. Method: Five spouse dyads (patient and carer) and one additional carer were interviewed about their perceptions of ChEIs in relation to their influence on daily life for both parties. Interviews were transcribed and thematic analysis conducted. Results: Themes identified were forgetfulness, differences in long-term versus short-term memory, independence/dependence, negative emotion, no appreciable benefit, sense of hopelessness, carer as motivator, stabilization of the patient, and never regain what has been lost. Conclusions: This study suggests that ChEI medication does not enhance life for the patient or their primary caregiver. Further qualitative and quantitative research is required into the impact of ChEIs upon both the patient and their caregivers.


2020 ◽  
Vol 5 (3) ◽  
pp. 229-233
Author(s):  
Olaide Ayodeji Agbolade

This research presents a neural network based voice conversion model. While it is known that voiced sounds and prosody are the most important components of the voice conversion framework, what is not known is their objective contributions, particularly in a noisy and uncontrolled environment. This model uses a three-layer feedforward neural network to map the linear prediction analysis coefficients of a source speaker to the acoustic vector space of the target speaker, with a view to objectively determining the contributions of the voiced, unvoiced and supra-segmental components of sounds to the voice conversion model. Results showed that the vowels “a”, “i”, “o” have the most significant contribution to the conversion success. The voiceless sounds were also found to be most affected by the noisy training data. An average noise level of 40 dB above the noise floor was found to degrade the voice conversion success by 55.14 percent relative to the voiced sounds. The results also show that for cross-gender voice conversion, prosody conversion is more significant in scenarios where a female is the target speaker.


Symmetry ◽  
2022 ◽  
Vol 14 (1) ◽  
pp. 66
Author(s):  
Chin-Shiuh Shieh ◽  
Thanh-Tuan Nguyen ◽  
Wan-Wei Lin ◽  
Yong-Lin Huang ◽  
Mong-Fong Horng ◽  
...  

DDoS (Distributed Denial of Service) has emerged as a serious and challenging threat to the security and integrity of computer networks and information systems. Before any remedial measures can be implemented, DDoS attacks must first be detected. DDoS attacks can be identified and characterized with satisfactory results using ML (Machine Learning) and DL (Deep Learning). However, new varieties of attack arise as the technology for DDoS attacks keeps evolving. This research explores the impact of a new incarnation of DDoS attack: the adversarial DDoS attack. There are established works on ML-based DDoS detection and GAN (Generative Adversarial Network) based adversarial DDoS synthesis. We confirm these findings in our experiments. Experiments in this study involve the extension and application of the GAN, a machine learning framework with a symmetric form consisting of two contending neural networks. We synthesize adversarial DDoS attacks utilizing Wasserstein Generative Adversarial Networks featuring Gradient Penalty (GP-WGAN). Experimental results indicate that the synthesized traffic can traverse detection systems such as k-Nearest Neighbor (KNN), Multi-Layer Perceptron (MLP) and Random Forest (RF) without being identified. This observation is a sobering wake-up call, implying that countermeasures to adversarial DDoS attacks are urgently needed. To address this problem, we propose a novel DDoS detection framework featuring GAN with Dual Discriminators (GANDD). The additional discriminator is designed to identify adversarial DDoS traffic. The proposed GANDD can be an effective solution to adversarial DDoS attacks, as evidenced by the experimental results. We use adversarial DDoS traffic synthesized by GP-WGAN to train GANDD and validate it alongside three other DL technologies: DNN (Deep Neural Network), LSTM (Long Short-Term Memory) and GAN. GANDD outperformed the other DL models, demonstrating its protection with a TPR of 84.3%. A more sophisticated test was also conducted to examine GANDD’s ability to handle unseen adversarial attacks. GANDD was evaluated with adversarial traffic not generated from its training data. GANDD still proved effective, with a TPR of around 71.3% compared to 7.4% for LSTM.
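For reference, the true positive rate (TPR) reported above is the fraction of actual attack samples that a detector flags as attacks. A minimal sketch of the computation follows; the label encoding (1 = attack, 0 = benign) and the example labels are hypothetical and do not come from the paper's data.

```python
def true_positive_rate(y_true, y_pred):
    """TPR = TP / (TP + FN), with 1 marking attack traffic and 0 benign traffic."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp + fn == 0:
        raise ValueError("TPR is undefined when there are no positive samples")
    return tp / (tp + fn)


# Hypothetical labels: four attack flows, of which the detector catches three.
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1]
tpr = true_positive_rate(y_true, y_pred)
```

Note that false alarms on benign traffic (the last sample here) do not affect the TPR; they are captured by other metrics such as the false positive rate.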


2018 ◽  
Author(s):  
Faiz Ali Shah ◽  
Kairit Sirts ◽  
Dietmar Pfahl

The quality of automatic app feature extraction from app reviews depends on various aspects, e.g. the feature extraction method, training and evaluation datasets, evaluation method, etc. The annotation guidelines used to guide the annotation of training and evaluation datasets can have a considerable impact on the quality of the whole system, but this aspect has commonly been overlooked. In this study, we explore the effects of annotation guidelines on the quality of app feature extraction. As a main result, we propose several changes to the existing annotation guidelines with the goal of making the extracted app features more useful and informative to app developers. We test the proposed changes by simulating the application of the new annotation guidelines and then evaluating the performance of supervised machine learning models trained on datasets annotated with the initial and the simulated annotation guidelines. While the overall performance of automatic app feature extraction remains the same as that of the model trained on the dataset with initial annotations, the features extracted by the model trained on the dataset with simulated new annotations are less noisy and more informative to app developers. Secondly, we are interested in what kind of annotated training data is necessary for training an automatic app feature extraction model. In particular, we explore whether the training set should contain annotated app reviews from those apps/app categories to which the model is subsequently planned to be applied, or whether it is sufficient to have annotated app reviews from any app available for training, even when these apps are from very different categories compared to the test app. Our experiments show that having annotated training reviews from the test app is not necessary, although including them in the training set helps to improve recall. Furthermore, we test whether augmenting the training set with annotated product reviews helps to improve the performance of app feature extraction. We find that models trained on the augmented training set achieve improved recall, but at the cost of a drop in precision.

