Neural Speech Synthesis with Transformer Network

Author(s):  
Naihan Li ◽  
Shujie Liu ◽  
Yanqing Liu ◽  
Sheng Zhao ◽  
Ming Liu

Although end-to-end neural text-to-speech (TTS) methods (such as Tacotron2) have been proposed and achieve state-of-the-art performance, they still suffer from two problems: 1) low efficiency during training and inference; 2) difficulty modeling long-range dependencies with current recurrent neural networks (RNNs). Inspired by the success of the Transformer network in neural machine translation (NMT), in this paper we introduce and adapt the multi-head attention mechanism to replace the RNN structures and also the original attention mechanism in Tacotron2. With the help of multi-head self-attention, the hidden states in the encoder and decoder are constructed in parallel, which improves training efficiency. Meanwhile, any two inputs at different times are connected directly by a self-attention mechanism, which solves the long-range dependency problem effectively. Using phoneme sequences as input, our Transformer TTS network generates mel spectrograms, followed by a WaveNet vocoder to output the final audio results. Experiments are conducted to test the efficiency and performance of our new network. For efficiency, our Transformer TTS network speeds up training by about 4.25 times compared with Tacotron2. For performance, rigorous human tests show that our proposed model achieves state-of-the-art performance (outperforming Tacotron2 by a gap of 0.048) and is very close to human quality (4.39 vs 4.44 in MOS).
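
The claim that self-attention connects any two time steps directly can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a single head, no learned query/key/value projections, and a toy input, and the names `softmax`, `self_attention`, and `states` are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(states):
    """Scaled dot-product self-attention with identity projections.

    Assumption: queries, keys, and values are the input states themselves;
    a real Transformer applies learned projections and several heads.
    """
    d = len(states[0])
    outputs, all_weights = [], []
    for q in states:
        # Each position scores every position, so no recurrence is needed
        # and the rows can be computed independently (in parallel).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in states]
        w = softmax(scores)
        all_weights.append(w)
        outputs.append([sum(wj * v[i] for wj, v in zip(w, states)) for i in range(d)])
    return outputs, all_weights


# Toy sequence of three 2-dimensional hidden states (hypothetical values).
states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(states)
```

Every row of `weights` is a distribution over all positions, so the first and last steps interact in one operation regardless of how far apart they are.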

Author(s):  
Xiangpeng Wei ◽  
Yue Hu ◽  
Luxi Xing ◽  
Yipeng Wang ◽  
Li Gao

The dominant neural machine translation (NMT) models, which are based on the encoder-decoder architecture, have recently achieved state-of-the-art performance. Traditionally, NMT models depend only on the representations learned during training for mapping a source sentence into the target domain. However, the learned representations often suffer from implicit and inadequately informed properties. In this paper, we propose a novel bilingual topic enhanced NMT (BLT-NMT) model to improve translation performance by incorporating bilingual topic knowledge into NMT. Specifically, the bilingual topic knowledge is incorporated into the hidden states of both the encoder and decoder, as well as the attention mechanism. With this new setting, the proposed BLT-NMT has access to the background knowledge implied in bilingual topics, which is beyond the sequential context, and enables the attention mechanism to attend to topic-level attentions for generating accurate target words during translation. Experimental results show that the proposed model consistently outperforms the traditional RNNsearch and the previous topic-informed NMT on Chinese-English and English-German translation tasks. We also introduce the bilingual topic knowledge into the newly emerged Transformer base model on English-German translation and achieve a notable improvement.


2021 ◽  
Vol 11 (21) ◽  
pp. 10475
Author(s):  
Xiao Zhou ◽  
Zhenhua Ling ◽  
Yajun Hu ◽  
Lirong Dai

An encoder–decoder with attention has become a popular method to achieve sequence-to-sequence (Seq2Seq) acoustic modeling for speech synthesis. To improve the robustness of the attention mechanism, methods utilizing the monotonic alignment between phone sequences and acoustic feature sequences have been proposed, such as stepwise monotonic attention (SMA). However, the phone sequences derived by grapheme-to-phoneme (G2P) conversion may not contain the pauses at the phrase boundaries in utterances, which challenges the assumption of strictly stepwise alignment in SMA. Therefore, this paper proposes to insert hidden states into phone sequences to handle cases where pauses are not provided explicitly, and designs a semi-stepwise monotonic attention (SSMA) to model these inserted hidden states. In this method, hidden states are introduced that absorb the pause segments in utterances in an unsupervised way. Thus, the attention at each decoding frame has three options: moving forward to the next phone, staying at the same phone, or jumping to a hidden state. Experimental results show that SSMA can achieve better naturalness of synthetic speech than SMA when phrase boundaries are not available. Moreover, the pause positions derived from the alignment paths of SSMA matched the manually labeled phrase boundaries quite well.


2010 ◽  
Vol 15 (2) ◽  
pp. 121-131 ◽  
Author(s):  
Remus Ilies ◽  
Timothy A. Judge ◽  
David T. Wagner

This paper focuses on explaining how individuals set goals on multiple performance episodes, in the context of performance feedback comparing their performance on each episode with their respective goal. The proposed model was tested through a longitudinal study of 493 university students’ actual goals and performance on business school exams. Results of a structural equation model supported the proposed conceptual model in which self-efficacy and emotional reactions to feedback mediate the relationship between feedback and subsequent goals. In addition, as expected, participants’ standing on a dispositional measure of behavioral inhibition influenced the strength of their emotional reactions to negative feedback.


2001 ◽  
Vol 29 (2) ◽  
pp. 108-132 ◽  
Author(s):  
A. Ghazi Zadeh ◽  
A. Fahim

The dynamics of a vehicle's tires is a major contributor to the vehicle's stability, control, and performance. A better understanding of the handling performance and lateral stability of the vehicle can be achieved by an in-depth study of the transient behavior of the tire. In this article, the transient response of the tire to a steering angle input is examined and an analytical second-order tire model is proposed. This model provides a means for a better understanding of the transient behavior of the tire. The proposed model is also applied to a vehicle model and its performance is compared with a first-order tire model.


Sensors ◽  
2021 ◽  
Vol 21 (8) ◽  
pp. 2648
Author(s):  
Muhammad Aamir ◽  
Tariq Ali ◽  
Muhammad Irfan ◽  
Ahmad Shaf ◽  
Muhammad Zeeshan Azam ◽  
...  

Natural disasters not only disturb the human ecological system but also destroy the property and critical infrastructure of human societies and can even lead to permanent change in the ecosystem. Disasters can be caused by naturally occurring events such as earthquakes, cyclones, floods, and wildfires. Many deep learning techniques have been applied by various researchers to detect and classify natural disasters to overcome losses in ecosystems, but detection of natural disasters still faces issues due to the complex and imbalanced structures of images. To tackle this problem, we propose a multilayered deep convolutional neural network. The proposed model works in two blocks: Block-I convolutional neural network (B-I CNN), for detection and occurrence of disasters, and Block-II convolutional neural network (B-II CNN), for classification of natural disaster intensity types with different filters and parameters. The model is tested on 4428 natural images and performance is calculated and expressed as different statistical values: sensitivity (SE), 97.54%; specificity (SP), 98.22%; accuracy rate (AR), 99.92%; precision (PRE), 97.79%; and F1-score (F1), 97.97%. The overall accuracy for the whole model is 99.92%, which is competitive and comparable with state-of-the-art algorithms.
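
For readers unfamiliar with the reported statistics, the sketch below shows how sensitivity, specificity, accuracy, precision, and F1-score are conventionally derived from a binary confusion matrix. The function name `binary_metrics` and the counts are hypothetical and are not taken from the paper, whose multi-class evaluation protocol is not described in the abstract.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)            # true positive rate (recall)
    specificity = tn / (tn + fp)            # true negative rate
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical counts, chosen only to illustrate the formulas.
metrics = binary_metrics(tp=90, fp=5, tn=95, fn=10)
```

With these illustrative counts, sensitivity is 0.90, specificity is 0.95, and accuracy is 0.925.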


Electronics ◽  
2021 ◽  
Vol 10 (13) ◽  
pp. 1589
Author(s):  
Yongkeun Hwang ◽  
Yanghoon Kim ◽  
Kyomin Jung

Neural machine translation (NMT) is one of the text generation tasks that has achieved significant improvement with the rise of deep neural networks. However, language-specific problems such as handling the translation of honorifics have received little attention. In this paper, we propose a context-aware NMT to improve the translation of Korean honorifics. By exploiting information such as the relationship between speakers from the surrounding sentences, our proposed model effectively manages the use of honorific expressions. Specifically, we utilize a novel encoder architecture that can represent the contextual information of the given input sentences. Furthermore, a context-aware post-editing (CAPE) technique is adopted to refine a set of inconsistent sentence-level honorific translations. To demonstrate the efficacy of the proposed method, honorific-labeled test data is required. Thus, we also design a heuristic that labels Korean sentences to distinguish between honorific and non-honorific styles. Experimental results show that our proposed method outperforms sentence-level NMT baselines both in overall translation quality and honorific translations.


2019 ◽  
Vol 34 (6) ◽  
pp. 429-442 ◽  
Author(s):  
Manuel London

Purpose Drawing on existing theory, a model is developed to illustrate how the interaction between leader–follower similarity in narcissism and goal congruence may influence subgroup formation in teams, and how this interaction influences team identification and team performance. Design/methodology/approach The proposed model draws on dominance complementarity, similarity attraction, faultline formation and trait activation theories. Findings Leader–follower similarity in narcissism and goal congruence may stimulate subgroup formation, possibly resulting in conformers, conspirators, outsiders and victims, especially when performance pressure on a team is high. Followers who are low in narcissism and share goals with a leader who is narcissistic are likely to become conformers. Followers who are high in narcissism and share goals with a narcissistic leader are likely to become confederates. Followers who do not share goals with a narcissistic leader will be treated by the leader and other members as outsiders if they are high in narcissism, and victimized if they are low in narcissism. In addition, the emergence of these subgroups leads to reduced team identification and lower team performance. Practical implications Higher level managers, coaches and human resource professionals can assess and, if necessary, counteract low team identification and performance resulting from the narcissistic personality characteristics of leaders and followers. Originality/value The model addresses how and under what conditions narcissistic leaders and followers may influence subgroup formation and team outcomes.


2021 ◽  
pp. 146808742110692
Author(s):  
Zhenyu Shen ◽  
Yanjun Li ◽  
Nan Xu ◽  
Baozhi Sun ◽  
Yunpeng Fu ◽  
...  

Recently, stringent international regulations on ship energy efficiency and NOx emissions from ocean-going ships have made energy conservation and emission reduction the central theme of the shipping industry. Most large commercial vessels are propelled by a low-speed two-stroke marine diesel engine, chosen for its fuel economy and reliability, which consumes most of the fuel on the ship. In the present work, a zero-dimensional model is developed, which considers the blow-by, exhaust gas bypass, gas exchange, turbocharger, and heat transfer. In addition, the model is improved by considering the heating effect of the blow-by gas on the intake gas. The proposed model is applied to a MAN B&W low-speed two-stroke marine diesel engine and validated with the engine shop test data. The simulation results are in good agreement with the experimental results. The accuracy of the model is greatly improved after considering the heating effect of blow-by gas: the error of most parameters is reduced from within 5% to within 2%. Finally, the influence of blow-by area change on engine performance is analyzed with and without considering the heating effect of blow-by.


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Rita Shakouri ◽  
Maziar Salahi

Purpose This paper aims to apply a new approach for resource sharing and efficiency estimation of subunits in the presence of non-discretionary factors and partial impacts among inputs and outputs in the data envelopment analysis (DEA) framework. Design/methodology/approach First, inspired by the model of Imanirad et al. (2013), the authors consider that each decision-making unit (DMU) may consist of several subunits, each of which can be affected by non-discretionary inputs. After that, the model of Banker and Morey (1996) is used for modeling non-discretionary factors. For measuring the performance of several subunits, which can be considered as DMUs, the aggregate efficiency is suggested. Finally, the overall efficiencies are computed and compared with each other. Findings One of the important features of the proposed model is that each output in this model applies discretionary input according to its need; therefore, the result of this study will make it easier for managers to make better decisions. Also, it indicates that significant predictions of the development of the overall efficiency of DMUs can be based on observing the development level of subunits because of the influence of non-discretionary input. Therefore, the proposed model provides a more reasonable and encompassing measure of performance by incorporating both non-discretionary and discretionary inputs. An application of the proposed model for evaluating the efficiency of 17 road patrols is provided. Research limitations/implications More non-discretionary and discretionary inputs can be taken into consideration for a better analysis. This study provides a framework for performance measures along with useful managerial insights. Focusing on the right scope of operations may help management improve overall efficiency and performance. In recent highway maintenance management systems, environmental differences exist among patrols and other geotechnical services under diverse climates. Further, in some cases, there might exist more than one non-discretionary factor that can have different effects on the subunits’ performance. Practical implications The purpose of this paper was to measure the performance of a set of roadway maintenance crews and to analyze the impact of non-discretionary inputs on the efficiency of roadway maintenance. The application of the proposed model, on the one hand, showed that each output in this model uses discretionary input according to its requirement, and on the other hand, the result showed that meaningful predictions of the development of the overall efficiency of DMUs can be based on observing the development level of subunits because of the impact of non-discretionary input. Originality/value Providing information on resource sharing by taking into account non-discretionary factors for each subunit can help managers make better decisions to increase efficiency.


2017 ◽  
Vol 46 (5) ◽  
pp. 699-715 ◽  
Author(s):  
Peter D. MacIntyre ◽  
Ben Schnare ◽  
Jessica Ross

Learning the skills to be a musician requires an enormous amount of effort and dedication, a long-term process that requires sustained motivation. Motivation for music is complex, blending relatively intrinsic and extrinsic motives. The purpose of this study is to investigate the motivation of musicians by considering how different aspects of motivational features interact. An international sample of 188 musicians was obtained through the use of an online survey. Four scales drawn from Self-Determination Theory (intrinsic, identified, introjected, and extrinsic regulation) were utilized along with other motivational constructs, including motivational intensity, desire to learn, willingness to play, perceived competence, and musical self-esteem. To integrate the variables into a proposed model, a path analysis was conducted among the motivation variables. Results showed that intrinsic motives play the major role in the maintenance of the motivational system, while extrinsic motives are less influential. Support was found for a feedback loop, whereby desire to learn feeds into increased effort at learning (i.e., motivational intensity), leading to the development of perceived competence, which is then reflected back into increasing desire to learn. Increases in these variables help to create a virtuous cycle of motivation for music learning and performance.

