Low-Frequency Character Clustering for End-to-End ASR System

Author(s):  
Hitoshi Ito ◽  
Aiko Hagiwara ◽  
Manon Ichiki ◽  
Takeshi Kobayakawa ◽  
Takeshi Mishima ◽  
...  
Sensors ◽  
2021 ◽  
Vol 21 (9) ◽  
pp. 3063
Author(s):  
Aleksandr Laptev ◽  
Andrei Andrusenko ◽  
Ivan Podluzhny ◽  
Anton Mitrofanov ◽  
Ivan Medennikov ◽  
...  

With the rapid development of speech assistants, adapting server-intended automatic speech recognition (ASR) solutions to a direct device has become crucial. For on-device speech recognition tasks, researchers and industry prefer end-to-end ASR systems as they can be made resource-efficient while maintaining a higher quality compared to hybrid systems. However, building end-to-end models requires a significant amount of speech data. Personalization, which is mainly handling out-of-vocabulary (OOV) words, is another challenging task associated with speech assistants. In this work, we consider building an effective end-to-end ASR system in low-resource setups with a high OOV rate, embodied in Babel Turkish and Babel Georgian tasks. We propose a method of dynamic acoustic unit augmentation based on the Byte Pair Encoding with dropout (BPE-dropout) technique. The method non-deterministically tokenizes utterances to extend the token’s contexts and to regularize their distribution for the model’s recognition of unseen words. It also reduces the need for optimal subword vocabulary size search. The technique provides a steady improvement in regular and personalized (OOV-oriented) speech recognition tasks (at least 6% relative word error rate (WER) and 25% relative F-score) at no additional computational cost. Owing to the BPE-dropout use, our monolingual Turkish Conformer has achieved a competitive result with 22.2% character error rate (CER) and 38.9% WER, which is close to the best published multilingual system.
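The non-deterministic tokenization described in this abstract can be illustrated with a minimal sketch of BPE-dropout. This is not the authors' implementation: the function name `bpe_dropout_encode`, the toy merge table `MERGES`, and the dropout values are assumptions made for illustration only. Each applicable merge is skipped with probability `dropout` at every step, so the same word can yield different subword sequences across training passes.

```python
import random

# Toy ranked merge table (hypothetical; a real table is learned from training text).
MERGES = [("l", "o"), ("lo", "w"), ("e", "r")]


def bpe_dropout_encode(word, merges, dropout=0.1, rng=None):
    """Segment `word` with ranked BPE merges, dropping each candidate merge
    with probability `dropout` at every step."""
    rng = rng or random.Random()
    ranks = {pair: rank for rank, pair in enumerate(merges)}
    tokens = list(word)  # start from single characters
    while len(tokens) > 1:
        # Adjacent pairs that are in the merge table and survive dropout.
        candidates = [
            (ranks[pair], i)
            for i, pair in enumerate(zip(tokens, tokens[1:]))
            if pair in ranks and rng.random() >= dropout
        ]
        if not candidates:
            break  # nothing left to merge at this step
        _, i = min(candidates)  # apply the highest-priority surviving merge
        tokens[i:i + 2] = [tokens[i] + tokens[i + 1]]
    return tokens


# dropout=0.0 reduces to ordinary deterministic BPE; dropout=1.0 keeps characters.
deterministic = bpe_dropout_encode("lower", MERGES, dropout=0.0)
characters = bpe_dropout_encode("lower", MERGES, dropout=1.0)
sampled = bpe_dropout_encode("lower", MERGES, dropout=0.5, rng=random.Random(0))
```

With a dropout value strictly between 0 and 1, repeated calls produce a mix of coarse and fine segmentations of the same word, which is the source of the extended token contexts the abstract mentions.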


2019 ◽  
Vol 146 (4) ◽  
pp. 2959-2959
Author(s):  
Mark Thomas ◽  
Bruce Martin ◽  
Katie Kowarski ◽  
Briand Gaudet ◽  
Stan Matwin

1980 ◽  
Vol 99 (2) ◽  
pp. 383-397 ◽  
Author(s):  
Y. L. Sinai

The low-frequency character of two model problems is exploited in order to illustrate the acoustic consequences of the interactions between chemically reacting (or relaxing) inhomogeneities and flames or constrictions in ducts. The monopole of the former is associated with heat transfer in a fluid which exhibits variations in its specific heats, while in the latter there is an extension of the classical phenomenon associated with the pulsations of an inhomogeneity of the fluid compressibility. This second mechanism is found to be insignificant, but the heat-conduction source is considered to be very powerful at sufficiently low Mach numbers; in fact, to first order it is independent of the flow Mach number for laminar, as well as a certain class of turbulent, flows.


2020 ◽  
Vol 8 (11) ◽  
pp. 933
Author(s):  
Marinella Masina ◽  
Renata Archetti ◽  
Alberto Lamberti

In order to obtain a fair and reliable description of the wave amplitude and currents in harbors due to the tsunami generated by the 21 May 2003 Boumerdès earthquake (Algeria), a numerical investigation has been performed with a standard hydraulic numerical model combined with various source fault models. Seven different rupture models proposed in the literature to represent high-frequency seismic effects have been used to simulate tsunami generation. The tsunami wave propagation across the Western Mediterranean Sea and in bays and harbors of the Balearic Islands is simulated, and results are checked against sea level measurements. All seven models resulted in a significant underestimation of the tsunami impact on the Balearic coasts. In the paper the best-fitting source model is identified, justifying the energy intensification of the event to account for the low-frequency character of tsunami waves. A fair correspondence is pointed out between damage to boats and harbor infrastructure, reported in newspapers, and wave intensity, characterized by level extremes and current intensity. Current speed and amplitude thresholds for possible damage in harbors suggested respectively by Lynett et al., doi.org/10.1002/2013GL058680, and Muhari et al., doi.org/10.1007/s11069-015-1772-0, are confirmed by the present analysis.


2020 ◽  
Author(s):  
Abhinav Garg ◽  
Gowtham P. Vadisetti ◽  
Dhananjaya Gowda ◽  
Sichen Jin ◽  
Aditya Jayasimha ◽  
...  

Sensors ◽  
2020 ◽  
Vol 20 (7) ◽  
pp. 1809
Author(s):  
Long Zhang ◽  
Ziping Zhao ◽  
Chunmei Ma ◽  
Linlin Shan ◽  
Huazhi Sun ◽  
...  

Advanced automatic pronunciation error detection (APED) algorithms are usually based on state-of-the-art automatic speech recognition (ASR) techniques. With the development of deep learning technology, end-to-end ASR technology has gradually matured and achieved positive practical results, which provides us with a new opportunity to update the APED algorithm. We first constructed an end-to-end ASR system based on the hybrid connectionist temporal classification and attention (CTC/attention) architecture. An adaptive parameter was used to enhance the complementarity of the connectionist temporal classification (CTC) model and the attention-based seq2seq model, further improving the performance of the ASR system. After this, the improved ASR system was used in the APED task of Mandarin, and good results were obtained. This new APED method makes forced alignment and segmentation unnecessary, and it does not require multiple complex models, such as an acoustic model or a language model. It is convenient and straightforward, and will be a suitable general solution for L1-independent computer-assisted pronunciation training (CAPT). Furthermore, we find that with regard to accuracy metrics, our proposed system based on the improved hybrid CTC/attention architecture is close to the state-of-the-art ASR system based on the deep neural network–deep neural network (DNN–DNN) architecture, and has a stronger effect on the F-measure metrics, which are especially suitable for the requirements of the APED task.
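For orientation, hybrid CTC/attention training is commonly written as an interpolation of the two losses. The formulation below is the usual fixed-weight version and is given only as an assumption about the general architecture; the abstract does not state how its adaptive parameter is computed, so the symbol \lambda here is illustrative rather than the authors' definition.

```latex
\mathcal{L}_{\mathrm{hybrid}}
  = \lambda \, \mathcal{L}_{\mathrm{CTC}}
  + (1 - \lambda) \, \mathcal{L}_{\mathrm{att}},
\qquad 0 \le \lambda \le 1
```

Setting \lambda = 1 recovers a pure CTC objective and \lambda = 0 a pure attention-based seq2seq objective; an adaptive scheme would vary \lambda rather than fix it in advance.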


2021 ◽  
Author(s):  
Zhifu Gao ◽  
Yiwu Yao ◽  
Shiliang Zhang ◽  
Jun Yang ◽  
Ming Lei ◽  
...  

1964 ◽  
Vol 1 (9) ◽  
pp. 10 ◽  
Author(s):  
W.M. G. Van Dorn

The distribution of permanent, vertical crustal dislocations, the times and directions of early water motion in and around the generation area, and the unusual low frequency character of the tsunami record obtained from Wake Island, all suggest that the tsunami associated with the great Alaskan earthquake of March 28, 1964 was produced by a dipolar movement of the earth's crust, centered along a line running from Hinchinbrook Island (Prince William Sound) southwesterly to the Trinity Islands. The positive pole of this disturbance encompassed most of the shallow shelf bordering the Gulf of Alaska, while the negative pole lay mostly under land. Thus, the early effect was the drainage of water from the shelf into the Gulf, thus generating a long solitary wave, which radiated out over the Pacific with very little dispersion. Tilting of Prince William Sound to the northwest produced strong seiching action in the deep, narrow adjacent fjords, thus inundating inhabited places already suffering from earth shock and slumping of the deltas on which they were situated. Preliminary calculations indicate that the initial positive phase of the tsunami contained about 2.3 x 10^21 ergs of energy, as compared with 2.7 x 10^22 ergs computed for the tsunami of March 9, 1957 in the Andreanof Islands.


2021 ◽  
Author(s):  
Ekaterina Egorova ◽  
Hari Krishna Vydana ◽  
Lukáš Burget ◽  
Jan Černocký
