Efficient corpus design for wake-word detection

Author(s):  
Delowar Hossain ◽  
Yoshinao Sato
Keyword(s):  
Corpora ◽  
2020 ◽  
Vol 15 (2) ◽  
pp. 125-140
Author(s):  
Yukiko Ohashi ◽  
Noriaki Katagiri ◽  
Katsutoshi Oka ◽  
Michiko Hanada

This paper reports two results: (1) the design of an English for Specific Purposes (ESP) corpus architecture with annotations structured by regular expressions; and (2) a case study that tests the design by building a specialized vocabulary list from the compiled corpus. In the first half of the study, a precisely structured ESP corpus was designed from 190 veterinary medical charts, with the data arranged hierarchically. The hierarchy consists of document types, outline elements, and inline elements such as species and breed. Perl scripts extracted the data attached to veterinary-specific categories, and the extracted data were used to create wordlists. The second part of the research tested the corpus design by creating a list of lexical items commonly observed in veterinary medicine. The coverage of the wordlists by the General Service List (GSL) and the Academic Word List (AWL) was then measured: 66.4 percent of all lexical items appeared in the GSL and AWL, whereas 33.7 percent appeared in neither list. The corpus compilation procedures and annotation scheme introduced in this study make it possible to compile specialized corpora with explicit annotations, giving teachers access to the data they need to create ESP classroom materials.
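The extraction and coverage steps described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' Perl implementation. The angle-bracket tag syntax, the function names `extract_inline` and `coverage`, and the tiny sample wordlists are all assumptions made for illustration, and coverage is computed here over distinct lexical items, which the abstract does not specify.

```python
import re


def extract_inline(text, tag):
    """Return the contents of every <tag>...</tag> inline element in text.

    The <tag> syntax is a hypothetical stand-in for the paper's annotation scheme.
    """
    name = re.escape(tag)
    pattern = re.compile(rf"<{name}>(.*?)</{name}>", re.DOTALL)
    return [match.strip() for match in pattern.findall(text)]


def coverage(items, *wordlists):
    """Return the fraction of distinct items found in at least one wordlist."""
    known = set()
    for wordlist in wordlists:
        known |= {word.lower() for word in wordlist}
    distinct = {item.lower() for item in items}
    if not distinct:
        return 0.0
    return len(distinct & known) / len(distinct)


# Hypothetical annotated chart line and toy stand-ins for the GSL and AWL.
chart = "<species>Dog</species> <breed>Beagle</breed> has a chronic otitis externa"
gsl_sample = {"the", "dog", "has", "a"}
awl_sample = {"chronic"}

species = extract_inline(chart, "species")
items = ["the", "dog", "has", "a", "chronic", "otitis", "externa"]
rate = coverage(items, gsl_sample, awl_sample)
```

With real data, `items` would be the wordlist extracted from the charts and the two sample sets would be replaced by the full GSL and AWL; the complement `1 - rate` then corresponds to the share of items found in neither list.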


2014 ◽  
Author(s):  
Minlie Huang ◽  
Borui Ye ◽  
Yichen Wang ◽  
Haiqiang Chen ◽  
Junjun Cheng ◽  
...  

2016 ◽  
Vol 7 (2) ◽  
pp. 76-82
Author(s):  
Hugeng Hugeng ◽  
Edbert Hansel

We have built GAIA, a speech recognition application for an Indonesian geography dictionary that runs on the Android operating system. The application uses a smartphone to receive spoken-word input from the user. Recognition is based on the Hidden Markov Model approach implemented in the Pocketsphinx library, and the phonemes follow Indonesian phoneme rules. An advantage of the application is that it can be used without internet access. In testing, word detection was evaluated under four conditions to determine accuracy: near silent, near noisy, far silent, and far noisy. The testing and analysis show that GAIA can be built as a speech recognition application on Android for an Indonesian geography dictionary. Average word recognition accuracy reached 52.87% in the near silent condition, 14.5% in the near noisy condition, 23.2% in the far silent condition, and 2.8% in the far noisy condition. Index Terms: speech recognition, Indonesian geography dictionary, Hidden Markov Model, Pocketsphinx, Android.
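The per-condition accuracy figures reported in this abstract could be tallied along the following lines. This is a sketch under stated assumptions, not the authors' evaluation code: the function name `accuracy_by_condition`, the `(condition, expected, recognized)` trial format, and the sample trials are hypothetical and do not reproduce the paper's data.

```python
from collections import defaultdict


def accuracy_by_condition(trials):
    """Return percent of correctly recognized words for each test condition.

    Each trial is a (condition, expected_word, recognized_word) tuple.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for condition, expected, recognized in trials:
        totals[condition] += 1
        if expected == recognized:
            hits[condition] += 1
    return {c: 100.0 * hits[c] / totals[c] for c in totals}


# Hypothetical trials; the words and outcomes are invented for illustration.
sample_trials = [
    ("near silent", "jakarta", "jakarta"),
    ("near silent", "bandung", "bandung"),
    ("near silent", "medan", "madiun"),
    ("near silent", "bali", "bali"),
    ("far noisy", "jakarta", "jayapura"),
    ("far noisy", "bali", "bali"),
]

results = accuracy_by_condition(sample_trials)
```

Keeping the condition as a field of each trial lets the same tally cover all four setups (near silent, near noisy, far silent, far noisy) without separate bookkeeping.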


2013 ◽  
Vol 24 (5) ◽  
pp. 1051-1060 ◽  
Author(s):  
Fei CHEN ◽  
Yi-Qun LIU ◽  
Chao WEI ◽  
Yun-Liang ZHANG ◽  
Min ZHANG ◽  
...  

2004 ◽  
Author(s):  
J. Bruce Millar ◽  
Michael Wagner ◽  
Roland Goecke

Author(s):  
Jannatul Ferdousi Sohana ◽  
Ranak Jahan Rupa ◽  
Moqsadur Rahman
