scholarly journals OCR Challenges for a Latvian Pronunciation Dictionary

Author(s):  
Laine Strankale ◽  
Pēteris Paikens

This paper covers the devlopment of a custom OCR solution based on the Tesseract open source engine developed for digitization of a Latvian pronunciation dictionary where the pronunciation data is described using a large variety of diacritic markings not supported by standard OCR solutions. We describe our efforts in training a model for these symbols without the additional support of preexisting dictionaries and illustrate how word error rate (WER) and character error rate (CER) are affected by changes in the dataset content and size. We also provide an error analysis and postulate possible causes for common pitfalls. The resulting model achieved a CER of 2.07%, making it suitable for digitization of the whole dictionary in combination with heuristic post-processing and proofreading, resulting in a useful resource for further development of speech technology for Latvian.

2011 ◽  
Vol 37 (4) ◽  
pp. 657-688 ◽  
Author(s):  
Maja Popović ◽  
Hermann Ney

Evaluation and error analysis of machine translation output are important but difficult tasks. In this article, we propose a framework for automatic error analysis and classification based on the identification of actual erroneous words using the algorithms for computation of Word Error Rate (WER) and Position-independent word Error Rate (PER), which is just a very first step towards development of automatic evaluation measures that provide more specific information of certain translation problems. The proposed approach enables the use of various types of linguistic knowledge in order to classify translation errors in many different ways. This work focuses on one possible set-up, namely, on five error categories: inflectional errors, errors due to wrong word order, missing words, extra words, and incorrect lexical choices. For each of the categories, we analyze the contribution of various POS classes. We compared the results of automatic error analysis with the results of human error analysis in order to investigate two possible applications: estimating the contribution of each error type in a given translation output in order to identify the main sources of errors for a given translation system, and comparing different translation outputs using the introduced error categories in order to obtain more information about advantages and disadvantages of different systems and possibilites for improvements, as well as about advantages and disadvantages of applied methods for improvements. We used Arabic–English Newswire and Broadcast News and Chinese–English Newswire outputs created in the framework of the GALE project, several Spanish and English European Parliament outputs generated during the TC-Star project, and three German–English outputs generated in the framework of the fourth Machine Translation Workshop. We show that our results correlate very well with the results of a human error analysis, and that all our metrics except the extra words reflect well the differences between different versions of the same translation system as well as the differences between different translation systems.


Sensors ◽  
2021 ◽  
Vol 21 (9) ◽  
pp. 3063
Author(s):  
Aleksandr Laptev ◽  
Andrei Andrusenko ◽  
Ivan Podluzhny ◽  
Anton Mitrofanov ◽  
Ivan Medennikov ◽  
...  

With the rapid development of speech assistants, adapting server-intended automatic speech recognition (ASR) solutions to a direct device has become crucial. For on-device speech recognition tasks, researchers and industry prefer end-to-end ASR systems as they can be made resource-efficient while maintaining a higher quality compared to hybrid systems. However, building end-to-end models requires a significant amount of speech data. Personalization, which is mainly handling out-of-vocabulary (OOV) words, is another challenging task associated with speech assistants. In this work, we consider building an effective end-to-end ASR system in low-resource setups with a high OOV rate, embodied in Babel Turkish and Babel Georgian tasks. We propose a method of dynamic acoustic unit augmentation based on the Byte Pair Encoding with dropout (BPE-dropout) technique. The method non-deterministically tokenizes utterances to extend the token’s contexts and to regularize their distribution for the model’s recognition of unseen words. It also reduces the need for optimal subword vocabulary size search. The technique provides a steady improvement in regular and personalized (OOV-oriented) speech recognition tasks (at least 6% relative word error rate (WER) and 25% relative F-score) at no additional computational cost. Owing to the BPE-dropout use, our monolingual Turkish Conformer has achieved a competitive result with 22.2% character error rate (CER) and 38.9% WER, which is close to the best published multilingual system.


Author(s):  
Cosmin Munteanu ◽  
Gerald Penn ◽  
Ron Baecker ◽  
Elaine Toms ◽  
David James
Keyword(s):  

Author(s):  
Morgan Magnin ◽  
Guillaume Moreau ◽  
Nelle Varoquaux ◽  
Benjamin Vialle ◽  
Karen Reid ◽  
...  

A critical component of the learning process lies in the feedback that students receive on their work that validates their progress, identifies flaws in their thinking, and identifies skills that still need to be learned. Many higher-education institutions have developed an active pedagogy that gives students opportunities for different forms of assessment and feedback. This means that students have numerous lab exercises, assignments, and projects. Both instructors and students thus require effective tools to efficiently manage the submission, assessment, and individualized feedback of students’ work. The open-source web application MarkUs aims at meeting these needs: it facilitates the submission and assessment of students’ work. Students directly submit their work using MarkUs, rather than printing it, or sending it by email. The instructors or teaching assistants use MarkUs’s interface to view the students’ work, annotate it, and fill in a marking rubric. Students use the same interface to read the annotations and learn from the assessment. Managing the students’ submissions and the instructors assessments within a single online system, has led to several positive pedagogical outcomes: the number of late submissions has decreased, the assessment time has been drastically reduced, students can access their results and read the instructor’s feedback immediately after the grading process is completed. Using MarkUs has also significantly reduced the time that instructors spend collecting assignments, creating the marking schemes, passing them on to graders, handling special cases, and returning work to the students. In this paper, we introduce MarkUs’ features, and illustrate their benefits for higher education through our own teaching experiences and that of our colleagues. We also describe an important benefit of the fact that the tool itself is open-source. MarkUs has been developed entirely by students giving them a valuable learning opportunity as they work on a large software system that real users depend on. Virtuous circles indeed arise, with former users of MarkUs becoming developers and then supervisors of further development. We will conclude by drawing perspectives about forthcoming features and use, both technically and pedagogically.


2020 ◽  
Vol 6 (12) ◽  
pp. 141
Author(s):  
Abdelrahman Abdallah ◽  
Mohamed Hamada ◽  
Daniyar Nurseitov

This article considers the task of handwritten text recognition using attention-based encoder–decoder networks trained in the Kazakh and Russian languages. We have developed a novel deep neural network model based on a fully gated CNN, supported by multiple bidirectional gated recurrent unit (BGRU) and attention mechanisms to manipulate sophisticated features that achieve 0.045 Character Error Rate (CER), 0.192 Word Error Rate (WER), and 0.253 Sequence Error Rate (SER) for the first test dataset and 0.064 CER, 0.24 WER and 0.361 SER for the second test dataset. Our proposed model is the first work to handle handwriting recognition models in Kazakh and Russian languages. Our results confirm the importance of our proposed Attention-Gated-CNN-BGRU approach for training handwriting text recognition and indicate that it can lead to statistically significant improvements (p-value < 0.05) in the sensitivity (recall) over the tests dataset. The proposed method’s performance was evaluated using handwritten text databases of three languages: English, Russian, and Kazakh. It demonstrates better results on the Handwritten Kazakh and Russian (HKR) dataset than the other well-known models.


2012 ◽  
pp. 13-19
Author(s):  
Riaz Ahmad Qamar ◽  
Mohd Aizaini Maarof ◽  
Subariah Ibrahim

A quantum key distribution protocol(QKD), known as BB84, was developed in 1984 by Charles Bennett and Gilles Brassard. The protocol works in two phases which are quantum state transmission and conventional post processing. In the first phase of BB84, raw key elements are distributed between two legitimate users by sending encoded photons through quantum channel whilst in the second phase, a common secret-key is obtained from correlated raw key elements by exchanging messages through a public channel e.g.; network or internet. The secret-key so obtained is used for cryptography purpose. Reconciliation is a compulsory part of post processing and hence of quantum key distribution protocol. The performance of a reconciliation protocol depends on the generation rate of common secret-key, number of bits disclosed and the error probability in common secrete-key. These characteristics of a protocol can be achieved by using a less interactive reconciliation protocol which can handle a higher initial quantum bit error rate (QBER). In this paper, we use a simple Bose, Chaudhuri, Hocquenghem (BCH) error correction algorithm with simplified syndrome table to achieve an efficient reconciliation protocol which can handle a higher quantum bit error rate and outputs a common key with zero error probability. The proposed protocol efficient in removing errors such that it can remove all errors even if QBER is 60%. Assuming the post processing channel is an authenticated binary symmetric channel (BSC).


Sign in / Sign up

Export Citation Format

Share Document