Dual Loss for Manga Character Recognition with Imbalanced Training Data

Author(s):  
Yonggang Li ◽  
Yafeng Zhou ◽  
Yongtao Wang ◽  
Xiaoran Qin ◽  
Zhi Tang
2021 ◽  
Vol 11 (6) ◽  
pp. 2866
Author(s):  
Damheo Lee ◽  
Donghyun Kim ◽  
Seung Yun ◽  
Sanghun Kim

In this paper, we propose a new method for code-switching (CS) automatic speech recognition (ASR) in Korean. First, the phonetic variations of English words as pronounced by Korean speakers must be considered; we therefore sought a unified pronunciation model based on phonetic knowledge and deep learning. Second, we extracted CS sentences semantically similar to the target domain and applied language model (LM) adaptation to counter the bias toward Korean caused by the imbalanced training data. In our experiments, the training data were AI Hub (1033 h) in Korean and LibriSpeech (960 h) in English. Compared to the baseline, the proposed method improved the error reduction rate (ERR) by up to 11.6% with phonetic variant modeling and by 17.3% when semantically similar sentences were used for LM adaptation. Considering only English words, the word correction rate improved by up to 24.2% over the baseline. The proposed method thus appears to be very effective for CS speech recognition.
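As a minimal sketch of the metric reported above, the error reduction rate (ERR) can be computed from a baseline and an adapted word error rate (WER); the WER values below are invented for illustration, not taken from the paper.

```python
def error_reduction_rate(baseline_wer: float, new_wer: float) -> float:
    """Relative reduction of WER over the baseline, in percent."""
    return (baseline_wer - new_wer) / baseline_wer * 100.0

baseline = 20.0   # hypothetical baseline WER (%)
adapted = 16.54   # hypothetical WER after LM adaptation (%)
print(f"ERR: {error_reduction_rate(baseline, adapted):.1f}%")  # ERR: 17.3%
```

An ERR of 17.3% therefore means the adapted system removed 17.3% of the baseline's errors, not that absolute WER dropped by 17.3 points.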


2021 ◽  
Author(s):  
Komuravelli Prashanth ◽  
Kalidas Yeturu

There are millions of scanned documents worldwide in around 4000 languages. Searching for information in a scanned document requires a text layer to be available and indexed. Preparing a text layer requires recognizing character and sub-region patterns and associating them with a human interpretation. Developing an optical character recognition (OCR) system for every language is very difficult, if not impossible. There is a strong need for systems that build on top of existing OCR technologies by learning from them and unifying a disparate multitude of systems. In this regard, we propose an algorithm that leverages the fact that we are dealing with scanned documents of handwritten text regions from diverse domains and language settings. We observe that the text regions have consistent bounding box sizes, and very large or very small font scenarios can be handled in preprocessing or postprocessing phases. The image subregions in scanned text documents are smaller than the subregions formed by common objects in general-purpose images. We propose and validate the hypothesis that a much simpler convolutional neural network (CNN) with very few layers and few filters can be used to detect individual subregion classes. For the detection of several hundred classes, multiple such simple models can be pooled to operate simultaneously on a document. The advantage of using pools of subregion-specific models is the ability to incrementally add hundreds of new classes over time without disturbing the previous models in a continual learning scenario. Such an approach has a distinctive advantage over a single monolithic model, in which subregion classes share and interfere through a bulky common neural network. We report here an efficient algorithm for building subregion-specific lightweight CNN models.
Training the proposed CNN requires engineering synthetic data points that contain both the pattern of interest and non-patterns. We propose and validate the hypothesis that an image canvas with an optimal mixture of pattern and non-pattern content can be formulated using a mean squared error loss function to shape the filters learned from the data. The CNN thus trained can identify the character object in the presence of several other objects on a generalized test image of a scanned document. In this setting, a key observation is that learning a filter in a CNN depends not only on the abundance of the pattern of interest but also on the presence of a non-pattern context. Our experiments have led to the following key observations: (i) a pattern cannot be over-expressed in isolation, (ii) a pattern cannot be under-expressed either, (iii) a non-pattern can be salt-and-pepper noise, and (iv) it is sufficient to provide a non-pattern context to a modest representation of a pattern to obtain strong individual subregion class models. We have carried out studies and report mean average precision scores on various data sets, including (1) MNIST digits (95.77), (2) EMNIST capital letters (81.26), (3) EMNIST small letters (73.32), (4) Kannada digits (95.77), (5) Kannada letters (90.34), (6) Devanagari letters (100), (7) Telugu words (93.20), and (8) Devanagari words (93.20), as well as on medical prescriptions, where we observed mean average precision over 90%. The algorithm can serve as a kernel in the automatic annotation of digital documents in diverse scenarios, such as annotating ancient manuscripts and handwritten health records.
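The "pool of subregion-specific models" idea can be sketched as follows: each class gets its own tiny detector, new classes are added without retraining existing ones, and all detectors score the same patch independently. The scoring functions here are toy stand-ins for the paper's lightweight CNNs, and all names are illustrative.

```python
from typing import Callable, Dict, List

Patch = List[List[float]]

class DetectorPool:
    """Pool of independent per-class detectors (continual-learning style)."""

    def __init__(self) -> None:
        self.models: Dict[str, Callable[[Patch], float]] = {}

    def add_class(self, name: str, model: Callable[[Patch], float]) -> None:
        # Incremental addition: previously registered models are untouched,
        # so adding a new class cannot interfere with the old ones.
        self.models[name] = model

    def detect(self, patch: Patch, threshold: float = 0.5) -> List[str]:
        # Run every per-class detector independently on the same patch.
        return [c for c, m in self.models.items() if m(patch) >= threshold]

# Toy per-class scorers keyed on mean intensity (purely illustrative).
def mean(p: Patch) -> float:
    return sum(sum(row) for row in p) / (len(p) * len(p[0]))

pool = DetectorPool()
pool.add_class("dark_glyph", lambda p: 1.0 - mean(p))
pool.add_class("light_glyph", mean)

patch = [[0.1, 0.2], [0.0, 0.1]]
print(pool.detect(patch))  # ['dark_glyph']
```

The design point mirrors the abstract: because the detectors share no weights, there is nothing for a newly added class to "interfere" with, at the cost of running many small models instead of one monolithic network.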


2016 ◽  
Vol 836 ◽  
pp. 37-41 ◽  
Author(s):  
Adlina Taufik Syamlan ◽  
Bambang Pramujati ◽  
Hendro Nurhadi

Robotics has many uses in the industrial world and has developed considerably since the industrial revolution, owing to its high precision and accuracy. This paper demonstrates these qualities in the form of a writing robot. The aim of this study is to construct the system based on gathered data and to develop the control system based on the model. Four aspects are studied for this project, namely image processing, character recognition, image property extraction, and inverse kinematics. The paper discusses modelling the robotic arm used for the writing robot and generating the joint angles (theta) for the end-effector position. Training data are generated through a meshgrid and then fed through an ANFIS (adaptive neuro-fuzzy inference system).
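The meshgrid-to-theta step can be sketched for a planar two-link arm: closed-form inverse kinematics maps each (x, y) grid point to joint angles, producing the training pairs that an ANFIS would then fit. The link lengths and grid spacing below are assumptions for illustration; the paper's actual arm geometry is not specified here.

```python
import math

def ik_2link(x: float, y: float, l1: float = 1.0, l2: float = 1.0):
    """Closed-form inverse kinematics for a planar 2-link arm (elbow-down)."""
    c2 = (x * x + y * y - l1 * l1 - l2 * l2) / (2 * l1 * l2)
    t2 = math.acos(max(-1.0, min(1.0, c2)))            # elbow angle
    t1 = math.atan2(y, x) - math.atan2(l2 * math.sin(t2),
                                       l1 + l2 * math.cos(t2))
    return t1, t2

# Meshgrid of reachable (x, y) targets -> (theta1, theta2) training pairs,
# the kind of data set that would be fed to an ANFIS for training.
xs = [0.4 + 0.2 * i for i in range(6)]
ys = [0.4 + 0.2 * j for j in range(6)]
data = [((x, y), ik_2link(x, y)) for x in xs for y in ys
        if math.hypot(x, y) <= 2.0]                    # inside the workspace
```

Applying the forward kinematics to the returned angles recovers the target point, which is a convenient sanity check on each generated training pair.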


2020 ◽  
Vol 8 (4) ◽  
pp. 304-310
Author(s):  
Windra Swastika ◽  
Ekky Rino Fajar Sakti ◽  
Mochamad Subianto

Low-resolution images can be reconstructed into high-resolution images using the Super-Resolution Convolutional Neural Network (SRCNN) algorithm. This study aims to improve the recognition accuracy of vehicle license plate numbers by generating high-resolution vehicle images with SRCNN. Recognition is carried out with two character recognition methods: Tesseract OCR and SPNet. The training data for SRCNN use the DIV2K dataset of 900 images, while the training data for character recognition use the Chars74K dataset. The high-resolution images constructed with SRCNN increase the average accuracy of vehicle license plate number recognition by 16.9% with Tesseract and 13.8% with SPNet.
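The SRCNN pipeline can be sketched in miniature: the low-resolution image is first upscaled (bicubic in the original SRCNN formulation; nearest-neighbour here for brevity), then refined by a stack of convolutions. The 3x3 kernel below is an identity filter standing in for trained weights, so this illustrates only the data flow, not the learned mapping.

```python
from typing import List

Image = List[List[float]]

def upscale_nearest(img: Image, factor: int = 2) -> Image:
    """Nearest-neighbour upscaling (placeholder for bicubic interpolation)."""
    out: Image = []
    for row in img:
        wide = [v for v in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out

def conv2d_valid(img: Image, kernel: Image) -> Image:
    """One convolutional layer ('valid' padding, single channel, no bias)."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(img), len(img[0])
    return [[sum(kernel[i][j] * img[r + i][c + j]
                 for i in range(kh) for j in range(kw))
             for c in range(w - kw + 1)]
            for r in range(h - kh + 1)]

lr = [[0.0, 1.0], [1.0, 0.0]]                 # tiny low-resolution "image"
hr = upscale_nearest(lr)                      # 4x4 upscaled image
identity = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]  # stand-in for learned weights
refined = conv2d_valid(hr, identity)          # one "layer" applied
```

In the real SRCNN, three such layers (patch extraction, non-linear mapping, reconstruction) with learned filters are applied, and the refined output is what feeds the OCR stage.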


2020 ◽  
Vol 3 (4) ◽  
pp. 1-10
Author(s):  
Dunya A. Abd Alhamza ◽  
Ammar D. Alaythawy

License plate recognition (LPR) is an important system, useful in many settings such as private or public entrances, parking lots, traffic control, and theft surveillance. This paper presents an LPR system consisting of four main stages: preprocessing, license plate detection, segmentation, and character recognition. The first stage takes a photo with the camera and preprocesses the image. License plate detection searches the image for a matching license plate so that the correct plate can be cropped. Segmentation divides the numbers into separate characters. The last stage is number recognition using KNN (k-nearest neighbors), one of the simplest machine learning algorithms, which matches numbers against training data to produce a prediction. The system was implemented using Python 3.5 and the OpenCV library and shows an accuracy of 90% on 50 images.
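The KNN matching step described above can be sketched as follows: each segmented character is flattened to a feature vector, compared to labelled training vectors by Euclidean distance, and the majority label among the k nearest wins. The 4-pixel "digits" here are toy stand-ins for real character crops.

```python
from collections import Counter
from typing import List, Tuple

def knn_predict(train: List[Tuple[List[float], str]],
                query: List[float], k: int = 3) -> str:
    """Majority vote among the k training vectors nearest to the query."""
    def sqdist(a: List[float], b: List[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    nearest = sorted(train, key=lambda t: sqdist(t[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Toy training set: flattened 2x2 binarized character crops with labels.
train = [([1, 1, 0, 0], "1"), ([1, 0, 0, 1], "7"),
         ([1, 1, 0, 1], "1"), ([0, 0, 1, 1], "7")]
print(knn_predict(train, [1, 1, 0, 0], k=3))  # 1
```

OpenCV ships its own implementation of the same idea (`cv2.ml.KNearest_create()`), which is presumably what a Python 3.5 + OpenCV system would use in practice.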


2020 ◽  
Vol 124 ◽  
pp. 103611 ◽  
Author(s):  
Elias Martins Guerra Prado ◽  
Carlos Roberto de Souza Filho ◽  
Emmanuel John M. Carranza ◽  
João Gabriel Motta

Author(s):  
María José Castro-Bleda ◽  
Salvador España-Boquera ◽  
Francisco Zamora-Martínez

The field of off-line optical character recognition (OCR) has been a topic of intensive research for many years (Bozinovic, 1989; Bunke, 2003; Plamondon, 2000; Toselli, 2004). One of the first steps in the classical architecture of a text recognizer is preprocessing, where noise reduction and normalization take place. Many systems do not require a binarization step, so the images are maintained in gray-level quality. Document enhancement not only influences the overall performance of OCR systems, but it can also significantly improve document readability for human readers. In many cases, the noise of document images is heterogeneous, and a technique fitted for one type of noise may not be valid for the overall set of documents. One possible solution to this problem is to use several filters or techniques and to provide a classifier that selects the appropriate one. Neural networks have been used for document enhancement (see Egmont-Petersen (2002) for a review of image processing with neural networks). One advantage of neural network filters for image enhancement and denoising is that a different neural filter can be trained automatically for each type of noise. This work proposes clustering neural network filters to avoid having to label training data and to reduce the number of filters needed by the enhancement system. An agglomerative hierarchical clustering algorithm over supervised classifiers is proposed to do this. The technique has been applied to filter out background noise from office documents (coffee stains and footprints on documents, folded sheets with degraded printed text, etc.).
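The clustering idea can be sketched as follows: each trained filter is summarized by its error profile across noise types, and agglomerative clustering repeatedly merges the two closest clusters until a target count is reached, so one representative filter can serve each cluster. The profiles below are invented numbers, and this centroid-linkage variant is an assumption, not necessarily the linkage used in the paper.

```python
from typing import List

def agglomerate(profiles: List[List[float]], target: int) -> List[List[int]]:
    """Agglomerative clustering of filter error profiles (centroid linkage)."""
    clusters = [[i] for i in range(len(profiles))]

    def sqdist(a: List[float], b: List[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    def centroid(c: List[int]) -> List[float]:
        return [sum(profiles[i][k] for i in c) / len(c)
                for k in range(len(profiles[0]))]

    while len(clusters) > target:
        # Find and merge the two clusters with the closest centroids.
        i, j = min(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda p: sqdist(centroid(clusters[p[0]]),
                                        centroid(clusters[p[1]])))
        clusters[i] += clusters.pop(j)
    return clusters

# Error profiles of four filters over three noise types (illustrative):
# filters 0-1 handle noise type A well, filters 2-3 handle types B/C well.
profiles = [[0.1, 0.8, 0.9], [0.2, 0.7, 0.9],
            [0.9, 0.1, 0.2], [0.8, 0.2, 0.1]]
print(agglomerate(profiles, 2))  # [[0, 1], [2, 3]]
```

Merging filters with similar error profiles is what lets the enhancement system keep one filter per noise family instead of one per labelled noise type.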

