Dual Loss for Manga Character Recognition with Imbalanced Training Data

Author(s):  
Yonggang Li ◽  
Yafeng Zhou ◽  
Yongtao Wang ◽  
Xiaoran Qin ◽  
Zhi Tang
2021 ◽  
Vol 11 (6) ◽  
pp. 2866
Author(s):  
Damheo Lee ◽  
Donghyun Kim ◽  
Seung Yun ◽  
Sanghun Kim

In this paper, we propose a new method for code-switching (CS) automatic speech recognition (ASR) in Korean. First, the phonetic variations of English words as pronounced by Korean speakers must be considered; we therefore sought a unified pronunciation model based on phonetic knowledge and deep learning. Second, we extracted CS sentences semantically similar to the target domain and applied language model (LM) adaptation to counter the bias toward Korean caused by the imbalanced training data. In our experiments, the training data were AI Hub (1033 h) in Korean and LibriSpeech (960 h) in English. Compared to the baseline, the proposed method improved the error reduction rate (ERR) by up to 11.6% with phonetic variant modeling and by 17.3% when semantically similar sentences were used for LM adaptation. Considering only English words, the word correction rate improved by up to 24.2% over the baseline. The proposed method thus appears to be very effective for CS speech recognition.
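As a minimal sketch of the metric reported above, the error reduction rate (ERR) can be computed from a baseline and an adapted word error rate (WER); the WER values below are invented for illustration, not taken from the paper.

```python
def error_reduction_rate(baseline_wer: float, new_wer: float) -> float:
    """Relative reduction of WER over the baseline, in percent."""
    return (baseline_wer - new_wer) / baseline_wer * 100.0

baseline = 20.0   # hypothetical baseline WER (%)
adapted = 16.54   # hypothetical WER after LM adaptation (%)
print(f"ERR: {error_reduction_rate(baseline, adapted):.1f}%")  # ERR: 17.3%
```

An ERR of 17.3% therefore means the adapted system removed 17.3% of the baseline's errors, not that absolute WER dropped by 17.3 points.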


2021 ◽  
Author(s):  
Komuravelli Prashanth ◽  
Kalidas Yeturu

There are millions of scanned documents worldwide in around 4000 languages. Searching for information in a scanned document requires a text layer to be available and indexed. Preparing a text layer requires recognizing character and sub-region patterns and associating them with a human interpretation. Developing an optical character recognition (OCR) system for every language is very difficult, if not impossible. There is a strong need for systems that build on top of existing OCR technologies by learning from them and unifying a disparate multitude of systems. In this regard, we propose an algorithm that leverages the fact that we are dealing with scanned documents of handwritten text regions from diverse domains and language settings. We observe that the text regions have consistent bounding box sizes, and very large or very small font scenarios can be handled in preprocessing or postprocessing phases. The image subregions in scanned text documents are smaller than the subregions formed by common objects in general-purpose images. We propose and validate the hypothesis that a much simpler convolutional neural network (CNN) with very few layers and few filters can be used to detect individual subregion classes. For the detection of several hundred classes, multiple such simple models can be pooled to operate simultaneously on a document. The advantage of using pools of subregion-specific models is the ability to incrementally add hundreds of new classes over time without disturbing the previous models in a continual learning scenario. Such an approach has a distinctive advantage over a single monolithic model, in which subregion classes share and interfere through a bulky common neural network. We report here an efficient algorithm for building subregion-specific lightweight CNN models.
Training the proposed CNN requires engineering synthetic data points that contain both the pattern of interest and non-patterns. We propose and validate the hypothesis that an image canvas with an optimal mixture of pattern and non-pattern content can be formulated using a mean squared error loss function to shape the filters learned from the data. The CNN thus trained can identify the character object in the presence of several other objects on a generalized test image of a scanned document. In this setting, a key observation is that learning a filter in a CNN depends not only on the abundance of the pattern of interest but also on the presence of a non-pattern context. Our experiments have led to the following key observations: (i) a pattern cannot be over-expressed in isolation, (ii) a pattern cannot be under-expressed either, (iii) a non-pattern can be salt-and-pepper noise, and (iv) it is sufficient to provide a non-pattern context to a modest representation of a pattern to obtain strong individual subregion class models. We have carried out studies and report mean average precision scores on various data sets, including (1) MNIST digits (95.77), (2) EMNIST capital letters (81.26), (3) EMNIST small letters (73.32), (4) Kannada digits (95.77), (5) Kannada letters (90.34), (6) Devanagari letters (100), (7) Telugu words (93.20), and (8) Devanagari words (93.20), as well as on medical prescriptions, where we observed mean average precision over 90%. The algorithm can serve as a kernel in the automatic annotation of digital documents in diverse scenarios, such as annotating ancient manuscripts and handwritten health records.
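The "pool of subregion-specific models" idea can be sketched as follows: each class gets its own tiny detector, new classes are added without retraining existing ones, and all detectors score the same patch independently. The scoring functions here are toy stand-ins for the paper's lightweight CNNs, and all names are illustrative.

```python
from typing import Callable, Dict, List

Patch = List[List[float]]

class DetectorPool:
    """Pool of independent per-class detectors (continual-learning style)."""

    def __init__(self) -> None:
        self.models: Dict[str, Callable[[Patch], float]] = {}

    def add_class(self, name: str, model: Callable[[Patch], float]) -> None:
        # Incremental addition: previously registered models are untouched,
        # so adding a new class cannot interfere with the old ones.
        self.models[name] = model

    def detect(self, patch: Patch, threshold: float = 0.5) -> List[str]:
        # Run every per-class detector independently on the same patch.
        return [c for c, m in self.models.items() if m(patch) >= threshold]

# Toy per-class scorers keyed on mean intensity (purely illustrative).
def mean(p: Patch) -> float:
    return sum(sum(row) for row in p) / (len(p) * len(p[0]))

pool = DetectorPool()
pool.add_class("dark_glyph", lambda p: 1.0 - mean(p))
pool.add_class("light_glyph", mean)

patch = [[0.1, 0.2], [0.0, 0.1]]
print(pool.detect(patch))  # ['dark_glyph']
```

The design point mirrors the abstract: because the detectors share no weights, there is nothing for a newly added class to "interfere" with, at the cost of running many small models instead of one monolithic network.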


2016 ◽  
Vol 836 ◽  
pp. 37-41 ◽  
Author(s):  
Adlina Taufik Syamlan ◽  
Bambang Pramujati ◽  
Hendro Nurhadi

Robotics has many uses in the industrial world and has developed considerably since the industrial revolution, owing to its high precision and accuracy. This paper demonstrates these qualities in the form of a writing robot. The aim of this study is to construct the system based on gathered data and to develop the control system based on the model. Four aspects are studied for this project, namely image processing, character recognition, image property extraction, and inverse kinematics. The paper discusses modelling the robotic arm used for the writing robot and generating the joint angles (theta) for the end-effector position. Training data are generated through a meshgrid and then fed through an ANFIS (adaptive neuro-fuzzy inference system).
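The meshgrid-to-theta step can be sketched for a planar two-link arm: closed-form inverse kinematics maps each (x, y) grid point to joint angles, producing the training pairs that an ANFIS would then fit. The link lengths and grid spacing below are assumptions for illustration; the paper's actual arm geometry is not specified here.

```python
import math

def ik_2link(x: float, y: float, l1: float = 1.0, l2: float = 1.0):
    """Closed-form inverse kinematics for a planar 2-link arm (elbow-down)."""
    c2 = (x * x + y * y - l1 * l1 - l2 * l2) / (2 * l1 * l2)
    t2 = math.acos(max(-1.0, min(1.0, c2)))            # elbow angle
    t1 = math.atan2(y, x) - math.atan2(l2 * math.sin(t2),
                                       l1 + l2 * math.cos(t2))
    return t1, t2

# Meshgrid of reachable (x, y) targets -> (theta1, theta2) training pairs,
# the kind of data set that would be fed to an ANFIS for training.
xs = [0.4 + 0.2 * i for i in range(6)]
ys = [0.4 + 0.2 * j for j in range(6)]
data = [((x, y), ik_2link(x, y)) for x in xs for y in ys
        if math.hypot(x, y) <= 2.0]                    # inside the workspace
```

Applying the forward kinematics to the returned angles recovers the target point, which is a convenient sanity check on each generated training pair.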


2020 ◽  
Vol 8 (4) ◽  
pp. 304-310
Author(s):  
Windra Swastika ◽  
Ekky Rino Fajar Sakti ◽  
Mochamad Subianto

Low-resolution images can be reconstructed into high-resolution images using the Super-Resolution Convolutional Neural Network (SRCNN) algorithm. This study aims to improve the recognition accuracy of vehicle license plate numbers by generating high-resolution vehicle images with SRCNN. Recognition is carried out with two character recognition methods: Tesseract OCR and SPNet. The training data for SRCNN use the DIV2K dataset of 900 images, while the training data for character recognition use the Chars74K dataset. The high-resolution images constructed with SRCNN increase the average accuracy of vehicle license plate number recognition by 16.9% with Tesseract and 13.8% with SPNet.
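The SRCNN pipeline can be sketched in miniature: the low-resolution image is first upscaled (bicubic in the original SRCNN formulation; nearest-neighbour here for brevity), then refined by a stack of convolutions. The 3x3 kernel below is an identity filter standing in for trained weights, so this illustrates only the data flow, not the learned mapping.

```python
from typing import List

Image = List[List[float]]

def upscale_nearest(img: Image, factor: int = 2) -> Image:
    """Nearest-neighbour upscaling (placeholder for bicubic interpolation)."""
    out: Image = []
    for row in img:
        wide = [v for v in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out

def conv2d_valid(img: Image, kernel: Image) -> Image:
    """One convolutional layer ('valid' padding, single channel, no bias)."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(img), len(img[0])
    return [[sum(kernel[i][j] * img[r + i][c + j]
                 for i in range(kh) for j in range(kw))
             for c in range(w - kw + 1)]
            for r in range(h - kh + 1)]

lr = [[0.0, 1.0], [1.0, 0.0]]                 # tiny low-resolution "image"
hr = upscale_nearest(lr)                      # 4x4 upscaled image
identity = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]  # stand-in for learned weights
refined = conv2d_valid(hr, identity)          # one "layer" applied
```

In the real SRCNN, three such layers (patch extraction, non-linear mapping, reconstruction) with learned filters are applied, and the refined output is what feeds the OCR stage.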


2020 ◽  
Vol 3 (4) ◽  
pp. 1-10
Author(s):  
Dunya A. Abd Alhamza ◽  
Ammar D. Alaythawy

License plate recognition (LPR) is an important system, useful in many settings such as private or public entrances, parking lots, traffic control, and theft surveillance. This paper presents an LPR system consisting of four main stages: preprocessing, license plate detection, segmentation, and character recognition. The first stage takes a photo with the camera and preprocesses the image. License plate detection searches the image for a matching license plate so that the correct plate can be cropped. Segmentation divides the numbers into separate characters. The last stage is number recognition using KNN (k-nearest neighbors), one of the simplest machine learning algorithms, which matches numbers against training data to produce a prediction. The system was implemented using Python 3.5 and the OpenCV library and shows an accuracy of 90% on 50 images.
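The KNN matching step described above can be sketched as follows: each segmented character is flattened to a feature vector, compared to labelled training vectors by Euclidean distance, and the majority label among the k nearest wins. The 4-pixel "digits" here are toy stand-ins for real character crops.

```python
from collections import Counter
from typing import List, Tuple

def knn_predict(train: List[Tuple[List[float], str]],
                query: List[float], k: int = 3) -> str:
    """Majority vote among the k training vectors nearest to the query."""
    def sqdist(a: List[float], b: List[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    nearest = sorted(train, key=lambda t: sqdist(t[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Toy training set: flattened 2x2 binarized character crops with labels.
train = [([1, 1, 0, 0], "1"), ([1, 0, 0, 1], "7"),
         ([1, 1, 0, 1], "1"), ([0, 0, 1, 1], "7")]
print(knn_predict(train, [1, 1, 0, 0], k=3))  # 1
```

OpenCV ships its own implementation of the same idea (`cv2.ml.KNearest_create()`), which is presumably what a Python 3.5 + OpenCV system would use in practice.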


2020 ◽  
Vol 124 ◽  
pp. 103611 ◽  
Author(s):  
Elias Martins Guerra Prado ◽  
Carlos Roberto de Souza Filho ◽  
Emmanuel John M. Carranza ◽  
João Gabriel Motta

Author(s):  
María José Castro-Bleda ◽  
Salvador España-Boquera ◽  
Francisco Zamora-Martínez

The field of off-line optical character recognition (OCR) has been a topic of intensive research for many years (Bozinovic, 1989; Bunke, 2003; Plamondon, 2000; Toselli, 2004). One of the first steps in the classical architecture of a text recognizer is preprocessing, where noise reduction and normalization take place. Many systems do not require a binarization step, so the images are maintained in gray-level quality. Document enhancement not only influences the overall performance of OCR systems, but it can also significantly improve document readability for human readers. In many cases, the noise of document images is heterogeneous, and a technique fitted for one type of noise may not be valid for the overall set of documents. One possible solution to this problem is to use several filters or techniques and to provide a classifier that selects the appropriate one. Neural networks have been used for document enhancement (see Egmont-Petersen (2002) for a review of image processing with neural networks). One advantage of neural network filters for image enhancement and denoising is that a different neural filter can be trained automatically for each type of noise. This work proposes clustering neural network filters to avoid having to label training data and to reduce the number of filters needed by the enhancement system. An agglomerative hierarchical clustering algorithm over supervised classifiers is proposed to do this. The technique has been applied to filter out background noise from office documents (coffee stains and footprints on documents, folded sheets with degraded printed text, etc.).
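The clustering idea can be sketched as follows: each trained filter is summarized by its error profile across noise types, and agglomerative clustering repeatedly merges the two closest clusters until a target count is reached, so one representative filter can serve each cluster. The profiles below are invented numbers, and this centroid-linkage variant is an assumption, not necessarily the linkage used in the paper.

```python
from typing import List

def agglomerate(profiles: List[List[float]], target: int) -> List[List[int]]:
    """Agglomerative clustering of filter error profiles (centroid linkage)."""
    clusters = [[i] for i in range(len(profiles))]

    def sqdist(a: List[float], b: List[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    def centroid(c: List[int]) -> List[float]:
        return [sum(profiles[i][k] for i in c) / len(c)
                for k in range(len(profiles[0]))]

    while len(clusters) > target:
        # Find and merge the two clusters with the closest centroids.
        i, j = min(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda p: sqdist(centroid(clusters[p[0]]),
                                        centroid(clusters[p[1]])))
        clusters[i] += clusters.pop(j)
    return clusters

# Error profiles of four filters over three noise types (illustrative):
# filters 0-1 handle noise type A well, filters 2-3 handle types B/C well.
profiles = [[0.1, 0.8, 0.9], [0.2, 0.7, 0.9],
            [0.9, 0.1, 0.2], [0.8, 0.2, 0.1]]
print(agglomerate(profiles, 2))  # [[0, 1], [2, 3]]
```

Merging filters with similar error profiles is what lets the enhancement system keep one filter per noise family instead of one per labelled noise type.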

