Automated Text Detection and Recognition in Annotated Biomedical Publication Images

Author(s):  
Soumya De ◽  
R. Joe Stanley ◽  
Beibei Cheng ◽  
Sameer Antani ◽  
Rodney Long ◽  
...  

Images in biomedical publications often convey important information related to an article's content. When referenced properly, these images aid in clinical decision support. Annotations such as text labels and symbols, as provided by medical experts, are used to highlight regions of interest within the images. These annotations, if extracted automatically, could be used in conjunction with either the image caption text or the image citations (mentions) in the articles to improve biomedical information retrieval. In the current study, automatic detection and recognition of text labels in biomedical publication images was investigated. This paper presents both image analysis and feature-based approaches to extract and recognize specific regions of interest (text labels) within images in biomedical publications. Experiments were performed on 6515 characters extracted from text labels present in 200 biomedical publication images. These images are part of the data set from ImageCLEF 2010. Automated character recognition experiments were conducted using geometry-, region-, exemplar-, and profile-based correlation features and Fourier descriptors extracted from the characters. Correct recognition as high as 92.67% was obtained with a support vector machine classifier, compared to a 75.90% correct recognition rate with a benchmark Optical Character Recognition technique.

2017 ◽  
pp. 457-489
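
As a rough illustration of the feature-plus-classifier pipeline described in the abstract above, the sketch below computes Fourier descriptors, one of the feature families mentioned, from a character's contour and feeds them to a scikit-learn SVM. This is a minimal sketch under our own assumptions (function names, normalization choices are ours), not the authors' implementation:

```python
import numpy as np
import cv2
from sklearn.svm import SVC

def fourier_descriptors(char_img, n_coeffs=16):
    """Fourier descriptors from the largest contour of a
    binarized grayscale character image."""
    _, binary = cv2.threshold(char_img, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_NONE)
    contour = max(contours, key=cv2.contourArea).squeeze()
    # Represent the boundary as a complex signal and take its FFT.
    signal = contour[:, 0] + 1j * contour[:, 1]
    coeffs = np.fft.fft(signal)
    # Drop the DC term (translation) and normalize by the first
    # harmonic's magnitude (scale) for invariant descriptors.
    return np.abs(coeffs[1:n_coeffs + 1]) / np.abs(coeffs[1])

# Hypothetical usage on lists of character crops and labels:
# X = np.array([fourier_descriptors(img) for img in char_images])
# clf = SVC(kernel='rbf').fit(X, labels)
# predicted = clf.predict([fourier_descriptors(test_char)])
```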


Author(s):  
Yasir Babiker Hamdan ◽  
Sathish

Many applications of handwritten character recognition (HCR) still exist. Reading postal addresses written in the different languages of a union government such as India, and verifying bank-check amounts and signatures in automated banking systems, are among the important applications of HCR in developed countries. Optical character recognition (OCR) compares machine-read documents with handwritten documents read by a human, and is used to translate characters from various file types such as images and word-processing documents. The main aim of this research article is to provide solutions for various handwriting-recognition inputs, such as touch input from a mobile screen and picture files. The recognition approaches are implemented with methods chosen from artificial neural networks, statistical methods, and others, in order to address non-linearly separable problems. This article compares several approaches for recognizing handwritten characters from image documents. In particular, it compares a statistical support vector machine (SVM) classifier with template matching, structural pattern recognition, and graphical methods. The results show that the statistical SVM, configured with a machine-learning approach, gives good OCR performance: its recognition rate is higher than those of the other methods examined in this article. The proposed model was trained on a set containing letters and digits in various styles so as to learn with a high accuracy level, and achieved a test accuracy of 91% for recognizing characters from documents. Finally, several directions for future work are discussed.
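
As a minimal, self-contained illustration of the statistical SVM approach this article favors (not the authors' own pipeline), the sketch below trains an SVM on scikit-learn's bundled 8x8 digit images and reports test accuracy:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load grayscale digit images and flatten them into feature vectors.
digits = load_digits()
X = digits.images.reshape(len(digits.images), -1)
y = digits.target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# A statistical (RBF-kernel) SVM classifier for character recognition.
clf = SVC(kernel='rbf', gamma='scale')
clf.fit(X_train, y_train)

print(f"Test accuracy: {accuracy_score(y_test, clf.predict(X_test)):.2%}")
```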


Author(s):  
Fitri Arnia ◽  
Khairun Saddami ◽  
Khairul Munadi

Ancient manuscripts written in Malay-Arabic characters, known as "Jawi" characters, are mostly found in the Malay world. Nowadays, many of these manuscripts have been digitized. Unlike Roman letters, Jawi characters have no optical character recognition (OCR) software. This article proposes a new algorithm for Jawi character recognition based on Hu's moments as invariant features, which we call the tree root (TR) algorithm. The TR algorithm allows every Jawi character to have a unique combination of moments. Seven values of Hu's moments are calculated for all Jawi characters, which consist of 36 isolated, 27 initial, 27 middle, and 35 end characters, making a total of 125 characters. The TR algorithm was then applied to recognize these characters. To assess the TR algorithm, five characters that had been rotated by 90° and 180° and scaled by factors of 0.5 and 2 were used. Overall, the recognition rate of the TR algorithm was 90.4%: 113 out of 125 characters have a unique combination of moment values, while testing on rotated and scaled characters achieved an 82.14% recognition rate. The proposed method showed superior performance compared with Support Vector Machine and Euclidean distance classifiers.
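
Hu's seven moment invariants, on which the TR algorithm's features are based, are available directly in OpenCV. The sketch below is a minimal illustration of computing them for a binarized character image and checking their rotation invariance; it is not the TR algorithm itself, whose tree-structured matching is specific to the article:

```python
import cv2
import numpy as np

def hu_moments(char_img):
    """Seven Hu moment invariants of a binarized character image,
    log-scaled for numerical stability (a common convention)."""
    moments = cv2.moments(char_img, binaryImage=True)
    hu = cv2.HuMoments(moments).flatten()
    return -np.sign(hu) * np.log10(np.abs(hu) + 1e-30)

# Rotation invariance check on a hypothetical character crop `img`:
# rotated = cv2.rotate(img, cv2.ROTATE_90_CLOCKWISE)
# print(hu_moments(img))
# print(hu_moments(rotated))  # nearly identical values
```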


2019 ◽  
Vol 8 (2S8) ◽  
pp. 1033-1038

The number of visually impaired people appearing for various examinations is increasing every year, and many blind aspirants wish to enrich their knowledge through higher studies. Mathematics is one of the key subjects for those who want to pursue higher studies in the science stream. A number of advanced Braille techniques and OCR-to-speech conversion software packages are available to help the visually impaired community pursue their education, yet the number of visually impaired students admitted to higher education remains low. This is largely because most of the material exists on paper, in the form of books and documents. There is therefore a great need to convert information from the physical domain into the digital domain, which would help visually impaired people read advanced mathematics texts independently. Optical Character Recognition (OCR) systems for mathematics have received considerable attention in recent years due to the tremendous need for the digitization of printed documents. The existing literature reveals that most works concentrate on recognizing handwritten mathematical symbols, while some revolve around complex algorithms. This paper proposes a simple yet efficient approach to develop an OCR system for mathematics and its conversion to speech. For mathematical symbol recognition, a Skin and Bone algorithm is proposed, which has proved its efficiency on a variety of data sets. The proposed methodology has been tested on 50 equations comprising various symbols such as integral, differential, square, and square root, and currently achieves a recognition rate of 92%.
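
The Skin and Bone symbol-recognition algorithm is specific to this paper, but the surrounding OCR-to-speech pipeline can be sketched generically. The snippet below is an illustrative, assumed pipeline using the pytesseract and pyttsx3 libraries; it stands in for, and does not reproduce, the proposed method:

```python
import cv2
import pytesseract
import pyttsx3

def equation_image_to_speech(image_path):
    """Generic OCR-to-speech pipeline: binarize, recognize, speak.
    A real math OCR would need symbol-level recognition for
    integrals, roots, etc., rather than plain text OCR."""
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    _, binary = cv2.threshold(img, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    text = pytesseract.image_to_string(binary)
    engine = pyttsx3.init()
    engine.say(text)
    engine.runAndWait()
    return text
```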


2020 ◽  
Vol 2020 (1) ◽  
pp. 78-81
Author(s):  
Simone Zini ◽  
Simone Bianco ◽  
Raimondo Schettini

Rain removal from pictures taken under bad weather conditions is a challenging task that aims to improve the overall quality and visibility of a scene. The enhanced images usually constitute the input for subsequent Computer Vision tasks such as detection and classification. In this paper, we present a Convolutional Neural Network, based on the Pix2Pix model, for rain streak removal from images, with specific interest in evaluating the results of the processing with respect to the Optical Character Recognition (OCR) task. In particular, we present a way to generate a rainy version of the Street View Text Dataset (R-SVTD) for "text detection and recognition" evaluation in bad weather conditions. Experimental results on this dataset show that our model outperforms the state of the art in terms of two commonly used image quality metrics, and that it is capable of improving the performance of an OCR model in detecting and recognising text in the wild.
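
A minimal sketch of the Pix2Pix idea underlying the paper, an image-to-image generator trained against a discriminator with a paired L1 loss, is shown below in PyTorch. The architecture and names are illustrative assumptions, not the authors' network:

```python
import torch
import torch.nn as nn

class TinyGenerator(nn.Module):
    """Toy encoder-decoder generator in the spirit of Pix2Pix:
    maps a rainy RGB image to a derained RGB image."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1),    # H -> H/2
            nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, stride=2, padding=1),  # H/2 -> H/4
            nn.BatchNorm2d(128),
            nn.LeakyReLU(0.2),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1),
            nn.Tanh(),  # output in [-1, 1], matching normalized images
        )

    def forward(self, rainy):
        return self.decoder(self.encoder(rainy))

# Schematically, Pix2Pix trains the generator with an adversarial
# loss plus a weighted paired L1 loss against the clean image:
# loss_G = bce(disc(fake, rainy), real_labels) + 100 * l1(fake, clean)
```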


Author(s):  
Ritam Guha ◽  
Manosij Ghosh ◽  
Pawan Kumar Singh ◽  
Ram Sarkar ◽  
Mita Nasipuri

In any multi-script environment, handwritten script classification is an unavoidable pre-requisite before the document images are fed to their respective Optical Character Recognition (OCR) engines. Over the years, this complex pattern classification problem has been tackled by researchers proposing various feature vectors, mostly of large dimension, thereby increasing the computational complexity of the whole classification model. Feature Selection (FS) can serve as an intermediate step to reduce the size of the feature vectors by restricting them to only the essential and relevant features. In the present work, we address this issue by introducing a new FS algorithm, called Hybrid Swarm and Gravitation-based FS (HSGFS). This algorithm is applied to three feature vectors introduced recently in the literature: Distance-Hough Transform (DHT), Histogram of Oriented Gradients (HOG), and Modified log-Gabor (MLG) filter transform. Three state-of-the-art classifiers, namely Multi-Layer Perceptron (MLP), K-Nearest Neighbour (KNN), and Support Vector Machine (SVM), are used to evaluate the optimal subset of features generated by the proposed FS model. Handwritten datasets at the block, text-line, and word level, consisting of 12 officially recognized Indic scripts, are prepared for experimentation. An average improvement of 2-5% in classification accuracy is achieved by utilizing only about 75-80% of the original feature vectors on all three datasets. The proposed method also performs better than some popularly used FS models. The code used to implement HSGFS can be found at the following GitHub link: https://github.com/Ritam-Guha/HSGFS.
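
HSGFS itself is a hybrid swarm/gravitational search, but the wrapper evaluation it relies on, scoring a candidate binary feature mask by classifier accuracy, is easy to sketch. The snippet below is an illustrative fitness function using a KNN classifier; the names and the fitness weighting are our assumptions, not the paper's exact formulation:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

def fitness(mask, X, y, alpha=0.99):
    """Score a binary feature mask: reward cross-validated accuracy,
    lightly penalize the fraction of features kept."""
    if not mask.any():
        return 0.0
    acc = cross_val_score(KNeighborsClassifier(n_neighbors=5),
                          X[:, mask.astype(bool)], y, cv=3).mean()
    return alpha * acc + (1 - alpha) * (1 - mask.mean())

# A metaheuristic like HSGFS would evolve a population of such masks,
# keeping the one with the highest fitness, e.g.:
# mask = (np.random.rand(X.shape[1]) > 0.5).astype(int)
# print(fitness(mask, X, y))
```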


Author(s):  
Htwe Pa Pa Win ◽  
Phyo Thu Thu Khine ◽  
Khin Nwe Ni Tun

This paper proposes a new feature extraction method for off-line recognition of Myanmar printed documents. One of the most important factors in achieving high recognition performance in an Optical Character Recognition (OCR) system is the selection of the feature extraction method. Existing OCR systems use various feature extraction methods because of the diversity of the scripts' natures. One major contribution of the work in this paper is the design of logically rigorous coding-based features. To show the effectiveness of the proposed method, this paper assumes the documents have been successfully segmented into characters, and features are extracted from these isolated Myanmar characters using structural analysis of the Myanmar script. Experiments were carried out using the Support Vector Machine (SVM) classifier, and the results are compared with a previously proposed feature extraction method.


Author(s):  
Teddy Surya Gunawan ◽  
Abdul Mutholib ◽  
Mira Kartiwi

Automatic Number Plate Recognition (ANPR) is an intelligent system with the capability to recognize the characters on a vehicle number plate. Previous research implemented ANPR systems on personal computers (PCs) with high-resolution cameras and high computational capability. On the other hand, little research has been conducted on the design and implementation of ANPR on smartphone platforms, which have limited camera resolution and processing speed. In this paper, various steps to optimize ANPR, including pre-processing, segmentation, and optical character recognition (OCR) using an artificial neural network (ANN) and template matching, are described. The proposed ANPR algorithm is based on the Tesseract and Leptonica libraries. For comparison purposes, the template-matching-based OCR is compared to the ANN-based OCR. Performance of the proposed algorithm was evaluated on a database of Malaysian number plate images captured by a smartphone camera. Results showed that the accuracy and processing time of the proposed algorithm using template matching were 97.5% and 1.13 seconds, respectively, whereas the traditional algorithm using template matching obtained only an 83.7% recognition rate with a 0.98-second processing time. This shows that the proposed ANPR algorithm improves the recognition rate with negligible additional processing time.
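
Since the proposed pipeline builds on the Tesseract library, its OCR stage can be sketched with the pytesseract wrapper. The snippet below is a minimal assumed illustration; the preprocessing steps and the character whitelist are our choices, not the paper's:

```python
import cv2
import pytesseract

def read_plate(plate_img):
    """Minimal plate OCR: grayscale, binarize, then run Tesseract
    restricted to the characters expected on number plates."""
    gray = cv2.cvtColor(plate_img, cv2.COLOR_BGR2GRAY)
    _, binary = cv2.threshold(gray, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    config = ("--psm 7 "  # treat the crop as a single text line
              "-c tessedit_char_whitelist="
              "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789")
    return pytesseract.image_to_string(binary, config=config).strip()
```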


Author(s):  
Mohammed Erritali ◽  
Youssef Chouni ◽  
Youssef Ouadid

The main difficulty in developing a successful optical character recognition (OCR) system lies in the confusion between characters. In the case of Amazigh writing (the Tifinagh alphabet), some characters are similar up to rotation or scale. Most researchers have attempted to solve this problem by combining multiple descriptors and/or classifiers, which increases the recognition rate but at the expense of processing time, which becomes prohibitive. Thus, reducing character confusion and recognition time is the major challenge of OCR systems. In this chapter, the authors present an off-line OCR system for Tifinagh characters.

