Connected component based approach for text extraction from color image

Text detection and segmentation systems are an important tool for document analysis because they support many content-based image analysis tasks. This paper proposes a connected-component technique for text extraction and character segmentation that uses maximally stable extremal regions (MSERs) for text line formation, followed by connected components to determine the separate characters. The system uses a cluster size of five, selected by experimental evaluation, for identifying characters. The Sobel edge detector is used because it reduces execution time while maintaining the quality of the results. The algorithm is tested on a set of JPEG, PNG and BMP images with varying features such as font size, style, colour, background colour and text variation. In addition, the CPU time taken by the algorithm with three different edge detectors, namely Prewitt, Sobel and Canny, is observed. Text identification using MSER gave very good results, whereas character segmentation gave on average 94.572% accuracy for the test cases considered in this study.
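
The character-separation step described in this abstract can be illustrated with a minimal sketch. It assumes the text line has already been detected and binarized (the MSER and Sobel stages are not reproduced), and it assumes 8-connectivity and left-to-right ordering; the function name `component_boxes` and the toy input are hypothetical and are not taken from the paper.

```python
from collections import deque

def component_boxes(binary):
    """Return bounding boxes (top, left, bottom, right) of 8-connected
    foreground blobs in a binary image given as a list of rows."""
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if not binary[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill over one connected component.
            seen[r][c] = True
            queue = deque([(r, c)])
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and binary[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    # Assumed reading order: sort candidate characters by left edge.
    return sorted(boxes, key=lambda b: b[1])

# Toy binarized text line containing three separate glyph-like blobs.
line = [
    [1, 1, 0, 0, 1, 0, 0, 1, 1],
    [1, 0, 0, 0, 1, 0, 0, 1, 0],
    [1, 1, 0, 0, 1, 0, 0, 1, 1],
]
boxes = component_boxes(line)
```

Each returned box is one candidate character. A real pipeline would also need to filter noise blobs and merge multi-part glyphs such as "i", which this sketch does not attempt.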

Extraction of Text From Image

International Journal of Advanced Research in Science, Communication and Technology ◽

10.48175/ijarsct-1391 ◽

2021 ◽

pp. 314-317

Author(s):

Akanksha Mate ◽

Megha Gurav ◽

Kajal Babar ◽

Gauri Raskar ◽

Prof. Prakash Kshirsagar

Keyword(s):

Mathematical Morphology ◽

Digital Image ◽

Image Binarization ◽

Connected Component ◽

Text Extraction ◽

Component Method ◽

Matlab Code ◽

Message Structure ◽

Complex Foundation ◽

Edge Based

Image text is the textual data embedded or written in images of various forms. It can be found in captured images, scanned documents, magazines, newspapers, posters and so on. Such image text is widely available nowadays and is vital in representing, describing and transferring information, which helps people in communication, problem solving, accessibility, creation of new kinds of jobs, cost effectiveness, productivity, globalization, the social gap and so on. The information in these image documents would offer higher efficiency and easier access if it were converted to text form. The process by which image text is converted into plain text is text extraction. Text extraction is useful for retrieving, searching, editing, documenting, archiving or reporting image text. However, variation in the text due to differences in size, orientation, style and alignment, text embedded in complex coloured document images, degraded document images and low-quality images, as well as low image contrast and complex backgrounds, make text extraction extremely difficult and challenging. Various methods such as the Connected Component Method, Mathematical Morphology Method, Edge Based Method and Texture Based Method have been used previously, but all have their own limitations when measured by parameters such as precision, recall and F-score. This paper discusses text extraction from image documents using a combination of two powerful methods, the Connected Component Method and the Edge Based Method, to improve the performance and accuracy of text extraction. The implementation is done in integrated MATLAB code with the MATLAB/Simulink tool, and the proposed system is tested on the Digital Image Binarization Competition (DIBCO) 2017 dataset. Finally, the extracted and recognized text is converted to speech for use by visually impaired people.

Caption Text Extraction from Color Image Based on Differential Operation and Morphological Processing

Advances in Computer and Computational Sciences - Advances in Intelligent Systems and Computing ◽

10.1007/978-981-10-3773-3_48 ◽

2017 ◽

pp. 495-502

Author(s):

Li-qin Ji

Keyword(s):

Color Image ◽

Morphological Processing ◽

Text Extraction ◽

Differential Operation ◽

Caption Text

Color image segmentation by a genetic algorithm based clustering and Connected Component Labeling

2012 24th International Conference on Microelectronics (ICM) ◽

10.1109/icm.2012.6471432 ◽

2012 ◽

Cited By ~ 2

Author(s):

Fatima Zohra Bellala Belahbib ◽

Feryel Souami

Keyword(s):

Genetic Algorithm ◽

Image Segmentation ◽

Color Image ◽

Color Image Segmentation ◽

Connected Component ◽

Connected Component Labeling

Text Extraction from Scene Images through Color Image Segmentation and Statistical Distributions

International Journal of Computer Applications ◽

10.5120/15907-5088 ◽

2014 ◽

Vol 91 (9) ◽

pp. 5-8

Author(s):

Ranjit Ghoshal ◽

Bibhas Chandra Dhara

Keyword(s):

Image Segmentation ◽

Color Image ◽

Color Image Segmentation ◽

Statistical Distributions ◽

Text Extraction

A New Efficient Approach to Detect Skin in Color Image Using Bayesian Classifier and Connected Component Algorithm

Mathematical Problems in Engineering ◽

10.1155/2018/5754604 ◽

2018 ◽

Vol 2018 ◽

pp. 1-10 ◽

Cited By ~ 2

Author(s):

Thao Nguyen-Trang

Keyword(s):

Posterior Probability ◽

Color Image ◽

False Positive Rate ◽

Classification Performance ◽

Bayesian Classifier ◽

Connected Components ◽

Skin Detection ◽

Connected Component ◽

Positive Rate ◽

Probability Threshold

Skin detection is an interesting problem in image processing and an important preprocessing step for further techniques such as face detection and objectionable image detection. However, its performance has not been high because of the high degree of overlap between “skin” and “nonskin” pixels. This paper proposes a new approach to improve skin detection performance using the Bayesian classifier and the connected component algorithm. Specifically, the Bayesian classifier is used to identify “true skin” pixels using the first posterior probability threshold, which is close to 1, and to identify “skin candidate” pixels using the second posterior probability threshold. Subsequently, the connected component algorithm is used to find all the connected components containing the “skin candidate” pixels. Because a skin pixel often connects with other skin pixels in an image, all pixels in a connected component are classified as “skin” if there is at least one “true skin” pixel in that connected component. This means that “nonskin” pixels whose color is similar to skin are classified as “nonskin” when their posterior probabilities are lower than the first posterior probability threshold and they do not connect with any “true skin” pixel. This idea helps to improve skin classification performance, especially the false positive rate.
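
The two-threshold rule in this abstract can be sketched as follows. The sketch assumes the Bayesian classifier has already produced a per-pixel posterior probability map, and it assumes 8-connectivity; the threshold values `t_true=0.99` and `t_candidate=0.5`, the function name `classify_skin` and the toy input are hypothetical and are not taken from the paper.

```python
from collections import deque

def classify_skin(posterior, t_true=0.99, t_candidate=0.5):
    """Mark every pixel of a candidate component as skin when the
    component contains at least one "true skin" pixel."""
    rows, cols = len(posterior), len(posterior[0])
    skin = [[False] * cols for _ in range(rows)]
    seen = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or posterior[r][c] < t_candidate:
                continue
            # Collect one connected component of "skin candidate" pixels.
            seen[r][c] = True
            queue = deque([(r, c)])
            component = []
            has_true_skin = False
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                if posterior[y][x] >= t_true:
                    has_true_skin = True
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and not seen[ny][nx]
                                and posterior[ny][nx] >= t_candidate):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            # Accept the whole component only if it holds a true-skin seed.
            if has_true_skin:
                for y, x in component:
                    skin[y][x] = True
    return skin

# Toy posterior map: the left blob has a seed (0.995), the right blob does not.
posterior = [
    [0.995, 0.7, 0.1, 0.6],
    [0.6,   0.2, 0.1, 0.7],
    [0.1,   0.1, 0.1, 0.1],
]
mask = classify_skin(posterior)
```

In this toy input the right-hand blob passes the candidate threshold but contains no true-skin pixel, so it is rejected. This is the kind of skin-coloured false positive the abstract says the method is meant to suppress.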