Connected component based approach for text extraction from color image

Author(s):  
Kamrul Hasan Talukder ◽  
Tania Mallick
2000 ◽  
Author(s):  
WenPing Liu ◽  
Hui Su ◽  
Chang Y. Chi

2016 ◽  
Vol 7 (1) ◽  
pp. 41-57 ◽  
Author(s):  
Nitigya Sambyal ◽  
Pawanesh Abrol

A text detection and segmentation system is an important tool for document analysis, as it supports many content-based image analysis tasks. This paper proposes a connected component technique for text extraction and character segmentation that uses maximally stable extremal regions (MSERs) for text line formation, followed by connected components to determine the separate characters. The system uses a cluster size of five, selected by experimental evaluation, for identifying characters. The Sobel edge detector is used because it reduces execution time while maintaining the quality of the results. The algorithm is tested on a set of JPEG, PNG and BMP images with varying font size, style, colour, background colour and text variation. In addition, the CPU time of the algorithm is measured with three different edge detectors: Prewitt, Sobel and Canny. Text identification using MSER gave very good results, whereas character segmentation achieved an average accuracy of 94.572% over the test cases considered in this study.
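The character-separation step described above can be illustrated with a minimal sketch. It assumes the text line has already been located and binarized (the MSER stage, the Sobel stage and the cluster size of five are not reproduced here), and the function name `character_boxes`, the `min_pixels` filter and the choice of 8-connectivity are hypothetical, not taken from the paper.

```python
from collections import deque


def character_boxes(binary, min_pixels=1):
    """Return bounding boxes (left, top, right, bottom) of connected
    foreground regions in a binarized text line, sorted left to right.

    `binary` is a list of rows holding 0 (background) or 1 (ink).
    8-connectivity and `min_pixels` are illustrative assumptions.
    """
    rows = len(binary)
    cols = len(binary[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if not binary[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill over one connected component.
            queue = deque([(r, c)])
            seen[r][c] = True
            top = bottom = r
            left = right = c
            count = 0
            while queue:
                y, x = queue.popleft()
                count += 1
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if ((dy or dx) and 0 <= ny < rows and 0 <= nx < cols
                                and binary[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            # Discard specks smaller than the assumed noise threshold.
            if count >= min_pixels:
                boxes.append((left, top, right, bottom))
    return sorted(boxes)
```

Each returned box can then be cropped as one character candidate. Characters made of several disconnected strokes (such as "i") would come out as separate boxes in this simplified form.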


Author(s):  
Akanksha Mate ◽  
Megha Gurav ◽  
Kajal Babar ◽  
Gauri Raskar ◽  
Prof. Prakash Kshirsagar

Image text is the textual content embedded or written in images of various forms. It can be found in captured images, scanned documents, magazines, newspapers, posters and so on. Such image text is widely available today and is vital for representing, describing and transferring information, which helps people in communication, problem solving, accessibility, creation of new kinds of jobs, cost effectiveness, productivity, globalization, bridging the social gap and so forth. The information in these image documents would be more efficient to use and easier to access if it were converted to text form. The process by which image text is converted into plain text is text extraction. Text extraction is useful for retrieving, searching, editing, documenting, archiving or reporting image text. However, variation in these texts due to differences in size, orientation, style and alignment, together with text embedded in complex coloured document images, degraded document images, low-quality images, low image contrast and complex backgrounds, makes text extraction extremely difficult and challenging. Various methods such as the Connected Component Method, Mathematical Morphology Method, Edge Based Method and Texture Based Method have been used previously, but each has its own limitations when measured by parameters such as precision, recall and F-score. This paper discusses text extraction from image documents using a combination of two powerful methods, the Connected Component and Edge Based Methods, to improve the performance and accuracy of text extraction. The implementation is done with integrated MATLAB code and the MATLAB/Simulink tool, and the proposed system is tested on the Document Image Binarization Competition (DIBCO) 2017 dataset. Finally, the extracted and recognized text is converted to speech for use by visually impaired people.
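The abstract does not say which edge operator its edge-based stage uses, and the authors' implementation is in MATLAB. Purely as an illustration of what an edge-based stage produces, the sketch below thresholds a Sobel gradient magnitude in plain Python. The function name `sobel_edges`, the threshold value and the choice of Sobel are all assumptions.

```python
def sobel_edges(gray, threshold=100):
    """Return a binary edge map (1 = edge) for a grayscale image given as a
    list of rows of intensities. Border pixels are left as 0.

    The Sobel operator and the threshold are illustrative assumptions.
    """
    rows = len(gray)
    cols = len(gray[0]) if rows else 0
    kx = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))   # horizontal gradient kernel
    ky = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))   # vertical gradient kernel
    edges = [[0] * cols for _ in range(rows)]
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            gx = gy = 0
            for j in range(3):
                for i in range(3):
                    p = gray[y + j - 1][x + i - 1]
                    gx += kx[j][i] * p
                    gy += ky[j][i] * p
            # Mark the pixel when the gradient magnitude is large enough.
            if (gx * gx + gy * gy) ** 0.5 >= threshold:
                edges[y][x] = 1
    return edges
```

In a combined pipeline, an edge map like this could be passed to a connected component step so that only regions with strong edges are kept as text candidates. How the paper actually couples the two methods is not stated in the abstract.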


2018 ◽  
Vol 2018 ◽  
pp. 1-10 ◽  
Author(s):  
Thao Nguyen-Trang

Skin detection is an interesting problem in image processing and an important preprocessing step for further techniques such as face detection and objectionable image detection. However, its performance has not been high because of the high degree of overlap between “skin” and “nonskin” pixels. This paper proposes a new approach to improve skin detection performance using the Bayesian classifier and a connected component algorithm. Specifically, the Bayesian classifier is used to identify “true skin” pixels using a first posterior probability threshold, which is close to 1, and to identify “skin candidate” pixels using a second posterior probability threshold. Subsequently, the connected component algorithm is used to find all the connected components containing the “skin candidate” pixels. Because a skin pixel often connects with other skin pixels in an image, all pixels in a connected component are classified as “skin” if there is at least one “true skin” pixel in that connected component. This means that “nonskin” pixels whose color is similar to skin are classified as “nonskin” when their posterior probabilities are lower than the first posterior probability threshold and they do not connect with any “true skin” pixel. This idea can help improve skin classification performance, especially the false positive rate.
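The two-threshold rule described above can be sketched as follows, assuming the per-pixel posterior probabilities have already been computed by a Bayesian classifier. The function name `classify_skin`, the threshold values 0.95 and 0.5, and the use of 4-connectivity are hypothetical choices for illustration; the abstract states only that the first threshold is close to 1.

```python
from collections import deque


def classify_skin(posterior, t_true=0.95, t_candidate=0.5):
    """Label pixels as skin using two posterior thresholds.

    `posterior` is a list of rows of P(skin | colour) values. A connected
    component of candidate pixels (posterior >= t_candidate) is labelled
    skin only if it contains at least one "true skin" pixel
    (posterior >= t_true). Threshold values are illustrative assumptions.
    """
    rows = len(posterior)
    cols = len(posterior[0]) if rows else 0
    skin = [[False] * cols for _ in range(rows)]
    visited = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if visited[r][c] or posterior[r][c] < t_candidate:
                continue
            # Gather one connected component of candidate pixels.
            component = []
            has_true_skin = False
            queue = deque([(r, c)])
            visited[r][c] = True
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                if posterior[y][x] >= t_true:
                    has_true_skin = True
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not visited[ny][nx]
                            and posterior[ny][nx] >= t_candidate):
                        visited[ny][nx] = True
                        queue.append((ny, nx))
            # Promote the whole component only when it holds a true skin pixel.
            if has_true_skin:
                for y, x in component:
                    skin[y][x] = True
    return skin
```

A skin-coloured region with no high-confidence pixel stays "nonskin" under this rule, which is how the approach is meant to lower the false positive rate.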

