ORDER STATISTIC FILTER (OSF): A NOVEL APPROACH TO DOCUMENT ANALYSIS
Page segmentation is one of the basic and most important research topics in document analysis. Page segmentation methods fall into two major kinds: hierarchical and non-hierarchical. Most traditional techniques, such as the top-down and bottom-up approaches, are hierarchical. Although these two approaches are still in use, they are not effective for documents with high geometric complexity, and splitting a document requires iterative operations, which is time-consuming. A non-hierarchical method called the modified fractal signature (MFS) has been proposed in recent years. It overcomes these weaknesses; however, it requires computing the modified fractal signature, which makes the theory very complex. In this thesis, we present a new page segmentation approach, the Median Order Statistic Filter (MedOSF) and Maximum Order Statistic Filter (MaxOSF) approach, which is more direct and much simpler. We use the MedOSF to remove salt-and-pepper noise from the document and the MaxOSF to perform the page segmentation. In practice, the two filters not only adaptively process documents with high geometric complexity but also save considerable computing time.
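The MedOSF/MaxOSF pipeline described above can be sketched as follows. This is a minimal illustration under our own assumptions, not the thesis's implementation: the page is taken to be a binary image (1 = ink, 0 = background), MedOSF is modelled as a 3x3 median filter, MaxOSF as a 5x5 maximum filter that merges nearby characters into solid regions, and the final blocks are read off with 4-connected component labelling, a step the abstract does not specify. The window sizes, the border handling, and all helper names (`order_statistic_filter`, `median_filter`, `max_filter`, `label_blocks`, `segment_page`) are hypothetical.

```python
from collections import deque
from typing import Callable, List, Tuple

Image = List[List[int]]  # binary page: 1 = ink (black), 0 = background (white)


def order_statistic_filter(img: Image, size: int, pick: Callable[[List[int]], int]) -> Image:
    """Slide a size x size window over img; each output pixel is pick(sorted window).

    The window is clipped at the image border, so edge windows hold fewer values.
    """
    h, w, r = len(img), len(img[0]), size // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = sorted(
                img[yy][xx]
                for yy in range(max(0, y - r), min(h, y + r + 1))
                for xx in range(max(0, x - r), min(w, x + r + 1))
            )
            out[y][x] = pick(window)
    return out


def median_filter(img: Image, size: int = 3) -> Image:
    # Hypothetical MedOSF: lower median, so tied border windows lean toward background.
    return order_statistic_filter(img, size, lambda v: v[(len(v) - 1) // 2])


def max_filter(img: Image, size: int = 5) -> Image:
    # Hypothetical MaxOSF: any ink pixel in the window turns the output pixel to ink.
    return order_statistic_filter(img, size, lambda v: v[-1])


def label_blocks(img: Image) -> List[Tuple[int, int, int, int]]:
    """Return bounding boxes (top, left, bottom, right) of 4-connected ink regions."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if img[y][x] and not seen[y][x]:
                seen[y][x] = True
                queue = deque([(y, x)])
                top, left, bottom, right = y, x, y, x
                while queue:  # breadth-first flood fill of one region
                    cy, cx = queue.popleft()
                    top, bottom = min(top, cy), max(bottom, cy)
                    left, right = min(left, cx), max(right, cx)
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h and 0 <= nx < w and img[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                boxes.append((top, left, bottom, right))
    return boxes


def segment_page(img: Image, median_size: int = 3, max_size: int = 5):
    clean = median_filter(img, median_size)  # step 1: remove salt-and-pepper noise
    merged = max_filter(clean, max_size)     # step 2: merge characters into blocks
    return label_blocks(merged)              # step 3: read off block bounding boxes


# Toy 10 x 20 page: two text blocks, each made of two 3x3 "words", plus one speck.
page = [[0] * 20 for _ in range(10)]
for top, left in [(1, 1), (1, 5), (6, 13), (6, 17)]:
    for y in range(top, top + 3):
        for x in range(left, left + 3):
            page[y][x] = 1
page[8][3] = 1  # isolated salt-and-pepper noise pixel

print(segment_page(page))  # → [(0, 0, 5, 9), (4, 11, 9, 19)]
```

Because both filters are the same sliding-window operation that only differs in which order statistic it keeps (the middle value or the largest), one routine serves both. In this sketch the MaxOSF window size decides how wide a gap between characters gets bridged, so it would need tuning to a document's resolution and layout.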