Text and metadata extraction from scanned Arabic documents using support vector machines

Text information in scanned documents becomes accessible only when it is extracted and interpreted by a text recognizer. To work successfully, a recognizer must have detailed location information about the regions of the document images that it is asked to analyse. It needs to focus on page regions that contain text and skip non-text regions such as illustrations or photographs. However, text recognizers do not work as logical analyzers. Logical layout analysis automatically determines the function of a document text region, that is, it labels each region as a title, paragraph, caption, and so on, and is thus an essential part of a document understanding system. In the past, rule-based algorithms have been used to conduct logical layout analysis on data sets of limited size. We instead focus on supervised learning methods for logical layout analysis. We describe LABA, a system based on multiple support vector machines that performs logical Layout Analysis of scanned Book pages in Arabic. The system detects the function of a text region by analysing various image features and applying a voting mechanism. For a baseline comparison, we implemented an older but state-of-the-art neural network method. We evaluated LABA on a data set of scanned pages from illustrated Arabic books and obtained high recall and precision values. We also found that the F-measure of LABA is higher than that of the state-of-the-art method for five of the six tested classes.
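The abstract says only that the outputs of several support vector machines are combined by "a voting mechanism". As a minimal sketch of one such scheme, the Python snippet below uses plurality voting over hypothetical per-classifier labels. The function name, the example labels, and the tie-breaking rule are assumptions for illustration and are not taken from the paper.

```python
from collections import Counter


def plurality_vote(predictions):
    """Return the label chosen by the largest number of classifiers.

    Ties go to the label that appears first in `predictions`
    (an assumption; the abstract does not describe tie handling).
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


# Hypothetical outputs of three classifiers for one text region.
region_votes = ["title", "paragraph", "title"]
region_label = plurality_vote(region_votes)
```

Here two of the three hypothetical classifiers agree on `title`, so that label is assigned to the region.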


LABA: Logical Layout Analysis of Book Page Images in Arabic Using Multiple Support Vector Machines

2018 IEEE 2nd International Workshop on Arabic and Derived Script Analysis and Recognition (ASAR), 2018. DOI: 10.1109/asar.2018.8480095

Author(s): Wenda Qin, Randa Elanwar, Margrit Betke

Keyword(s): Support Vector Machines, Support Vector, Layout Analysis, Vector Machines, Multiple Support, Multiple Support Vector Machines


Generalized Support Vector Machines (GSVMs) model for real-world time series forecasting

2021. DOI: 10.21203/rs.3.rs-180407/v1

Author(s): Mehrnaz Ahmadi, Mehdi Khashei

Keyword(s): Support Vector Machine, Support Vector Machines, Real World, Support Vector, Data Sets, Fuzzy Support Vector Machine, Data Set, Proposed Model, Vector Machines, Forecasting Performance

Support vector machines (SVMs) are among the most popular and widely used modeling approaches. Various kinds of SVM models have been developed in the prediction and classification literature to serve different purposes. Crisp and fuzzy support vector machines are well-known branches of these approaches that are frequently applied to certain and uncertain modeling, respectively. However, each of these models can be used efficiently only in its own domain and cannot yield accurate results in the opposite situation. Real-world systems and data sets, meanwhile, often contain both certain and uncertain patterns that are intricately mixed together and need to be modeled simultaneously. In this paper, a generalized support vector machine (GSVM) is proposed that can simultaneously benefit from the unique advantages of the certain and uncertain versions of traditional support vector machines in their own specialized categories. In the proposed model, the underlying data set is first categorized into two classes of certain and uncertain patterns. Then, certain patterns are modeled by a support vector machine, and uncertain patterns are modeled by a fuzzy support vector machine. After that, the function of the relationship, as well as the relative importance of each component, is estimated by another support vector machine, and the final forecasts of the proposed model are then calculated. Empirical results of wind speed forecasting indicate that the proposed method not only achieves more accurate results than SVMs and fuzzy support vector machines (FSVMs) but also yields better forecasting performance than traditional fuzzy and nonfuzzy single models and traditional preprocessing-based hybrid models of SVMs.


Deep Features for Training Support Vector Machines

Journal of Imaging, 2021, Vol 7 (9), pp. 177. DOI: 10.3390/jimaging7090177

Author(s): Loris Nanni, Stefano Ghidoni, Sheryl Brahnam

Keyword(s): Computer Vision, Support Vector Machines, Vision System, Image Data, Support Vector, Data Sets, Data Set, Training Support, Reduction Techniques, Vector Machines

Features play a crucial role in computer vision. Initially designed to detect salient elements by means of handcrafted algorithms, features now are often learned using different layers in convolutional neural networks (CNNs). This paper develops a generic computer vision system based on features extracted from trained CNNs. Multiple learned features are combined into a single structure to work on different image classification tasks. The proposed system was derived by testing several approaches for extracting features from the inner layers of CNNs and using them as inputs to support vector machines that are then combined by sum rule. Several dimensionality reduction techniques were tested for reducing the high dimensionality of the inner layers so that they can work with SVMs. The empirically derived generic vision system based on applying a discrete cosine transform (DCT) separately to each channel is shown to significantly boost the performance of standard CNNs across a large and diverse collection of image data sets. In addition, an ensemble of different topologies taking the same DCT approach and combined with global mean thresholding pooling obtained state-of-the-art results on a benchmark image virus data set.
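Two of the building blocks named in this abstract, a per-channel discrete cosine transform for dimensionality reduction and sum-rule fusion of classifier scores, can be sketched in plain Python. This is an illustrative sketch only: each channel is treated as a flat 1-D sequence, the DCT is the unnormalised DCT-II, and keeping the first `keep` coefficients is an assumed reduction strategy. The function names and toy numbers are hypothetical and do not reproduce the authors' pipeline.

```python
import math


def dct2(signal):
    """Unnormalised DCT-II of a 1-D sequence."""
    n = len(signal)
    return [
        sum(x * math.cos(math.pi / n * (i + 0.5) * k) for i, x in enumerate(signal))
        for k in range(n)
    ]


def reduce_channels(feature_map, keep):
    """Apply the DCT to each channel separately and keep the first
    `keep` coefficients of every channel (assumed reduction strategy)."""
    reduced = []
    for channel in feature_map:
        reduced.extend(dct2(channel)[:keep])
    return reduced


def sum_rule(score_lists):
    """Add the per-class scores of several classifiers and return the
    index of the class with the largest total."""
    totals = [sum(scores) for scores in zip(*score_lists)]
    return max(range(len(totals)), key=totals.__getitem__)


# Toy feature map with two channels of four activations each.
features = reduce_channels([[1.0, 1.0, 1.0, 1.0], [2.0, 2.0, 2.0, 2.0]], keep=2)

# Hypothetical per-class scores from three classifiers.
winner = sum_rule([[0.2, 0.8], [0.6, 0.4], [0.1, 0.9]])
```

In a full system the reduced vectors would be the inputs to the SVMs, and `sum_rule` would fuse the scores those SVMs produce.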


Constructing Multiple Support Vector Machines Ensemble Based on Fuzzy Integral and Rough Reducts

2007 2nd IEEE Conference on Industrial Electronics and Applications, 2007. DOI: 10.1109/iciea.2007.4318607

Cited By ~ 6

Author(s): Yi-Zhuo Zhang, Chun-Mei Liu, Liang-Kuan Zhu, Qing-Lei Hu

Keyword(s): Support Vector Machines, Support Vector, Fuzzy Integral, Vector Machines, Multiple Support, Multiple Support Vector Machines


Contact Lens Classification by Using Segmented Lens Boundary Features

Indonesian Journal of Electrical Engineering and Computer Science, 2018, Vol 11 (3), pp. 1129. DOI: 10.11591/ijeecs.v11.i3.pp1129-1135

Author(s): Nur Ariffin Mohd Zin, Hishammuddin Asmuni, Haza Nuzly Abdul Hamed, Razib M. Othman, Shahreen Kasim, ...

Keyword(s): Support Vector Machines, Contact Lens, State Of The Art, Classification Method, Support Vector, Local Descriptors, Iris Image, Vector Machines, False Reject Rate, Better Than

Recent studies have shown that wearing soft lenses may degrade performance by increasing the false reject rate. However, detecting the presence of a soft lens is a non-trivial task because its texture is almost indiscernible. In this work, we propose a classification method to identify the presence of a soft lens in an iris image. The method starts by segmenting the lens boundary on top of the sclera region. Features are then extracted from the segmented boundary using local descriptors, and these features are used to train Support Vector Machines that perform the classification. The method was tested on the Notre Dame Cosmetic Contact Lens 2013 database. Experiments showed that the proposed method performed better than state-of-the-art methods.


Multiple Support Vector Machines for Binary Text Classification Based on Sliding Window Technique

Communications in Computer and Information Science - Data Mining, 2019, pp. 17-29. DOI: 10.1007/978-981-13-6661-1_2

Author(s): Aisha Rashed Albqmi, Yuefeng Li, Yue Xu

Keyword(s): Support Vector Machines, Text Classification, Sliding Window, Support Vector, Window Technique, Vector Machines, Multiple Support, Multiple Support Vector Machines


Linear Support Vector Machines for Prediction of Student Performance in School-Based Education

Mathematical Problems in Engineering, 2020, Vol 2020, pp. 1-7. DOI: 10.1155/2020/4761468

Author(s): Nalindren Naicker, Timothy Adeliyi, Jeanette Wing

Keyword(s): Machine Learning, Support Vector Machines, Student Performance, State Of The Art, Learning Algorithms, The State, Machine Learning Algorithms, Superior Performance, Support Vector, Vector Machines

Educational Data Mining (EDM) is a rich research field in computer science. Tools and techniques in EDM are useful for predicting student performance, which gives practitioners insights for developing intervention strategies that improve pass rates and increase retention. The performance of state-of-the-art machine learning classifiers depends heavily on the task at hand. Support vector machines have been used extensively in classification problems; however, the extant literature shows a gap in the application of linear support vector machines as a predictor of student performance. The aim of this study was to compare the performance of linear support vector machines with that of state-of-the-art classical machine learning algorithms in order to determine the algorithm that would improve prediction of student performance. In this quantitative study, an experimental research design was used. Experiments were set up using feature selection on a publicly available dataset of 1000 alpha-numeric student records. Linear support vector machines, benchmarked against ten categorical machine learning algorithms, showed superior performance in predicting student performance. The results of this research showed that features such as race, gender, and lunch influence performance in mathematics, whilst access to lunch was the primary factor influencing reading and writing performance.
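For readers unfamiliar with the classifier being benchmarked, the sketch below trains a linear support vector machine by batch subgradient descent on the regularised hinge loss. It is a generic illustration, not the study's code: the toy two-feature data, the hyperparameters (`lam`, `epochs`, `lr`), and the function names are all assumptions, and the student records used in the study are not reproduced here.

```python
def train_linear_svm(samples, labels, lam=0.01, epochs=200, lr=0.1):
    """Minimise lam/2 * ||w||^2 + mean hinge loss by batch subgradient descent.

    `labels` must contain +1 or -1. Returns the weight vector and the bias.
    """
    dim = len(samples[0])
    n = len(samples)
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]  # gradient of the regulariser
        grad_b = 0.0
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:  # the sample violates the margin
                for j in range(dim):
                    grad_w[j] -= y * x[j] / n
                grad_b -= y / n
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Classify x by the sign of the linear decision function."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Hypothetical, linearly separable toy data (not the study's dataset).
X = [[2, 2], [3, 3], [2, 3], [-2, -2], [-3, -3], [-2, -3]]
y = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(X, y)
```

Categorical attributes such as those mentioned in the abstract would first need a numeric encoding (for example one-hot vectors) before a linear model of this kind could be applied.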
