Preselection of support vector candidates by relative neighborhood graph for large-scale character recognition

AbstractTraining machine learning models on classical computers is usually a time and compute intensive process. With Moore’s law nearing its inevitable end and an ever-increasing demand for large-scale data analysis using machine learning, we must leverage non-conventional computing paradigms like quantum computing to train machine learning models efficiently. Adiabatic quantum computers can approximately solve NP-hard problems, such as the quadratic unconstrained binary optimization (QUBO), faster than classical computers. Since many machine learning problems are also NP-hard, we believe adiabatic quantum computers might be instrumental in training machine learning models efficiently in the post Moore’s law era. In order to solve problems on adiabatic quantum computers, they must be formulated as QUBO problems, which is very challenging. In this paper, we formulate the training problems of three machine learning models—linear regression, support vector machine (SVM) and balanced k-means clustering—as QUBO problems, making them conducive to be trained on adiabatic quantum computers. We also analyze the computational complexities of our formulations and compare them to corresponding state-of-the-art classical approaches. We show that the time and space complexities of our formulations are better (in case of SVM and balanced k-means clustering) or equivalent (in case of linear regression) to their classical counterparts.

Download Full-text

A Hybrid Swarm and Gravitation-based feature selection algorithm for handwritten Indic script classification problem

Complex & Intelligent Systems ◽

10.1007/s40747-020-00237-1 ◽

2021 ◽

Author(s):

Ritam Guha ◽

Manosij Ghosh ◽

Pawan Kumar Singh ◽

Ram Sarkar ◽

Mita Nasipuri

Keyword(s):

Feature Selection ◽

Character Recognition ◽

Optical Character Recognition ◽

Classification Problem ◽

Classification Model ◽

Support Vector ◽

Intermediate Step ◽

Hybrid Swarm ◽

Feature Vectors ◽

Indic Script

AbstractIn any multi-script environment, handwritten script classification is an unavoidable pre-requisite before the document images are fed to their respective Optical Character Recognition (OCR) engines. Over the years, this complex pattern classification problem has been solved by researchers proposing various feature vectors mostly having large dimensions, thereby increasing the computation complexity of the whole classification model. Feature Selection (FS) can serve as an intermediate step to reduce the size of the feature vectors by restricting them only to the essential and relevant features. In the present work, we have addressed this issue by introducing a new FS algorithm, called Hybrid Swarm and Gravitation-based FS (HSGFS). This algorithm has been applied over three feature vectors introduced in the literature recently—Distance-Hough Transform (DHT), Histogram of Oriented Gradients (HOG), and Modified log-Gabor (MLG) filter Transform. Three state-of-the-art classifiers, namely, Multi-Layer Perceptron (MLP), K-Nearest Neighbour (KNN), and Support Vector Machine (SVM), are used to evaluate the optimal subset of features generated by the proposed FS model. Handwritten datasets at block, text line, and word level, consisting of officially recognized 12 Indic scripts, are prepared for experimentation. An average improvement in the range of 2–5% is achieved in the classification accuracy by utilizing only about 75–80% of the original feature vectors on all three datasets. The proposed method also shows better performance when compared to some popularly used FS models. The codes used for implementing HSGFS can be found in the following Github link: https://github.com/Ritam-Guha/HSGFS.

Download Full-text

Large-Scale Landslide Displacement Rate Prediction Based on Multi-Factor Support Vector Regression Machine

Applied Sciences ◽

10.3390/app11041381 ◽

2021 ◽

Vol 11 (4) ◽

pp. 1381

Author(s):

Xiuzhen Li ◽

Shengwei Li

Keyword(s):

Support Vector Regression ◽

Water Level ◽

Large Scale ◽

Displacement Rate ◽

Support Vector ◽

Single Factor ◽

Reservoir Water ◽

Reservoir Water Level ◽

Depth Analysis ◽

Three Factor

Forecasting the development of large-scale landslides is a contentious and complicated issue. In this study, we put forward the use of multi-factor support vector regression machines (SVRMs) for predicting the displacement rate of a large-scale landslide. The relative relationships between the main monitoring factors were analyzed based on the long-term monitoring data of the landslide and the grey correlation analysis theory. We found that the average correlation between landslide displacement and rainfall is 0.894, and the correlation between landslide displacement and reservoir water level is 0.338. Finally, based on an in-depth analysis of the basic characteristics, influencing factors, and development of landslides, three main factors (i.e., the displacement rate, reservoir water level, and rainfall) were selected to build single-factor, two-factor, and three-factor SVRM models. The key parameters of the models were determined using a grid-search method, and the models showed high accuracies. Moreover, the accuracy of the two-factor SVRM model (displacement rate and rainfall) is the highest with the smallest standard error (RMSE) of 0.00614; it is followed by the three-factor and single-factor SVRM models, the latter of which has the lowest prediction accuracy, with the largest RMSE of 0.01644.

Download Full-text

Large-Scale Twin Parametric Support Vector Machine Using Pinball Loss Function

IEEE Transactions on Systems Man and Cybernetics Systems ◽

10.1109/tsmc.2019.2896642 ◽

2019 ◽

pp. 1-17 ◽

Cited By ~ 4

Author(s):

Sweta Sharma ◽

Reshma Rastogi ◽

Suresh Chandra

Keyword(s):

Support Vector Machine ◽

Loss Function ◽

Large Scale ◽

Support Vector ◽

Pinball Loss

Download Full-text

A Structural Analysis Based Feature Extraction Method for OCR System For Myanmar Printed Document Images

International Journal of Computer Vision and Image Processing ◽

10.4018/ijcvip.2012010102 ◽

2012 ◽

Vol 2 (1) ◽

pp. 16-41 ◽

Cited By ~ 1

Author(s):

Htwe Pa Pa Win ◽

Phyo Thu Thu Khine ◽

Khin Nwe Ni Tun

Keyword(s):

Feature Extraction ◽

Structural Analysis ◽

Character Recognition ◽

Optical Character Recognition ◽

Extraction Method ◽

Recognition Performance ◽

Extraction Methods ◽

Support Vector ◽

Svm Classifier ◽

Feature Extraction Method

This paper proposes a new feature extraction method for off-line recognition of Myanmar printed documents. One of the most important factors to achieve high recognition performance in Optical Character Recognition (OCR) system is the selection of the feature extraction methods. Different types of existing OCR systems used various feature extraction methods because of the diversity of the scripts’ natures. One major contribution of the work in this paper is the design of logically rigorous coding based features. To show the effectiveness of the proposed method, this paper assumed the documents are successfully segmented into characters and extracted features from these isolated Myanmar characters. These features are extracted using structural analysis of the Myanmar scripts. The experimental results have been carried out using the Support Vector Machine (SVM) classifier and compare the pervious proposed feature extraction method.

Download Full-text

Land Cover Classification of Nine Perennial Crops Using Sentinel-1 and -2 Data

Remote Sensing ◽

10.3390/rs12010096 ◽

2019 ◽

Vol 12 (1) ◽

pp. 96 ◽

Cited By ~ 6

Author(s):

James Brinkhoff ◽

Justin Vardanega ◽

Andrew J. Robson

Keyword(s):

Land Cover ◽

Large Scale ◽

Satellite Image ◽

Resource Planning ◽

Support Vector ◽

Perennial Crop ◽

Perennial Crops ◽

Object Based ◽

Rbf Kernel

Land cover mapping of intensive cropping areas facilitates an enhanced regional response to biosecurity threats and to natural disasters such as drought and flooding. Such maps also provide information for natural resource planning and analysis of the temporal and spatial trends in crop distribution and gross production. In this work, 10 meter resolution land cover maps were generated over a 6200 km2 area of the Riverina region in New South Wales (NSW), Australia, with a focus on locating the most important perennial crops in the region. The maps discriminated between 12 classes, including nine perennial crop classes. A satellite image time series (SITS) of freely available Sentinel-1 synthetic aperture radar (SAR) and Sentinel-2 multispectral imagery was used. A segmentation technique grouped spectrally similar adjacent pixels together, to enable object-based image analysis (OBIA). K-means unsupervised clustering was used to filter training points and classify some map areas, which improved supervised classification of the remaining areas. The support vector machine (SVM) supervised classifier with radial basis function (RBF) kernel gave the best results among several algorithms trialled. The accuracies of maps generated using several combinations of the multispectral and radar bands were compared to assess the relative value of each combination. An object-based post classification refinement step was developed, enabling optimization of the tradeoff between producers’ accuracy and users’ accuracy. Accuracy was assessed against randomly sampled segments, and the final map achieved an overall count-based accuracy of 84.8% and area-weighted accuracy of 90.9%. Producers’ accuracies for the perennial crop classes ranged from 78 to 100%, and users’ accuracies ranged from 63 to 100%. This work develops methods to generate detailed and large-scale maps that accurately discriminate between many perennial crops and can be updated frequently.

Download Full-text

A Compound Wind Power Forecasting Strategy Based on Clustering, Two-Stage Decomposition, Parameter Optimization, and Optimal Combination of Multiple Machine Learning Approaches

Energies ◽

10.3390/en12183586 ◽

2019 ◽

Vol 12 (18) ◽

pp. 3586 ◽

Cited By ~ 1

Author(s):

Sizhou Sun ◽

Jingqi Fu ◽

Ang Li

Keyword(s):

Wind Power ◽

Large Scale ◽

Wind Farm ◽

Search Algorithm ◽

Interpolation Method ◽

Least Square ◽

Support Vector ◽

Two Stage ◽

Wind Power Forecasting ◽

Power Forecasting

Given the large-scale exploitation and utilization of wind power, the problems caused by the high stochastic and random characteristics of wind speed make researchers develop more reliable and precise wind power forecasting (WPF) models. To obtain better predicting accuracy, this study proposes a novel compound WPF strategy by optimal integration of four base forecasting engines. In the forecasting process, density-based spatial clustering of applications with noise (DBSCAN) is firstly employed to identify meaningful information and discard the abnormal wind power data. To eliminate the adverse influence of the missing data on the forecasting accuracy, Lagrange interpolation method is developed to get the corrected values of the missing points. Then, the two-stage decomposition (TSD) method including ensemble empirical mode decomposition (EEMD) and wavelet transform (WT) is utilized to preprocess the wind power data. In the decomposition process, the empirical wind power data are disassembled into different intrinsic mode functions (IMFs) and one residual (Res) by EEMD, and the highest frequent time series IMF1 is further broken into different components by WT. After determination of the input matrix by a partial autocorrelation function (PACF) and normalization into [0, 1], these decomposed components are used as the input variables of all the base forecasting engines, including least square support vector machine (LSSVM), wavelet neural networks (WNN), extreme learning machine (ELM) and autoregressive integrated moving average (ARIMA), to make the multistep WPF. To avoid local optima and improve the forecasting performance, the parameters in LSSVM, ELM, and WNN are tuned by backtracking search algorithm (BSA). On this basis, BSA algorithm is also employed to optimize the weighted coefficients of the individual forecasting results that produced by the four base forecasting engines to generate an ensemble of the forecasts. In the end, case studies for a certain wind farm in China are carried out to assess the proposed forecasting strategy.

Download Full-text

Preselection of support vector candidates by relative neighborhood graph for large-scale character recognition

Analyzing the Distribution of a Large-Scale Character Pattern Set Using Relative Neighborhood Graph

Visualizing the Distribution of a Large-Scale Pattern Set using Compressed Relative Neighborhood Graph

LocaToR: Locating Passive RFID Tags with the Relative Neighborhood Graph

QUBO formulations for training machine learning models

A Hybrid Swarm and Gravitation-based feature selection algorithm for handwritten Indic script classification problem

Large-Scale Landslide Displacement Rate Prediction Based on Multi-Factor Support Vector Regression Machine

Large-Scale Twin Parametric Support Vector Machine Using Pinball Loss Function

A Structural Analysis Based Feature Extraction Method for OCR System For Myanmar Printed Document Images

Land Cover Classification of Nine Perennial Crops Using Sentinel-1 and -2 Data

A Compound Wind Power Forecasting Strategy Based on Clustering, Two-Stage Decomposition, Parameter Optimization, and Optimal Combination of Multiple Machine Learning Approaches

Export Citation Format