Multi-choice wavelet thresholding based binary classification method

Data mining is one of the most effective statistical methodologies for investigating a variety of problems in areas including pattern recognition, machine learning, bioinformatics, chemometrics, and statistics. In particular, statistically sophisticated procedures that emphasize reliability of results and computational efficiency are required for the analysis of high-dimensional data. Optimization principles can play a significant role in the rationalization and validation of specialized data mining procedures. This paper presents a novel three-step methodology based on Multi-Choice Wavelet Thresholding (MCWT), consisting of three processes: perception (dimension reduction), decision (feature ranking), and cognition (model selection). In these steps, three concepts (wavelet thresholding, support vector machines for classification, and information complexity) are integrated to evaluate learning models. Three published data sets are used to illustrate the proposed methodology. Additionally, performance comparisons with recent and widely applied methods are shown.
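
The abstract does not spell out the multi-choice thresholding rule, so the following is only a minimal sketch of the generic idea behind the perception step: take one level of a Haar wavelet transform and soft-threshold the detail coefficients, so that small coefficients drop to zero. The function names, the example signal, and the threshold value 0.5 are assumptions made for illustration, not details from the paper.

```python
import math


def haar_step(signal):
    """One level of the orthonormal Haar transform (even-length input)."""
    s = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail


def soft_threshold(coeffs, t):
    """Shrink each coefficient toward zero by t; zero those with |c| <= t."""
    return [math.copysign(max(abs(c) - t, 0.0), c) for c in coeffs]


# Hypothetical signal: only the second pair has a large jump.
signal = [4.0, 4.2, 8.0, 1.0, 3.0, 3.1, 5.0, 5.0]
approx, detail = haar_step(signal)
shrunk = soft_threshold(detail, t=0.5)  # threshold chosen for illustration
n_kept = sum(1 for d in shrunk if d != 0.0)  # detail coefficients that survive
```

Coefficient positions that are zero across all training samples could then be dropped before ranking features, which is one plausible reading of "dimension reduction" here.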

Differential Privacy Principal Component Analysis for Support Vector Machines

Security and Communication Networks ◽

10.1155/2021/5542283 ◽

2021 ◽

Vol 2021 ◽

pp. 1-12

Author(s):

Yuxian Huang ◽

Geng Yang ◽

Yahong Xu ◽

Hao Zhou

Keyword(s):

Principal Component Analysis ◽

Classification Accuracy ◽

Differential Privacy ◽

Principal Component ◽

Component Analysis ◽

High Dimensional ◽

Support Vector ◽

Data Sets ◽

Vector Machines ◽

Fast Classification

In the big data era, massive and high-dimensional data are produced at all times, increasing the difficulty of analyzing and protecting data. In this paper, in order to realize dimensionality reduction and privacy protection of data, principal component analysis (PCA) and differential privacy (DP) are combined to handle these data. Moreover, a support vector machine (SVM) is used to measure the availability of the processed data. Specifically, we introduce differential privacy mechanisms at different stages of the PCA-SVM algorithm and obtain the algorithms DPPCA-SVM and PCADP-SVM. Both algorithms satisfy (ε, 0)-DP while achieving fast classification. In addition, we evaluate the performance of the two algorithms in terms of noise expectation and classification accuracy, through both theoretical proof and experimental verification. To verify the performance of DPPCA-SVM, we also compare it with other algorithms. Results show that DPPCA-SVM provides excellent utility on different data sets while guaranteeing stricter privacy.
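
The abstract does not say which noise mechanism is used or where exactly it is injected. As a rough illustration of how (ε, 0)-DP is commonly obtained, the sketch below applies the standard Laplace mechanism to a symmetric matrix such as a covariance matrix. It assumes the caller supplies the L1 sensitivity of the released upper-triangular entries; the function names and the example values are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def perturb_symmetric(matrix, sensitivity, epsilon, rng):
    """Add Laplace noise with scale sensitivity/epsilon, keeping symmetry.

    `sensitivity` is assumed to be the L1 sensitivity of the upper triangle.
    """
    scale = sensitivity / epsilon
    n = len(matrix)
    noisy = [row[:] for row in matrix]  # leave the input untouched
    for i in range(n):
        for j in range(i, n):
            noise = laplace_noise(scale, rng)
            noisy[i][j] += noise
            if i != j:
                noisy[j][i] += noise  # mirror the same draw
    return noisy


rng = random.Random(0)  # fixed seed only to make the sketch repeatable
cov = [[1.0, 0.2], [0.2, 1.0]]  # hypothetical covariance matrix
noisy_cov = perturb_symmetric(cov, sensitivity=1.0, epsilon=0.5, rng=rng)
```

A smaller ε gives a larger noise scale, which is the privacy versus accuracy trade-off the abstract evaluates.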

Two New Decomposition Algorithms for Training Bound-Constrained Support Vector Machines

Foundations of Computing and Decision Sciences ◽

10.1515/fcds-2015-0005 ◽

2015 ◽

Vol 40 (1) ◽

pp. 67-86 ◽

Cited By ~ 1

Author(s):

Lingfeng Niu ◽

Ruizhi Zhou ◽

Xi Zhao ◽

Yong Shi

Keyword(s):

Binary Classification ◽

Gradient Algorithm ◽

Support Vector ◽

Data Sets ◽

Decomposition Algorithms ◽

Public Data ◽

Vector Machines ◽

Working Set ◽

Derivative Information ◽

New Algorithms

The bound-constrained Support Vector Machine (SVM) is one of the state-of-the-art models for binary classification. The decomposition method is currently one of the major methods for training SVMs, especially when a nonlinear kernel is used. In this paper, we propose two new decomposition algorithms for training bound-constrained SVMs. The projected gradient algorithm and the interior point method are combined to solve the quadratic subproblem efficiently. The main difference between the two algorithms is the way the working set is chosen. The first uses only first-order derivative information of the model, for simplicity. The second incorporates part of the second-order information, in addition to the gradient, into the working set selection. Both algorithms are proved to be globally convergent in theory. The new algorithms are compared with the well-known package BSVM. Numerical experiments on several public data sets validate the efficiency of the proposed methods.
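
To make the projected gradient ingredient concrete, here is a minimal sketch that solves a small box-constrained quadratic program of the form min 0.5·aᵀQa − Σa subject to 0 ≤ a ≤ C. That problem shape is assumed here as the usual dual of a bound-constrained SVM; the abstract does not state it. This is not the paper's decomposition algorithm: there is no working set selection and no interior point step, and the matrix Q below is a made-up example.

```python
def projected_gradient(Q, C, step, iters=500):
    """Minimise 0.5*a'Qa - sum(a) subject to 0 <= a_i <= C.

    `step` should be below 2 / (largest eigenvalue of Q) for convergence.
    """
    n = len(Q)
    a = [0.0] * n
    for _ in range(iters):
        # Gradient of the objective: Q a - e.
        grad = [sum(Q[i][j] * a[j] for j in range(n)) - 1.0 for i in range(n)]
        # Gradient step, then projection (clipping) onto the box [0, C].
        a = [min(C, max(0.0, a[i] - step * grad[i])) for i in range(n)]
    return a


# Hypothetical 2x2 positive definite Q; the second variable hits the bound C.
Q = [[2.0, 0.0], [0.0, 0.5]]
alpha = projected_gradient(Q, C=1.0, step=0.4)
```

For this diagonal Q the unconstrained minimiser is (0.5, 2.0), so the box solution is (0.5, 1.0): the first variable stays free and the second is clipped at C.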

Effectiveness of Support Vector Machines in Medical Data mining

Journal of Communications Software and Systems ◽

10.24138/jcomss.v11i1.114 ◽

2015 ◽

Vol 11 (1) ◽

pp. 25 ◽

Cited By ~ 14

Author(s):

Padmavathi Janardhanan ◽

Heena L. ◽

Fathima Sabika

Keyword(s):

Data Mining ◽

Medical Data ◽

Support Vector ◽

Svm Classifier ◽

Data Sets ◽

Medical Data Mining ◽

Rbf Network ◽

Vector Machines ◽

Hidden Knowledge ◽

Using Data

The idea of medical data mining is to extract hidden knowledge in the medical field using data mining techniques. One of its positive aspects is the discovery of important patterns. It is possible to identify patterns even if we have not fully understood the causal mechanisms behind them. In this case, data mining provides a capacity for research and discovery that may not otherwise have been evident. This paper analyzes the effectiveness of SVM, one of the most popular classification techniques, in classifying medical datasets. It analyzes the performance of the Naïve Bayes classifier, the RBF network, and the SVM classifier. The performance of each predictive model in predicting diseases is recorded and compared across different medical datasets. The datasets were binary-class, and each had a different number of attributes. They include heart, cancer, and diabetes datasets. It is observed that the SVM classifier produces a better percentage of classification accuracy. The work was implemented in the WEKA environment, and the results show that SVM is the most robust and effective classifier for medical data sets.

Feature selection for high-dimensional class-imbalanced data sets using Support Vector Machines

Information Sciences ◽

10.1016/j.ins.2014.07.015 ◽

2014 ◽

Vol 286 ◽

pp. 228-246 ◽

Cited By ~ 143

Author(s):

Sebastián Maldonado ◽

Richard Weber ◽

Fazel Famili

Keyword(s):

Feature Selection ◽

Support Vector Machines ◽

Imbalanced Data ◽

High Dimensional ◽

Support Vector ◽

Data Sets ◽

Imbalanced Data Sets ◽

Vector Machines ◽

Selection For

A Preliminary Framework to Fight Tax Evasion in the Home Renovation Market

Advances in Data Mining and Database Management - Intelligent Analytics With Advanced Multi-Industry Applications ◽

10.4018/978-1-7998-4963-6.ch015 ◽

2021 ◽

pp. 304-325

Author(s):

Cataldo Zuccaro ◽

Michel Plaisent ◽

Prosper Bernard

Keyword(s):

Neural Network ◽

Data Mining ◽

Predictive Modeling ◽

Tax Evasion ◽

Predictive Models ◽

Support Vector ◽

Government Agencies ◽

Data Sets ◽

Network Support ◽

Vector Machines

This chapter presents a preliminary framework to tackle tax evasion in the field of residential renovation. This industry plays a major role in economic development and employment growth. Tax evasion and fraud are extremely difficult to combat in the industry since it is characterized by a large number of stakeholders (manufacturers, retailers, tradesmen, and households) generating complex transactional dynamics that often defy attempts to deploy transactional analytics to detect anomalies, fraud, and tax evasion. This chapter proposes a framework to apply transactional analytics and data mining to develop standard measures and predictive models to detect fraud and tax evasion. Combining big data sets, cross-referencing, and predictive modeling (i.e., anomaly detection, artificial neural networks, support vector machines, Bayesian networks, and association rules) can assist government agencies in combating highly stealthy tax evasion and fraud in residential renovation.

CLASSIFICATION OF HIGH-DIMENSIONAL MICROARRAY DATA WITH A TWO-STEP PROCEDURE VIA A WILCOXON CRITERION AND MULTILAYER PERCEPTRON

International Journal of Computational Intelligence and Applications ◽

10.1142/s1469026811002969 ◽

2011 ◽

Vol 10 (01) ◽

pp. 1-14

Author(s):

VLADIMIR NIKULIN ◽

TIAN-HSIANG HUANG ◽

GEOFFREY J. MCLACHLAN

Keyword(s):

Data Mining ◽

Feature Selection ◽

High Dimensional ◽

Second Step ◽

Support Vector ◽

Step Procedure ◽

Leave One Out ◽

Natural Combination ◽

Feature Selection Techniques

The method presented in this paper is novel as a natural combination of two mutually dependent steps. Feature selection is a key element (first step) in our classification system, which was employed during the 2010 International RSCTC data mining (bioinformatics) Challenge. The second step may be implemented using any suitable classifier, such as linear regression, a support vector machine, or a neural network. We conducted leave-one-out (LOO) experiments with several feature selection techniques and classifiers. Based on the LOO evaluations, we decided to use feature selection with the separation-type Wilcoxon-based criterion for all final submissions. The method was tested successfully during the RSCTC data mining Challenge, where we achieved the top score in the Basic track.
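
The abstract does not define the exact separation-type criterion. As a hedged sketch of the first step, the code below ranks features by the standard Wilcoxon rank-sum (Mann-Whitney U) statistic, scored by how far U/(n1·n2) lies from 0.5. The function names, the 0/1 label coding, and the toy data are assumptions for illustration.

```python
def mann_whitney_u(pos, neg):
    """U statistic: number of pairs with pos > neg, ties counted as one half."""
    u = 0.0
    for x in pos:
        for y in neg:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u


def rank_features(rows, labels):
    """Return (feature indices best first, scores) using |U/(n1*n2) - 0.5|."""
    n_features = len(rows[0])
    scores = []
    for f in range(n_features):
        pos = [r[f] for r, y in zip(rows, labels) if y == 1]
        neg = [r[f] for r, y in zip(rows, labels) if y == 0]
        u = mann_whitney_u(pos, neg)
        scores.append(abs(u / (len(pos) * len(neg)) - 0.5))
    order = sorted(range(n_features), key=lambda f: scores[f], reverse=True)
    return order, scores


# Toy data: feature 0 separates the classes perfectly, feature 1 does not.
rows = [[1.0, 5.0], [2.0, 1.0], [8.0, 4.0], [9.0, 2.0]]
labels = [0, 0, 1, 1]
order, scores = rank_features(rows, labels)
```

A score of 0.5 means the two classes are perfectly separated on that feature, and 0 means no separation; the top-ranked features would then be passed to whichever classifier is used in the second step.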

A comparison study: Support vector machines for binary classification in machine learning

2011 4th International Conference on Biomedical Engineering and Informatics (BMEI) ◽

10.1109/bmei.2011.6098517 ◽

2011 ◽

Cited By ~ 4

Author(s):

Wencai Zeng ◽

Jiong Jia ◽

Zhonglong Zheng ◽

Chenmao Xie ◽

Li Guo

Keyword(s):

Machine Learning ◽

Support Vector Machines ◽

Binary Classification ◽

Support Vector ◽

Comparison Study ◽

Vector Machines ◽

Study Support

Nondegenerate Piecewise Linear Systems: A Finite Newton Algorithm and Applications in Machine Learning

Neural Computation ◽

10.1162/neco_a_00241 ◽

2012 ◽

Vol 24 (4) ◽

pp. 1047-1084 ◽

Cited By ~ 2

Author(s):

Xiao-Tong Yuan ◽

Shuicheng Yan

Keyword(s):

Linear Systems ◽

Optimization Problems ◽

Piecewise Linear ◽

Optimization Methods ◽

Coefficient Matrix ◽

Learning Problems ◽

Support Vector ◽

Data Sets ◽

Piecewise Linear Systems ◽

Vector Machines

We investigate Newton-type optimization methods for solving piecewise linear systems (PLSs) with a nondegenerate coefficient matrix. Such systems arise, for example, from the numerical solution of linear complementarity problems, which are useful for modeling several learning and optimization problems. In this letter, we propose an effective damped Newton method, PLS-DN, to find the exact (up to machine precision) solution of nondegenerate PLSs. PLS-DN exhibits a provable semiiterative property, that is, the algorithm converges globally to the exact solution in a finite number of iterations. The rate of convergence is shown to be at least linear before termination. We emphasize the applications of our method in modeling, from a novel perspective of PLSs, some statistical learning problems such as box-constrained least squares, elitist Lasso (Kowalski & Torrésani, 2008), and support vector machines (Cortes & Vapnik, 1995). Numerical results on synthetic and benchmark data sets are presented to demonstrate the effectiveness and efficiency of PLS-DN on these problems.
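
The abstract does not give the exact form of the PLS it studies. One standard way to see why linear complementarity problems lead to piecewise linear systems is the componentwise-minimum reformulation below, where the matrix M and vector q are the usual LCP data (symbols assumed here, not taken from the abstract).

```latex
% Linear complementarity problem LCP(M, q): find x such that
\[
  x \ge 0, \qquad Mx + q \ge 0, \qquad x^{\top}(Mx + q) = 0 .
\]
% Equivalent piecewise linear system (minimum taken componentwise):
\[
  \min\{\, x,\; Mx + q \,\} = 0 .
\]
% For each index i, either x_i = 0 with (Mx + q)_i >= 0,
% or (Mx + q)_i = 0 with x_i >= 0, which is exactly complementarity.
```

Each choice of which term attains the minimum selects a linear system, so the map is linear on every piece, and a Newton-type method can switch between pieces as it iterates.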

Gaussian Processes for Classification: Mean-Field Algorithms

Neural Computation ◽

10.1162/089976600300014881 ◽

2000 ◽

Vol 12 (11) ◽

pp. 2655-2684 ◽

Cited By ~ 91

Author(s):

Manfred Opper ◽

Ole Winther

Keyword(s):

Support Vector Machines ◽

Gaussian Processes ◽

Disordered Systems ◽

Binary Classification ◽

Computational Cost ◽

Mean Field ◽

Strong Support ◽

Support Vector ◽

Vector Machines ◽

Leave One Out

We derive a mean-field algorithm for binary classification with Gaussian processes that is based on the TAP approach originally proposed in the statistical physics of disordered systems. The theory also yields an approximate leave-one-out estimator for the generalization error, which is computed at no extra computational cost. We show that from the TAP approach, it is possible to derive both a simpler “naive” mean-field theory and support vector machines (SVMs) as limiting cases. For both mean-field algorithms and support vector machines, simulation results for three small benchmark data sets are presented. They show that one may get state-of-the-art performance by using the leave-one-out estimator for model selection, and that the built-in leave-one-out estimators are extremely precise when compared to the exact leave-one-out estimate. The second result is taken as strong support for the internal consistency of the mean-field approach.
