Multi-choice wavelet thresholding based binary classification method

Methodology ◽  
2020 ◽  
Vol 16 (2) ◽  
pp. 127-146 ◽  
Author(s):  
Seung Hyun Baek ◽  
Alberto Garcia-Diaz ◽  
Yuanshun Dai

Data mining is one of the most effective statistical methodologies for investigating a variety of problems in areas including pattern recognition, machine learning, bioinformatics, chemometrics, and statistics. In particular, statistically sophisticated procedures that emphasize reliability of results and computational efficiency are required for the analysis of high-dimensional data. Optimization principles can play a significant role in the rationalization and validation of specialized data mining procedures. This paper presents a novel three-step methodology based on Multi-Choice Wavelet Thresholding (MCWT). It consists of three processes: perception (dimension reduction), decision (feature ranking), and cognition (model selection). In these steps, three concepts (wavelet thresholding, support vector machines for classification, and information complexity) are integrated to evaluate learning models. Three published data sets are used to illustrate the proposed methodology. Additionally, performance comparisons with recent and widely applied methods are shown.
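
The perception step (dimension reduction by wavelet thresholding) can be illustrated with a minimal sketch. It assumes a plain Haar transform and a single soft threshold `t`; the abstract does not give the multi-choice thresholding rule itself, so the function names and the threshold value here are illustrative only and are not the authors' MCWT procedure.

```python
import math

def haar_transform(signal):
    """Orthonormal Haar decomposition; len(signal) must be a power of two."""
    approx, details = list(signal), []
    s = 1 / math.sqrt(2)
    while len(approx) > 1:
        pairs = [(approx[i], approx[i + 1]) for i in range(0, len(approx), 2)]
        details = [(a - b) * s for a, b in pairs] + details  # coarser levels go first
        approx = [(a + b) * s for a, b in pairs]
    return approx + details  # [approximation, coarsest detail, ..., finest details]

def soft_threshold(coeffs, t):
    """Shrink each coefficient toward zero by t; magnitudes below t become zero."""
    return [math.copysign(max(abs(c) - t, 0.0), c) for c in coeffs]

def reduce_dimension(signal, t):
    """Keep the approximation coefficient and the detail coefficients that survive t."""
    coeffs = haar_transform(signal)
    shrunk = [coeffs[0]] + soft_threshold(coeffs[1:], t)
    return [(i, c) for i, c in enumerate(shrunk) if c != 0.0]

# Hypothetical 8-point profile: a step function plus small perturbations.
kept = reduce_dimension([4.1, 3.9, 4.0, 4.0, 0.1, -0.1, 0.0, 0.0], t=0.5)
```

In this example only two of the eight coefficients survive, so a downstream classifier such as an SVM would see a two-dimensional representation of the profile.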

2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Yuxian Huang ◽  
Geng Yang ◽  
Yahong Xu ◽  
Hao Zhou

In the big data era, massive and high-dimensional data is produced at all times, increasing the difficulty of analyzing and protecting data. In this paper, to achieve both dimensionality reduction and privacy protection, principal component analysis (PCA) and differential privacy (DP) are combined to handle these data. Moreover, a support vector machine (SVM) is used to measure the availability of the processed data. Specifically, we introduce differential privacy mechanisms at different stages of the PCA-SVM algorithm and obtain the algorithms DPPCA-SVM and PCADP-SVM. Both algorithms satisfy (ε, 0)-DP while achieving fast classification. In addition, we evaluate the performance of the two algorithms in terms of noise expectation and classification accuracy through both theoretical proof and experimental verification. To verify the performance of DPPCA-SVM, we also compare it with other algorithms. Results show that DPPCA-SVM provides excellent utility for different data sets despite guaranteeing stricter privacy.
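
One common way to obtain (ε, 0)-DP, that is, pure ε-DP, is the Laplace mechanism. The sketch below perturbs a covariance matrix before any eigendecomposition. It is an assumption-laden illustration, not the DPPCA-SVM or PCADP-SVM algorithm: the abstract does not say which mechanism is used or where the noise is injected, and the `sensitivity` argument is a placeholder that the caller must replace with a valid L1 sensitivity bound for the released entries.

```python
import random

def laplace_noise(scale, rng):
    """Laplace(0, scale) sample, drawn as the difference of two exponentials."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def covariance(rows):
    """Population covariance matrix of a list of equal-length rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / n
             for j in range(d)] for i in range(d)]

def private_covariance(rows, epsilon, sensitivity, seed=0):
    """Add Laplace noise of scale sensitivity/epsilon to the upper triangle.

    `sensitivity` is assumed to bound the L1 sensitivity of the released
    upper-triangular entries; deriving it is outside this sketch.
    """
    rng = random.Random(seed)
    noisy = covariance(rows)
    scale = sensitivity / epsilon
    for i in range(len(noisy)):
        for j in range(i, len(noisy)):
            noisy[i][j] += laplace_noise(scale, rng)
            noisy[j][i] = noisy[i][j]  # mirror so the matrix stays symmetric
    return noisy

# Hypothetical toy data; epsilon and sensitivity are illustrative values.
rows = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
noisy_cov = private_covariance(rows, epsilon=1.0, sensitivity=1.0)
```

The principal components would then be computed from `noisy_cov` instead of the exact covariance, and the projected data passed to the classifier.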


2015 ◽  
Vol 40 (1) ◽  
pp. 67-86 ◽  
Author(s):  
Lingfeng Niu ◽  
Ruizhi Zhou ◽  
Xi Zhao ◽  
Yong Shi

The bound-constrained support vector machine (SVM) is one of the state-of-the-art models for binary classification. The decomposition method is currently one of the major methods for training SVMs, especially when a nonlinear kernel is used. In this paper, we propose two new decomposition algorithms for training bound-constrained SVMs. The projected gradient algorithm and the interior point method are combined to solve the quadratic subproblem efficiently. The main difference between the two algorithms is the way the working set is chosen. The first uses only first-order derivative information of the model, for simplicity. The second incorporates part of the second-order information, in addition to the gradient, into working set selection. Both algorithms are proved to be globally convergent in theory. The new algorithms are compared with the well-known package BSVM. Numerical experiments on several public data sets validate the efficiency of the proposed methods.
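
The projected gradient idea for a bound-constrained quadratic problem can be sketched as follows. This is a generic textbook iteration on the whole problem, assuming an objective of the form 0.5·aᵀQa − Σaᵢ with box constraints 0 ≤ aᵢ ≤ C; it omits working-set selection and the interior point component, so it is not the decomposition algorithm proposed in the paper. The matrix `Q`, the bound `C`, and the step size are illustrative values.

```python
def projected_gradient_box_qp(Q, C, step, iters=500):
    """Minimise 0.5 * a^T Q a - sum(a) subject to 0 <= a_i <= C.

    `step` should be below 2 / (largest eigenvalue of Q) for convergence.
    """
    n = len(Q)
    a = [0.0] * n
    for _ in range(iters):
        # Gradient of the objective is Q a - e, where e is the all-ones vector.
        grad = [sum(Q[i][j] * a[j] for j in range(n)) - 1.0 for i in range(n)]
        # Gradient step followed by projection (clipping) onto the box [0, C]^n.
        a = [min(max(a[i] - step * grad[i], 0.0), C) for i in range(n)]
    return a

# Hypothetical 2-variable problem: the unconstrained minimiser is (0.5, 2.0),
# so the second variable ends up at its upper bound C = 1.
alpha = projected_gradient_box_qp([[2.0, 0.0], [0.0, 0.5]], C=1.0, step=0.4)
```

A decomposition method would apply an update of this kind only to a small working set of variables per iteration, which is where the two proposed selection rules differ.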


2015 ◽  
Vol 11 (1) ◽  
pp. 25 ◽  
Author(s):  
Padmavathi Janardhanan ◽  
Heena L. ◽  
Fathima Sabika

The idea of medical data mining is to extract hidden knowledge in the medical field using data mining techniques. One of its positive aspects is the discovery of important patterns. It is possible to identify patterns even if we have not fully understood the causal mechanisms behind them. In this case, data mining provides a capacity for research and discovery that may not otherwise have been evident. This paper analyzes the effectiveness of SVM, one of the most popular classification techniques, in classifying medical datasets. It analyzes the performance of the Naïve Bayes classifier, the RBF network, and the SVM classifier. The performance of each predictive model in predicting diseases is recorded and compared across different medical datasets. The datasets were binary-class, and each had a different number of attributes. They include heart, cancer, and diabetes datasets. It is observed that the SVM classifier produces a better percentage of accuracy in classification. The work has been implemented in the WEKA environment, and the results obtained show that SVM is the most robust and effective classifier for medical data sets.


Author(s):  
Cataldo Zuccaro ◽  
Michel Plaisent ◽  
Prosper Bernard

This chapter presents a preliminary framework to tackle tax evasion in the field of residential renovation. This industry plays a major role in economic development and employment growth. Tax evasion and fraud are extremely difficult to combat in the industry since it is characterized by a large number of stakeholders (manufacturers, retailers, tradesmen, and households) generating complex transactional dynamics that often defy attempts to deploy transactional analytics to detect anomalies, fraud, and tax evasion. This chapter proposes a framework to apply transactional analytics and data mining to develop standard measures and predictive models to detect fraud and tax evasion. Combining big data sets, cross-referencing, and predictive modeling (i.e., anomaly detection, artificial neural networks, support vector machines, Bayesian networks, and association rules) can assist government agencies in combating highly stealthy tax evasion and fraud in residential renovation.


Author(s):  
Vladimir Nikulin ◽  
Tian-Hsiang Huang ◽  
Geoffrey J. McLachlan

The method presented in this paper is novel as a natural combination of two mutually dependent steps. Feature selection is a key element (first step) in our classification system, which was employed during the 2010 International RSCTC data mining (bioinformatics) Challenge. The second step may be implemented using any suitable classifier such as linear regression, support vector machine or neural networks. We conducted leave-one-out (LOO) experiments with several feature selection techniques and classifiers. Based on the LOO evaluations, we decided to use feature selection with the separation type Wilcoxon-based criterion for all final submissions. The method presented in this paper was tested successfully during the RSCTC data mining Challenge, where we achieved the top score in the Basic track.
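
A Wilcoxon-based feature ranking of the kind mentioned above can be sketched with the rank-sum (Mann-Whitney U) statistic. The abstract does not define the authors' "separation type" criterion, so the score below, the absolute deviation of U from its null mean, is an assumed stand-in, and the function names are hypothetical. Labels are assumed to be 0 or 1.

```python
def rank_sum_score(values, labels):
    """Absolute deviation of the Mann-Whitney U statistic from its null mean."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average 1-based rank shared by tied values
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    n1 = sum(labels)
    n0 = len(labels) - n1
    u = sum(r for r, y in zip(ranks, labels) if y == 1) - n1 * (n1 + 1) / 2
    return abs(u - n1 * n0 / 2)

def rank_features(X, y):
    """Return feature indices ordered from most to least class-separating."""
    d = len(X[0])
    scores = [rank_sum_score([row[j] for row in X], y) for j in range(d)]
    return sorted(range(d), key=lambda j: -scores[j])

# Hypothetical data: feature 0 separates the classes perfectly, feature 1 barely.
X = [[1, 5], [2, 1], [3, 4], [10, 2], [11, 6], [12, 3]]
y = [0, 0, 0, 1, 1, 1]
order = rank_features(X, y)
```

In a leave-one-out evaluation, the ranking would be recomputed on each training fold before fitting the second-step classifier, so that the held-out sample never influences feature selection.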


2012 ◽  
Vol 24 (4) ◽  
pp. 1047-1084 ◽  
Author(s):  
Xiao-Tong Yuan ◽  
Shuicheng Yan

We investigate Newton-type optimization methods for solving piecewise linear systems (PLSs) with nondegenerate coefficient matrices. Such systems arise, for example, from the numerical solution of the linear complementarity problem, which is useful for modeling several learning and optimization problems. In this letter, we propose an effective damped Newton method, PLS-DN, to find the exact (up to machine precision) solution of nondegenerate PLSs. PLS-DN exhibits a provable semiiterative property, that is, the algorithm converges globally to the exact solution in a finite number of iterations. The rate of convergence is shown to be at least linear before termination. We emphasize the applications of our method in modeling, from a novel perspective of PLSs, some statistical learning problems such as box-constrained least squares, elitist Lasso (Kowalski & Torrésani, 2008), and support vector machines (Cortes & Vapnik, 1995). Numerical results on synthetic and benchmark data sets are presented to demonstrate the effectiveness and efficiency of PLS-DN on these problems.


2000 ◽  
Vol 12 (11) ◽  
pp. 2655-2684 ◽  
Author(s):  
Manfred Opper ◽  
Ole Winther

We derive a mean-field algorithm for binary classification with gaussian processes that is based on the TAP approach originally proposed in statistical physics of disordered systems. The theory also yields an approximate leave-one-out estimator for the generalization error, which is computed with no extra computational cost. We show that from the TAP approach, it is possible to derive both a simpler “naive” mean-field theory and support vector machines (SVMs) as limiting cases. For both mean-field algorithms and support vector machines, simulation results for three small benchmark data sets are presented. They show that one may get state-of-the-art performance by using the leave-one-out estimator for model selection and the built-in leave-one-out estimators are extremely precise when compared to the exact leave-one-out estimate. The second result is taken as strong support for the internal consistency of the mean-field approach.

