Maximum Neighborhood Margin Discriminant Projection for Classification

2014, Vol 2014, pp. 1-16
Author(s): Jianping Gou, Yongzhao Zhan, Min Wan, Xiangjun Shen, Jinfu Chen, ...

We develop a novel maximum neighborhood margin discriminant projection (MNMDP) technique for dimensionality reduction of high-dimensional data. It uses both local information and class information to model the intraclass and interclass neighborhood scatters. By maximizing the margin between the intraclass and interclass neighborhoods of all points, MNMDP can not only detect the true intrinsic manifold structure of the data but also strengthen pattern discrimination among different classes. To verify the classification performance of the proposed MNMDP, we apply it to the PolyU HRF and FKP databases, the AR face database, and the UCI Musk database, in comparison with competing methods such as PCA and LDA. The experimental results demonstrate the effectiveness of MNMDP in pattern classification.
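The abstract does not give the exact neighborhood weighting used by MNMDP, but the core idea, contrasting intraclass against interclass neighborhood scatter and maximizing their margin, can be sketched roughly as below in Python/NumPy; the uniform neighbor weights and the trace-difference criterion solved by eigendecomposition are assumptions for illustration, not the paper's formulation.

```python
import numpy as np

def mnmdp_sketch(X, y, k=5, d=2):
    """Rough sketch of a neighborhood-margin discriminant projection.

    Builds intraclass and interclass neighborhood scatter matrices from the
    k nearest neighbors of each point and maximizes their margin (difference)
    via an eigendecomposition. Neighbors are weighted uniformly here.
    """
    n, D = X.shape
    # Pairwise squared Euclidean distances (fine for a small sketch).
    dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(dist, np.inf)

    S_intra = np.zeros((D, D))
    S_inter = np.zeros((D, D))
    for i in range(n):
        same = np.where(y == y[i])[0]
        same = same[same != i]
        diff = np.where(y != y[i])[0]
        # k nearest neighbors within the same class and among the other classes.
        nn_same = same[np.argsort(dist[i, same])[:k]]
        nn_diff = diff[np.argsort(dist[i, diff])[:k]]
        for j in nn_same:
            e = (X[i] - X[j])[:, None]
            S_intra += e @ e.T
        for j in nn_diff:
            e = (X[i] - X[j])[:, None]
            S_inter += e @ e.T

    # Maximize trace(W^T (S_inter - S_intra) W): take the top-d eigenvectors.
    vals, vecs = np.linalg.eigh(S_inter - S_intra)
    W = vecs[:, np.argsort(vals)[::-1][:d]]
    return X @ W, W
```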

2014, Vol 2014, pp. 1-5
Author(s): Fuding Xie, Yutao Fan, Ming Zhou

Dimensionality reduction is the transformation of high-dimensional data into a meaningful representation of reduced dimensionality. This paper introduces a dimensionality reduction technique that uses weighted connections between neighborhoods to improve the K-Isomap method, aiming to preserve the relationships between neighborhoods during dimensionality reduction. The validity of the proposal is tested on three typical examples widely employed in manifold-based algorithms. The experimental results show that the proposed method preserves the local topology of the dataset well while transforming it from the high-dimensional space into a new low-dimensional dataset.
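The specific neighborhood-weighting scheme of the proposal is not described in the abstract; the sketch below only illustrates the K-Isomap skeleton it builds on (k-NN graph, geodesic distances, classical MDS), with plain Euclidean edge weights as a stand-in for the paper's weighted connections.

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path
from sklearn.neighbors import kneighbors_graph

def isomap_sketch(X, k=10, d=2):
    """Sketch of a K-Isomap-style embedding; assumes the k-NN graph is connected."""
    # k-nearest-neighbor graph with Euclidean edge weights.
    G = kneighbors_graph(X, n_neighbors=k, mode="distance")
    # Geodesic distances as shortest paths over the neighborhood graph.
    D = shortest_path(G, method="D", directed=False)
    # Classical MDS on the geodesic distance matrix.
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    vals, vecs = np.linalg.eigh(B)
    idx = np.argsort(vals)[::-1][:d]
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0))
```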


Author(s): Bo Yang, Songcan Chen

Many locality-based unsupervised dimensionality reduction (DR) algorithms have recently been proposed and shown to be effective, to a certain degree, in some classification tasks. In this paper, we aim to show that: (1) a form of discrimination is intentionally or unintentionally induced by the construction of locality in these unsupervised algorithms; however, it is often inconsistent with the actual class information, and so we call it disguised discrimination; (2) the sensitivity of these algorithms to local neighborhood parameters stems from the inconsistency between the disguised discrimination and the actual class information; (3) this inconsistency impacts the classification performance of these algorithms. Experiments on benchmark face datasets support our statements, which we expect to provide some insight into unsupervised learning based on locality.


2019, Vol 2019, pp. 1-10
Author(s): Zhibo Guo, Ying Zhang

It is very difficult to process and analyze high-dimensional data directly. It is therefore necessary to learn a latent subspace of the high-dimensional data through effective dimensionality reduction algorithms that preserve its intrinsic structure and discard the less useful information. Principal component analysis (PCA) and linear discriminant analysis (LDA) are two popular dimensionality reduction methods for preprocessing high-dimensional sensor data. LDA comprises two basic variants, namely classic linear discriminant analysis and FS linear discriminant analysis. In this paper, a new method, called similar distribution discriminant analysis (SDDA), is proposed based on the similarity of the samples' distributions, and a method for solving the optimal discriminant vectors is given. These discriminant vectors are orthogonal and nearly statistically uncorrelated. SDDA overcomes the disadvantages of PCA and LDA and extracts more effective features, and its recognition performance substantially exceeds that of PCA and LDA. Experiments on the Yale face database, the FERET face database, and the UCI Multiple Features dataset demonstrate that the proposed method is effective and obtains better performance than the compared dimensionality reduction methods.
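The SDDA construction itself is not spelled out in this abstract, so the snippet below only sketches the kind of PCA/LDA baseline pipeline that SDDA is compared against, using scikit-learn and the digits dataset as a stand-in for the face databases used in the paper.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Baseline pipeline of the kind SDDA is compared against: reduce, then classify.
X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

for name, reducer in [("PCA", PCA(n_components=30)),
                      ("LDA", LinearDiscriminantAnalysis(n_components=9))]:
    Z_tr = reducer.fit_transform(X_tr, y_tr)   # PCA ignores y; LDA uses it
    Z_te = reducer.transform(X_te)
    clf = KNeighborsClassifier(n_neighbors=1).fit(Z_tr, y_tr)
    print(name, "1-NN accuracy:", clf.score(Z_te, y_te))
```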


2020, Vol 2020, pp. 1-16
Author(s): Yujia Sun, Jan Platoš

This study focuses on clustering high-dimensional text data, given the inability of K-means to handle high-dimensional data and its need to specify the number of clusters and to select the initial centers at random. We propose a Stacked-Random Projection dimensionality reduction framework and an enhanced K-means algorithm, DPC-K-means, based on an improved density peaks algorithm. The improved density peaks algorithm determines the number of clusters and the initial cluster centers for K-means. The proposed algorithm is validated on seven text datasets. Experimental results show that it is well suited to clustering text data and corrects the above defects of K-means.
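A rough sketch of the seeding idea follows: the density peaks method (Rodriguez and Laio, 2014) scores each point by its local density and its distance to the nearest denser point, and the top-scoring points become the initial K-means centers. The paper's specific improvements to the density peaks algorithm and its Stacked-Random Projection features are not reproduced here; the cutoff quantile and the synthetic data are only illustrative.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.cluster import KMeans

def dpc_initial_centers(X, n_clusters, dc_quantile=0.02):
    """Pick initial centers in the density peaks style: points combining high
    local density with a large distance to any denser point."""
    d = pdist(X)
    D = squareform(d)
    dc = np.quantile(d, dc_quantile)                 # cutoff distance
    rho = np.exp(-(D / dc) ** 2).sum(axis=1) - 1.0   # Gaussian local density
    delta = np.empty(len(X))
    for i in range(len(X)):
        denser = np.where(rho > rho[i])[0]
        delta[i] = D[i].max() if denser.size == 0 else D[i, denser].min()
    centers = np.argsort(rho * delta)[::-1][:n_clusters]
    return X[centers]

# Seed K-means with the density-peak centers instead of a random initialization.
X = np.random.default_rng(0).normal(size=(300, 20))   # stand-in for reduced text features
centers = dpc_initial_centers(X, n_clusters=3)
labels = KMeans(n_clusters=3, init=centers, n_init=1).fit_predict(X)
```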


2021
Author(s): Wenbin Pei

Class imbalance and high dimensionality have been acknowledged as two tough issues in classification. When learning from unbalanced data, the constructed classifiers are often biased towards the majority class and thereby perform poorly on the minority class. Unfortunately, the minority class is often the class of interest in many real-world applications, such as medical diagnosis and fault detection. High dimensionality often makes it more difficult to handle the class imbalance issue. To date, most existing works attempt to address one of the two issues without considering the other, and so cannot be effectively applied to challenging classification tasks that suffer from both.

Genetic programming (GP) is one of the most popular techniques in evolutionary computation and has been widely applied to classification tasks. The built-in feature selection ability of GP makes it very powerful for classification with high-dimensional data. However, if the class imbalance issue is not well addressed, the constructed GP classifiers are often biased towards the majority class. Accordingly, this thesis aims to address the joint effects of class imbalance and high dimensionality by developing new GP-based classification approaches, with the goal of improving classification performance.

To effectively and efficiently address the performance bias issue of GP, this thesis develops a fitness function that considers two criteria, namely an approximation of the area under the curve (AUC) and classification clarity (i.e. how well a program can separate the two classes). To further improve efficiency, a new program reuse mechanism is designed to reuse previously effective GP individuals. According to the experimental results, GP with the new fitness function and the program reuse mechanism achieves good performance and significantly reduces training time. However, this method treats the two criteria equally, which is not always reasonable.

To avoid manually weighting the two criteria during fitness evaluation, we propose a novel two-criterion fitness evaluation method in which the values obtained on the two criteria are combined in pairs instead of being summed together. A three-criterion tournament selection is then designed to effectively identify and select good programs for the genetic operators to use when generating better offspring during the evolutionary learning process. Experimental results show that the proposed GP method achieves better classification performance than the compared methods.

Cost-sensitive learning is a popular approach to the class imbalance problem for many classification algorithms in machine learning. However, cost-sensitive algorithms depend on cost matrices that are usually designed manually. Unfortunately, it is often not easy for humans, even experts, to accurately specify misclassification costs for different mistakes, owing to a lack of or incomplete domain knowledge about the actual situations in many complex tasks, so these cost-sensitive algorithms cannot be applied directly. This thesis develops new GP-based approaches for constructing cost-sensitive classifiers without requiring cost matrices from humans. The newly developed cost-sensitive GP methods construct classifiers and learn cost values or intervals automatically and simultaneously. The experimental results show that the new cost-sensitive GP methods outperform the compared methods for high-dimensional unbalanced classification in almost all comparisons.

Cost-sensitive GP classifiers treat the minority class as more important than the majority class, but this may cause an accuracy decrease in overlapping areas where the prior probabilities of the two classes are about the same. In the thesis, we propose a neighborhood method to detect overlapping areas and then use GP to develop cost-sensitive classifiers that employ different classification strategies for instances from the overlapping and the non-overlapping areas.
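As a rough illustration of the two-criterion fitness described above, the sketch below scores a candidate program (any callable standing in for a GP tree) by an AUC estimate and a simple output-separation "clarity" term; the exact formulations in the thesis, and how the pair of values is used during selection, are not reproduced here.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def two_criterion_fitness(program, X, y):
    """Sketch of a two-criterion fitness for a GP classifier on unbalanced data.

    `program` maps a feature vector to a real output. The two criteria are a
    threshold-free AUC estimate and a 'clarity' term measuring how well the
    outputs of the two classes are separated; they are returned as a pair
    rather than summed, mirroring the pairing idea described above.
    """
    out = np.array([program(x) for x in X])
    auc = roc_auc_score(y, out)
    pos, neg = out[y == 1], out[y == 0]
    clarity = (pos.mean() - neg.mean()) / (out.std() + 1e-12)
    return auc, clarity

# Example candidate program: a fixed linear combination of two features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)
print(two_criterion_fitness(lambda x: x[0] + 0.4 * x[1], X, y))
```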


2010, Vol 7 (1), pp. 127-138
Author(s): Zhao Zhang, Ye Ning

Dimensionality reduction is an important preprocessing step in high-dimensional data analysis that should not lose intrinsic information. We consider a semi-supervised nonlinear dimensionality reduction method, called KNDR, for wood defect recognition. In this setting, domain knowledge in the form of pairwise constraints is used to specify whether pairs of instances belong to the same class or to different classes. KNDR projects the data onto a set of 'useful' features and preserves the structure of the labeled and unlabeled data, as well as the constraints defined in the embedding space, under which the projections of the original data can be effectively partitioned from each other. We demonstrate the practical usefulness of KNDR for data visualization and wood defect recognition through extensive experiments. Experimental results show that it achieves similar or even higher performance than some existing methods.
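The pairwise supervision that KNDR relies on can be illustrated with a small helper that samples must-link and cannot-link pairs from the few labelled instances; the helper name and the sampling scheme are only illustrative, not the paper's procedure.

```python
import numpy as np

def pairwise_constraints(y_partial, n_pairs=100, seed=0):
    """Sample must-link pairs (same class) and cannot-link pairs (different
    classes) from labelled instances; -1 marks unlabelled data and is never
    used in a constraint."""
    rng = np.random.default_rng(seed)
    labelled = np.where(y_partial != -1)[0]
    must_link, cannot_link = [], []
    while len(must_link) + len(cannot_link) < n_pairs and labelled.size >= 2:
        i, j = rng.choice(labelled, size=2, replace=False)
        (must_link if y_partial[i] == y_partial[j] else cannot_link).append((i, j))
    return must_link, cannot_link
```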


Author(s): Iwan Syarif

Classification problems, especially on high-dimensional datasets, have attracted many researchers seeking efficient approaches to address them. However, classification becomes very complicated when the number of possible combinations of variables is very high. In this research, we evaluate the performance of the Genetic Algorithm (GA) and Particle Swarm Optimization (PSO) as feature selection algorithms applied to high-dimensional datasets. Our experiments show that, in terms of dimensionality reduction, PSO is much better than GA: PSO reduced the number of attributes of 8 datasets to 13.47% of the original on average, while GA reduced them only to 31.36% on average. In terms of classification performance, GA is slightly better than PSO: GA-reduced datasets outperform their original versions on 5 of the 8 datasets, whereas PSO-reduced datasets do so on only 3 of the 8. Keywords: feature selection, dimensionality reduction, Genetic Algorithm (GA), Particle Swarm Optimization (PSO).
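As a hedged illustration of the wrapper setup being compared, the sketch below runs a binary PSO over feature masks with a 1-NN cross-validation fitness; the swarm parameters, the classifier, and the synthetic data are assumptions, not the paper's experimental settings.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def pso_feature_selection(X, y, n_particles=20, n_iter=30, seed=0):
    """Wrapper feature selection with binary PSO: each particle is a 0/1 mask
    over features; fitness is the cross-validated accuracy of a 1-NN
    classifier on the selected features."""
    rng = np.random.default_rng(seed)
    n_feat = X.shape[1]
    pos = (rng.random((n_particles, n_feat)) < 0.5).astype(float)
    vel = rng.normal(scale=0.1, size=(n_particles, n_feat))

    def fitness(mask):
        cols = mask.astype(bool)
        if not cols.any():
            return 0.0
        return cross_val_score(KNeighborsClassifier(n_neighbors=1),
                               X[:, cols], y, cv=3).mean()

    pbest = pos.copy()
    pbest_fit = np.array([fitness(m) for m in pos])
    gbest = pbest[pbest_fit.argmax()].copy()
    for _ in range(n_iter):
        r1, r2 = rng.random((2, n_particles, n_feat))
        vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
        # Sigmoid transfer function turns velocities into bit probabilities.
        pos = (rng.random((n_particles, n_feat)) < 1.0 / (1.0 + np.exp(-vel))).astype(float)
        fit = np.array([fitness(m) for m in pos])
        improved = fit > pbest_fit
        pbest[improved] = pos[improved]
        pbest_fit[improved] = fit[improved]
        gbest = pbest[pbest_fit.argmax()].copy()
    return gbest.astype(bool), pbest_fit.max()

# Example on a small synthetic dataset (the paper's 8 datasets are not bundled).
from sklearn.datasets import make_classification
X, y = make_classification(n_samples=150, n_features=30, n_informative=5, random_state=0)
mask, acc = pso_feature_selection(X, y)
print(mask.sum(), "features selected, CV accuracy", round(acc, 3))
```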


2021, pp. 1-12
Author(s): Heming Jia, Chunbo Lang

The salp swarm algorithm (SSA) is a meta-heuristic algorithm proposed in recent years that shows certain advantages in some optimization tasks. However, as the problems to be solved become more difficult (e.g. multi-modal or high-dimensional), the convergence accuracy and stability of SSA decrease. To overcome these drawbacks, a salp swarm algorithm with a crossover scheme and Lévy flight (SSACL) is proposed. The crossover scheme and the Lévy flight strategy are used to improve the movement patterns of the salp leader and the followers, respectively. Experiments have been conducted on various test functions, including unimodal, multimodal, and composite functions. The experimental results indicate that the proposed SSACL algorithm outperforms other advanced algorithms in terms of precision, stability, and efficiency. Furthermore, the Wilcoxon rank-sum test confirms the advantages of the proposed method in a statistically meaningful way.
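A minimal sketch of the ingredients follows: the standard SSA leader/follower updates (Mirjalili et al., 2017) plus a Lévy-flight step generated with Mantegna's algorithm. How SSACL actually combines its crossover scheme and the Lévy flight with these updates is not described in the abstract, so the follower-side perturbation below is only an assumption for illustration.

```python
import math
import numpy as np

def levy(size, beta=1.5, rng=None):
    """Mantegna's algorithm for Lévy-flight step lengths."""
    if rng is None:
        rng = np.random.default_rng()
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2) /
             (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.normal(0, sigma, size)
    v = rng.normal(0, 1, size)
    return u / np.abs(v) ** (1 / beta)

def ssa_levy(obj, lb, ub, n=30, T=200, seed=0):
    """Base SSA with a Lévy-flight perturbation on the followers (an assumption);
    the SSACL crossover scheme is not reproduced."""
    rng = np.random.default_rng(seed)
    dim = len(lb)
    X = rng.uniform(lb, ub, (n, dim))
    F = min(X, key=obj).copy()                     # food source = best salp so far
    for t in range(1, T + 1):
        c1 = 2 * np.exp(-(4 * t / T) ** 2)         # exploration/exploitation balance
        for i in range(n):
            if i < n // 2:                         # leaders move around the food source
                c2, c3 = rng.random(dim), rng.random(dim)
                step = c1 * ((ub - lb) * c2 + lb)
                X[i] = np.where(c3 < 0.5, F + step, F - step)
            else:                                  # followers: chain update + Lévy step
                X[i] = 0.5 * (X[i] + X[i - 1]) + 0.01 * levy(dim, rng=rng) * (X[i] - F)
        X = np.clip(X, lb, ub)
        best = min(X, key=obj)
        if obj(best) < obj(F):
            F = best.copy()
    return F, obj(F)

# Example: minimize the sphere function in 10 dimensions.
lb, ub = np.full(10, -5.0), np.full(10, 5.0)
print(ssa_levy(lambda x: float((x ** 2).sum()), lb, ub))
```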

