A Novel Model-Based Approach in Classification Using Extension Distance

2014 ◽  
Vol 644-650 ◽  
pp. 2009-2012 ◽  
Author(s):  
Hai Tao Zhang ◽  
Bin Jun Wang

To address the low efficiency of KNN- and K-Means-like algorithms in classification, a novel extension distance over intervals is proposed to measure the similarity between testing data and a class domain. The method constructs representatives of the data points, which replace the original dataset as the basis of classification, in less time than traditional methods. In effect, building the model of representatives makes classification faster. Experimental results on two benchmark data sets verify the effectiveness and applicability of the proposed work. The model-based method using extension distance can effectively build data models that represent the whole training data, thus solving the high cost of classifying new instances.
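For illustration, the classical extension distance of a point x from an interval [a, b] is |x − (a+b)/2| − (b−a)/2, which is negative inside the interval and positive outside. A minimal sketch of an interval-model classifier built on that idea follows; the per-feature min/max intervals and the summed-distance decision rule are illustrative assumptions, not the authors' exact construction:

```python
import numpy as np

def extension_distance(x, low, high):
    """Extension distance of scalar x from interval [low, high]:
    negative inside the interval, positive outside."""
    mid = (low + high) / 2.0
    return abs(x - mid) - (high - low) / 2.0

def fit_intervals(X, y):
    """Build one per-feature interval [min, max] for each class label."""
    models = {}
    for c in np.unique(y):
        Xc = X[y == c]
        models[c] = (Xc.min(axis=0), Xc.max(axis=0))
    return models

def predict(models, x):
    """Assign x to the class with the smallest summed extension distance."""
    scores = {c: sum(extension_distance(xi, lo, hi)
                     for xi, lo, hi in zip(x, low, high))
              for c, (low, high) in models.items()}
    return min(scores, key=scores.get)
```

Because the intervals are computed once from the training data, classifying a new point costs only one distance evaluation per class rather than a scan of the whole training set.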

2014 ◽  
Vol 574 ◽  
pp. 728-733
Author(s):  
Shu Xia Lu ◽  
Cai Hong Jiao ◽  
Le Tong ◽  
Yang Fan Zhou

Core Vector Machine (CVM) can handle large data sets by finding the minimum enclosing ball (MEB), but one drawback is that CVM is very sensitive to outliers. To tackle this problem, we propose a novel Position-Regularized Core Vector Machine (PCVM). In the proposed PCVM, the data points are regularized by assigning a position-based weighting. Experimental results on several benchmark data sets show that the performance of PCVM is much better than that of CVM.
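The MEB computation that CVM builds on can be approximated with the simple Bădoiu–Clarkson-style iteration: repeatedly step the center toward the current farthest point with a shrinking step size. A minimal sketch of the plain MEB approximation (without the paper's position-based weighting):

```python
import numpy as np

def approx_meb(X, n_iter=100):
    """Approximate the minimum enclosing ball of the rows of X.
    Each iteration moves the center toward the farthest point with
    step size 1/(i+1), a Badoiu-Clarkson-style core-set iteration."""
    c = X[0].astype(float).copy()
    for i in range(1, n_iter + 1):
        d = np.linalg.norm(X - c, axis=1)
        far = X[np.argmax(d)]
        c += (far - c) / (i + 1)  # shrinking step toward farthest point
    r = np.linalg.norm(X - c, axis=1).max()
    return c, r
```

A position-sensitive variant in the spirit of PCVM would down-weight points far from the bulk of the data before this iteration, so single outliers cannot inflate the ball.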


2021 ◽  
Vol 87 (6) ◽  
pp. 445-455
Author(s):  
Yi Ma ◽  
Zezhong Zheng ◽  
Yutang Ma ◽  
Mingcang Zhu ◽  
Ran Huang ◽  
...  

Many manifold learning algorithms conduct an eigenvector analysis on a data-similarity matrix of size N×N, where N is the number of data points, so the memory complexity of the analysis is no less than O(N²). We present in this article an incremental manifold learning approach to handle large hyperspectral data sets for land-use identification. In our method, the number of dimensions for the high-dimensional hyperspectral-image data set is obtained from the training data set. A local curvature variation algorithm is utilized to sample a subset of data points as landmarks, and a manifold skeleton is then identified based on the landmarks. Our method is validated on three AVIRIS hyperspectral data sets, outperforming the comparison algorithms with a k-nearest-neighbor classifier and achieving the second-best performance with a support vector machine.


2012 ◽  
Vol 532-533 ◽  
pp. 1373-1377 ◽  
Author(s):  
Ai Ping Deng ◽  
Ben Xiao ◽  
Hui Yong Yuan

To address the disadvantages of the K-means algorithm, namely that the number of clusters must be specified in advance and that the result is sensitive to the choice of initial cluster centers, an improved K-means algorithm is proposed in which the cluster centers and the number of clusters change dynamically. The new algorithm determines the cluster centers by calculating the density of data points and the shared-nearest-neighbor similarity, and controls the number of clustering categories by using the average shared-nearest-neighbor self-similarity. Experimental results on the IRIS testing data set show that the algorithm can select the cluster centers and distinguish between different types of clusters efficiently.
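The core quantity here, shared-nearest-neighbor similarity, counts how many k-nearest neighbors two points have in common. A minimal sketch of just that quantity (the density calculation and self-similarity threshold are omitted; function names are illustrative):

```python
import numpy as np

def knn_lists(X, k):
    """Indices of the k nearest neighbors of each row of X (excluding itself)."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # a point is not its own neighbor
    return np.argsort(d, axis=1)[:, :k]

def snn_similarity(X, k=3):
    """Shared-nearest-neighbor similarity matrix: entry (i, j) is the
    number of neighbors the k-NN lists of points i and j share."""
    nn = knn_lists(X, k)
    n = len(X)
    S = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(n):
            S[i, j] = len(set(nn[i]) & set(nn[j]))
    return S
```

Points in the same dense region share many neighbors and thus get a high similarity, while points in different clusters share none, which is what lets the algorithm merge or separate clusters without a preset k.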


2016 ◽  
Vol 12 (4) ◽  
pp. 448-476 ◽  
Author(s):  
Amir Hosein Keyhanipour ◽  
Behzad Moshiri ◽  
Maryam Piroozmand ◽  
Farhad Oroumchian ◽  
Ali Moeini

Purpose: Learning-to-rank algorithms inherently face many challenges. The most important are the high dimensionality of the training data, the dynamic nature of Web information resources, and the lack of click-through data. High dimensionality of the training data affects the effectiveness and efficiency of learning algorithms. Besides, most learning-to-rank benchmark data sets do not include click-through data, a very rich source of information about the search behavior of users when dealing with ranked lists of search results. To deal with these limitations, this paper aims to introduce a novel learning-to-rank algorithm that uses a set of complex click-through features in a reinforcement learning (RL) model. These features are calculated from the existing click-through information in the data set, or even from data sets without any explicit click-through information.

Design/methodology/approach: The proposed ranking algorithm (QRC-Rank) applies RL techniques to a set of calculated click-through features. QRC-Rank is a two-step process. In the first step, the Transformation phase, a compact benchmark data set is created which contains a set of click-through features. These features are calculated from the original click-through information available in the data set and constitute a compact representation of it. To find the most effective click-through features, a number of scenarios are investigated. The second phase is Model-Generation, in which an RL model is built to rank the documents. This model is created by applying temporal-difference learning methods such as Q-Learning and SARSA.

Findings: The proposed learning-to-rank method, QRC-Rank, is evaluated on the WCL2R and LETOR4.0 data sets. Experimental results demonstrate that QRC-Rank outperforms state-of-the-art learning-to-rank methods such as SVMRank, RankBoost, ListNet, and AdaRank on the precision and normalized discounted cumulative gain (NDCG) evaluation criteria. The use of the click-through features calculated from the training data set is a major contributor to the performance of the system.

Originality/value: In this paper, we have demonstrated the viability of the proposed features, which provide a compact representation of the click-through data in a learning-to-rank application. These compact click-through features are calculated from the original features of the learning-to-rank benchmark data set. In addition, a Markov Decision Process model is proposed for the learning-to-rank problem using RL, including the sets of states, actions, the rewarding strategy, and the transition function.
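The temporal-difference update at the heart of Q-Learning can be sketched on a toy problem; the chain environment and hyperparameters below are illustrative stand-ins, not the paper's ranking MDP or reward design:

```python
import random

# Toy chain MDP: states 0..3, actions 0 (left) and 1 (right);
# reaching state 3 yields reward 1 and ends the episode.
def step(s, a):
    s2 = min(s + 1, 3) if a == 1 else max(s - 1, 0)
    return s2, (1.0 if s2 == 3 else 0.0), s2 == 3

alpha, gamma, eps = 0.5, 0.9, 0.1   # learning rate, discount, exploration
Q = [[0.0, 0.0] for _ in range(4)]  # tabular action values

random.seed(0)
for _ in range(500):
    s, done = 0, False
    while not done:
        # epsilon-greedy action choice, random on ties
        if random.random() < eps or Q[s][0] == Q[s][1]:
            a = random.randrange(2)
        else:
            a = 0 if Q[s][0] > Q[s][1] else 1
        s2, r, done = step(s, a)
        # Q-Learning temporal-difference update
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) * (not done) - Q[s][a])
        s = s2
```

SARSA differs only in the bootstrap term: it uses the Q-value of the action actually taken in the next state rather than the max.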


2018 ◽  
Vol 13 (3) ◽  
pp. 408-428 ◽  
Author(s):  
Phu Vo Ngoc

Sentiment classification has many crucial applications in everyday life, such as in political activities, commodity production, and commercial activities, and we have surveyed many significant approaches to it over the years. We propose a novel model using Latent Semantic Analysis (LSA) and a Dennis Coefficient (DNC) for big-data sentiment classification in English. Many LSA vectors (LSAVs) have successfully been reformed by using the DNC. We use the DNC and the LSAVs to classify the 11,000,000 documents of our testing data set against the 5,000,000 documents of our training data set in English. This novel model uses many sentiment lexicons of our basis English sentiment dictionary (bESD). We have tested the proposed model in both a sequential environment and a distributed network system; the results of the sequential system are not as good as those of the parallel environment. We achieved 88.76% accuracy on the testing data set, which is better than the accuracies of many previous semantic-analysis models. Moreover, we compared the novel model with the previous models, and the experimental results of our proposed model are better than those of the previous models. Many different fields can widely use the results of the novel model in commercial applications and surveys of sentiment classification.


Author(s):  
Rina Refianti ◽  
Achmad Benny Mutiara ◽  
Asep Juarna ◽  
Adang Suhendra

In recent years, two new data clustering algorithms have been proposed. One of them is Affinity Propagation (AP). AP is a data clustering technique that uses iterative message passing and considers all data points as potential exemplars. Two important inputs of AP are a similarity matrix (SM) of the data and the parameter "preference" p. Although the original AP algorithm has shown much success in data clustering, it still suffers from one limitation: it is not easy to determine the value of the parameter "preference" p that results in an optimal clustering solution. To resolve this limitation, we propose a new model of the parameter "preference" p, i.e., it is modeled based on the similarity distribution. Given the SM and p, the Modified Adaptive AP (MAAP) procedure is run. The MAAP procedure means that we omit the adaptive p-scanning algorithm of the original Adaptive AP (AAP) procedure. Experimental results on random non-partition and partition data sets show that (i) the proposed algorithm, MAAP-DDP, is slower than the original AP for the random non-partition dataset, and (ii) for the random 4-partition dataset and real datasets the proposed algorithm succeeds in identifying clusters according to the number of the datasets' true labels, with execution times comparable to those of the original AP. Moreover, the MAAP-DDP algorithm proves more feasible and effective than the original AAP procedure.
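AP's two inputs can be sketched directly: a negative-squared-Euclidean similarity matrix and a scalar preference p. In the sketch below p is taken as the median of the off-diagonal similarities, which is the common default; the paper's contribution is instead a model of p derived from the similarity distribution:

```python
import numpy as np

def similarity_and_preference(X):
    """AP inputs: negative squared Euclidean similarity matrix S, and a
    preference p set to the median off-diagonal similarity (the common
    default; the paper models p from the similarity distribution instead)."""
    diff = X[:, None, :] - X[None, :, :]
    S = -np.sum(diff ** 2, axis=2)              # S[i, j] = -||x_i - x_j||^2
    off = S[~np.eye(len(X), dtype=bool)]        # exclude the diagonal
    p = np.median(off)
    return S, p
```

Larger (less negative) preferences yield more exemplars and hence more clusters, which is why the choice of p controls the clustering solution so strongly.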


2017 ◽  
Vol 8 (3) ◽  
pp. 24-36 ◽  
Author(s):  
Rabindra K. Barik ◽  
Rojalina Priyadarshini ◽  
Nilamadhab Dash

The paper contains an extensive experimental study of Target Optimization (TO) prior to the training process of artificial machines. Generally, during the training process of an artificial machine, the output is computed from two important parameters, the input and the target. In general practice the input is taken from the training data while the target is randomly chosen, so it may not be relevant to the corresponding training data, and hence the overall training of the neural network becomes inefficient. The present study puts forward TO as an efficient methodology for addressing this problem. The proposed work implements the concept of TO and compares the outcomes with those of conventional classifiers. In this regard, different benchmark data sets are used to compare the effect of TO on data classification using Particle Swarm Optimization (PSO) and Gravitational Search Algorithm (GSA) optimization techniques.


2015 ◽  
Vol 41 (2) ◽  
pp. 293-336 ◽  
Author(s):  
Li Dong ◽  
Furu Wei ◽  
Shujie Liu ◽  
Ming Zhou ◽  
Ke Xu

We present a statistical parsing framework for sentence-level sentiment classification in this article. Unlike previous works that use syntactic parsing results for sentiment analysis, we develop a statistical parser to directly analyze the sentiment structure of a sentence. We show that complicated phenomena in sentiment analysis (e.g., negation, intensification, and contrast) can be handled the same way as simple and straightforward sentiment expressions in a unified and probabilistic way. We formulate the sentiment grammar upon Context-Free Grammars (CFGs), and provide a formal description of the sentiment parsing framework. We develop the parsing model to obtain possible sentiment parse trees for a sentence, from which the polarity model is proposed to derive the sentiment strength and polarity, and the ranking model is dedicated to selecting the best sentiment tree. We train the parser directly from examples of sentences annotated only with sentiment polarity labels but without any syntactic annotations or polarity annotations of constituents within sentences. Therefore we can obtain training data easily. In particular, we train a sentiment parser, s.parser, from a large amount of review sentences with users' ratings as rough sentiment polarity labels. Extensive experiments on existing benchmark data sets show significant improvements over baseline sentiment classification approaches.


2013 ◽  
Vol 380-384 ◽  
pp. 2811-2816
Author(s):  
Kai Lei ◽  
Yi Fan Zeng

Query-oriented multi-document summarization (QMDS) attempts to generate a concise piece of text by extracting sentences from a target document collection, with the aim of not only conveying the key content of that corpus but also satisfying the information needs expressed by the query. Due to its great practical value, QMDS has been intensively studied in recent decades. Three properties are considered crucial for a good summary: relevance, prestige, and low redundancy (or so-called diversity). Unfortunately, most existing work either disregards diversity or handles it with non-optimized heuristics, usually based on greedy sentence selection. Inspired by the manifold-ranking process, which deals with query-biased prestige, and the DivRank algorithm, which captures query-independent diversity ranking, we propose in this paper a novel biased diversity ranking model, named ManifoldDivRank, for query-sensitive summarization tasks. The top-ranked sentences discovered by our algorithm not only enjoy high query-oriented prestige; more importantly, they are dissimilar from each other. Experimental results on the DUC2005 and DUC2006 benchmark data sets demonstrate the effectiveness of our proposal.


2014 ◽  
Vol 21 (1) ◽  
pp. 67-74 ◽  
Author(s):  
Mohamed Marzouk ◽  
Mohamed Alaraby

This paper presents a fuzzy subtractive modelling technique to predict the weight of telecommunication towers, which is used to estimate their respective costs. This is implemented through the utilization of data from previously installed telecommunication towers, considering four input parameters: a) tower height; b) allowed tilt or deflection; c) antenna subjected-area loading; and d) wind load. Telecommunication towers are classified according to designated code (TIA-222-F and TIA-222-G standards) and structure type (Self-Supporting Tower (SST) and Roof Top (RT)). As such, four fuzzy subtractive models are developed to represent the four classes. To build the fuzzy models, 90% of the data are utilized and fed to Matlab software as training data; the remaining 10% are utilized to test model performance. A first-order Sugeno-type model is used to optimize performance in predicting tower weights. Errors are estimated using the Mean Absolute Percentage Error (MAPE) and the Root Mean Square Error (RMSE) for both training and testing data sets. Sensitivity analysis is carried out to validate the model and observe the effect of the clusters' radius on model performance.
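The two error measures used above can be sketched directly; this is the standard formulation of each metric (the paper's own computation is done in Matlab):

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean Absolute Percentage Error, in percent.
    Assumes no true value is zero."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))

def rmse(y_true, y_pred):
    """Root Mean Square Error."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))
```

MAPE reports relative error, which is convenient when tower weights span a wide range, while RMSE penalizes large absolute misses more heavily; reporting both, as the paper does, covers both views.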

