A Novel Approach to Feature Selection Based on Quality Estimation Metrics

Author(s):  
Jean-Charles Lamirel ◽  
Pascal Cuxac ◽  
Kafil Hajlaoui
2017 ◽  
Vol 108 (1) ◽  
pp. 307-318 ◽  
Author(s):  
Eleftherios Avramidis

A deeper analysis of Comparative Quality Estimation is presented by extending the state-of-the-art methods with adequacy and grammatical features from other Quality Estimation tasks. The previously used linear method, unable to cope with the augmented feature set, is replaced with a boosting classifier assisted by feature selection. The resulting methods show improved performance for 6 language pairs when applied to the output of MT systems developed over 7 years, and the improved models compete better with reference-aware metrics. Notable conclusions are reached through an examination of the contribution of the features in the models, and it is possible to identify common MT errors that are captured by the features. Many grammatical/fluency features contribute strongly, a few adequacy features contribute somewhat, whereas source complexity features are of no use. The importance of many fluency and adequacy features is language-specific.
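As a rough, hypothetical illustration of the setup described above (not the authors' implementation), the sketch below pairs univariate feature selection with a gradient boosting classifier for a pairwise preference decision; the feature matrix, feature counts and labels are synthetic placeholders.

```python
# Illustrative sketch only: feature selection + boosting for comparative QE.
# X stands in for fluency/adequacy/complexity features of translation pairs,
# y for which translation of each pair was preferred (hypothetical data).
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 80))        # 80 candidate QE features per pair
y = rng.integers(0, 2, size=500)      # preferred system in each comparison

model = Pipeline([
    ("select", SelectKBest(f_classif, k=30)),  # keep the 30 strongest features
    ("boost", GradientBoostingClassifier(n_estimators=200, max_depth=3)),
])
print(cross_val_score(model, X, y, cv=5).mean())
```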


Author(s):  
Ch. Sanjeev Kumar Dash ◽  
Ajit Kumar Behera ◽  
Sarat Chandra Nayak

This chapter presents a novel approach for the classification of datasets by suitably tuning the parameters of radial basis function networks, with the additional cost of feature selection. Feeding an optimal and relevant set of features to a radial basis function network may greatly enhance the network's accuracy while at the same time compacting its size. In this chapter, the authors use information gain theory (a kind of filter approach) for reducing the features and differential evolution for tuning the centers and spreads of the radial basis functions. Different feature selection methods, the handling of missing values and the removal of inconsistencies to improve the classification accuracy of the proposed model are emphasized. The proposed approach is validated on a few benchmark highly skewed and balanced datasets retrieved from the University of California, Irvine (UCI) repository. The experimental study is encouraging for pursuing further extensive research on highly skewed data.
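A minimal sketch of this kind of pipeline, under stated assumptions rather than the chapter's exact implementation: mutual information serves as the information-gain-style filter, differential evolution tunes the RBF centers and spreads, and a logistic-regression output layer stands in for the network's trained output weights. The dataset, number of centers and bounds are illustrative only.

```python
# Illustrative sketch only: filter-based feature reduction + differential
# evolution tuning of RBF centers and spreads (assumed setup, not the chapter's code).
import numpy as np
from scipy.optimize import differential_evolution
from sklearn.datasets import load_iris
from sklearn.feature_selection import mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
keep = np.argsort(mutual_info_classif(X, y))[-2:]   # keep 2 most informative features
X = X[:, keep]
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

n_centers = 5
def rbf_features(X, params):
    centers = params[: n_centers * X.shape[1]].reshape(n_centers, X.shape[1])
    spreads = np.abs(params[n_centers * X.shape[1]:]) + 1e-3
    d = np.linalg.norm(X[:, None, :] - centers[None], axis=2)
    return np.exp(-(d ** 2) / (2 * spreads ** 2))

def loss(params):
    clf = LogisticRegression(max_iter=200).fit(rbf_features(Xtr, params), ytr)
    return 1 - clf.score(rbf_features(Xtr, params), ytr)   # training error

bounds = [(X.min(), X.max())] * (n_centers * X.shape[1]) + [(0.1, 5.0)] * n_centers
result = differential_evolution(loss, bounds, maxiter=10, popsize=10, seed=0)
clf = LogisticRegression(max_iter=200).fit(rbf_features(Xtr, result.x), ytr)
print("test accuracy:", clf.score(rbf_features(Xte, result.x), yte))
```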


2018 ◽  
Vol 5 (3) ◽  
pp. 1-20 ◽  
Author(s):  
Sharmila Subudhi ◽  
Suvasini Panigrahi

This article presents a novel approach for fraud detection in automobile insurance claims by applying various data mining techniques. Initially, the most relevant attributes are chosen from the original dataset by using an evolutionary-algorithm-based feature selection method. A test set is then extracted from the selected attribute set, and the remaining dataset is subjected to Possibilistic Fuzzy C-Means (PFCM) clustering for undersampling. The 10-fold cross-validation method is then used on the balanced dataset for training and validating a group of Weighted Extreme Learning Machine (WELM) classifiers generated from various combinations of WELM parameters. Finally, the test set is applied to the best-performing model for classification. The efficacy of the proposed system is illustrated by conducting several experiments on a real-world automobile insurance fraud dataset. In addition, a comparative analysis with another approach demonstrates the superiority of the proposed system.
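The sketch below gives a minimal Weighted Extreme Learning Machine of the kind referred to above, with per-sample weights inversely proportional to class frequency. The data, hidden-layer size and weighting scheme are assumptions, and the evolutionary feature selection and PFCM undersampling steps are omitted.

```python
# Illustrative sketch only: a minimal WELM for imbalanced binary classification.
# Hidden weights are random; output weights come from weighted ridge regression.
import numpy as np

class WELM:
    def __init__(self, n_hidden=50, C=1.0, seed=0):
        self.n_hidden, self.C, self.rng = n_hidden, C, np.random.default_rng(seed)

    def _hidden(self, X):
        return 1 / (1 + np.exp(-(X @ self.W + self.b)))   # sigmoid hidden layer

    def fit(self, X, y):
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = self._hidden(X)
        w = 1.0 / np.bincount(y)[y]            # weights counter class imbalance
        T = np.where(y == 1, 1.0, -1.0)        # bipolar targets
        WH = H * w[:, None]
        self.beta = np.linalg.solve(H.T @ WH + np.eye(self.n_hidden) / self.C,
                                    WH.T @ T)
        return self

    def predict(self, X):
        return (self._hidden(X) @ self.beta > 0).astype(int)

# hypothetical imbalanced data: ~5% positive (fraudulent) samples
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 10))
y = (rng.random(1000) < 0.05).astype(int)
print(WELM().fit(X, y).predict(X).mean())
```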


2017 ◽  
Vol 44 (3) ◽  
pp. 314-330 ◽  
Author(s):  
Fatemeh Shafiee ◽  
Mehrnoush Shamsfard

Automatic text summarisation is the process of creating a summary from one or more documents by eliminating details and preserving the worthwhile information. This article presents a single/multi-document summariser that uses a novel clustering method for creating summaries. First, a feature selection phase is employed. Then FarsNet, the Persian WordNet, is used to extract the semantic information of words. The input sentences are then categorised into three main kinds of clusters: similarity, relatedness and coherency. Each similarity cluster contains sentences similar to its core, while each relatedness cluster contains sentences that are related (but not similar) to its core. The coherency clusters indicate sentences that should be kept together to preserve the coherency of the summary. Finally, the centroid of the similarity cluster with the highest feature score is added to an initially empty summary. The summary is enlarged iteratively by including related sentences from relatedness clusters and excluding sentences similar to its current content. Coherency clusters are applied to the created summary in the last step. The proposed method has been compared with three existing text summarisation systems and techniques for the Persian language: FarsiSum, Parsumist and Ijaz. It improves experimental results on several measures, including precision, recall, F-measure, ROUGE-N and ROUGE-L.
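A toy sketch of the greedy summary construction described above, assuming precomputed similarity and relatedness clusters and a simple string-similarity measure standing in for the FarsNet-based semantics; all names, scores and thresholds are hypothetical.

```python
# Illustrative sketch only: seed the summary with the centroid of the
# highest-scoring similarity cluster, then add related sentences while
# skipping those too similar to content already in the summary.
from difflib import SequenceMatcher

def similar(a, b, threshold=0.6):
    return SequenceMatcher(None, a, b).ratio() >= threshold

def build_summary(similarity_clusters, relatedness_clusters, max_sentences=3):
    # similarity_clusters: list of (feature_score, centroid_sentence)
    _, best_centroid = max(similarity_clusters, key=lambda c: c[0])
    summary = [best_centroid]
    for cluster in relatedness_clusters:
        for sentence in cluster:
            if len(summary) >= max_sentences:
                return summary
            if not any(similar(sentence, s) for s in summary):
                summary.append(sentence)
    return summary

sim = [(0.9, "The parliament approved the new budget."),
       (0.4, "Ministers debated spending priorities.")]
rel = [["The budget increases funding for schools.",
        "The parliament approved the budget plan."]]
print(build_summary(sim, rel))
```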

