Class balancing in customer segments classification using support vector machine rule extraction and ensemble learning

Author(s):  
Suncica Rogic ◽  
Ljiljana Kascelan

An objective, data-based market segmentation is a precondition for efficient targeting in direct marketing campaigns. The role of customer segment classification in direct marketing is to predict, from previous purchasing behavior, which customers belong to the most valuable segment, that is, those most likely to respond to a campaign. A well-performing predictive model can significantly increase revenue and also reduce unnecessary campaign costs. Because this segment of customers is generally the smallest, most classification methods tend to misclassify the minor class. To overcome this problem, this paper proposes a class balancing approach based on Support Vector Machine-Rule Extraction (SVM-RE) and ensemble learning. Additionally, this approach allows for rule extraction, which can describe and explain the different customer segments. Using a customer base from a company's direct marketing campaigns, the proposed approach is compared with other data balancing methods in terms of overall prediction accuracy, recall and precision for the minor class, and campaign profitability. The method was found to perform better than the other class balancing methods compared on all of these criteria. Finally, the results confirm the superiority of the ensemble SVM method as a preprocessor that effectively balances data in customer segment classification.
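The evaluation criteria named above (overall accuracy, and recall and precision for the minor class) are standard confusion-matrix quantities. The sketch below computes them, assuming binary labels in which the most valuable (minor) segment is encoded as `1`; that encoding and the function name `minor_class_metrics` are illustrative assumptions, and the SVM-RE balancing step itself is not shown.

```python
from typing import Dict, Sequence


def minor_class_metrics(y_true: Sequence[int], y_pred: Sequence[int],
                        minor: int = 1) -> Dict[str, float]:
    """Overall accuracy plus recall and precision for the minor class.

    Assumes the minor (most valuable) segment is labelled `minor` (1 by default).
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == minor and p == minor)  # valuable, found
    fp = sum(1 for t, p in pairs if t != minor and p == minor)  # wasted contact
    fn = sum(1 for t, p in pairs if t == minor and p != minor)  # valuable, missed
    correct = sum(1 for t, p in pairs if t == p)
    return {
        "accuracy": correct / len(pairs),
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
    }


if __name__ == "__main__":
    # Toy data: 9 ordinary customers, 1 valuable one; the model predicts "ordinary" for all.
    print(minor_class_metrics([0] * 9 + [1], [0] * 10))
    # {'accuracy': 0.9, 'recall': 0.0, 'precision': 0.0}
```

On this toy data a classifier that ignores the minor class still reaches 90% accuracy while finding none of the valuable customers, which is why the abstract reports minor-class recall and precision alongside accuracy.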

2020 ◽  
Vol 27 (4) ◽  
pp. 329-336 ◽  
Author(s):  
Lei Xu ◽  
Guangmin Liang ◽  
Baowen Chen ◽  
Xu Tan ◽  
Huaikun Xiang ◽  
...  

Background: Cell lytic enzymes are highly evolved proteins that can destroy the bacterial cell structure and kill bacteria. Unlike antibiotics, cell lytic enzymes do not cause serious drug resistance in pathogenic bacteria, whereas drug resistance to antibiotics is becoming increasingly serious. Studying cell wall lytic enzymes therefore offers an efficient way to cure bacterial infections. Cell lytic enzymes include endolysins and autolysins, which differ in the purpose for which they break down the cell wall, so identifying the type of a cell lytic enzyme is meaningful for the study of cell wall enzymes. Objective: Our aim is to predict the type of cell lytic enzyme. Because detecting the type by experimental methods is time-consuming, we propose an efficient computational method for this prediction. Method: First, a data set containing 27 endolysins and 41 autolysins is built. Each protein is then represented by its tripeptide composition, and the features with higher confidence degrees are selected. Finally, a support vector machine classifier is trained on the labeled feature vectors and used to predict the type of cell lytic enzyme. Results: The experimental results show that the overall accuracy reaches 97.06% when 44 features are selected. Compared with Ding's method (92.9%), our method improves the overall accuracy by nearly 4.5% in relative terms ((97.06 - 92.9)/92.9). The performance of the proposed method is stable when the number of selected features ranges from 40 to 70. The overall accuracy of the optimal tripeptide feature set is 94.12%, whereas that of Chou's amphiphilic PseAAC method is 76.2%, so the optimal tripeptide feature set improves the overall accuracy by nearly 18 percentage points. Conclusion: This paper proposes an efficient method for identifying endolysins and autolysins using a support vector machine. The overall accuracy of the proposed method is 94.12%, which is better than that of some existing methods. The 44 selected features improve the overall accuracy of identifying the type of cell lytic enzyme, and the support vector machine performs better than other classifiers when using the selected feature set on the benchmark data set.
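The Method step represents each protein by its tripeptide composition. A minimal sketch of that representation follows, assuming the 20 standard amino acids, overlapping windows, and normalisation by the number of valid windows; these choices and the names `tripeptide_composition` and `INDEX` are assumptions, not the authors' code, and the confidence-degree feature selection and SVM training are not shown.

```python
from itertools import product
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids (assumed alphabet)
TRIPEPTIDES = ["".join(t) for t in product(AMINO_ACIDS, repeat=3)]  # 20**3 = 8000
INDEX = {tp: i for i, tp in enumerate(TRIPEPTIDES)}


def tripeptide_composition(sequence: str) -> List[float]:
    """8000-dimensional vector of overlapping tripeptide frequencies.

    Windows containing non-standard residues (e.g. 'X') are skipped; an
    all-zero vector is returned if no valid window exists.
    """
    seq = sequence.upper()
    counts = [0] * len(TRIPEPTIDES)
    total = 0
    for i in range(len(seq) - 2):
        idx = INDEX.get(seq[i:i + 3])
        if idx is None:
            continue
        counts[idx] += 1
        total += 1
    return [c / total for c in counts] if total else [0.0] * len(counts)


if __name__ == "__main__":
    vec = tripeptide_composition("MKKLLA")  # windows: MKK, KKL, KLL, LLA
    print(vec[INDEX["MKK"]])  # 0.25
```

Because the full vector has 8000 mostly-zero entries for a 68-protein data set, a selection step such as the one described (keeping about 44 features) is what makes SVM training on this representation practical.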


2011 ◽  
Vol 130-134 ◽  
pp. 2047-2050 ◽  
Author(s):  
Hong Chun Qu ◽  
Xie Bin Ding

SVM (Support Vector Machine) is a new artificial intelligence methodology based on the structural risk minimization principle; it generalizes better than traditional machine learning methods and shows a strong ability to learn from limited samples. To address the lack of engine fault samples, FLS-SVM, an improved SVM method, is applied. Ten common engine faults are trained and recognized in the paper. The simulated data are generated from the PW4000-94 engine influence coefficient matrix at cruise, and the results show that the diagnostic accuracy of FLS-SVM is better than that of LS-SVM.
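The abstract benchmarks FLS-SVM against LS-SVM. The FLS-SVM modification is not specified here, so the sketch below shows only the standard least squares SVM classifier (Suykens' formulation), whose key idea is replacing the SVM quadratic program with a single linear system. The RBF kernel, the values of `gamma` and `sigma`, and all function names are assumptions; the sketch is binary, so the ten-fault task would need a multiclass scheme such as one-vs-rest on top of it.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def rbf(a: Point, b: Point, sigma: float) -> float:
    """Gaussian RBF kernel exp(-||a - b||^2 / (2 * sigma^2))."""
    return math.exp(-sum((x - y) ** 2 for x, y in zip(a, b)) / (2.0 * sigma ** 2))


def solve(A: List[List[float]], rhs: List[float]) -> List[float]:
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [r] for row, r in zip(A, rhs)]  # augmented matrix [A | rhs]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def lssvm_fit(X: List[Point], y: List[int], gamma: float = 10.0,
              sigma: float = 1.0) -> Tuple[float, List[float]]:
    """Binary LS-SVM, labels in {-1, +1}.

    Solves [[0, y^T], [y, Omega + I/gamma]] [b; alpha] = [0; 1]
    with Omega[k][l] = y_k * y_l * K(x_k, x_l).
    """
    n = len(X)
    A = [[0.0] + [float(v) for v in y]]
    for k in range(n):
        row = [float(y[k])]
        for l in range(n):
            ridge = 1.0 / gamma if k == l else 0.0
            row.append(y[k] * y[l] * rbf(X[k], X[l], sigma) + ridge)
        A.append(row)
    sol = solve(A, [0.0] + [1.0] * n)
    return sol[0], sol[1:]  # bias b and multipliers alpha


def lssvm_predict(x: Point, X: List[Point], y: List[int], b: float,
                  alpha: List[float], sigma: float = 1.0) -> int:
    """Class = sign(sum_k alpha_k * y_k * K(x, x_k) + b)."""
    s = sum(a * yk * rbf(x, xk, sigma) for a, yk, xk in zip(alpha, y, X)) + b
    return 1 if s >= 0 else -1
```

Unlike a standard SVM, every training sample receives a non-zero multiplier, so LS-SVM trades sparsity for a closed-form solution; this is one reason LS-SVM variants are popular when fault samples are few.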


2016 ◽  
Vol 2016 ◽  
pp. 1-7 ◽  
Author(s):  
Bin Zhang ◽  
Jinke Gong ◽  
Wenhua Yuan ◽  
Jun Fu ◽  
Yi Huang

In order to effectively predict the sieving efficiency of a vibrating screen, experiments investigating sieving efficiency were carried out. The relation between sieving efficiency and other working parameters of a vibrating screen, such as mesh aperture size, screen length, inclination angle, vibration amplitude, and vibration frequency, was analyzed. Based on the experiments, a least squares support vector machine (LS-SVM) model was established to predict the sieving efficiency, and an adaptive genetic algorithm and cross-validation were used to optimize the LS-SVM parameters. On the test points, the LS-SVM predicts better than both the existing formula and a neural network, with an average relative error of only 4.2%.
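The prediction model above is an LS-SVM regressor scored by average relative error. A minimal sketch of both follows, assuming an RBF kernel and fixed `gamma` and `sigma` in place of the adaptive genetic algorithm with cross-validation that the authors used; the function names and the toy data in the usage are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def rbf(a: Point, b: Point, sigma: float) -> float:
    """Gaussian RBF kernel exp(-||a - b||^2 / (2 * sigma^2))."""
    return math.exp(-sum((x - y) ** 2 for x, y in zip(a, b)) / (2.0 * sigma ** 2))


def solve(A: List[List[float]], rhs: List[float]) -> List[float]:
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def lssvr_fit(X: List[Point], y: List[float], gamma: float = 100.0,
              sigma: float = 1.0) -> Tuple[float, List[float]]:
    """LS-SVM regression: solve [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y]."""
    n = len(X)
    A = [[0.0] + [1.0] * n]
    for k in range(n):
        A.append([1.0] + [rbf(X[k], X[l], sigma) + (1.0 / gamma if k == l else 0.0)
                          for l in range(n)])
    sol = solve(A, [0.0] + [float(v) for v in y])
    return sol[0], sol[1:]  # bias b and coefficients alpha


def lssvr_predict(x: Point, X: List[Point], b: float, alpha: List[float],
                  sigma: float = 1.0) -> float:
    """f(x) = sum_k alpha_k * K(x, x_k) + b."""
    return sum(a * rbf(x, xk, sigma) for a, xk in zip(alpha, X)) + b


def mean_relative_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Average of |predicted - actual| / |actual|; actual values must be non-zero."""
    return sum(abs(p - a) / abs(a) for a, p in zip(actual, predicted)) / len(actual)
```

In a real study, `X` would hold the screen parameters (aperture size, length, inclination, amplitude, frequency) and the error would be computed on held-out test points; a plain grid search over `gamma` and `sigma` scored by cross-validated relative error is a simpler substitute for the genetic algorithm.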


Author(s):  
Fatima Mushtaq ◽  
Khalid Mahmood ◽  
Mohammad Chaudhry Hamid ◽  
Rahat Tufail

With the advent of the technological era, scientists and researchers have developed machine learning classification techniques to classify land cover accurately. Research has shown that these techniques perform better than earlier traditional techniques. The main objective of this research is to identify a suitable land cover classification method for extracting land cover information for Lahore district. Two supervised classification techniques, the Maximum Likelihood Classifier (MLC) (based on a neighbourhood function) and the Support Vector Machine (SVM) (based on an optimal hyperplane function), are compared using Sentinel-2 data. For this purpose, four land cover classes were selected. Field-based training samples were collected and prepared through a survey of the study area at four spatial levels. The accuracy of each classifier was assessed using an error matrix and kappa statistics. The results show that SVM performs better than MLC: the overall accuracies of SVM and MLC are 95.20% and 88.80%, and their kappa coefficients are 0.93 and 0.84, respectively.
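Both reported figures, overall accuracy and the kappa coefficient, come from the error (confusion) matrix. The sketch below computes them with the standard Cohen's kappa formula, kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and p_e is chance agreement from row and column totals. The function name and the 2-class example matrix are illustrative; the study's own four-class matrices are not reproduced here.

```python
from typing import List, Tuple


def accuracy_and_kappa(matrix: List[List[int]]) -> Tuple[float, float]:
    """Overall accuracy and Cohen's kappa from a square error matrix.

    matrix[i][j] = number of samples of reference class i classified as class j.
    """
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    p_o = sum(matrix[i][i] for i in range(k)) / n  # observed agreement (overall accuracy)
    row_tot = [sum(row) for row in matrix]
    col_tot = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    p_e = sum(r * c for r, c in zip(row_tot, col_tot)) / n ** 2  # chance agreement
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return p_o, (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical 2-class matrix: p_o = 0.7, p_e = 0.5, kappa = 0.4 (up to rounding)
    print(accuracy_and_kappa([[20, 5], [10, 15]]))
```

Kappa discounts agreement that would occur by chance, which is why SVM's kappa (0.93) sits slightly below its overall accuracy (95.20%).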


2017 ◽  
Vol 9 (4) ◽  
pp. 416 ◽  
Author(s):  
Nelly Indriani Widiastuti ◽  
Ednawati Rainarli ◽  
Kania Evita Dewi

Classification is the process of grouping objects that have the same features or characteristics into several classes. Automatic document classification uses the frequencies of words that appear in the training data as features. A large number of documents increases the number of words that appear as features. Therefore, summaries are used to reduce the number of words used in classification. The classification uses the multiclass Support Vector Machine (SVM) method, which is considered to have a good reputation in classification. This research tests the effect of using summaries as a feature selection step in document classification. The summaries reduce the text to 50% of its original length. The results show that the summaries did not affect the accuracy of document classification using SVM. However, summaries improved the accuracy of the Simple Logistic classifier. The classification tests show that the accuracy of Naïve Bayes Multinomial (NBM) is better than that of SVM.
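Since NBM outperformed SVM on word-frequency features, a minimal multinomial Naive Bayes sketch may help show how such a classifier uses word counts. Whitespace tokenization, add-one (Laplace) smoothing, ignoring unseen words, and all function names are assumptions; the summarization step and the authors' actual tool chain are not shown.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Model = Tuple[Dict[str, float], Dict[str, Dict[str, float]]]


def tokenize(text: str) -> List[str]:
    """Hypothetical tokenizer: lower-case and split on whitespace."""
    return text.lower().split()


def train_mnb(docs: List[List[str]], labels: List[str], alpha: float = 1.0) -> Model:
    """Multinomial Naive Bayes with add-alpha smoothing on word counts."""
    vocab = {w for d in docs for w in d}
    doc_counts = Counter(labels)
    word_counts: Dict[str, Counter] = defaultdict(Counter)
    for d, c in zip(docs, labels):
        word_counts[c].update(d)
    log_prior = {c: math.log(n / len(docs)) for c, n in doc_counts.items()}
    log_lik: Dict[str, Dict[str, float]] = {}
    for c in doc_counts:
        denom = sum(word_counts[c].values()) + alpha * len(vocab)
        log_lik[c] = {w: math.log((word_counts[c][w] + alpha) / denom) for w in vocab}
    return log_prior, log_lik


def predict_mnb(model: Model, doc: List[str]) -> str:
    """Class maximizing log P(c) + sum of log P(w | c); words outside the vocabulary are ignored."""
    log_prior, log_lik = model
    return max(log_prior,
               key=lambda c: log_prior[c] + sum(log_lik[c][w] for w in doc if w in log_lik[c]))
```

Shrinking documents with summaries shrinks the vocabulary, and therefore the number of `log_lik` entries per class, which is the feature-reduction effect the study measures.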


Author(s):  
Zhao Hailong ◽  
Yi Junyan

In recent years, automatic ear recognition has become a popular research topic. Effective feature extraction is one of the most important steps in content-based ear image retrieval applications. In this paper, the authors propose a new vector construction method for ear retrieval based on Block Discriminative Common Vectors: the ear image is first divided into 16 blocks, and features are extracted by applying Discriminative Common Vector (DCV) analysis to the sub-images. A Support Vector Machine is then used as the classifier. The experimental results show that the proposed method performs better than classical PCA+LDA, indicating that it is an effective human ear recognition method.
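The block step divides each ear image into 16 sub-images before feature extraction. The sketch below assumes a 4 x 4 grid of equal blocks and images whose sides are divisible by 4 (the abstract does not state the layout), then concatenates one feature vector per block. `mean_intensity` is a hypothetical placeholder standing in for the DCV projection, which is not implemented here.

```python
from typing import Callable, List

Image = List[List[float]]


def split_into_blocks(image: Image, grid: int = 4) -> List[Image]:
    """Split a 2-D image (list of rows) into grid x grid equal blocks, row-major order."""
    h, w = len(image), len(image[0])
    if h % grid or w % grid:
        raise ValueError("image height and width must be divisible by grid")
    bh, bw = h // grid, w // grid
    return [
        [row[c * bw:(c + 1) * bw] for row in image[r * bh:(r + 1) * bh]]
        for r in range(grid)
        for c in range(grid)
    ]


def mean_intensity(block: Image) -> List[float]:
    """Hypothetical placeholder feature (NOT DCV): mean pixel value of the block."""
    pixels = [p for row in block for p in row]
    return [sum(pixels) / len(pixels)]


def block_feature_vector(image: Image, feature_fn: Callable[[Image], List[float]],
                         grid: int = 4) -> List[float]:
    """Concatenate the per-block features into one vector for the SVM."""
    vec: List[float] = []
    for block in split_into_blocks(image, grid):
        vec.extend(feature_fn(block))
    return vec
```

Extracting features per block rather than from the whole image keeps each sub-problem small and preserves some spatial layout in the final concatenated vector.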


2013 ◽  
Vol 16 (5) ◽  
pp. 973-988 ◽  
Author(s):  
Xiao-Li Li ◽  
Haishen Lü ◽  
Robert Horton ◽  
Tianqing An ◽  
Zhongbo Yu

Accurate, real-time flood forecasting is a crucial nonstructural measure for flood mitigation. A support vector machine (SVM) is based on the principle of structural risk minimization and has good generalization capability. The ensemble Kalman filter (EnKF) is a proven method capable of handling nonlinearity in a computationally efficient manner. In this paper, an SVM model is established to simulate the rainfall–runoff (RR) process. A coupled SVM and EnKF model (SVM + EnKF) is then used for RR simulation, and the impact of the assimilation time scale on the SVM + EnKF model is also studied. In total, four different combinations of the SVM and EnKF models are studied. The Xinanjiang RR model is employed to evaluate the SVM and SVM + EnKF models. The study area is the Luo River Basin, Guangdong Province, China, over the nine-year period from 1994 to 2002. Compared with SVM alone, the SVM + EnKF model substantially improves the accuracy of flood prediction, and the Xinanjiang RR model also performs better than the SVM model. The simulation with an assimilation time scale of 5 days performs better than the other cases.
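In the coupled model, the EnKF corrects the SVM's simulated state whenever an observation arrives. The sketch below shows only a stochastic EnKF analysis step for a single scalar state observed directly (H = 1); the scalar state, the centring of the observation perturbations, and the function name are assumptions, and the forecast step, in which the SVM would propagate each ensemble member between assimilation times, is omitted.

```python
import random
import statistics
from typing import List


def enkf_analysis(ensemble: List[float], obs: float, obs_var: float,
                  rng: random.Random) -> List[float]:
    """Stochastic EnKF update of a scalar state observed directly (H = 1)."""
    n = len(ensemble)
    p = statistics.variance(ensemble)  # sample forecast-error variance (n - 1 denominator)
    gain = p / (p + obs_var)  # Kalman gain K = P / (P + R)
    eps = [rng.gauss(0.0, obs_var ** 0.5) for _ in range(n)]
    shift = sum(eps) / n
    perturbed = [obs + e - shift for e in eps]  # perturbed observations, centred on obs
    # Each member moves toward its own perturbed observation by the gain
    return [x + gain * (d - x) for x, d in zip(ensemble, perturbed)]


if __name__ == "__main__":
    rng = random.Random(42)
    prior = [rng.gauss(10.0, 2.0) for _ in range(50)]  # hypothetical runoff ensemble
    post = enkf_analysis(prior, 14.0, 1.0, rng)
    print(statistics.fmean(prior), statistics.fmean(post))
```

Because the perturbations are centred, the analysis mean equals the prior mean plus the gain times the innovation exactly, so the ensemble mean always lands between the forecast and the observation; how often this update is applied corresponds to the assimilation time scale studied in the paper.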

