The Research of Feature Selection of Text Classification Based on Integrated Learning Algorithm

Author(s):  
Xia Huosong ◽  
Liu Jian
2014 ◽  
Vol 543-547 ◽  
pp. 1896-1900
Author(s):  
Deng Zhou ◽  
Wen Huang He ◽  
Tao Tao Wu

This Compared with the traditional text classification model, the Tibetan text classification based on N-Gram model has adopted N-Gram model in terms of the level of word. In other words, during the text classification, word segmentation is not required. Also, feature selection and abundant pre-treatment processes are avoided. This paper not only carried out profound research on N-Gram models, but also discusses the selection of parameter N in the model by adopting Naïve Bayes Multinomial classifier.


2012 ◽  
Vol 57 (3) ◽  
pp. 829-835 ◽  
Author(s):  
Z. Głowacz ◽  
J. Kozik

The paper describes a procedure for automatic selection of symptoms accompanying the break in the synchronous motor armature winding coils. This procedure, called the feature selection, leads to choosing from a full set of features describing the problem, such a subset that would allow the best distinguishing between healthy and damaged states. As the features the spectra components amplitudes of the motor current signals were used. The full spectra of current signals are considered as the multidimensional feature spaces and their subspaces are tested. Particular subspaces are chosen with the aid of genetic algorithm and their goodness is tested using Mahalanobis distance measure. The algorithm searches for such a subspaces for which this distance is the greatest. The algorithm is very efficient and, as it was confirmed by research, leads to good results. The proposed technique is successfully applied in many other fields of science and technology, including medical diagnostics.


2021 ◽  
pp. 100572
Author(s):  
Malek Alzaqebah ◽  
Khaoula Briki ◽  
Nashat Alrefai ◽  
Sami Brini ◽  
Sana Jawarneh ◽  
...  

2021 ◽  
Vol 11 (9) ◽  
pp. 3836
Author(s):  
Valeri Gitis ◽  
Alexander Derendyaev ◽  
Konstantin Petrov ◽  
Eugene Yurkov ◽  
Sergey Pirogov ◽  
...  

Prostate cancer is the second most frequent malignancy (after lung cancer). Preoperative staging of PCa is the basis for the selection of adequate treatment tactics. In particular, an urgent problem is the classification of indolent and aggressive forms of PCa in patients with the initial stages of the tumor process. To solve this problem, we propose to use a new binary classification machine-learning method. The proposed method of monotonic functions uses a model in which the disease’s form is determined by the severity of the patient’s condition. It is assumed that the patient’s condition is the easier, the less the deviation of the indicators from the normal values inherent in healthy people. This assumption means that the severity (form) of the disease can be represented by monotonic functions from the values of the deviation of the patient’s indicators beyond the normal range. The method is used to solve the problem of classifying patients with indolent and aggressive forms of prostate cancer according to pretreatment data. The learning algorithm is nonparametric. At the same time, it allows an explanation of the classification results in the form of a logical function. To do this, you should indicate to the algorithm either the threshold value of the probability of successful classification of patients with an indolent form of PCa, or the threshold value of the probability of misclassification of patients with an aggressive form of PCa disease. The examples of logical rules given in the article show that they are quite simple and can be easily interpreted in terms of preoperative indicators of the form of the disease.


2020 ◽  
Vol 11 (1) ◽  
pp. 96
Author(s):  
Wen-Lan Wu ◽  
Meng-Hua Lee ◽  
Hsiu-Tao Hsu ◽  
Wen-Hsien Ho ◽  
Jing-Min Liang

Background: In this study, an automatic scoring system for the functional movement screen (FMS) was developed. Methods: Thirty healthy adults fitted with full-body inertial measurement unit sensors completed six FMS exercises. The system recorded kinematics data, and a professional athletic trainer graded each participant. To reduce the number of input variables for the predictive model, ordinal logistic regression was used for subset feature selection. The ensemble learning algorithm AdaBoost.M1 was used to construct classifiers. Accuracy and F score were used for classification model evaluation. The consistency between automatic and manual scoring was assessed using a weighted kappa statistic. Results: When all the features were used, the predict model presented moderate to high accuracy, with kappa values between fair to very good agreement. After feature selection, model accuracy decreased about 10%, with kappa values between poor to moderate agreement. Conclusions: The results indicate that higher prediction accuracy was achieved using the full feature set compared with using the reduced feature set.


Sign in / Sign up

Export Citation Format

Share Document