A random forest method with feature selection for developing medical prediction models with clustered and longitudinal data

This article discusses algorithms that can be used to study and analyze symbols in order to determine the genre of a text. Definitions of text genre differ. The algorithm is defined by describing the text, removing unnecessary characters so that only the text remains, and comparing the result with a database. The article describes a practical method for automatically recognizing text genre based on all parameters. The logistic regression, decision tree, random forest, MLPClassifier, AdaBoostClassifier, SVM, and GaussianNB algorithms are compared, and the choice of the most important parameters for the texts is considered. Determining the genre of texts is now relevant in all areas of the information society.

Probabilistic Feature Selection for Interpretable Random Forest Model

Advances in Intelligent Systems and Computing - Advances in Information and Communication ◽ 10.1007/978-3-030-73103-8_50 ◽ 2021 ◽ pp. 707-718

Author(s): Sandeep Tandra ◽ Alireza Manashty

Keyword(s): Feature Selection ◽ Random Forest ◽ Random Forest Model ◽ Forest Model ◽ Selection For

Feature selection for outcome prediction in oesophageal cancer using genetic algorithm and random forest classifier

Computerized Medical Imaging and Graphics ◽ 10.1016/j.compmedimag.2016.12.002 ◽ 2017 ◽ Vol 60 ◽ pp. 42-49 ◽ Cited By ~ 26

Author(s): Desbordes Paul ◽ Ruan Su ◽ Modzelewski Romain ◽ Vauclin Sébastien ◽ Vera Pierre ◽ ...

Keyword(s): Genetic Algorithm ◽ Feature Selection ◽ Random Forest ◽ Oesophageal Cancer ◽ Outcome Prediction ◽ Random Forest Classifier ◽ Selection For

Random Forest Feature Selection for Data Coming from Evaluation Sheets of Subjects with ASDs

Proceedings of the 2016 Federated Conference on Computer Science and Information Systems ◽ 10.15439/2016f274 ◽ 2016 ◽ Cited By ~ 2

Author(s): Krzysztof Pancerz ◽ Wiesław Paja ◽ Jerzy Gomuła

Keyword(s): Feature Selection ◽ Random Forest ◽ Selection For

Evaluation of Feature Selection Techniques for Breast Cancer Risk Prediction

International Journal of Environmental Research and Public Health ◽ 10.3390/ijerph182010670 ◽ 2021 ◽ Vol 18 (20) ◽ pp. 10670

Author(s): Nahúm Cueto López ◽ María Teresa García-Ordás ◽ Facundo Vitelli-Storelli ◽ Pablo Fernández-Navarro ◽ Camilo Palazuelos ◽ ...

Keyword(s): Breast Cancer ◽ Feature Selection ◽ Random Forest ◽ Cancer Risk ◽ Risk Prediction ◽ Prediction Models ◽ Public Health Problem ◽ Healthy Population ◽ Major Public Health Problem ◽ The Stability

This study evaluates several feature ranking techniques together with several machine learning classifiers to identify factors relevant to the probability of contracting breast cancer and to improve the performance of breast cancer risk prediction models in a healthy population. The dataset, with 919 cases and 946 controls, comes from the MCC-Spain study and includes only environmental and genetic features. Breast cancer is a major public health problem. Our aim is to analyze which factors in the cancer risk prediction model are the most important for breast cancer prediction. Likewise, quantifying the stability of feature selection methods becomes essential before trying to gain insight into the data. This paper assesses several feature selection algorithms in terms of performance for a set of predictive models. Furthermore, their robustness is quantified to analyze both the similarity between the feature selection rankings and their own stability. The ranking provided by the SVM-RFE approach leads to the best performance in terms of the area under the ROC curve (AUC). The top 47 ranked features obtained with this approach, fed to the Logistic Regression classifier, achieve an AUC of 0.616, an improvement of 5.8% over the full feature set. Furthermore, the SVM-RFE ranking technique turned out to be highly stable (as did Random Forest), whereas Relief and the wrapper approaches are quite unstable. This study demonstrates that the stability and performance of the model should be studied together: Random Forest and SVM-RFE turned out to be the most stable algorithms, but in terms of model performance SVM-RFE outperforms Random Forest.
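
The abstract does not say which stability measure the authors used. As a rough illustration of what "stability of a feature ranking" can mean, the sketch below scores a ranking method by the mean pairwise Jaccard index of its top-k feature subsets across resampled runs. The Jaccard-based measure, the function names (`top_k`, `jaccard`, `stability`), and the example rankings are all assumptions made for illustration, not details taken from the paper.

```python
from itertools import combinations


def top_k(ranking, k):
    """Return the set of the k best-ranked features (ranking is best-first)."""
    return set(ranking[:k])


def jaccard(a, b):
    """Jaccard index of two sets: 1.0 for identical sets, 0.0 for disjoint ones."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def stability(rankings, k):
    """Mean pairwise Jaccard index of the top-k subsets over all rankings."""
    subsets = [top_k(r, k) for r in rankings]
    pairs = list(combinations(subsets, 2))
    if not pairs:
        raise ValueError("need at least two rankings")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical rankings from three resamples of the same data (illustrative only).
stable_method = [
    ["f1", "f2", "f3", "f4", "f5"],
    ["f2", "f1", "f3", "f5", "f4"],
    ["f1", "f3", "f2", "f4", "f5"],
]
unstable_method = [
    ["f1", "f2", "f3", "f4", "f5"],
    ["f5", "f4", "f1", "f2", "f3"],
    ["f3", "f5", "f4", "f1", "f2"],
]

print(stability(stable_method, k=3))    # same top-3 subset in every run
print(stability(unstable_method, k=3))  # top-3 subsets overlap only partly
```

A score near 1 means the method selects nearly the same top-k features on every resample, while a score near 0 means the selection changes from run to run. Performance metrics such as AUC would be reported alongside a score like this rather than replaced by it.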
