recursive feature elimination
Recently Published Documents


TOTAL DOCUMENTS: 388 (281 in the last five years)
H-INDEX: 22 (8 in the last five years)

Author(s):  
Touria Hamim ◽  
Faouzia Benabbou ◽  
Nawal Sael

The student profile has become an important component of education systems. Many system objectives, such as e-recommendation, e-orientation, e-recruitment and dropout prediction, rely essentially on the profile for decision support. Machine learning plays an important role in this context, and several studies have been carried out for classification, prediction or clustering purposes. In this paper, the authors present a comparative study of different boosting algorithms, which have been used successfully in many fields and for many purposes. In addition, the authors applied the feature selection methods Fisher Score and Information Gain, combined with Recursive Feature Elimination, to enhance the preprocessing task and the models' performance. Using a multi-label dataset to predict the class of student performance in mathematics, the results show that the Light Gradient Boosting Machine (LightGBM) algorithm achieved the best performance, compared to the other boosting algorithms, when using Information Gain with Recursive Feature Elimination.
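Information Gain scores a discrete feature by how much knowing its value reduces the entropy of the class label. A minimal standard-library sketch follows, assuming categorical (already discretized) features; the feature names and toy records are hypothetical. In a combined scheme like the one above, a ranking of this kind would shortlist features before RFE refines them; the exact way the authors chained the two steps is not given in the abstract.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """IG(Y; X) = H(Y) - sum over values v of P(X = v) * H(Y | X = v)."""
    n = len(labels)
    conditional = 0.0
    for value in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == value]
        conditional += (len(subset) / n) * entropy(subset)
    return entropy(labels) - conditional


def rank_by_information_gain(columns, labels):
    """Return (feature names sorted by decreasing IG, dict of IG scores)."""
    scores = {name: information_gain(col, labels) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True), scores


# Hypothetical toy data: one perfectly predictive feature, one uninformative one.
labels = ["pass", "pass", "fail", "fail"]
columns = {
    "studytime": ["high", "high", "low", "low"],
    "sex": ["F", "M", "F", "M"],
}
order, scores = rank_by_information_gain(columns, labels)
print(order, scores)
```

Here "studytime" removes all uncertainty about the label (IG of 1 bit), while "sex" leaves it unchanged (IG of 0).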


2022 ◽  
Vol 3 (2) ◽  
pp. 1-16
Author(s):  
Md Juber Rahman ◽  
Bashir I. Morshed

Artificial Intelligence-enabled applications on edge devices have the potential to revolutionize disease detection and monitoring in future smart health (sHealth) systems. In this study, we investigated a minimalist approach for the severity classification, severity estimation, and progression monitoring of obstructive sleep apnea (OSA) in a home environment using wearables. We used the recursive feature elimination technique to select the best set of 70 features from a total of 200 features extracted from polysomnograms. We used a multi-layer perceptron model to compare OSA severity classification performance using all ranked features against subsets of features available from either electroencephalography (EEG) or heart rate variability (HRV), together with the time duration of SpO2 levels. The results indicate that, using only computationally inexpensive features from HRV and SpO2, an area under the curve of 0.91 and an accuracy of 83.97% can be achieved for OSA severity classification. For estimation of the apnea-hypopnea index, an RMSE of 4.6 and an R-squared value of 0.71 were achieved on the test set using only ranked HRV and SpO2 features. The Wilcoxon signed-rank test indicates a significant change (p < 0.05) in the selected feature values with disease progression over 2.5 years. The method has the potential for integration with edge computing for deployment on everyday wearables. This may facilitate preliminary severity estimation, monitoring, and management of OSA patients, and reduce associated healthcare costs as well as the prevalence of untreated OSA.
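The RFE loop behind a reduction such as 200 features to 70 can be sketched as follows. The abstract does not say which estimator supplied the feature rankings, so this minimal NumPy sketch assumes a linear least-squares model whose standardized coefficient magnitudes act as importances; the data are synthetic. Calling it on 200 columns with n_features_to_select=70 would mirror the reduction described above.

```python
import numpy as np


def rfe(X, y, n_features_to_select, step=1):
    """Recursive feature elimination with a least-squares linear model.

    Each round refits the model on the surviving (standardised) features and
    drops the `step` features with the smallest absolute coefficients, until
    `n_features_to_select` remain. Returns the surviving column indices.
    """
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)  # assumes no constant columns
    remaining = list(range(X.shape[1]))
    while len(remaining) > n_features_to_select:
        A = np.column_stack([Xs[:, remaining], np.ones(len(Xs))])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        importance = np.abs(coef[:-1])  # skip the intercept term
        n_drop = min(step, len(remaining) - n_features_to_select)
        drop = set(np.argsort(importance)[:n_drop].tolist())
        remaining = [f for i, f in enumerate(remaining) if i not in drop]
    return remaining


# Synthetic example: only columns 2 and 7 drive the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10))
y = 3.0 * X[:, 2] - 2.0 * X[:, 7] + 0.1 * rng.normal(size=300)
print(rfe(X, y, n_features_to_select=2, step=3))  # [2, 7]
```

A larger `step` removes several features per refit, trading ranking fidelity for fewer model fits, which matters when the estimator is expensive to train (an MLP, for example).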


2022 ◽  
Vol 3 ◽  
Author(s):  
Luís Vinícius de Moura ◽  
Christian Mattjie ◽  
Caroline Machado Dartora ◽  
Rodrigo C. Barros ◽  
Ana Maria Marques da Silva

Both reverse transcription-PCR (RT-PCR) and chest X-rays are used for the diagnosis of coronavirus disease 2019 (COVID-19). However, COVID-19 pneumonia does not have a defined set of radiological findings. Our work aims to investigate radiomic features and classification models to differentiate chest X-ray images of COVID-19 pneumonia from other lung patterns. The goal is to provide grounds for understanding the distinctive radiographic texture features of COVID-19 using tree-based supervised ensemble machine learning methods, interpreted through the Shapley Additive Explanations (SHAP) approach. We use 2,611 COVID-19 chest X-ray images and 2,611 non-COVID-19 chest X-rays. After segmenting each lung (left and right) into three zones, histogram normalization is applied and radiomic features are extracted. SHAP recursive feature elimination with cross-validation is used to select features. The hyperparameters of the XGBoost and Random Forest ensemble tree models are optimized using random search. The best classification model was XGBoost, with an accuracy of 0.82 and a sensitivity of 0.82. The explainable model showed the importance of the middle left and superior right lung zones in distinguishing COVID-19 pneumonia from other lung patterns.
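SHAP-based RFE with cross-validation ranks features by mean absolute SHAP value, removes the weakest one at a time, scores every intermediate subset by cross-validation, and keeps the best-scoring subset. The study used SHAP values from tree ensembles; this NumPy sketch instead assumes a linear model, whose SHAP values have the closed form phi_j(x) = w_j * (x_j - E[x_j]) when features are independent. It also assumes a regression target, contiguous folds and synthetic data, so it illustrates the selection loop rather than the authors' pipeline.

```python
import numpy as np


def fit_linear(X, y):
    """Least-squares fit with intercept; returns (weights, intercept)."""
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[:-1], coef[-1]


def mean_abs_shap(X, w):
    """Mean |SHAP| per feature for a linear model, assuming independent
    features: phi_j(x) = w_j * (x_j - mean(x_j))."""
    return np.abs(w * (X - X.mean(axis=0))).mean(axis=0)


def cv_mse(X, y, k=5):
    """Mean held-out squared error over k contiguous folds."""
    folds = np.array_split(np.arange(len(X)), k)
    errors = []
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(X)), test_idx)
        w, b = fit_linear(X[train_idx], y[train_idx])
        errors.append(np.mean((y[test_idx] - (X[test_idx] @ w + b)) ** 2))
    return float(np.mean(errors))


def shap_rfecv(X, y, k=5):
    """Drop the lowest mean-|SHAP| feature each round, score every subset
    by k-fold CV, and return (best subset, full history)."""
    remaining = list(range(X.shape[1]))
    history = []  # (cv_mse, subset) for every subset visited
    while True:
        history.append((cv_mse(X[:, remaining], y, k), list(remaining)))
        if len(remaining) == 1:
            break
        w, _ = fit_linear(X[:, remaining], y)
        remaining.pop(int(np.argmin(mean_abs_shap(X[:, remaining], w))))
    best_mse, best_subset = min(history, key=lambda item: item[0])
    return best_subset, history


# Synthetic data: only columns 0 and 3 carry signal.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 6))
y = 2.0 * X[:, 0] - 3.0 * X[:, 3] + 0.5 * rng.normal(size=200)
best, history = shap_rfecv(X, y)
print(sorted(best))
```

Unlike plain RFE, no target size is fixed in advance; the cross-validated score decides where to stop.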


2022 ◽  
Vol 12 (1) ◽  
pp. 114
Author(s):  
Chao Lu ◽  
Jiayin Song ◽  
Hui Li ◽  
Wenxing Yu ◽  
Yangquan Hao ◽  
...  

Osteoarthritis (OA) is the most common joint disease associated with pain and disability. OA patients are at high risk of venous thromboembolism (VTE). Here, we developed an interpretable machine learning (ML)-based model to predict VTE risk in patients with OA. To establish the prediction model, we used six ML algorithms with 35 candidate variables. Recursive feature elimination (RFE) was used to screen the clinical variables most strongly associated with VTE. SHapley Additive exPlanations (SHAP) were applied to interpret the ML models and determine the importance of the selected features. Overall, 3169 patients with OA (average age: 66.52 ± 7.28 years) were recruited from Xi’an Honghui Hospital. Of these, 352 were diagnosed with VTE and 2817 without. The XGBoost algorithm showed the best performance. Based on RFE, 15 variables were retained for further modeling with the XGBoost algorithm. The top three predictors were Kellgren–Lawrence grade, age, and hypertension. Our study showed that the XGBoost model with 15 variables has high potential to predict VTE risk in patients with OA.


Algorithms ◽  
2022 ◽  
Vol 15 (1) ◽  
pp. 21
Author(s):  
Consolata Gakii ◽  
Paul O. Mireji ◽  
Richard Rimiru

Analysis of high-dimensional data, with more features than observations, places significant demands on computational cost and memory. Feature selection can be used to reduce the dimensionality of the data. We used a graph-based approach, principal component analysis (PCA) and recursive feature elimination to select features for classification from two lung cancer RNAseq datasets. The selected features were discretized for association rule mining, where support and lift were used to generate informative rules. Our results show that graph-based feature selection improved the performance of the sequential minimal optimization (SMO) and multilayer perceptron (MLP) classifiers on both datasets. In association rule mining, features selected using the graph-based approach outperformed those selected by the other two techniques at a support of 0.5 and a lift of 2. The non-redundant rules reflect the inherent relationships between features. Biological features are usually related to functions in living systems, a relationship that cannot be deduced by feature selection and classification alone. Therefore, the graph-based feature-selection approach combined with rule mining is a suitable way of selecting features and finding associations between them in high-dimensional RNAseq data.
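Support and lift, the two rule-mining thresholds mentioned above, can be computed directly from discretized samples treated as transactions (sets of "feature=level" items). This standard-library sketch only enumerates single-item rules A -> B; the gene names and expression levels are hypothetical, and real miners such as Apriori also handle larger itemsets.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def lift(antecedent, consequent, transactions):
    """lift(A -> B) = support(A and B) / (support(A) * support(B))."""
    joint = support(set(antecedent) | set(consequent), transactions)
    return joint / (support(antecedent, transactions) * support(consequent, transactions))


def mine_pair_rules(transactions, min_support=0.5, min_lift=2.0):
    """Return single-item rules (lhs, rhs) meeting both thresholds.
    Assumes min_support > 0, so lift is only evaluated for itemsets that occur."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        for lhs, rhs in ((a, b), (b, a)):
            if (support({lhs, rhs}, transactions) >= min_support
                    and lift({lhs}, {rhs}, transactions) >= min_lift):
                rules.append((lhs, rhs))
    return rules


# Hypothetical discretised expression levels for two genes in four samples.
transactions = [
    {"GENE_A=high", "GENE_B=high"},
    {"GENE_A=high", "GENE_B=high"},
    {"GENE_A=low", "GENE_B=low"},
    {"GENE_A=low", "GENE_B=low"},
]
rules = mine_pair_rules(transactions)
print(rules)
```

With support 0.5 and lift 2, a pair must co-occur in at least half of the samples and at least twice as often as expected if the two items were independent.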


2022 ◽  
Author(s):  
Si-tong Liu ◽  
You Zhang ◽  
Xin-gui Wu ◽  
Chang-xing Lu ◽  
Qi-Ping Hu

Background: Stroke is the second most common cause of death worldwide and a leading cause of long-term severe disability; neurological impairment, especially of motor function, can worsen within hours of stroke onset. So far, there are no established and reliable biomarkers for stroke prognosis. Early detection of prognostic biomarkers is of great importance for clinical intervention and for preventing clinical deterioration after stroke. Methods: The GSE119121 dataset was retrieved from the Gene Expression Omnibus (GEO) database, and weighted gene co-expression network analysis (WGCNA) was conducted to identify the key modules that could regulate disease progression. Functional enrichment analysis was then conducted to study the biological functions of the key module genes. The GSE16561 dataset was further analyzed with the Support Vector Machine Recursive Feature Elimination (SVM-RFE) algorithm to identify the top genes regulating disease progression. The association of the hub genes revealed by WGCNA with disease progression was assessed using receiver operating characteristic (ROC) curve analysis. Subsequently, functional enrichment of the hub genes was performed using gene set variation analysis (GSVA), which transforms gene-level changes into pathway-level changes to identify the biological function of each sample. Finally, the expression levels of the hub genes in a rat middle cerebral artery occlusion (MCAO) infarction model were measured by RT-qPCR for validation. Results: WGCNA revealed four hub genes: DEGS1, HSDL2, ST8SIA4 and STK3. GSVA showed that the hub genes were involved in stroke progression by regulating the p53 signaling pathway, the PI3K signaling pathway, and the inflammatory response pathway. The RT-qPCR results indicated that the expression of the four hub genes was significantly increased in the rat MCAO model. Conclusion: Several genes, such as DEGS1, HSDL2, ST8SIA4 and STK3, were identified and associated with the progression of the disease. Moreover, it was hypothesized that these genes may be involved in stroke progression by regulating the p53 signaling, PI3K signaling, and inflammatory response pathways. These genes have potential prognostic value and may serve as biomarkers for predicting stroke progression. Early identification of patients at risk of progression is essential to prevent clinical deterioration, and these findings provide a reference for future research.
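SVM-RFE trains a linear SVM, ranks features by their squared weights w_j**2, removes the lowest-ranked one, and retrains, so the order of elimination gives a full ranking. The sketch below is a NumPy illustration under stated assumptions: the linear SVM is fitted by plain full-batch subgradient descent on the regularized hinge loss (a stand-in for a dedicated SVM solver), the hyperparameters are arbitrary, and the expression matrix is synthetic.

```python
import numpy as np


def train_linear_svm(X, y, lam=0.01, epochs=300, lr=0.1):
    """Linear SVM by full-batch subgradient descent on
    (lam/2)*||w||^2 + mean(max(0, 1 - y*(X @ w + b))), with y in {-1, +1}."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for t in range(1, epochs + 1):
        active = y * (X @ w + b) < 1  # samples inside or beyond the margin
        grad_w = lam * w - (y[active][:, None] * X[active]).sum(axis=0) / n
        grad_b = -y[active].sum() / n
        eta = lr / np.sqrt(t)  # decaying step size
        w = w - eta * grad_w
        b = b - eta * grad_b
    return w, b


def svm_rfe(X, y, **svm_kwargs):
    """SVM-RFE: retrain, drop the feature with the smallest w_j**2, repeat.
    Returns feature indices ranked from most to least important."""
    remaining = list(range(X.shape[1]))
    eliminated = []
    while remaining:
        w, _ = train_linear_svm(X[:, remaining], y, **svm_kwargs)
        eliminated.append(remaining.pop(int(np.argmin(w ** 2))))
    return eliminated[::-1]  # the last feature eliminated is ranked first


# Synthetic stand-in for an expression matrix: only "genes" 0 and 1 matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 6))
y = np.where(2.0 * X[:, 0] + 1.5 * X[:, 1] > 0, 1.0, -1.0)
ranking = svm_rfe(X, y)
print(ranking)
```

Retraining after every removal is what distinguishes this from ranking once by the initial weights: the remaining weights are re-estimated without the discarded features.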


Diagnostics ◽  
2022 ◽  
Vol 12 (1) ◽  
pp. 116
Author(s):  
Vijendra Singh ◽  
Vijayan K. Asari ◽  
Rajkumar Rajasekaran

Diabetes and high blood pressure are the primary causes of Chronic Kidney Disease (CKD). Researchers around the world use the Glomerular Filtration Rate (GFR) and kidney damage markers to identify CKD, a condition that leads to reduced renal function over time. A person with CKD has a higher risk of dying young. Diagnosing the different diseases linked to CKD at an early stage, in order to prevent the disease, is a difficult task for doctors. This research presents a novel deep learning model for the early detection and prediction of CKD. The objective of this research is to create a deep neural network and compare its performance with that of other contemporary machine learning techniques. In the experiments, all missing values in the database were replaced with the mean of the corresponding feature. The neural network's optimal parameters were then determined by running multiple trials with different parameter settings. The most important features were selected by Recursive Feature Elimination (RFE). Hemoglobin, Specific Gravity, Serum Creatinine, Red Blood Cell Count, Albumin, Packed Cell Volume, and Hypertension were identified as key features by RFE. The selected features were passed to machine learning models for classification. The proposed deep neural model outperformed the other classifiers (Support Vector Machine (SVM), K-Nearest Neighbor (KNN), Logistic Regression, Random Forest, and Naive Bayes) by achieving 100% accuracy. The proposed approach could be a useful tool for nephrologists in detecting CKD.
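The mean imputation step described above replaces each missing entry with the average of the observed values in the same column. A minimal standard-library sketch, assuming numeric columns where None or NaN marks a missing value; the column meanings and numbers are made up for illustration.

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def mean_impute(rows):
    """Replace missing entries with the mean of the observed values in the
    same column. Returns (imputed rows, list of column means)."""
    means = []
    for j in range(len(rows[0])):
        observed = [row[j] for row in rows if not is_missing(row[j])]
        means.append(sum(observed) / len(observed) if observed else float("nan"))
    imputed = [[means[j] if is_missing(v) else v for j, v in enumerate(row)]
               for row in rows]
    return imputed, means


# Hypothetical records: [hemoglobin (g/dL), serum creatinine (mg/dL)].
rows = [[15.0, 1.2], [None, 0.8], [12.0, None], [9.0, 4.0]]
imputed, means = mean_impute(rows)
print(means)
print(imputed)
```

When a held-out test set is used, the column means would normally be computed on the training rows only and then reused for the test rows, so that no test information leaks into preprocessing.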


BioMed ◽  
2022 ◽  
Vol 2 (1) ◽  
pp. 13-26
Author(s):  
Avishek Chatterjee ◽  
Guus Wilmink ◽  
Henry Woodruff ◽  
Philippe Lambin

We conducted a systematic survey of the COVID-19 endpoint prediction literature to (a) identify publications that include data adhering to the FAIR (findability, accessibility, interoperability, and reusability) principles and (b) develop and reuse mortality prediction models that best generalize to these datasets. The largest such cohort dataset we knew of was used for model development. The associated published prediction model was subjected to recursive feature elimination to find a minimal logistic regression model with statistically and clinically indistinguishable predictive performance. This model still could not be applied to the four external validation sets that were identified, because some of them entirely lacked features the model required. Thus, a generalizable model (GM) was built that could be applied to all four external validation sets. An age-only model was used as a benchmark, as age is the simplest, yet effective and robust, predictor of mortality currently known in the COVID-19 literature. While the GM surpassed the age-only model in three external cohorts, there was no statistically significant difference for the fourth. This study underscores (1) the paucity of FAIR data shared by researchers despite the glut of COVID-19 prediction models and (2) the difficulty of creating any model that consistently outperforms an age-only model, given the diversity of the available cohort datasets.
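The search for a minimal logistic regression model can be pictured as backward elimination with a stopping rule. The abstract describes a statistical and clinical equivalence criterion; this NumPy sketch replaces it with a simpler, assumed rule (keep dropping the weakest feature while the in-sample ROC AUC stays within `tol` of the full model). It uses gradient-descent logistic regression and synthetic data in which a single column plays an age-like role, so it shows the shape of the procedure rather than the authors' criterion.

```python
import numpy as np


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Logistic regression with intercept, fitted by batch gradient descent."""
    A = np.column_stack([X, np.ones(len(X))])
    beta = np.zeros(A.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(A @ beta)))
        beta -= lr * A.T @ (p - y) / len(y)
    return beta[:-1], beta[-1]


def auc(scores, y):
    """ROC AUC as the Mann-Whitney probability that a random positive
    outscores a random negative (ties count one half)."""
    pos, neg = scores[y == 1], scores[y == 0]
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))


def minimal_model(X, y, tol=0.02):
    """Backward elimination: repeatedly drop the feature with the smallest
    |coefficient| (features assumed standardised) as long as the in-sample
    AUC stays within `tol` of the full model's AUC."""
    w, b = fit_logistic(X, y)
    full_auc = auc(X @ w + b, y)
    remaining = list(range(X.shape[1]))
    while len(remaining) > 1:
        w, _ = fit_logistic(X[:, remaining], y)
        weakest = int(np.argmin(np.abs(w)))
        candidate = remaining[:weakest] + remaining[weakest + 1:]
        w_c, b_c = fit_logistic(X[:, candidate], y)
        if auc(X[:, candidate] @ w_c + b_c, y) < full_auc - tol:
            break  # dropping another feature would cost too much discrimination
        remaining = candidate
    return remaining, full_auc


# Synthetic cohort: the outcome depends only on column 0 (an "age-like" score).
rng = np.random.default_rng(2)
X = rng.normal(size=(500, 5))
prob = 1.0 / (1.0 + np.exp(-2.5 * X[:, 0]))
y = (rng.random(500) < prob).astype(float)
features, full_auc = minimal_model(X, y)
print(features, round(float(full_auc), 3))
```

On such data the procedure collapses to the single informative column, which is the situation the age-only benchmark above represents.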


2022 ◽  
Vol 11 ◽  
Author(s):  
Yanghua Fan ◽  
Panpan Liu ◽  
Yiping Li ◽  
Feng Liu ◽  
Yu He ◽  
...  

Background: Accurate preoperative differentiation of intracranial hemangiopericytoma and angiomatous meningioma can greatly assist surgical planning and prognosis prediction. In this study, a clini-radiomic model combining radiomic and clinical features was used to distinguish intracranial hemangiopericytoma from angiomatous meningioma preoperatively. Methods: A total of 147 patients with intracranial hemangiopericytoma and 73 patients with angiomatous meningioma from Tiantan Hospital were retrospectively reviewed and randomly assigned to training and validation sets. Radiomic features were extracted from MR images, and the elastic net and recursive feature elimination algorithms were applied to select radiomic features for constructing a fusion radiomic model. Multivariable logistic regression analysis was then used to construct a clinical model, and a clini-radiomic model incorporating the fusion radiomic model and clinical features was constructed for individual predictions. Calibration, discriminative capacity, and clinical usefulness were also evaluated. Results: Six significant radiomic features were selected to construct a fusion radiomic model that achieved an area under the curve (AUC) of 0.900 in both the training and validation sets. The clini-radiomic model, which incorporated the radiomic model and clinical features, showed good discrimination and calibration, with an AUC of 0.920 in the training set and 0.910 in the validation set. Decision curve analysis showed that both the fusion radiomic model and the clini-radiomic model were clinically useful. Conclusions: Our clini-radiomic model showed strong performance and high sensitivity in the differential diagnosis of intracranial hemangiopericytoma and angiomatous meningioma, and could contribute to the development of non-invasive, individualized diagnosis and treatment for these patients.
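Elastic-net screening, the first of the two selection steps above, keeps only features whose penalized regression coefficients are nonzero; those survivors would then be passed to RFE. The NumPy sketch below shows only the elastic-net step, using cyclic coordinate descent on the same objective form that scikit-learn's ElasticNet documents. The values of alpha and l1_ratio are assumptions (the abstract does not report them), the data are synthetic, and in practice features are standardized first because the penalty is scale-dependent.

```python
import numpy as np


def soft_threshold(z, gamma):
    """S(z, gamma) = sign(z) * max(|z| - gamma, 0)."""
    return np.sign(z) * max(abs(z) - gamma, 0.0)


def elastic_net(X, y, alpha=0.5, l1_ratio=0.9, n_iter=200):
    """Cyclic coordinate descent for
    (1/2n)*||y - Xw||^2 + alpha*(l1_ratio*||w||_1 + (1 - l1_ratio)/2*||w||^2).
    X and y are centred internally, so the intercept is implicit."""
    X = X - X.mean(axis=0)
    y = y - y.mean()
    n, d = X.shape
    w = np.zeros(d)
    col_sq = (X ** 2).sum(axis=0) / n
    residual = y.copy()  # equals y - X @ w while w is all zeros
    for _ in range(n_iter):
        for j in range(d):
            residual += X[:, j] * w[j]  # take feature j out of the fit
            z = X[:, j] @ residual / n
            w[j] = soft_threshold(z, alpha * l1_ratio) / (col_sq[j] + alpha * (1 - l1_ratio))
            residual -= X[:, j] * w[j]  # put it back with its new weight
    return w


# Synthetic "radiomic" matrix: only columns 0 and 5 carry signal.
rng = np.random.default_rng(3)
X = rng.normal(size=(200, 8))
y = 3.0 * X[:, 0] - 2.0 * X[:, 5] + 0.5 * rng.normal(size=200)
w = elastic_net(X, y)
print(np.flatnonzero(w))
```

The L1 part of the penalty sets weak coefficients exactly to zero, which is what makes the elastic net usable as a screening step; the L2 part keeps groups of correlated features from being dropped arbitrarily.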


2022 ◽  
Vol 12 ◽  
Author(s):  
Bin Zhu ◽  
Jianlei Zhao ◽  
Mingnan Cao ◽  
Wanliang Du ◽  
Liuqing Yang ◽  
...  

Background: Thrombolysis with r-tPA is recommended for patients with acute ischemic stroke (AIS) within 4.5 h of symptom onset. However, only a few patients benefit from this therapeutic regimen. Thus, we aimed to develop an interpretable machine learning (ML)-based model to predict the thrombolysis effect of r-tPA at the super-early stage. Methods: A total of 353 patients with AIS were divided into training and test data sets. We then used six ML algorithms and a recursive feature elimination (RFE) method to explore the relationship between the clinical variables and the NIH Stroke Scale score 1 h after thrombolysis treatment. Shapley additive explanations and local interpretable model-agnostic explanation algorithms were applied to interpret the ML models and determine the importance of the selected features. Results: Altogether, 353 patients with an average age of 63.0 (56.0–71.0) years were enrolled in the study. Of these patients, 156 showed a favorable thrombolysis effect and 197 showed an unfavorable effect. A total of 14 variables were included in the modeling, and 6 ML algorithms were used to predict the thrombolysis effect. After RFE screening, the gradient boosting decision tree (GBDT) model with seven variables demonstrated the best performance (area under the curve = 0.81, specificity = 0.61, sensitivity = 0.9, and F1 score = 0.79). Of the seven variables, activated partial thromboplastin time, B-type natriuretic peptide, and fibrin degradation products were the three most important clinical characteristics that might influence r-tPA efficiency. Conclusion: This study demonstrated that the GBDT model with the seven variables could better predict the early thrombolysis effect of r-tPA.
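Sensitivity, specificity and F1, as reported for the GBDT model, all derive from the confusion matrix. The abstract does not say which class was treated as positive, so this standard-library sketch assumes a favourable effect is labelled 1, and the ten predictions are made up.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Sensitivity, specificity, precision and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)     # positive predictive value
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "precision": precision, "f1": f1}


# Hypothetical predictions for 10 patients (1 = favourable thrombolysis effect).
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 0, 1, 1, 0, 0, 0]
print(binary_metrics(y_true, y_pred))
```

Because F1 combines precision and sensitivity and ignores true negatives, a model can pair high sensitivity with modest specificity, as in the GBDT result above, and still report a fairly high F1.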

