Radiomics side experiments and DAFIT approach in identifying pulmonary hypertension using cardiac MRI-derived radiomics-based machine learning models

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Sarv Priya ◽  
Tanya Aggarwal ◽  
Caitlin Ward ◽  
Girish Bathla ◽  
Mathews Jacob ◽  
...  

Abstract Side experiments are performed on radiomics models to improve their reproducibility. We measured the impact of myocardial masks, radiomic side experiments, and the data augmentation for information transfer (DAFIT) approach on differentiating patients with and without pulmonary hypertension (PH) using cardiac MRI (CMRI)-derived radiomics. Feature extraction was performed from the left ventricle (LV) and right ventricle (RV) myocardial masks using CMRI in 82 patients (42 PH and 40 controls). Several side-study experiments were evaluated: original data without and with intraclass correlation (ICC) feature filtering, and the DAFIT approach without and with ICC feature filtering. Multiple machine learning and feature selection strategies were evaluated. The primary analysis included all PH patients, with a subgroup analysis including PH patients with preserved LVEF (≥ 50%). For both the primary and subgroup analyses, the DAFIT approach without feature filtering was the highest performer (AUC 0.957–0.958). The ICC approaches showed poor performance compared with the DAFIT approach. The performance of combined LV and RV masks was superior to that of either mask alone. There was variation in the top-performing models across all approaches (AUC 0.862–0.958). The DAFIT approach with features from combined LV and RV masks provides superior performance, whereas the feature filtering approaches perform poorly. Model performance varies with the feature selection and model combination.
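As an illustration of the kind of per-mask feature extraction described, a minimal sketch using pyradiomics (with hypothetical file paths and default settings rather than the study's actual configuration) could look like this:

```python
# Minimal sketch (not the authors' pipeline): extract radiomic features from LV
# and RV myocardial masks with pyradiomics and combine them into one feature
# vector per patient. File names are illustrative assumptions.
from radiomics import featureextractor

extractor = featureextractor.RadiomicsFeatureExtractor()  # default feature classes and settings

def patient_features(image_path, lv_mask_path, rv_mask_path):
    lv = extractor.execute(image_path, lv_mask_path)   # features from the LV myocardial mask
    rv = extractor.execute(image_path, rv_mask_path)   # features from the RV myocardial mask
    combined = {f"LV_{k}": v for k, v in lv.items() if not k.startswith("diagnostics")}
    combined.update({f"RV_{k}": v for k, v in rv.items() if not k.startswith("diagnostics")})
    return combined  # one combined LV + RV feature vector

features = patient_features("patient01_cine.nii.gz",
                            "patient01_lv_mask.nii.gz",
                            "patient01_rv_mask.nii.gz")
```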


Electronics ◽  
2021 ◽  
Vol 10 (17) ◽  
pp. 2099
Author(s):  
Paweł Ziemba ◽  
Jarosław Becker ◽  
Aneta Becker ◽  
Aleksandra Radomska-Zalas ◽  
Mateusz Pawluk ◽  
...  

One of the important research problems in the context of financial institutions is the assessment of credit risk and the decision whether to grant or refuse a loan. Recently, machine learning-based methods have been increasingly employed to solve such problems. However, the selection of an appropriate feature selection technique, sampling mechanism, and/or classifier for credit decision support is very challenging and can affect the quality of loan recommendations. To address this challenging task, this article examines the effectiveness of various data science techniques for credit decision support. In particular, a processing pipeline was designed, consisting of methods for data resampling, feature discretization, feature selection, and binary classification. We suggest building appropriate decision models leveraging pertinent methods for binary classification, feature selection, data resampling, and feature discretization. The feasibility of the selected models was analyzed through rigorous experiments on real data describing clients' ability to repay loans. During the experiments, we analyzed the impact of feature selection on the results of binary classification, and the impact of data resampling with feature discretization on the results of feature selection and binary classification. After the experimental evaluation, we found that the correlation-based feature selection technique and the random forest classifier yield superior performance in solving the underlying problem.
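A minimal sketch of such a pipeline, assuming SMOTE resampling, quantile discretization, an ANOVA filter in place of correlation-based feature selection, and a random forest classifier (none of these are confirmed as the authors' exact choices), could look like this:

```python
# Minimal sketch (an assumption, not the authors' implementation) of the pipeline
# described: resampling -> feature discretization -> feature selection -> binary
# classification, evaluated by cross-validated AUC.
from sklearn.datasets import make_classification
from sklearn.preprocessing import KBinsDiscretizer
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline

# Synthetic, imbalanced stand-in for real loan-repayment data.
X, y = make_classification(n_samples=500, n_features=30, weights=[0.8, 0.2], random_state=0)

pipeline = Pipeline([
    ("resample", SMOTE(random_state=0)),                                    # rebalance the two classes
    ("discretize", KBinsDiscretizer(n_bins=5, encode="ordinal", strategy="quantile")),
    ("select", SelectKBest(f_classif, k=10)),                               # keep the 10 strongest features
    ("classify", RandomForestClassifier(n_estimators=200, random_state=0)),
])

print("cross-validated AUC:", cross_val_score(pipeline, X, y, cv=5, scoring="roc_auc").mean())
```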


2021 ◽  
Vol 10 (9) ◽  
pp. 1921
Author(s):  
Sarv Priya ◽  
Tanya Aggarwal ◽  
Caitlin Ward ◽  
Girish Bathla ◽  
Mathews Jacob ◽  
...  

Reliable, non-invasive imaging-based recognition of pulmonary hypertension (PH) remains a diagnostic challenge. The aim of the current pilot radiomics study was to assess the diagnostic performance of cardiac MRI (cMRI)-based texture features in accurately predicting PH. The study involved an IRB-approved retrospective analysis of cMRIs from 72 patients (42 PH and 30 healthy controls) for the primary analysis. A subgroup analysis was performed including patients from the PH group with a left ventricular ejection fraction ≥ 50%. Texture features were generated from the mid-left-ventricular myocardium using balanced steady-state free precession (bSSFP) cine short-axis imaging. Forty-five different combinations of classifier models and feature selection techniques were evaluated. Model performance was assessed using receiver operating characteristic curves. A multilayer perceptron model fitted on the full feature set was the best classifier for both the primary analysis (AUC 0.862, accuracy 78%) and the subgroup analysis (AUC 0.918, accuracy 80%). Model performance demonstrated considerable variation between models (AUC 0.523–0.918) depending on the chosen model–feature selection combination. Cardiac MRI-based radiomics recognition of PH using texture features is feasible, even with preserved left ventricular ejection fraction.
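The abstract does not list the 45 combinations; a minimal sketch of evaluating a few hypothetical feature-selection/classifier pairs by cross-validated ROC AUC, with synthetic data standing in for the texture features, might look like this:

```python
# Minimal sketch (assumed, not the study code): compare feature-selection /
# classifier combinations on a texture-feature matrix using cross-validated AUC.
from itertools import product
from sklearn.datasets import make_classification
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest, f_classif, mutual_info_classif
from sklearn.pipeline import Pipeline
from sklearn.neural_network import MLPClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=72, n_features=90, random_state=0)  # stand-in for texture features

selectors = {
    "anova_k20": SelectKBest(f_classif, k=20),
    "mutual_info_k20": SelectKBest(mutual_info_classif, k=20),
    "all_features": "passthrough",
}
classifiers = {
    "mlp": MLPClassifier(max_iter=2000, random_state=0),
    "logreg": LogisticRegression(max_iter=1000),
    "svm": SVC(probability=True, random_state=0),
}

for (s_name, sel), (c_name, clf) in product(selectors.items(), classifiers.items()):
    pipe = Pipeline([("scale", StandardScaler()), ("select", sel), ("clf", clf)])
    auc = cross_val_score(pipe, X, y, cv=5, scoring="roc_auc").mean()
    print(f"{s_name} + {c_name}: AUC = {auc:.3f}")
```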


Text mining utilizes machine learning (ML) and natural language processing (NLP) to recognize implicit knowledge in text; such knowledge serves many domains, including translation, media search, and business decision making. Opinion mining (OM) is one of the most promising text mining fields; it is used to discover polarity in text and has clear benefits for business. ML techniques are divided into two approaches, supervised and unsupervised learning, and we here evaluated OM feature selection (FS) using four ML techniques. In this paper, we implemented a number of experiments with four machine learning techniques on the same three Arabic-language corpora. This paper aims at increasing the accuracy of opinion highlighting in Arabic by using enhanced feature selection approaches. The proposed FS model is adopted to enhance opinion highlighting. The experimental results show that the proposed approaches outperform at various levels of supervision, i.e., with different techniques across distinct data domains. Multiple levels of comparison are carried out and discussed for further understanding of the impact of the proposed model on several ML techniques.
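As a hedged illustration only (the paper's corpora, FS model, and four techniques are not reproduced here), a supervised polarity-classification sketch with TF-IDF features, chi-square feature selection, and two common classifiers on a toy Arabic corpus might look like this:

```python
# Minimal sketch (assumed): opinion-polarity classification with TF-IDF features,
# chi-square feature selection, and two classifiers; the tiny corpus is illustrative.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectPercentile, chi2
from sklearn.pipeline import Pipeline
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

docs = [
    "الخدمة ممتازة جدا",   # very good service
    "منتج رائع وسريع",     # great and fast product
    "تجربة جميلة",          # nice experience
    "الخدمة سيئة جدا",     # very bad service
    "منتج رديء",            # poor product
    "تجربة محبطة",          # frustrating experience
]
labels = [1, 1, 1, 0, 0, 0]  # 1 = positive opinion, 0 = negative opinion

for name, clf in [("naive_bayes", MultinomialNB()), ("linear_svm", LinearSVC())]:
    pipe = Pipeline([
        ("tfidf", TfidfVectorizer()),
        ("select", SelectPercentile(chi2, percentile=50)),  # keep the most discriminative terms
        ("clf", clf),
    ])
    print(name, cross_val_score(pipe, docs, labels, cv=3).mean())
```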


2021 ◽  
Vol 42 (Supplement_1) ◽  
Author(s):  
A Briasoulis ◽  
S Moustakidis ◽  
A Tzani ◽  
I Doulamis ◽  
P Kampaktsis

Abstract Background: Models based on traditional statistics for the prediction of outcomes after heart transplantation (HT) have moderate accuracy. We sought to develop and validate state-of-the-art machine learning (ML) models to predict mortality and acute rejection after contemporary HT. Methods: We included adult HT recipients from the UNOS database between 2010 and 2018, using solely pre-transplant clinical and laboratory variables. The study cohort was randomly split into a derivation and a validation cohort with a 3:1 ratio. An effective feature selection algorithm was used to identify strong predictors of 1-year mortality and rejection in the training cohort. The results were used to train the ML models, which were then internally tested using the validation cohort. LIME explainability analysis was used for the best-performing ML model. A similar subgroup analysis was performed for 3- and 5-year survival. Results: The study cohort comprised 18,625 patients (53±13 years, 73% males). At 1 year after cardiac transplant, there were 2,334 (12.5%) deaths. Out of a total of 134 pre-transplant variables, 39 and 27 were selected as highly predictive of 1-year mortality and acute rejection, respectively, and were used in the ML models. Areas under the curve for the prediction of 1-year survival were 0.689, 0.642, 0.649, 0.637, and 0.526 for the AdaBoost, logistic regression, decision tree, support vector machine, and k-nearest neighbor models, respectively, whereas the IMPACT score had an AUC of 0.569. For the prediction of 1-year acute rejection, AdaBoost achieved the highest predictive performance (AUC 0.629). LIME explainability analysis identified the relative impact of the 10 strongest predictors of 1-year mortality and acute rejection. Subgroup analysis using a similar methodology for 3- and 5-year survival yielded AUCs of 0.609 and 0.610 using 31 and 91 selected variables, respectively. Conclusion: ML models created and validated using a contemporary cohort of the UNOS database showed improved accuracy in predicting survival and acute rejection after HT. Funding Acknowledgement: Type of funding sources: None.
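A minimal sketch of the described workflow, assuming an ANOVA-based feature selector, scikit-learn's AdaBoost, and the lime package, with synthetic data standing in for the UNOS variables, might look like this:

```python
# Minimal sketch (assumed, not the UNOS analysis): feature selection, an AdaBoost
# classifier for 1-year mortality, AUC on a held-out set, and a LIME explanation
# for one recipient. All data here are synthetic stand-ins.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import roc_auc_score
from lime.lime_tabular import LimeTabularExplainer

X, y = make_classification(n_samples=2000, n_features=134, weights=[0.875, 0.125], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)  # 3:1 split

selector = SelectKBest(f_classif, k=39).fit(X_train, y_train)   # 39 predictors, as in the abstract
X_train_sel, X_test_sel = selector.transform(X_train), selector.transform(X_test)

model = AdaBoostClassifier(random_state=0).fit(X_train_sel, y_train)
print("1-year mortality AUC:", roc_auc_score(y_test, model.predict_proba(X_test_sel)[:, 1]))

explainer = LimeTabularExplainer(X_train_sel, mode="classification")
explanation = explainer.explain_instance(X_test_sel[0], model.predict_proba, num_features=10)
print(explanation.as_list())  # 10 strongest local predictors for this patient
```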


Author(s):  
A. B Yusuf ◽  
R. M Dima ◽  
S. K Aina

Breast cancer is the second most commonly diagnosed cancer in women throughout the world. It is on the rise, especially in developing countries, where the majority of cases are discovered late. Breast cancer develops when cancerous tumors form on the surface of breast cells. The absence of accurate prognostic models to help physicians recognize symptoms early makes it difficult to develop a treatment plan that would help patients live longer. However, machine learning techniques have recently been used to improve the accuracy and speed of breast cancer diagnosis. The higher the accuracy, the more efficient the model and the better the support for breast cancer diagnosis. Nevertheless, the primary difficulty for systems developed to detect breast cancer using machine learning models is attaining the greatest classification accuracy and picking the most predictive features for increasing accuracy. As a result, breast cancer prognosis remains a challenge. This research seeks to address a flaw in an existing technique that is unable to enhance classification of continuous-valued data, particularly its accuracy and its selection of optimal features for breast cancer prediction. To address these issues, this study examines the impact of outlier removal and feature reduction on the Wisconsin Diagnostic Breast Cancer dataset, tested using seven different machine learning algorithms. The results show that the Logistic Regression, Random Forest, and AdaBoost classifiers achieved the greatest accuracy of 99.12% on removal of outliers from the dataset. With feature selection applied to this filtered dataset, the greatest accuracies were 100% and 99.12% with the Random Forest and Gradient Boosting classifiers, respectively. When compared with other state-of-the-art approaches, the two suggested strategies outperformed those using the unfiltered data in terms of accuracy. The suggested architecture might be a useful tool for radiologists to reduce the number of false negatives and positives, thereby increasing the efficiency of breast cancer diagnosis.
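A minimal sketch of the described preprocessing, assuming a 3-sigma z-score rule for outlier removal and a univariate ANOVA filter for feature reduction on the Wisconsin Diagnostic Breast Cancer dataset (the authors' exact filtering steps are not specified here), could be:

```python
# Minimal sketch (assumed, not the authors' exact procedure): z-score outlier
# filtering and univariate feature selection on the Wisconsin Diagnostic Breast
# Cancer data, followed by a Random Forest evaluated with cross-validation.
import numpy as np
from scipy import stats
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# Drop samples with any feature beyond 3 standard deviations (outlier removal).
mask = (np.abs(stats.zscore(X)) < 3).all(axis=1)
X_clean, y_clean = X[mask], y[mask]

# Keep the 10 most discriminative features (illustrative choice), then classify.
X_sel = SelectKBest(f_classif, k=10).fit_transform(X_clean, y_clean)
acc = cross_val_score(RandomForestClassifier(random_state=0), X_sel, y_clean, cv=10).mean()
print(f"cross-validated accuracy: {acc:.4f}")
```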


Author(s):  
Anita Ramachandran ◽  
Adarsh Ramesh ◽  
Aditya Sukhlecha ◽  
Avtansh Pandey ◽  
Anupama Karuppiah

The application of machine learning techniques to detect and classify falls is a prominent area of research in the domain of intelligent assisted living systems. Machine learning (ML) based solutions for fall detection systems built on wearable devices use various sources of information, such as inertial measurement units (IMUs), vital signs, and acoustic or channel state information parameters. Most existing research relies on only one of these sources; however, more experimentation is needed to observe the efficiency of ML classifiers when coupling features from diverse sources. In addition, fall detection systems based on wearable devices require intelligent feature engineering and selection for dimensionality reduction, so as to reduce the computational complexity of the devices. In this paper, we perform a comprehensive performance analysis of ML classifiers for fall detection on a dataset we collected. The analysis includes the impact of the following aspects on the performance of ML classifiers for fall detection: (i) using a combination of features from two sensors, an IMU sensor and a heart rate sensor; (ii) feature engineering and feature selection based on statistical methods; and (iii) using ensemble techniques for fall detection. We find that including heart rate along with the IMU sensor parameters improves the accuracy of fall detection. The conclusions from our experiments on feature selection and ensemble analysis can serve as inputs for researchers designing wearable-device-based fall detection systems.
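A minimal sketch of the described sensor-fusion and ensemble setup, with synthetic windowed features standing in for the collected IMU and heart rate data and a soft-voting ensemble as an illustrative choice, might look like this:

```python
# Minimal sketch (assumed): fuse windowed IMU features with a heart rate feature,
# apply an ANOVA-based feature selection, and evaluate a voting ensemble.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
imu_features = rng.normal(size=(300, 12))               # e.g. per-window accel/gyro statistics
heart_rate = rng.normal(loc=80, scale=15, size=(300, 1))
X = np.hstack([imu_features, heart_rate])               # feature fusion of the two sensors
y = rng.integers(0, 2, size=300)                        # 1 = fall, 0 = activity of daily living

X_sel = SelectKBest(f_classif, k=8).fit_transform(X, y)  # statistical feature selection

ensemble = VotingClassifier(
    estimators=[("rf", RandomForestClassifier(random_state=0)),
                ("svm", SVC(probability=True, random_state=0)),
                ("knn", KNeighborsClassifier())],
    voting="soft")

print("ensemble accuracy:", cross_val_score(ensemble, X_sel, y, cv=5).mean())
```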


PLoS ONE ◽  
2021 ◽  
Vol 16 (7) ◽  
pp. e0254720
Author(s):  
Maritza Mera-Gaona ◽  
Ursula Neumann ◽  
Rubiel Vargas-Canas ◽  
Diego M. López

Handling missing values is a crucial step in preprocessing data for machine learning. Most available algorithms for analyzing datasets in the feature selection process and in the classification or estimation process require complete datasets. Consequently, in many cases, the strategy for dealing with missing values is to use only instances with full data or to replace missing values with the mean, mode, median, or a constant value. Usually, discarding missing samples or replacing missing values by means of such basic techniques introduces bias into subsequent analyses of the datasets. Aim: To demonstrate the positive impact of multivariate imputation on the feature selection process for datasets with missing values. Results: We compared the effects of the feature selection process using complete datasets, incomplete datasets with missingness rates between 5% and 50%, and datasets imputed by basic techniques and by multivariate imputation. The feature selection algorithms used are well-known methods. The results showed that datasets imputed by multivariate imputation obtained the best feature selection results compared with datasets imputed by basic techniques or non-imputed incomplete datasets. Conclusions: Considering the results obtained in the evaluation, applying multivariate imputation by MICE reduces bias in the feature selection process.
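A minimal sketch of the described comparison, using scikit-learn's IterativeImputer as a MICE-style multivariate imputer against simple mean imputation before univariate feature selection (the injected missingness rate and k are illustrative), could be:

```python
# Minimal sketch (assumed): compare the features selected after mean imputation
# versus multivariate (MICE-style) imputation on data with injected missingness.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (enables IterativeImputer)
from sklearn.impute import IterativeImputer, SimpleImputer
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_breast_cancer(return_X_y=True)
rng = np.random.default_rng(0)
X_missing = X.copy()
X_missing[rng.random(X.shape) < 0.20] = np.nan          # ~20% missingness rate

for name, imputer in [("mean", SimpleImputer(strategy="mean")),
                      ("multivariate", IterativeImputer(random_state=0))]:
    X_imp = imputer.fit_transform(X_missing)
    selected = SelectKBest(f_classif, k=10).fit(X_imp, y).get_support(indices=True)
    print(name, "selected feature indices:", selected)
```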


2019 ◽  
Vol 35 (20) ◽  
pp. 3989-3995 ◽  
Author(s):  
Hongjian Li ◽  
Jiangjun Peng ◽  
Pavel Sidorov ◽  
Yee Leung ◽  
Kwong-Sak Leung ◽  
...  

Abstract Motivation: Studies have shown that the accuracy of random forest (RF)-based scoring functions (SFs), such as RF-Score-v3, increases with more training samples, whereas that of classical SFs, such as X-Score, does not. Nevertheless, the impact of the similarity between training and test samples on this matter has not been studied in a systematic manner. It is therefore unclear how these SFs would perform when trained only on protein-ligand complexes that are highly dissimilar or highly similar to the test set. It is also unclear whether SFs based on machine learning algorithms other than RF can also improve accuracy with increasing training set size, and to what extent they learn from dissimilar or similar training complexes. Results: We present a systematic study investigating how the accuracy of classical and machine-learning SFs varies with the similarity of protein-ligand complexes between training and test sets. We considered three types of similarity metrics, based on the comparison of either protein structures, protein sequences, or ligand structures. Regardless of the similarity metric, we found that incorporating a larger proportion of similar complexes into the training set did not make classical SFs more accurate. In contrast, RF-Score-v3 was able to outperform X-Score even when trained on just 32% of the most dissimilar complexes, showing that its superior performance owes considerably to learning from training complexes dissimilar to those in the test set. In addition, we generated the first SF employing Extreme Gradient Boosting (XGBoost), XGB-Score, and observed that it also improves with training set size while outperforming the rest of the SFs. Given the continuous growth of training datasets, the development of machine-learning SFs has become very appealing. Availability and implementation: https://github.com/HongjianLi/MLSF. Supplementary information: Supplementary data are available at Bioinformatics online.
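A minimal sketch of the accuracy-versus-training-set-size analysis with an XGBoost regressor (not XGB-Score itself; the descriptors and affinities below are synthetic stand-ins for real protein-ligand complexes) might look like this:

```python
# Minimal sketch (assumed): train an XGBoost regressor on protein-ligand
# descriptors for increasing training-set sizes and track Pearson correlation
# on a fixed test set, mirroring the accuracy-vs-size analysis described.
import numpy as np
from scipy.stats import pearsonr
from xgboost import XGBRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(3000, 36))                                   # e.g. intermolecular contact features
y = X @ rng.normal(size=36) + rng.normal(scale=0.5, size=3000)    # stand-in binding affinities

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

for n in (200, 500, 1000, 2000):
    model = XGBRegressor(n_estimators=300, learning_rate=0.05, random_state=0)
    model.fit(X_train[:n], y_train[:n])
    r, _ = pearsonr(y_test, model.predict(X_test))
    print(f"training size {n}: Pearson r = {r:.3f}")
```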

