Abstract WMP101: Prediction of Clinical Outcome in Supratentorial Intracerebral Hemorrhage: Application of Baseline CT Scan Radiomics Feature Extraction and Machine Learning Classifiers

Stroke ◽  
2020 ◽  
Vol 51 (Suppl_1) ◽  
Author(s):  
Seyedmehdi Payabvash ◽  
Julian Acosta ◽  
Stefan Haider ◽  
Rommell Noche ◽  
Elayna Kirsch ◽  
...  

Aim: Radiomics refers to the automatic extraction of numerous quantitative features from medical images to supplement visual assessment. Machine-learning algorithms provide a suitable statistical methodology for devising predictive classifiers from large radiomics datasets. We aimed to predict intracerebral hemorrhage (ICH) outcome by applying machine-learning classifiers to both clinical data and hematoma radiomics features. Methods: Patients enrolled in the Yale Longitudinal Study of ICH were included if they had (1) spontaneous supratentorial ICH, (2) a baseline CT scan, (3) a known admission Glasgow Coma Scale (GCS) score, and (4) a 3-month modified Rankin Scale (mRS) score. A total of 1134 radiomics features related to intensity, shape, texture, and waveform were extracted from manually segmented ICH lesions on baseline CT. Clinical variables were patient age, gender, GCS, presence of intraventricular hemorrhage, and thalamic ICH location. We calculated the averaged area under the receiver operating characteristic curve (ROC AUC) for outcome prediction over 100 repeats of 5-fold cross-validation (500 folds in total) for different combinations of feature selection and machine-learning algorithms. Results: A total of 119 ICH patients were included, of whom 60 had a poor outcome (mRS ≥4). Among the combinations tested, lasso regression feature selection with a partial least squares (PLS) classification model yielded the highest prediction accuracy (Figure), with an averaged ROC AUC (95% confidence interval) of 0.86 (0.83 - 0.89) using clinical variables only, versus 0.92 (0.89 - 0.95) using the combination of clinical variables and 54 radiomics features selected by lasso regression. Among the selected radiomics features, ICH lesion flatness had the highest variable importance and was the only shape feature selected. Conclusion: Adding ICH lesion radiomics to clinical variables in machine-learning models can improve outcome prediction.

Author(s):  
Soundariya R.S. ◽  
Tharsanee R.M. ◽  
Vishnupriya B ◽  
Ashwathi R ◽  
...  

Coronavirus disease (COVID-19) has spread rapidly worldwide since April 2020, causing massive loss of life across many countries. Following WHO recommendations, diagnosis is currently performed by reverse transcription polymerase chain reaction (RT-PCR) testing, which takes four to eight hours to process test samples and a further 48 hours to categorize samples as positive or negative. Laboratory tests are clearly time-consuming, so a speedy and prompt means of diagnosing the disease is urgently needed. This can be pursued through several artificial intelligence methodologies for early diagnosis and tracing of COVID-19, which fall into three categories: (i) predicting the spread of the pandemic using mathematical models; (ii) empirical analysis using machine learning models to forecast global COVID-19 transitions by considering susceptible, infected, and recovered rates; and (iii) deep learning architectures for COVID-19 diagnosis from X-ray and CT scan images. When X-ray and CT scan images are used, supplementary data such as clinical signs, patient history, and laboratory test results can also be incorporated while training the learning model to improve testing efficacy. This investigation therefore surveys the mathematical models, machine learning algorithms, and deep learning frameworks that can be applied to these datasets to forecast the spread of COVID-19 and detect its risk factors.
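The first category, mathematical models of pandemic spread, is typified by the compartmental SIR model, which the susceptible/infected/recovered framing above alludes to. A minimal forward-Euler sketch follows; the population size, beta, and gamma are illustrative assumptions, not values fitted to COVID-19 data:

```python
# Minimal SIR epidemic model via forward-Euler integration:
#   dS/dt = -beta*S*I/N,  dI/dt = beta*S*I/N - gamma*I,  dR/dt = gamma*I
N = 1_000_000              # assumed population size
beta, gamma = 0.30, 0.10   # assumed transmission and recovery rates (R0 = 3)
S, I, R = N - 1.0, 1.0, 0.0
dt, days = 0.1, 300
peak_infected = 0.0
for _ in range(int(days / dt)):
    dS = -beta * S * I / N       # susceptibles infected per unit time
    dI = -dS - gamma * I         # new infections minus recoveries
    dR = gamma * I               # recoveries per unit time
    S, I, R = S + dS * dt, I + dI * dt, R + dR * dt
    peak_infected = max(peak_infected, I)
print(f"peak simultaneous infections: {peak_infected:,.0f}")
```

Because dS + dI + dR = 0 at every step, the total population is conserved exactly, which is a quick sanity check on any compartmental-model implementation.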


Author(s):  
Harsha A K

Abstract: Since the advent of encryption, there has been a steady increase in malware being transmitted over encrypted networks. Traditional malware detection approaches such as packet content analysis are ineffective on encrypted data. In the absence of actual packet contents, other features such as packet size, arrival time, source and destination addresses, and similar metadata can be used to detect malware. Such information can be used to train machine learning classifiers to distinguish malicious from benign packets. In this paper, we present an efficient malware detection approach using machine learning classification algorithms: support vector machines, random forests, and extreme gradient boosting. We employ an extensive feature selection process to reduce the dimensionality of the chosen dataset, which is then split into training and testing sets. The machine learning algorithms are trained on the training set, and the resulting models are evaluated against the testing set to assess their respective performances. We further tune the hyperparameters of the algorithms to achieve better results. The random forest and extreme gradient boosting algorithms performed exceptionally well in our experiments, achieving area under the curve values of 0.9928 and 0.9998, respectively. Our work demonstrates that malware traffic can be effectively classified using conventional machine learning algorithms and also shows the importance of dimensionality reduction in such classification problems. Keywords: Malware Detection, Extreme Gradient Boosting, Random Forest, Feature Selection.
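The workflow described, metadata features, a train/test split, and an ensemble classifier scored by ROC AUC, can be sketched as follows. The synthetic flow features and their distributions are illustrative stand-ins for the paper's dataset, not its actual data:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Synthetic flow metadata: mean packet size (bytes), inter-arrival time (s),
# and packets per flow. Distributions are assumed for illustration only.
rng = np.random.default_rng(42)
n = 2000
benign = np.column_stack([rng.normal(600, 150, n),
                          rng.exponential(0.05, n),
                          rng.poisson(20, n)])
malicious = np.column_stack([rng.normal(300, 80, n),
                             rng.exponential(0.01, n),
                             rng.poisson(60, n)])
X = np.vstack([benign, malicious])
y = np.array([0] * n + [1] * n)  # 0 = benign, 1 = malicious

# Split, train a random forest, and score by ROC AUC on the held-out set
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          stratify=y, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
print(f"test ROC AUC: {auc:.4f}")
```

In practice the same skeleton extends to the other two learners by swapping in `sklearn.svm.SVC(probability=True)` or an XGBoost classifier, with hyperparameters tuned via `GridSearchCV` on the training split only.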


2020 ◽  
Vol 9 (9) ◽  
pp. 507
Author(s):  
Sanjiwana Arjasakusuma ◽  
Sandiaga Swahyu Kusuma ◽  
Stuart Phinn

Machine learning has been employed for various mapping and modeling tasks using input variables from different sources of remote sensing data. For feature selection involving data with high spatial and spectral dimensionality, various methods have been developed and incorporated into machine learning frameworks to ensure an efficient and optimal computational process. This research aims to assess the accuracy of various feature selection and machine learning methods for estimating forest height using AISA (airborne imaging spectrometer for applications) hyperspectral bands (479 bands) and airborne light detection and ranging (lidar) height metrics (36 metrics), alone and combined. Feature selection and dimensionality reduction using Boruta (BO), principal component analysis (PCA), simulated annealing (SA), and genetic algorithms (GA), in combination with machine learning algorithms such as multivariate adaptive regression splines (MARS), extra trees (ET), support vector regression (SVR) with a radial basis function kernel, and extreme gradient boosting (XGB) with tree (XGBtree and XGBdart) and linear (XGBlin) learners, were evaluated. The results demonstrated that the combinations BO-XGBdart and BO-SVR delivered the best model performance for estimating tropical forest height from combined lidar and hyperspectral data, with R2 = 0.53 and RMSE = 1.7 m (nRMSE of 18.4%, bias of 0.046 m) for BO-XGBdart and R2 = 0.51 and RMSE = 1.8 m (nRMSE of 15.8%, bias of −0.244 m) for BO-SVR. Our study also demonstrated the effectiveness of BO for variable selection: it reduced the data by 95%, selecting the 29 most important of the initial 516 variables from the lidar metrics and hyperspectral bands.
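A rough scikit-learn sketch of the selection-then-regression workflow: Boruta is not part of scikit-learn, so importance-based `SelectFromModel` stands in for BO here, with extra trees (one of the learners evaluated above) as the regressor. The synthetic data and every parameter are illustrative assumptions:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for the 516 predictors (479 hyperspectral bands + 36
# lidar metrics); the target plays the role of forest height.
X, y = make_regression(n_samples=300, n_features=516, n_informative=30,
                       noise=5.0, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

# Keep exactly 29 variables, mirroring the 29 selected by Boruta in the study
selector = SelectFromModel(
    ExtraTreesRegressor(n_estimators=200, random_state=1),
    threshold=-np.inf, max_features=29)
model = make_pipeline(selector,
                      ExtraTreesRegressor(n_estimators=200, random_state=1))
model.fit(X_tr, y_tr)
pred = model.predict(X_te)
rmse = mean_squared_error(y_te, pred) ** 0.5
r2 = r2_score(y_te, pred)
print(f"R^2 = {r2:.2f}, RMSE = {rmse:.2f}")
```

Wrapping the selector in the pipeline means its importances are fit on the training split only, so the held-out R2 and RMSE are honest estimates of generalization.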


2017 ◽  
Vol 7 (1) ◽  
Author(s):  
Jiamei Liu ◽  
Cheng Xu ◽  
Weifeng Yang ◽  
Yayun Shu ◽  
Weiwei Zheng ◽  
...  

Abstract Binary classification is a widely employed formulation for supporting decisions on various biomedical big-data questions, such as clinical drug trials comparing treated participants and controls, and genome-wide association studies (GWASs) comparing participants with or without a phenotype. A machine learning model is trained for this purpose by optimizing its power to discriminate samples from the two groups. However, most classification algorithms generate one locally optimal solution, determined by the input dataset and the mathematical presumptions made about it. Here we demonstrated, from the aspects of both disease classification and feature selection, that multiple different solutions may have similar classification performance. Existing machine learning algorithms may thus, in effect, have ignored a whole school of fish by catching only one good one. Since most existing machine learning algorithms generate a solution by optimizing a mathematical objective, considering both the generated solution and the ignored ones may be essential for understanding the biological mechanisms underlying the investigated classification question.
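The central observation, that several different feature subsets can achieve similar classification performance, is easy to reproduce when features are redundant. In this illustrative sketch on synthetic data (all parameters assumed), two disjoint halves of the feature set both classify well:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Redundant features spread the signal across the feature set, so distinct
# subsets can each carry enough information to discriminate the two groups.
X, y = make_classification(n_samples=400, n_features=40, n_informative=5,
                           n_redundant=15, random_state=7)

subset_a = list(range(0, 20))    # two disjoint halves of the feature set
subset_b = list(range(20, 40))
auc_a = cross_val_score(LogisticRegression(max_iter=1000), X[:, subset_a], y,
                        cv=5, scoring="roc_auc").mean()
auc_b = cross_val_score(LogisticRegression(max_iter=1000), X[:, subset_b], y,
                        cv=5, scoring="roc_auc").mean()
print(f"subset A AUC = {auc_a:.3f}, subset B AUC = {auc_b:.3f}")
```

Neither subset is "the" solution; a feature-selection procedure that returns only one of them has, in the paper's metaphor, caught one fish and ignored the rest of the school.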

