Texture Analysis of Fat-Suppressed T2-Weighted Magnetic Resonance Imaging and Use of Machine Learning to Discriminate Nasal and Paranasal Sinus Small Round Malignant Cell Tumors

2021 ◽  
Vol 11 ◽  
Author(s):  
Chen Chen ◽  
Yuhui Qin ◽  
Junying Cheng ◽  
Fabao Gao ◽  
Xiaoyue Zhou

Objective We used texture analysis and machine learning (ML) to classify small round cell malignant tumors (SRCMTs) and Non-SRCMTs of the nasal and paranasal sinuses on fat-suppressed T2-weighted imaging (Fs-T2WI). Materials Preoperative MRI scans of 164 patients diagnosed with SRCMTs or Non-SRCMTs between 1 January 2018 and 1 January 2021 were included in this study. A total of 271 features were extracted from each region of interest. The dataset was randomly divided into a training set (∼70%) and a test set (∼30%). The Pearson correlation coefficient (PCC) and principal component analysis (PCA) were used for dimension reduction, and analysis of variance (ANOVA), Kruskal-Wallis (KW), recursive feature elimination (RFE), and Relief were used for feature selection. Classification was performed using 10 ML classifiers. Results were evaluated using leave-one-out cross-validation. Results We compared the AUC of all pipelines on the validation dataset with FeAture Explorer (FAE) software. The pipeline using PCC dimension reduction, Relief feature selection, and a Gaussian process (GP) classifier yielded the highest area under the curve (AUC) using 15 features. When the “one-standard error” rule was used, FAE also produced a simpler model with 13 features, including S(5,-5)SumAverg, S(3,0)InvDfMom, Skewness, WavEnHL_s-3, Horzl_GlevNonU, Horzl_RLNonUni, 135dr_GlevNonU, WavEnLL_s-3, Teta4, Teta2, S(5,5)DifVarnc, Perc.01%, and WavEnLH_s-2. The AUCs of the training/validation/test datasets were 1.000/0.965/0.979, and the accuracy, sensitivity, and specificity were 0.890, 0.880, and 0.920, respectively. The best algorithm was GP, whose AUCs on the training/validation/test datasets were greater than approximately 0.800 with both dimension reduction methods and all four feature selection methods. In particular, the AUCs on the different datasets were greater than approximately 0.900 using the PCC, RFE/Relief, and GP algorithms. Conclusions We demonstrated the feasibility of combining artificial intelligence and radiomics from Fs-T2WI to differentially diagnose SRCMTs and Non-SRCMTs. This non-invasive approach could be very promising in clinical oncology.
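The "one-standard error" rule mentioned in this abstract can be illustrated with a short sketch. It assumes each candidate model is summarized by its feature count, its mean validation AUC, and the standard error of that AUC. The function name and all numbers below are hypothetical; they are not taken from the study or from the FAE software.

```python
def one_standard_error_choice(candidates):
    """Return the simplest candidate whose score is within one SE of the best.

    candidates: list of (n_features, mean_auc, std_error) tuples.
    """
    # Best-scoring candidate and the threshold one standard error below it.
    best = max(candidates, key=lambda c: c[1])
    threshold = best[1] - best[2]
    # Among candidates at or above the threshold, prefer the fewest features.
    eligible = [c for c in candidates if c[1] >= threshold]
    return min(eligible, key=lambda c: c[0])


# Hypothetical validation results, for illustration only.
results = [
    (5, 0.900, 0.015),
    (13, 0.958, 0.010),
    (15, 0.965, 0.010),
    (20, 0.960, 0.012),
]
chosen = one_standard_error_choice(results)
print(chosen[0])
```

With these made-up numbers the 15-feature model scores best, but the 13-feature model falls within one standard error of it and is therefore preferred as the simpler option.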

2021 ◽  
Author(s):  
Isaac Shiri ◽  
Yazdan Salimi ◽  
Abdollah Saberi ◽  
Masoumeh Pakbin ◽  
Ghasem Hajianfar ◽  
...  

Abstract Purpose To derive and validate an effective radiomics-based model for differentiation of COVID-19 pneumonia from other lung diseases using a very large cohort of patients. Methods We collected 19 private and 5 public datasets, totaling 26,307 individual patient images (15,148 COVID-19; 9,657 with other lung diseases, e.g. non-COVID-19 pneumonia, lung cancer, pulmonary embolism; 1,502 normal cases). Images were automatically segmented using a validated deep learning (DL) model and the results carefully reviewed. Images were first cropped into lung-only region boxes, then resized to 296×216 voxels. Voxel dimensions were resized to 1×1×1 mm³, followed by 64-bin discretization. The 108 extracted features included shape, first-order histogram and texture features. Univariate analysis was first performed using simple logistic regression. The thresholds were fixed in the training set and evaluation was then performed on the test set. False discovery rate (FDR) correction was applied to the p-values. Z-score normalization was applied to all features. For multivariate analysis, features with high correlation (R² > 0.99) were eliminated first using Pearson correlation. We tested 96 different machine learning strategies by cross-combining 4 feature selectors or 8 dimensionality reduction techniques with 8 classifiers. We trained and evaluated our models using 3 different datasets: 1) the entire dataset (26,307 patients: 15,148 COVID-19; 11,159 non-COVID-19); 2) excluding normal patients from the non-COVID-19 class and including only RT-PCR-positive COVID-19 cases in the COVID-19 class (20,697 patients: 12,419 COVID-19; 8,278 non-COVID-19); 3) including only non-COVID-19 pneumonia patients and a random sample of COVID-19 patients (5,582 patients: 3,000 COVID-19; 2,582 non-COVID-19) to provide balanced classes. Subsequently, each of these 3 datasets was randomly split into 70% and 30% for training and testing, respectively. All steps, including feature preprocessing, feature selection, and classification, were performed separately in each dataset. Classification algorithms were optimized during training using grid search. The best models were chosen by a one-standard-deviation rule in 10-fold cross-validation and then evaluated on the test sets. Results In dataset #1, the Relief feature selection and Random Forest (RF) classifier combination resulted in the highest performance (area under the receiver operating characteristic curve (AUC) = 0.99, sensitivity = 0.98, specificity = 0.94, accuracy = 0.96, positive predictive value (PPV) = 0.96, and negative predictive value (NPV) = 0.96). In dataset #2, the Recursive Feature Elimination (RFE) feature selection and RF classifier combination resulted in the highest performance (AUC = 0.99, sensitivity = 0.98, specificity = 0.95, accuracy = 0.97, PPV = 0.96, and NPV = 0.98). In dataset #3, the ANOVA feature selection and RF classifier combination resulted in the highest performance (AUC = 0.98, sensitivity = 0.96, specificity = 0.93, accuracy = 0.94, PPV = 0.93, NPV = 0.96). Conclusion Radiomic features extracted from the entire lung combined with machine learning algorithms can enable very effective, routine diagnosis of COVID-19 pneumonia from CT images without the use of any other diagnostic test.
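The correlation-based elimination step described in this abstract (dropping features whose pairwise R² exceeds 0.99) might look roughly like the sketch below. It is a greedy variant that keeps the first feature of each highly correlated group; the abstract does not say which member of a correlated pair was kept, so that choice, the function names, and the toy feature values are all assumptions.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences.

    Assumes neither sequence is constant (non-zero variance).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def drop_correlated(features, r2_threshold=0.99):
    """Greedily keep features whose R^2 with every kept feature is <= threshold.

    features: dict mapping feature name -> list of values (one per patient).
    """
    kept = []
    for name, values in features.items():
        if all(pearson_r(values, features[k]) ** 2 <= r2_threshold for k in kept):
            kept.append(name)
    return kept


# Toy, made-up feature values for five hypothetical patients.
features = {
    "volume": [1.0, 2.0, 3.0, 4.0, 5.0],
    "volume_scaled": [2.1, 4.0, 6.0, 8.0, 10.0],  # nearly a rescaled copy
    "entropy": [0.3, 0.9, 0.2, 0.8, 0.5],
}
kept = drop_correlated(features)
print(kept)
```

Here "volume_scaled" is removed because it is almost perfectly correlated with "volume", while "entropy" is retained.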


2021 ◽  
Vol 3 (1) ◽  
Author(s):  
Nicholas Nuechterlein ◽  
Beibin Li ◽  
Abdullah Feroze ◽  
Eric C Holland ◽  
Linda Shapiro ◽  
...  

Abstract Background Combined whole-exome sequencing (WES) and somatic copy number alteration (SCNA) information can separate isocitrate dehydrogenase (IDH)1/2-wildtype glioblastoma into two prognostic molecular subtypes, which cannot be distinguished by epigenetic or clinical features. The potential for radiographic features to discriminate between these molecular subtypes has yet to be established. Methods Radiologic features (n = 35 340) were extracted from 46 multisequence, pre-operative magnetic resonance imaging (MRI) scans of IDH1/2-wildtype glioblastoma patients from The Cancer Imaging Archive (TCIA), all of whom have corresponding WES/SCNA data. We developed a novel feature selection method that leverages the structure of extracted MRI features to mitigate the dimensionality challenge posed by the disparity between the large number of features and the limited number of patients in our cohort. Six traditional machine learning classifiers were trained to distinguish molecular subtypes using our feature selection method, which was compared to least absolute shrinkage and selection operator (LASSO) feature selection, recursive feature elimination, and variance thresholding. Results We were able to classify glioblastomas into two prognostic subgroups with a cross-validated area under the curve score of 0.80 (±0.03) using ridge logistic regression on the 15-dimensional principal component analysis (PCA) embedding of the features selected by our novel feature selection method. An interrogation of the selected features suggested that features describing contours in the T2 signal abnormality region on the T2-weighted fluid-attenuated inversion recovery (FLAIR) MRI sequence may best distinguish these two groups from one another. Conclusions We successfully trained a machine learning model that allows for relevant targeted feature extraction from standard MRI to accurately predict molecularly-defined risk-stratifying IDH1/2-wildtype glioblastoma patient groups.
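Since this abstract reports its main result as an area under the curve, a minimal sketch of how an AUC can be computed for a binary classifier may help. It uses the pairwise-ranking interpretation: the fraction of (positive, negative) pairs in which the positive case receives the higher score, with ties counted as half. The labels and scores are made up for illustration and do not come from the study.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 0/1 class labels; scores: classifier scores (higher = more positive).
    Tied pairs contribute 0.5.
    """
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up example: three positive and two negative cases.
labels = [0, 0, 1, 1, 1]
scores = [0.1, 0.6, 0.4, 0.8, 0.9]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```

In a cross-validated setting, a value like this would be computed on each held-out fold and then averaged.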


2021 ◽  
Vol 14 (S1) ◽  
Author(s):  
Zishuang Zhang ◽  
Zhi-Ping Liu

Abstract Background Hepatocellular carcinoma (HCC) is one of the most common cancers. The discovery of specific genes serving as biomarkers is of paramount significance for cancer diagnosis and prognosis. The high-throughput omics data generated by The Cancer Genome Atlas (TCGA) consortium provide a valuable resource for the discovery of HCC biomarker genes. Numerous methods have been proposed to select cancer biomarkers. However, these methods have not investigated the robustness of identification with different feature selection techniques. Methods We use six different recursive feature elimination methods to select the gene signatures of HCC from TCGA liver cancer data. The genes shared by the six selected subsets are proposed as robust biomarkers. The Akaike information criterion (AIC) is employed to explain the optimization process of feature selection, which provides a statistical interpretation for feature selection in machine learning methods. We also use several methods to validate the screened biomarkers. Results In this paper, we propose a robust method for discovering biomarker genes for HCC from gene expression data. Specifically, we implement recursive feature elimination with cross-validation (RFE-CV) based on six different classification algorithms. The overlaps among the gene sets discovered by the different methods are referred to as the identified biomarkers. We give a statistical interpretation of the machine-learning-based feature selection process using AIC. Furthermore, the features selected by backward stepwise logistic regression via AIC minimization are completely contained in the identified biomarkers. The classification results verify the superiority of the interpretable robust biomarker discovery method. Conclusions The overlaps among gene subsets are found to contain different quantitative features selected by the RFE-CV of the 6 classifiers. The AIC values in the model selection provide a theoretical foundation for the feature selection process of biomarker discovery via machine learning. Moreover, genes contained in more of the optimally selected subsets make better biological sense. The quality of feature selection is improved by intersecting the biomarkers selected with different classifiers. This is a general method suitable for screening biomarkers of complex diseases from high-throughput data.
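The idea of taking the genes shared by all selected subsets as robust biomarkers reduces to a set intersection. The sketch below shows only that step, assuming the per-classifier selections are already available. The classifier labels and gene names are placeholders, not results from the paper.

```python
def shared_features(selected_subsets):
    """Return the features present in every subset, in sorted order.

    selected_subsets: dict mapping a classifier label -> list of selected features.
    """
    subsets = [set(subset) for subset in selected_subsets.values()]
    return sorted(set.intersection(*subsets))


# Placeholder selections; real input would be the RFE-CV output per classifier.
selected = {
    "classifier_a": ["GENE1", "GENE2", "GENE3", "GENE5"],
    "classifier_b": ["GENE2", "GENE3", "GENE4"],
    "classifier_c": ["GENE3", "GENE2", "GENE6"],
}
robust = shared_features(selected)
print(robust)
```

Only the two placeholder genes chosen by all three classifiers survive the intersection.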


2021 ◽  
Vol 2021 ◽  
pp. 1-13
Author(s):  
Tarun Dhar Diwan ◽  
Siddartha Choubey ◽  
H. S. Hota ◽  
S. B Goyal ◽  
Sajjad Shaukat Jamal ◽  
...  

Identification of anomalous and malicious traffic in Internet of Things (IoT) networks is essential for IoT security. To track and block unwanted traffic flows in an IoT network, a framework is needed that identifies attacks more accurately, more quickly, and with less complexity. Many machine learning (ML) algorithms have proved efficient at detecting intrusions in IoT networks. However, these ML algorithms suffer from many misclassification problems due to inappropriate and irrelevant features. In this paper, an in-depth study is presented to address such issues. We present lightweight, low-cost feature selection techniques for IoT intrusion detection that combine low complexity and low computational time with high accuracy. A novel feature selection technique is proposed that integrates rank-based chi-square, Pearson correlation, and score correlation to extract relevant features from all available features in the dataset. Then, feature entropy estimation is applied to validate the relationship among all extracted features to identify malicious traffic in IoT networks. Finally, an extreme gradient ensemble boosting approach is used to classify the features into the relevant attack types. The simulation is performed on three datasets, i.e., NSL-KDD, UNSW-NB15, and CICIDS2017, and results are presented on different test sets. On the NSL-KDD dataset, accuracy was approximately 97.48%. Similarly, the accuracy on UNSW-NB15 and CICIDS2017 was approximately 99.96% and 99.93%, respectively. A comparison with existing state-of-the-art techniques is also presented.
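As a rough illustration of rank-based chi-square feature scoring, the sketch below computes the classical Pearson chi-square statistic of independence between a categorical feature and the class label, then ranks features by that statistic. This is a generic textbook formulation and not necessarily the exact variant used in the paper. The feature names and values are invented.

```python
def chi_square(feature, labels):
    """Pearson chi-square statistic between a categorical feature and labels.

    feature, labels: equal-length lists of categorical values.
    """
    n = len(labels)
    statistic = 0.0
    for value in sorted(set(feature)):
        for cls in sorted(set(labels)):
            # Observed count in this contingency-table cell.
            observed = sum(
                1 for f, l in zip(feature, labels) if f == value and l == cls
            )
            # Expected count if feature and label were independent.
            expected = feature.count(value) * labels.count(cls) / n
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Invented toy data: one informative flag and one uninformative flag.
labels = [0, 0, 1, 1]
features = {
    "noise_flag": [0, 1, 0, 1],
    "signal_flag": [0, 0, 1, 1],
}
ranked = sorted(
    features, key=lambda name: chi_square(features[name], labels), reverse=True
)
print(ranked)
```

A higher statistic indicates a stronger association with the label, so "signal_flag" ranks first in this toy case.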


2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Brandi Patrice Smith ◽  
Loretta Sue Auvil ◽  
Michael Welge ◽  
Colleen Bannon Bushell ◽  
Rohit Bhargava ◽  
...  

Abstract Screening agrochemicals and pharmaceuticals for potential liver toxicity is required for regulatory approval and is an expensive and time-consuming process. The identification and utilization of early exposure gene signatures and robust predictive models in regulatory toxicity testing have the potential to reduce time and costs substantially. In this study, comparative supervised machine learning approaches were applied to the rat liver TG-GATEs dataset to develop feature selection and predictive testing. We identified ten gene biomarkers using three different feature selection methods that predicted liver necrosis with high specificity and selectivity in an independent validation dataset from the Microarray Quality Control (MAQC)-II study. Nine of the ten genes that were selected with the supervised methods are involved in metabolism and detoxification (Car3, Crat, Cyp39a1, Dcd, Lbp, Scly, Slc23a1, and Tkfc) and transcriptional regulation (Ablim3). Several of these genes are also implicated in liver carcinogenesis, including Crat, Car3 and Slc23a1. Our biomarker gene signature provides high statistical accuracy and a manageable number of genes to study as indicators to potentially accelerate toxicity testing based on their ability to induce liver necrosis and, eventually, liver cancer.


2021 ◽  
Vol 13 (14) ◽  
pp. 2833
Author(s):  
Xing Wei ◽  
Marcela A. Johnson ◽  
David B. Langston ◽  
Hillary L. Mehl ◽  
Song Li

Hyperspectral sensors combined with machine learning are increasingly utilized in agricultural crop systems for diverse applications, including plant disease detection. This study was designed to identify the most important wavelengths to discriminate between healthy and diseased peanut (Arachis hypogaea L.) plants infected with Athelia rolfsii, the causal agent of peanut stem rot, using in-situ spectroscopy and machine learning. In greenhouse experiments, daily measurements were conducted to inspect disease symptoms visually and to collect spectral reflectance of peanut leaves on lateral stems of plants mock-inoculated and inoculated with A. rolfsii. Spectrum files were categorized into five classes based on foliar wilting symptoms. Five feature selection methods were compared to select the top 10 ranked wavelengths with and without a custom minimum distance of 20 nm. Recursive feature elimination methods outperformed the chi-square and SelectFromModel methods. Adding the minimum distance of 20 nm into the top selected wavelengths improved classification performance. Wavelengths of 501–505, 690–694, 763 and 884 nm were repeatedly selected by two or more feature selection methods. These selected wavelengths can be applied in designing optical sensors for automated stem rot detection in peanut fields. The machine-learning-based methodology can be adapted to identify spectral signatures of disease in other plant-pathogen systems.
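The minimum-distance constraint on selected wavelengths described in this abstract can be sketched as a greedy pick over importance-ranked wavelengths. The wavelengths below echo those named in the abstract, but the importance scores are invented, and whether the authors enforced the 20 nm spacing in exactly this greedy way is an assumption.

```python
def top_wavelengths(importance, k=10, min_distance=20):
    """Greedily pick up to k wavelengths by score, at least min_distance nm apart.

    importance: dict mapping wavelength in nm -> importance score.
    """
    chosen = []
    for wavelength in sorted(importance, key=importance.get, reverse=True):
        # Accept only if far enough from every wavelength already chosen.
        if all(abs(wavelength - c) >= min_distance for c in chosen):
            chosen.append(wavelength)
        if len(chosen) == k:
            break
    return chosen


# Invented importance scores for a handful of wavelengths.
scores = {501: 0.90, 503: 0.85, 505: 0.80, 690: 0.70, 694: 0.65, 763: 0.60, 884: 0.50}
spaced = top_wavelengths(scores, k=4, min_distance=20)
unspaced = top_wavelengths(scores, k=4, min_distance=0)
print(spaced, unspaced)
```

Without the spacing rule the top picks cluster around 501 to 505 nm; with it, the selection spreads across distinct spectral regions.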


Author(s):  
Kristiawan Kristiawan ◽  
Andreas Widjaja

Abstract: The application of machine learning technology in various industrial fields, including the retail industry, is currently developing rapidly. This study aims to find the most accurate algorithmic model that can help retailers choose a store location more precisely. Several methods, namely Pearson Correlation, Chi-Square, Recursive Feature Elimination and Tree-based selection, are used to select features (predictive variables). These features are then used to train and build models using 6 different classification algorithms, namely Logistic Regression, K Nearest Neighbor (KNN), Decision Tree, Random Forest, Support Vector Machine (SVM) and Neural Network, to classify whether or not a location is recommended as a new store location. Keywords: Application of Machine Learning, Pearson Correlation, Random Forest, Neural Network, Logistic Regression.


Author(s):  
Pooja Rani ◽  
Rajneesh Kumar ◽  
Anurag Jain ◽  
Sunil Kumar Chawla

Machine learning has become an integral part of life in today's world. When applied to real-world applications, machine learning suffers from the problem of high-dimensional data. Data can contain unnecessary and redundant features, which degrade the performance of the classification systems used in prediction. Selecting important features is the first step in developing any decision support system. In this paper, the authors propose a hybrid feature selection method, GARFE, that integrates the GA (genetic algorithm) and RFE (recursive feature elimination) algorithms. The efficiency of the proposed method is analyzed using a support vector machine classifier in terms of accuracy, sensitivity, specificity, precision, F-measure, and execution time. The proposed GARFE method is also compared to eight other feature selection methods. Results demonstrate that the proposed GARFE method increases the performance of classification systems by removing irrelevant and redundant features.
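The evaluation measures listed in this abstract (accuracy, sensitivity, specificity, precision, and F-measure) all follow from the four cells of a binary confusion matrix. The sketch below uses the standard definitions; the counts are made up and are not results from the paper.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # true positive rate (recall)
    specificity = tn / (tn + fp)  # true negative rate
    precision = tp / (tp + fp)    # positive predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f_measure": f_measure,
    }


# Made-up counts for illustration.
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```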


2020 ◽  
Vol 493 (2) ◽  
pp. 1842-1854
Author(s):  
Haitao Lin ◽  
Xiangru Li ◽  
Ziying Luo

ABSTRACT Machine learning (ML) schemes for detecting pulsars are an active research topic as data volumes grow exponentially in modern surveys. To improve detection performance, the input features of an ML model should be investigated specifically. Existing ML-based pulsar detection research mainly uses two kinds of feature designs: empirical features and statistical features. Due to the combined effects of multiple features, however, the available features contain some redundant and even irrelevant components, which can reduce the accuracy of a pulsar detection model. Therefore, it is essential to select a subset of relevant features from the set of available candidate features, a process known as feature selection. In this work, two feature selection algorithms, Grid Search (GS) and Recursive Feature Elimination (RFE), are proposed to improve detection performance by removing redundant and irrelevant features. The algorithms were evaluated on the Southern High Time Resolution University survey (HTRU-S) with five pulsar detection models. The experimental results verify the effectiveness and efficiency of our proposed feature selection algorithms. With GS, a model with only two features reaches a recall rate as high as 99 per cent and a false positive rate (FPR) as low as 0.65 per cent; with RFE, another model with only three features achieves a recall rate of 99 per cent and an FPR of 0.16 per cent in pulsar candidate classification. Furthermore, this work investigated the number of features required as well as the pulsars misclassified by our models.
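Recursive feature elimination, which recurs throughout these abstracts, repeatedly fits a model, drops the feature with the smallest importance, and refits on the remainder. The sketch below is a minimal, dependency-free version that uses a small logistic regression trained by gradient descent and treats absolute weight as importance. The model, the hyperparameters, and the toy data are assumptions made for illustration; they are not the pulsar models or features from the paper.

```python
import math


def fit_logistic(X, y, epochs=500, lr=0.1):
    """Fit logistic regression by batch gradient descent; return (weights, bias)."""
    n_samples, n_features = len(X), len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for row, target in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, row))
            error = 1.0 / (1.0 + math.exp(-z)) - target
            for j in range(n_features):
                grad_w[j] += error * row[j]
            grad_b += error
        w = [wj - lr * g / n_samples for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n_samples
    return w, b


def rfe(X, y, n_keep):
    """Drop the lowest-|weight| feature and refit until n_keep features remain."""
    remaining = list(range(len(X[0])))
    while len(remaining) > n_keep:
        X_sub = [[row[j] for j in remaining] for row in X]
        w, _ = fit_logistic(X_sub, y)
        weakest = min(range(len(remaining)), key=lambda k: abs(w[k]))
        del remaining[weakest]
    return remaining


# Toy data: column 0 matches the label, column 1 is noise, column 2 is weaker.
X = [
    [-1, 1, -1], [-1, -1, -1], [-1, 1, -1], [-1, -1, 1],
    [1, 1, 1], [1, -1, 1], [1, 1, 1], [1, -1, -1],
]
y = [0, 0, 0, 0, 1, 1, 1, 1]
selected = rfe(X, y, n_keep=1)
print(selected)
```

On this toy data the noise column is eliminated first, then the weaker column, leaving the fully informative one.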


2020 ◽  
Vol 62 (12) ◽  
pp. 1649-1656 ◽  
Author(s):  
Renato Cuocolo ◽  
Lorenzo Ugga ◽  
Domenico Solari ◽  
Sergio Corvino ◽  
Alessandra D’Amico ◽  
...  

Abstract Purpose Pituitary macroadenoma consistency can influence the ease of lesion removal during surgery, especially when using a transsphenoidal approach. Unfortunately, it is not assessable on standard qualitative MRI. Radiomic texture analysis could help in extracting mineable quantitative tissue characteristics. We aimed to assess the accuracy of texture analysis combined with machine learning in the preoperative evaluation of pituitary macroadenoma consistency in patients undergoing endoscopic endonasal surgery. Methods Data of 89 patients (68 soft and 21 fibrous macroadenomas) who underwent MRI and transsphenoidal surgery at our institution were retrospectively reviewed. After manual segmentation, radiomic texture features were extracted from original and filtered MR images. Feature stability analysis and a multistep feature selection were performed. After oversampling to balance the classes, 80% of the data was used for hyperparameter tuning via stratified 5-fold cross-validation, while a 20% hold-out set was employed for its final testing, using an Extra Trees ensemble meta-algorithm. The reference standard was based on surgical findings. Results A total of 1118 texture features were extracted, of which 741 were stable. After removal of low variance (n = 4) and highly intercorrelated (n = 625) parameters, recursive feature elimination identified a subset of 14 features. After hyperparameter tuning, the Extra Trees classifier obtained an accuracy of 93%, sensitivity of 100%, and specificity of 87%. The area under the receiver operating characteristic and precision-recall curves was 0.99. Conclusion Preoperative T2-weighted MRI texture analysis and machine learning could predict pituitary macroadenoma consistency.
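The class-balancing step mentioned in this abstract can be illustrated with simple random oversampling, in which minority-class cases are duplicated at random until every class matches the largest one. The abstract does not name the oversampling technique that was used, so random duplication is a stand-in here. The class sizes follow the abstract (68 soft, 21 fibrous), while the integer sample identifiers are placeholders.

```python
import random


def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples at random until all classes are equal."""
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    target = max(len(group) for group in by_class.values())
    out_samples, out_labels = [], []
    for label, group in by_class.items():
        # Draw extra samples with replacement to reach the target size.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for sample in group + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


# Class sizes from the abstract; integer IDs stand in for feature vectors.
labels = ["soft"] * 68 + ["fibrous"] * 21
samples = list(range(len(labels)))
balanced_samples, balanced_labels = random_oversample(samples, labels)
print(balanced_labels.count("soft"), balanced_labels.count("fibrous"))
```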

