Lettuce life stage classification from texture attributes using machine learning estimators and feature selection processes

Author(s):  
Sandy Cruz Lauguico ◽  
Ronnie II Sabino Concepcion ◽  
Jonnel Dorado Alejandrino ◽  
Rogelio Ruzcko Tobias ◽  
Elmer Pamisa Dadios

Classification of lettuce life or growth stages is an effective tool for measuring the performance of an aquaponics system. It helps determine the balance of water nutrients, adequate temperature and lighting, other environmental factors, and the system’s productivity to sustain cultivars. This paper proposes a classification of the life stages of lettuce planted in an aquaponics system. The classification used texture features of the leaves derived from machine vision algorithms. The attributes underwent three feature selection processes, namely Univariate Selection (US), Recursive Feature Elimination (RFE), and Feature Importance (FI), to determine the four most significant features from the original eight attributes. The selected features were used to train four estimators: Decision Tree Classifier (DTC), Gaussian Naïve Bayes (GNB), Stochastic Gradient Descent (SGD), and Linear Discriminant Analysis (LDA). The models trained using DTC and SGD were then optimized, as they have hyperparameters for tuning. A comparative analysis among the Machine Learning (ML) algorithms was conducted to identify the best-performing model for the given application. The best features were derived from US and FI, which yielded the same top four features, using the DTC estimator optimized with max depth set to 5, criterion set to 'Gini', and splitter set to 'Best'. The accuracy obtained from cross-validation was 87.92%. Considering consistency with hold-out validation, however, LDA outperforms the optimized DTC despite its lower accuracy of 86.67%, because it fits well enough to generalize to the testing data when classifying lettuce growth stage.
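
The selection-then-training workflow described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes scikit-learn, uses synthetic data in place of the eight texture attributes, assumes an ANOVA F-test for Univariate Selection, a decision tree as the base estimator for RFE and Feature Importance, three classes, and 10-fold cross-validation. Only the DTC hyperparameters (max depth 5, Gini criterion, best splitter) come from the abstract.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the eight texture attributes (not the paper's data).
X, y = make_classification(
    n_samples=240, n_features=8, n_informative=4, n_redundant=2,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)


def top_k(scores, k=4):
    """Return the indices of the k highest scores, in ascending index order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


# Univariate Selection: keep the 4 features with the best ANOVA F-scores
# (the scoring function is an assumption).
us = SelectKBest(score_func=f_classif, k=4).fit(X, y)
us_idx = sorted(us.get_support(indices=True).tolist())

# Recursive Feature Elimination around a decision tree (assumed base estimator).
rfe = RFE(DecisionTreeClassifier(random_state=0), n_features_to_select=4).fit(X, y)
rfe_idx = sorted(rfe.get_support(indices=True).tolist())

# Feature Importance: impurity-based importances of a fitted tree.
tree = DecisionTreeClassifier(random_state=0).fit(X, y)
fi_idx = top_k(tree.feature_importances_.tolist())

# DTC with the hyperparameters reported in the abstract.
dtc = DecisionTreeClassifier(
    max_depth=5, criterion="gini", splitter="best", random_state=0
)
scores = cross_val_score(dtc, X[:, us_idx], y, cv=10)  # fold count is assumed

print("US:", us_idx, "RFE:", rfe_idx, "FI:", fi_idx)
print("mean CV accuracy:", scores.mean())
```

Comparing `us_idx`, `rfe_idx`, and `fi_idx` shows whether two selectors agree on the same top four features, which is the situation the abstract reports for US and FI.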

2021 ◽  
Vol 3 (1) ◽  
Author(s):  
Nicholas Nuechterlein ◽  
Beibin Li ◽  
Abdullah Feroze ◽  
Eric C Holland ◽  
Linda Shapiro ◽  
...  

Abstract Background: Combined whole-exome sequencing (WES) and somatic copy number alteration (SCNA) information can separate isocitrate dehydrogenase (IDH)1/2-wildtype glioblastoma into two prognostic molecular subtypes, which cannot be distinguished by epigenetic or clinical features. The potential for radiographic features to discriminate between these molecular subtypes has yet to be established. Methods: Radiologic features (n = 35 340) were extracted from 46 multisequence, pre-operative magnetic resonance imaging (MRI) scans of IDH1/2-wildtype glioblastoma patients from The Cancer Imaging Archive (TCIA), all of whom have corresponding WES/SCNA data. We developed a novel feature selection method that leverages the structure of extracted MRI features to mitigate the dimensionality challenge posed by the disparity between a large number of features and the limited patients in our cohort. Six traditional machine learning classifiers were trained to distinguish molecular subtypes using our feature selection method, which was compared to least absolute shrinkage and selection operator (LASSO) feature selection, recursive feature elimination, and variance thresholding. Results: We were able to classify glioblastomas into two prognostic subgroups with a cross-validated area under the curve score of 0.80 (±0.03) using ridge logistic regression on the 15-dimensional principal component analysis (PCA) embedding of the features selected by our novel feature selection method. An interrogation of the selected features suggested that features describing contours in the T2 signal abnormality region on the T2-weighted fluid-attenuated inversion recovery (FLAIR) MRI sequence may best distinguish these two groups from one another. Conclusions: We successfully trained a machine learning model that allows for relevant targeted feature extraction from standard MRI to accurately predict molecularly-defined risk-stratifying IDH1/2-wildtype glioblastoma patient groups.
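
The final modelling step named in the Results (ridge logistic regression on a 15-dimensional PCA embedding, scored by cross-validated AUC) can be sketched as below. This is only an illustration under stated assumptions: it uses scikit-learn, random synthetic data with 46 samples, a reduced feature count of 500, standard scaling, and 5-fold stratified cross-validation. The authors' novel feature selection method is not reproduced here.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in: 46 "patients", 500 features (far fewer than the 35 340
# in the abstract, to keep the sketch fast).
n_patients, n_features = 46, 500
X = rng.normal(size=(n_patients, n_features))
y = np.array([0, 1] * (n_patients // 2))
X[y == 1, :20] += 1.0  # inject a weak class signal into the first 20 features

# LogisticRegression applies an L2 (ridge) penalty by default; C is its
# inverse regularization strength. PCA is fitted inside each training fold.
model = make_pipeline(
    StandardScaler(),
    PCA(n_components=15),
    LogisticRegression(C=1.0, max_iter=1000),
)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)  # fold count assumed
auc = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
print("AUC per fold:", auc, "mean:", auc.mean())
```

Placing the scaler and PCA inside the pipeline keeps both from seeing the held-out fold, which matters when features vastly outnumber patients.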


2021 ◽  
Vol 11 ◽  
Author(s):  
Guyu Dai ◽  
Xiangbin Zhang ◽  
Wenjie Liu ◽  
Zhibin Li ◽  
Guangyu Wang ◽  
...  

Purpose: To find a suitable method for analyzing electronic portal imaging device (EPID) transmission fluence maps for the identification of position errors in the in vivo dose monitoring of patients with Graves’ ophthalmopathy (GO). Methods: Position errors combining 0-, 2-, and 4-mm errors in the left-right (LR), anterior-posterior (AP), and superior-inferior (SI) directions in the delivery of 40 GO patient radiotherapy plans to a human head phantom were simulated, and EPID transmission fluence maps were acquired. Dose difference (DD) and structural similarity (SSIM) maps were calculated to quantify changes in the fluence maps. Three types of machine learning (ML) models that utilize radiomics features of the DD maps (ML 1 models), features of the SSIM maps (ML 2 models), and features of both DD and SSIM maps (ML 3 models) as inputs were used to perform three types of position error classification, namely a binary classification of the isocenter error (type 1), three binary classifications of LR, SI, and AP direction errors (type 2), and an eight-element classification of the combined LR, SI, and AP direction errors (type 3). A convolutional neural network (CNN) was also used to classify position errors using the DD and SSIM maps as input. Results: The best-performing ML 1 model was XGBoost, which achieved accuracies of 0.889, 0.755, 0.778, 0.833, and 0.532 in the type 1, type 2-LR, type 2-AP, type 2-SI, and type 3 classification, respectively. The best ML 2 model was XGBoost, which achieved accuracies of 0.856, 0.731, 0.736, 0.949, and 0.491, respectively. The best ML 3 model was linear discriminant classifier (LDC), which achieved accuracies of 0.903, 0.792, 0.870, 0.931, and 0.671, respectively. The CNN achieved classification accuracies of 0.925, 0.833, 0.875, 0.949, and 0.689, respectively. Conclusion: ML models and CNN using combined DD and SSIM maps can analyze EPID transmission fluence maps to identify position errors in the treatment of GO patients. Further studies with large sample sizes are needed to improve the accuracy of CNN.
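
To make the two map types concrete, the sketch below computes a dose difference map and a structural similarity score for two tiny fluence grids. It is a simplified illustration, not the study's pipeline: the grids are hypothetical, and SSIM is evaluated once over the whole image rather than in sliding windows, so it yields a single number instead of an SSIM map. The constants `k1` and `k2` are the commonly used defaults, assumed here.

```python
from statistics import fmean


def dose_difference(reference, measured):
    """Element-wise difference (measured minus reference) of two 2D grids."""
    return [
        [m - r for r, m in zip(ref_row, meas_row)]
        for ref_row, meas_row in zip(reference, measured)
    ]


def global_ssim(x, y, data_range=1.0, k1=0.01, k2=0.03):
    """Single-window SSIM over whole images (simplification of windowed SSIM)."""
    xs = [v for row in x for v in row]
    ys = [v for row in y for v in row]
    mu_x, mu_y = fmean(xs), fmean(ys)
    var_x = fmean([(v - mu_x) ** 2 for v in xs])
    var_y = fmean([(v - mu_y) ** 2 for v in ys])
    cov = fmean([(a - mu_x) * (b - mu_y) for a, b in zip(xs, ys)])
    c1 = (k1 * data_range) ** 2  # stabilizes the luminance term
    c2 = (k2 * data_range) ** 2  # stabilizes the contrast/structure term
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


# Hypothetical normalized fluence grids: the second is shifted by one column.
reference = [[0.2, 0.4, 0.6], [0.2, 0.4, 0.6], [0.2, 0.4, 0.6]]
shifted = [[0.4, 0.6, 0.2], [0.4, 0.6, 0.2], [0.4, 0.6, 0.2]]

dd_map = dose_difference(reference, shifted)
ssim_same = global_ssim(reference, reference)
ssim_shifted = global_ssim(reference, shifted)
print(dd_map, ssim_same, ssim_shifted)
```

An unshifted delivery gives a zero DD map and an SSIM of 1, while a positional shift lowers SSIM and produces nonzero DD values, which is the signal the classifiers would learn from.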


In pharmaceutical research, the traditional drug discovery process is time-consuming and expensive, as many compounds are experimentally tested for their biological activities. A series of lab experiments is conducted to analyze a newly synthesized drug’s pharmaceutical activities and its biological effects on humans. With every new drug discovery, the required clinical properties can be determined using machine learning models, which greatly reduces the experimental cost. This paper explores parametric and non-parametric machine learning models to classify the administration properties of drugs and their toxicity. The multinomial classification of drugs was based on their physicochemical and ADMET properties. Balanced data samples were drawn from ChEMBL and pre-processed. Features were reduced using Recursive Feature Elimination, and the attributes were ranked by importance to reduce highly correlated attributes. The performance of parametric and non-parametric machine learning models was analyzed on cheminformatic data that includes physicochemical, biological, and pharmaceutical properties of the drug molecules. Selecting the potent drug candidate along with its administration properties greatly reduces wet lab experimental time and cost. Multiclass classification can be performed efficiently using a non-parametric machine learning model. Optimal feature engineering, hyperparameter tuning, and the adoption of hybrid algorithms would result in more accurate predictions for cheminformatics data in the future.


Author(s):  
Yanan Wang ◽  
Haoyu Niu ◽  
Tiebiao Zhao ◽  
Xiaozhong Liao ◽  
Lei Dong ◽  
...  

Abstract This paper proposes a contactless voltage classification method for lithium-ion batteries (LIBs). With a three-dimensional radio-frequency-based sensor called Walabot, voltage data of LIBs can be collected in a contactless way. Three machine learning algorithms, namely principal component analysis (PCA), linear discriminant analysis (LDA), and stochastic gradient descent (SGD) classifiers, were then employed for data processing. Experiments and comparisons were conducted to verify the proposed method. The colormaps of results and prediction accuracy show that LDA may be most suitable for LIB voltage classification.


2021 ◽  
Vol 11 ◽  
Author(s):  
Qi Wan ◽  
Jiaxuan Zhou ◽  
Xiaoying Xia ◽  
Jianfeng Hu ◽  
Peng Wang ◽  
...  

Objective: To evaluate the performance of 2D and 3D radiomics features with different machine learning approaches to classify SPLs based on magnetic resonance (MR) T2-weighted imaging (T2WI). Material and Methods: A total of 132 patients with pathologically confirmed SPLs were examined and randomly divided into training (n = 92) and test (n = 40) datasets. A total of 1692 3D and 1231 2D radiomics features per patient were extracted. Both radiomics features and clinical data were evaluated. A total of 1260 classification models, comprising 3 normalization methods, 2 dimension reduction algorithms, 3 feature selection methods, and 10 classifiers with 7 different feature numbers (confined to 3–9), were compared. Ten-fold cross-validation on the training dataset was applied to choose the candidate final model. The area under the receiver operating characteristic curve (AUC), precision-recall plot, and Matthews Correlation Coefficient (MCC) were used to evaluate the performance of the machine learning approaches. Results: The 3D features were significantly superior to 2D features, showing many more machine learning combinations with AUC greater than 0.7 in both validation and test groups (129 vs. 11). The feature selection methods Analysis of Variance (ANOVA) and Recursive Feature Elimination (RFE) and the classifiers Logistic Regression (LR), Linear Discriminant Analysis (LDA), Support Vector Machine (SVM), and Gaussian Process (GP) had relatively better performance. The best performance of 3D radiomics features in the test dataset (AUC = 0.824, AUC-PR = 0.927, MCC = 0.514) was higher than that of 2D features (AUC = 0.740, AUC-PR = 0.846, MCC = 0.404). The joint 3D and 2D features (AUC = 0.813, AUC-PR = 0.926, MCC = 0.563) showed similar results to 3D features. Incorporating clinical features with 3D and 2D radiomics features slightly improved the AUC to 0.836 (AUC-PR = 0.918, MCC = 0.620) and 0.780 (AUC-PR = 0.900, MCC = 0.574), respectively. Conclusions: After algorithm optimization, 2D feature-based radiomics models yield favorable results in differentiating malignant and benign SPLs, but 3D features are still preferred because of the availability of more machine learning algorithmic combinations with better performance. The feature selection methods ANOVA and RFE and the classifiers LR, LDA, SVM, and GP are more likely to demonstrate better diagnostic performance for 3D features in the current study.
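
The figure of 1260 models follows from taking every combination of the pipeline choices listed in the methods: 3 × 2 × 3 × 10 × 7 = 1260. The short sketch below enumerates that grid. The option labels are placeholders, since the abstract names only some of the methods.

```python
from itertools import product

# Placeholder labels; only the counts are taken from the abstract.
normalizers = [f"normalizer_{i}" for i in range(1, 4)]        # 3 methods
reducers = [f"reducer_{i}" for i in range(1, 3)]              # 2 algorithms
selectors = [f"selector_{i}" for i in range(1, 4)]            # 3 methods
classifiers = [f"classifier_{i}" for i in range(1, 11)]       # 10 classifiers
feature_counts = list(range(3, 10))                           # 3..9, i.e. 7 values

grid = list(product(normalizers, reducers, selectors, classifiers, feature_counts))
print(len(grid))  # 1260
```

Each tuple in `grid` describes one candidate pipeline, so a model search of this kind is a loop over the tuples with cross-validation inside.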


2021 ◽  
Vol 14 (S1) ◽  
Author(s):  
Zishuang Zhang ◽  
Zhi-Ping Liu

Abstract Background: Hepatocellular carcinoma (HCC) is one of the most common cancers. The discovery of specific genes serving as biomarkers is of paramount significance for cancer diagnosis and prognosis. The high-throughput omics data generated by The Cancer Genome Atlas (TCGA) consortium provide a valuable resource for the discovery of HCC biomarker genes. Numerous methods have been proposed to select cancer biomarkers. However, these methods have not investigated the robustness of identification with different feature selection techniques. Methods: We use six different recursive feature elimination methods to select the gene signatures of HCC from TCGA liver cancer data. The genes shared by the six selected subsets are proposed as robust biomarkers. The Akaike information criterion (AIC) is employed to explain the optimization process of feature selection, which provides a statistical interpretation for feature selection in machine learning methods. We also use several methods to validate the screened biomarkers. Results: In this paper, we propose a robust method for discovering biomarker genes for HCC from gene expression data. Specifically, we implement recursive feature elimination with cross-validation (RFE-CV) methods based on six different classification algorithms. The overlaps among the gene sets discovered via the different methods are referred to as the identified biomarkers. We give an interpretation of the machine-learning-based feature selection process using AIC in statistics. Furthermore, the features selected by backward stepwise logistic regression via the AIC minimum theory are completely contained in the identified biomarkers. The classification results verify the superiority of the interpretable robust biomarker discovery method. Conclusions: It is found that overlaps among gene subsets contain different quantitative features selected by the RFE-CV of 6 classifiers. The AIC values in the model selection provide a theoretical foundation for the feature selection process of biomarker discovery via machine learning. Moreover, genes contained in more of the optimally selected subsets make better biological sense and implication. The quality of feature selection is improved by the intersections of biomarkers selected from different classifiers. This is a general method suitable for screening biomarkers of complex diseases from high-throughput data.
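
The overlap step can be illustrated with plain set operations. In the sketch below, the six subsets stand in for the outputs of RFE-CV run with six classifiers. The classifier keys and gene names are hypothetical placeholders, not results from the paper.

```python
from collections import Counter

# Hypothetical RFE-CV outputs, one gene subset per classifier.
selected = {
    "classifier_1": {"GENE_A", "GENE_B", "GENE_C", "GENE_D"},
    "classifier_2": {"GENE_A", "GENE_B", "GENE_C"},
    "classifier_3": {"GENE_A", "GENE_B", "GENE_D", "GENE_E"},
    "classifier_4": {"GENE_A", "GENE_B", "GENE_E"},
    "classifier_5": {"GENE_A", "GENE_B", "GENE_C", "GENE_F"},
    "classifier_6": {"GENE_A", "GENE_B"},
}

# Robust biomarkers: genes present in every selected subset.
robust = set.intersection(*selected.values())

# Support: in how many subsets each gene appears.
support = Counter(gene for subset in selected.values() for gene in subset)

print(sorted(robust))          # ['GENE_A', 'GENE_B']
print(support.most_common())
```

The `support` counts give a graded view alongside the strict intersection, matching the observation that genes contained in more subsets are the more credible candidates.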


2021 ◽  
Author(s):  
Isaac Shiri ◽  
Yazdan Salimi ◽  
Abdollah Saberi ◽  
Masoumeh Pakbin ◽  
Ghasem Hajianfar ◽  
...  

Abstract Purpose: To derive and validate an effective radiomics-based model for differentiation of COVID-19 pneumonia from other lung diseases using a very large cohort of patients. Methods: We collected 19 private and 5 public datasets, accumulating to 26,307 individual patient images (15,148 COVID-19; 9,657 with other lung diseases, e.g. non-COVID-19 pneumonia, lung cancer, pulmonary embolism; 1,502 normal cases). Images were automatically segmented using a validated deep learning (DL) model and the results carefully reviewed. Images were first cropped into lung-only region boxes, then resized to 296×216 voxels. Voxel dimensions were resized to 1×1×1 mm³, followed by 64-bin discretization. The 108 extracted features included shape, first-order histogram, and texture features. Univariate analysis was first performed using simple logistic regression. The thresholds were fixed in the training set and evaluation was then performed on the test set. False discovery rate (FDR) correction was applied to the p-values. Z-score normalization was applied to all features. For multivariate analysis, features with high correlation (R² > 0.99) were eliminated first using Pearson correlation. We tested 96 different machine learning strategies by cross-combining 4 feature selectors or 8 dimensionality reduction techniques with 8 classifiers. We trained and evaluated our models using 3 different datasets: 1) the entire dataset (26,307 patients: 15,148 COVID-19; 11,159 non-COVID-19); 2) excluding normal patients in non-COVID-19, and including only RT-PCR positive COVID-19 cases in the COVID-19 class (20,697 patients: 12,419 COVID-19; 8,278 non-COVID-19); 3) including only non-COVID-19 pneumonia patients and a random sample of COVID-19 patients (5,582 patients: 3,000 COVID-19; 2,582 non-COVID-19) to provide balanced classes. Subsequently, each of these 3 datasets was randomly split into 70% and 30% for training and testing, respectively. All steps, including feature preprocessing, feature selection, and classification, were performed separately in each dataset. Classification algorithms were optimized during training using grid search. The best models were chosen by a one-standard-deviation rule in 10-fold cross-validation and were then evaluated on the test sets. Results: In dataset #1, the combination of Relief feature selection and the Random Forest (RF) classifier resulted in the highest performance (area under the receiver operating characteristic curve (AUC) = 0.99, sensitivity = 0.98, specificity = 0.94, accuracy = 0.96, positive predictive value (PPV) = 0.96, and negative predictive value (NPV) = 0.96). In dataset #2, the combination of Recursive Feature Elimination (RFE) feature selection and the RF classifier resulted in the highest performance (AUC = 0.99, sensitivity = 0.98, specificity = 0.95, accuracy = 0.97, PPV = 0.96, and NPV = 0.98). In dataset #3, the combination of ANOVA feature selection and the RF classifier resulted in the highest performance (AUC = 0.98, sensitivity = 0.96, specificity = 0.93, accuracy = 0.94, PPV = 0.93, and NPV = 0.96). Conclusion: Radiomic features extracted from the entire lung, combined with machine learning algorithms, can enable very effective, routine diagnosis of COVID-19 pneumonia from CT images without the use of any other diagnostic test.
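
For readers less familiar with the reported metrics, the sketch below shows how sensitivity, specificity, accuracy, PPV, and NPV derive from a binary confusion matrix. The counts in the example are hypothetical and are not taken from the study.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Compute standard binary diagnostic metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),            # true positive rate
        "specificity": tn / (tn + fp),            # true negative rate
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "ppv": tp / (tp + fp),                    # positive predictive value
        "npv": tn / (tn + fn),                    # negative predictive value
    }


# Hypothetical test-set counts: 100 positive and 100 negative cases.
metrics = diagnostic_metrics(tp=90, fp=20, tn=80, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

PPV and NPV depend on class prevalence as well as on the classifier, so the same model would report different values on the balanced dataset #3 than on the full cohort.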

