Accurate classification of fresh and charred grape seeds to the varietal level, using machine learning based classification method

2021, Vol 11 (1)
Authors: Vlad Landa, Yekaterina Shapira, Michal David, Avshalom Karasik, Ehud Weiss, ...

Abstract: Grapevine (Vitis vinifera L.) currently includes thousands of cultivars. Discrimination between these varieties, historically done by ampelography, has in recent decades been performed mostly by genetic analysis. However, when the aim is to identify archaeobotanical remains, which are mostly charred and have extremely low genomic preservation, the genomic approach is rarely successful. As a result, variety-level identification of most grape remains is currently not possible. Because grape pips are highly polymorphic, several attempts have been made to use their morphological diversity as a classification tool, mostly with 2D image analysis techniques. Here, we present a highly accurate varietal classification tool based on an innovative and accessible 3D seed scanning approach. The classification methodology is machine-learning-based and combines the Iterative Closest Point (ICP) registration algorithm with the Linear Discriminant Analysis (LDA) technique. It achieved average accuracies of 91% to 93% when trained on fresh or charred seeds and tested on fresh or charred seeds, respectively. We show that when classifying 8 groups, accuracy can be further enhanced using a "tournament" approach. Future development of this methodology can lead to an effective seed classification tool, significantly benefiting archaeobotany as well as general taxonomy.
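The abstract combines Iterative Closest Point (ICP) registration with LDA but gives no implementation details. As a hedged illustration of the ICP step only, the NumPy sketch below assumes each scanned seed is available as an N×3 point cloud and performs classic point-to-point ICP: brute-force nearest neighbours followed by an SVD-based (Kabsch) rigid fit, repeated until the mean residual stops changing. The function names are hypothetical, and how the authors sample their scans or turn alignment results into classification features is not reproduced here.

```python
import numpy as np


def best_fit_transform(src: np.ndarray, dst: np.ndarray):
    """Least-squares rigid transform (R, t) mapping src onto dst (Kabsch / SVD)."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)          # 3x3 cross-covariance of centred points
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                     # reject a reflection solution
        Vt[-1, :] *= -1
        R = Vt.T @ U.T
    t = dst_c - R @ src_c
    return R, t


def nearest(src: np.ndarray, dst: np.ndarray):
    """Brute-force nearest neighbour in dst for every point of src."""
    d2 = ((src[:, None, :] - dst[None, :, :]) ** 2).sum(axis=2)
    idx = d2.argmin(axis=1)
    return idx, np.sqrt(d2[np.arange(len(src)), idx])


def icp(source: np.ndarray, target: np.ndarray, max_iter: int = 50, tol: float = 1e-10):
    """Point-to-point ICP. Returns the aligned source and its mean residual distance."""
    current = source.astype(float).copy()
    prev_err = np.inf
    for _ in range(max_iter):
        idx, dist = nearest(current, target)     # tentative correspondences
        R, t = best_fit_transform(current, target[idx])
        current = current @ R.T + t              # apply the rigid update to every point
        err = dist.mean()
        if abs(prev_err - err) < tol:            # stop when the residual stabilises
            break
        prev_err = err
    _, dist = nearest(current, target)
    return current, dist.mean()
```

One possible use, again an assumption: aligning an unknown seed to each reference seed and recording the final mean residual yields a shape-dissimilarity score per reference, which could then be passed to a classifier. Brute-force matching is quadratic in the number of points; a k-d tree is the usual replacement for dense scans.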

2020
Authors: Vlad Landa, Yekaterina Shapira, Michal David, Avshalom Karasik, Ehud Weiss, ...

Abstract: Grapevine (Vitis vinifera L.) is an essential member of the oldest group of fruit trees around which horticulture evolved, and it currently includes thousands of cultivars grown under numerous climatic conditions. Discrimination between these varieties has traditionally been conducted using ampelography and, in recent decades, mostly by genetic analysis. However, when the aim is to identify archaeobotanical remains, which are mostly charred, with extremely low genomic preservation, the genomic approach is rarely successful. As a result, variety-level identification of most grape remains is currently not possible. Because grape pips are highly polymorphic, several attempts have been made to use their morphological diversity as a classification tool, mostly with 2D image analysis techniques, with the aim of identifying both fresh and archaeological specimens. Here, we present for the first time a highly accurate varietal classification tool, using an innovative and accessible approach to 3D seed scanning. The classification methodology is machine-learning-based, uses the complete set of 3D data obtained for each seed, and combines the Iterative Closest Point (ICP) registration algorithm with the Linear Discriminant Analysis (LDA) technique. This methodology achieved classification accuracies of ca. 90-99% when trained on fresh seeds and tested on unknown fresh seeds. Moreover, the classification of charred seeds reached up to 100% accuracy when trained on charred seeds. Based on this approach, our long-term aim is to develop a computerized classification tool for identifying grape and possibly other species and varieties. Such a tool can significantly improve the fields of archaeobotany as well as general taxonomy.
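LDA is the classifier named in both versions of this work. The sketch below is a minimal NumPy version of LDA used as a classifier: per-class means, a pooled within-class covariance, and linear discriminant scores. It is not the authors' code; the class name is hypothetical and the toy two-dimensional features and variety labels are invented. In a seed study the rows of X would be some per-seed feature vector (for example ICP-derived distances), which is an assumption rather than something the abstract specifies.

```python
import numpy as np


class SimpleLDA:
    """Linear Discriminant Analysis as a Gaussian classifier with a shared covariance."""

    def fit(self, X, y):
        X = np.asarray(X, float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)                       # sorted class labels
        n, d = X.shape
        k = len(self.classes_)
        self.means_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        self.priors_ = np.array([np.mean(y == c) for c in self.classes_])
        # Pooled within-class covariance; a small ridge keeps it invertible
        centred = X - self.means_[np.searchsorted(self.classes_, y)]
        cov = centred.T @ centred / (n - k) + 1e-6 * np.eye(d)
        # Row j of coef_ is inv(cov) @ mean_j
        self.coef_ = np.linalg.solve(cov, self.means_.T).T
        self.intercept_ = -0.5 * np.sum(self.coef_ * self.means_, axis=1) + np.log(self.priors_)
        return self

    def decision_function(self, X):
        """Linear discriminant score of every sample for every class."""
        return np.asarray(X, float) @ self.coef_.T + self.intercept_

    def predict(self, X):
        return self.classes_[np.argmax(self.decision_function(X), axis=1)]
```

The ridge term is a pragmatic choice for nearly collinear features; library implementations such as scikit-learn's `LinearDiscriminantAnalysis` offer shrinkage options for the same purpose.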


2021, Vol 11
Authors: Guyu Dai, Xiangbin Zhang, Wenjie Liu, Zhibin Li, Guangyu Wang, ...

Purpose: To find a suitable method for analyzing electronic portal imaging device (EPID) transmission fluence maps to identify position errors during in vivo dose monitoring of patients with Graves’ ophthalmopathy (GO).
Methods: Position errors combining 0-, 2-, and 4-mm shifts in the left-right (LR), anterior-posterior (AP), and superior-inferior (SI) directions were simulated in the delivery of 40 GO patient radiotherapy plans to a human head phantom, and EPID transmission fluence maps were acquired. Dose difference (DD) and structural similarity (SSIM) maps were calculated to quantify changes in the fluence maps. Three types of machine learning (ML) models, taking as inputs radiomics features of the DD maps (ML 1 models), of the SSIM maps (ML 2 models), or of both (ML 3 models), were used to perform three types of position error classification: a binary classification of the isocenter error (type 1), three binary classifications of LR, SI, and AP direction errors (type 2), and an eight-class classification of the combined LR, SI, and AP direction errors (type 3). A convolutional neural network (CNN) was also used to classify position errors, with the DD and SSIM maps as input.
Results: The best-performing ML 1 model was XGBoost, which achieved accuracies of 0.889, 0.755, 0.778, 0.833, and 0.532 in the type 1, type 2-LR, type 2-AP, type 2-SI, and type 3 classifications, respectively. The best ML 2 model was also XGBoost, with accuracies of 0.856, 0.731, 0.736, 0.949, and 0.491, respectively. The best ML 3 model was the linear discriminant classifier (LDC), with accuracies of 0.903, 0.792, 0.870, 0.931, and 0.671, respectively. The CNN achieved accuracies of 0.925, 0.833, 0.875, 0.949, and 0.689, respectively.
Conclusion: ML models and a CNN using combined DD and SSIM maps can analyze EPID transmission fluence maps to identify position errors in the treatment of GO patients. Further studies with larger sample sizes are needed to improve the accuracy of the CNN.
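The Methods quantify fluence-map changes with dose difference (DD) and structural similarity (SSIM) maps. A minimal sketch of both, assuming the measured and reference EPID maps are 2D NumPy arrays on the same grid. DD is expressed here as a percentage of the reference maximum, a normalisation chosen for illustration. SSIM follows the standard Wang et al. formula with the usual constants K1 = 0.01 and K2 = 0.03, but uses a uniform 7×7 window rather than a Gaussian one. The study's actual normalisation, window, and radiomics feature extraction are not stated in the abstract.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def dose_difference_map(measured, reference) -> np.ndarray:
    """Pixel-wise dose difference as a percentage of the reference maximum (assumed normalisation)."""
    measured = np.asarray(measured, float)
    reference = np.asarray(reference, float)
    return (measured - reference) / reference.max() * 100.0


def _local_mean(img: np.ndarray, win: int) -> np.ndarray:
    """Mean over every win x win window (valid region only, no padding)."""
    return sliding_window_view(img, (win, win)).mean(axis=(-2, -1))


def ssim_map(x, y, win: int = 7, k1: float = 0.01, k2: float = 0.03) -> np.ndarray:
    """Local SSIM map with a uniform window and the standard SSIM constants."""
    x = np.asarray(x, float)
    y = np.asarray(y, float)
    data_range = max(x.max(), y.max()) - min(x.min(), y.min())
    c1, c2 = (k1 * data_range) ** 2, (k2 * data_range) ** 2
    mu_x, mu_y = _local_mean(x, win), _local_mean(y, win)
    var_x = _local_mean(x * x, win) - mu_x ** 2
    var_y = _local_mean(y * y, win) - mu_y ** 2
    cov_xy = _local_mean(x * y, win) - mu_x * mu_y
    num = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)     # luminance x structure terms
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return num / den
```

Each resulting map would then be summarised by radiomics features or passed to a CNN; that stage is not shown.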


2020
Authors: Ahmed M. Moustafa, Paul J. Planet

Abstract
Background: Discrete classification of SARS-CoV-2 viral genotypes can identify emerging strains and detect geographic spread, viral diversity, and transmission events.
Methods: We developed a tool (GNUVID) that integrates whole-genome multilocus sequence typing with a supervised machine learning classifier based on random forests. We used GNUVID to assign sequence type (ST) profiles to each of 69,686 complete, high-quality SARS-CoV-2 genomes available from GISAID as of October 20th 2020. STs were then clustered into clonal complexes (CCs), and these were used to train a machine learning classifier. We used this tool to detect potential introduction and exportation events and to estimate effective viral diversity across locations and over time in 16 US states.
Results: GNUVID is a scalable tool for viral genotype classification (available at https://github.com/ahmedmagds/GNUVID) that can quickly process tens of thousands of genomes. Our ST/CC genotyping analysis uncovered dynamic local changes in ST/CC prevalence and diversity, with multiple replacement events in different states. We detected an average of 20.6 putative introductions and 7.5 exportations per state. Effective viral diversity dropped in all states as shelter-in-place and travel restrictions went into effect, and increased as restrictions were lifted. Interestingly, our analysis showed a correlation between effective diversity and the date on which state-wide mask mandates were imposed.
Conclusions: Our classification tool uncovered multiple introduction and exportation events, as well as waves of expansion and replacement of SARS-CoV-2 genotypes in different states. Combined with future genomic sampling, the GNUVID system could be used to track circulating viral diversity and identify emerging clones and hotspots.
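Whole-genome multilocus sequence typing labels every distinct sequence at a locus with an allele number, and every distinct combination of allele numbers (the allelic profile) with a sequence type. The pure-Python sketch below shows only that bookkeeping. The class name, locus names, and short sequences are hypothetical, and GNUVID's real allele calling, quality filtering, clonal-complex clustering, and random forest classifier are not reproduced.

```python
from typing import Dict, List, Tuple


class STAssigner:
    """Toy whole-genome MLST: allele numbers per locus, one ST per allelic profile."""

    def __init__(self, loci: List[str]):
        self.loci = loci
        # locus -> {allele sequence -> allele number}
        self.alleles: Dict[str, Dict[str, int]] = {locus: {} for locus in loci}
        # allelic profile (tuple of allele numbers) -> ST number
        self.profiles: Dict[Tuple[int, ...], int] = {}

    def _allele_number(self, locus: str, seq: str) -> int:
        table = self.alleles[locus]
        if seq not in table:
            table[seq] = len(table) + 1          # a new allele gets the next integer
        return table[seq]

    def assign(self, genes: Dict[str, str]) -> int:
        """Return the ST of one genome, given its sequence at every locus."""
        profile = tuple(self._allele_number(locus, genes[locus]) for locus in self.loci)
        if profile not in self.profiles:
            self.profiles[profile] = len(self.profiles) + 1   # a new profile gets a new ST
        return self.profiles[profile]
```

Because new alleles and profiles simply receive the next free integer, ST numbers in this sketch depend on the order in which genomes are processed; a shared reference database is what makes such numbers comparable across runs.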


2017, Vol 4 (1), pp. 56-74
Authors: Abinash Tripathy, Santanu Kumar Rath

Sentiment analysis helps to determine the hidden intention of the author of a text on any topic and provides an evaluation of the polarity of a document, which may be positive, negative, or neutral. The data used for sentiment analysis very often consist of feedback given by various specialists on a topic or product, so reviews can be categorized by polarity to gain a good understanding of the product. This article proposes an approach to classify review datasets into polarity groups using sentiment analysis. Four machine learning algorithms, namely Naive Bayes (NB), Support Vector Machine (SVM), Random Forest, and Linear Discriminant Analysis (LDA), are considered for the classification. The accuracy obtained by each algorithm is critically examined using different performance parameters on two different datasets.
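Naive Bayes is the first of the four algorithms listed. The sketch below is a plain multinomial Naive Bayes text classifier with add-alpha (Laplace) smoothing over lower-cased whitespace tokens, written in pure Python. The example reviews and labels are invented, and the article's preprocessing, feature extraction, and datasets are not reproduced.

```python
import math
from collections import Counter, defaultdict


class MultinomialNB:
    """Multinomial Naive Bayes over whitespace tokens with add-alpha smoothing."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)   # label -> token counts
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = doc.lower().split()
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.class_counts}
        return self

    def predict(self, doc: str):
        tokens = [t for t in doc.lower().split() if t in self.vocab]  # skip unseen words
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / n_docs)                 # log prior
            for t in tokens:                                  # add each token's log likelihood
                p = (self.word_counts[label][t] + self.alpha) / (self.total_words[label] + self.alpha * v)
                score += math.log(p)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Working in log space avoids underflow on long reviews; words never seen in training are ignored rather than assigned a probability.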


2020, pp. 143-163
Authors: Abinash Tripathy, Santanu Kumar Rath

Sentiment analysis helps to determine the hidden intention of the author of a text on any topic and provides an evaluation of the polarity of a document, which may be positive, negative, or neutral. The data used for sentiment analysis very often consist of feedback given by various specialists on a topic or product, so reviews can be categorized by polarity to gain a good understanding of the product. This article proposes an approach to classify review datasets into polarity groups using sentiment analysis. Four machine learning algorithms, namely Naive Bayes (NB), Support Vector Machine (SVM), Random Forest, and Linear Discriminant Analysis (LDA), are considered for the classification. The accuracy obtained by each algorithm is critically examined using different performance parameters on two different datasets.


2014
Authors: Gokmen Zararsiz, Dincer Goksuluk, Selcuk Korkmaz, Vahap Eldem, Izzet Parug Duru, ...

Background: RNA sequencing (RNA-Seq) is a powerful technique for transcriptome profiling that uses the capabilities of next-generation sequencing (NGS) technologies. Recent advances in NGS make it possible to measure the expression levels of tens to thousands of transcripts simultaneously. Using such information to develop expression-based classification algorithms is an emerging and powerful approach for diagnosis, disease classification, and monitoring at the molecular level, as well as for identifying potential disease markers. Here, we present bagging support vector machines (bagSVM), a machine learning approach based on bagged ensembles of support vector machines (SVM), for the classification of RNA-Seq data. bagSVM draws bootstrap samples, trains a separate SVM on each, and then combines the predictions of the individual SVM models by majority voting.
Results: We demonstrate the performance of bagSVM on simulated and real datasets. Simulated datasets are generated from a negative binomial distribution under different scenarios, and real datasets are obtained from publicly available resources. DESeq normalization and a variance stabilizing transformation (vst) were applied to all datasets. We compared the results with several classifiers, including Poisson linear discriminant analysis (PLDA), a single SVM, classification and regression trees (CART), and random forests (RF). On slightly overdispersed data, all methods except CART performed well. PLDA appeared to perform best, and RF second best, on very slightly and substantially overdispersed datasets. As the data became more dispersed, bagSVM turned out to be the best classifier. Overall, bagSVM and PLDA had the highest accuracies.
Conclusions: According to our results, the bagSVM algorithm after vst can be a good choice of classifier for RNA-Seq datasets, particularly overdispersed ones. Thus, we recommend that researchers use bagSVM to classify RNA-Seq data, while PLDA should be the method of choice for slightly and moderately overdispersed datasets. An R/Bioconductor package, MLSeq, with a vignette is freely available at http://www.bioconductor.org/packages/2.14/bioc/html/MLSeq.html
Keywords: Bagging, machine learning, RNA-Seq classification, support vector machines, transcriptomics
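The bagging scheme described in the abstract (bootstrap samples, one SVM per sample, majority voting) can be sketched in NumPy. This is an assumption-laden stand-in: MLSeq itself is an R/Bioconductor package, whereas here each base learner is a linear SVM trained with a Pegasos-style sub-gradient method on labels in {-1, +1}, and the input matrix is assumed to be already DESeq-normalised and vst-transformed. Function names and parameter values are illustrative only.

```python
import numpy as np


def train_linear_svm(X, y, lam: float = 0.01, epochs: int = 20, rng=None) -> np.ndarray:
    """Pegasos-style sub-gradient training of a linear SVM; labels must be -1 or +1."""
    rng = np.random.default_rng() if rng is None else rng
    n, d = X.shape
    Xb = np.hstack([X, np.ones((n, 1))])      # constant column acts as the bias term
    w = np.zeros(d + 1)
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            t += 1
            eta = 1.0 / (lam * t)                # decreasing step size
            violated = y[i] * (Xb[i] @ w) < 1    # is the hinge loss active for this sample?
            w *= 1.0 - eta * lam                 # shrinkage from the L2 penalty
            if violated:
                w += eta * y[i] * Xb[i]
    return w


def bag_svm(X, y, n_models: int = 11, seed: int = 0, **svm_kwargs):
    """Train one SVM per bootstrap sample (drawn with replacement)."""
    rng = np.random.default_rng(seed)
    n = len(X)
    models = []
    for _ in range(n_models):
        idx = rng.integers(0, n, size=n)
        models.append(train_linear_svm(X[idx], y[idx], rng=rng, **svm_kwargs))
    return models


def predict_bag(models, X) -> np.ndarray:
    """Majority vote of the individual SVM predictions."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    votes = np.stack([np.where(Xb @ w >= 0, 1, -1) for w in models])  # (n_models, n_samples)
    return np.where(votes.sum(axis=0) > 0, 1, -1)
```

An odd number of models avoids tied votes. For simplicity the bias is folded into the weight vector and is therefore regularised, which a production SVM would usually avoid.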


Authors: Sandy Cruz Lauguico, Ronnie II Sabino Concepcion, Jonnel Dorado Alejandrino, Rogelio Ruzcko Tobias, Elmer Pamisa Dadios

Classification of lettuce life or growth stages is an effective tool for measuring the performance of an aquaponics system. It indicates the balance of water nutrients, the adequacy of temperature and lighting, other environmental factors, and the system's productivity in sustaining cultivars. This paper proposes a classification of the life stages of lettuce planted in an aquaponics system. The classification was done using texture features of the leaves derived with machine vision algorithms. The attributes underwent three different feature selection processes, namely Univariate Selection (US), Recursive Feature Elimination (RFE), and Feature Importance (FI), to determine the four most significant of the original eight attributes. The selected features were used to train four estimators: Decision Tree Classifier (DTC), Gaussian Naïve Bayes (GNB), Stochastic Gradient Descent (SGD), and Linear Discriminant Analysis (LDA). The DTC and SGD models were then optimized, as they have tunable hyperparameters. A comparative analysis of the Machine Learning (ML) algorithms was conducted to identify the best-performing model for this application. US and FI yielded the best features, as both selected the same top four; with them, the best model was the DTC estimator optimized with max depth set to 5, criterion set to 'Gini', and splitter set to 'Best', which reached a cross-validation accuracy of 87.92%. Considering consistency with hold-out validation, however, LDA outperformed the optimized DTC despite its lower accuracy of 86.67%, because it generalized better to the test data when classifying lettuce growth stages.
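The abstract does not say which score drives Univariate Selection. A common choice, and the default score function of scikit-learn's `SelectKBest`, is the one-way ANOVA F statistic, so the NumPy sketch below assumes it: each of the eight texture features is scored independently against the growth-stage labels and the top four are kept. The function names are hypothetical and any example data are synthetic.

```python
import numpy as np


def anova_f_scores(X, y) -> np.ndarray:
    """One-way ANOVA F statistic of each feature (column) against the class labels."""
    X = np.asarray(X, float)
    y = np.asarray(y)
    classes = np.unique(y)
    n, k = len(y), len(classes)
    grand_mean = X.mean(axis=0)
    ss_between = np.zeros(X.shape[1])
    ss_within = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[y == c]
        mean_c = Xc.mean(axis=0)
        ss_between += len(Xc) * (mean_c - grand_mean) ** 2   # spread of class means
        ss_within += ((Xc - mean_c) ** 2).sum(axis=0)        # spread inside each class
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def select_k_best(X, y, k: int = 4):
    """Indices (ascending) of the k features with the highest F scores, plus all scores."""
    scores = anova_f_scores(X, y)
    top = np.argsort(scores)[::-1][:k]
    return np.sort(top), scores
```

Because each feature is scored in isolation, univariate selection can keep redundant features; RFE and model-based importance, the other two methods named in the abstract, evaluate features jointly.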


Authors: Hicham Riri, Mohammed Ed-Dhahraouy, Abdelmajid Elmoutaouakkil, Abderrahim Beni-Hssane, Farid Bourzgui

The purpose of this study is to investigate computer vision and machine learning methods for classifying orthodontic images, in order to provide orthodontists with a multi-class classification solution for patients' images that can be used to evaluate the evolution of their treatment. To this end, we propose three algorithms that classify orthodontic images using extracted features, such as facial features and skin colour in the YCbCr colour space, assigned to the nodes of a decision tree: an algorithm for intra-oral images, an algorithm for mould images, and an algorithm for extra-oral images. We then compared our method with an approach that uses the Local Binary Pattern (LBP) algorithm to extract textural features from images, applies principal component analysis (PCA) to reduce redundant parameters, and classifies the LBP features with six classifiers: Quadratic Support Vector Machine (SVM), Cubic SVM, Radial Basis Function SVM, Cosine K-Nearest Neighbours (KNN), Euclidean KNN, and Linear Discriminant Analysis (LDA). The presented algorithms were evaluated on a dataset of images from 98 different patients, and the experimental results demonstrate that the proposed method performs well, with higher accuracy than these machine learning classifiers, of which LDA achieved an accuracy of 84.5%.
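For the LBP comparison, a minimal sketch of the basic operator: each interior pixel of a grayscale image receives an 8-bit code, one bit per neighbour in the 3×3 ring that is at least as bright as the centre, and the normalised 256-bin histogram of codes serves as the texture feature vector. The radius, neighbour count, and any uniform-pattern mapping the authors used are not given in the abstract, so this NumPy version is illustrative only; PCA and the six classifiers would follow it.

```python
import numpy as np

# (dy, dx) offsets of the 8 neighbours, clockwise from the top-left; list index = bit position
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_codes(img) -> np.ndarray:
    """Basic radius-1, 8-neighbour LBP code for every interior pixel."""
    img = np.asarray(img, float)
    h, w = img.shape
    centre = img[1:-1, 1:-1]
    codes = np.zeros((h - 2, w - 2), dtype=np.uint8)
    for bit, (dy, dx) in enumerate(OFFSETS):
        neighbour = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]   # shifted view aligned with centre
        codes |= (neighbour >= centre).astype(np.uint8) << np.uint8(bit)
    return codes


def lbp_histogram(img) -> np.ndarray:
    """Normalised 256-bin histogram of LBP codes (the texture feature vector)."""
    codes = lbp_codes(img)
    hist = np.bincount(codes.ravel(), minlength=256).astype(float)
    return hist / hist.sum()
```

Comparing each neighbour with the centre makes the codes invariant to monotonic brightness changes, which is one reason LBP histograms are popular texture descriptors.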


Authors: Matthew J Valetich, Charles Le Losq, Richard J Arculus, Susumu Umino, John Mavrogenes

Abstract: Much of the boninite magmatism in the Izu-Bonin-Mariana (IBM) arc is preserved as evolved boninite series compositions, in which extensive fractional crystallisation of pyroxene and spinel has obscured the diagnostic geochemical indicators of boninite parentage, such as high-Mg and low-Ti at intermediate silica contents. As a result, the usual geochemical discriminants used to classify the broad range of parental boninites are inapplicable to such highly fractionated melts. These issues are compounded by the mixing of demonstrably different whole-rock and glass analyses in classification schemes and in the petrological interpretations based on them. Whole-rock compositions are compromised by the entrainment of variable proportions of crystalline phases, resulting in inconsistent differences from the corresponding in-situ glass analyses, which arguably better reflect prior melt compositions. To circumvent these issues, we present a robust method for classifying highly fractionated boninite series glasses. This new classification leverages the analysis of trace elements, which are much more sensitive to evolutionary processes than major elements, and benefits from the use of unsupervised machine learning as a classification tool. The results show that the most fractionated boninite series melts preserve geochemical indicators of their parentage, and highlight the pitfalls of interpreting whole-rock and glass analyses interchangeably.
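The abstract does not name the unsupervised algorithm. Purely as a stand-in, the sketch below clusters glass analyses with k-means (Lloyd's algorithm) after log-transforming and standardising the trace-element concentrations, both of which are assumptions. The function names are hypothetical and the concentrations in any example are synthetic.

```python
import numpy as np


def kmeans(X: np.ndarray, k: int, n_iter: int = 100, seed: int = 0):
    """Lloyd's k-means; returns (labels, centres)."""
    rng = np.random.default_rng(seed)
    centres = X[rng.choice(len(X), size=k, replace=False)]   # random initial centres
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Move each centre to the mean of its members; keep empty clusters in place
        new = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
            for j in range(k)
        ])
        if np.allclose(new, centres):
            break
        centres = new
    d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1), centres


def cluster_glasses(concentrations: np.ndarray, k: int, seed: int = 0):
    """Hypothetical pipeline: log10 transform, z-score per element, then k-means."""
    logc = np.log10(concentrations)
    z = (logc - logc.mean(axis=0)) / logc.std(axis=0)
    return kmeans(z, k, seed=seed)
```

Log-transforming first keeps elements that vary over orders of magnitude from dominating the Euclidean distances; k-means still requires the number of groups to be chosen in advance.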

