Discriminant Analysis and Other Linear Classification Models

2013 ◽  
pp. 275-328 ◽  
Author(s):  
Max Kuhn ◽  
Kjell Johnson
2004 ◽  
Vol 1 (1) ◽  
pp. 143-161
Author(s):  
Maja Pohar ◽  
Mateja Blas ◽  
Sandra Turk

Two of the most widely used statistical methods for analyzing categorical outcome variables are linear discriminant analysis and logistic regression. While both are appropriate for the development of linear classification models, linear discriminant analysis makes more assumptions about the underlying data. Hence, it is assumed that logistic regression is the more flexible and more robust method in case of violations of these assumptions. In this paper we consider the problem of choosing between the two methods, and set some guidelines for proper choice. The comparison between the methods is based on several measures of predictive accuracy. The performance of the methods is studied by simulations. We start with an example where all the assumptions of the linear discriminant analysis are satisfied and observe the impact of changes regarding the sample size, covariance matrix, Mahalanobis distance and direction of distance between group means. Next, we compare the robustness of the methods towards categorisation and non-normality of explanatory variables in a closely controlled way. We show that the results of LDA and LR are close whenever the normality assumptions are not too badly violated, and set some guidelines for recognizing these situations. We discuss the inappropriateness of LDA in all other cases.
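
The kind of simulation described in the abstract above can be sketched in a few lines. This is a minimal illustration only, not the authors' design: it assumes a single normally distributed predictor with equal variance in both groups (so the LDA assumptions hold), a Mahalanobis distance of 2 between the group means, and hypothetical function names (`simulate`, `fit_lda`, `fit_logistic`, `accuracy`). Both methods estimate a linear log-odds function, so under these assumptions their test accuracies should be close.

```python
import math
import random


def simulate(n, mu0=0.0, mu1=2.0, sigma=1.0, seed=0):
    """Draw n points per group from two equal-variance normal distributions."""
    rng = random.Random(seed)
    xs = [rng.gauss(mu0, sigma) for _ in range(n)]
    xs += [rng.gauss(mu1, sigma) for _ in range(n)]
    ys = [0] * n + [1] * n
    return xs, ys


def fit_lda(xs, ys):
    """Closed-form one-dimensional LDA; returns (intercept, slope) of the log-odds."""
    x0 = [x for x, y in zip(xs, ys) if y == 0]
    x1 = [x for x, y in zip(xs, ys) if y == 1]
    m0, m1 = sum(x0) / len(x0), sum(x1) / len(x1)
    # Pooled within-group variance.
    ss = sum((x - m0) ** 2 for x in x0) + sum((x - m1) ** 2 for x in x1)
    var = ss / (len(xs) - 2)
    slope = (m1 - m0) / var
    intercept = -(m1 ** 2 - m0 ** 2) / (2 * var) + math.log(len(x1) / len(x0))
    return intercept, slope


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(xs, ys, iterations=25):
    """Newton-Raphson for logistic regression with one predictor and an intercept."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p          # gradient of the log-likelihood
            g1 += (y - p) * x
            h00 += w             # entries of X^T W X
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def accuracy(model, xs, ys):
    """Share of points whose sign of the linear log-odds matches the label."""
    intercept, slope = model
    hits = sum((intercept + slope * x > 0) == (y == 1) for x, y in zip(xs, ys))
    return hits / len(xs)


train_x, train_y = simulate(500, seed=1)
test_x, test_y = simulate(1000, seed=2)
lda = fit_lda(train_x, train_y)
lr = fit_logistic(train_x, train_y)
acc_lda = accuracy(lda, test_x, test_y)
acc_lr = accuracy(lr, test_x, test_y)
print(f"LDA accuracy: {acc_lda:.3f}  LR accuracy: {acc_lr:.3f}")
```

Changing `sigma` per group, or replacing `rng.gauss` with a skewed or categorised predictor, would be one way to probe the violations of normality that the abstract discusses.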


2019 ◽  
Author(s):  
Nico Curti ◽  
Enrico Giampieri ◽  
Giuseppe Levi ◽  
Gastone Castellani ◽  
Daniel Remondini

The objective of many high-throughput “omics” studies is to obtain a relatively low-dimensional set of observables (a signature) for sample classification purposes (diagnosis, prognosis, stratification). We propose DNetPRO, Discriminant Analysis with Network PROcessing, a supervised signature identification method based on a bottom-up combinatorial approach that exploits the discriminant power of all variable pairs. The algorithm is easily scalable, allowing efficient computation even for a high number of observables (10^4 to 10^5). We show applications on real high-throughput genomic datasets in which our method outperforms existing results, or compares to them but with a smaller number of selected variables. Moreover, the linearity of DNetPRO allows a clearer interpretation of the obtained signatures in comparison to non-linear classification models.


Foods ◽  
2021 ◽  
Vol 10 (11) ◽  
pp. 2723
Author(s):  
Evgenia D. Spyrelli ◽  
Christina Papachristou ◽  
George-John E. Nychas ◽  
Efstathios Z. Panagou

Fourier transform infrared spectroscopy (FT-IR) and multispectral imaging (MSI) were evaluated for the prediction of the microbiological quality of poultry meat via regression and classification models. Chicken thigh fillets (n = 402) were subjected to spoilage experiments at eight isothermal and two dynamic temperature profiles. Samples were analyzed microbiologically (total viable counts (TVCs) and Pseudomonas spp.), while MSI and FT-IR spectra were acquired simultaneously. The organoleptic quality of the samples was also evaluated by a sensory panel, establishing a TVC spoilage threshold at 6.99 log CFU/cm². Partial least squares regression (PLS-R) models were employed in the assessment of TVCs and Pseudomonas spp. counts on the chicken surface. Furthermore, classification models (linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), support vector machines (SVMs), and quadratic support vector machines (QSVMs)) were developed to discriminate the samples into two quality classes (fresh vs. spoiled). PLS-R models developed on MSI data predicted TVCs and Pseudomonas spp. counts satisfactorily, with root mean squared error (RMSE) values of 0.987 and 1.215 log CFU/cm², respectively. The SVM model coupled to MSI data exhibited the highest performance, with an overall accuracy of 94.4%, while in the case of FT-IR, improved classification was obtained with the QDA model (overall accuracy 71.4%). These results confirm the efficacy of MSI and FT-IR as rapid methods to assess the quality of poultry products.
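
The two performance measures quoted in the abstract above, RMSE and overall accuracy, can be computed as in the following sketch. The data values are hypothetical and only illustrate the arithmetic. Two further points are assumptions of this sketch, not statements from the study: treating a sample exactly at the 6.99 threshold as spoiled, and deriving predicted classes by thresholding predicted TVCs (the study's classifiers were built on the spectra themselves).

```python
import math

SPOILAGE_THRESHOLD = 6.99  # log CFU/cm², the TVC threshold quoted in the abstract


def rmse(observed, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


def quality_class(tvc):
    """Label a sample from its TVC; '>=' at the threshold is an assumption."""
    return "spoiled" if tvc >= SPOILAGE_THRESHOLD else "fresh"


def overall_accuracy(true_labels, predicted_labels):
    """Fraction of samples whose predicted class matches the true class."""
    hits = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return hits / len(true_labels)


# Hypothetical TVC values (log CFU/cm²) for illustration; not data from the study.
observed = [4.2, 5.8, 7.3, 8.1]
predicted = [4.9, 6.1, 6.8, 8.4]

true_labels = [quality_class(v) for v in observed]
predicted_labels = [quality_class(v) for v in predicted]

print(f"RMSE: {rmse(observed, predicted):.3f} log CFU/cm²")
print(f"Overall accuracy: {overall_accuracy(true_labels, predicted_labels):.2f}")
```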


1989 ◽  
Vol 67 (2) ◽  
pp. 594-599 ◽  
Author(s):  
Bernard R. Baum ◽  
L. Grant Bailey

That Hordeum capense, a South African species, and H. secalinum, a mainly European species, are conspecific, has been the prevailing view for the last 80 years because of a lack of distinguishing markers. In the present paper, morphological separability is demonstrated by means of cluster analysis, classificatory discriminant analysis, logistic discrimination, and canonical discriminant analysis. The performance of the linear classification functions is evaluated by the bootstrap and discussed. Lodicules and epiblasts were found to be good distinguishing markers. The nomenclatural type of H. secalinum has been designated as lectotype instead of the previously designated neotype.


Metabolites ◽  
2020 ◽  
Vol 10 (7) ◽  
pp. 278 ◽  
Author(s):  
Marta Bevilacqua ◽  
Rasmus Bro

In this paper, we discuss the validity of using score plots of component models such as partial least squares regression, especially when these models are used for building classification models, and of models derived from partial least squares regression for discriminant analysis (PLS-DA). Using examples and simulations, it is shown that the currently accepted practice of showing score plots from calibration models may give misleading interpretations. It is suggested and shown that the problem can be solved by replacing the currently used calibrated score plots with cross-validated score plots.
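
The difference between calibrated and cross-validated scores can be illustrated with a small simulation. The sketch below is a simplified stand-in, not the authors' procedure: it uses only the first PLS weight vector (proportional to the centered X transposed times the centered y), leave-one-out cross-validation, and pure-noise data with many more variables than samples. All function names and the dimensions `n` and `p` are hypothetical. Because the data carry no class information, any separation in the calibrated scores is an artifact of fitting.

```python
import math
import random


def first_pls_weight(X, y):
    """Unit-norm first PLS weight vector and the column means used for centering."""
    n, p = len(X), len(X[0])
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    w = [
        sum((X[i][j] - col_means[j]) * (y[i] - y_mean) for i in range(n))
        for j in range(p)
    ]
    norm = math.sqrt(sum(v * v for v in w))
    return [v / norm for v in w], col_means


def score(x, w, col_means):
    """Project one centered sample onto the weight vector."""
    return sum((xj - mj) * wj for xj, mj, wj in zip(x, col_means, w))


def calibrated_scores(X, y):
    """Scores of the same samples that were used to fit the model."""
    w, means = first_pls_weight(X, y)
    return [score(x, w, means) for x in X]


def cross_validated_scores(X, y):
    """Leave-one-out scores: each sample is scored by a model fitted without it."""
    scores = []
    for i in range(len(X)):
        w, means = first_pls_weight(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        scores.append(score(X[i], w, means))
    return scores


def class_gap(scores, y):
    """Difference between the mean scores of class 1 and class 0."""
    s1 = [s for s, c in zip(scores, y) if c == 1]
    s0 = [s for s, c in zip(scores, y) if c == 0]
    return sum(s1) / len(s1) - sum(s0) / len(s0)


rng = random.Random(0)
n, p = 20, 400  # few samples, many variables
X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]  # pure noise
y = [0] * (n // 2) + [1] * (n // 2)  # labels unrelated to X

cal_gap = class_gap(calibrated_scores(X, y), y)
cv_gap = class_gap(cross_validated_scores(X, y), y)
print(f"calibrated class gap: {cal_gap:.2f}")
print(f"cross-validated class gap: {cv_gap:.2f}")
```

In this setting the calibrated scores separate the two classes clearly even though the labels are unrelated to the data, while the cross-validated gap stays much smaller, which is the kind of misleading impression the abstract warns about.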


Author(s):  
Brian Carnahan ◽  
Gérard Meyer ◽  
Lois-Ann Kuntz

Multivariate classification models play an increasingly important role in human factors research. In the past, these models have been based primarily on discriminant analysis and logistic regression. Models developed from machine learning research offer the human factors professional a viable alternative to these traditional statistical classification methods. To illustrate this point, two machine learning approaches - genetic programming and decision tree induction - were used to construct classification models designed to predict whether or not a student truck driver would pass his or her commercial driver license (CDL) examination. The models were developed and validated using the curriculum scores and CDL exam performances of 37 student truck drivers who had completed a 320-hr driver training course. Results indicated that the machine learning classification models were superior to discriminant analysis and logistic regression in terms of predictive accuracy. Actual or potential applications of this research include the creation of models that more accurately predict human performance outcomes.


Molecules ◽  
2019 ◽  
Vol 24 (8) ◽  
pp. 1550 ◽  
Author(s):  
Liang Xu ◽  
Wen Sun ◽  
Cui Wu ◽  
Yucui Ma ◽  
Zhimao Chao

Near infrared (NIR) spectroscopy with chemometric techniques was applied to discriminate the geographical origins of crude drugs (i.e., dried ripe fruits of Trichosanthes kirilowii) and prepared slices of Trichosanthis Fructus in this work. The crude drug samples (120 batches) from four growing regions (i.e., Shandong, Shanxi, Hebei, and Henan Provinces) were collected, dried, and used, and the prepared slice samples (30 batches) were purchased from different drug stores. The raw NIR spectra were acquired and preprocessed with multiplicative scatter correction (MSC). Principal component analysis (PCA) was used to extract relevant information from the spectral data and gave visible cluster trends. Four different classification models, namely K-nearest neighbor (KNN), soft independent modeling of class analogy (SIMCA), partial least squares-discriminant analysis (PLS-DA), and support vector machine-discriminant analysis (SVM-DA), were constructed and their performances were compared. The corresponding classification model parameters were optimized by cross-validation (CV). Among the four classification models, the SVM-DA model was superior to the other models, with a classification accuracy of up to 100% for both the calibration set and the prediction set. The optimal SVM-DA model was achieved when C = 100, γ = 0.00316, and the number of principal components (PCs) = 6. The PLS-DA model had a classification accuracy of 95% for the calibration set and 98% for the prediction set. The KNN model had a classification accuracy of 92% for the calibration set and 94% for the prediction set. The non-linear classification method was superior to the linear ones. Generally, the results demonstrated that the crude drugs from different geographical origins, and the crude drugs and prepared slices of Trichosanthis Fructus, could be distinguished rapidly, nondestructively, and reliably by NIR spectroscopy coupled with the SVM-DA model.
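
The abstract above reports γ = 0.00316 but does not name the kernel. Assuming the commonly used Gaussian (RBF) kernel, for which γ scales the squared distance between two samples, the role of the parameter can be sketched as follows. The function name `rbf_kernel` and the two six-component score vectors are hypothetical and chosen only for illustration.

```python
import math


def rbf_kernel(x, z, gamma):
    """Gaussian (RBF) kernel: exp(-gamma * squared Euclidean distance)."""
    if len(x) != len(z):
        raise ValueError("vectors must have the same length")
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


GAMMA = 0.00316  # value reported in the abstract; the RBF kernel itself is an assumption

# Hypothetical score vectors on six principal components; not data from the study.
a = [1.0, 0.0, 2.0, -1.0, 0.5, 0.0]
b = [0.0, 1.0, 2.0, 1.0, 0.5, 3.0]

print(f"K(a, a) = {rbf_kernel(a, a, GAMMA):.4f}")
print(f"K(a, b) = {rbf_kernel(a, b, GAMMA):.4f}")
```

With a small γ the kernel value decays slowly with distance, so distant samples still influence each other and the resulting decision boundary tends to be smooth.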

