Classification of Homo sapiens gene behavior using linear discriminant analysis fused with minimum entropy mapping

RNA sequencing (RNA-Seq) is a powerful technique for the gene-expression profiling of organisms that uses the capabilities of next-generation sequencing technologies. Developing gene-expression-based classification algorithms is an emerging and powerful approach to diagnosis, disease classification and monitoring at the molecular level, and to the identification of potential disease markers. Most statistical methods proposed for classifying gene-expression data either assume a continuous scale (e.g., microarray data) or require a normal distribution. Hence, these methods cannot be applied directly to RNA-Seq count data, which violate both the data-structure and the distributional assumptions. However, these algorithms can be applied to RNA-Seq data with appropriate modifications. One way is to develop count-based classifiers, such as Poisson linear discriminant analysis (PLDA) and negative binomial linear discriminant analysis (NBLDA). Another way is to transform the data so that they more closely resemble microarray data and then apply microarray-based classifiers. In this study, we compared several classifiers, including PLDA with and without power transformation, NBLDA, a single support vector machine (SVM), bagging SVM (bagSVM), classification and regression trees (CART), and random forests (RF). We also examined the effect of several parameters on model performance, such as overdispersion, sample size, number of genes, number of classes, differential-expression rate, and the transformation method. A comprehensive simulation study was conducted, and its results were compared with those obtained on two miRNA and two mRNA experimental datasets. The results revealed that increasing the sample size, differential-expression rate and number of genes, and decreasing the dispersion parameter and number of groups, lead to an increase in classification accuracy. As in differential-expression studies, the classification of RNA-Seq data requires careful attention to data overdispersion.
We conclude that the power-transformed PLDA, as a count-based classifier, and RF and SVM classifiers applied after a variance-stabilizing transformation (vst) or regularized logarithm (rlog), as microarray-based classifiers, may be good choices for classification. An R/Bioconductor package, MLSeq, is freely available at https://www.bioconductor.org/packages/release/bioc/html/MLSeq.html .
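
The count-based route described above can be illustrated with a small Poisson LDA classifier. The sketch below is a minimal NumPy version, not the MLSeq implementation: each count is modelled as Poisson with mean s_i * g_j * d_kj, where s_i is a total-count size factor, g_j the gene total and d_kj a class-specific multiplier shrunk by a constant `beta`. The class name `PoissonLDA`, the parameter names, and the use of a fixed, user-supplied power `alpha` (rather than one estimated from the data) are assumptions made for illustration.

```python
import numpy as np


def size_factors(X):
    """Total-count size factors: row total / grand total (they sum to 1)."""
    return X.sum(axis=1) / X.sum()


class PoissonLDA:
    """Minimal Poisson LDA sketch with an optional fixed power transform (hypothetical API)."""

    def __init__(self, alpha=1.0, beta=1.0):
        self.alpha = alpha  # assumed fixed power; x ** alpha is applied to all counts
        self.beta = beta    # shrinkage constant keeping every d_kj finite and positive

    def fit(self, X, y):
        X = np.asarray(X, dtype=float) ** self.alpha
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        s = size_factors(X)
        self.g_ = X.sum(axis=0)           # gene totals
        N = np.outer(s, self.g_)          # expected counts if there were no class effect
        # Class-specific multipliers d_kj = (observed + beta) / (expected + beta)
        self.d_ = np.array([
            (X[y == k].sum(axis=0) + self.beta) / (N[y == k].sum(axis=0) + self.beta)
            for k in self.classes_
        ])
        self.log_prior_ = np.log([np.mean(y == k) for k in self.classes_])
        self.total_ = X.sum()
        return self

    def decision_function(self, X):
        X = np.asarray(X, dtype=float) ** self.alpha
        s = X.sum(axis=1) / self.total_   # test size factors relative to the training total
        # Class-dependent part of the Poisson log-likelihood plus log prior
        return X @ np.log(self.d_).T - np.outer(s, self.d_ @ self.g_) + self.log_prior_

    def predict(self, X):
        return self.classes_[np.argmax(self.decision_function(X), axis=1)]
```

Setting `alpha` below 1 shrinks large counts, which is one simple way to bring overdispersed counts closer to Poisson behaviour before applying the model.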

Linear discriminant analysis based on gas chromatographic measurements for geographical prediction of USA medical domestic cannabis

Acta Chromatographica, 2020. DOI: 10.1556/1326.2020.00782

Author(s): Ramia Z. Al Bakain, Yahya S. Al-Degs, James V. Cizdziel, Mahmoud A. Elsohly

Keyword(s): Discriminant Analysis, Linear Discriminant Analysis, Principal Components, Supervised Classification, Geographical Origin, Active Components, Linear Discriminant, Sample Extraction, Analytical Range

Fifty-four domestically produced cannabis samples obtained from different USA states were quantitatively assayed by GC–FID for 22 active components: 15 terpenoids and 7 cannabinoids. The profiles of the selected compounds were used as inputs for grouping the samples by geographical origin and for building a geographical prediction model using Linear Discriminant Analysis (LDA). The proposed sample extraction and chromatographic separation were adequate for determining the 22 active ingredients over a wide analytical range of 5.0–1,000 µg/mL. Principal Component Analysis (PCA) of the GC profiles retained three significant variables for grouping (Δ9-THC, CBN, and CBC), and only modest discrimination of the samples by geographical origin was achieved. PCA separated many of the Oregon and Vermont samples, while the remaining samples showed mixed classification. Using LDA as a supervised classification method, excellent separation of the cannabis samples was attained, allowing new samples not included in the model to be classified. Using two principal components and LDA, the GC–FID profiles correctly predicted the geographical origin of 100% of Washington samples, 86% of both Oregon and Vermont samples, and 71% of Ohio samples.
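
The final model in this abstract (two principal components followed by LDA) can be sketched as a two-stage pipeline in NumPy: standardize the compound concentrations, project them onto the first two principal components, then apply a linear discriminant rule with a pooled covariance matrix. This is an illustrative reconstruction, not the authors' code; the function names are hypothetical, and standardizing each compound before PCA is an assumption, since the abstract does not say whether the data were scaled.

```python
import numpy as np


def fit_pca(X, n_components=2):
    """Standardize columns, then take the leading principal axes from an SVD."""
    mean, std = X.mean(axis=0), X.std(axis=0, ddof=1)
    std[std == 0] = 1.0  # guard against constant columns
    Z = (X - mean) / std
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return mean, std, Vt[:n_components].T  # loadings: (n_features, n_components)


def pca_transform(X, mean, std, W):
    """Project samples with the training means, scales and loadings."""
    return ((X - mean) / std) @ W


def fit_lda(T, y):
    """Class means, inverse pooled within-class covariance and log priors."""
    classes = np.unique(y)
    means = np.array([T[y == k].mean(axis=0) for k in classes])
    resid = T - means[np.searchsorted(classes, y)]
    cov = resid.T @ resid / (len(y) - len(classes))
    priors = np.array([np.mean(y == k) for k in classes])
    return classes, means, np.linalg.inv(cov), np.log(priors)


def lda_predict(T, classes, means, cov_inv, log_priors):
    """Assign each sample to the class with the largest linear discriminant score."""
    scores = (T @ cov_inv @ means.T
              - 0.5 * np.sum(means @ cov_inv * means, axis=1)
              + log_priors)
    return classes[np.argmax(scores, axis=1)]
```

Because the PCA loadings, means and scaling factors are estimated on the training samples only and then reused, the same projection is applied to samples that were not included in the model.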

Classification of Breast Cancer versus Normal Samples from Mass Spectrometry Profiles Using Linear Discriminant Analysis of Important Features Selected by Random Forest

Statistical Applications in Genetics and Molecular Biology, 2008, Vol 7 (2). DOI: 10.2202/1544-6115.1345

Cited By ~ 15

Author(s): Somnath Datta

Keyword(s): Breast Cancer, Mass Spectrometry, Discriminant Analysis, Random Forest, Linear Discriminant Analysis, Linear Discriminant

Classification of extra virgin olive oils according to their genetic variety using linear discriminant analysis of sterol profiles established by ultra-performance liquid chromatography with mass spectrometry detection

Food Research International, 2011, Vol 44 (1), pp. 103-108. DOI: 10.1016/j.foodres.2010.11.004

Cited By ~ 20

Author(s): M.J. Lerma-García, E.F. Simó-Alfonso, A. Méndez, J.L. Lliberia, J.M. Herrero-Martínez

Keyword(s): Mass Spectrometry, Liquid Chromatography, Discriminant Analysis, Linear Discriminant Analysis, Ultra Performance Liquid Chromatography, Mass Spectrometry Detection, Virgin Olive Oils, Linear Discriminant, Olive Oils

Recursive Feature Elimination Based on Linear Discriminant Analysis for Molecular Selection and Classification of Diseases

Intelligent Computing Theories and Technology - Lecture Notes in Computer Science, 2013, pp. 244-251. DOI: 10.1007/978-3-642-39482-9_28

Author(s): Edmundo Bonilla Huerta, Roberto Morales Caporal, Marco Antonio Arjona, José Crispín Hernández Hernández

Keyword(s): Discriminant Analysis, Linear Discriminant Analysis, Recursive Feature Elimination, Linear Discriminant, Classification Of Diseases, Molecular Selection
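
The title above names recursive feature elimination (RFE) driven by LDA. A generic two-class version of that idea, which may differ from the authors' procedure, repeatedly fits a Fisher LDA on standardized features and discards the feature with the smallest absolute weight until the requested number remains. The function names and the small `ridge` term (added so the within-class scatter matrix stays invertible when genes outnumber samples) are assumptions.

```python
import numpy as np


def lda_weights(X, y, ridge=1e-3):
    """Two-class Fisher LDA direction w = (S_w + ridge*I)^-1 (mu_1 - mu_0)."""
    classes = np.unique(y)
    X0, X1 = X[y == classes[0]], X[y == classes[1]]
    # Within-class scatter: sum of the two class scatter matrices
    Sw = np.atleast_2d(np.cov(X0, rowvar=False) * (len(X0) - 1)
                       + np.cov(X1, rowvar=False) * (len(X1) - 1))
    Sw = Sw + ridge * np.eye(X.shape[1])
    return np.linalg.solve(Sw, X1.mean(axis=0) - X0.mean(axis=0))


def rfe_lda(X, y, n_keep, ridge=1e-3):
    """Return the column indices that survive LDA-based recursive feature elimination."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Standardize so that weight magnitudes are comparable across features
    std = X.std(axis=0)
    std[std == 0] = 1.0
    Z = (X - X.mean(axis=0)) / std
    remaining = list(range(Z.shape[1]))
    while len(remaining) > n_keep:
        w = lda_weights(Z[:, remaining], y, ridge)
        # Drop the feature whose LDA weight has the smallest magnitude
        remaining.pop(int(np.argmin(np.abs(w))))
    return remaining
```

Removing one feature per iteration is the simplest schedule; for thousands of genes, dropping a fixed fraction per step is a common way to cut the number of refits.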

Determination of HPLC-UV Fingerprints of Spanish Paprika (Capsicum annuum L.) for Its Classification by Linear Discriminant Analysis

Sensors, 2018, Vol 18 (12), pp. 4479. DOI: 10.3390/s18124479

Cited By ~ 6

Author(s): Xavier Cetó, Núria Serrano, Miriam Aragó, Alejandro Gámez, Miquel Esteban, ...

Keyword(s): Discriminant Analysis, Linear Discriminant Analysis, Principal Component, Reversed Phase, Classification Rate, Phenolic Profile, Linear Discriminant, Sample Extraction

The development of a simple HPLC-UV method for evaluating the phenolic profile of Spanish paprika, and for discriminating samples on that basis, is reported herein. The approach combines C18 reversed-phase chromatography, which generates characteristic fingerprints, with linear discriminant analysis (LDA) for their classification. To this end, chromatographic conditions were optimized to separate the major phenolic compounds already identified in paprika. Paprika samples were extracted by sonication and centrifugation, and the extraction procedure and conditions were optimized to generate sufficiently discriminant fingerprints. Finally, the chromatograms were baseline corrected, compressed using the fast Fourier transform (FFT), and analyzed by principal component analysis (PCA) and LDA to classify the paprika samples. With the developed procedure, a total of 96 paprika samples were analyzed, achieving a classification rate of 100% for the test subset (n = 25).
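
The preprocessing chain described above (baseline correction, then FFT compression, before PCA and LDA) can be sketched in NumPy. The abstract does not specify the baseline algorithm or how many Fourier coefficients were kept, so the polynomial baseline, its `degree`, and `n_coeffs` below are assumptions, and all function names are hypothetical. The resulting feature matrix could then be passed to a PCA + LDA classifier.

```python
import numpy as np


def baseline_correct(chrom, degree=2):
    """Subtract a low-order polynomial trend (assumed stand-in for the baseline step)."""
    t = np.linspace(0.0, 1.0, chrom.size)  # rescaled time axis keeps the fit well conditioned
    coeffs = np.polyfit(t, chrom, degree)
    return chrom - np.polyval(coeffs, t)


def fft_compress(chrom, n_coeffs):
    """Keep the n_coeffs lowest-frequency rFFT coefficients as real-valued features."""
    spectrum = np.fft.rfft(chrom)[:n_coeffs]
    return np.concatenate([spectrum.real, spectrum.imag])


def fft_reconstruct(features, n_points):
    """Invert fft_compress, zero-filling the discarded high frequencies."""
    k = features.size // 2
    spectrum = np.zeros(n_points // 2 + 1, dtype=complex)
    spectrum[:k] = features[:k] + 1j * features[k:]
    return np.fft.irfft(spectrum, n=n_points)


def preprocess(chromatograms, n_coeffs=32, degree=2):
    """Baseline-correct and compress each chromatogram (one per row)."""
    return np.array([fft_compress(baseline_correct(c, degree), n_coeffs)
                     for c in chromatograms])
```

Keeping only low-frequency coefficients discards high-frequency noise and greatly reduces the number of variables; `fft_reconstruct` makes it possible to inspect how much of each chromatogram survives the compression.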

The Use of Satellite Information (MODIS/Aqua) for Phenological and Classification Analysis of Plant Communities

Forests, 2019, Vol 10 (7), pp. 561. DOI: 10.3390/f10070561

Cited By ~ 3

Author(s): Yulia Ivanova, Anton Kovalev, Oleg Yakubailik, Vlad Soukhovolsky

Keyword(s): Time Series, Discriminant Analysis, Linear Discriminant Analysis, Plant Communities, Vegetation Index, Normalized Difference Vegetation Index, Weather Conditions, Linear Discriminant, Canonical Correlations

Vegetation indices derived from remote sensing measurements are commonly used to describe and monitor vegetation. However, the same plant community can have a different NDVI (normalized difference vegetation index) depending on weather conditions, which complicates the classification of plant communities. The present study develops methods for classifying plant community types based on long-term NDVI data (MODIS/Aqua). The number of variables is reduced by introducing two integrated parameters of the NDVI seasonal series, which facilitates the classification of meadow, steppe, and forest plant communities in Siberia using linear discriminant analysis. The quality of classification using the markers that characterize NDVI dynamics during 2003–2017 ranges from 94% (forest vs. steppe) to 68% (meadow vs. forest). In addition to determining phenological markers, canonical correlations were calculated between the time series of the proposed markers and the time series of monthly average air temperatures. On this basis, each pixel with a definite plant composition can be characterized by only four canonical correlation coefficients over the entire period analyzed. By using canonical correlations between NDVI and weather parameters together with linear discriminant analysis, a highly accurate classification of the studied plant communities can be obtained.
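
Canonical correlation coefficients between two sets of time series, such as NDVI-derived markers and monthly air temperatures, can be computed from orthonormal bases of the two centered data blocks: they are the singular values of Q_x^T Q_y. The NumPy sketch below assumes each block is arranged as time points × variables, with more time points than variables and full column rank; the function name and this layout are assumptions, and the paper's exact variable set is not reproduced. The resulting per-pixel coefficient vectors could then serve as inputs to LDA.

```python
import numpy as np


def canonical_correlations(X, Y):
    """Canonical correlations between blocks X (n x p) and Y (n x q), descending order."""
    # Center each variable over time
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Orthonormal bases for the column spaces of the centered blocks
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    # Singular values of Qx^T Qy are the cosines of the principal angles
    rho = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.clip(rho, 0.0, 1.0)  # remove tiny floating-point overshoot above 1
```

The function returns min(p, q) coefficients, so the number of values per pixel is fixed by the smaller of the two variable sets.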