SHORT PROKARYOTIC DNA FRAGMENT BINNING USING A HIERARCHICAL CLASSIFIER BASED ON LINEAR DISCRIMINANT ANALYSIS AND PRINCIPAL COMPONENT ANALYSIS

2010 ◽  
Vol 08 (06) ◽  
pp. 995-1011 ◽  
Author(s):  
HAO ZHENG ◽  
HONGWEI WU

Metagenomics is an emerging field in which the power of genomic analysis is applied to an entire microbial community, bypassing the need to isolate and culture individual microbial species. Assembling metagenomic DNA fragments closely resembles the overlap-layout-consensus procedure used for isolated genomes, but adds a binning step that sorts scaffolds, contigs and unassembled reads into taxonomic groups. In this paper, we used n-mer oligonucleotide frequencies as features and developed a hierarchical classifier (PCAHIER) for binning short (≤ 1,000 bp) metagenomic fragments. Principal component analysis was used to reduce the high dimensionality of the feature space. The hierarchical classifier consists of four layers of local classifiers, each based on linear discriminant analysis. These layers bin prokaryotic DNA fragments into superkingdoms, fragments of the same superkingdom into phyla, fragments of the same phylum into genera, and fragments of the same genus into species, respectively. We evaluated PCAHIER on our own simulated data sets as well as on the widely used simHC synthetic metagenome data set from the IMG/M system. Its effectiveness was demonstrated through comparisons against a non-hierarchical classifier and two existing binning algorithms (TETRA and PhyloPythia).
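The n-mer oligonucleotide frequency features mentioned in the abstract can be illustrated with a minimal sketch. The function name `nmer_frequencies`, the default `n=4`, and the choice to skip windows containing ambiguous bases are assumptions made for illustration; the abstract does not state which n the authors used or how they handled such bases.

```python
from itertools import product


def nmer_frequencies(fragment, n=4):
    """Return normalized n-mer frequencies for a DNA fragment.

    The vector has 4**n entries, ordered lexicographically over the
    alphabet ACGT. (Hypothetical helper, not the authors' code.)
    """
    nmers = ["".join(p) for p in product("ACGT", repeat=n)]
    counts = dict.fromkeys(nmers, 0)
    seq = fragment.upper()
    total = 0
    # Slide a window of width n across the fragment, one base at a time.
    for i in range(len(seq) - n + 1):
        window = seq[i:i + n]
        if window in counts:  # skip windows with ambiguous bases such as N
            counts[window] += 1
            total += 1
    if total == 0:
        return [0.0] * len(nmers)
    return [counts[m] / total for m in nmers]
```

Because the vector length grows as 4^n (256 entries for n = 4), the feature space becomes large quickly, which is consistent with the abstract's use of principal component analysis before classification.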

Author(s):  
David Zhang ◽  
Xiao-Yuan Jing ◽  
Jian Yang

This chapter presents two straightforward image projection techniques: two-dimensional (2D) image matrix-based principal component analysis (IMPCA, 2DPCA) and 2D image matrix-based Fisher linear discriminant analysis (IMLDA, 2DLDA). After a brief introduction, we first describe IMPCA and then present IMLDA. Finally, we summarize some useful conclusions.
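As a rough illustration of image matrix-based PCA, the sketch below follows the commonly published 2DPCA formulation: build an image covariance matrix directly from the 2D image matrices, keep its leading eigenvectors, and project each image onto them. The function names `fit_2dpca` and `project_2dpca` are hypothetical, and the chapter's own formulation may differ in details such as centering.

```python
import numpy as np


def fit_2dpca(images, d):
    """Fit 2DPCA on a stack of images with shape (M, height, width).

    Returns the mean image and a (width, d) projection matrix whose
    columns are the top-d eigenvectors of the image covariance matrix.
    """
    A = np.asarray(images, dtype=float)
    mean = A.mean(axis=0)
    centered = A - mean
    # Image covariance: G = (1/M) * sum_i (A_i - mean)^T (A_i - mean)
    G = np.einsum("mhi,mhj->ij", centered, centered) / A.shape[0]
    # eigh returns eigenvalues in ascending order, so reverse the columns.
    _, eigvecs = np.linalg.eigh(G)
    X = eigvecs[:, ::-1][:, :d]
    return mean, X


def project_2dpca(image, X):
    """Project one (height, width) image to a (height, d) feature matrix."""
    return np.asarray(image, dtype=float) @ X
```

Unlike conventional PCA, the images are never flattened into vectors, so the covariance matrix is only width × width.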


Electronics ◽  
2019 ◽  
Vol 8 (8) ◽  
pp. 870
Author(s):  
Tengteng Wen ◽  
Dehan Luo ◽  
Yongjie Ji ◽  
Pingzhong Zhong

Odor reproduction, a branch of machine olfaction, is a technology through which a machine represents various odors by blending several odor sources in different proportions and releasing the blend. In this paper, an odor reproduction system is proposed. The system includes an atomization-based odor dispenser using 16 micro-porous piezoelectric transducers. The authors propose the use of an electronic nose combined with a Principal Component Analysis–Linear Discriminant Analysis (PCA–LDA) model to evaluate the effectiveness of the system. The results indicate that the model can be used to evaluate the system.


2019 ◽  
Vol 3 (2) ◽  
pp. 72
Author(s):  
Widi Astuti ◽  
Adiwijaya Adiwijaya

Cancer is one of the leading causes of death globally. Early detection of cancer allows better treatment for patients. One method of detecting cancer is microarray data classification. However, microarray data is high-dimensional, which complicates the classification process. Linear Discriminant Analysis is a classification technique that is easy to implement and has good accuracy, but it has difficulty handling high-dimensional data. Therefore, Principal Component Analysis, a feature extraction technique, is used to improve Linear Discriminant Analysis performance. The study found that using Principal Component Analysis increases accuracy by up to 29.04% and F1 score by 64.28% for colon cancer data.
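The idea of reducing dimensionality with PCA before applying LDA can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the function names, the use of a pseudo-inverse for the pooled covariance, and the class priors estimated from training frequencies are all choices made here for illustration.

```python
import numpy as np


def pca_fit(X, k):
    """Return the feature mean and a (n_features, k) projection matrix."""
    mean = X.mean(axis=0)
    # Rows of Vt are the principal directions of the centered data.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k].T


def pca_transform(X, mean, W):
    """Project samples onto the k retained principal components."""
    return (X - mean) @ W


def lda_fit(Z, y):
    """Fit LDA with a pooled within-class covariance on reduced data Z."""
    classes = np.unique(y)
    means = np.array([Z[y == c].mean(axis=0) for c in classes])
    Sw = sum((Z[y == c] - m).T @ (Z[y == c] - m)
             for c, m in zip(classes, means))
    Sw = Sw / (len(Z) - len(classes))
    priors = np.array([np.mean(y == c) for c in classes])
    return classes, means, np.linalg.pinv(Sw), priors


def lda_predict(Z, model):
    """Assign each sample to the class with the highest linear score."""
    classes, means, Sw_inv, priors = model
    # Score: z^T S^-1 mu_c - 0.5 * mu_c^T S^-1 mu_c + log(prior_c)
    scores = (Z @ Sw_inv @ means.T
              - 0.5 * np.sum(means @ Sw_inv * means, axis=1)
              + np.log(priors))
    return classes[np.argmax(scores, axis=1)]
```

When there are far more features than samples, as is typical for microarray data, the within-class covariance in the original space is singular; projecting onto a few principal components first keeps the matrix small and usually invertible.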


2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Heping Li ◽  
Yu Ren ◽  
Fan Yu ◽  
Dongliang Song ◽  
Lizhe Zhu ◽  
...  

To improve the reliability of Raman-based tumor detection and analysis, an ex vivo Raman spectral investigation was conducted to identify distinct compositional information of healthy (H), ductal carcinoma in situ (DCIS), and invasive ductal carcinoma (IDC) tissue. Then, principal component analysis-linear discriminant analysis (PCA-LDA) and principal component analysis-support vector machine (PCA-SVM) models were constructed to distinguish spectral features among the tissue groups. Spectral analysis highlighted differences in the levels of unsaturated and saturated lipids, carotenoids, protein, and nucleic acid between healthy and cancerous tissue, and variations in the levels of nucleic acid, protein, and phenylalanine between DCIS and IDC. Both classification models were found to be extremely efficient at discriminating tissue pathological types, with 99% accuracy for PCA-LDA and 100%, 100%, and 96.7% for PCA-SVM based on linear, polynomial, and radial basis function (RBF) kernels, respectively, while the PCA-SVM algorithm greatly reduced computational complexity without sacrificing performance. The present study demonstrates that Raman spectroscopy combined with multivariate analysis has considerable potential for improving the efficiency and performance of breast cancer diagnosis.
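For reference, the three SVM kernel families named in the abstract can be written out in their standard textbook forms. The parameter names and default values below (`degree`, `gamma`, `coef0`) are assumptions for illustration; the abstract does not report the settings used in the study.

```python
import math


def linear_kernel(x, y):
    """k(x, y) = x . y"""
    return sum(a * b for a, b in zip(x, y))


def polynomial_kernel(x, y, degree=3, gamma=1.0, coef0=1.0):
    """k(x, y) = (gamma * x . y + coef0) ** degree"""
    return (gamma * linear_kernel(x, y) + coef0) ** degree


def rbf_kernel(x, y, gamma=1.0):
    """k(x, y) = exp(-gamma * ||x - y||^2)"""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)
```

In a PCA-SVM pipeline, `x` and `y` would be the principal component scores of two spectra rather than the raw spectra themselves.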

