ADAPTIVE VARIABLE EXTRACTIONS WITH LDA FOR CLASSIFICATION OF MIXED VARIABLES, AND APPLICATIONS TO MEDICAL DATA

2021 ◽  
Vol 20 (Number 3) ◽  
pp. 305-327
Author(s):  
Hashibah Hamid ◽  
Nor Idayu Mahat ◽  
Safwati Ibrahim

This paper examines strategies for extracting variables from mixed data when building a Linear Discriminant Analysis (LDA) model. Two methods were used to extract the important variables from a dataset containing both categorical and continuous variables: multiple correspondence analysis (MCA) and principal component analysis (PCA). Neither MCA nor PCA can be applied directly to mixed variables, because each method restricts the structure of the data it can handle. The paper therefore makes some adjustments, including a strategy for managing the mixed variables so that they are equivalent in values. With this adjustment, both MCA and PCA can be performed on the mixed variables simultaneously. The extracted variables were then used to construct the LDA model, which was subsequently applied to classify future objects. The proposed models were tested on three real medical datasets. The results indicated that combining MCA and PCA for extraction with LDA can reduce the model’s size, which has a positive effect on classification and improves the model’s performance because it lowers the leave-one-out error rate. Accordingly, the proposed models, together with the adapted strategy, produced good results compared with the full LDA model. The indicators used to extract and retain variables in the model, namely cumulative variance explained (CVE), eigenvalue, and a non-significant shift in the CVE (constant change), could serve as a useful reference or guideline for practitioners facing similar issues in future.
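The three retention indicators named in the abstract (CVE, eigenvalue, and constant change in the CVE) can be sketched as simple rules over a list of eigenvalues. This is a minimal illustration, not the authors' procedure: the function names, the thresholds `cve_target`, `min_eigenvalue`, and `min_gain`, and the example eigenvalues are all assumptions chosen for demonstration.

```python
from itertools import accumulate


def cumulative_variance_explained(eigenvalues):
    """Cumulative share of total variance, for eigenvalues sorted descending."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    return [running / total for running in accumulate(ordered)]


def components_to_retain(eigenvalues, cve_target=0.80, min_eigenvalue=1.0, min_gain=0.05):
    """Apply three illustrative retention rules; all thresholds are assumptions."""
    ordered = sorted(eigenvalues, reverse=True)
    cve = cumulative_variance_explained(ordered)

    # Rule 1 (CVE): smallest number of components whose CVE reaches the target.
    by_cve = next(
        (k for k, value in enumerate(cve, start=1) if value >= cve_target),
        len(ordered),
    )

    # Rule 2 (eigenvalue): keep components whose eigenvalue exceeds the threshold.
    by_eigenvalue = sum(1 for value in ordered if value > min_eigenvalue)

    # Rule 3 (constant change): stop before the first component whose
    # contribution to the CVE falls below min_gain.
    gains = [cve[0]] + [later - earlier for earlier, later in zip(cve, cve[1:])]
    by_constant_change = next(
        (k for k, gain in enumerate(gains) if gain < min_gain),
        len(ordered),
    )

    return {
        "by_cve": by_cve,
        "by_eigenvalue": by_eigenvalue,
        "by_constant_change": by_constant_change,
    }


if __name__ == "__main__":
    # Hypothetical eigenvalues, not taken from the paper.
    print(components_to_retain([4.0, 2.0, 1.0, 0.5, 0.3, 0.2]))
```

The rules can disagree on the same eigenvalues, which is one reason to report them side by side rather than rely on a single cutoff.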

2020 ◽  
Vol 2020 ◽  
pp. 1-7
Author(s):  
Jie Zhang ◽  
Wenna Guo ◽  
Qiao Li ◽  
Faxin Sun ◽  
Xiaomeng Xu ◽  
...  

Medicinal property, which is closely related to drug chemical profiling, is the essence of traditional Chinese medicine (TCM) theory and has always been the focus of modern Chinese medicine. Based on dozens of classic and commonly used TCM herbs with recognized medicinal properties, the present study aimed to investigate the feasibility and reliability of medicinal property discrimination using 1H-NMR spectrometry, which provides a large body of spectral data showing the holistic chemical profile for multivariate analysis and data mining, including principal component analysis (PCA), Fisher linear discriminant analysis (FLDA), and canonical discriminant analysis (CDA). Using FLDA for two-class recognition, a large majority of test herbs (59/61) were properly discriminated as belonging to the cold or hot group. The only two exceptions were Chuanbeimu (Fritillariae Cirrhosae Bulbus) and Rougui (Cinnamomi Cortex), suggesting that medicinal properties interrelate with flavor and body tropism, and that all these factors together determine medicinal property and efficacy. With CDA, 98.4% of the original grouped herbs and 77.0% of the leave-one-out cross-validated grouped cases were correctly classified. The findings demonstrated that discriminant analysis based on holistic chemical profiling data from 1H-NMR spectrometry may provide a powerful alternative for gaining a deeper understanding of TCM medicinal property.


2014 ◽  
Vol 670-671 ◽  
pp. 1482-1487
Author(s):  
Rodrigo Clemente Thom de Souza ◽  
Maria Teresinha Arns Steiner ◽  
Leandro dos Santos Coelho

Classification is a supervised learning problem used to discriminate data instances into different classes. The solution to this problem is obtained through algorithms (classifiers) that look for patterns of relationships between classes in known cases and use these relationships to classify unknown cases. The performance of the classifiers depends substantially on the data types. In order to give proper treatment to nominal data, this paper shows that applying prior transformations can substantially improve the performance of classifiers, bringing significant benefits to the result of the whole process of Knowledge Discovery in Databases (KDD). This paper uses three different data sets with nominal data and two well-known classifiers: Linear Discriminant Analysis (LDA) and Naïve Bayes (NB). For data transformation, the paper applies an approach called Geometric Data Analysis (GDA). The GDA techniques compared in this paper are the traditional Principal Component Analysis (PCA) and the underexplored Multiple Correspondence Analysis (MCA). The results confirm the capability of the GDA transformation to improve classification accuracy and attest to the superiority of MCA over its precursor, PCA, when applied to nominal data.


Biosensors ◽  
2021 ◽  
Vol 11 (3) ◽  
pp. 68
Author(s):  
Anais Gómez ◽  
Diana Bueno ◽  
Juan Manuel Gutiérrez

The present work reports the development of a biologically inspired analytical system known as Electronic Eye (EE), capable of qualitatively discriminating different tequila categories. The reported system is a low-cost, portable instrument based on a Raspberry Pi single-board computer and an 8 Megapixel CMOS image sensor, which allows the collection of images of Silver, Aged, and Extra-aged tequila samples. Image processing mimics the trichromatic theory of color vision by analyzing the Red, Green, and Blue (RGB) components of each image pixel. RGB absorbances of the images were then evaluated and preprocessed, employing Principal Component Analysis (PCA) to visualize data clustering. The resulting PCA scores were modeled with a Linear Discriminant Analysis (LDA) that accomplished the qualitative classification of tequilas. A Leave-One-Out Cross-Validation (LOOCV) procedure was performed to evaluate the classifiers’ performance. The proposed system allowed the identification of real tequila samples, achieving an overall classification rate of 90.02%, with average sensitivity and specificity of 0.90 and 0.96, respectively, and a Cohen’s kappa coefficient of 0.87. In this case, the EE has demonstrated a favorable capability to correctly discriminate and classify the different tequila samples according to their categories.
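The performance figures quoted above (overall classification rate, average sensitivity and specificity, Cohen's kappa) are all derived from a multiclass confusion matrix. The sketch below shows one common way to compute them, using one-vs-rest sensitivity and specificity averaged over classes. The function name and the 3x3 matrix are hypothetical and are not the paper's data.

```python
def classification_summary(confusion):
    """Summarise a square confusion matrix.

    confusion[i][j] is the number of samples of true class i predicted as class j.
    Sensitivity and specificity are computed one-vs-rest and averaged over classes.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(n_classes)) for j in range(n_classes)]

    accuracy = sum(confusion[k][k] for k in range(n_classes)) / total

    sensitivities, specificities = [], []
    for k in range(n_classes):
        tp = confusion[k][k]
        fn = row_totals[k] - tp
        fp = col_totals[k] - tp
        tn = total - tp - fn - fp
        sensitivities.append(tp / (tp + fn))
        specificities.append(tn / (tn + fp))

    # Cohen's kappa: observed agreement corrected for chance agreement.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (accuracy - expected) / (1 - expected)

    return {
        "accuracy": accuracy,
        "mean_sensitivity": sum(sensitivities) / n_classes,
        "mean_specificity": sum(specificities) / n_classes,
        "kappa": kappa,
    }


if __name__ == "__main__":
    # Hypothetical counts for three classes (rows: true, columns: predicted).
    example = [[9, 1, 0], [1, 8, 1], [0, 1, 9]]
    for name, value in classification_summary(example).items():
        print(f"{name}: {value:.3f}")
```

In a leave-one-out setting, the matrix would be filled by predicting each held-out sample once and tallying the results before calling the function.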


Metabolites ◽  
2021 ◽  
Vol 11 (5) ◽  
pp. 265
Author(s):  
Ruchi Sharma ◽  
Wenzhe Zang ◽  
Menglian Zhou ◽  
Nicole Schafer ◽  
Lesa A. Begley ◽  
...  

Asthma is heterogeneous, but accessible biomarkers to distinguish relevant phenotypes remain lacking, particularly in non-Type 2 (T2)-high asthma. Moreover, common clinical characteristics in both T2-high and T2-low asthma (e.g., atopy, obesity, inhaled steroid use) may confound interpretation of putative biomarkers and of underlying biology. This study aimed to identify volatile organic compounds (VOCs) in exhaled breath that distinguish not only asthmatic from non-asthmatic subjects, but also atopic non-asthmatic controls, and that discriminate by variables reflecting clinical differences among asthmatic adults. A total of 73 participants (30 asthma, eight atopic non-asthma, and 35 non-asthma/non-atopic subjects) were recruited for this pilot study. A total of 79 breath samples were analyzed in real time using an automated portable gas chromatography (GC) device developed in-house. GC-mass spectrometry was also used to identify the VOCs in breath. Machine learning, linear discriminant analysis, and principal component analysis were used to identify the biomarkers. Our results show that the portable GC was able to complete breath analysis in 30 min. A set of nine biomarkers distinguished asthma from non-asthma/non-atopic subjects, while sets of two and of four biomarkers, respectively, further distinguished asthmatic subjects from atopic controls, and atopic from non-atopic controls. Additional unique biomarkers were identified that discriminate subjects by blood eosinophil levels, obese status, inhaled corticosteroid treatment, and acute upper respiratory illnesses within asthmatic groups. Our work demonstrates that breath VOC profiling can be a clinically accessible tool for asthma diagnosis and phenotyping. A portable GC system is a viable option for rapid assessment in asthma.


Author(s):  
Hsein Kew

In this paper, we propose a method to generate an audio output from spectroscopy data in order to discriminate two classes of data, based on the features of our spectral dataset. To do this, we first perform spectral pre-processing, then extract features, and then apply machine learning for dimensionality reduction. As part of the audio processing, the features are mapped to the parameters of a sound synthesiser to generate audio samples, from which we compute statistical results and identify important descriptors for the classification of the dataset. To optimise the process, we compare Amplitude Modulation (AM) and Frequency Modulation (FM) synthesis, applied to two real-life datasets, to evaluate the performance of sonification as a method for discriminating data. FM synthesis provides a higher subjective classification accuracy than AM synthesis. We then compare Principal Component Analysis (PCA) and Linear Discriminant Analysis as dimensionality reduction methods in order to optimise our sonification algorithm. Using FM synthesis as the sound synthesiser and PCA as the dimensionality reduction method yields mean classification accuracies of 93.81% and 88.57% for the coffee dataset and the fruit puree dataset, respectively. These results indicate that this spectroscopic analysis model is able to provide relevant information on the spectral data and, most importantly, is able to discriminate accurately between the two spectra, and thus provides a complementary tool to supplement current methods.


2020 ◽  
pp. 1-11
Author(s):  
Mayamin Hamid Raha ◽  
Tonmoay Deb ◽  
Mahieyin Rahmun ◽  
Tim Chen

Face recognition is the most efficient image analysis application, and the reduction of dimensionality is an essential requirement. The curse of dimensionality arises because, as dimensionality increases, the sample density decreases exponentially. Dimensionality reduction is the process of reducing the dimensionality of the feature space by obtaining a set of principal features. The purpose of this manuscript is to present a comparative study of Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA), two of the most popular appearance-based projection methods for face recognition. PCA creates a low-dimensional data representation that describes as much of the data variance as possible, while LDA finds the vectors that best discriminate between classes in the underlying space. The main idea of PCA is to transform a high-dimensional input space into a feature space that displays the maximum variance. Traditional LDA features are obtained by maximizing the differences between classes while minimizing the distances within each class.
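The contrast between the two projections can be written out using the standard textbook objectives. The notation below is an assumption, not taken from the manuscript: $S_T$ is the total scatter (covariance) matrix, $S_B$ and $S_W$ are the between-class and within-class scatter matrices, $\mu_c$ and $n_c$ are the mean and size of class $c$, and $\mu$ is the overall mean.

```latex
% PCA: direction of maximum variance
w_{\mathrm{PCA}} = \arg\max_{\|w\| = 1} \; w^{\top} S_T \, w,
\qquad
S_T = \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)(x_i - \mu)^{\top}

% LDA (Fisher criterion): large between-class scatter, small within-class scatter
w_{\mathrm{LDA}} = \arg\max_{w} \; \frac{w^{\top} S_B \, w}{w^{\top} S_W \, w},
\qquad
S_B = \sum_{c} n_c (\mu_c - \mu)(\mu_c - \mu)^{\top},
\quad
S_W = \sum_{c} \sum_{i \in c} (x_i - \mu_c)(x_i - \mu_c)^{\top}
```

PCA ignores class labels entirely, whereas the LDA criterion depends on them through $\mu_c$; this is why the two methods can select very different directions on the same data.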


1994 ◽  
Vol 9 (4) ◽  
pp. 211-220
Author(s):  
A Dazord ◽  
P Gerin ◽  
JD Davis ◽  
ML Davis ◽  
N Aapro ◽  
...  

This study is part of a more extensive research project conducted by a group of scientists from different countries, who have joined forces to conduct an international study on the development of the therapist, and to develop a novel instrument for therapists, the “Development of Psychotherapists’ Common Core Questionnaire” (CCQ). We report here the results based on the answers of a French-speaking sample to this questionnaire. Data were analyzed using univariate (nonparametric tests) and multivariate analyses (Principal Component Analysis and Multiple Correspondence Analysis). The perceived effects of psychoanalytical training were examined. The therapists’ own perceptions of their current skills and the types of difficulties they experienced were very similar, whether or not they had received psychoanalytical training. However, striking and significant differences in coping strategies were observed between the two groups.


Sensors ◽  
2019 ◽  
Vol 19 (20) ◽  
pp. 4523
Author(s):  
Carlos Cabo ◽  
Celestino Ordóñez ◽  
Fernando Sánchez-Lasheras ◽  
Javier Roca-Pardiñas ◽  
Javier de Cos-Juez

We analyze the utility of multiscale supervised classification algorithms for object detection and extraction from laser scanning or photogrammetric point clouds. Only the geometric information (the point coordinates) was considered, thus making the method independent of the systems used to collect the data. A maximum of five features (input variables) was used, four of them related to the eigenvalues obtained from a principal component analysis (PCA). PCA was carried out at six scales, defined by the diameter of a sphere around each observation. Four multiclass supervised classification models were tested (linear discriminant analysis, logistic regression, support vector machines, and random forest) in two different scenarios, urban and forest, formed by artificial and natural objects, respectively. The results obtained were accurate (overall accuracy over 80% for the urban dataset, and over 93% for the forest dataset), in the range of the best results found in the literature, regardless of the classification method. For both datasets, the random forest algorithm provided the best solution/results when discrimination capacity, computing time, and the ability to estimate the relative importance of each variable are considered together.
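The abstract does not say which eigenvalue-based features were used, so the sketch below assumes three that are common in the point-cloud literature: linearity, planarity, and sphericity, computed from the eigenvalues of the local covariance matrix of the points inside a sphere of a given diameter. All function names and the scale values are hypothetical. The eigenvalues come from the closed-form trigonometric solution for symmetric 3x3 matrices so that the example needs only the standard library.

```python
import math


def covariance_3d(points):
    """3x3 covariance matrix of a list of (x, y, z) points."""
    n = len(points)
    mean = [sum(p[i] for p in points) / n for i in range(3)]
    cov = [[0.0] * 3 for _ in range(3)]
    for p in points:
        d = [p[i] - mean[i] for i in range(3)]
        for i in range(3):
            for j in range(3):
                cov[i][j] += d[i] * d[j]
    return [[value / n for value in row] for row in cov]


def symmetric_eigenvalues_3x3(a):
    """Eigenvalues of a symmetric 3x3 matrix, in descending order."""
    p1 = a[0][1] ** 2 + a[0][2] ** 2 + a[1][2] ** 2
    if p1 == 0.0:  # already diagonal
        return sorted((a[0][0], a[1][1], a[2][2]), reverse=True)
    q = (a[0][0] + a[1][1] + a[2][2]) / 3.0
    p2 = sum((a[i][i] - q) ** 2 for i in range(3)) + 2.0 * p1
    p = math.sqrt(p2 / 6.0)
    b = [[(a[i][j] - (q if i == j else 0.0)) / p for j in range(3)] for i in range(3)]
    det_b = (
        b[0][0] * (b[1][1] * b[2][2] - b[1][2] * b[2][1])
        - b[0][1] * (b[1][0] * b[2][2] - b[1][2] * b[2][0])
        + b[0][2] * (b[1][0] * b[2][1] - b[1][1] * b[2][0])
    )
    r = max(-1.0, min(1.0, det_b / 2.0))  # clamp against rounding error
    phi = math.acos(r) / 3.0
    largest = q + 2.0 * p * math.cos(phi)
    smallest = q + 2.0 * p * math.cos(phi + 2.0 * math.pi / 3.0)
    middle = 3.0 * q - largest - smallest
    return [largest, middle, smallest]


def eigen_features(points):
    """Linearity, planarity, and sphericity of a neighbourhood (assumed features)."""
    l1, l2, l3 = symmetric_eigenvalues_3x3(covariance_3d(points))
    if l1 <= 0.0:
        return None  # degenerate neighbourhood (all points coincide)
    return {
        "linearity": (l1 - l2) / l1,
        "planarity": (l2 - l3) / l1,
        "sphericity": l3 / l1,
    }


def multiscale_features(points, centre, diameters):
    """Compute the features at several scales; each scale is a sphere diameter."""
    features = {}
    for diameter in diameters:
        radius = diameter / 2.0
        neighbours = [p for p in points if math.dist(p, centre) <= radius]
        features[diameter] = eigen_features(neighbours) if len(neighbours) >= 3 else None
    return features


if __name__ == "__main__":
    # Hypothetical planar patch; the scale values are illustrative only.
    patch = [(float(i), float(j), 0.0) for i in range(5) for j in range(5)]
    print(multiscale_features(patch, (2.0, 2.0, 0.0), [1.0, 3.0, 6.0]))
```

The per-scale feature dictionaries would then be flattened into one input vector per point for whichever classifier is being compared.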


2021 ◽  
pp. 096703352098731
Author(s):  
Adenilton C da Silva ◽  
Lívia PD Ribeiro ◽  
Ruth MB Vidal ◽  
Wladiana O Matos ◽  
Gisele S Lopes

The use of alcohol-based hand sanitizers is recommended as one of several strategies to minimize contamination and spread of the COVID-19 disease. Current reports suggest that the virucidal potential of ethanol occurs at concentrations close to 70%. Traditional methods of verifying the ethanol concentration in such products invite potential errors due to the viscosity of chemical components, or may be prohibitively expensive to undertake when demand is high. Near infrared (NIR) spectroscopy and chemometrics have already been used for the determination of ethanol in other matrices and present an alternative fast and reliable approach to quality control of alcohol-based hand sanitizers. In this study, a portable NIR spectrometer combined with classification chemometric tools, i.e., partial least squares discriminant analysis (PLS–DA) and linear discriminant analysis with the successive projections algorithm (SPA–LDA), was used to construct models to identify conforming and non-conforming commercial and laboratory-synthesized hand sanitizer samples. Principal component analysis (PCA) was applied in an exploratory data study. Three principal components accounted for 99% of data variance and demonstrated clustering of conforming and non-conforming samples. The PLS–DA and SPA–LDA classification models presented 77% and 100% accuracy in cross/internal validation, respectively, and 100% accuracy in the classification of test samples. A total of 43% of the commercial samples evaluated using PLS–DA and SPA–LDA presented ethanol content non-conforming for hand sanitizer gel. These results indicate that the use of NIR spectroscopy and chemometrics is a promising strategy, yielding a method that is fast, portable, and reliable for discriminating alcohol-based hand sanitizers with respect to conforming and non-conforming ethanol concentrations.


Molecules ◽  
2021 ◽  
Vol 26 (9) ◽  
pp. 2423
Author(s):  
Michał Miłek ◽  
Aleksandra Bocian ◽  
Ewelina Kleczyńska ◽  
Patrycja Sowa ◽  
Małgorzata Dżugan

Many imported honeys distributed on the Polish market compete with local products mainly through lower price, which can correspond to lower quality and widespread adulteration. The aim of the study was to compare honey samples (11 imported honey blends and 5 local honeys) based on their antioxidant activity (measured by DPPH, FRAP, and total phenolic content), protein profile obtained by native PAGE, soluble protein content, and diastase and acid phosphatase activities identified by zymography. These indicators were correlated with standard quality parameters (water, HMF, pH, free acidity, and electrical conductivity). It was found that raw local Polish honeys show higher antioxidant and enzymatic activity and are more abundant in soluble protein. Using principal component analysis (PCA) and stepwise linear discriminant analysis (LDA), protein content and diastase number were found to be significant (p < 0.05) among all tested parameters for differentiating imported honeys from raw local honeys.

