Robust Logistic Principal Component Regression for classification of data in presence of outliers

Author(s):  
H. C. Wu ◽  
S. C. Chan ◽  
K. M. Tsui
1996 ◽  
Vol 50 (12) ◽  
pp. 1590-1596 ◽  
Author(s):  
Frédéric Cadet

Several methods have been proposed with the aim of improving the precision of quantitative measurements of biological components (baseline correction, classification, elimination of unwanted components, etc.). In this context, we propose a method for classifying biological samples (raw sugar cane juices) before sucrose content prediction is performed. The method consisted of isolating the two most dissimilar individuals from a large calibration family of mid-FT-IR spectra and then, by successive principal component analysis (PCA) and principal component regression (PCR), constituting a family composed of a few individuals. Each individual from this family represented the first spectrum of one of the classes that were ultimately formed. The remaining samples from the calibration family were classified by the mobile centers method, that is, by measuring Euclidean distances. This procedure improved the precision of the predictions. The mean and standard deviation (SD) of the differences between predicted and reference values were, respectively, −1.62 × 10⁻³ and 0.308 before classification and 2.38 × 10⁻³ and 0.254 after classification. The procedure developed in this paper first allowed a qualitative classification of spectra without knowledge of their chemical composition and, second, improved the precision of the quantitative predictions.
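
The distance-based assignment step described above can be illustrated with a minimal Python sketch. It assumes that each class is seeded by one representative spectrum and that every remaining sample joins the class whose center is nearest in Euclidean distance. The function names, the two-point "spectra", and the single center update are hypothetical, and the abstract does not say whether or how often the centers are recomputed.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def assign_to_nearest(samples, centers):
    """Return, for each sample, the index of the closest center."""
    return [min(range(len(centers)), key=lambda k: euclidean(s, centers[k]))
            for s in samples]

def update_centers(samples, labels, centers):
    """Move each center to the mean of its members (unchanged if it has none)."""
    new_centers = []
    for k, old in enumerate(centers):
        members = [s for s, lab in zip(samples, labels) if lab == k]
        if members:
            new_centers.append([sum(col) / len(members) for col in zip(*members)])
        else:
            new_centers.append(list(old))
    return new_centers

# Hypothetical data: the seeds stand in for the first spectrum of each class.
seeds = [[0.0, 0.0], [10.0, 10.0]]
spectra = [[1.0, 0.5], [9.0, 11.0], [0.2, 1.1], [10.5, 9.5]]

labels = assign_to_nearest(spectra, seeds)
centers = update_centers(spectra, labels, seeds)
```

Repeating the assign and update steps until the labels stop changing gives a k-means-style iteration, which is how the mobile centers method is usually described.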


Author(s):  
Hyeuk Kim

Unsupervised learning in machine learning divides data into several groups. Observations in the same group have similar characteristics, and observations in different groups have different characteristics. In this paper, we classify data by partitioning around medoids (PAM), which has some advantages over k-means clustering. We apply it to baseball players in the Korea Baseball League. We also apply principal component analysis to the data and plot the result using two components as axes. We interpret the meaning of the clustering graphically through this procedure. The combination of partitioning around medoids and principal component analysis can be applied to any other data, and the approach makes it easy to identify the characteristics of each group.
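
As a rough illustration of partitioning around medoids, the sketch below runs only the swap phase of PAM on a few made-up two-dimensional points. The starting medoids (simply the first k points), the Euclidean distance, and the data are assumptions made for the example and are not taken from the paper. A full PAM implementation would begin with a dedicated BUILD step.

```python
import math

def dist(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def total_cost(points, medoids):
    """Sum of distances from every point to its nearest medoid."""
    return sum(min(dist(p, points[m]) for m in medoids) for p in points)

def pam(points, k):
    """Swap phase of PAM, started from the first k points (simplified init)."""
    medoids = list(range(k))
    best = total_cost(points, medoids)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for cand in range(len(points)):
                if cand in medoids:
                    continue
                # Try replacing medoid i with a non-medoid candidate.
                trial = medoids[:i] + [cand] + medoids[i + 1:]
                cost = total_cost(points, trial)
                if cost < best - 1e-12:
                    medoids, best, improved = trial, cost, True
    labels = [min(range(k), key=lambda j: dist(p, points[medoids[j]]))
              for p in points]
    return medoids, labels

# Hypothetical player statistics, already reduced to two numeric features.
players = [[1.0, 1.0], [1.2, 0.8], [0.8, 1.1],
           [8.0, 8.0], [8.2, 7.9], [7.9, 8.3]]
medoids, labels = pam(players, k=2)
```

Unlike a k-means centroid, each medoid is an actual observation, so a cluster can be summarized by a real player rather than by an averaged profile.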


2019 ◽  
Vol 8 (1) ◽  
Author(s):  
Khairunnisa Khairunnisa ◽  
Rizka Pitri ◽  
Victor P Butar-Butar ◽  
Agus M Soleh

This research used CFSRv2 data as the output data of a general circulation model. CFSRv2 includes several highly correlated variables, so this research uses principal component regression (PCR) and partial least squares (PLS) to address the multicollinearity in the CFSRv2 data. The research aims to determine the better model between PCR and PLS for estimating rainfall at Bandung geophysical station, Bogor climatology station, Citeko meteorological station, and Jatiwangi meteorological station by comparing RMSEP and correlation values. The domain sizes used were 3×3, 4×4, 5×5, 6×6, 7×7, 8×8, 9×9, and 11×11, located between 4° S and 9° S and between 105° E and 110° E, with a grid size of 0.5° × 0.5°. The PLS model was the better model for statistical downscaling in this research because it obtained the lower RMSEP value and the higher correlation value. The best domain and RMSEP value for Bandung geophysical station, Bogor climatology station, Citeko meteorological station, and Jatiwangi meteorological station are 9 × 9 with 100.06, 6 × 6 with 194.3, 8 × 8 with 117.6, and 6 × 6 with 108.2, respectively.
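
The model comparison rests on two summary statistics, which can be sketched as follows. RMSEP is taken here as the root mean squared difference between observed and predicted values on held-out data, and the correlation is the Pearson coefficient. The rainfall figures and both prediction vectors are invented for illustration and are not the study's results.

```python
import math

def rmsep(observed, predicted):
    """Root mean squared error of prediction."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Invented held-out rainfall values and predictions from two fitted models.
observed = [100.0, 150.0, 200.0, 250.0]
predictions = {
    "PCR": [130.0, 140.0, 170.0, 280.0],
    "PLS": [110.0, 145.0, 190.0, 260.0],
}

scores = {name: (rmsep(observed, p), pearson(observed, p))
          for name, p in predictions.items()}
# Prefer the model with the lowest RMSEP; correlation is a secondary check.
best = min(scores, key=lambda name: scores[name][0])
```

In this toy case the two criteria agree. When they disagree, the choice of which one to prioritize has to be made explicitly.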


2018 ◽  
Vol 21 (2) ◽  
pp. 125-137
Author(s):  
Jolanta Stasiak ◽  
Marcin Koba ◽  
Marcin Gackowski ◽  
Tomasz Baczek

Aim and Objective: In this study, chemometric methods such as correlation analysis, cluster analysis (CA), principal component analysis (PCA), and factor analysis (FA) were used to reduce the number of chromatographic parameters (logk/logkw) and various (e.g., 0D, 1D, 2D, 3D) structural descriptors for three groups of drugs (12 analgesic drugs, 11 cardiovascular drugs, and 36 “other” compounds), and especially to select the most important of these data. Material and Methods: All chemometric analyses were carried out, presented graphically, and discussed for each group of drugs. First, the compounds’ structural and chromatographic parameters were correlated. The best correlation coefficients were R = 0.93, R = 0.88, and R = 0.91 for cardiac medications, analgesic drugs, and the 36 “other” compounds, respectively. Next, part of the molecular and HPLC experimental data from each group of drugs was submitted to FA/PCA and CA. Results: In almost all FA and PCA results, the total data variance of all analyzed parameters (experimental and calculated) was explained by the first two or three factors: 84.28%, 76.38%, and 69.71% for cardiovascular drugs, analgesic drugs, and the 36 “other” compounds, respectively. Compound clustering by CA had characteristics similar to those obtained by FA/PCA. In this paper, the statistical classification of these drugs is characterized and discussed in terms of their molecular structure and pharmacological activity. Conclusion: The proposed QSAR strategy with a reduced number of parameters could be a useful starting point for further statistical analysis, as well as support for designing new drugs and predicting their possible activity.


2020 ◽  
Vol 17 (1) ◽  
pp. 94-104
Author(s):  
Antonio F. Mottese ◽  
Maria R. Fede ◽  
Francesco Caridi ◽  
Giuseppe Sabatino ◽  
Giuseppe Marcianò ◽  
...  

Background and Objectives: In this work, yellow and green varieties of Cucumis melo fruits belonging to different cultivars were studied. In detail, three Sicilian cultivars of winter melons protected by TAP (Traditional agro-alimentary products) labels were considered, and the Calabrian winter melon, which is unprotected, was studied too. To compare the selective uptake of inorganic elements between winter and summer fruits, the “PGI Melone Mantovano” was also investigated. The purpose of this work was to apply the obtained results i) to guarantee the quality and healthiness of the fruits, ii) to defend producers, and iii) to help customers purchase safe food. Method: All samples were analyzed by ICP-MS, and the results were then subjected to cluster analysis (CA), principal component analysis (PCA), and canonical discriminant analysis (CDA). Results: CA results were generally in agreement with sample origin, whereas PCA confirmed a strong relation between fruit origin and trace element content. In particular, two principal components accounted for 57.32% of the total variance (PC1 = 40.95%, PC2 = 16.37%). Finally, the CDA approach provided several functions with high discrimination power, confirmed by the correct classification of all samples (100%). Conclusions: CA, PCA and CDA could represent an integrated tool alongside labels to discriminate the origin of agri-food products and, thus, protect and guarantee their healthiness.


2007 ◽  
Vol 90 (2) ◽  
pp. 391-404 ◽  
Author(s):  
Fadia H Metwally ◽  
Yasser S El-Saharty ◽  
Mohamed Refaat ◽  
Sonia Z El-Khateeb

Abstract New selective, precise, and accurate methods are described for the determination of a ternary mixture containing drotaverine hydrochloride (I), caffeine (II), and paracetamol (III). The first method uses the first (D1) and third (D3) derivative spectrophotometry at 331 and 315 nm for the determination of (I) and (III), respectively, without interference from (II). The second method depends on the simultaneous use of the first derivative of the ratio spectra (DD1) with measurement at 312.4 nm for determination of (I) using the spectrum of 40 μg/mL (III) as a divisor or measurement at 286.4 and 304 nm after using the spectrum of 4 μg/mL (I) as a divisor for the determination of (II) and (III), respectively. In the third method, the predictive abilities of the classical least-squares, principal component regression, and partial least-squares were examined for the simultaneous determination of the ternary mixture. The last method depends on thin-layer chromatography-densitometry after separation of the mixture on silica gel plates using ethyl acetate–chloroform–methanol (16 + 3 + 1, v/v/v) as the mobile phase. The spots were scanned at 281, 272, and 248 nm for the determination of (I), (II), and (III), respectively. Regression analysis showed good correlation in the selected ranges with excellent percentage recoveries. The chemical variables affecting the analytical performance of the methodology were studied and optimized. The methods showed no significant interferences from excipients. Intraday and interday assay precision and accuracy values were within regulatory limits. The suggested procedures were checked using laboratory-prepared mixtures and were successfully applied for the analysis of their pharmaceutical preparations. The validity of the proposed methods was further assessed by applying a standard addition technique. The results obtained by applying the proposed methods were statistically analyzed and compared with those obtained by the manufacturer's method.


2021 ◽  
pp. 1471082X2110229
Author(s):  
Mikis D. Stasinopoulos ◽  
Robert A. Rigby ◽  
Nikolaos Georgikopoulos ◽  
Fernanda De Bastiani

A solution to the problem of having to deal with a large number of interrelated explanatory variables within a generalized additive model for location, scale and shape (GAMLSS) is given here using as an example the Greek–German government bond yield spreads from 25 April 2005 to 31 March 2010. Those were turbulent financial years, and in order to capture the spreads behaviour, a model has to be able to deal with the complex nature of the financial indicators used to predict the spreads. Fitting a model, using principal components regression of both main and first order interaction terms, for all the parameters of the assumed distribution of the response variable seems to produce promising results.
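
To see why principal components help in this setting, consider how quickly the number of columns grows once first order interactions are added. The sketch below is a hypothetical helper, not the authors' code. It assumes that "first order interaction terms" means all pairwise products of the explanatory variables, and it builds the expanded design on which a principal component analysis would then be run.

```python
from itertools import combinations

def expand_with_interactions(rows, names):
    """Append every pairwise product x_i * x_j (i < j) to each row."""
    pairs = list(combinations(range(len(names)), 2))
    new_names = list(names) + [f"{names[i]}:{names[j]}" for i, j in pairs]
    new_rows = [list(r) + [r[i] * r[j] for i, j in pairs] for r in rows]
    return new_rows, new_names

# Hypothetical explanatory variables; real financial indicators would go here.
names = ["x1", "x2", "x3"]
rows = [[1.0, 2.0, 3.0],
        [0.5, 4.0, 2.0]]

design, design_names = expand_with_interactions(rows, names)
```

With p variables the expanded design has p + p(p − 1)/2 columns, so a few dozen indicators already yield hundreds of correlated terms. Regressing on a small number of principal components instead keeps the model for each distribution parameter tractable.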


2021 ◽  
Vol 19 (1) ◽  
pp. 205-213
Author(s):  
Hany W. Darwish ◽  
Abdulrahman A. Al Majed ◽  
Ibrahim A. Al-Suwaidan ◽  
Ibrahim A. Darwish ◽  
Ahmed H. Bakheit ◽  
...  

Abstract Five chemometric methods were established for the simultaneous determination of azilsartan medoxomil (AZM) and chlorthalidone in the presence of azilsartan, which is the core impurity of AZM. The full spectrum-based chemometric techniques, namely partial least squares (PLS), principal component regression, and artificial neural networks (ANN), were among the applied methods. In addition, ANN and PLS were each extended with a genetic algorithm (GA) as a wavelength selection procedure, giving the other two methods (GA-PLS and GA-ANN). The models were developed by applying a multilevel multifactor experimental design. The predictive power of the suggested models was evaluated through a validation set containing nine mixtures with different ratios of the three analytes. All the proposed procedures were applied to the analysis of Edarbyclor® tablets, and the best results were achieved with the ANN, GA-ANN, and GA-PLS methods. These three methods proved to be quantitative tools for the analysis of the three components without any interference from the co-formulated excipients and without prior separation procedures. Moreover, the impact of the GA on strengthening the predictive power of ANN- and PLS-based models was also highlighted.

