Visualization of Chemical Space using Kernel Based Principal Component Analysis

Principal component analysis (PCA) is one of the most important and popular multivariate statistical methods and is applied across many data modeling applications. Traditional PCA captures only linear variance in molecular descriptors or features, so it is of limited use on data with complicated structure. This drawback can be addressed by introducing a kernel matrix into PCA. Kernel principal component analysis (KPCA) is an extension of conventional PCA that handles the non-linear hidden patterns that exist among variables. It offers computational efficiency for data analysis and data visualization. In this paper, KPCA is applied to a drug-likeness dataset to visualize the non-linear relations that exist among its variables.
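As a rough illustration of the idea in the abstract above, the following sketch computes a two-dimensional KPCA embedding with NumPy. The abstract does not say which kernel or parameters were used, so the RBF kernel, the `gamma` value, the function names, and the two-ring toy data are all assumptions made for this example, not the authors' setup.

```python
import numpy as np

def rbf_kernel(X, Y, gamma):
    # Squared Euclidean distance between every row of X and every row of Y.
    sq = (X**2).sum(1)[:, None] + (Y**2).sum(1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def kernel_pca(X, n_components=2, gamma=1.0):
    n = X.shape[0]
    K = rbf_kernel(X, X, gamma)
    # Centre the kernel matrix, i.e. centre the data in feature space.
    one_n = np.full((n, n), 1.0 / n)
    Kc = K - one_n @ K - K @ one_n + one_n @ K @ one_n
    # eigh returns eigenvalues in ascending order; reverse to put the largest first.
    eigvals, eigvecs = np.linalg.eigh(Kc)
    eigvals = eigvals[::-1][:n_components]
    eigvecs = eigvecs[:, ::-1][:, :n_components]
    # Scores of the training points on component k: sqrt(lambda_k) * v_k.
    scores = eigvecs * np.sqrt(np.maximum(eigvals, 0.0))
    return scores, eigvals

# Toy data (assumed): two concentric rings, which linear PCA cannot separate.
rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 2.0 * np.pi, 200)
radius = np.repeat([1.0, 3.0], 100)
X = np.column_stack([radius * np.cos(theta), radius * np.sin(theta)])

scores, eigvals = kernel_pca(X, n_components=2, gamma=0.5)
```

Plotting the two columns of `scores` against each other gives the kind of low-dimensional view the abstract describes. In practice the kernel and its parameter would need to be tuned to the descriptor set at hand.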

1969
Vol 5 (1)
pp. 67-77
Author(s): S. C. Pearce

SUMMARY: Multivariate statistical methods are used increasingly in biological research to investigate the responses of organisms considered as a whole, whereas established statistical methods are usually concerned with measured characteristics considered one at a time. Multivariate techniques are mostly explained in terms of matrix algebra, which is a way of dealing with groups of numbers rather than individual ones. A brief description is given of some elementary results of matrix algebra and a method is presented whereby hypotheses can be generated about interrelations within an organism. Two techniques, principal component analysis and canonical analysis, are described in greater detail. It is emphasized that hypotheses need to be tested even though they have been generated by objective statistical means.


2014
Vol 1030-1032
pp. 1822-1827
Author(s): Ning Lv, Guang Yuan Bai, Lu Qi Yan, Yuan Jian Fu

To overcome the limitations of the principal component analysis fault diagnosis model on non-linear, time-varying processes, and to reduce the computational complexity of process monitoring based on non-linear principal components, we introduce kernel transformation theory for non-linear spaces to extract data features and propose a fault monitoring model based on kernel principal component analysis (KPCA) for constant value detection. Through proper selection of the kernel function parameter values, the KPCA model can achieve constant value detection of process faults with lower computational complexity than other non-linear algorithms. A fault detection experiment on a beer fermentation process shows that the method detects process faults in a timely manner and has good real-time performance and accuracy in slowly time-varying batch processes.


2020
Vol 42
pp. e17
Author(s): Paulo Jorge Canas Rodrigues, Rafael Almeida, Kézia Mustafa

Multivariate statistical methods have played an important role in statistics and data analysis for a very long time. Nowadays, with the increase in the amount of data collected every day in many disciplines, and with the rise of data science, machine learning and applied statistics, that role is even more important. Two of the most widely used multivariate statistical methods are cluster analysis and principal component analysis. These, like many other models and algorithms, are adequate when the data satisfy certain assumptions. However, when the distribution of the data is not normal and/or shows heavy tails and outlying observations, the classic models and algorithms might produce erroneous conclusions. Robust statistical methods, such as algorithms for robust cluster analysis and robust principal component analysis, are very useful when analyzing contaminated data with outlying observations. In this paper we consider a data set containing the products available in a fast food restaurant chain together with their respective nutritional information, and discuss the usefulness of robust statistical methods for classification, clustering and data visualization.


2009
Vol 413-414
pp. 583-590
Author(s): Fei He, Min Li, Jian Hong Yang, Jin Wu Xu

To monitor nonlinear production processes effectively, multivariate statistical process control based on kernel principal component analysis is applied to process monitoring and diagnosis. The squared prediction error (SPE) statistic of the kernel principal component analysis (KPCA) model is used for process monitoring, and the causes of faults in the production process can be tracked by data reconstruction and an optimal neighbor selection strategy. Simulation data and Tennessee Eastman process data are used for model validation. The results show that the proposed method detects abnormalities better than multivariate statistical process control based on linear principal component analysis. Moreover, the causes of the faults are tracked effectively, so the production process can be adjusted to prevent substandard products.
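One common way to define an SPE statistic for a KPCA model is the squared norm of a sample's centred feature-space image minus the squared norm of its projection onto the retained components. The sketch below follows that definition. It is not taken from the paper: the RBF kernel, `gamma`, the number of retained components, the empirical 99% control limit, and all function names are assumptions chosen for illustration.

```python
import numpy as np

def rbf(A, B, gamma):
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def fit_kpca(X, n_components, gamma):
    K = rbf(X, X, gamma)
    col_mean = K.mean(axis=0)          # K is symmetric, so row means equal column means
    total_mean = K.mean()
    Kc = K - col_mean[None, :] - col_mean[:, None] + total_mean
    lam, V = np.linalg.eigh(Kc)        # ascending order
    lam, V = lam[::-1][:n_components], V[:, ::-1][:, :n_components]
    alphas = V / np.sqrt(lam)          # normalised so feature-space directions have unit length
    return {"X": X, "gamma": gamma, "col_mean": col_mean,
            "total_mean": total_mean, "alphas": alphas}

def spe(model, X_new):
    k = rbf(X_new, model["X"], model["gamma"])              # shape (m, n)
    row_mean = k.mean(axis=1, keepdims=True)
    kc = k - row_mean - model["col_mean"][None, :] + model["total_mean"]
    t = kc @ model["alphas"]                                # scores on retained components
    # Centred self-similarity; k(x, x) = 1 for the RBF kernel.
    kxx = 1.0 - 2.0 * row_mean[:, 0] + model["total_mean"]
    return kxx - (t**2).sum(axis=1)

# Assumed toy data: "normal operation" is a standard 2-D Gaussian cloud.
rng = np.random.default_rng(1)
X_train = rng.normal(size=(200, 2))
model = fit_kpca(X_train, n_components=5, gamma=0.5)

spe_train = spe(model, X_train)
limit = np.quantile(spe_train, 0.99)                        # empirical control limit (assumption)
spe_fault = spe(model, np.array([[8.0, 8.0]]))[0]           # a sample far from normal operation
alarm = spe_fault > limit
```

A sample would be flagged when its SPE exceeds `limit`. Published monitoring schemes often derive the control limit from a distributional approximation rather than an empirical quantile, so treat the thresholding here as a placeholder.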


2021
Vol 11 (14)
pp. 6370
Author(s): Elena Quatrini, Francesco Costantino, David Mba, Xiaochuan Li, Tat-Hean Gan

The water purification process is becoming increasingly important to ensure the continuity and quality of subsequent production processes, and it is particularly relevant in pharmaceutical contexts. However, in this context, the difficulties arising during the monitoring process are manifold. On the one hand, the monitoring process reveals various discontinuities due to different characteristics of the input water. On the other hand, the monitoring process is discontinuous and random itself, thus not guaranteeing continuity of the parameters and hindering a straightforward analysis. Consequently, further research on water purification processes is paramount to identify the most suitable techniques able to guarantee good performance. Against this background, this paper proposes an application of kernel principal component analysis for fault detection in a process with the above-mentioned characteristics. Based on the temporal variability of the process, the paper suggests the use of past and future matrices as input for fault detection as an alternative to the original dataset. In this manner, the temporal correlation between process parameters and machine health is accounted for. The proposed approach confirms the possibility of obtaining very good monitoring results in the analyzed context.
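The abstract does not spell out how the past and future matrices are built. A common construction, assumed here, stacks a fixed number of lagged observations before each time point into a "past" row and the same number of observations from that point onward into a "future" row. The function name, the `lags` parameter, and the row layout below are illustrative choices, not the paper's definitions.

```python
import numpy as np

def past_future_matrices(Y, lags):
    """Build lagged matrices from Y, a (T, d) array whose rows are ordered in time.

    Row i of `past` holds y[t-1], ..., y[t-lags] (most recent first) and
    row i of `future` holds y[t], ..., y[t+lags-1], for t = lags + i.
    """
    T, _ = Y.shape
    times = range(lags, T - lags + 1)
    past = np.array([Y[t - lags:t][::-1].ravel() for t in times])
    future = np.array([Y[t:t + lags].ravel() for t in times])
    return past, future

# Tiny example: 6 time steps of a 2-variable process.
Y = np.arange(12.0).reshape(6, 2)
past, future = past_future_matrices(Y, lags=2)
```

Either matrix (or both, concatenated) could then be passed to a KPCA model in place of the raw measurements, so that each input row carries temporal context.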

