Prediction of Graduation with Naïve Bayes Algorithm and Principal Component Analysis (PCA) on Time Series Data

Author(s):  
Wishnu Dwi Herlambang,
Kusuma Ayu Laksitowening,
Ibnu Asror
2011, Vol 2011, pp. 1-14
Author(s):  
Min Lei,
Guang Meng

Experimental data are often very complex, since the underlying dynamical system may be unknown and the data may be heavily corrupted by noise. Properly analyzing such data to extract as much information as possible about the underlying dynamical system is therefore a crucial task. This paper presents a novel principal component analysis (PCA) method based on symplectic geometry, called symplectic PCA (SPCA), for studying nonlinear time series. Because it is nonlinear, SPCA differs from the traditional PCA method, which is based on linear singular value decomposition (SVD), and is therefore expected to represent nonlinear, especially chaotic, data better than PCA. Using the chaotic Lorenz time series, we show that this is indeed the case. Furthermore, we show that SPCA can conveniently reduce measurement noise.
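The contrast drawn above is between SPCA and conventional PCA computed by linear SVD. The sketch below shows only that linear baseline, not SPCA: it embeds a scalar series in a delay (trajectory) matrix, keeps the leading singular components, and maps the low-rank matrix back to a series by averaging its anti-diagonals. The window length, the number of components, the noisy sine wave used as input, and the helper names are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def trajectory_matrix(x, window):
    """Delay embedding: column j holds x[j : j + window], so entry [i, j] equals x[i + j]."""
    k = len(x) - window + 1
    return np.column_stack([x[j:j + window] for j in range(k)])

def diagonal_average(mat):
    """Map a (window x K) matrix back to a series by averaging each anti-diagonal i + j."""
    window, k = mat.shape
    out = np.zeros(window + k - 1)
    counts = np.zeros(window + k - 1)
    for i in range(window):
        for j in range(k):
            out[i + j] += mat[i, j]
            counts[i + j] += 1
    return out / counts

def svd_pca_denoise(x, window, n_components):
    """Linear SVD-based PCA (the baseline, not SPCA): keep the leading singular triplets."""
    traj = trajectory_matrix(x, window)
    u, s, vt = np.linalg.svd(traj, full_matrices=False)
    low_rank = (u[:, :n_components] * s[:n_components]) @ vt[:n_components]
    return diagonal_average(low_rank)

# Hypothetical test signal: a sine wave (rank 2 once embedded) plus Gaussian noise.
rng = np.random.default_rng(0)
t = np.linspace(0, 8 * np.pi, 400)
clean = np.sin(t)
noisy = clean + 0.3 * rng.standard_normal(t.size)
denoised = svd_pca_denoise(noisy, window=40, n_components=2)

mse_noisy = np.mean((noisy - clean) ** 2)
mse_denoised = np.mean((denoised - clean) ** 2)
print(f"MSE noisy: {mse_noisy:.4f}, MSE denoised: {mse_denoised:.4f}")
```

A sine wave works well here precisely because it is linear in the embedding space; the abstract's argument is that chaotic signals such as the Lorenz series are where a linear SVD basis becomes less adequate.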


Author(s):  
Fayed Alshammri,
Jiazhu Pan

This paper proposes an extension of principal component analysis to non-stationary multivariate time series data. A criterion for determining the number of components to retain is proposed. An advance correlation matrix is developed to evaluate dynamic relationships among the chosen components. The theoretical properties of the proposed method are given. Extensive simulation experiments show that our approach performs well on both stationary and non-stationary data. Real data examples are also presented as illustrations. We develop four packages in the statistical software R that contain the functions needed to obtain and assess the results of the proposed method.


2016, Vol 75 (4), pp. 765-774
Author(s):  
Leonardo Plazas-Nossa,
Thomas Hofer,
Günter Gruber,
Andres Torres

This work proposes a methodology for forecasting online water quality data provided by UV-Vis spectrometry. To this end, principal component analysis (PCA) was used to reduce the dimensionality of the data set, combined with artificial neural networks (ANNs) for forecasting. The results were compared with those obtained using the discrete Fourier transform (DFT). The proposed methodology was applied to four absorbance time series data sets comprising a total of 5705 UV-Vis spectra. Absolute percentage errors obtained with the proposed PCA/ANN methodology vary between 10% and 13% for all four study sites. In general, the results were hardly generalizable, as they appeared to be highly dependent on the specific dynamics of each water system; however, some trends can be outlined. The PCA/ANN methodology gives better results than the PCA/DFT forecasting procedure when a specific spectral range is used under the following conditions: (i) for the Salitre wastewater treatment plant (WWTP) (first hour) and Graz West R05 (first 18 min), from the last part of the UV range through the entire visible range; (ii) for the Gibraltar pumping station (first 6 min), over the entire UV-Vis absorbance spectrum; and (iii) for the San Fernando WWTP (first 24 min), from the entire UV range to the middle part of the visible range.


2020, Vol 8 (5), pp. 4105-4110

Researchers are currently focusing on health care projects that predict diseases and their types. Beyond prediction, there is a need to find the parameters that directly influence disease prediction, and analyzing which parameters are needed for prediction remains a challenging issue. With this in view, we focus on predicting heart disease by boosting the parameters of the dataset. The heart disease dataset from the UCI Machine Learning Repository is used for implementation, and the Python code is implemented using Anaconda Navigator with the Spyder IDE. Our contribution is fivefold. First, the data are preprocessed and the relationships between attributes are identified through their correlation values. Second, the dataset is fitted to a random boost regressor and the important features are identified. Third, the dataset is feature-scaled and reduced, and then fitted to random forest, decision tree, Naïve Bayes, logistic regression, kernel support vector machine, and KNN classifiers. Fourth, the dataset is reduced with principal component analysis to five components and then fitted to the same classifiers. Fifth, the performance of the classifiers is analyzed with accuracy, recall, F-score, and precision. Experimental results show that the Naïve Bayes classifier is the most effective, with precision, recall, and F-score of 0.89 without random boosting, 0.88 with random boosting, and 0.90 with principal component analysis, and accuracy of 89% without random boosting, 90% with random boosting, and 91% with principal component analysis.
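The fourth step (feature scaling, reduction to five principal components, then a Naïve Bayes classifier scored by accuracy, precision, recall, and F-score) can be sketched as follows. This is a minimal NumPy-only illustration, not the authors' code: it assumes a Gaussian Naïve Bayes variant, uses synthetic data in place of the UCI heart disease dataset, and all function and class names are hypothetical. It does not reproduce the reported figures.

```python
import numpy as np

def standardize(train, test):
    """Feature scaling: z-score both splits with training-set statistics only."""
    mean, std = train.mean(axis=0), train.std(axis=0)
    std[std == 0] = 1.0  # guard against constant features
    return (train - mean) / std, (test - mean) / std

def pca_fit_transform(train, test, n_components):
    """Fit PCA on the training split via SVD and project both splits onto the top axes."""
    center = train.mean(axis=0)
    _, _, vt = np.linalg.svd(train - center, full_matrices=False)
    axes = vt[:n_components].T  # shape (n_features, n_components)
    return (train - center) @ axes, (test - center) @ axes

class GaussianNaiveBayes:
    """Naive Bayes with one independent Gaussian per feature and class (an assumed variant)."""

    def fit(self, x, y):
        self.classes = np.unique(y)
        self.priors = np.array([np.mean(y == c) for c in self.classes])
        self.means = np.array([x[y == c].mean(axis=0) for c in self.classes])
        self.variances = np.array([x[y == c].var(axis=0) + 1e-9 for c in self.classes])
        return self

    def predict(self, x):
        # log P(c) + sum_j log N(x_j | mean_cj, var_cj) for every (sample, class) pair
        diff = x[:, None, :] - self.means[None, :, :]
        log_lik = -0.5 * (np.log(2 * np.pi * self.variances)[None]
                          + diff ** 2 / self.variances[None]).sum(axis=2)
        return self.classes[np.argmax(np.log(self.priors) + log_lik, axis=1)]

def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F-score for a binary problem."""
    tp = np.sum((y_pred == positive) & (y_true == positive))
    fp = np.sum((y_pred == positive) & (y_true != positive))
    fn = np.sum((y_pred != positive) & (y_true == positive))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": np.mean(y_pred == y_true), "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical synthetic data standing in for the heart disease table.
rng = np.random.default_rng(42)
n, d = 600, 13
y = rng.integers(0, 2, size=n)
x = rng.standard_normal((n, d))
x[:, :4] += 1.5 * y[:, None]  # only the first four features carry class signal

idx = rng.permutation(n)
train_idx, test_idx = idx[:450], idx[450:]
x_tr, x_te = standardize(x[train_idx], x[test_idx])
z_tr, z_te = pca_fit_transform(x_tr, x_te, n_components=5)
model = GaussianNaiveBayes().fit(z_tr, y[train_idx])
scores = binary_metrics(y[test_idx], model.predict(z_te))
print(scores)
```

Fitting the scaler and the PCA axes on the training split only, then applying them to the test split, keeps test information out of the preprocessing.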


Author(s):  
Ahmad Ashifuddin Aqham,
Kristoko Dwi Hartomo

Telemarketing through promotional media is a marketing strategy that banks use to offer products, such as time deposits, to customers. Banks have difficulty knowing what obstacles customers face in deciding whether to make a deposit, which may eventually contribute to a financial crisis at the bank. Bank telemarketing must therefore target customers who have the potential to subscribe to one of the bank's products, namely deposits, based on existing customer data. To address this problem, this research uses data mining techniques, namely the Naïve Bayes algorithm and a genetic algorithm, to predict telemarketing customer decisions from public UCI Repository data, so that the bank offers its product to the right customers. The Naïve Bayes test achieved an accuracy of 86.71%, while cross-validation testing with the genetic algorithm produced a higher accuracy of 90.27%. The combined Naïve Bayes and genetic algorithm method thus predicts the time series data with an accuracy of 90.27%, so it can be concluded that using the Naïve Bayes and genetic algorithms can optimize the prediction of telemarketing client decisions for deposit offers.
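The abstract does not say exactly how the genetic algorithm is combined with Naïve Bayes. One common arrangement, assumed here, is wrapper feature selection: a genetic algorithm evolves binary feature masks, and each mask's fitness is the validation accuracy of a Naïve Bayes classifier trained on the selected features. The sketch uses synthetic data instead of the UCI bank telemarketing data, a Gaussian Naïve Bayes model, tournament selection, uniform crossover, bit-flip mutation, and elitism; these choices, the parameter values, and all names are illustrative assumptions.

```python
import numpy as np

def nb_fit(x, y):
    """Gaussian Naive Bayes: per-class priors, feature means, and feature variances."""
    classes = np.unique(y)
    priors = np.array([np.mean(y == c) for c in classes])
    means = np.array([x[y == c].mean(axis=0) for c in classes])
    variances = np.array([x[y == c].var(axis=0) + 1e-9 for c in classes])
    return classes, priors, means, variances

def nb_predict(model, x):
    classes, priors, means, variances = model
    diff = x[:, None, :] - means[None]
    log_lik = -0.5 * (np.log(2 * np.pi * variances)[None] + diff ** 2 / variances[None]).sum(axis=2)
    return classes[np.argmax(np.log(priors) + log_lik, axis=1)]

def fitness(mask, x_tr, y_tr, x_val, y_val):
    """Validation accuracy of Naive Bayes on the selected features (0 if none selected)."""
    if not mask.any():
        return 0.0
    model = nb_fit(x_tr[:, mask], y_tr)
    return float(np.mean(nb_predict(model, x_val[:, mask]) == y_val))

def ga_select(x_tr, y_tr, x_val, y_val, pop_size=20, generations=30, p_mut=0.05, seed=0):
    rng = np.random.default_rng(seed)
    d = x_tr.shape[1]
    pop = rng.random((pop_size, d)) < 0.5  # each individual is a boolean feature mask
    for _ in range(generations):
        scores = np.array([fitness(m, x_tr, y_tr, x_val, y_val) for m in pop])
        children = [pop[np.argmax(scores)].copy()]  # elitism: keep the best mask unchanged
        while len(children) < pop_size:
            # Tournament selection: the fitter of two random individuals becomes a parent.
            a, b = rng.integers(pop_size, size=2), rng.integers(pop_size, size=2)
            p1, p2 = pop[a[np.argmax(scores[a])]], pop[b[np.argmax(scores[b])]]
            child = np.where(rng.random(d) < 0.5, p1, p2)  # uniform crossover
            child ^= rng.random(d) < p_mut                 # bit-flip mutation
            children.append(child)
        pop = np.array(children)
    scores = np.array([fitness(m, x_tr, y_tr, x_val, y_val) for m in pop])
    return pop[np.argmax(scores)], float(scores.max())

# Hypothetical synthetic data: features 0-2 are informative, the rest are noise.
rng = np.random.default_rng(1)
n, d = 400, 12
y = rng.integers(0, 2, size=n)
x = rng.standard_normal((n, d))
x[:, :3] += 1.2 * y[:, None]
x_tr, y_tr, x_val, y_val = x[:300], y[:300], x[300:], y[300:]

best_mask, best_acc = ga_select(x_tr, y_tr, x_val, y_val)
all_acc = fitness(np.ones(d, dtype=bool), x_tr, y_tr, x_val, y_val)
print("selected features:", np.flatnonzero(best_mask))
print(f"validation accuracy: all features {all_acc:.3f}, GA subset {best_acc:.3f}")
```

Because the same validation split both guides the search and scores the result, `best_acc` is optimistic; evaluating the selected mask on a separate test split, or with cross-validation as the abstract mentions, gives a fairer estimate.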

