Enhancing Web Data Mining

Author(s):  
Abhishek Taneja

The enormous growth of databases in almost every area of human endeavor, particularly through the web, has created great demand for new, powerful tools for turning data into useful, task-oriented knowledge. The aim of this study is to examine the predictive ability of factor analysis, a web mining technique, so as to avoid voting, averaging, stacked generalization, and meta-learning, and thus save much of the time spent choosing the right technique for the underlying dataset. This chapter compares three factor-based techniques, namely principal component regression (PCR), generalized least squares (GLS) regression, and the maximum likelihood regression (MLR) method, and explores their predictive ability on both a theoretical and an experimental basis. All three factor-based techniques are compared using the necessary conditions for forecasting, such as R-square, adjusted R-square, the F-test, and the Jarque-Bera (JB) test of normality. This study can be further explored and enhanced using sufficient conditions for forecasting, such as Theil's inequality coefficient (TIC) and the Janus quotient (JQ).
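Some of the necessary conditions named in this abstract can be sketched in a few lines. The example below is a minimal illustration, not the chapter's own procedure: the function names are hypothetical, the formulas are the standard textbook definitions of R-square, adjusted R-square, and the Jarque-Bera statistic, and the F-test is omitted.

```python
def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n, p):
    """Adjusted R-square for n observations and p predictors."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def jarque_bera(residuals):
    """Jarque-Bera statistic: n/6 * (S^2 + (K - 3)^2 / 4).

    S is the sample skewness and K the sample kurtosis, both
    computed from central moments that divide by n.
    """
    n = len(residuals)
    mean_r = sum(residuals) / n
    m2 = sum((r - mean_r) ** 2 for r in residuals) / n
    m3 = sum((r - mean_r) ** 3 for r in residuals) / n
    m4 = sum((r - mean_r) ** 4 for r in residuals) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    return n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
```

A large JB value suggests that the residuals depart from normality. Under the null hypothesis the statistic is approximately chi-square distributed with two degrees of freedom, so it is usually compared against that distribution.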

2019 ◽  
Vol 8 (1) ◽  
Author(s):  
Khairunnisa Khairunnisa ◽  
Rizka Pitri ◽  
Victor P Butar-Butar ◽  
Agus M Soleh

This research used CFSRv2 data as the output of a general circulation model. CFSRv2 contains highly correlated variables, so this research uses principal component regression (PCR) and partial least squares (PLS) to address the multicollinearity in the CFSRv2 data. The research aims to determine the better model between PCR and PLS for estimating rainfall at the Bandung geophysical station, Bogor climatology station, Citeko meteorological station, and Jatiwangi meteorological station by comparing RMSEP and correlation values. The domain sizes used were 3×3, 4×4, 5×5, 6×6, 7×7, 8×8, 9×9, and 11×11, located between (−4°) N – (−9°) S and 105° E – 110° E, with a grid size of 0.5°×0.5°. PLS was the better model for statistical downscaling in this research because it obtained the lower RMSEP value and the higher correlation value. The best domain and RMSEP value for the Bandung geophysical station, Bogor climatology station, Citeko meteorological station, and Jatiwangi meteorological station are 9 × 9 with 100.06, 6 × 6 with 194.3, 8 × 8 with 117.6, and 6 × 6 with 108.2, respectively.
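The selection rule described in this abstract (prefer the model with the lower RMSEP and the higher correlation) can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, and breaking RMSEP ties by correlation is an assumption rather than something the abstract specifies.

```python
import math


def rmsep(observed, predicted):
    """Root mean square error of prediction."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def pick_better_model(observed, predictions):
    """Return the name of the model with the lowest RMSEP.

    `predictions` maps a model name to its predicted values.
    Ties on RMSEP are broken by the higher correlation
    (an assumption made for this sketch).
    """
    return min(
        predictions,
        key=lambda name: (
            rmsep(observed, predictions[name]),
            -pearson_r(observed, predictions[name]),
        ),
    )
```

In practice the predictions would come from PCR and PLS models fitted on a calibration period and evaluated on held-out rainfall observations.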


Author(s):  
Qiang Zhao ◽  
Jianguo Sun

Statistical analysis of microarray gene expression data has recently attracted a great deal of attention. One problem of interest is to relate genes to survival outcomes of patients with the purpose of building regression models for the prediction of future patients' survival based on their gene expression data. For this, several authors have discussed the use of the proportional hazards or Cox model after reducing the dimension of the gene expression data. This paper presents a new approach to conduct the Cox survival analysis of microarray gene expression data with the focus on models' predictive ability. The method modifies the correlation principal component regression (Sun, 1995) to handle the censoring problem of survival data. The results based on simulated data and a set of publicly available data on diffuse large B-cell lymphoma show that the proposed method works well in terms of models' robustness and predictive ability in comparison with some existing partial least squares approaches. Also, the new approach is simpler and easier to implement.


2020 ◽  
Vol 88 (3) ◽  
pp. 35
Author(s):  
Endjang Prebawa Tejamukti ◽  
Widiastuti Setyaningsih ◽  
Irnawati ◽  
Budiman Yasir ◽  
Gemini Alam ◽  
...  

Mangosteen, or Garcinia mangostana L., has emerged as a fruit of research interest due to its active compounds, especially xanthone derivatives such as α-mangostin (AM), γ-mangostin (GM), and gartanin (GT). These compounds have been reported to exert pharmacological activities, such as antioxidant and anti-inflammatory effects; therefore, an analytical method capable of quantifying these compounds should be developed. The aim of this study was to determine the correlation between FTIR spectra and HPLC chromatograms, combined with chemometrics, for quantitative analysis of the ethanolic extract of mangosteen. The ethanolic extract of mangosteen pericarp was prepared using the maceration technique, and the obtained extract was measured using an FTIR spectrophotometer at wavenumbers of 4000–650 cm−1 and HPLC with a PDA detector at 281 nm. The data acquired were subjected to chemometric analysis by partial least squares (PLS) and principal component regression (PCR). The results showed that the wavenumber region of 3700–2700 cm−1 offered a reliable method for quantitative analysis of GM, with a coefficient of determination (R2) of 0.9573 in the calibration model and 0.8134 in the validation model, along with an RMSEC value of 0.0487% and an RMSEP value of 0.120%. FTIR spectra using the second derivative at wavenumbers of 3700–663 cm−1, with R2 > 0.99 in the calibration and validation models along with the lowest RMSEC and RMSEP values, were used for quantitative analysis of GT and AM. It can be concluded that FTIR spectra combined with multivariate calibration are accurate and precise for the analysis of xanthones.


Author(s):  
ANGGITA ROSIANA PUTRI ◽  
ABDUL ROHMAN ◽  
SUGENG RIYANTO

Objective: The aims of this research were to analyse the fatty acids contained in Patin (Pangasius micronemus) and Gabus (Channa striata) fish oils and to authenticate the oils using FTIR spectroscopy combined with chemometrics. Methods: Patin fish oil (PFO) was extracted from patin flesh using the maceration method with petroleum benzene as the solvent, while gabus fish oil (GFO) was purchased from the market in Yogyakarta. The fatty acid analysis was done using gas chromatography with a flame ionization detector (GC-FID). Authentication was performed using an FTIR spectrophotometer and chemometric methods. Principal component analysis (PCA) was used to determine the proximity of the oils based on the similarity of their characteristics. The quantification of adulterated PFO was performed using the multivariate calibrations partial least squares (PLS) and principal component regression (PCR). Discriminant analysis (DA) was used to classify authentic and adulterated oils. Results: The levels of saturated and polyunsaturated fatty acids in PFO are higher than in GFO. The PLS and PCR methods using the second derivative spectra at wavenumbers of 666–3050 cm-1 offered the highest coefficient of determination (R2) and the lowest root mean square error of calibration (RMSEC) and root mean square error of prediction (RMSEP). Conclusion: The PCA method was successfully used to determine the proximity of the oils. Among the oils studied, PFO has a fatty acid composition similar to that of GFO. The DA method was able to distinguish pure PFO from adulterated PFO without any misclassification reported. FTIR spectroscopy combined with chemometrics can be used for authentication and quantification.


2018 ◽  
Vol 2018 ◽  
pp. 1-7 ◽  
Author(s):  
Edwin García-Miguel ◽  
Ofelia Gabriela Meza-Márquez ◽  
Guillermo Osorio-Revilla ◽  
Darío Iker Téllez-Medina ◽  
Cristian Jiménez-Martínez ◽  
...  

Chemometric methods using mid-FTIR spectroscopy were developed in order to reduce the time of study of melamine and cyanuric acid in infant formulas. Chemometric models were constructed using the algorithms Partial Least Squares (PLS1, PLS2) and Principal Component Regression (PCR) in order to correlate the IR signal with the levels of melamine or cyanuric acid in the infant formula samples. Results showed that the best correlations were obtained using PLS1 (R2: 0.9998, SEC: 0.0793, and SEP: 0.5545 for melamine and R2: 0.9997, SEC: 0.1074, and SEP: 0.5021 for cyanuric acid). Also, the SIMCA model was studied to distinguish between adulterated formulas and nonadulterated samples, giving optimum discrimination and good interclass distances between samples. Results showed that chemometric models demonstrated a good predictive ability of melamine and cyanuric acid concentrations in infant formulas, showing that this is a rapid and accurate technique to be used in the identification and quantification of these adulterants in infant formulas.


2003 ◽  
Vol 11 (1) ◽  
pp. 55-70 ◽  
Author(s):  
Laila Stordrange ◽  
Olav M. Kvalheim ◽  
Per A. Hassel ◽  
Dick Malthe-Sørenssen ◽  
Fred Olav Libnau

Partial least squares (PLS) is a powerful tool for multivariate linear regression. But what if the data show a non-linear structure? Near infrared spectra from a pharmaceutical process were used as a case study. An ANOVA test revealed that the data are well described by a 2nd order polynomial. This work investigates the application of regression techniques that account for slightly non-linear data. The regression techniques investigated are: linearising data by applying transformations, local PLS, i.e. splitting of data, and quadratic PLS. These models were compared with ordinary PLS and principal component regression (PCR). The predictive ability of the models was tested on an independent data set acquired a year later. Using the knowledge of non-linear pattern and important spectral regions, simpler models with better predictive ability can be obtained.


Alotrop ◽  
2019 ◽  
Vol 3 (1) ◽  
Author(s):  
Angga Aprian Dinata ◽  
M. Lutfi Firdaus ◽  
Rina Elvia

The digital image method in quantitative analysis usually uses only one of the RGB primary color components (red, green, blue), so not all of the digital image data are extracted. A method is therefore needed that can use all of the RGB values as variables in quantitative analysis; such methods are known as chemometrics. This research aims to determine the influence of applying chemometrics on the sensitivity of the digital image method. The chemometric methods used are principal component regression (PCR) and partial least squares (PLS), using Unscrambler X software from Camo Software, USA. The methods are applied to the quantitative analysis of mercury(II) ion with silver nanoparticles (NPP) immobilized on a filter paper indicator. The results showed that chemometrics improves the limit of detection (LOD) of the digital image method: the LOD is 0.4311 ppb with PCR and 0.4310 ppb with PLS, both smaller than the 0.837 ppb obtained without chemometrics using single linear regression (SLR).


Plants ◽  
2020 ◽  
Vol 9 (12) ◽  
pp. 1775
Author(s):  
Amol N. Nankar ◽  
M Paul Scott ◽  
Richard C. Pratt

One aim of this experiment was to develop NIR calibrations for 20 grain components in 143 pigmented maize samples evaluated in four locations across New Mexico during 2013 and 2014. Based on reference analysis, prediction models were developed using principal component regression (PCR) and partial least squares (PLS). The predictive ability of calibrations was generally low, with the calibrations for methionine and glycine performing best by PCR and PLS. The second aim was to explore the relationships among grain constituents. In PCA, the first three PCs explained 49.62, 22.20, and 6.92% of the total variance and tended to align with nitrogen-containing compounds (amino acids), carbon-rich compounds (starch, anthocyanin, fiber, and fat), and sulfur-containing compounds (cysteine and methionine), respectively. Correlations among traits were identified, and these relationships were illustrated by a correlation network. Some relationships among components were driven by common synthetic origins, for example, among amino acids derived from pyruvate. Similarly, anthocyanins, crude fat, and fatty acids all share malonyl CoA in their biosynthetic pathways and were correlated. In contrast, crude fiber and starch have similar biosynthetic origins but were negatively correlated, and this may have been due to their different functional roles in structure and energy storage, respectively.


Food Research ◽  
2020 ◽  
Vol 4 (5) ◽  
pp. 1758-1766
Author(s):  
A.R. Putri ◽  
A. Rohman ◽  
W. Setyaningsih ◽  
S. Riyanto

Simple, rapid, and reproducible methods for determining the acid value (AV), peroxide value (PV), and saponification value (SV) of patin fish oil (PFO) were developed using Fourier transform infrared (FTIR) spectroscopy combined with the chemometrics of principal component regression (PCR) and partial least squares (PLS). The actual values were determined using the AOCS method, the predicted values were determined with FTIR spectroscopy and chemometrics, and the relationship between the two was evaluated. In validation, the coefficient of determination (R2) reached values above 0.99. This study concluded that FTIR spectra combined with PCR and PLS can be used to determine the AV, PV, and SV of PFO.


2018 ◽  
Vol 101 (2) ◽  
pp. 394-400 ◽  
Author(s):  
Khalid A M Attia ◽  
Nasr M El-Abasawi ◽  
Ahmed El-Olemy ◽  
Ahmed H Abdelazim

Three UV spectrophotometric methods have been developed for the simultaneous determination of two new Food and Drug Administration-approved drugs, elbasvir (EBV) and grazoprevir (GRV), in their combined pharmaceutical dosage form. These methods include dual wavelength (DW), classical least squares (CLS), and principal component regression (PCR). For the DW method, two wavelengths were chosen for each drug such that the difference in absorbance of the other drug between them was zero. GRV showed equal absorbance at 351 and 315 nm, so the difference in absorbance at these wavelengths was measured for the determination of EBV. In the same way, the difference in absorbance at 375 and 334.5 nm was measured for the determination of GRV. Alternatively, the CLS and PCR models were applied to the spectral analysis because the simultaneous inclusion of many wavelengths, rather than a single wavelength, greatly increased the precision and predictive ability of the methods. The proposed methods were successfully applied to the assay of these drugs in their pharmaceutical formulation. The obtained results were statistically compared with manufacturing methods. The results show that there was no significant difference between the proposed methods and the manufacturing method with respect to accuracy and precision.
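The dual wavelength principle in this abstract can be illustrated with a small numerical sketch. It assumes additive Beer-Lambert absorbance with unit path length; the absorptivity values and function names below are hypothetical and are not taken from the paper.

```python
# Hypothetical absorptivities at the two chosen wavelengths (not from the paper).
# The interferent is chosen so that it absorbs equally at both wavelengths.
EPS_ANALYTE = (0.050, 0.020)
EPS_INTERFERENT = (0.030, 0.030)


def mixture_absorbance(conc_analyte, conc_interferent, eps_analyte, eps_interferent):
    """Additive Beer-Lambert absorbance of a two-component mixture (path length 1)."""
    return eps_analyte * conc_analyte + eps_interferent * conc_interferent


def dual_wavelength_estimate(abs_l1, abs_l2, eps_analyte_l1, eps_analyte_l2):
    """Estimate the analyte concentration from the absorbance difference.

    Because the interferent contributes equally at both wavelengths,
    its term cancels in abs_l1 - abs_l2.
    """
    return (abs_l1 - abs_l2) / (eps_analyte_l1 - eps_analyte_l2)


def estimate_for(conc_analyte, conc_interferent):
    """Simulate both absorbance readings and recover the analyte concentration."""
    a1 = mixture_absorbance(conc_analyte, conc_interferent, EPS_ANALYTE[0], EPS_INTERFERENT[0])
    a2 = mixture_absorbance(conc_analyte, conc_interferent, EPS_ANALYTE[1], EPS_INTERFERENT[1])
    return dual_wavelength_estimate(a1, a2, EPS_ANALYTE[0], EPS_ANALYTE[1])
```

Under these assumptions the estimate does not depend on the interferent concentration, which is why each drug can be determined in the presence of the other.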

