variable subset
Recently Published Documents


TOTAL DOCUMENTS: 29 (FIVE YEARS: 5)
H-INDEX: 9 (FIVE YEARS: 1)

Author(s): Pablo Roman Duchowicz, Silvina Fioressi, Gustavo Romanelli, Daniel E. Bacelo

This work applied quantitative structure-activity relationship (QSAR) theory to predict the inhibitory activity of 40 unsymmetrical aromatic disulfide compounds against the SARS-CoV main protease. Several freely available molecular descriptor programs provided 67,116 independent non-conformational molecular descriptors. This large pool of descriptors, which encodes multidimensional representations of the chemical structure, was analyzed through multivariable linear regression and the replacement method variable subset selection technique. The resulting QSAR model achieved acceptable statistical quality and provided a prospective guide that was considered useful for predicting the inhibitory activity of structurally related aromatic disulfide compounds against the SARS-CoV main protease.
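
The replacement method (RM) named above searches for a d-descriptor MLR model with a low standard deviation S by swapping one descriptor at a time for whichever pool descriptor most lowers S, until a full sweep brings no improvement. The Python sketch below is a simplified illustration of that idea, not the authors' implementation: it uses a single random starting subset, the helper names `model_sd` and `replacement_method` are hypothetical, and the descriptor matrix is synthetic.

```python
import numpy as np

def model_sd(X, y, cols):
    """Standard deviation S of an MLR model (with intercept) on the given columns."""
    A = np.column_stack([np.ones(len(y)), X[:, cols]])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = float(np.sum((y - A @ coef) ** 2))
    return float(np.sqrt(rss / (len(y) - len(cols) - 1)))

def replacement_method(X, y, d, seed=0, max_sweeps=50):
    """Simplified replacement-method search for a d-descriptor MLR subset (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    subset = [int(j) for j in rng.choice(p, size=d, replace=False)]
    best_s = model_sd(X, y, subset)
    for _ in range(max_sweeps):
        changed = False
        for pos in range(d):
            # Swap the descriptor at this position for any pool descriptor that lowers S.
            for j in range(p):
                if j in subset:
                    continue
                trial = subset[:pos] + [j] + subset[pos + 1:]
                s = model_sd(X, y, trial)
                if s < best_s - 1e-12:
                    subset, best_s, changed = trial, s, True
        if not changed:  # a full sweep without improvement: converged
            break
    return sorted(subset), best_s

# Synthetic toy data: 60 compounds, 10 candidate descriptors; y depends on columns 2 and 5 only.
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 10))
y = 1.0 + 3.0 * X[:, 2] - 2.0 * X[:, 5]
subset, s = replacement_method(X, y, d=2)
print(subset, s)
```

In practice one would restart from several random subsets and compare the resulting models; the sketch keeps a single start for brevity.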


Symmetry, 2020, Vol 12 (1), pp. 115
Author(s): Hong Ji, Wanzhang Wang, Dongfeng Chong, Boyang Zhang

To rapidly detect wheat moisture content (WMC) before harvest without damaging the wheat, this paper measured wheat and panicle moisture content (PMC) and the corresponding spectral reflectance of panicles before harvest at the Beijing Tongzhou experimental station of China Agricultural University. First, correlation analysis was used to determine the optimal regression model between WMC and PMC. Second, the spectrally sensitive bands of PMC were identified, and redundant variables were then filtered out with competitive adaptive reweighted sampling (CARS) to select the variable subset with the least error. Finally, partial least squares regression (PLSR) was used to build and analyze the PMC prediction model. At the early stage of wheat harvest, WMC and PMC were highly correlated. Among the regression models tested (exponential, univariate linear, polynomial, power function, and logarithmic), the logarithmic regression model performed best. For the modeling sample, the coefficient of determination was R2 = 0.9284 with significance F = 362.957; for the calibration sample, R2v = 0.987, the root mean square error RMSEv = 3.859, and the relative error REv = 7.532. Within the 350–2500 nm range, the bands at 728–907 nm, 1407–1809 nm, and 1940–2459 nm had correlation coefficients between PMC and reflectance higher than 0.6. The CARS algorithm was used to optimize the variables, yielding a best variable subset of 30 wavelength variables, on which the PLSR model was established. Compared with using the full sensitive band of 1103 variables, this PLSR model not only reduced the number of variables by 1073 but also predicted more accurately: RMSEC = 0.9301, R2c = 0.995, RMSEP = 2.676, R2p = 0.945, and RPD = 3.362. These results indicate that the CARS algorithm can effectively remove variables that carry redundant spectral information.
The CARS algorithm offers a new approach to the non-destructive, rapid detection of WMC before harvest.
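
CARS shrinks the wavelength set over a series of Monte Carlo runs: at run i it keeps a fraction r_i = a·exp(-k·i) of the variables (chosen so that r_1 = 1 and r_N = 2/p), ranked by the magnitude of their PLS regression coefficients, and finally picks the retained subset with the lowest RMSECV. The Python sketch below illustrates that loop on synthetic data and is a simplification, not the paper's code: it keeps the top-ranked variables deterministically instead of performing adaptive reweighted sampling, and the names `pls1_coef`, `rmsecv` and `cars_like`, the 80% sampling ratio, 2 PLS components and 5 contiguous folds are all assumptions.

```python
import numpy as np

def pls1_coef(X, y, n_comp):
    """PLS1 regression coefficients via NIPALS on mean-centred X and y."""
    Xk = X - X.mean(axis=0)
    yk = y - y.mean()
    W, P, Q = [], [], []
    for _ in range(n_comp):
        w = Xk.T @ yk
        w = w / np.linalg.norm(w)      # weight vector
        t = Xk @ w                     # scores
        tt = t @ t
        p_load = Xk.T @ t / tt         # X loadings
        q = (yk @ t) / tt              # y loading
        Xk = Xk - np.outer(t, p_load)  # deflate X
        yk = yk - q * t                # deflate y
        W.append(w)
        P.append(p_load)
        Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    return W @ np.linalg.solve(P.T @ W, Q)

def rmsecv(X, y, n_comp, n_folds=5):
    """k-fold cross-validated RMSE of a PLS1 model (contiguous folds)."""
    idx = np.arange(len(y))
    sq = 0.0
    for test in np.array_split(idx, n_folds):
        train = np.setdiff1d(idx, test)
        b = pls1_coef(X[train], y[train], n_comp)
        pred = (X[test] - X[train].mean(axis=0)) @ b + y[train].mean()
        sq += float(np.sum((y[test] - pred) ** 2))
    return np.sqrt(sq / len(y))

def cars_like(X, y, n_runs=30, n_comp=2, frac=0.8, seed=0):
    """Simplified CARS-style shrinkage (hypothetical helper, no adaptive reweighted sampling)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    # Exponentially decreasing function: r_1 = 1 and r_N = 2/p.
    a = (p / 2) ** (1 / (n_runs - 1))
    k = np.log(p / 2) / (n_runs - 1)
    retained = np.arange(p)
    subsets = []
    for i in range(1, n_runs + 1):
        rows = rng.choice(n, size=int(frac * n), replace=False)  # Monte Carlo sample of rows
        b = pls1_coef(X[np.ix_(rows, retained)], y[rows], min(n_comp, len(retained)))
        keep = max(2, int(round(a * np.exp(-k * i) * p)))       # EDF: variables surviving run i
        # Deterministic simplification: keep the variables with the largest |coefficient|.
        retained = np.sort(retained[np.argsort(-np.abs(b))[:keep]])
        subsets.append(retained)
    errors = [rmsecv(X[:, s], y, min(n_comp, len(s))) for s in subsets]
    return subsets, subsets[int(np.argmin(errors))]

# Synthetic "spectra": 80 samples x 20 variables; only variables 3 and 7 carry signal.
rng = np.random.default_rng(1)
X = rng.normal(size=(80, 20))
y = 3.0 * X[:, 3] - 2.0 * X[:, 7] + 0.1 * rng.normal(size=80)
subsets, best = cars_like(X, y)
print([len(s) for s in subsets], best)
```

The retained-set sizes fall from all 20 variables to 2 along the exponential schedule; the original algorithm adds a random, coefficient-weighted sampling step at each run, which this sketch omits.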


2020, Vol 93 (3)
Author(s): Marc Hofmann, Cristian Gatu, Erricos J. Kontoghiorghes, Ana Colubi, Achim Zeileis

Molecules, 2019, Vol 24 (11), pp. 2134
Author(s): Hui Jiang, Quansheng Chen

This work applied FT-NIR spectroscopy, aided by chemometric algorithms, to determine the adulteration content of extra virgin olive oil (EVOO). Informative spectral wavenumbers were obtained with a novel variable selection algorithm, bootstrapping soft shrinkage (BOSS), during partial least-squares (PLS) modeling. A PLS model was then constructed on the best variable subset obtained by BOSS to quantitatively determine doping concentrations in EVOO. The optimal variable subset, comprising 15 wavenumbers, was selected by BOSS from the full-spectrum region at the first local minimum of the root-mean-square error of cross-validation (RMSECV), which was 1.4487% v/v. Compared with the optimal models of full-spectrum PLS, competitive adaptive reweighted sampling PLS (CARS–PLS), Monte Carlo uninformative variable elimination PLS (MCUVE–PLS), and iteratively retaining informative variables PLS (IRIV–PLS), the BOSS–PLS model achieved better results, with a coefficient of determination for prediction (R2) of 0.9922 and a root-mean-square error of prediction (RMSEP) of 1.4889% v/v. These results indicate that FT-NIR spectroscopy has the potential to rapidly and quantitatively analyze the adulteration content of EVOO, and that the BOSS algorithm is superior for selecting informative wavenumbers.
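
BOSS alternates weighted bootstrap sampling of variables with a "soft shrinkage" of their sampling weights, so variables that receive small regression coefficients in the best submodels are sampled less often and eventually drop out. The Python sketch below illustrates that idea under loudly stated simplifications: it swaps in ordinary least squares submodels scored on a single hold-out split for the PLS submodels and RMSECV used in the paper, returns the subset with the globally lowest hold-out error rather than the first local minimum, and the names `holdout_fit` and `boss_like`, the 300 bootstrap draws and the best-10% cutoff are assumptions.

```python
import numpy as np

def holdout_fit(X, y, cols, n_train):
    """OLS (with intercept) on the first n_train rows; return hold-out RMSE and |coefficients|."""
    A = np.column_stack([np.ones(len(y)), X[:, cols]])
    coef, *_ = np.linalg.lstsq(A[:n_train], y[:n_train], rcond=None)
    resid = y[n_train:] - A[n_train:] @ coef
    return float(np.sqrt(np.mean(resid ** 2))), np.abs(coef[1:])

def boss_like(X, y, n_boot=300, top_frac=0.1, max_iter=30, seed=0):
    """Simplified BOSS-style search (hypothetical helper, OLS instead of PLS)."""
    rng = np.random.default_rng(seed)
    n, n_vars = X.shape
    n_train = int(0.7 * n)
    w = np.full(n_vars, 1.0 / n_vars)  # equal initial sampling weights
    path = []                          # (retained variables, hold-out RMSE) per iteration
    for _ in range(max_iter):
        # Weighted bootstrap sampling: the unique drawn indices define one submodel.
        subsets = [np.unique(rng.choice(n_vars, size=n_vars, p=w)) for _ in range(n_boot)]
        fits = [holdout_fit(X, y, s, n_train) for s in subsets]
        best = np.argsort([rmse for rmse, _ in fits])[: max(1, int(top_frac * n_boot))]
        # Soft shrinkage: new weights accumulate normalized |coef| from the best submodels,
        # so variables absent from all of them fall to zero weight.
        new_w = np.zeros(n_vars)
        for b in best:
            c = fits[b][1]
            new_w[subsets[b]] += c / c.sum()
        w = new_w / new_w.sum()
        retained = np.flatnonzero(w)
        path.append((retained, holdout_fit(X, y, retained, n_train)[0]))
        if len(retained) == 1:
            break
    return min(path, key=lambda item: item[1])

# Synthetic data: 80 samples x 20 wavenumbers; only variables 3 and 7 are informative.
rng = np.random.default_rng(1)
X = rng.normal(size=(80, 20))
y = 3.0 * X[:, 3] - 2.0 * X[:, 7] + 0.1 * rng.normal(size=80)
retained, rmse = boss_like(X, y)
print(retained, round(rmse, 3))
```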


2018
Author(s): Jo Cutler, Joaquim Radua, Daniel Campbell-Meiklejohn

Meta-analyses of fMRI studies are vital to establish consistent findings across the literature. However, fMRI data are susceptible to signal dropout (i.e. incomplete brain coverage), which varies across studies and brain regions. In other words, for some brain regions, only a variable subset of the studies included in an fMRI meta-analysis has data present. These missing data can cause activations in fMRI meta-analyses to be underestimated (type II errors). Here we present SPM (MATLAB) code to run a novel method of adjusting random-effects models for meta-analytic averaging of a group of studies and mixed-effects models for comparison between two groups of studies. In two separate datasets, meta-analytic effect sizes and z-scores were larger in the adjusted analysis than in the unadjusted analysis. Notably, these changes occurred in regions, such as the ventromedial prefrontal cortex, where coverage was lowest. Limitations of the method, including how to threshold the adjusted maps, are discussed. Code and demonstration data for the adjusted method are available at https://doi.org/10.25377/sussex.c.4223411.


2018, Vol 72 (5), pp. 740-749
Author(s): Rongnian Tang, Xupeng Chen, Chuang Li

Near-infrared spectroscopy is an efficient, low-cost technology with potential as an accurate method for detecting the nitrogen content of natural rubber leaves. The successive projections algorithm (SPA) is a widely used variable selection method for multivariate calibration that uses projection operations to select a variable subset with minimum multicollinearity. However, because the correlation between variables fluctuates, high collinearity may still exist among non-adjacent variables of the subset obtained by basic SPA. Based on an analysis of the correlation matrix of the spectral data, this paper proposes a correlation-based SPA (CB-SPA) that applies the successive projections algorithm within regions of consistent correlation. The results show that CB-SPA selects variable subsets with more valuable variables and less multicollinearity. Meanwhile, models built on the CB-SPA subset outperform those built on basic SPA subsets in predicting nitrogen content, in both cross-validation and external prediction. Moreover, CB-SPA is more efficient: the time cost of its selection procedure is one-twelfth that of basic SPA.
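
Basic SPA starts from one wavelength and, at each step, projects every column onto the orthogonal complement of the most recently selected column, then picks the column with the largest remaining norm, i.e. the one least collinear with what has already been chosen. The Python sketch below shows only that basic procedure; the region-wise restriction that defines CB-SPA is not implemented, and the function name `spa`, the starting column and the toy matrix are assumptions.

```python
import numpy as np

def spa(X, start, n_select):
    """Basic successive projections algorithm (hypothetical helper): low-collinearity column picks."""
    Xp = np.asarray(X, dtype=float).copy()
    selected = [start]
    for _ in range(n_select - 1):
        v = Xp[:, selected[-1]].copy()
        # Project every column onto the orthogonal complement of the last selected column.
        Xp = Xp - np.outer(v, v @ Xp) / (v @ v)
        norms = np.linalg.norm(Xp, axis=0)
        norms[selected] = -np.inf  # never pick a column twice
        selected.append(int(np.argmax(norms)))
    return selected

# Toy matrix: column 1 is nearly a copy of column 0, so SPA should avoid it after picking 0.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 6))
X[:, 1] = X[:, 0] + 1e-3 * rng.normal(size=40)
sel = spa(X, start=0, n_select=3)
print(sel)
```

Because column 1 retains almost no norm once column 0 has been projected out, it is never selected, which is the collinearity-avoiding behaviour SPA is used for.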


Author(s): Cristian Rojas, Piercosimo Tripaldi, Pablo R. Duchowicz

The aim of this work was to develop predictive quantitative structure-property relationships (QSPR) for natural and synthetic sweeteners in order to model and predict relative sweetness (RS). The data set comprised 233 sweeteners collected from diverse literature sources and was divided into training (163) and test (70) molecules by a procedure based on k-means cluster analysis. A total of 3763 non-conformational Dragon molecular descriptors were calculated and simultaneously analyzed through multivariable linear regression coupled with the replacement method variable subset selection technique. The resulting six-parameter model was validated through cross-validation, together with Y-randomization and applicability domain analysis. The results for the training and test sets showed that non-conformational descriptors offer relevant information for modeling the RS of a compound. Thus, this model can be used to predict the sweetness of both un-evaluated and un-synthesized sweeteners.
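
Y-randomization, one of the validation checks mentioned above, refits the same descriptor set against randomly permuted responses: if the real model is not a chance correlation, the shuffled models should score far lower. The Python sketch below illustrates the check for an MLR model; the synthetic six-descriptor matrix with 163 rows (echoing the model and training-set sizes above), the noise level, the 100 rounds and the helper names `mlr_r2` and `y_randomization` are assumptions, not the authors' data or code.

```python
import numpy as np

def mlr_r2(X, y):
    """R^2 of an ordinary least-squares MLR fit with intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(1.0 - resid @ resid / np.sum((y - y.mean()) ** 2))

def y_randomization(X, y, n_rounds=100, seed=0):
    """Return the real R^2 and the R^2 values of models refit on shuffled responses."""
    rng = np.random.default_rng(seed)
    r2_real = mlr_r2(X, y)
    r2_rand = np.array([mlr_r2(X, rng.permutation(y)) for _ in range(n_rounds)])
    return r2_real, r2_rand

# Synthetic data: 163 "training compounds" described by 6 descriptors.
rng = np.random.default_rng(0)
X = rng.normal(size=(163, 6))
beta = np.array([1.0, -0.5, 0.8, 0.3, -1.2, 0.6])
y = X @ beta + 0.3 * rng.normal(size=163)
r2_real, r2_rand = y_randomization(X, y)
print(round(r2_real, 3), round(float(r2_rand.max()), 3))
```

A large gap between the real R^2 and the best shuffled R^2 is the pattern expected from a model that is not a chance correlation; a shuffled R^2 close to the real one would be a warning sign.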

