A New Mixed Estimator in Nonparametric Regression for Longitudinal Data

2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Made Ayu Dwi Octavanny ◽  
I Nyoman Budiantara ◽  
Heri Kuswanto ◽  
Dyah Putri Rahmawati

We introduce a new method for estimating the nonparametric regression curve for longitudinal data. The method combines two estimators: a truncated spline and a Fourier series. Estimation proceeds by minimizing a penalized weighted least squares criterion followed by a weighted least squares criterion. We also derive the properties of the new mixed estimator, which is biased but linear in the observations. The best model is selected as the one with the smallest generalized cross-validation (GCV) value. The performance of the new method is demonstrated by a simulation study with a variety of time points, and the proposed approach is then applied to a stroke-patient dataset. The simulated data and the real data yield consistent findings.
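The mixed basis described above can be sketched numerically. The snippet below is a minimal illustration, not the authors' exact estimator: it assumes a linear truncated spline with three arbitrary knots, two Fourier harmonics, and a simple ridge-type penalty standing in for the paper's roughness penalty.

```python
import numpy as np

def mixed_design(t, x, knots, K):
    """Design matrix mixing a linear truncated spline in t with
    Fourier terms in x (illustrative basis choices)."""
    spline = [np.ones_like(t), t] + [np.maximum(t - k, 0.0) for k in knots]
    fourier = []
    for j in range(1, K + 1):
        fourier += [np.cos(j * x), np.sin(j * x)]
    return np.column_stack(spline + fourier)

def penalized_wls(D, y, w, lam):
    """Solve (D'WD + lam*I) beta = D'Wy -- a ridge-type stand-in
    for the penalized weighted least squares criterion."""
    W = np.diag(w)
    A = D.T @ W @ D + lam * np.eye(D.shape[1])
    return np.linalg.solve(A, D.T @ W @ y)

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 200)
x = np.linspace(0, 2 * np.pi, 200)
y = 1.5 * t + np.maximum(t - 0.5, 0) + np.sin(2 * x) + 0.1 * rng.standard_normal(200)
D = mixed_design(t, x, knots=[0.25, 0.5, 0.75], K=2)
beta = penalized_wls(D, y, w=np.ones(200), lam=1e-4)
fit = D @ beta
```

Because the true curve lies in the span of the assumed basis, the fitted values track the data up to the noise level.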

2020 ◽  
Vol 2020 ◽  
pp. 1-10
Author(s):  
Ni Putu Ayu Mirah Mariati ◽  
I. Nyoman Budiantara ◽  
Vita Ratnasari

To date, most researchers have developed a single type of estimator in nonparametric regression. In practice, however, data with mixed patterns are often encountered, in particular data whose pattern changes over certain subintervals while other parts follow a recurring trend. For such data we use a mixed estimator combining a smoothing spline and a Fourier series. The regression model is approached through a smoothing spline component and a Fourier series component, and the mixed estimator is completed in two estimation stages: penalized least squares (PLS) in the first stage and least squares (LS) in the second. The estimators were then applied to simulated data generated from two different functions, a polynomial and a trigonometric function, with a sample size of 100; the whole process was repeated 50 times. The two functions were modeled using a mixture of the smoothing spline and Fourier series estimators with various smoothing and oscillation parameters, and the model with the minimum generalized cross-validation (GCV) value was selected as the best. The simulation results showed that the mixed estimator attained a minimum GCV value of 11.98, with a corresponding mean square error (MSE) of 0.71 and R2 of 99.48%, indicating that the mixed smoothing spline and Fourier series estimator fits the data well.
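Selecting the model with the smallest generalized cross-validation value can be sketched as follows. This is the generic GCV criterion for any linear smoother, GCV(lam) = n * RSS / (n - tr(H))^2, with a ridge-type penalty standing in for the smoothing-spline penalty.

```python
import numpy as np

def gcv_score(D, y, lam):
    """GCV for the linear smoother y_hat = H y with
    H = D (D'D + lam*I)^{-1} D' (ridge-type stand-in penalty)."""
    n, p = D.shape
    H = D @ np.linalg.solve(D.T @ D + lam * np.eye(p), D.T)
    resid = y - H @ y
    return n * np.sum(resid ** 2) / (n - np.trace(H)) ** 2

rng = np.random.default_rng(1)
x = np.linspace(0, 2 * np.pi, 120)
y = np.sin(x) + 0.2 * rng.standard_normal(120)
# a small Fourier-type basis
D = np.column_stack([np.ones_like(x), np.cos(x), np.sin(x),
                     np.cos(2 * x), np.sin(2 * x)])
lams = 10.0 ** np.arange(-6, 3)
best = min(lams, key=lambda l: gcv_score(D, y, l))
```

Since the basis already contains the true function, GCV favors a small smoothing parameter here.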


2020 ◽  
Vol 2020 ◽  
pp. 1-11
Author(s):  
Made Ayu Dwi Octavanny ◽  
I. Nyoman Budiantara ◽  
Heri Kuswanto ◽  
Dyah Putri Rahmawati

Existing literature in nonparametric regression has established models that apply only one type of estimator to all predictors. This study develops a mixed truncated spline and Fourier series model in nonparametric regression for longitudinal data. The mixed estimator is obtained by a two-stage estimation consisting of penalized weighted least squares (PWLS) and weighted least squares (WLS) optimization. To demonstrate the performance of the proposed method, both simulated and real data are used, and the simulation and case-study results show consistent findings.
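The weighted least squares stage has a closed form, beta = (X'WX)^{-1} X'Wy. A minimal sketch, with diagonal weights chosen here as inverse variances purely for illustration (the paper's longitudinal weight structure is not reproduced):

```python
import numpy as np

def wls(X, y, w):
    """Closed-form weighted least squares with diagonal weights w:
    beta = (X'WX)^{-1} X'Wy."""
    XtW = X.T * w                       # same as X.T @ np.diag(w)
    return np.linalg.solve(XtW @ X, XtW @ y)

rng = np.random.default_rng(2)
n = 150
X = np.column_stack([np.ones(n), rng.uniform(0, 1, n)])
beta_true = np.array([1.0, 2.0])
# two "subjects" with different noise levels, weighted by inverse variance
sigma = np.where(np.arange(n) < 75, 0.05, 0.5)
y = X @ beta_true + sigma * rng.standard_normal(n)
beta_hat = wls(X, y, w=1.0 / sigma ** 2)
```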


Author(s):  
Parisa Torkaman

The generalized inverted exponential distribution has been introduced as a lifetime model with good statistical properties. In this paper, estimation of its probability density function and cumulative distribution function is considered using five different methods: uniformly minimum variance unbiased (UMVU), maximum likelihood (ML), least squares (LS), weighted least squares (WLS), and percentile (PC) estimators. The performance of these estimation procedures is compared by numerical simulation on the basis of mean squared error (MSE). The simulation studies show that the UMVU estimator performs better than the others, and that when the sample size is large enough the ML and UMVU estimators are almost equivalent and more efficient than LS, WLS, and PC. Finally, results for a real data set are analyzed.
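A Monte Carlo MSE comparison of the kind described can be sketched as follows. This is not a reproduction of the paper's five estimators: it fixes the shape parameter at one, which reduces the model to the inverted exponential distribution with CDF F(x) = exp(-lam/x), and compares only the ML estimator (n / sum(1/x)) with a percentile estimator based on the median (lam = median * ln 2).

```python
import numpy as np

rng = np.random.default_rng(3)
lam_true, n, reps = 2.0, 50, 2000
mse_ml = 0.0
mse_pc = 0.0
for _ in range(reps):
    # if E ~ Exp(1) then X = lam/E has CDF exp(-lam/x)
    x = lam_true / rng.exponential(1.0, n)
    ml = n / np.sum(1.0 / x)            # maximum likelihood estimator
    pc = np.median(x) * np.log(2.0)     # percentile (median) estimator
    mse_ml += (ml - lam_true) ** 2
    mse_pc += (pc - lam_true) ** 2
mse_ml /= reps
mse_pc /= reps
```

In this reduced setting the ML estimator has a markedly smaller MSE than the percentile estimator, in line with the ranking reported above.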


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Camilo Broc ◽  
Therese Truong ◽  
Benoit Liquet

Abstract Background The increasing number of genome-wide association studies (GWAS) has revealed several loci that are associated with multiple distinct phenotypes, suggesting the existence of pleiotropic effects. Highlighting these cross-phenotype genetic associations could help to identify and understand common biological mechanisms underlying certain diseases. Common approaches test the association between genetic variants and multiple traits at the SNP level. In this paper, we propose novel gene- and pathway-level approaches for the case where several independent GWAS on independent traits are available. The method is based on a generalization of sparse group Partial Least Squares (sgPLS) that takes into account groups of variables, with a Lasso penalization that links all independent data sets. This method, called joint-sgPLS, convincingly detects signal at both the variable level and the group level. Results Our method has the advantage of producing a global, readable model while respecting the architecture of the data. It can outperform traditional methods and provides wider insight in terms of a priori information. We compared the performance of the proposed method with benchmark methods on simulated data and give an example of application on real data, with the aim of highlighting common susceptibility variants for breast and thyroid cancers. Conclusion The joint-sgPLS shows interesting properties for detecting a signal. As an extension of PLS, the method is suited to data with a large number of variables, and the chosen Lasso penalization accommodates group structures of variables and observation sets. Furthermore, although the method has been applied here to a genetic study, its formulation is adapted to any data with a high number of variables and an explicit a priori group structure in other fields of application.
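At the heart of group-sparse penalties such as the one in sgPLS is a group-wise soft-thresholding operator, which shrinks an entire block of coefficients toward zero and discards the block when its norm falls below the penalty. A minimal sketch (the exact operator inside joint-sgPLS may differ):

```python
import numpy as np

def group_soft_threshold(v, lam):
    """Group-wise soft-thresholding: shrink the whole group vector v,
    zeroing it entirely when ||v|| <= lam."""
    norm = np.linalg.norm(v)
    if norm <= lam:
        return np.zeros_like(v)
    return (1.0 - lam / norm) * v

strong = group_soft_threshold(np.array([3.0, 4.0]), lam=1.0)  # kept, shrunk
weak = group_soft_threshold(np.array([0.3, 0.4]), lam=1.0)    # zeroed out
```

The strong group (norm 5) is scaled by 1 - 1/5 = 0.8, while the weak group (norm 0.5) is eliminated, which is how group selection arises.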


Geophysics ◽  
2018 ◽  
Vol 83 (6) ◽  
pp. V345-V357 ◽  
Author(s):  
Nasser Kazemi

Given noise-corrupted seismic recordings, blind deconvolution simultaneously solves for the reflectivity series and the wavelet. Blind deconvolution can be formulated as a fully perturbed linear regression model and solved by the total least-squares (TLS) algorithm. However, this algorithm performs poorly when the data matrix is structured and ill-conditioned; in blind deconvolution, the data matrix has a Toeplitz structure and is ill-conditioned. Accordingly, we develop a fully automatic single-channel blind-deconvolution algorithm to improve the performance of the TLS method. The proposed algorithm, called Toeplitz-structured sparse TLS, makes no assumptions about the phase of the wavelet, but it does assume that the reflectivity series is sparse. In addition, to reduce the model space and the number of unknowns, the algorithm benefits from the structural constraints on the data matrix. Our algorithm is an alternating minimization method and uses a generalized cross-validation function to define the optimum regularization parameter automatically. Because the generalized cross-validation function does not require any prior information about the noise level of the data, our approach is suitable for real-world applications. We validate the proposed technique using synthetic examples. On noise-free data, we achieve a near-optimal recovery of the wavelet and the reflectivity series. For noise-corrupted data with a moderate signal-to-noise ratio (S/N), the algorithm successfully accounts for the noise in its model, resulting in a satisfactory performance. However, the results deteriorate as the S/N and the sparsity level of the data decrease. We also successfully apply the algorithm to real data from 2D and 3D data sets of the Teapot Dome seismic survey.
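The sparse-recovery half of such an alternating scheme can be sketched with a Toeplitz convolution matrix and iterative soft-thresholding. This illustrates only the non-blind subproblem: the wavelet is assumed known and the regularization weight is hand-picked rather than chosen by GCV.

```python
import numpy as np

def conv_matrix(w, n):
    """Toeplitz convolution matrix G with G @ r == np.convolve(w, r)."""
    G = np.zeros((len(w) + n - 1, n))
    for j in range(n):
        G[j:j + len(w), j] = w
    return G

def ista(G, d, lam, iters=1000):
    """Iterative soft-thresholding for min 0.5*||G r - d||^2 + lam*||r||_1
    (the sparse-reflectivity half of an alternating blind scheme)."""
    step = 1.0 / np.linalg.norm(G, 2) ** 2
    r = np.zeros(G.shape[1])
    for _ in range(iters):
        z = r - step * (G.T @ (G @ r - d))
        r = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
    return r

w = np.array([0.5, 1.0, -0.5])                 # toy wavelet, assumed known here
r_true = np.zeros(40)
r_true[[5, 20, 33]] = [1.0, -0.8, 0.6]
d = np.convolve(w, r_true)                     # noise-free trace
r_hat = ista(conv_matrix(w, 40), d, lam=0.01)
```

In the noise-free case the spikes are recovered to within the small bias introduced by the l1 penalty.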


Agriculture ◽  
2021 ◽  
Vol 11 (11) ◽  
pp. 1129
Author(s):  
Yiping Peng ◽  
Lu Wang ◽  
Li Zhao ◽  
Zhenhua Liu ◽  
Chenjie Lin ◽  
...  

Soil nutrients play a vital role in plant growth, and thus the rapid acquisition of soil nutrient content is of great significance for sustainable agricultural development. Hyperspectral remote-sensing techniques allow for the quick monitoring of soil nutrients. At present, however, accurate estimates are difficult to obtain due to the weak spectral features of soil nutrients and the low accuracy of soil nutrient estimation models. This study proposed a new method to improve soil nutrient estimation. Firstly, to obtain characteristic variables, we used the fit of partial least squares regression (PLSR) to select the optimal screening algorithm from three candidates (Pearson correlation coefficient, PCC; least absolute shrinkage and selection operator, LASSO; and gradient boosting decision tree, GBDT). Secondly, linear (multi-linear regression, MLR; ridge regression, RR) and nonlinear (support vector machine, SVM; and back-propagation neural network with genetic algorithm optimization, GABP) algorithms with 10-fold cross-validation were implemented to determine the most accurate model for estimating soil total nitrogen (TN), total phosphorus (TP), and total potassium (TK) contents. Finally, the new method was used to map the soil TK content at a regional scale using the soil component spectral variables retrieved by the fully constrained least squares (FCLS) method, based on an image from the HuanJing-1A Hyperspectral Imager (HJ-1A HSI) of the Conghua District of Guangzhou, China. The results identified GBDT-GABP as the most accurate estimation method for soil TN (R2 of 0.69, root mean square error of cross-validation (RMSECV) of 0.35 g kg−1, and ratio of performance to interquartile range (RPIQ) of 2.03) and TP (R2 of 0.73, RMSECV of 0.30 g kg−1, and RPIQ of 2.10), while LASSO-GABP proved optimal for soil TK estimation (R2 of 0.82, RMSECV of 3.39 g kg−1, and RPIQ of 3.57). Additionally, the highly accurate LASSO-GABP-estimated soil TK (R2 = 0.79) demonstrates the feasibility of the LASSO-GABP method for retrieving soil TK content at the regional scale.
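The two accuracy measures quoted above are straightforward to compute. A minimal sketch, using numpy's default interquartile-range convention:

```python
import numpy as np

def rmse_rpiq(y, y_pred):
    """RMSE and ratio of performance to interquartile range
    (RPIQ = IQR / RMSE)."""
    rmse = np.sqrt(np.mean((y - y_pred) ** 2))
    q1, q3 = np.percentile(y, [25, 75])
    return rmse, (q3 - q1) / rmse

y = np.array([1.0, 2.0, 3.0, 4.0])
rmse, rpiq = rmse_rpiq(y, y + 0.5)   # constant offset of 0.5
```

For this toy case the RMSE is 0.5 and the IQR is 1.5, so RPIQ = 3.0; larger RPIQ means the model's error is small relative to the spread of the observations.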


1992 ◽  
Vol 288 (2) ◽  
pp. 533-538 ◽  
Author(s):  
M E Jones

An algorithm for the least-squares estimation of the enzyme parameters Km and Vmax. is proposed and its performance analysed. The problem is non-linear, but the algorithm is algebraic and does not require initial parameter estimates. On a spreadsheet program such as MINITAB, it may be coded in as few as ten instructions. The algorithm derives an intermediate estimate of Km and Vmax. appropriate to data with a constant coefficient of variation and then applies a single reweighting. Its performance using simulated data with a variety of error structures is compared with that of the classical reciprocal transforms and with both appropriately and inappropriately weighted direct least-squares estimators. Three approaches to estimating the standard errors of the parameter estimates are discussed, and one suitable for spreadsheet implementation is illustrated.
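The algebraic core of such an estimator can be sketched directly: rearranging v = Vmax*S/(Km + S) to v*Km - Vmax*S = -v*S gives a problem that is linear in (Km, Vmax), so no initial guesses are needed. The sketch below omits the paper's intermediate weighting and single reweighting step.

```python
import numpy as np

def km_vmax_algebraic(S, v):
    """Algebraic least-squares fit of the Michaelis-Menten model
    v = Vmax*S/(Km + S), rearranged to the linear system
    v*Km - Vmax*S = -v*S."""
    A = np.column_stack([v, -S])
    b = -v * S
    Km, Vmax = np.linalg.lstsq(A, b, rcond=None)[0]
    return Km, Vmax

S = np.array([0.5, 1.0, 2.0, 4.0, 8.0])
v = 10.0 * S / (2.0 + S)              # noise-free data: Vmax = 10, Km = 2
Km, Vmax = km_vmax_algebraic(S, v)
```

On noise-free data the rearranged system is exact, so the true parameters are recovered; with noisy data this unweighted form implicitly weights by v, which is why a reweighting step follows in the paper's algorithm.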


2019 ◽  
Vol 2019 ◽  
pp. 1-7
Author(s):  
Abdelmounaim Kerkri ◽  
Jelloul Allal ◽  
Zoubir Zarrouk

Partial least squares (PLS) regression is an alternative to ordinary least squares (OLS) regression used in the presence of multicollinearity. As with any other modelling method, PLS regression requires a reliable model-selection tool. Cross-validation (CV) is the most commonly used tool, with many advantages in both precision and accuracy, but it also has some drawbacks; we therefore use the L-curve criterion as an alternative, given that it takes into consideration the shrinking nature of PLS. A theoretical justification for the use of the L-curve criterion is presented, as well as an application on both simulated and real data. The application shows how this criterion generally outperforms cross-validation and generalized cross-validation (GCV) in mean squared prediction error and computational efficiency.
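A common way to operationalize the L-curve criterion is to locate the "corner" of the curve of log residual norm against log solution norm. The sketch below does this for a ridge path rather than PLS components, using a crude discrete-angle proxy for curvature; all design choices here are illustrative.

```python
import numpy as np

def l_curve_corner(X, y, lams):
    """Pick the penalty at the 'corner' of the L-curve (log residual norm
    vs log solution norm): the interior point whose neighbours form the
    sharpest bend (largest cosine of the vertex angle)."""
    pts = []
    for lam in lams:
        beta = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
        pts.append((np.log(np.linalg.norm(y - X @ beta)),
                    np.log(np.linalg.norm(beta))))
    pts = np.array(pts)
    best, sharpest = 1, -np.inf
    for i in range(1, len(lams) - 1):
        u, v = pts[i - 1] - pts[i], pts[i + 1] - pts[i]
        cosang = (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))
        if cosang > sharpest:    # cosine near 1 = very acute vertex = sharp bend
            best, sharpest = i, cosang
    return lams[best]

rng = np.random.default_rng(6)
n, p = 60, 30
X = rng.standard_normal((n, p)) * (0.9 ** np.arange(p))  # ill-conditioned design
beta_true = rng.standard_normal(p)
y = X @ beta_true + 0.5 * rng.standard_normal(n)
lams = 10.0 ** np.linspace(-8, 4, 25)
lam_star = l_curve_corner(X, y, lams)
```

Unlike CV, this selection needs only one pass over the regularization path, which is the computational advantage the abstract alludes to.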


2002 ◽  
Vol 56 (5) ◽  
pp. 615-624 ◽  
Author(s):  
David K. Melgaard ◽  
David M. Haaland ◽  
Christine M. Wehlburg

A significant extension to the classical least-squares (CLS) algorithm called concentration residual augmented CLS (CRACLS) has been developed. Previously, unmodeled sources of spectral variation have rendered CLS models ineffective for most types of problems, but with the new CRACLS algorithm, CLS-type models can be applied to a significantly wider range of applications. This new quantitative multivariate spectral analysis algorithm iteratively augments the calibration matrix of reference concentrations with concentration residuals estimated during CLS prediction. Because these residuals represent linear combinations of the unmodeled spectrally active component concentrations, the effects of these components are removed from the calibration of the analytes of interest. This iterative process allows the development of a CLS-type calibration model comparable in prediction ability to implicit multivariate calibration methods such as partial least squares (PLS) even when unmodeled spectrally active components are present in the calibration sample spectra. In addition, CRACLS retains the improved qualitative spectral information of the CLS algorithm relative to PLS. More importantly, CRACLS provides a model compatible with the recently presented prediction-augmented CLS (PACLS) method. The CRACLS/PACLS combination generates an adaptable model that can achieve excellent prediction ability for samples of unknown composition that contain unmodeled sources of spectral variation. The CRACLS algorithm is demonstrated with both simulated and real data derived from a system of dilute aqueous solutions containing glucose, ethanol, and urea. The simulated data demonstrate the effectiveness of the new algorithm and help elucidate the principles behind the method. Using experimental data, we compare the prediction abilities of CRACLS and PLS during cross-validated calibration. 
In combination with PACLS, the CRACLS predictions are comparable to PLS for the prediction of the glucose, ethanol, and urea components for validation samples collected when significant instrument drift was present. However, the PLS predictions required recalibration using nonstandard cross-validated rotations while CRACLS/PACLS was rapidly updated during prediction without the need for time-consuming cross-validated recalibration. The CRACLS/PACLS algorithm provides a more general approach to removing the detrimental effects of unmodeled components.
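The augmentation idea can be sketched in a noise-free toy system: calibrate CLS on two modeled analytes, estimate the concentration residuals, and augment the concentration matrix with the leading residual direction. In this idealized rank-limited setting a single augmentation already absorbs the unmodeled interferent; the real CRACLS algorithm iterates the augmentation until convergence.

```python
import numpy as np

def cls_calibrate(C, A):
    """Classical least squares: estimate pure-component spectra K from
    A ~ C K (A: mixture spectra, C: concentrations)."""
    return np.linalg.lstsq(C, A, rcond=None)[0]

def cls_predict(K, A):
    """CLS prediction: C_hat = A K' (K K')^{-1}."""
    return A @ K.T @ np.linalg.inv(K @ K.T)

rng = np.random.default_rng(4)
n, p = 40, 60
# two modeled analytes plus one unmodeled, spectrally active interferent
C_full = rng.uniform(0.1, 1.0, (n, 3))
K_full = rng.standard_normal((3, p))
A = C_full @ K_full
C = C_full[:, :2]                       # interferent concentration unknown

K0 = cls_calibrate(C, A)
resid = cls_predict(K0, A) - C          # concentration residuals
# augment the calibration matrix with the leading residual direction
u = np.linalg.svd(resid, full_matrices=False)[0][:, :1]
K1 = cls_calibrate(np.hstack([C, u]), A)
C_hat = cls_predict(K1, A)[:, :2]       # analyte predictions after augmentation
```

Because the residual direction carries the interferent's concentration information, the augmented model reconstructs the spectra essentially exactly and the analyte predictions become accurate despite the unmodeled component.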


Author(s):  
Shaohua Zhu ◽  
Tingting Guo ◽  
Chao Yuan ◽  
Jianbin Liu ◽  
Jianye Li ◽  
...  

ABSTRACT The marker density, the heritability level of the trait, and the statistical model adopted are critical to the accuracy of genomic prediction (GP), also known as genomic selection (GS). If the potential of GP is to be fully utilized to optimize breeding and selection, then in addition to incorporating these factors into simulated data for analysis, it is essential to incorporate them into real data to understand their impact on GP accuracy more clearly and intuitively. Herein, we studied genomic prediction of six wool traits of sheep using two classes of models: the Bayesian Alphabet (BayesA, BayesB, BayesCπ, and Bayesian LASSO) and genomic best linear unbiased prediction (GBLUP). We adopted 5-fold cross-validation to evaluate accuracy based on the genotyping data of Alpine Merino sheep (n = 821). The main aim was to study the influence and interaction of different models and marker densities on GP accuracy. The cross-validation results showed that the GP accuracy of the six traits was between 0.28 and 0.60. We showed that GP accuracy can be improved by increasing the marker density, an effect closely related to the model adopted and the heritability level of the trait. Moreover, across the two marker densities, the GBLUP model predicted traits with low heritability better, while the advantage of the Bayesian Alphabet became more obvious as the heritability level increased; different GP models are therefore appropriate for different traits. These findings indicate the importance of applying appropriate models for GP and will assist in further optimizing GP.
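GBLUP can be sketched as kernel ridge regression with the genomic relationship matrix as the kernel. The snippet below simulates genotypes and a trait with heritability near 0.5 (all sizes and parameters are assumptions for illustration, not the sheep data) and evaluates 5-fold cross-validated accuracy as the correlation between predictions and held-out phenotypes.

```python
import numpy as np

rng = np.random.default_rng(5)
n, m = 300, 1000
geno = rng.binomial(2, 0.3, (n, m)).astype(float)   # SNP genotypes coded 0/1/2
p = geno.mean(axis=0) / 2.0
Z = (geno - 2 * p) / np.sqrt(2 * p * (1 - p))       # centred, scaled markers
G = Z @ Z.T / m                                     # genomic relationship matrix

u_true = Z @ rng.standard_normal(m) / np.sqrt(m)    # additive genetic values
y = u_true + rng.standard_normal(n)                 # phenotypes, h2 ~ 0.5

def gblup(G, y, train, test, lam=1.0):
    """GBLUP as kernel ridge regression with the genomic kernel G:
    u_hat_test = G[test, train] (G[train, train] + lam*I)^{-1} y_train."""
    A = G[np.ix_(train, train)] + lam * np.eye(len(train))
    return G[np.ix_(test, train)] @ np.linalg.solve(A, y[train])

# 5-fold cross-validated accuracy: cor(prediction, phenotype) per held-out fold
idx = rng.permutation(n)
accs = []
for fold in np.array_split(idx, 5):
    train = np.setdiff1d(idx, fold)
    accs.append(np.corrcoef(gblup(G, y, train, fold), y[fold])[0, 1])
acc = float(np.mean(accs))
```

The ridge weight `lam` plays the role of the residual-to-genetic variance ratio; in a real analysis it would be estimated (e.g. by REML) rather than fixed.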

