The prediction of live weight of hair goats through penalized regression methods: LASSO and adaptive LASSO

2018, Vol 61 (4), pp. 451-458
Author(s): Suna Akkol

Abstract. The least absolute shrinkage and selection operator (LASSO) and the adaptive LASSO have become popular methods in the last decade, especially for data with multicollinearity. This study was conducted to estimate the live weight (LW) of Hair goats from biometric measurements and to select variables, in order to reduce model complexity, using two penalized regression methods: LASSO and adaptive LASSO with γ = 0.5 and γ = 1. The data were obtained from 132 adult goats in the Honaz district of Denizli province. Age, sex, forehead width, ear length, head length, chest width, rump height, withers height, back height, chest depth, chest girth, and body length were used as explanatory variables. The adjusted coefficient of determination (R²adj), root mean square error (RMSE), Akaike's information criterion (AIC), Schwarz Bayesian criterion (SBC), and average square error (ASE) were used to compare the effectiveness of the methods. Considering all criteria together, adaptive LASSO (γ = 1) estimated LW with the highest accuracy for both male (R²adj = 0.9048; RMSE = 3.6250; AIC = 79.2974; SBC = 65.2633; ASE = 7.8843) and female (R²adj = 0.7668; RMSE = 4.4069; AIC = 392.5405; SBC = 308.9888; ASE = 18.2193) Hair goats.
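A minimal sketch of the two estimators compared above, assuming standardized predictors, a centered response and a plain coordinate-descent solver (the abstract does not say how the models were fitted). The adaptive version reweights each coefficient's L1 penalty by 1/|b_j|^γ, with the initial b taken here from an unpenalized least-squares fit. The data are synthetic, not the goat measurements, and λ = 0.3 is an arbitrary illustrative value; in practice λ would be tuned, for example by cross-validation or an information criterion.

```python
import random


def soft_threshold(z, t):
    """Soft-thresholding operator: sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def standardize(X):
    """Center each column to mean 0 and scale it to mean square 1."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    scales = [(sum((row[j] - means[j]) ** 2 for row in X) / n) ** 0.5 for j in range(p)]
    return [[(row[j] - means[j]) / scales[j] for j in range(p)] for row in X]


def lasso_cd(X, y, lam, weights=None, n_iter=1000, tol=1e-10):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * sum_j w_j |b_j|.

    Assumes standardized columns of X and a centered y, so no intercept is fitted.
    """
    n, p = len(X), len(X[0])
    w = weights if weights is not None else [1.0] * p
    b = [0.0] * p
    r = list(y)  # current residual y - Xb
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            # inner product of column j with the partial residual (b_j added back)
            rho = sum(X[i][j] * r[i] for i in range(n)) / n + b[j]
            new_bj = soft_threshold(rho, lam * w[j])
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    r[i] -= X[i][j] * delta
                b[j] = new_bj
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return b


def adaptive_lasso(X, y, lam, gamma=1.0):
    """Adaptive LASSO: penalty weights 1 / |b_init_j|^gamma from a least-squares fit."""
    b_init = lasso_cd(X, y, lam=0.0)  # lam = 0 reduces coordinate descent to least squares
    weights = [1.0 / max(abs(bj), 1e-8) ** gamma for bj in b_init]
    return lasso_cd(X, y, lam, weights=weights)


# Synthetic example: y depends on x0 and x1 only; x2 is irrelevant.
rng = random.Random(1)
n = 100
X_raw = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(n)]
y_raw = [3.0 * x[0] + 1.5 * x[1] + rng.gauss(0, 1) for x in X_raw]
X = standardize(X_raw)
y_mean = sum(y_raw) / n
y = [v - y_mean for v in y_raw]

b_lasso = lasso_cd(X, y, lam=0.3)
b_ada1 = adaptive_lasso(X, y, lam=0.3, gamma=1.0)
b_ada05 = adaptive_lasso(X, y, lam=0.3, gamma=0.5)
```

Because the weight is small for a large initial coefficient, the adaptive LASSO shrinks the strong predictor less than the plain LASSO does, while the irrelevant predictor still receives a large penalty and is set exactly to zero.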

2017, Vol 47 (4)
Author(s): Felipe Amorim Caetano Souza, Tales Jesus Fernandes, Raquel Silva de Moura, Sarah Laguna Conceição Meirelles, Rafaela Aparecida Ribeiro, ...

ABSTRACT: Growth and development in various species have been analyzed with growth curves fitted by non-linear models. The objective of the current study was to evaluate the fit of the Brody, Gompertz, Logistic and von Bertalanffy models to cross-sectional live weight data of Mangalarga Marchador horses, in order to identify the best model and make accurate predictions of growth and maturity in males and females of this breed. The weights of 214 horses between 6 and 153 months of age were recorded: 94 males and 120 non-pregnant females. Model parameters were estimated by least squares using the iteratively regularized Gauss-Newton method in the R software. The models were compared using the coefficient of determination (R²), residual standard deviation (RSD) and corrected Akaike information criterion (AICc). The adult weights estimated by the models ranged from 431 kg to 439 kg for males and from 416 kg to 420 kg for females. For males, the von Bertalanffy model best described growth, while for females the Brody model was more suitable. Mangalarga Marchador females reach adult body weight earlier than males.
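For reference, a sketch of the four growth functions in one common three-parameter form y(t) = f(t; A, B, k), where A is the asymptotic (adult) weight, B an integration constant and k the maturation rate. The abstract does not give the exact parameterizations, so these forms are assumptions. The AICc helper uses the usual least-squares form n·ln(RSS/n) + 2p plus the small-sample correction, dropping constants that cancel when models are compared on the same data.

```python
import math


def brody(t, A, B, k):
    return A * (1.0 - B * math.exp(-k * t))


def gompertz(t, A, B, k):
    return A * math.exp(-B * math.exp(-k * t))


def logistic(t, A, B, k):
    return A / (1.0 + B * math.exp(-k * t))


def von_bertalanffy(t, A, B, k):
    return A * (1.0 - B * math.exp(-k * t)) ** 3


MODELS = {
    "Brody": brody,
    "Gompertz": gompertz,
    "Logistic": logistic,
    "von Bertalanffy": von_bertalanffy,
}


def rss(model, params, ages, weights):
    """Residual sum of squares of a fitted curve."""
    return sum((w - model(t, *params)) ** 2 for t, w in zip(ages, weights))


def rsd(rss_value, n, n_params):
    """Residual standard deviation sqrt(RSS / (n - p))."""
    return math.sqrt(rss_value / (n - n_params))


def aicc(rss_value, n, n_params):
    """Corrected AIC for a least-squares fit with Gaussian errors.

    n_params should count the curve parameters plus the error variance.
    """
    aic = n * math.log(rss_value / n) + 2 * n_params
    return aic + 2 * n_params * (n_params + 1) / (n - n_params - 1)
```

All four curves approach A as t grows, which is why the adult weights quoted above can be read directly from the fitted A; the model with the lowest AICc (and RSD) is preferred.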


2015, Vol 14s2, pp. CIN.S17295
Author(s): Jenna Czarnota, Chris Gennings, David C. Wheeler

In evaluation of cancer risk related to environmental chemical exposures, the effect of many chemicals on disease is ultimately of interest. However, because of potentially strong correlations among chemicals that occur together, traditional regression methods suffer from collinearity effects, including regression coefficient sign reversal and variance inflation. In addition, penalized regression methods designed to remediate collinearity may have limitations in selecting the truly bad actors among many correlated components. The recently proposed method of weighted quantile sum (WQS) regression attempts to overcome these problems by estimating a body burden index, which identifies important chemicals in a mixture of correlated environmental chemicals. Our focus was on assessing through simulation studies the accuracy of WQS regression in detecting subsets of chemicals associated with health outcomes (binary and continuous) in site-specific analyses and in non-site-specific analyses. We also evaluated the performance of the penalized regression methods of lasso, adaptive lasso, and elastic net in correctly classifying chemicals as bad actors or unrelated to the outcome. We based the simulation study on data from the National Cancer Institute Surveillance Epidemiology and End Results Program (NCI-SEER) case-control study of non-Hodgkin lymphoma (NHL) to achieve realistic exposure situations. Our results showed that WQS regression had good sensitivity and specificity across a variety of conditions considered in this study. The shrinkage methods had a tendency to incorrectly identify a large number of components, especially in the case of strong association with the outcome.
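To make the WQS index concrete: each chemical is scored by its quantile (quartiles are a common choice), and the body-burden index is a weighted sum of those scores with non-negative weights that sum to one. The sketch below builds only the index and takes the weights as given; the concentrations and weights are hypothetical. In WQS regression proper the weights are estimated, typically over bootstrap samples, and tied exposure values would need a more careful quantile rule than the rank-based one used here.

```python
def quantile_scores(values, q=4):
    """Score each observation 0..q-1 by its rank-based empirical quantile (ties broken by order)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0] * n
    for rank, i in enumerate(order):
        scores[i] = rank * q // n
    return scores


def wqs_index(exposures, weights, q=4):
    """Weighted quantile sum: sum over chemicals c of w_c * score_c(i).

    exposures holds one list of measurements per chemical; weights must be
    non-negative and sum to 1.
    """
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    scored = [quantile_scores(col, q) for col in exposures]
    n = len(exposures[0])
    return [sum(w * s[i] for w, s in zip(weights, scored)) for i in range(n)]


# Hypothetical concentrations of two chemicals for four subjects, with hypothetical weights
chem_a = [0.1, 0.5, 0.3, 0.9]
chem_b = [2.0, 1.0, 4.0, 3.0]
index = wqs_index([chem_a, chem_b], [0.75, 0.25])
```

The index then enters a regression with the outcome (for example a logistic model for case status), and the estimated weights indicate which chemicals drive the association.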


2019, Vol 30 (3), pp. 697-719
Author(s): Fan Wang, Sach Mukherjee, Sylvia Richardson, Steven M. Hill

Abstract Penalized likelihood approaches are widely used for high-dimensional regression. Although many methods have been proposed and the associated theory is now well developed, the relative efficacy of different approaches in finite-sample settings, as encountered in practice, remains incompletely understood. There is therefore a need for empirical investigations in this area that can offer practical insight and guidance to users. In this paper, we present a large-scale comparison of penalized regression methods. We distinguish between three related goals: prediction, variable selection and variable ranking. Our results span more than 2300 data-generating scenarios, including both synthetic and semisynthetic data (real covariates and simulated responses), allowing us to systematically consider the influence of various factors (sample size, dimensionality, sparsity, signal strength and multicollinearity). We consider several widely used approaches (Lasso, Adaptive Lasso, Elastic Net, Ridge Regression, SCAD, the Dantzig Selector and Stability Selection). We find considerable variation in performance between methods. Our results support a “no panacea” view, with no unambiguous winner across all scenarios or goals, even in this restricted setting where all data align well with the assumptions underlying the methods. The study allows us to make some recommendations as to which approaches may be most (or least) suitable given the goal and some data characteristics. Our empirical results complement existing theory and provide a resource to compare methods across a range of scenarios and metrics.
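Several of the compared methods differ mainly in the penalty added to the least-squares loss. A sketch of four of them as functions of a single coefficient, assuming the glmnet-style elastic-net mixing (α = 1 gives the lasso, α = 0 the ridge with a factor 1/2) and the SCAD constant a = 3.7 that is commonly used in practice; the paper's exact parameterizations are not stated in the abstract. The adaptive lasso is a weighted version of the L1 penalty, while the Dantzig selector and stability selection are not simple penalties of this kind and are omitted.

```python
def lasso_penalty(beta, lam):
    """L1 penalty: lam * |beta|."""
    return lam * abs(beta)


def ridge_penalty(beta, lam):
    """L2 penalty with the 1/2 convention: (lam / 2) * beta^2."""
    return 0.5 * lam * beta ** 2


def elastic_net_penalty(beta, lam, alpha):
    """Mixture of the L1 and L2 penalties; alpha in [0, 1]."""
    return alpha * lasso_penalty(beta, lam) + (1.0 - alpha) * ridge_penalty(beta, lam)


def scad_penalty(beta, lam, a=3.7):
    """SCAD: linear near zero, quadratic transition, then constant beyond a * lam,
    so large coefficients are not shrunk further."""
    b = abs(beta)
    if b <= lam:
        return lam * b
    if b <= a * lam:
        return (2.0 * a * lam * b - b ** 2 - lam ** 2) / (2.0 * (a - 1.0))
    return lam ** 2 * (a + 1.0) / 2.0
```

The SCAD pieces join continuously at |β| = λ and |β| = aλ; its flat tail is what lets it avoid the bias that the lasso's ever-growing penalty imposes on large effects.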


2018, Vol 7 (3), pp. 398-421
Author(s): Addison James, Lan Xue, Virginia Lesser

Abstract Nonparametric model-assisted estimators have been proposed to improve estimates of finite population parameters. Flexible nonparametric models provide more reliable estimators when a parametric model is misspecified. In this article, we propose an information criterion to select appropriate auxiliary variables to use in an additive model-assisted method. We approximate the additive nonparametric components using polynomial splines and extend the Bayesian Information Criterion (BIC) for finite populations. By removing irrelevant auxiliary variables, our method reduces model complexity and decreases estimator variance. We establish that the proposed BIC is asymptotically consistent in selecting the important explanatory variables when the true model is additive without interactions, a result supported by our numerical study. Our proposed method is easier to implement and better justified theoretically than the existing method proposed in the literature.
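The mechanics of BIC-based variable selection can be sketched in a simpler setting than the paper's. This version is a swapped-in illustration: it uses an ordinary linear model (not polynomial-spline additive components), the standard Gaussian BIC n·ln(RSS/n) + k·ln(n) (not the finite-population extension the authors propose), and an exhaustive search over subsets. The data are synthetic, with only the first auxiliary variable related to the response.

```python
import itertools
import math
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def ols_rss(X, y, cols):
    """RSS from regressing y on an intercept plus the columns listed in cols."""
    D = [[1.0] + [row[j] for j in cols] for row in X]
    k = len(D[0])
    XtX = [[sum(d[i] * d[j] for d in D) for j in range(k)] for i in range(k)]
    Xty = [sum(d[i] * yi for d, yi in zip(D, y)) for i in range(k)]
    beta = solve(XtX, Xty)
    return sum((yi - sum(bj * dj for bj, dj in zip(beta, d))) ** 2 for d, yi in zip(D, y))


def bic(rss, n, k):
    """Gaussian BIC up to a constant: n ln(RSS/n) + k ln(n)."""
    return n * math.log(rss / n) + k * math.log(n)


def select_by_bic(X, y):
    """Exhaustive search over variable subsets; returns the subset with the lowest BIC."""
    n, p = len(y), len(X[0])
    best_score, best_cols = math.inf, ()
    for size in range(p + 1):
        for cols in itertools.combinations(range(p), size):
            score = bic(ols_rss(X, y, cols), n, size + 1)
            if score < best_score:
                best_score, best_cols = score, cols
    return best_cols


rng = random.Random(7)
X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(200)]
y = [2.0 + 1.5 * x[0] + rng.gauss(0, 1) for x in X]
chosen = select_by_bic(X, y)
```

The ln(n) term charges each extra variable more than AIC's constant 2 once n exceeds about 7, which is what drives BIC toward smaller models and underlies its selection consistency.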


Author(s): Furkan Yılmaz, Lütfi Bayyurt, Samet Hasan Abacı, Yalçın Tahtalı

The aim of this study was to compare the least squares (LS) method, which performs poorly in the presence of multicollinearity, with two biased estimators: ridge regression (RR) and principal components regression (PCR). To this end, the effects of body length (BL), height at withers (HW), height at rump (HR), chest depth (CD), chest girth (CG) and chest width (CW) on body weight (BW) were examined using data from 59 Saanen kids at weaning, raised at the Research Farm of Tokat Gaziosmanpaşa University. The coefficient of determination (R²) and mean square error (MSE) were used to evaluate the estimation performance of the methods. The multicollinearity between height at withers (HW) and height at rump (HR), both used to estimate body weight, was eliminated by RR and PCR. Comparing the R² and MSE values of the methods showed that RR gave better estimates of the live weight of Saanen goats.
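A sketch of why ridge regression helps with a near-duplicate pair such as HW and HR, shown for two predictors so that (X'X + kI)⁻¹ has a closed form. The data are hypothetical, the ridge constant k = 50 is arbitrary (in practice it is chosen from a ridge trace or by cross-validation), and the predictors are only centered, not scaled. PCR is not shown.

```python
import random


def center(v):
    m = sum(v) / len(v)
    return [x - m for x in v]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def ridge_2(x1, x2, y, k):
    """Ridge estimate (X'X + kI)^(-1) X'y for two centered predictors; k = 0 gives OLS."""
    x1, x2, y = center(x1), center(x2), center(y)
    a, b, d = dot(x1, x1) + k, dot(x1, x2), dot(x2, x2) + k
    g1, g2 = dot(x1, y), dot(x2, y)
    det = a * d - b * b
    return ((d * g1 - b * g2) / det, (a * g2 - b * g1) / det)


def vif_2(x1, x2):
    """Variance inflation factor 1 / (1 - r^2) for a pair of predictors."""
    x1, x2 = center(x1), center(x2)
    r = dot(x1, x2) / (dot(x1, x1) * dot(x2, x2)) ** 0.5
    return 1.0 / (1.0 - r * r)


# Hypothetical data: rump height almost duplicates withers height (cm)
rng = random.Random(3)
hw = [rng.gauss(60.0, 4.0) for _ in range(59)]
hr = [h + rng.gauss(0.0, 0.5) for h in hw]
bw = [0.3 * a + 0.3 * b + rng.gauss(0.0, 1.5) for a, b in zip(hw, hr)]

b_ols = ridge_2(hw, hr, bw, 0.0)
b_ridge = ridge_2(hw, hr, bw, 50.0)
```

The VIF is far above the usual warning level of 10, and adding k to the diagonal of X'X shrinks the coefficient vector toward zero, trading a little bias for a large drop in variance.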


2020, Vol 21 (1), pp. 25-33
Author(s): Deni K.L. Mudin, Paulus Un, Lika Bernadina

ABSTRACT Peanuts are a commodity of high economic value in dry land areas, and they also contribute to the social life of these areas. This research was conducted in Semau Sub-district, Kupang Regency, to determine the income, break-even point (BEP), R/C ratio, and efficiency of capital use of peanut farming, and the factors that affect its income, using 92 farmer respondents selected by simple random sampling. Data were collected through surveys, library research and interviews, and were analyzed quantitatively and descriptively using regression methods. The average net income of peanut farming in the study location was IDR 1,739,895, with an average total revenue of IDR 3,498,261 and an average total cost of IDR 1,758,366. The average break-even point was 147 kg of production and IDR 6,509 in price, and the average R/C ratio was 1.99. The factors affecting income were production (X1), seed costs (X2), and labor costs (X3). The regression with the Cobb-Douglas function gave a coefficient of determination (R²) of 0.822, meaning that the independent variables (production, seed costs and labor costs) explain 82.20% of the variation in the dependent variable, income (Y), while the remaining 17.80% is explained by variables outside the analysis. The F test (analysis of variance) showed that X1, X2, and X3 jointly had a significant effect on income at α = 1%, so H1 (at least one βi ≠ 0) is accepted. The t test (partial test) showed that production (X1) and labor costs (X3) significantly affect income, while seed costs (X2) do not.
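The income measures above follow from a few standard definitions, sketched here under the usual farm-economics conventions (assumed, since the abstract gives only the results). The revenue and cost figures are the averages reported in the abstract; the break-even and Cobb-Douglas helpers are shown only as formulas, because the average price, yield and fitted coefficients are not given.

```python
import math


def net_income(revenue, total_cost):
    """Net farm income: total revenue minus total cost."""
    return revenue - total_cost


def rc_ratio(revenue, total_cost):
    """Revenue/cost ratio; values above 1 mean revenue exceeds cost."""
    return revenue / total_cost


def bep_production(total_cost, price):
    """Break-even output: quantity at which revenue equals total cost."""
    return total_cost / price


def bep_price(total_cost, production):
    """Break-even price: price per unit at which revenue equals total cost."""
    return total_cost / production


def cobb_douglas(a, betas, xs):
    """Cobb-Douglas form Y = a * prod(x_i ** b_i), which is linear in logs:
    ln Y = ln a + sum(b_i * ln x_i)."""
    return a * math.prod(x ** b for x, b in zip(xs, betas))


# Average per-farm figures reported in the abstract (IDR)
revenue, total_cost = 3_498_261, 1_758_366
income = net_income(revenue, total_cost)
rc = rc_ratio(revenue, total_cost)
```

The reported averages are internally consistent: revenue minus cost gives the IDR 1,739,895 net income, and their ratio rounds to the R/C value of 1.99. Fitting the Cobb-Douglas model by least squares on the logged variables is what yields the R² of 0.822.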


Author(s): Mark David Walker, Mihály Sulyok

Abstract Background Restrictions on social interaction and movement were implemented by the German government in March 2020 to reduce the transmission of coronavirus disease 2019 (COVID-19). Apple's “Mobility Trends” (AMT) data detail levels of community mobility and are a novel resource of potential use to epidemiologists. Objective The aim of the study is to use AMT data to examine the relationship between mobility and COVID-19 case occurrence in Germany. Is a change in mobility apparent following COVID-19 and the implementation of social restrictions? Is there a relationship between mobility and COVID-19 occurrence in Germany? Methods AMT data illustrate mobility levels throughout the epidemic, allowing the relationship between mobility and disease to be examined. Generalized additive models (GAMs) were fitted for Germany, with mobility categories and date as explanatory variables and case numbers as the response. Results Clear reductions in mobility occurred following the implementation of movement restrictions. There was a negative correlation between mobility and confirmed case numbers. The GAM using all three categories of mobility data also accounted for case occurrence and was favored (Akaike Information Criterion, AIC: 2504) over models using each category separately (AIC: “driving,” 2511; “transit,” 2513; “walking,” 2508). Conclusion These results suggest an association between mobility and case occurrence. Further examination of the relationship between movement restrictions and COVID-19 transmission may be pertinent. The study shows how new sources of online data can be used to investigate problems in epidemiology.
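One standard way to read the reported AIC values is through AIC differences and Akaike weights, which rescale exp(-ΔAIC/2) so that the candidate models' weights sum to one. The abstract does not report these; the sketch below only recomputes them from the four AIC values given, assuming all models were fitted to the same data.

```python
import math


def akaike_weights(aics):
    """Relative likelihoods exp(-delta/2), normalized to sum to 1."""
    best = min(aics.values())
    rel = {name: math.exp(-0.5 * (aic - best)) for name, aic in aics.items()}
    total = sum(rel.values())
    return {name: r / total for name, r in rel.items()}


# AIC values reported in the abstract
aics = {"all three": 2504, "driving": 2511, "transit": 2513, "walking": 2508}
deltas = {name: aic - min(aics.values()) for name, aic in aics.items()}
weights = akaike_weights(aics)
```

The combined model receives a weight of about 0.85, with "walking" (ΔAIC = 4) the strongest of the single-category models.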


Forecasting, 2021, Vol 3 (1), pp. 39-55
Author(s): Rodgers Makwinja, Seyoum Mengistou, Emmanuel Kaunda, Tena Alemiew, Titus Bandulo Phiri, ...

Forecasting with time series data has become the most relevant and effective tool for fisheries stock assessment. Autoregressive integrated moving average (ARIMA) models have been commonly used to predict the general trend of fish landings with increased reliability and precision. In this paper, ARIMA models were applied to predict Lake Malombe annual fish landings and catch per unit effort (CPUE). The annual fish landings and CPUE series were first inspected and found to be non-stationary, so first-order differencing was applied to make them stationary. The autocorrelation function (AC), partial autocorrelation function (PAC), Akaike information criterion (AIC), Bayesian information criterion (BIC), root mean square error (RMSE), mean absolute error (MAE), percentage standard error of prediction (SEP), average relative variance (ARV), Gaussian maximum likelihood estimation (GMLE), efficiency coefficient (E2), coefficient of determination (R2), and persistence index (PI) were estimated, which led to the identification and construction of ARIMA models suitable for explaining the time series and forecasting. According to the measures of forecasting accuracy, the best forecasting models for fish landings and CPUE were ARIMA (0,1,1) and ARIMA (0,1,0), respectively. These models had the lowest values of AIC, BIC, RMSE, MAE, SEP and ARV, and the highest values of GMLE, PI, R2, and E2. The auto.arima() function in R version 3.6.3 also selected ARIMA (0,1,1) and ARIMA (0,1,0) as the best models. The selected models satisfactorily forecast fish landings of 2725.243 metric tons and a CPUE of 0.097 kg/h by 2024.
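A sketch of how point forecasts are produced by the two selected model forms, assuming no drift term and an MA coefficient θ that has already been estimated (the value used below is hypothetical; in practice θ would be estimated by maximum likelihood, for example with R's arima()). The equation uses the plus sign on the MA term, as in R's parameterization.

```python
def arima_010_forecast(y, h):
    """ARIMA(0,1,0) without drift (random walk): every forecast equals the last value."""
    return [float(y[-1])] * h


def arima_011_forecast(y, theta, h):
    """ARIMA(0,1,1) without drift: y_t - y_{t-1} = e_t + theta * e_{t-1}.

    Innovations are recovered recursively, starting from e = 0 before the first
    difference. Future innovations have mean zero, so every horizon gets the
    same forecast y_T + theta * e_T.
    """
    e = 0.0
    for t in range(1, len(y)):
        e = (y[t] - y[t - 1]) - theta * e
    return [y[-1] + theta * e] * h
```

Both models give flat forecast paths without drift; the ARIMA(0,1,1) forecast is equivalent to simple exponential smoothing with smoothing constant 1 + θ, whereas the random walk simply carries the last observation forward.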


2021, Vol 0 (0)
Author(s): Alexander Schmidt, Karsten Schweikert

Abstract In this paper, we propose a new approach to model structural change in cointegrating regressions using penalized regression techniques. First, we consider a setting with known breakpoint candidates and show that a modified adaptive lasso estimator can consistently estimate structural breaks in the intercept and slope coefficient of a cointegrating regression. Second, we extend our approach to a diverging number of breakpoint candidates and provide simulation evidence that the timing and magnitude of structural breaks are consistently estimated. Third, we use the adaptive lasso estimation to design new tests for cointegration in the presence of multiple structural breaks, derive the asymptotic distribution of our test statistics and show that the proposed tests have power against the null of no cointegration. Finally, we use our new methodology to study the effects of structural breaks on the long-run purchasing power parity (PPP) relationship.
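One way to set up such a regression for penalized estimation, sketched under assumptions: for each candidate breakpoint τ_i, add a step dummy D_it = 1(t ≥ τ_i) for an intercept shift and its product with x_t for a slope shift, then let an (adaptive) lasso shrink the coefficients of candidates that are not real breaks to exactly zero. The functions below only build that design for a single regressor and read off the coefficients in force at a given time; they are not the authors' modified estimator, and the candidate list and coefficients in the test are hypothetical.

```python
def break_design(x, candidates):
    """Design rows [1, x_t, D_1t, D_1t*x_t, ..., D_mt, D_mt*x_t] with D_it = 1 if t >= tau_i.

    The first two columns carry the baseline intercept and slope; each candidate
    breakpoint adds an intercept-shift and a slope-shift column.
    """
    rows = []
    for t, xt in enumerate(x):
        row = [1.0, xt]
        for tau in candidates:
            d = 1.0 if t >= tau else 0.0
            row += [d, d * xt]
        rows.append(row)
    return rows


def regime_coefficients(coefs, candidates, t):
    """Intercept and slope in force at time t, given coefficients in break_design order."""
    intercept, slope = coefs[0], coefs[1]
    for i, tau in enumerate(candidates):
        if t >= tau:
            intercept += coefs[2 + 2 * i]
            slope += coefs[3 + 2 * i]
    return intercept, slope
```

In a penalized fit of this design, a break candidate whose two shift coefficients are both estimated as zero is effectively discarded, so the selected breakpoints can be read directly from the sparsity pattern.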


2017, Vol 48 (1)
Author(s): Thais Destefani Ribeiro, Taciana Villela Savian, Tales Jesus Fernandes, Joel Augusto Muniz

ABSTRACT: The goal of this study was to describe the growth and development of the Asian pear fruit in terms of length, diameter and fresh weight measured over time, using the non-linear Gompertz and Logistic models. The model parameters were estimated in the R statistical software by the least squares method with the iterative Gauss-Newton process (DRAPER & SMITH, 2014). The residual standard deviation, adjusted coefficient of determination and Akaike information criterion were used to compare the models. The residual correlations observed in the length and diameter data were modeled with a second-order autoregressive process to make the residuals independent. The Logistic model was highly suitable for describing the data, revealing a sigmoid growth pattern for the Asian pear fruit, with marked development in all three variables. On average, growth continued up to 125 days for length and diameter and 140 days for fresh weight, reaching 72 mm in length, 80 mm in diameter and 224 g in fresh weight.
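A minimal sketch of a Gauss-Newton least-squares fit of the Logistic curve y = A / (1 + B·exp(-kt)). Assumptions: plain Gauss-Newton with step halving (the abstract names the Gauss-Newton process but not its step control; R's nls() also uses a Gauss-Newton algorithm by default), independent errors (the autoregressive residual model is not included), and synthetic noise-free data whose parameter values are hypothetical.

```python
import math


def logistic(t, A, B, k):
    """Logistic growth curve y = A / (1 + B exp(-k t))."""
    return A / (1.0 + B * math.exp(-k * t))


def logistic_gradient(t, A, B, k):
    """Partial derivatives of the logistic curve with respect to A, B and k."""
    e = math.exp(-k * t)
    den = 1.0 + B * e
    return [1.0 / den, -A * e / den ** 2, A * B * t * e / den ** 2]


def solve(M, v):
    """Solve the square system M x = v by Gaussian elimination with partial pivoting."""
    n = len(M)
    A = [list(row) + [v[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= f * A[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (A[r][n] - sum(A[r][c] * x[c] for c in range(r + 1, n))) / A[r][r]
    return x


def rss(ts, ys, theta):
    try:
        return sum((y - logistic(t, *theta)) ** 2 for t, y in zip(ts, ys))
    except (OverflowError, ZeroDivisionError):
        return math.inf  # treat numerically invalid parameter values as a failed step


def gauss_newton(ts, ys, theta, max_iter=100, tol=1e-12):
    """Least-squares fit of the logistic curve by Gauss-Newton with step halving."""
    theta = list(theta)
    current = rss(ts, ys, theta)
    for _ in range(max_iter):
        J = [logistic_gradient(t, *theta) for t in ts]
        r = [y - logistic(t, *theta) for t, y in zip(ts, ys)]
        JtJ = [[sum(row[i] * row[j] for row in J) for j in range(3)] for i in range(3)]
        Jtr = [sum(row[i] * ri for row, ri in zip(J, r)) for i in range(3)]
        step = solve(JtJ, Jtr)  # Gauss-Newton direction from the normal equations
        scale = 1.0
        while scale > 1e-10:
            candidate = [p + scale * s for p, s in zip(theta, step)]
            new = rss(ts, ys, candidate)
            if new <= current:
                break
            scale /= 2.0
        else:
            break  # no step reduced the RSS: stop
        improvement = current - new
        theta, current = candidate, new
        if improvement <= tol:
            break
    return theta, current


days = [float(d) for d in range(0, 201, 10)]
length = [logistic(t, 72.0, 20.0, 0.05) for t in days]  # synthetic, noise-free
(A_hat, B_hat, k_hat), final_rss = gauss_newton(days, length, (65.0, 18.0, 0.045))
```

Step halving keeps every accepted iteration from increasing the RSS, which guards against the overshooting that plain Gauss-Newton can show when the starting values are far from the solution.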

