The prediction of live weight of hair goats through penalized regression methods: LASSO and adaptive LASSO

Abstract. The least absolute shrinkage and selection operator (LASSO) and the adaptive LASSO have become popular methods in the last decade, especially for data with a multicollinearity problem. This study was conducted to estimate the live weight (LW) of Hair goats from biometric measurements and to select variables in order to reduce model complexity, using two penalized regression methods: LASSO and adaptive LASSO with γ = 0.5 and γ = 1. The data were obtained from 132 adult goats in the Honaz district of Denizli province. Age, gender, forehead width, ear length, head length, chest width, rump height, withers height, back height, chest depth, chest girth, and body length were used as explanatory variables. The adjusted coefficient of determination (R²adj), root mean square error (RMSE), Akaike's information criterion (AIC), Schwarz Bayesian criterion (SBC), and average square error (ASE) were used to compare the effectiveness of the methods. It was concluded that adaptive LASSO (γ = 1) estimated the LW with the highest accuracy for both male (R²adj = 0.9048; RMSE = 3.6250; AIC = 79.2974; SBC = 65.2633; ASE = 7.8843) and female (R²adj = 0.7668; RMSE = 4.4069; AIC = 392.5405; SBC = 308.9888; ASE = 18.2193) Hair goats when all the criteria were considered.
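The role of γ in the adaptive LASSO can be illustrated with a minimal sketch. It assumes the usual formulation in which each coefficient gets a penalty weight of 1/|initial estimate|^γ, and it uses a plain coordinate-descent solver written for this illustration. The function names, the synthetic data and the value of `lam` are all hypothetical; this is not the software or the data used in the study.

```python
import random


def soft_threshold(z, t):
    """Soft-thresholding operator: sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def weighted_lasso(X, y, lam, weights, n_iter=200):
    """Coordinate descent for (1/2n)*||y - Xb||^2 + lam * sum_j weights[j]*|b_j|.

    X is a list of n rows (each a list of p floats); y is a list of n floats.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - Xb at the starting point b = 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of column j with the residual that excludes column j.
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam * weights[j]) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta


def adaptive_lasso(X, y, lam, gamma=1.0):
    """Adaptive LASSO: penalty weights are 1 / |initial estimate|^gamma."""
    p = len(X[0])
    # With lam = 0 the same routine converges to the least-squares fit,
    # which serves here as the initial estimate (an assumption of this sketch).
    beta_init = weighted_lasso(X, y, 0.0, [0.0] * p)
    weights = [1.0 / (abs(b) ** gamma + 1e-12) for b in beta_init]
    return weighted_lasso(X, y, lam, weights)


# Hypothetical synthetic data: only predictors 0 and 3 carry signal.
rng = random.Random(0)
true_beta = [3.0, 0.0, 0.0, -2.0, 0.0]
X = [[rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(200)]
y = [sum(x * b for x, b in zip(row, true_beta)) + rng.gauss(0.0, 0.5) for row in X]

beta_hat = adaptive_lasso(X, y, lam=0.05, gamma=1.0)
print(beta_hat)
```

Predictors with a small initial estimate receive a large weight and are more readily set to exactly zero, which is how the method performs variable selection; a larger γ makes that contrast sharper.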

Nonlinear modeling growth body weight of Mangalarga Marchador horses

Ciência Rural ◽

10.1590/0103-8478cr20160636 ◽

2017 ◽

Vol 47 (4) ◽

Cited By ~ 2

Author(s):

Felipe Amorim Caetano Souza ◽

Tales Jesus Fernandes ◽

Raquel Silva de Moura ◽

Sarah Laguna Conceição Meirelles ◽

Rafaela Aparecida Ribeiro ◽

...

Keyword(s):

Body Weight ◽

Linear Models ◽

Nonlinear Modeling ◽

Growth Curves ◽

Live Weight ◽

Information Criterion ◽

Coefficient Of Determination ◽

Cross Sectional ◽

Adult Body Weight ◽

The Cross

ABSTRACT: The analysis of the growth and development of various species has been done using the growth curves of the specific animal based on non-linear models. The objective of the current study was to evaluate the fit of the Brody, Gompertz, Logistic and von Bertalanffy models to cross-sectional live weight data of Mangalarga Marchador horses, in order to identify the best model and make accurate predictions regarding growth and maturity in the males and females of this breed. The study involved recording the weight of 214 horses, of which 94 were males and 120 were non-pregnant females, between 6 and 153 months of age. The parameters of the models were estimated by the method of least squares, using the iteratively regularized Gauss-Newton method in the R software. Comparison of the models was based on the following criteria: coefficient of determination (R²), residual standard deviation (RSD) and corrected Akaike information criterion (AICc). The adult weight estimated by the models ranged between 431 kg and 439 kg for males and between 416 kg and 420 kg for females. The growth curves were studied using the cross-sectional data collection method. For males the von Bertalanffy model was found to be the most effective in expressing growth, while for females the Brody model was more suitable. Mangalarga Marchador females achieve adult body weight earlier than the males.
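For orientation, the four growth curves and the AICc can be written down in a few lines. The sketch assumes the three-parameter forms commonly used in animal science (A is the asymptotic adult weight, B an integration constant, k the maturity rate) and the least-squares form of AIC. The abstract does not state its parametrization, so these forms and the parameter values in the loop are assumptions.

```python
import math


def brody(t, A, B, k):
    """Brody curve: A * (1 - B * exp(-k t))."""
    return A * (1.0 - B * math.exp(-k * t))


def gompertz(t, A, B, k):
    """Gompertz curve: A * exp(-B * exp(-k t))."""
    return A * math.exp(-B * math.exp(-k * t))


def logistic(t, A, B, k):
    """Logistic curve: A / (1 + B * exp(-k t))."""
    return A / (1.0 + B * math.exp(-k * t))


def von_bertalanffy(t, A, B, k):
    """von Bertalanffy curve: A * (1 - B * exp(-k t))**3."""
    return A * (1.0 - B * math.exp(-k * t)) ** 3


def aicc(rss, n, k):
    """Corrected AIC for a least-squares fit.

    rss: residual sum of squares, n: number of observations,
    k: number of estimated parameters (the caller decides whether the
    residual variance is counted).
    """
    aic = n * math.log(rss / n) + 2 * k
    return aic + 2 * k * (k + 1) / (n - k - 1)


# Hypothetical parameter values: every curve approaches A as age increases.
for model in (brody, gompertz, logistic, von_bertalanffy):
    print(model.__name__, round(model(150.0, 435.0, 0.9, 0.05), 2))
```

In all four forms the fitted A is the estimated adult weight, which is the quantity the abstract reports as ranging from 431 kg to 439 kg for males and from 416 kg to 420 kg for females.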

Assessment of Weighted Quantile Sum Regression for Modeling Chemical Mixtures and Cancer Risk

Cancer Informatics ◽

10.4137/cin.s17295 ◽

2015 ◽

Vol 14s2 ◽

pp. CIN.S17295 ◽

Cited By ~ 9

Author(s):

Jenna Czarnota ◽

Chris Gennings ◽

David C. Wheeler

Keyword(s):

Cancer Risk ◽

Strong Association ◽

Body Burden ◽

Penalized Regression ◽

Environmental Chemicals ◽

Adaptive Lasso ◽

Environmental Chemical ◽

Site Specific ◽

Regression Methods ◽

Correlated Components

In evaluation of cancer risk related to environmental chemical exposures, the effect of many chemicals on disease is ultimately of interest. However, because of potentially strong correlations among chemicals that occur together, traditional regression methods suffer from collinearity effects, including regression coefficient sign reversal and variance inflation. In addition, penalized regression methods designed to remediate collinearity may have limitations in selecting the truly bad actors among many correlated components. The recently proposed method of weighted quantile sum (WQS) regression attempts to overcome these problems by estimating a body burden index, which identifies important chemicals in a mixture of correlated environmental chemicals. Our focus was on assessing through simulation studies the accuracy of WQS regression in detecting subsets of chemicals associated with health outcomes (binary and continuous) in site-specific analyses and in non-site-specific analyses. We also evaluated the performance of the penalized regression methods of lasso, adaptive lasso, and elastic net in correctly classifying chemicals as bad actors or unrelated to the outcome. We based the simulation study on data from the National Cancer Institute Surveillance Epidemiology and End Results Program (NCI-SEER) case-control study of non-Hodgkin lymphoma (NHL) to achieve realistic exposure situations. Our results showed that WQS regression had good sensitivity and specificity across a variety of conditions considered in this study. The shrinkage methods had a tendency to incorrectly identify a large number of components, especially in the case of strong association with the outcome.

High-dimensional regression in practice: an empirical study of finite-sample prediction, variable selection and ranking

Statistics and Computing ◽

10.1007/s11222-019-09914-9 ◽

2019 ◽

Vol 30 (3) ◽

pp. 697-719 ◽

Cited By ~ 1

Author(s):

Fan Wang ◽

Sach Mukherjee ◽

Sylvia Richardson ◽

Steven M. Hill

Keyword(s):

Variable Selection ◽

Large Scale ◽

Penalized Regression ◽

Adaptive Lasso ◽

High Dimensional ◽

Finite Sample ◽

Dantzig Selector ◽

Regression Methods ◽

High Dimensional Regression ◽

Selection And Ranking

Abstract Penalized likelihood approaches are widely used for high-dimensional regression. Although many methods have been proposed and the associated theory is now well developed, the relative efficacy of different approaches in finite-sample settings, as encountered in practice, remains incompletely understood. There is therefore a need for empirical investigations in this area that can offer practical insight and guidance to users. In this paper, we present a large-scale comparison of penalized regression methods. We distinguish between three related goals: prediction, variable selection and variable ranking. Our results span more than 2300 data-generating scenarios, including both synthetic and semisynthetic data (real covariates and simulated responses), allowing us to systematically consider the influence of various factors (sample size, dimensionality, sparsity, signal strength and multicollinearity). We consider several widely used approaches (Lasso, Adaptive Lasso, Elastic Net, Ridge Regression, SCAD, the Dantzig Selector and Stability Selection). We find considerable variation in performance between methods. Our results support a “no panacea” view, with no unambiguous winner across all scenarios or goals, even in this restricted setting where all data align well with the assumptions underlying the methods. The study allows us to make some recommendations as to which approaches may be most (or least) suitable given the goal and some data characteristics. Our empirical results complement existing theory and provide a resource to compare methods across a range of scenarios and metrics.

Information Criterion for Nonparametric Model-Assisted Survey Estimators

Journal of Survey Statistics and Methodology ◽

10.1093/jssam/smy015 ◽

2018 ◽

Vol 7 (3) ◽

pp. 398-421

Author(s):

Addison James ◽

Lan Xue ◽

Virginia Lesser

Keyword(s):

Numerical Study ◽

Additive Model ◽

Information Criterion ◽

Model Complexity ◽

Nonparametric Model ◽

Auxiliary Variables ◽

Polynomial Splines ◽

Explanatory Variables ◽

Nonparametric Models ◽

True Model

Abstract Nonparametric model-assisted estimators have been proposed to improve estimates of finite population parameters. Flexible nonparametric models provide more reliable estimators when a parametric model is misspecified. In this article, we propose an information criterion to select appropriate auxiliary variables to use in an additive model-assisted method. We approximate the additive nonparametric components using polynomial splines and extend the Bayesian Information Criterion (BIC) for finite populations. By removing irrelevant auxiliary variables, our method reduces model complexity and decreases estimator variance. We establish that the proposed BIC is asymptotically consistent in selecting the important explanatory variables when the true model is additive without interactions, a result supported by our numerical study. Our proposed method is easier to implement and better justified theoretically than the existing method proposed in the literature.

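Comparison of Least Squares and Some Biased Estimators in the Case of Multicollinearity (original Turkish title: Çoklu Doğrusal Bağlantı Durumunda En Küçük Kareler ve Bazı Yanlı Tahmin Edicilerin Karşılaştırılması)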

Turkish Journal of Agriculture - Food Science and Technology ◽

10.24925/turjaf.v8i3.793-799.3405 ◽

2020 ◽

Vol 8 (3) ◽

pp. 793

Author(s):

Furkan Yılmaz ◽

Lütfi Bayyurt ◽

Samet Hasan Abacı ◽

Yalçın Tahtalı

Keyword(s):

Body Weight ◽

Mean Square Error ◽

Principal Components ◽

Ridge Regression ◽

Live Weight ◽

Body Measurements ◽

Mean Square ◽

Principal Components Regression ◽

Regression Methods ◽

Chest Girth

The aim of this study is to compare the least squares (LS) method, which breaks down in the presence of multicollinearity, with Ridge Regression (RR) and Principal Components Regression (PCR), which are biased estimators. For this purpose, the effects of several body measurements on body weight (BW) were examined: body length (BL), height at withers (HW), height at rump (HR), chest depth (CD), chest girth (CG) and chest width (CW), obtained from 59 Saanen kids at the weaning period raised at the Research Farm of Tokat Gaziosmanpaşa University. The coefficient of determination (R²) and mean square error (MSE) were used to evaluate the estimation performance of the methods. The multicollinearity between height at withers (HW) and height at rump (HR), which were used to estimate body weight, was eliminated by using RR and PCR. When the R² and MSE values of the examined methods were compared, the RR method gave better results for the live weight of Saanen goats.

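Factors Affecting Peanut Farming Income in Semau Sub-district, Kupang Regency (original Indonesian title: FAKTOR-FAKTOR YANG MEMPENGARUHI PENDAPATAN USAHATANI KACANG TANAH DI KECAMATAN SEMAU KABUPATEN KUPANG)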

Buletin Ilmiah IMPAS ◽

10.35508/impas.v21i01.2607 ◽

2020 ◽

Vol 21 (1) ◽

pp. 25-33

Author(s):

Deni K.L. Mudin ◽

Paulus Un ◽

Lika Bernadina

Keyword(s):

Social Life ◽

Economic Value ◽

Coefficient Of Determination ◽

Labor Costs ◽

Land Area ◽

Average Income ◽

Dry Land ◽

Regression Methods ◽

The Cost ◽

Total Average

ABSTRACT Peanuts are one of the high economic value commodities in the dry land area. This commodity also contributes to the social life of the dry land area. This research was conducted in Semau Sub-district, Kupang Regency, with the aim of determining the amount of income, the break-even point (BEP), the R/C ratio, the efficiency of capital use and the factors that affect the income of peanut farming, using 92 farmer respondents selected by simple random sampling. Data were collected by survey, literature study and interview methods and were analyzed quantitatively and descriptively using regression methods. The results showed that the total average income of peanut farming in the study location was IDR 1,739,895, with a total average revenue of IDR 3,498,261 and a total average cost of IDR 1,758,366. The average break-even point of production is 147 kg and the break-even point price is IDR 6,509, while the total average R/C ratio is 1.99. The factors that affect income are production (X1), seed costs (X2), and labor costs (X3). From the regression results with the Cobb-Douglas function, the coefficient of determination (R²) is 0.822, meaning that variation in the independent variables (production, seed costs and labor costs) explains 82.20% of the variation in the dependent variable, income (Y), and the remaining 17.80% is explained by variables outside those analyzed. The F test (simultaneous test) showed that the factors X1, X2, and X3 had a significant effect on income at α = 1%, so H1 (at least one βi ≠ 0) was accepted. The t test (partial test) showed that the factors with a significant effect on income were production (X1) and labor costs (X3), while seed costs (X2) did not significantly affect income.
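The reported averages can be cross-checked with a short calculation. The sketch reads the IDR 3,498,261 figure as average revenue, which is an interpretation: the abstract does not label it that way, but it is the reading under which the figures reconcile. The variable names are chosen for illustration.

```python
# Average figures quoted in the abstract, in IDR.
average_revenue = 3_498_261  # assumed to be revenue, not income
average_cost = 1_758_366

# Income is revenue minus cost; the R/C ratio is revenue divided by cost.
average_income = average_revenue - average_cost
rc_ratio = average_revenue / average_cost

print(average_income)
print(round(rc_ratio, 2))
```

An R/C ratio above 1 means that revenue exceeds cost, so the activity is profitable on average.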

The Relationship between Mobility and COVID-19 in Germany: Modeling Case Occurrence using Apple's Mobility Trends Data

Methods of Information in Medicine ◽

10.1055/s-0041-1726276 ◽

2021 ◽

Author(s):

Mark David Walker ◽

Mihály Sulyok

Keyword(s):

Generalized Additive Models ◽

Information Criterion ◽

Additive Models ◽

Online Data ◽

German Government ◽

Explanatory Variables ◽

Mobility Data ◽

Community Mobility ◽

The Relationship ◽

Potential Use

Abstract Background: Restrictions on social interaction and movement were implemented by the German government in March 2020 to reduce the transmission of coronavirus disease 2019 (COVID-19). Apple's “Mobility Trends” (AMT) data detail levels of community mobility; they are a novel resource of potential use to epidemiologists. Objective: The aim of the study is to use AMT data to examine the relationship between mobility and COVID-19 case occurrence for Germany. Is a change in mobility apparent following COVID-19 and the implementation of social restrictions? Is there a relationship between mobility and COVID-19 occurrence in Germany? Methods: AMT data illustrate mobility levels throughout the epidemic, allowing the relationship between mobility and disease to be examined. Generalized additive models (GAMs) were established for Germany, with mobility categories and date as explanatory variables and case numbers as the response. Results: Clear reductions in mobility occurred following the implementation of movement restrictions. There was a negative correlation between mobility and confirmed case numbers. A GAM using all three categories of mobility data also accounted for case occurrence and was preferable (Akaike Information Criterion, AIC: 2504) to models using the categories separately (AIC with “driving”: 2511; “transit”: 2513; “walking”: 2508). Conclusion: These results suggest an association between mobility and case occurrence. Further examination of the relationship between movement restrictions and COVID-19 transmission may be pertinent. The study shows how new sources of online data can be used to investigate problems in epidemiology.
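The AIC comparison can be made concrete by computing differences from the best model and the corresponding Akaike weights. The weights are a standard way of reading AIC differences, but they are an addition for illustration here and are not reported in the abstract. The dictionary keys are labels chosen for this sketch.

```python
import math

# AIC values quoted in the abstract for the four GAMs.
aic = {
    "all three categories": 2504,
    "driving": 2511,
    "transit": 2513,
    "walking": 2508,
}

# Difference of each model's AIC from the smallest (best) AIC.
best_aic = min(aic.values())
delta = {name: value - best_aic for name, value in aic.items()}

# Akaike weights: relative likelihoods exp(-delta/2), normalised to sum to 1.
relative = {name: math.exp(-0.5 * d) for name, d in delta.items()}
total = sum(relative.values())
akaike_weight = {name: r / total for name, r in relative.items()}

for name in aic:
    print(name, delta[name], round(akaike_weight[name], 3))
```

A lower AIC is better, so the combined model is preferred, with the walking-only model the closest alternative.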

Modeling of Lake Malombe Annual Fish Landings and Catch per Unit Effort (CPUE)

Forecasting ◽

10.3390/forecast3010004 ◽

2021 ◽

Vol 3 (1) ◽

pp. 39-55

Author(s):

Rodgers Makwinja ◽

Seyoum Mengistou ◽

Emmanuel Kaunda ◽

Tena Alemiew ◽

Titus Bandulo Phiri ◽

...

Keyword(s):

Time Series ◽

Information Criterion ◽

Coefficient Of Determination ◽

Series Data ◽

Efficiency Coefficient ◽

Arima Models ◽

Catch Per Unit Effort ◽

Annual Fish ◽

Unit Effort ◽

The Mean

Forecasting, using time series data, has become the most relevant and effective tool for fisheries stock assessment. Autoregressive integrated moving average (ARIMA) modeling has been commonly used to predict the general trend for fish landings with increased reliability and precision. In this paper, ARIMA models were applied to predict Lake Malombe annual fish landings and catch per unit effort (CPUE). The annual fish landings and CPUE trends were first observed and both were non-stationary. First-order differencing was applied to transform the non-stationary data into stationary data. Autocorrelation functions (AC), partial autocorrelation function (PAC), Akaike information criterion (AIC), Bayesian information criterion (BIC), square root of the mean square error (RMSE), the mean absolute error (MAE), percentage standard error of prediction (SEP), average relative variance (ARV), Gaussian maximum likelihood estimation (GMLE) algorithm, efficiency coefficient (E2), coefficient of determination (R2), and persistent index (PI) were estimated, which led to the identification and construction of ARIMA models suitable for explaining the time series and forecasting. According to the measures of forecasting accuracy, the best forecasting models for fish landings and CPUE were ARIMA (0,1,1) and ARIMA (0,1,0), respectively. These models had the lowest values of AIC, BIC, RMSE, MAE, SEP, and ARV. The models further displayed the highest values of GMLE, PI, R2, and E2. The “auto.arima()” command in R version 3.6.3 also identified ARIMA (0,1,1) and ARIMA (0,1,0) as the best. The selected models satisfactorily forecasted the fish landings of 2725.243 metric tons and CPUE of 0.097 kg/h by 2024.
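The differencing step and the simplest of the selected models can be sketched directly. ARIMA (0,1,0) without a drift term is a random walk, so its point forecast at every horizon is the last observed value. The series below is hypothetical and is not the Lake Malombe data; the function names are chosen for illustration, and the abstract does not say whether a drift term was included.

```python
def difference(series):
    """First-order differencing: y[t] - y[t-1]."""
    return [b - a for a, b in zip(series, series[1:])]


def undifference(first_value, diffs):
    """Invert first-order differencing by cumulative summation."""
    out = [first_value]
    for d in diffs:
        out.append(out[-1] + d)
    return out


def forecast_arima_010(series, steps):
    """Point forecasts of ARIMA(0,1,0) without drift: repeat the last value."""
    return [series[-1]] * steps


# Hypothetical annual series with a trend (non-stationary in level).
landings = [10.0, 12.0, 11.0, 15.0, 18.0]
diffs = difference(landings)

print(diffs)
print(undifference(landings[0], diffs))
print(forecast_arima_010(landings, steps=3))
```

An ARIMA (0,1,1) model adds one moving-average term on the differenced series, which requires estimating a coefficient and is therefore left to a fitting routine such as the R command mentioned in the abstract.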

Multiple structural breaks in cointegrating regressions: a model selection approach

Studies in Nonlinear Dynamics & Econometrics ◽

10.1515/snde-2020-0063 ◽

2021 ◽

Vol 0 (0) ◽

Author(s):

Alexander Schmidt ◽

Karsten Schweikert

Keyword(s):

Structural Breaks ◽

Penalized Regression ◽

Adaptive Lasso ◽

Test Statistics ◽

New Approach ◽

Multiple Structural Breaks ◽

Long Run ◽

Model Selection Approach ◽

Regression Techniques ◽

Lasso Estimator

Abstract In this paper, we propose a new approach to model structural change in cointegrating regressions using penalized regression techniques. First, we consider a setting with known breakpoint candidates and show that a modified adaptive lasso estimator can consistently estimate structural breaks in the intercept and slope coefficient of a cointegrating regression. Second, we extend our approach to a diverging number of breakpoint candidates and provide simulation evidence that timing and magnitude of structural breaks are consistently estimated. Third, we use the adaptive lasso estimation to design new tests for cointegration in the presence of multiple structural breaks, derive the asymptotic distribution of our test statistics and show that the proposed tests have power against the null of no cointegration. Finally, we use our new methodology to study the effects of structural breaks on the long-run PPP relationship.

The use of the nonlinear models in the growth of pears of ‘Shinseiki’ cultivar

Ciência Rural ◽

10.1590/0103-8478cr20161097 ◽

2017 ◽

Vol 48 (1) ◽

Cited By ~ 4

Author(s):

Thais Destefani Ribeiro ◽

Taciana Villela Savian ◽

Tales Jesus Fernandes ◽

Joel Augusto Muniz

Keyword(s):

Logistic Model ◽

Least Squares Method ◽

Nonlinear Models ◽

Logistic Models ◽

Information Criterion ◽

Fruit Weight ◽

Coefficient Of Determination ◽

Pear Fruit ◽

Asian Pear ◽

Residual Correlations

ABSTRACT: The goal of this study was to elucidate the growth and development of the Asian pear fruit, on the grounds of length, diameter and fresh weight determined over time, using the non-linear Gompertz and Logistic models. The specifications of the models were assessed utilizing the R statistical software, via the least squares method and iterative Gauss-Newton process (DRAPER & SMITH, 2014). The residual standard deviation, adjusted coefficient of determination and the Akaike information criterion were used to compare the models. The residual correlations, observed in the data for length and diameter, were modeled using the second-order regression process to render the residuals independent. The logistic model was highly suitable in describing the data, revealing the Asian pear fruit growth to be sigmoid in shape, showing remarkable development for the three variables. It showed an average of up to 125 days for length and diameter and 140 days for fresh fruit weight, with values of 72 mm length, 80 mm diameter and 224 g fresh weight.