WEIGHTED CROSS VALIDATION IN THE SELECTION OF ROBUST REGRESSION MODEL WITH CHANGE-POINT FOR TELEVISION RATING FORECAST

Author(s):  
Carlos Alberto Huaira Contreras ◽  
Carlos Cristiano Hasenclever Borges ◽  
Camila Borelli Zeller ◽  
Amanda Romanelli

The paper proposes a weighted cross-validation (WCV) algorithm to select the linear regression model with change-point, under a scale mixtures of normal (SMN) distribution, that yields the best prediction results. SMN distributions are used to construct regression models that are robust to the influence of outliers on the parameter estimation process. Thus, we relax the usual normality assumption of regression models and consider random errors that follow an SMN distribution, specifically the Student-t distribution. In addition, we allow the parameters of the regression model to change at a specific but unknown point, called the change-point. In this context, the estimates of the model parameters, including the change-point, are obtained via an EM-type (Expectation-Maximization) algorithm. The WCV method is used to select the model that is most robust and yields the smallest prediction error, with the weighting values taken from the E-step of the EM-type algorithm. Finally, numerical examples with simulated and real data (television audience data) illustrate the proposed methodology.
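
To make the weighting concrete, here is a minimal sketch assuming a Student-t error model with fixed degrees of freedom: the E-step produces a weight u_i = (nu + 1)/(nu + delta_i) per observation, small for outliers, and those same weights can downweight held-out errors in cross-validation. All names are illustrative, the change-point machinery is omitted, and the paper's exact weighting scheme may differ.

```python
# Illustrative sketch (not the authors' implementation): EM weights for a
# Student-t linear regression, reused to weight cross-validation errors.
import numpy as np

def t_em_fit(X, y, nu=4.0, n_iter=50):
    """Fit y = X @ beta with Student-t errors via EM; return beta, sigma2, weights."""
    n = len(y)
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    sigma2 = np.var(y - X @ beta)
    u = np.ones(n)
    for _ in range(n_iter):
        # E-step: expected precision weights, small for outlying residuals
        delta = (y - X @ beta) ** 2 / sigma2
        u = (nu + 1.0) / (nu + delta)
        # M-step: weighted least squares
        XtW = X.T * u
        beta = np.linalg.solve(XtW @ X, XtW @ y)
        sigma2 = np.sum(u * (y - X @ beta) ** 2) / n
    return beta, sigma2, u

def weighted_cv_error(X, y, nu=4.0, k=5, seed=0):
    """K-fold CV in which each held-out squared error is EM-weighted."""
    _, _, u_all = t_em_fit(X, y, nu)          # weights from a fit on all data
    idx = np.random.default_rng(seed).permutation(len(y))
    err = 0.0
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        beta, _, _ = t_em_fit(X[train], y[train], nu)
        err += np.sum(u_all[fold] * (y[fold] - X[fold] @ beta) ** 2)
    return err / np.sum(u_all)
```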

2017 ◽  
Vol 47 (5) ◽  
Author(s):  
Priscila Becker Ferreira ◽  
Paulo Roberto Nogara Rorato ◽  
Fernanda Cristina Breda ◽  
Vanessa Tomazetti Michelotti ◽  
Alexandre Pires Rosa ◽  
...  

ABSTRACT: This study aimed to test different genotypic and residual covariance matrix structures in random regression models to model the egg production of Barred Plymouth Rock and White Plymouth Rock hens aged between 5 and 12 months. In addition, we estimated broad-sense heritability and environmental and genotypic correlations. Six random regression models were evaluated, and for each model, 12 genotypic and residual matrix structures were tested. The random regression model with linear intercept, unstructured covariance (UN) for the random-effects matrix, and unstructured correlation (UNR) for the residual matrix adequately modeled the egg production curve of hens of the two study breeds. Genotypic correlations ranged from 0.15 (between the ages of 5 and 12 months) to 0.99 (between the ages of 10 and 11 months) and decreased as the interval between ages increased. Egg production heritability between 5 and 12 months of age increased with age, varying from 0.15 to 0.51. From the age of 9 months onward, heritability was moderate, with estimated genotypic correlations higher than 0.90 at 10, 11, and 12 months of age. Results suggested that selection of hens to improve egg production should commence at the ninth month of age.
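
As a rough illustration of the random regression setup, the sketch below fits a random intercept and slope on age with an unstructured (UN) random-effects covariance using statsmodels; the data and column names are hypothetical, and the residual correlation structures (such as UNR) compared in the study require specialized mixed-model software and are not shown.

```python
# Illustrative sketch: random regression (random intercept + slope on age)
# with an unstructured (UN) random-effects covariance, via statsmodels.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(1)
n_hens, ages = 50, np.arange(5, 13)        # hypothetical hens, ages 5-12 months
df = pd.DataFrame({
    "hen": np.repeat(np.arange(n_hens), len(ages)),
    "age": np.tile(ages, n_hens),
})
df["eggs"] = 20 + 0.8 * df["age"] + rng.normal(0, 2, len(df))  # synthetic counts

exog = sm.add_constant(df[["age"]])        # fixed effects: intercept + age
exog_re = sm.add_constant(df[["age"]])     # random effects: intercept + slope
model = sm.MixedLM(df["eggs"], exog, groups=df["hen"], exog_re=exog_re)
fit = model.fit(reml=True)
print(fit.cov_re)   # 2x2 unstructured random-effects covariance matrix
```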


2019 ◽  
Vol 11 (01n02) ◽  
pp. 1950003
Author(s):  
Fábio Prataviera ◽  
Gauss M. Cordeiro ◽  
Edwin M. M. Ortega ◽  
Adriano K. Suzuki

In several applications, the distribution of the data is frequently unimodal, asymmetric, or bimodal. The regression models commonly used for data with real support are the normal, skew normal, beta normal, and gamma normal, among others. We define a new regression model based on the odd log-logistic geometric normal distribution for modeling asymmetric or bimodal data with support on the real line, which generalizes some known regression models, including the widely used heteroscedastic linear regression. We adopt the maximum likelihood method for estimating the model parameters and define diagnostic measures to detect influential observations. For some parameter settings, sample sizes, and different systematic structures, various simulations are performed to verify the adequacy of the estimators of the model parameters. The empirical distribution of the quantile residuals is investigated and compared with the standard normal distribution. We demonstrate empirically the usefulness of the proposed models by means of three applications to real data.
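
The quantile-residual check mentioned above is generic: transform each response by the fitted CDF and then by the standard normal quantile function; under a well-fitting model the residuals are approximately standard normal. A minimal sketch, with a plain normal CDF standing in for the fitted OLLGN distribution:

```python
# Illustrative sketch of quantile residuals: r_i = Phi^{-1}(F(y_i; theta_hat)).
# Any fitted CDF can be plugged in; a normal CDF stands in for the OLLGN here.
import numpy as np
from scipy import stats

def quantile_residuals(y, fitted_cdf):
    """Map responses through the fitted CDF, then the standard normal quantile."""
    u = np.clip(fitted_cdf(y), 1e-10, 1 - 1e-10)  # guard against exact 0 or 1
    return stats.norm.ppf(u)

# Usage: under a correct model the residuals should look standard normal.
y = stats.norm.rvs(loc=2.0, scale=1.5, size=500, random_state=0)
r = quantile_residuals(y, lambda v: stats.norm.cdf(v, loc=2.0, scale=1.5))
print(stats.kstest(r, "norm"))  # KS test against N(0, 1)
```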


2016 ◽  
Vol 5 (3) ◽  
pp. 9 ◽  
Author(s):  
Elizabeth M. Hashimoto ◽  
Gauss M. Cordeiro ◽  
Edwin M.M. Ortega ◽  
G.G. Hamedani

We propose and study a new log-gamma Weibull regression model. We obtain explicit expressions for the raw and incomplete moments, quantile and generating functions, and mean deviations of the log-gamma Weibull distribution. We demonstrate that the new regression model can be applied to censored data, since it represents a parametric family that includes several widely known regression models as sub-models, and can therefore be used more effectively in the analysis of survival data. We obtain the maximum likelihood estimates of the model parameters for censored data and evaluate local influence on the parameter estimates under different perturbation schemes. Some global influence measures are also investigated. Further, various simulations are performed for different parameter settings, sample sizes, and censoring percentages. In addition, the empirical distribution of some modified residuals is displayed and compared with the standard normal distribution. These studies suggest that the residual analysis usually performed in normal linear regression models can be extended to a modified deviance residual in the proposed regression model applied to censored data. We demonstrate that our extended regression model is very useful for the analysis of real data and may give more realistic fits than other special regression models.
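
The censored-likelihood machinery is standard: uncensored observations contribute the log-density and right-censored ones the log-survival function. A minimal sketch with a plain Weibull regression as a simplified stand-in for the log-gamma Weibull model (all names and data are illustrative):

```python
# Illustrative sketch of censored maximum likelihood for a Weibull regression
# (a simplified stand-in for the log-gamma Weibull model discussed above).
import numpy as np
from scipy.optimize import minimize

def neg_loglik(params, X, t, event):
    """event = 1 for an observed failure, 0 for a right-censored time."""
    beta, log_k = params[:-1], params[-1]
    k = np.exp(log_k)                       # Weibull shape > 0
    lam = np.exp(X @ beta)                  # scale via log link
    z = (t / lam) ** k
    log_f = np.log(k) - np.log(lam) + (k - 1) * (np.log(t) - np.log(lam)) - z
    log_S = -z                              # log survival function
    return -np.sum(event * log_f + (1 - event) * log_S)

# Usage with toy censored data:
rng = np.random.default_rng(0)
n = 300
X = np.column_stack([np.ones(n), rng.normal(size=n)])
t_true = rng.weibull(1.5, n) * np.exp(X @ np.array([1.0, 0.5]))
c = rng.exponential(5.0, n)                 # censoring times
t, event = np.minimum(t_true, c), (t_true <= c).astype(float)
res = minimize(neg_loglik, x0=np.zeros(3), args=(X, t, event), method="BFGS")
print(res.x)   # [beta0, beta1, log shape]
```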


Author(s):  
JING-RUNG YU ◽  
GWO-HSHIUNG TZENG ◽  
HAN-LIN LI

To handle data with large variation, an interval piecewise regression method with automatic change-point detection by quadratic programming is proposed as an alternative to Tanaka and Lee's method. Their unified quadratic programming approach can alleviate the tendency of some coefficients to become crisp in possibilistic regression by linear programming, and it obtains the possibility and necessity models at the same time. However, that method cannot guarantee the existence of a necessity model if a proper regression model is not assumed, especially when the data have large variations. Using automatic change-point detection, the proposed method guarantees a necessity model with a better measure of fitness by accounting for variability in the data. Without piecewise terms in the estimated model, the proposed method reduces to Tanaka and Lee's model. The proposed method is therefore an alternative for handling data with large variations: it not only reduces the number of crisp coefficients of the possibility model in linear programming, but also simultaneously obtains the fuzzy regression models, including possibility and necessity models, with better fitness. Two examples are presented to demonstrate the proposed method.
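
A sketch of the possibility-model side of such a quadratic-programming formulation, in the Tanaka style: interval coefficients with centers c and nonnegative spreads d must cover every observation, while the objective trades squared central residuals against total spread. This is a simplified illustration; the piecewise change-point terms and the necessity model are omitted, and the trade-off weight lam is a modeling choice, not a value from the paper.

```python
# Illustrative sketch (simplified): possibilistic interval regression as a QP.
# Each y_i must fall inside [c'x_i - d'|x_i|, c'x_i + d'|x_i|]; the objective
# trades squared central residuals against total spread.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(2)
n, p = 40, 2
X = np.column_stack([np.ones(n), rng.uniform(0, 10, n)])
y = 3 + 1.2 * X[:, 1] + rng.normal(0, 1 + 0.2 * X[:, 1], n)  # widening scatter

c = cp.Variable(p)           # interval centers
d = cp.Variable(p)           # interval spreads
lam = 1.0                    # fit/spread trade-off (a modeling choice)
center = X @ c
width = np.abs(X) @ d
obj = cp.Minimize(cp.sum_squares(y - center) + lam * cp.sum(width))
cons = [d >= 0, y <= center + width, y >= center - width]   # coverage
cp.Problem(obj, cons).solve()
print(c.value, d.value)
```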


2005 ◽  
Vol 30 (2) ◽  
pp. 169-187 ◽  
Author(s):  
David Kaplan

This article considers the problem of estimating dynamic linear regression models when the data are generated from a finite mixture probability density function whose mixture components are characterized by different dynamic regression model parameters. Specifically, conventional linear models assume that the data are generated by a single probability density function characterized by a single set of regression model parameters. However, when the true generating model is a finite mixture density function, estimating conventional linear models under the assumption of a single density function may lead to erroneous conclusions. Instead, it may be desirable to estimate the regression model under the assumption that the data derive from a finite mixture density function and to examine differences in the model parameters within each mixture component. Dynamic regression models, and subsequent dynamic response analysis using dynamic multipliers, are also likely to be affected by the existence of a finite mixture density, because dynamic multipliers are functions of the regression model parameters. Applying finite mixture modeling to two real data examples, this article shows that dynamic responses to changes in exogenous variables can be quite different depending on the number and nature of the underlying mixture components. Implications for substantive conclusions based on the use of dynamic multipliers are discussed.
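
A compact sketch of the core idea, assuming two mixture components: an EM loop alternates between posterior component responsibilities and responsibility-weighted least-squares fits, so each component carries its own regression parameters (and hence its own dynamic multipliers). Names are illustrative.

```python
# Illustrative sketch: EM for a two-component mixture of linear regressions.
import numpy as np
from scipy.stats import norm

def mixreg_em(X, y, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    n, p = X.shape
    beta = rng.normal(size=(2, p))          # per-component coefficients
    sigma = np.array([np.std(y)] * 2)       # per-component error scales
    pi = np.array([0.5, 0.5])               # mixing proportions
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each point
        dens = np.stack([pi[k] * norm.pdf(y, X @ beta[k], sigma[k])
                         for k in range(2)])
        r = dens / (dens.sum(axis=0, keepdims=True) + 1e-300)
        # M-step: responsibility-weighted least squares per component
        for k in range(2):
            XtW = X.T * r[k]
            beta[k] = np.linalg.solve(XtW @ X, XtW @ y)
            sigma[k] = np.sqrt(np.sum(r[k] * (y - X @ beta[k]) ** 2) / r[k].sum())
        pi = r.mean(axis=1)
    return beta, sigma, pi, r
```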


2021 ◽  
Author(s):  
Jose Pina-Sánchez ◽  
David Buil-Gil ◽  
Ian Brunton-Smith ◽  
Alexandru Cernat

Objectives: Assess the extent to which measurement error in police recorded crime rates impacts the estimates of regression models exploring the causes and consequences of crime.

Methods: We focus on linear models where crime rates are included either as the response or as an explanatory variable, in their original scale or log-transformed. Two measurement error mechanisms are considered: systematic errors in the form of under-recorded crime, and random errors in the form of recording inconsistencies across areas. The extent to which these measurement error mechanisms impact model parameters is demonstrated algebraically, using formal notation, and graphically, using simulations.

Results: Most coefficients and measures of uncertainty from models where crime rates are included in their original scale are severely biased. However, in many cases this problem can be minimised, or altogether eliminated, by log-transforming crime rates. This transforms the multiplicative measurement error observed in police recorded crime rates into a less harmful additive mechanism.

Conclusions: The validity of findings from regression models where police recorded crime rates are used in their original scale is called into question. In interpreting the large evidence base exploring the effects and consequences of crime using police statistics, we urge researchers to consider the biasing effects shown here. Equally, we urge researchers to log-transform crime rates before introducing them into statistical models.
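
A small simulation of the key mechanism, with invented numbers: under-recording that acts multiplicatively (recorded = true rate x recording fraction) attenuates the raw-scale slope but becomes a benign additive shift after the log transform.

```python
# Illustrative simulation: multiplicative recording error biases the raw-scale
# slope, while log-transforming turns it into a harmless additive error term.
import numpy as np

rng = np.random.default_rng(3)
n = 5000
x = rng.normal(size=n)                           # some area-level covariate
true_rate = np.exp(1.0 + 0.5 * x + rng.normal(0, 0.3, n))
recording = rng.beta(8, 2, n)                    # fraction of crime recorded
recorded = true_rate * recording                 # multiplicative error

def slope(u, v):
    return np.polyfit(u, v, 1)[0]

print("raw-scale slope, true vs recorded:",
      slope(x, true_rate), slope(x, recorded))          # attenuated
print("log-scale slope, true vs recorded:",
      slope(x, np.log(true_rate)), slope(x, np.log(recorded)))  # ~unchanged
```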


1989 ◽  
Vol 19 (2) ◽  
pp. 179-184 ◽  
Author(s):  
David L. Verbyla ◽  
Richard F. Fisher

The conventional approach in site-quality studies has been to develop a multiple regression site index model with soil–site measurements from randomly selected plots. This approach has several weaknesses: (i) a potential prediction bias associated with most stepwise regression procedures; (ii) low precision of soil–site regression models developed in areas with diverse topography and geologic formations; and (iii) poor representation of rare prime sites by random sampling. An alternative approach, aimed at minimizing these problems, is presented. The potential for prediction bias (due to overfitting a model with too many predictor variables) can be reduced by using cross-validation during model development. Models that accurately predict prime sites can be more useful than imprecise soil–site regression models, and rare prime sites can be better represented by stratified random sampling from prime and nonprime site areas. Classification-tree analysis was used to develop a model that predicts prime ponderosa pine (Pinus ponderosa Laws.) sites on the basis of vegetation and soil variables. Forest habitat type, percent sand content, and soil pH were model predictor variables. Cross-validation estimated the accuracy of the classification tree at 88%. A multiple regression model developed from randomly selected plots consistently underestimated site index when applied to plots randomly selected from prime site areas. The conventional regression model was also misleading because it contained a predictor variable that did not differ significantly between prime and nonprime sites.
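
A sketch of the tree-plus-cross-validation workflow with hypothetical predictors (coded habitat type, percent sand, soil pH) and synthetic labels; only the general technique, not the study's data or fitted tree, is reproduced.

```python
# Illustrative sketch: classification tree for prime/nonprime sites with
# cross-validated accuracy. Predictors and labels are synthetic.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(4)
n = 200
habitat = rng.integers(0, 4, n)            # coded forest habitat type
sand = rng.uniform(20, 80, n)              # percent sand content
ph = rng.uniform(4.5, 7.5, n)              # soil pH
prime = ((habitat == 2) & (sand < 50) & (ph > 5.5)).astype(int)

X = np.column_stack([habitat, sand, ph])
tree = DecisionTreeClassifier(max_depth=3, random_state=0)  # shallow tree limits overfitting
scores = cross_val_score(tree, X, prime, cv=10)             # 10-fold cross-validation
print("estimated accuracy:", scores.mean())
```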


1976 ◽  
Vol 1 (3) ◽  
pp. 253-277 ◽  
Author(s):  
Herbert J. Walberg ◽  
Sue Pinzur Rasher

This paper illustrates cut-and-try techniques that point to appropriate transformations of variables and to the selection of sets of variables for an equation that may improve understanding of a social process. The substance of the research reported, the relation of mental test results to state population, cultural, and school resource indexes (Walberg and Rasher, 1974), illustrates typical problems of behavioral data: multicollinearity, outliers, non-normal distributions, and the lack of a consensually validated, explicit theoretical model. Despite these problems, data originally collected for purposes other than the investigator's may yield tentative confirmations of, or cautions about, prior findings, as well as provisional indications for theory or policy; such inferences may be at least partially checked by cross-validation on independent or semi-independent sets of data. After discussing the sequence of analyses and the results, we conclude by mentioning a number of uncertainties and reservations about drawing substantive or policy implications.


Author(s):  
Yoshiyuki Yabuuchi ◽  
Junzo Watada

Since management and economic systems are complex, the data obtained in management and economic areas are hard to handle, and much research in these fields has focused on the structure and analysis of such data. H. Tanaka et al. proposed a fuzzy regression model to illustrate the potential possibilities inherent in the target system. J. C. Bezdek proposed a switching regression model based on a fuzzy clustering model, which separates mixed samples coming from plural latent systems and applies regression models to the groups of samples coming from each system. However, it remains hard for such models to illustrate a rough and moderate possibility of the target system. In this paper, to deal with the possibility of a social system, we propose a new fuzzy robust regression model.
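
A sketch of the switching-regression idea credited to Bezdek above, not of the authors' new model: fuzzy c-means memberships softly separate the mixed samples, and each latent system then gets a membership-weighted least-squares line. All names are illustrative.

```python
# Illustrative sketch: fuzzy-clusterwise (switching) regression. Fuzzy c-means
# memberships are computed on (x, y) points, then each cluster gets a
# membership-weighted least-squares line.
import numpy as np

def fcm_switching_regression(x, y, c=2, m=2.0, n_iter=100, seed=0):
    pts = np.column_stack([x, y])
    rng = np.random.default_rng(seed)
    centers = pts[rng.choice(len(pts), c, replace=False)]
    for _ in range(n_iter):
        # fuzzy c-means updates: memberships u, then fuzzified centers
        dist = np.linalg.norm(pts[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        u = 1.0 / dist ** (2.0 / (m - 1.0))
        u /= u.sum(axis=1, keepdims=True)        # rows sum to 1
        w = u ** m
        centers = (w.T @ pts) / w.sum(axis=0)[:, None]
    # one membership-weighted regression line per latent system
    X = np.column_stack([np.ones_like(x), x])
    betas = []
    for k in range(c):
        XtW = X.T * w[:, k]
        betas.append(np.linalg.solve(XtW @ X, XtW @ y))
    return np.array(betas), u
```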

