WEIGHTED CROSS VALIDATION IN THE SELECTION OF ROBUST REGRESSION MODEL WITH CHANGE-POINT FOR TELEVISION RATING FORECAST

Author(s):  
Carlos Alberto Huaira Contreras ◽  
Carlos Cristiano Hasenclever Borges ◽  
Camila Borelli Zeller ◽  
Amanda Romanelli

The paper proposes a weighted cross-validation (WCV) algorithm to select the linear regression model with change-point, under a scale mixtures of normal (SMN) distribution, that yields the best prediction results. SMN distributions are used to construct regression models that are robust to the influence of outliers on the parameter estimation process. Thus, we relax the usual normality assumption of regression models and consider random errors that follow an SMN distribution, specifically the Student-t distribution. In addition, we allow the parameters of the regression model to change at a specific but unknown point, called the change-point. In this context, the estimates of the model parameters, including the change-point, are obtained via an EM-type (Expectation-Maximization) algorithm. The WCV method is used to select the model that is most robust and yields the smallest prediction error, with the weighting values taken from the E-step of the EM-type algorithm. Finally, numerical examples with simulated and real data (television audience data) illustrate the proposed methodology.
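
To make the weighting concrete, here is a minimal sketch assuming a Student-t error model with fixed degrees of freedom: the E-step produces a weight u_i = (nu + 1)/(nu + delta_i) per observation, small for outliers, and those same weights can downweight held-out errors in cross-validation. All names are illustrative, the change-point machinery is omitted, and the paper's exact weighting scheme may differ.

```python
# Illustrative sketch (not the authors' implementation): EM weights for a
# Student-t linear regression, reused to weight cross-validation errors.
import numpy as np

def t_em_fit(X, y, nu=4.0, n_iter=50):
    """Fit y = X @ beta with Student-t errors via EM; return beta, sigma2, weights."""
    n = len(y)
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    sigma2 = np.var(y - X @ beta)
    u = np.ones(n)
    for _ in range(n_iter):
        # E-step: expected precision weights, small for outlying residuals
        delta = (y - X @ beta) ** 2 / sigma2
        u = (nu + 1.0) / (nu + delta)
        # M-step: weighted least squares
        XtW = X.T * u
        beta = np.linalg.solve(XtW @ X, XtW @ y)
        sigma2 = np.sum(u * (y - X @ beta) ** 2) / n
    return beta, sigma2, u

def weighted_cv_error(X, y, nu=4.0, k=5, seed=0):
    """K-fold CV in which each held-out squared error is EM-weighted."""
    _, _, u_all = t_em_fit(X, y, nu)          # weights from a fit on all data
    idx = np.random.default_rng(seed).permutation(len(y))
    err = 0.0
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        beta, _, _ = t_em_fit(X[train], y[train], nu)
        err += np.sum(u_all[fold] * (y[fold] - X[fold] @ beta) ** 2)
    return err / np.sum(u_all)
```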

2017 ◽  
Vol 47 (5) ◽  
Author(s):  
Priscila Becker Ferreira ◽  
Paulo Roberto Nogara Rorato ◽  
Fernanda Cristina Breda ◽  
Vanessa Tomazetti Michelotti ◽  
Alexandre Pires Rosa ◽  
...  

ABSTRACT: This study aimed to test different genotypic and residual covariance matrix structures in random regression models to model the egg production of Barred Plymouth Rock and White Plymouth Rock hens aged between 5 and 12 months. In addition, we estimated broad-sense heritability and environmental and genotypic correlations. Six random regression models were evaluated, and for each model, 12 genotypic and residual matrix structures were tested. The random regression model with linear intercept, unstructured covariance (UN) for the random-effects matrix, and unstructured correlation (UNR) for the residual matrix adequately modeled the egg production curve of hens of the two study breeds. Genotypic correlations ranged from 0.15 (between the ages of 5 and 12 months) to 0.99 (between the ages of 10 and 11 months) and decreased as the interval between ages increased. Egg production heritability between 5 and 12 months of age increased with age, varying from 0.15 to 0.51. From the age of 9 months onward, heritability was moderate, with estimated genotypic correlations higher than 0.90 at 10, 11, and 12 months of age. Results suggested that selection of hens to improve egg production should commence at the ninth month of age.
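
As a rough illustration of the random regression setup, the sketch below fits a random intercept and slope on age with an unstructured (UN) random-effects covariance using statsmodels; the data and column names are hypothetical, and the residual correlation structures (such as UNR) compared in the study require specialized mixed-model software and are not shown.

```python
# Illustrative sketch: random regression (random intercept + slope on age)
# with an unstructured (UN) random-effects covariance, via statsmodels.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(1)
n_hens, ages = 50, np.arange(5, 13)        # hypothetical hens, ages 5-12 months
df = pd.DataFrame({
    "hen": np.repeat(np.arange(n_hens), len(ages)),
    "age": np.tile(ages, n_hens),
})
df["eggs"] = 20 + 0.8 * df["age"] + rng.normal(0, 2, len(df))  # synthetic counts

exog = sm.add_constant(df[["age"]])        # fixed effects: intercept + age
exog_re = sm.add_constant(df[["age"]])     # random effects: intercept + slope
model = sm.MixedLM(df["eggs"], exog, groups=df["hen"], exog_re=exog_re)
fit = model.fit(reml=True)
print(fit.cov_re)   # 2x2 unstructured random-effects covariance matrix
```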


2019 ◽  
Vol 11 (01n02) ◽  
pp. 1950003
Author(s):  
Fábio Prataviera ◽  
Gauss M. Cordeiro ◽  
Edwin M. M. Ortega ◽  
Adriano K. Suzuki

In several applications, the distribution of the data is frequently unimodal, asymmetric, or bimodal. The regression models commonly used for data with real support are the normal, skew normal, beta normal, and gamma normal, among others. We define a new regression model based on the odd log-logistic geometric normal distribution for modeling asymmetric or bimodal data with support on the real line, which generalizes some known regression models, including the widely used heteroscedastic linear regression. We adopt the maximum likelihood method for estimating the model parameters and define diagnostic measures to detect influential observations. For some parameter settings, sample sizes, and different systematic structures, various simulations are performed to verify the adequacy of the estimators of the model parameters. The empirical distribution of the quantile residuals is investigated and compared with the standard normal distribution. We demonstrate empirically the usefulness of the proposed models by means of three applications to real data.
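
The quantile-residual check mentioned above is generic: transform each response by the fitted CDF and then by the standard normal quantile function; under a well-fitting model the residuals are approximately standard normal. A minimal sketch, with a plain normal CDF standing in for the fitted OLLGN distribution:

```python
# Illustrative sketch of quantile residuals: r_i = Phi^{-1}(F(y_i; theta_hat)).
# Any fitted CDF can be plugged in; a normal CDF stands in for the OLLGN here.
import numpy as np
from scipy import stats

def quantile_residuals(y, fitted_cdf):
    """Map responses through the fitted CDF, then the standard normal quantile."""
    u = np.clip(fitted_cdf(y), 1e-10, 1 - 1e-10)  # guard against exact 0 or 1
    return stats.norm.ppf(u)

# Usage: under a correct model the residuals should look standard normal.
y = stats.norm.rvs(loc=2.0, scale=1.5, size=500, random_state=0)
r = quantile_residuals(y, lambda v: stats.norm.cdf(v, loc=2.0, scale=1.5))
print(stats.kstest(r, "norm"))  # KS test against N(0, 1)
```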


2016 ◽  
Vol 5 (3) ◽  
pp. 9 ◽  
Author(s):  
Elizabeth M. Hashimoto ◽  
Gauss M. Cordeiro ◽  
Edwin M.M. Ortega ◽  
G.G. Hamedani

We propose and study a new log-gamma Weibull regression model. We obtain explicit expressions for the raw and incomplete moments, quantile and generating functions, and mean deviations of the log-gamma Weibull distribution. We demonstrate that the new regression model can be applied to censored data, since it represents a parametric family that includes several widely known regression models as sub-models, and can therefore be used more effectively in the analysis of survival data. We obtain the maximum likelihood estimates of the model parameters for censored data and evaluate local influence on the parameter estimates under different perturbation schemes. Some global influence measures are also investigated. Further, various simulations are performed for different parameter settings, sample sizes, and censoring percentages. In addition, the empirical distribution of some modified residuals is displayed and compared with the standard normal distribution. These studies suggest that the residual analysis usually performed in normal linear regression models can be extended to a modified deviance residual in the proposed regression model applied to censored data. We demonstrate that our extended regression model is very useful for the analysis of real data and may give more realistic fits than other special regression models.
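
The censored-likelihood machinery is standard: uncensored observations contribute the log-density and right-censored ones the log-survival function. A minimal sketch with a plain Weibull regression as a simplified stand-in for the log-gamma Weibull model (all names and data are illustrative):

```python
# Illustrative sketch of censored maximum likelihood for a Weibull regression
# (a simplified stand-in for the log-gamma Weibull model discussed above).
import numpy as np
from scipy.optimize import minimize

def neg_loglik(params, X, t, event):
    """event = 1 for an observed failure, 0 for a right-censored time."""
    beta, log_k = params[:-1], params[-1]
    k = np.exp(log_k)                       # Weibull shape > 0
    lam = np.exp(X @ beta)                  # scale via log link
    z = (t / lam) ** k
    log_f = np.log(k) - np.log(lam) + (k - 1) * (np.log(t) - np.log(lam)) - z
    log_S = -z                              # log survival function
    return -np.sum(event * log_f + (1 - event) * log_S)

# Usage with toy censored data:
rng = np.random.default_rng(0)
n = 300
X = np.column_stack([np.ones(n), rng.normal(size=n)])
t_true = rng.weibull(1.5, n) * np.exp(X @ np.array([1.0, 0.5]))
c = rng.exponential(5.0, n)                 # censoring times
t, event = np.minimum(t_true, c), (t_true <= c).astype(float)
res = minimize(neg_loglik, x0=np.zeros(3), args=(X, t, event), method="BFGS")
print(res.x)   # [beta0, beta1, log shape]
```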


Author(s):  
JING-RUNG YU ◽  
GWO-HSHIUNG TZENG ◽  
HAN-LIN LI

To handle data with large variation, an interval piecewise regression method with automatic change-point detection by quadratic programming is proposed as an alternative to Tanaka and Lee's method. Their unified quadratic programming approach can alleviate the tendency of some coefficients to become crisp in possibilistic regression by linear programming, and it obtains the possibility and necessity models at the same time. However, that method cannot guarantee the existence of a necessity model if a proper regression model is not assumed, especially when the data have large variations. Using automatic change-point detection, the proposed method guarantees a necessity model with a better measure of fitness by accounting for variability in the data. Without piecewise terms in the estimated model, the proposed method reduces to Tanaka and Lee's model. The proposed method is therefore an alternative for handling data with large variations: it not only reduces the number of crisp coefficients of the possibility model in linear programming, but also simultaneously obtains the fuzzy regression models, including possibility and necessity models, with better fitness. Two examples are presented to demonstrate the proposed method.
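
A sketch of the possibility-model side of such a quadratic-programming formulation, in the Tanaka style: interval coefficients with centers c and nonnegative spreads d must cover every observation, while the objective trades squared central residuals against total spread. This is a simplified illustration; the piecewise change-point terms and the necessity model are omitted, and the trade-off weight lam is a modeling choice, not a value from the paper.

```python
# Illustrative sketch (simplified): possibilistic interval regression as a QP.
# Each y_i must fall inside [c'x_i - d'|x_i|, c'x_i + d'|x_i|]; the objective
# trades squared central residuals against total spread.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(2)
n, p = 40, 2
X = np.column_stack([np.ones(n), rng.uniform(0, 10, n)])
y = 3 + 1.2 * X[:, 1] + rng.normal(0, 1 + 0.2 * X[:, 1], n)  # widening scatter

c = cp.Variable(p)           # interval centers
d = cp.Variable(p)           # interval spreads
lam = 1.0                    # fit/spread trade-off (a modeling choice)
center = X @ c
width = np.abs(X) @ d
obj = cp.Minimize(cp.sum_squares(y - center) + lam * cp.sum(width))
cons = [d >= 0, y <= center + width, y >= center - width]   # coverage
cp.Problem(obj, cons).solve()
print(c.value, d.value)
```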


2005 ◽  
Vol 30 (2) ◽  
pp. 169-187 ◽  
Author(s):  
David Kaplan

This article considers the problem of estimating dynamic linear regression models when the data are generated from a finite mixture probability density function whose mixture components are characterized by different dynamic regression model parameters. Specifically, conventional linear models assume that the data are generated by a single probability density function characterized by a single set of regression model parameters. However, when the true generating model is a finite mixture density function, estimating conventional linear models under the assumption of a single density function may lead to erroneous conclusions. Instead, it may be desirable to estimate the regression model under the assumption that the data derive from a finite mixture density function and to examine differences in the model parameters within each mixture component. Dynamic regression models, and subsequent dynamic response analysis using dynamic multipliers, are also likely to be affected by the existence of a finite mixture density, because dynamic multipliers are functions of the regression model parameters. Applying finite mixture modeling to two real data examples, this article shows that dynamic responses to changes in exogenous variables can be quite different depending on the number and nature of the underlying mixture components. Implications for substantive conclusions based on the use of dynamic multipliers are discussed.
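
A compact sketch of the core idea, assuming two mixture components: an EM loop alternates between posterior component responsibilities and responsibility-weighted least-squares fits, so each component carries its own regression parameters (and hence its own dynamic multipliers). Names are illustrative.

```python
# Illustrative sketch: EM for a two-component mixture of linear regressions.
import numpy as np
from scipy.stats import norm

def mixreg_em(X, y, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    n, p = X.shape
    beta = rng.normal(size=(2, p))          # per-component coefficients
    sigma = np.array([np.std(y)] * 2)       # per-component error scales
    pi = np.array([0.5, 0.5])               # mixing proportions
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each point
        dens = np.stack([pi[k] * norm.pdf(y, X @ beta[k], sigma[k])
                         for k in range(2)])
        r = dens / (dens.sum(axis=0, keepdims=True) + 1e-300)
        # M-step: responsibility-weighted least squares per component
        for k in range(2):
            XtW = X.T * r[k]
            beta[k] = np.linalg.solve(XtW @ X, XtW @ y)
            sigma[k] = np.sqrt(np.sum(r[k] * (y - X @ beta[k]) ** 2) / r[k].sum())
        pi = r.mean(axis=1)
    return beta, sigma, pi, r
```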


2021 ◽  
Author(s):  
Jose Pina-Sánchez ◽  
David Buil-Gil ◽  
Ian Brunton-Smith ◽  
Alexandru Cernat

Objectives: Assess the extent to which measurement error in police recorded crime rates impacts the estimates of regression models exploring the causes and consequences of crime.

Methods: We focus on linear models where crime rates are included either as the response or as an explanatory variable, in their original scale or log-transformed. Two measurement error mechanisms are considered: systematic errors in the form of under-recorded crime, and random errors in the form of recording inconsistencies across areas. The extent to which these measurement error mechanisms impact model parameters is demonstrated algebraically, using formal notation, and graphically, using simulations.

Results: Most coefficients and measures of uncertainty from models where crime rates are included in their original scale are severely biased. However, in many cases this problem can be minimised, or altogether eliminated, by log-transforming crime rates. This transforms the multiplicative measurement error observed in police recorded crime rates into a less harmful additive mechanism.

Conclusions: The validity of findings from regression models where police recorded crime rates are used in their original scale is called into question. In interpreting the large evidence base exploring the effects and consequences of crime using police statistics, we urge researchers to consider the biasing effects shown here. Equally, we urge researchers to log-transform crime rates before introducing them into statistical models.
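
A small simulation of the key mechanism, with invented numbers: under-recording that acts multiplicatively (recorded = true rate x recording fraction) attenuates the raw-scale slope but becomes a benign additive shift after the log transform.

```python
# Illustrative simulation: multiplicative recording error biases the raw-scale
# slope, while log-transforming turns it into a harmless additive error term.
import numpy as np

rng = np.random.default_rng(3)
n = 5000
x = rng.normal(size=n)                           # some area-level covariate
true_rate = np.exp(1.0 + 0.5 * x + rng.normal(0, 0.3, n))
recording = rng.beta(8, 2, n)                    # fraction of crime recorded
recorded = true_rate * recording                 # multiplicative error

def slope(u, v):
    return np.polyfit(u, v, 1)[0]

print("raw-scale slope, true vs recorded:",
      slope(x, true_rate), slope(x, recorded))          # attenuated
print("log-scale slope, true vs recorded:",
      slope(x, np.log(true_rate)), slope(x, np.log(recorded)))  # ~unchanged
```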


1989 ◽  
Vol 19 (2) ◽  
pp. 179-184 ◽  
Author(s):  
David L. Verbyla ◽  
Richard F. Fisher

The conventional approach in site-quality studies has been to develop a multiple regression site index model with soil–site measurements from randomly selected plots. This approach has several weaknesses: (i) a potential prediction bias associated with most stepwise regression procedures; (ii) low precision of soil–site regression models developed in areas with diverse topography and geologic formations; and (iii) poor representation of rare prime sites by random sampling. An alternative approach, aimed at minimizing these problems, is presented. The potential for prediction bias (due to overfitting a model with too many predictor variables) can be reduced by using cross-validation during model development. Models that accurately predict prime sites can be more useful than imprecise soil–site regression models, and rare prime sites can be better represented by stratified random sampling from prime and nonprime site areas. Classification-tree analysis was used to develop a model that predicts prime ponderosa pine (Pinus ponderosa Laws.) sites on the basis of vegetation and soil variables. Forest habitat type, percent sand content, and soil pH were model predictor variables. Cross-validation estimated the accuracy of the classification tree at 88%. A multiple regression model developed from randomly selected plots consistently underestimated site index when applied to plots randomly selected from prime site areas. The conventional regression model was also misleading because it contained a predictor variable that did not differ significantly between prime and nonprime sites.
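
A sketch of the tree-plus-cross-validation workflow with hypothetical predictors (coded habitat type, percent sand, soil pH) and synthetic labels; only the general technique, not the study's data or fitted tree, is reproduced.

```python
# Illustrative sketch: classification tree for prime/nonprime sites with
# cross-validated accuracy. Predictors and labels are synthetic.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(4)
n = 200
habitat = rng.integers(0, 4, n)            # coded forest habitat type
sand = rng.uniform(20, 80, n)              # percent sand content
ph = rng.uniform(4.5, 7.5, n)              # soil pH
prime = ((habitat == 2) & (sand < 50) & (ph > 5.5)).astype(int)

X = np.column_stack([habitat, sand, ph])
tree = DecisionTreeClassifier(max_depth=3, random_state=0)  # shallow tree limits overfitting
scores = cross_val_score(tree, X, prime, cv=10)             # 10-fold cross-validation
print("estimated accuracy:", scores.mean())
```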


1976 ◽  
Vol 1 (3) ◽  
pp. 253-277 ◽  
Author(s):  
Herbert J. Walberg ◽  
Sue Pinzur Rasher

This paper illustrates cut-and-try techniques that point to appropriate transformations of variables and to the selection of sets of variables for an equation that may improve understanding of a social process. The substance of the research reported, the relation of mental test results to state population, cultural, and school resource indexes (Walberg and Rasher, 1974), illustrates typical problems of behavioral data: multicollinearity, outliers, non-normal distributions, and the lack of a consensually validated, explicit theoretical model. Despite these problems, data originally collected for purposes other than the investigator's may yield tentative confirmations of, or cautions about, prior findings, as well as provisional indications for theory or policy; such inferences may be at least partially checked by cross-validation on independent or semi-independent sets of data. After discussing the sequence of analyses and the results, we conclude by mentioning a number of uncertainties and reservations about drawing substantive or policy implications.


Author(s):  
Yoshiyuki Yabuuchi ◽  
Junzo Watada

Since management and economic systems are complex, the data obtained in management and economic areas are hard to handle, and much research in these fields has focused on the structure and analysis of such data. H. Tanaka et al. proposed a fuzzy regression model to illustrate the potential possibilities inherent in the target system. J. C. Bezdek proposed a switching regression model based on a fuzzy clustering model, which separates mixed samples coming from plural latent systems and applies regression models to the groups of samples coming from each system. However, it remains hard for such models to illustrate a rough and moderate possibility of the target system. In this paper, to deal with the possibility of a social system, we propose a new fuzzy robust regression model.
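
A sketch of the switching-regression idea credited to Bezdek above, not of the authors' new model: fuzzy c-means memberships softly separate the mixed samples, and each latent system then gets a membership-weighted least-squares line. All names are illustrative.

```python
# Illustrative sketch: fuzzy-clusterwise (switching) regression. Fuzzy c-means
# memberships are computed on (x, y) points, then each cluster gets a
# membership-weighted least-squares line.
import numpy as np

def fcm_switching_regression(x, y, c=2, m=2.0, n_iter=100, seed=0):
    pts = np.column_stack([x, y])
    rng = np.random.default_rng(seed)
    centers = pts[rng.choice(len(pts), c, replace=False)]
    for _ in range(n_iter):
        # fuzzy c-means updates: memberships u, then fuzzified centers
        dist = np.linalg.norm(pts[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        u = 1.0 / dist ** (2.0 / (m - 1.0))
        u /= u.sum(axis=1, keepdims=True)        # rows sum to 1
        w = u ** m
        centers = (w.T @ pts) / w.sum(axis=0)[:, None]
    # one membership-weighted regression line per latent system
    X = np.column_stack([np.ones_like(x), x])
    betas = []
    for k in range(c):
        XtW = X.T * w[:, k]
        betas.append(np.linalg.solve(XtW @ X, XtW @ y))
    return np.array(betas), u
```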

