Exploratory Multivariable Analyses of California Driver Record Accident Rates

Author(s):  
Michael A. Gebers

Since 1964 the California Department of Motor Vehicles has issued several monographs on driver characteristics and accident risk factors as part of a series of analyses known as the California driver record study. A number of regression analyses were conducted on driving record variables measured over a 6-year period (1986 to 1991). The techniques examined consist of ordinary least squares, weighted least squares, Poisson, negative binomial, linear probability, and logistic regression models. The objective of the analyses was to compare the results obtained from several different regression techniques under consideration for use in the in-progress California driver record study. The results are informative in determining whether the various regression methods produce similar results for different sample sizes and in exploring whether reliance on ordinary least squares techniques in past California driver record study analyses has produced biased significance levels and parameter estimates. The results indicate that, for these data, use of the different regression techniques does not lead to any greater accuracy in individual accident prediction than that obtained through application of ordinary least squares regression. The methods produce almost identical results in terms of the relative importance and statistical significance of the independent variables. It therefore appears safe to employ ordinary least squares multiple regression techniques on driver accident count distributions of the type represented by California driver records, at least when the sample sizes are large.
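The comparison the abstract describes can be illustrated with a toy example. The data below are hypothetical (a single simulated predictor and Poisson-distributed accident counts, not the California driver records); the sketch fits the same data with a closed-form OLS slope and a hand-rolled Newton-Raphson Poisson regression, and both agree on the sign and importance of the predictor:

```python
import math
import random

random.seed(1)

# Hypothetical data: a prior-record predictor x and subsequent accident
# counts y, generated so that y is Poisson with mean exp(-1.5 + 0.3 x).
n = 2000
x = [random.randint(0, 5) for _ in range(n)]

def poisson_draw(lam):
    # Knuth's multiplication method (fine for small lambda).
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= limit:
            return k
        k += 1

y = [poisson_draw(math.exp(-1.5 + 0.3 * xi)) for xi in x]

# Ordinary least squares slope (closed form, one predictor).
mx, my = sum(x) / n, sum(y) / n
b1_ols = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
         sum((xi - mx) ** 2 for xi in x)

# Poisson regression (log link) fitted by Newton-Raphson.
b0, b1 = 0.0, 0.0
for _ in range(50):
    mu = [math.exp(b0 + b1 * xi) for xi in x]
    g0 = sum(yi - mi for yi, mi in zip(y, mu))          # score, intercept
    g1 = sum((yi - mi) * xi for yi, mi, xi in zip(y, mu, x))  # score, slope
    h00 = sum(mu)                                       # Fisher information
    h01 = sum(mi * xi for mi, xi in zip(mu, x))
    h11 = sum(mi * xi * xi for mi, xi in zip(mu, x))
    det = h00 * h11 - h01 * h01
    b0 += (h11 * g0 - h01 * g1) / det
    b1 += (h00 * g1 - h01 * g0) / det

# Both methods rank the predictor the same way: positive and material.
print(b1_ols > 0, b1 > 0)
```

With a large sample, both slopes are positive and the two fits tell the same story about the predictor, mirroring the paper's finding that the techniques agree on relative importance.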

2009 ◽  
Vol 2009 ◽  
pp. 1-8 ◽  
Author(s):  
Janet Myhre ◽  
Daniel R. Jeske ◽  
Michael Rennie ◽  
Yingtao Bi

A heteroscedastic linear regression model is developed from plausible assumptions that describe the time evolution of performance metrics for equipment. The motivation the model inherits from these assumptions for the associated weighted least squares analysis is an essential and attractive selling point for engineers interested in equipment surveillance methodologies. A simple test for the significance of the heteroscedasticity suggested by a data set is derived, and a simulation study is used to evaluate the power of the test and compare it with several other applicable tests that were designed under different contexts. Tolerance intervals within the context of the model are derived, thus generalizing well-known tolerance intervals for ordinary least squares regression. Use of the model and its associated analyses is illustrated with an aerospace application in which hundreds of electronic components are continuously monitored by an automated system that flags components suspected of unusual degradation patterns.
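A rough sketch of the two ingredients of such an analysis, not the authors' exact model: a Goldfeld-Quandt-style variance-ratio check for heteroscedasticity, and weighted least squares with weights taken from an assumed variance-grows-with-time model. The degradation data and the variance law (noise standard deviation proportional to t) are hypothetical:

```python
import random

random.seed(7)

# Hypothetical degradation data: a metric drifts linearly in time t
# while its noise standard deviation grows proportionally to t.
n = 400
t = [1 + 9 * i / (n - 1) for i in range(n)]
y = [2.0 + 0.5 * ti + random.gauss(0, 0.3 * ti) for ti in t]

def ols(x, y):
    m = len(x)
    mx, my = sum(x) / m, sum(y) / m
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / \
         sum((a - mx) ** 2 for a in x)
    return my - b1 * mx, b1

# Variance-ratio check: compare residual variance in the last third of
# the time range with the first third (Goldfeld-Quandt flavor).
b0, b1 = ols(t, y)
res = [yi - (b0 + b1 * ti) for ti, yi in zip(t, y)]
k = n // 3
v_lo = sum(r * r for r in res[:k]) / k
v_hi = sum(r * r for r in res[-k:]) / k
ratio = v_hi / v_lo          # >> 1 signals heteroscedasticity

# Weighted least squares with weights 1/t^2 (variance assumed ~ t^2).
w = [1.0 / ti ** 2 for ti in t]
sw = sum(w)
mxw = sum(wi * ti for wi, ti in zip(w, t)) / sw
myw = sum(wi * yi for wi, yi in zip(w, y)) / sw
b1w = sum(wi * (ti - mxw) * (yi - myw) for wi, ti, yi in zip(w, t, y)) / \
      sum(wi * (ti - mxw) ** 2 for wi, ti in zip(w, t))
b0w = myw - b1w * mxw
print(round(ratio, 1), round(b1w, 3))
```

The ratio comes out far above 1, flagging the heteroscedasticity, and the weighted fit recovers the drift slope while giving the noisy late-time points less influence.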


Entropy ◽  
2022 ◽  
Vol 24 (1) ◽  
pp. 95 ◽
Author(s):  
Pontus Söderbäck ◽  
Jörgen Blomvall ◽  
Martin Singull

Liquid financial markets, such as the options market of the S&P 500 index, create vast amounts of data every day, i.e., so-called intraday data. However, this highly granular data is often reduced to a single time point when used to estimate financial quantities. This under-utilization of the data may reduce the quality of the estimates. In this paper, we study the impact on estimation quality when using intraday data to estimate dividends. The methodology is based on earlier linear regression (ordinary least squares) estimates, which have been adapted to intraday data. Further, the method is generalized in two aspects. First, the dividends are expressed as present values of future dividends rather than dividend yields. Second, to account for heteroscedasticity, the estimation methodology was formulated as a weighted least squares problem, where the weights are determined from the market data. This method is compared with a traditional method on out-of-sample S&P 500 European options market data. The results show that estimates based on intraday data are, with statistical significance, of higher quality than the corresponding single-time estimates. Additionally, the two generalizations of the methodology are shown to further improve the estimation quality.


1985 ◽  
Vol 15 (2) ◽  
pp. 331-340 ◽  
Author(s):  
T. Cunia ◽  
R. D. Briggs

To construct biomass tables for various tree components that are consistent with each other, one may use linear regression techniques with dummy variables. When the biomass of these components is measured on the same sample trees, one should also use the generalized rather than the ordinary least squares method. A procedure is shown which allows the estimation of the covariance matrix of the sample biomass values and circumvents the problem of storing and inverting large covariance matrices. Applied to 20 sets of sample tree data, the generalized least squares regressions generated estimates which, on average, were slightly higher (by about 1%) than the sample data. The confidence and prediction bands about the regression function were wider, sometimes considerably wider, than those estimated by ordinary weighted least squares.
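A minimal sketch of the generalized least squares idea the abstract invokes, under simplifying assumptions that are not the authors' procedure: two hypothetical components (stem and branch) measured on the same trees, a through-the-origin model y = beta x for both, and a known within-tree error covariance matrix V:

```python
import random

random.seed(3)

# Stem (s) and branch (b) biomass are measured on the SAME trees, so
# their errors are correlated.  Within a tree,
# Cov(errors) = V = sd^2 * [[1, rho], [rho, 1]], assumed known here.
beta_true, rho, sd = 2.0, 0.8, 1.0
c = 1.0 / (sd * sd * (1 - rho * rho))       # scale of V^-1
vinv = [[c, -c * rho], [-c * rho, c]]

n_trees = 300
num, den = 0.0, 0.0        # GLS accumulators: x'V^-1 y and x'V^-1 x
sxy, sxx = 0.0, 0.0        # OLS accumulators
for _ in range(n_trees):
    xs = random.uniform(1, 5)        # stem size variable
    xb = 0.4 * xs                    # branch size variable
    z1 = random.gauss(0, 1)          # correlated error pair
    z2 = rho * z1 + (1 - rho ** 2) ** 0.5 * random.gauss(0, 1)
    ys = beta_true * xs + sd * z1
    yb = beta_true * xb + sd * z2
    xv0 = xs * vinv[0][0] + xb * vinv[1][0]   # x'V^-1, first entry
    xv1 = xs * vinv[0][1] + xb * vinv[1][1]   # x'V^-1, second entry
    num += xv0 * ys + xv1 * yb
    den += xv0 * xs + xv1 * xb
    sxy += xs * ys + xb * yb
    sxx += xs * xs + xb * xb

beta_gls = num / den               # (x'V^-1 x)^-1 x'V^-1 y
beta_ols = sxy / sxx               # ignores the within-tree covariance
se_gls = den ** -0.5               # correct standard error under known V
print(round(beta_ols, 3), round(beta_gls, 3), round(se_gls, 3))
```

Both point estimates land near the true slope, which echoes the paper's finding that GLS changed the estimates only slightly; the difference is that the GLS standard error correctly propagates the within-tree covariance, whereas the OLS one treats the two components as independent observations.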


1989 ◽  
Vol 19 (5) ◽  
pp. 664-673 ◽  
Author(s):  
Andrew J. R. Gillespie ◽  
Tiberius Cunia

Biomass tables are often constructed from cluster samples by means of ordinary least squares regression estimation procedures. These procedures assume that sample observations are uncorrelated, which ignores the intracluster correlation of cluster samples and results in underestimates of the model error. We tested alternative estimation procedures by simulation under a variety of cluster sampling methods, to determine combinations of sampling and estimation procedures that yield accurate parameter estimates and reliable estimates of error. Modified, generalized, and jack-knife least squares procedures gave accurate parameter and error estimates when sample trees were selected with equal probability. Regression models that did not include height as a predictor variable yielded biased parameter estimates when sample trees were selected with probability proportional to tree size. Models that included height did not yield biased estimates. There was no discernible gain in precision associated with sampling with probability proportional to size. Random coefficient regressions generally gave biased point estimates with poor precision, regardless of sampling method.
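The core problem, naive OLS error estimates that ignore intracluster correlation, can be demonstrated with a small simulation. The cluster structure below is hypothetical, not the authors' sampling design; a shared cluster effect makes trees within a cluster correlated, and the naive standard error of the mean comes out too small:

```python
import random

random.seed(2)

# Each cluster shares a random effect u, so observations within a
# cluster are positively correlated.
def one_sample(n_clusters=30, m=5, cluster_sd=1.0, noise_sd=1.0):
    ys = []
    for _ in range(n_clusters):
        u = random.gauss(0, cluster_sd)      # shared cluster effect
        ys += [u + random.gauss(0, noise_sd) for _ in range(m)]
    return ys

reps = 500
means, naive_ses = [], []
for _ in range(reps):
    ys = one_sample()
    n = len(ys)
    mean = sum(ys) / n
    s2 = sum((v - mean) ** 2 for v in ys) / (n - 1)
    means.append(mean)
    naive_ses.append((s2 / n) ** 0.5)        # pretends obs independent

grand = sum(means) / reps
true_se = (sum((mu - grand) ** 2 for mu in means) / (reps - 1)) ** 0.5
avg_naive = sum(naive_ses) / reps
print(round(true_se, 3), round(avg_naive, 3))  # naive SE is too small
```

Across replications the actual sampling spread of the mean is markedly larger than the average naive standard error, which is exactly the underestimated model error the abstract describes.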


Separations ◽  
2018 ◽  
Vol 5 (4) ◽  
pp. 49 ◽  
Author(s):  
Juan Sanchez

It is necessary to determine the limit of detection when validating any analytical method. For methods with a linear response, a simple and labor-saving procedure is to use the linear regression parameters obtained in the calibration to estimate the blank standard deviation from the residual standard deviation (sres) or the intercept standard deviation (sb0). In this study, multiple experimental calibrations are evaluated, applying both ordinary and weighted least squares. Moreover, analyses of replicated blank matrices, spiked at 2–5 times the lowest limit values calculated with the two regression methods, are performed to obtain the standard deviation of the blank. The limits of detection obtained with ordinary least squares, weighted least squares, the signal-to-noise ratio, and replicate blank measurements are then compared. Ordinary least squares, the simplest and most commonly applied calibration regression methodology, always overestimates the standard deviations at the lower levels of calibration ranges. As a result, its detection limits are up to one order of magnitude greater than those obtained with the other approaches studied, all of which gave similar limits.
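A sketch of the sres-based estimate discussed above, using a hypothetical calibration line and the common 3.3·sres/slope formulation of the detection limit (the specific concentrations, noise level, and factor are illustrative, not taken from this study):

```python
import random

random.seed(5)

# Hypothetical calibration line: signal = 2.0 + 50 * conc + noise.
conc = [0.0, 0.5, 1.0, 2.0, 4.0, 8.0]
signal = [2.0 + 50.0 * cx + random.gauss(0, 1.5) for cx in conc]

n = len(conc)
mx = sum(conc) / n
my = sum(signal) / n
slope = sum((x - mx) * (y - my) for x, y in zip(conc, signal)) / \
        sum((x - mx) ** 2 for x in conc)
intercept = my - slope * mx

# Residual standard deviation s_res with n - 2 degrees of freedom.
s_res = (sum((y - (intercept + slope * x)) ** 2
             for x, y in zip(conc, signal)) / (n - 2)) ** 0.5

# Common formulation of the detection limit from calibration data.
lod = 3.3 * s_res / slope
print(round(slope, 1), round(lod, 4))
```

Because s_res pools residuals over the whole calibration range, it inflates the blank-level standard deviation whenever the noise grows with concentration, which is the mechanism behind the overestimated OLS detection limits the abstract reports.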


2014 ◽  
Vol 2014 ◽  
pp. 1-17 ◽  
Author(s):  
Dana D. Marković ◽  
Branislava M. Lekić ◽  
Vladana N. Rajaković-Ognjanović ◽  
Antonije E. Onjia ◽  
Ljubinka V. Rajaković

Numerous regression approaches to isotherm parameter estimation appear in the literature. Real insight into the proper modeling pattern can be achieved only by testing methods on a very large number of cases. Experimentally, this cannot be done in a reasonable time, so the Monte Carlo simulation method was applied. The objective of this paper is to introduce and compare numerical approaches that involve different levels of knowledge about the noise structure of the analytical method used for initial and equilibrium concentration determination. Six levels of homoscedastic noise and five types of heteroscedastic noise precision models were considered. Performance of the methods was statistically evaluated based on the median percentage error and mean absolute relative error in parameter estimates. The present study showed a clear distinction between two cases. When equilibrium experiments are performed only once, the winning error function for the homoscedastic case is ordinary least squares, while for the heteroscedastic case the use of orthogonal distance regression or Marquardt's percent standard deviation is suggested. It was found that, when experiments are repeated three times, the simple weighted least squares method performed as well as the more complicated orthogonal distance regression method.


Talanta ◽  
2010 ◽  
Vol 80 (3) ◽  
pp. 1102-1109 ◽  
Author(s):  
Rosilene S. Nascimento ◽  
Roberta E.S. Froes ◽  
Nilton O.C. e Silva ◽  
Rita L.P. Naveira ◽  
Denise B.C. Mendes ◽  
...  

2018 ◽  
Vol 7 (4.30) ◽  
pp. 106 ◽
Author(s):  
N S M Shariff ◽  
H M B Duzan

Ordinary Least Squares (OLS) is a common method for investigating the linear relationship among variables of interest. The presence of multicollinearity will produce unreliable parameter estimates if OLS is applied to estimate the model. For this reason, this study applies the proposed ridge estimator, constructed as a linear combination of the least squares regression coefficients of the explanatory variables, to a real application. A numerical example of stock market prices and macroeconomic variables in Malaysia is analyzed using both methods, with the aim of investigating the relationship of the variables in the presence of multicollinearity in the data set. The variables of interest are the Consumer Price Index (CPI), Gross Domestic Product (GDP), Base Lending Rate (BLR), and Money Supply (M1). The findings show that the proposed procedure is able to estimate the model and produce reliable results by reducing the effect of multicollinearity in the data set.
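A small sketch of why ridge regression stabilizes what OLS cannot, using hypothetical, nearly collinear predictors rather than the Malaysian macroeconomic series, and the textbook ridge estimator (X'X + kI)⁻¹X'y rather than the paper's specific proposal:

```python
import random

random.seed(11)

# Severe multicollinearity: x2 is almost an exact copy of x1, the way
# two macroeconomic series can move together.
n = 100
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [a + random.gauss(0, 0.001) for a in x1]     # corr(x1, x2) ~ 1
y = [1.0 * a + 1.0 * b + random.gauss(0, 1) for a, b in zip(x1, x2)]

def solve2(a11, a12, a22, c1, c2):
    # Solve a symmetric 2x2 system [[a11,a12],[a12,a22]] b = [c1,c2].
    det = a11 * a22 - a12 * a12
    return (a22 * c1 - a12 * c2) / det, (a11 * c2 - a12 * c1) / det

s11 = sum(a * a for a in x1)
s12 = sum(a * b for a, b in zip(x1, x2))
s22 = sum(b * b for b in x2)
c1 = sum(a * t for a, t in zip(x1, y))
c2 = sum(b * t for b, t in zip(x2, y))

b_ols = solve2(s11, s12, s22, c1, c2)             # (X'X)^-1 X'y
k = 1.0                                           # ridge penalty
b_ridge = solve2(s11 + k, s12, s22 + k, c1, c2)   # (X'X + kI)^-1 X'y

# OLS splits the combined effect (~2) between the two coefficients in a
# wildly unstable way; ridge shrinks the unstable difference toward 0
# while preserving the stable sum.
print([round(b, 2) for b in b_ols], [round(b, 2) for b in b_ridge])
```

The sum of the two coefficients is well determined under both methods, but only ridge keeps the individual coefficients near it; OLS can assign huge offsetting values to the two collinear predictors, which is the unreliability the abstract refers to.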


Author(s):  
Daniel Hoechle

I present a new Stata program, xtscc, that estimates pooled ordinary least-squares/weighted least-squares regression and fixed-effects (within) regression models with Driscoll and Kraay (Review of Economics and Statistics 80: 549–560) standard errors. By running Monte Carlo simulations, I compare the finite-sample properties of the cross-sectional dependence–consistent Driscoll–Kraay estimator with the properties of other, more commonly used covariance matrix estimators that do not account for cross-sectional dependence. The results indicate that Driscoll–Kraay standard errors are well calibrated when cross-sectional dependence is present. However, erroneously ignoring cross-sectional correlation in the estimation of panel models can lead to severely biased statistical results. I illustrate the xtscc program by considering an application from empirical finance. Thereby, I also propose a Hausman-type test for fixed effects that is robust to general forms of cross-sectional and temporal dependence.


1987 ◽  
Vol 109 (1) ◽  
pp. 103-112 ◽
Author(s):  
C. R. Mischke

In estimating the cumulative distribution function (CDF) of data, investigators selectively transform the data and their order statistics in order to achieve rectification of the data string. Ordinary least-squares regression procedures no longer apply because of the transformations. Investigators are often seeking a fifty-percent (median) locus, which least-squares methods do not ordinarily discover. A weighted least-squares regression procedure is presented that establishes an estimate of the mean CDF line and, through appropriate rotation, provides an estimate of the median CDF line. Examples from common distributions follow a general development.
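The classical rectification step that the paper refines can be sketched as follows: draw a Weibull sample, transform the order statistics with median-rank plotting positions so the true CDF becomes a straight line, and fit that line. This simplified sketch uses an ordinary (unweighted) fit; the paper's weighting and rotation refinements are omitted, and the shape/scale values are hypothetical:

```python
import math
import random

random.seed(9)

# Weibull sample via inverse transform: T = scale * (-ln(1-U))^(1/shape).
shape, scale, n = 2.0, 10.0, 50
data = sorted(scale * (-math.log(1 - random.random())) ** (1 / shape)
              for _ in range(n))

# Median-rank plotting positions (Benard's approximation).
F = [(i - 0.3) / (n + 0.4) for i in range(1, n + 1)]

# Rectification: ln(-ln(1-F)) vs ln(t) is linear for a Weibull CDF,
# with slope = shape and intercept = -shape * ln(scale).
xs = [math.log(t) for t in data]
ys = [math.log(-math.log(1 - f)) for f in F]
mx, my = sum(xs) / n, sum(ys) / n
slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
intercept = my - slope * mx
scale_est = math.exp(-intercept / slope)
print(round(slope, 2), round(scale_est, 2))
```

The fitted slope and back-transformed intercept recover the shape and scale parameters; the weighted, rotated procedure in the paper addresses the fact that the transformed order statistics do not all deserve equal weight and that this plain fit targets a mean rather than a median line.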

