Postlude: models and data

Author(s):  
M. D. Edge

Becoming a well-rounded data analyst requires more than the skills covered in this book. This postlude sketches some ways in which the types of thinking covered here can be extended to real problems in data analysis. Different ways of evaluating the assumptions of linear regression are considered, including plotting, hypothesis tests, and out-of-sample prediction. If the assumptions are not met, simple linear regression can be extended in various ways, including multiple regression, generalized linear models, and mixed models (among many other possibilities). This postlude concludes with a short discussion of the themes of the book: probabilistic models, methodological pluralism, and the value of elementary statistical thinking.
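
One of the checks mentioned above, out-of-sample prediction, can be sketched in Python with NumPy. This is an illustration under assumed conditions, not an example from the book. The quadratic data-generating model, the sample sizes, the 200/100 train/test split, and the random seed are all hypothetical. The sketch fits a simple linear regression and a multiple regression that adds an $x^2$ term to the same training data, then compares their mean squared errors on the held-out points.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical data: the true mean is quadratic in x, so a straight line
# (simple linear regression) is misspecified.
n = 300
x = rng.uniform(-2.0, 2.0, size=n)
y = 1.0 + 2.0 * x + 1.5 * x**2 + rng.normal(0.0, 1.0, size=n)

# First 200 points are used for fitting, the last 100 are held out.
train, test = slice(0, 200), slice(200, n)


def design(xs, degree):
    """Design matrix with columns 1, xs, ..., xs**degree."""
    return np.column_stack([xs**k for k in range(degree + 1)])


def ols(X, target):
    """Ordinary least-squares coefficients."""
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    return coef


def heldout_mse(degree):
    """Fit on the training points; return mean squared error on held-out points."""
    coef = ols(design(x[train], degree), y[train])
    pred = design(x[test], degree) @ coef
    return float(np.mean((y[test] - pred) ** 2))


mse_linear = heldout_mse(1)     # simple linear regression
mse_quadratic = heldout_mse(2)  # multiple regression on x and x**2

# Training residuals of the straight-line fit; plotted against x they
# would show the curvature that the line misses.
coef_lin = ols(design(x[train], 1), y[train])
resid_lin = y[train] - design(x[train], 1) @ coef_lin

print(f"held-out MSE, linear:    {mse_linear:.2f}")
print(f"held-out MSE, quadratic: {mse_quadratic:.2f}")
```

The simulated truth is curved, so the quadratic model should predict the held-out points better. The residuals of the straight-line fit (`resid_lin`) should also correlate strongly with $x^2$, and that correlation is the pattern a residual plot would reveal.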

2016, Vol 41 (4)
Author(s):  
Ernst Stadlober, Zuzana Hübnerová, Jaroslav Michálek, Miroslav Kolář

Brno and Graz, the second-largest cities in their respective countries, record daily mean PM10 concentrations that regularly exceed the limit value of 50 µg/m³ every winter season. This is mainly caused by unfavorable dispersion conditions in the ambient air. Hence, partial regulatory measures have to be taken in Brno and Graz, and specific decisions about certain regulations may be based on the next day's average PM10 concentration, provided that reliable forecasts of these values are available. For several sites in the two cities, we establish forecasts of daily PM10 concentrations based on multiple linear regression and generalized linear models, using both covariates measured on the present day and meteorological forecasts for the next day. The comparisons, based on different quality measures, demonstrate the usefulness of both modeling approaches, as they yield results of similar quality. Our prediction models may support future decisions concerning possible traffic restrictions or other regulations.
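
The two model classes compared in this abstract can be sketched in Python with NumPy. Everything in the sketch is a hypothetical stand-in: the covariates (today's PM10 and a forecast wind speed), the coefficients, and the simulated data are invented for illustration. The Gamma family with log link is also an assumption, because the abstract does not say which GLM family the authors used. The GLM is fitted by iteratively reweighted least squares (IRLS). For a Gamma model with log link, the working weights all equal 1, so each step reduces to an ordinary least-squares fit to the working response.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical daily data: today's PM10 (µg/m³) and a forecast wind speed (m/s).
n = 400
pm_today = rng.lognormal(mean=np.log(40.0), sigma=0.5, size=n)
wind_fc = rng.uniform(0.5, 6.0, size=n)

# Assumed data-generating model: log E[PM10 tomorrow] is linear in
# log(pm_today) and wind_fc, with Gamma noise (shape 20, CV about 0.22).
true_coef = np.array([1.5, 0.6, -0.15])
X_log = np.column_stack([np.ones(n), np.log(pm_today), wind_fc])
mu_true = np.exp(X_log @ true_coef)
shape = 20.0
pm_next = rng.gamma(shape, mu_true / shape)  # mean = shape * scale = mu_true


def ols(X, y):
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def gamma_log_glm(X, y, max_iter=100, tol=1e-10):
    """IRLS for a Gamma GLM with log link (all working weights equal 1)."""
    coef = ols(X, np.log(y))  # starting values from a log-scale OLS fit
    for _ in range(max_iter):
        eta = X @ coef
        mu = np.exp(eta)
        z = eta + (y - mu) / mu  # working response for the log link
        new_coef = ols(X, z)
        converged = np.max(np.abs(new_coef - coef)) < tol
        coef = new_coef
        if converged:
            break
    return coef


train, test = slice(0, 300), slice(300, n)

# Multiple linear regression on the raw concentration scale.
X_lin = np.column_stack([np.ones(n), pm_today, wind_fc])
coef_lin = ols(X_lin[train], pm_next[train])
pred_lin = X_lin[test] @ coef_lin

# Gamma GLM with log link; predictions are exp(linear predictor).
coef_glm = gamma_log_glm(X_log[train], pm_next[train])
pred_glm = np.exp(X_log[test] @ coef_glm)


def rmse(obs, pred):
    return float(np.sqrt(np.mean((obs - pred) ** 2)))


rmse_lin = rmse(pm_next[test], pred_lin)
rmse_glm = rmse(pm_next[test], pred_glm)
print(f"RMSE, multiple linear regression: {rmse_lin:.2f}")
print(f"RMSE, Gamma GLM (log link):       {rmse_glm:.2f}")
```

Comparing `rmse_lin` and `rmse_glm`, or other quality measures, on held-out days mirrors the kind of comparison the abstract describes. Results from this simulated example say nothing about the Brno or Graz data.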


2020, Vol 103 (4), pp. 1105-1111
Author(s):  
Anli Gao, Jennifer Fischer-Jenssen, Charles Wroblewski, Perry Martos

Background: Bacterial enumeration data are typically log transformed to make their distribution more nearly normal and to stabilize the variance. Unfortunately, statistical results from log-transformed data are often misinterpreted as if they were results in the arithmetic domain.
Objective: To explore what the slope and intercept of an unweighted linear regression imply, and to compare them with the results of a regression on log-transformed data.
Method: Mathematical derivation of the formulae, explained using a real dataset.
Results: Consider $y = Ax + B + \varepsilon$, where $y$ is the recovery (CFU/g), $x$ is the target concentration (CFU/g), and the error $\varepsilon$ is homogeneous across $x$. When $B = 0$, the slope $A$ estimates the percent recovery $R$. In the regression of log-transformed data, $\log y = \alpha \log x + \beta + \varepsilon_z$ (equivalent to $y = A x^{\alpha} \cdot \omega$), it is the intercept $\beta = \log(y/x) = \log A$ that estimates the logarithm of the percent recovery when the slope $\alpha = 1$, which means that $R$ does not vary with $x$. The error term $\omega$ is multiplicative in $x$, while $\varepsilon_z = \log \omega$ is additive on the $\log x$ scale. Whether the data should be transformed is not a matter of choice but a decision based on the distribution of the data. No significant difference in predicted percent recovery was found among the five models (linear regression of log-transformed data, three generalized linear models, and a nonlinear model) when they were applied to our data. An acceptable regression model should produce residuals that are as close to normally distributed as possible.
Conclusions: Statistical procedures that use log-transformed data should be studied separately and documented as such, rather than reported together with, and interpreted in the same way as, results obtained in the arithmetic domain.
Highlights: Interpretations developed for statistical results in the arithmetic domain do not apply to results from log-transformed data.
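
The contrast between the two scales can be illustrated with a small Python simulation. The recovery value, the error standard deviation, the concentration range, and the use of natural logarithms are all assumptions made for illustration. Data are generated as $y = R x \omega$ with $\log \omega \sim N(0, \sigma^2)$, that is, $\alpha = 1$ and $A = R$. Regressing $\log y$ on $\log x$ should give a slope near 1, and with $\alpha$ fixed at 1 the exponentiated intercept should estimate $R$. The unweighted arithmetic regression of $y$ on $x$ instead estimates $R\,E[\omega] = R e^{\sigma^2/2}$, which is larger than $R$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical spiking experiment: target concentrations x (CFU/g) spread
# evenly on the log scale, true recovery R = 0.8, and multiplicative
# log-normal error omega with log(omega) ~ N(0, sigma^2).
n, R, sigma = 5000, 0.8, 0.6
x = 10 ** rng.uniform(2.0, 4.0, size=n)
omega = np.exp(rng.normal(0.0, sigma, size=n))
y = R * x * omega  # y = A * x**alpha * omega with A = R, alpha = 1

# Arithmetic domain: unweighted regression y = A x + B + error.
A_hat, B_hat = np.polyfit(x, y, deg=1)

# Log domain (natural logs): log y = alpha log x + beta + eps_z.
alpha_hat, beta_hat = np.polyfit(np.log(x), np.log(y), deg=1)

# With alpha fixed at 1, beta is estimated by the mean of log(y / x).
beta_alpha1 = np.mean(np.log(y / x))

print(f"arithmetic slope A_hat:    {A_hat:.3f}")    # should be near R * exp(sigma**2 / 2)
print(f"log-scale slope alpha_hat: {alpha_hat:.3f}")  # should be near 1
print(f"exp(beta) with alpha = 1:  {np.exp(beta_alpha1):.3f}")  # should be near R
```

The two estimates differ because they target different quantities. The arithmetic slope tracks the mean recovery $R e^{\sigma^2/2}$, while $e^{\beta}$ tracks the median (geometric-mean) recovery $R$. This is one concrete instance of the point that results from log-transformed data should not be read as if they were in the arithmetic domain.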

