Epidemiology with R
Latest Publications


Published By Oxford University Press

9780198841326, 9780191876936

2020 ◽  
pp. 181-218
Author(s):  
Bendix Carstensen

This chapter describes survival analysis. Survival analysis concerns data where the outcome is a length of time, namely the time from inclusion in the study (such as diagnosis of some disease) until death or some other event; hence the term 'time to event analysis', which is also used. There are two primary targets normally addressed in survival analysis: survival probabilities and event rates. The chapter then looks at the life table estimator of the survival function and the Kaplan–Meier estimator of survival. It also considers the Cox model and its relationship with Poisson models, as well as the Fine–Gray approach to competing risks.
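As a rough illustration of the Kaplan–Meier estimator mentioned above, the following Python sketch computes the product-limit estimate by hand. It is not the chapter's R code; the function name `kaplan_meier` and the follow-up data are hypothetical and chosen only for illustration.

```python
from collections import Counter

def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times  : follow-up time for each person
    events : 1 if the event occurred at that time, 0 if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    survival = 1.0
    curve = []
    for t in sorted(deaths):
        # Persons still under follow-up just before time t
        at_risk = sum(1 for u in times if u >= t)
        # Multiply by the conditional probability of surviving past t
        survival *= 1.0 - deaths[t] / at_risk
        curve.append((t, survival))
    return curve

# Hypothetical follow-up times (years) and event indicators, for illustration only.
times = [2, 3, 3, 5, 7, 8]
events = [1, 1, 0, 1, 0, 1]
km = kaplan_meier(times, events)
```

Censored persons (event indicator 0) contribute to the number at risk up to their censoring time but never trigger a drop in the curve, which is the key difference from a naive proportion of survivors.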


2020 ◽  
pp. 47-63
Author(s):  
Bendix Carstensen

This chapter examines prevalence data, using a dataset which contains the number of diabetes patients and the total number of persons in Denmark as of January 1, 2010, classified by age and sex. Prevalence of a disease condition in a population is merely the proportion of affected people. The chapter uses prevalence to illustrate core modelling concepts: the model itself, the likelihood, the maximum likelihood estimation principle, and the properties of the results, all of which underlie most modern epidemiological methods. It also explains the concept of a statistical model leading to the distinction between empirical and theoretical prevalences. The chapter then focuses on the task of comparing different models for the same data, models that describe data in various degrees of detail.
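The link between the empirical prevalence and maximum likelihood can be sketched in a few lines of Python, assuming a binomial model for the number of affected persons. The counts below are hypothetical and are not taken from the Danish dataset; the function name `log_likelihood` is likewise only illustrative.

```python
import math

def log_likelihood(p, cases, population):
    """Binomial log-likelihood for prevalence p (constant term omitted)."""
    return cases * math.log(p) + (population - cases) * math.log(1.0 - p)

# Hypothetical counts for one age-sex class.
cases, population = 120, 4000

# Empirical prevalence; under the binomial model this is also the
# maximum likelihood estimate of the theoretical prevalence.
p_hat = cases / population

# Numerical check: on a fine grid, the log-likelihood peaks at p_hat.
grid = [i / 10000 for i in range(1, 10000)]
best_on_grid = max(grid, key=lambda p: log_likelihood(p, cases, population))
```

The grid search is only a sanity check; the closed-form estimate `cases / population` is what one would use in practice.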


2020 ◽  
pp. 3-39
Author(s):  
Bendix Carstensen

This chapter discusses how the best way to learn R is to use it. One should start by using it as a simple calculator, and keep on exploring what one gets back by inspecting the size, shape, and content of what one creates. R is available from CRAN, the Comprehensive R Archive Network. A nice interface to R is RStudio, a commercial product that is also offered under a free open source license, giving one a very good and handy interface to R for free, including the possibility of writing reports using Rmarkdown, Sweave, or knitr. The chapter then looks at the two main graphics systems used in R: base graphics, which is an integral part of any R distribution, and ggplot2 (gg referring to grammar of graphics). Data from large epidemiological studies are often summarized in the form of frequency data, which record the frequency of all possible combinations of values of the variables in the study.


2020 ◽  
pp. 93-118
Author(s):  
Bendix Carstensen

This chapter assesses the analysis and representation of follow-up data. Follow-up refers to the process of monitoring persons over time for occurrence of a (set of) prespecified event(s). Practical data collection is often via look-up in registers or databases. The basic requirements for recordings in a follow-up study include date of entry to the study, date of exit from the study, and the status of the person at the exit date. The chapter then explains the likelihood from a follow-up study and why one can analyse rates using Poisson regression. The likelihood contribution from a single person's follow-up can be subdivided into contributions from subintervals of the follow-up. The chapter details the task of splitting the follow-up time along a time-scale. Finally, it considers time-dependent variables.
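The idea of splitting follow-up along a time-scale can be sketched as follows in Python. This is a minimal illustration, not the chapter's R code: the function `split_follow_up`, the age bands, and the example person are all hypothetical. For a constant rate λ the log-likelihood contribution is D log(λ) − λY (D events, Y person-time), so splitting a record must preserve the person's totals of D and Y.

```python
def split_follow_up(entry, exit, event, breaks):
    """Split one person's follow-up [entry, exit) at the given break points.

    Returns a list of (start, stop, event) records; the event indicator is
    carried only by the last interval, where follow-up actually ends.
    """
    cuts = [entry] + [b for b in sorted(breaks) if entry < b < exit] + [exit]
    records = []
    for start, stop in zip(cuts[:-1], cuts[1:]):
        records.append((start, stop, event if stop == exit else 0))
    return records

# Hypothetical person: enters at age 52.5 and dies at age 67.0;
# the age scale is split at 50, 60 and 70.
pieces = split_follow_up(52.5, 67.0, 1, breaks=[50, 60, 70])

# Totals are unchanged by the split.
deaths = sum(d for _, _, d in pieces)
person_years = sum(stop - start for start, stop, _ in pieces)
```

Each resulting record can then carry its own covariate values (here, the age band), which is what makes time-dependent variables straightforward to handle once the data are split.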


2020 ◽  
pp. 65-92
Author(s):  
Bendix Carstensen

This chapter evaluates regression models, focusing on the normal linear regression model. The normal linear regression model establishes a relationship between a quantitative response (also called outcome or dependent) variable, assumed to be normally distributed, and one or more explanatory (also called regression, predictor, or independent) variables about which no distributional assumptions are made. The model is usually referred to as 'the general linear model'. The chapter then differentiates between simple linear regression and multiple regression. The term 'simple linear regression' covers the regression model where there is one response variable and one explanatory variable, assuming a linear relationship between the two. The chapter also discusses the model formulae in R; generalized linear models; collinearity and aliasing; and logarithmic transformations.
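For the simple linear regression case described above (one response, one explanatory variable), the least-squares estimates have a closed form, and under the normal model they coincide with the maximum likelihood estimates. The Python sketch below uses hypothetical data and a hypothetical helper name, `simple_linear_regression`; it is an illustration only, not the chapter's R code.

```python
def simple_linear_regression(x, y):
    """Least-squares intercept and slope for the model y = a + b * x + error."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sum of squares about the means
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical measurements, for illustration only.
x = [1.0, 2.0, 3.0, 4.0]
y = [3.1, 4.9, 7.2, 8.8]
intercept, slope = simple_linear_regression(x, y)
```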


2020 ◽  
pp. 219-222
Author(s):  
Bendix Carstensen

This chapter explores the problems caused by categorizing quantitative variables (here termed continuous variables). Optimum decisions are made by applying a utility function to a predicted value. At the decision point, one can solve for the personalized cutpoint for predicted risk that optimizes the decision. Dichotomization on independent variables is completely at odds with making optimal decisions. To make an optimal decision, the cutpoint for a predictor would necessarily be a function of the continuous values of all the other predictors. Moreover, categorization assumes that the relationship between the predictor and the response is flat within intervals; this assumption is far less reasonable than a linearity assumption in most cases. Categorization of continuous variables using percentiles is particularly hazardous. For a continuous predictor to be modelled more accurately when categorization is used, multiple intervals are required.


2020 ◽  
pp. 119-158
Author(s):  
Bendix Carstensen

This chapter provides an overview of parametrizing quantitative covariate effects, that is, describing how variables (covariates) influence the outcome variable, be that disease odds, rates, or some quantitative measurement. It begins by differentiating between predictions and contrasts. When reporting rate ratios between two groups or the odds ratio of a disease associated with a certain difference in exposure, one is using contrasts of the outcome variable between different values of a covariate to describe the effect. Thus, one uses ratios or differences. But when reporting the mortality rate in, say, 60-year-old males, one is making a prediction of the outcome. This requires a set of values for all covariates in a model. When describing the effects of covariates by a model, one normally uses a linear predictor. The chapter then discusses the prediction of a single rate; categorical variables; the task of modelling the effect of quantitative variables; quantitative predictors; and quantitative interactions.
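The distinction between predictions and contrasts can be written out for a simple case. The formulas below assume a log-linear rate model with a single quantitative covariate x; the symbols λ, β₀ and β₁ are illustrative and not necessarily the chapter's notation.

```latex
\begin{align*}
\text{linear predictor:} \quad & \log \lambda(x) = \beta_0 + \beta_1 x \\
\text{prediction at } x_0\text{:} \quad & \lambda(x_0) = \exp(\beta_0 + \beta_1 x_0) \\
\text{contrast (rate ratio):} \quad & \frac{\lambda(x_1)}{\lambda(x_0)} = \exp\{\beta_1 (x_1 - x_0)\}
\end{align*}
```

The intercept cancels in the contrast, so a rate ratio depends only on the difference in covariate values, whereas a prediction needs a value for every term in the linear predictor.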


2020 ◽  
pp. 1-2
Author(s):  
Bendix Carstensen

The code shown in each chapter is available at the website http://bendixcarstensen.com/EwR (this URL is case-sensitive). You will benefit from running it on your own computer and using R to look further into the various data structures you create on the way (‘objects’ as we call them)....


2020 ◽  
pp. 159-180
Author(s):  
Bendix Carstensen

This chapter addresses case-control and case-cohort studies. In a case-control study, one samples persons based on their disease outcome, so the fraction of diseased persons in a case-control study is usually known (at least approximately) before data collection. In a cohort (follow-up) study, the relationship between some exposure and disease incidence is investigated by following the entire cohort and measuring the rate of occurrence of new cases in the different exposure groups. The follow-up records all persons who develop the disease during the study period. Implicit in this is that the relevant exposure information is available at all times for all persons under follow-up. The chapter then looks at the statistical model for the odds ratio, before differentiating between odds ratio and rate ratio. It also considers confounding and stratified sampling; individually matched studies; and nested case-control studies.
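As a small illustration of the odds ratio from an unmatched case-control study, the Python sketch below computes the estimate from a 2×2 table together with an approximate 95% confidence interval on the log scale (Woolf's method, chosen here for simplicity and not necessarily the approach taken in the chapter). The counts and the function name `odds_ratio` are hypothetical.

```python
import math

def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Odds ratio with an approximate 95% confidence interval (log-based)."""
    estimate = (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)
    # Standard error of log(OR): square root of the summed reciprocal cell counts
    se_log = math.sqrt(1 / exposed_cases + 1 / unexposed_cases
                       + 1 / exposed_controls + 1 / unexposed_controls)
    lower = math.exp(math.log(estimate) - 1.96 * se_log)
    upper = math.exp(math.log(estimate) + 1.96 * se_log)
    return estimate, lower, upper

# Hypothetical table: 40 exposed and 60 unexposed cases,
# 20 exposed and 80 unexposed controls.
or_hat, ci_low, ci_high = odds_ratio(40, 60, 20, 80)
```

This simple formula ignores confounding and matching; stratified or matched designs require the model-based approaches the chapter goes on to discuss.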


2020 ◽  
pp. 41-46
Author(s):  
Bendix Carstensen

This chapter provides a brief introduction to some of the most common measures of disease occurrence used in epidemiology, both the empirical and theoretical versions of the measures. It begins with the prevalence of a disease in a population, which is the fraction of the population that has the disease at a given date. The chapter then considers mortality rate, incidence rate, standardized mortality ratio (SMR), and survival. Mortality is typically reported as a number of people that have died in a population of a certain size. Incidence rates are defined exactly as mortality rates, where one just counts incident cases, that is, newly diagnosed cases of a particular disease. Meanwhile, the SMR is a measure of the mortality in a group of persons as compared to the general population. Finally, the survival after diagnosis of a disease is defined as the fraction of diagnosed individuals alive at a given time after diagnosis.
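The SMR described above can be sketched in Python as observed deaths divided by the number expected if the group experienced the general population's rates. The strata, person-years, and reference rates below are hypothetical and serve only as an illustration.

```python
# Hypothetical age-stratified data for a study group.
strata = [
    # (person_years, observed_deaths, reference_rate_per_person_year)
    (1000.0, 4, 0.002),
    (2000.0, 15, 0.005),
    (500.0, 12, 0.020),
]

# Observed deaths in the study group
observed = sum(deaths for _, deaths, _ in strata)

# Expected deaths: reference rate applied to the group's own person-years
expected = sum(py * rate for py, _, rate in strata)

# SMR above 1 means higher mortality than the reference population
smr = observed / expected
```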

