Epidemiology with R | ScienceGate

Survival analysis

Epidemiology with R ◽

10.1093/oso/9780198841326.003.0009 ◽

2020 ◽

pp. 181-218

Author(s):

Bendix Carstensen

Keyword(s):

Survival Analysis ◽

Competing Risks ◽

Cox Model ◽

Survival Function ◽

Event Analysis ◽

Time To Event ◽

Poisson Models ◽

Kaplan Meier ◽

Analysis Survival ◽

Event Rates

This chapter describes survival analysis. Survival analysis concerns data where the outcome is a length of time, namely the time from inclusion in the study (such as diagnosis of some disease) until death or some other event; hence the alternative term 'time to event analysis'. There are two primary targets normally addressed in survival analysis: survival probabilities and event rates. The chapter then looks at the life table estimator of the survival function and the Kaplan–Meier estimator of survival. It also considers the Cox model and its relationship with Poisson models, as well as the Fine–Gray approach to competing risks.
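As a rough illustration of the Kaplan–Meier estimator mentioned above, the following Python sketch multiplies, at each distinct event time, the running survival probability by the fraction of the risk set that survives. It is not the book's code (the book uses R); the function name `kaplan_meier` and the five-person dataset are hypothetical.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time where an event occurred.

    times  -- follow-up time for each person
    events -- 1 if the event was observed at that time, 0 if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    exits = Counter(times)  # every person leaves the risk set at their own time
    at_risk = len(times)
    surv = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Survival drops by the fraction of the risk set with an event at t.
            surv *= 1 - d / at_risk
            curve.append((t, surv))
        # Events and censorings at t both leave the risk set after t.
        at_risk -= exits[t]
    return curve


# Hypothetical data: five persons, follow-up time in years.
times = [2, 3, 3, 5, 8]
events = [1, 1, 0, 1, 0]
km = kaplan_meier(times, events)
```

Censored persons count in the risk set up to their censoring time but never cause a drop in the curve, which is what separates this estimator from a simple proportion of survivors.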


Prevalence data—models, likelihood, and binomial regression

Epidemiology with R ◽

10.1093/oso/9780198841326.003.0004 ◽

2020 ◽

pp. 47-63

Author(s):

Bendix Carstensen

Keyword(s):

Maximum Likelihood ◽

Maximum Likelihood Estimation ◽

Statistical Model ◽

Likelihood Estimation ◽

Data Models ◽

Epidemiological Methods ◽

Prevalence Data ◽

Binomial Regression ◽

Disease Condition ◽

Age And Sex

This chapter examines prevalence data, using a dataset which contains the number of diabetes patients and the total number of persons in Denmark as of January 1, 2010, classified by age and sex. Prevalence of a disease condition in a population is merely the proportion of affected people. The chapter uses prevalence to illustrate core modelling concepts: the model itself, the likelihood, the maximum likelihood estimation principle, and the properties of the results, all of which underlie most modern epidemiological methods. It also explains the concept of a statistical model leading to the distinction between empirical and theoretical prevalences. The chapter then focuses on the task of comparing different models for the same data, models that describe data in various degrees of detail.
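To make the link between empirical prevalence and maximum likelihood concrete, here is a minimal Python sketch under a binomial model. The counts are hypothetical and are not the Danish diabetes data; `binom_loglik` is an illustrative helper, not a function from the book.

```python
import math


def binom_loglik(p, cases, total):
    """Binomial log-likelihood for prevalence p (constant term omitted)."""
    return cases * math.log(p) + (total - cases) * math.log(1 - p)


# Hypothetical counts: 50 affected persons among 1000.
cases, total = 50, 1000

# The empirical prevalence is also the maximum likelihood estimate
# of the theoretical prevalence under the binomial model.
p_hat = cases / total

# The log-likelihood is lower at any other candidate value.
loglik_at_mle = binom_loglik(p_hat, cases, total)
loglik_nearby = [binom_loglik(p, cases, total) for p in (0.04, 0.06)]
```

Comparing log-likelihood values in this way is also the basis for comparing models of different detail, since a more detailed model can never have a lower maximized log-likelihood on the same data.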


Using R

Epidemiology with R ◽

10.1093/oso/9780198841326.003.0002 ◽

2020 ◽

pp. 3-39

Author(s):

Bendix Carstensen

Keyword(s):

Open Source ◽

Epidemiological Studies ◽

Commercial Product ◽

Frequency Data ◽

Open Source License ◽

Free Open Source

This chapter argues that the best way to learn R is to use it. One should start by using it as a simple calculator, and keep on exploring what one gets back by inspecting the size, shape, and content of what one creates. R is available from CRAN, the Comprehensive R Archive Network. A nice interface to R is RStudio. Although RStudio is a commercial product, it has a free open source license that gives one a very good and handy interface to R for free, including the possibility of writing reports using Rmarkdown, Sweave, or knitr. The chapter then looks at the two main graphics systems used in R: base graphics, which is an integral part of any R distribution, and ggplot2 (gg referring to grammar of graphics). Data from large epidemiological studies are often summarized in the form of frequency data, which record the frequency of all possible combinations of values of the variables in the study.
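The idea of frequency data can be sketched in a few lines. The example below is in Python rather than R, purely for illustration, and the individual records are hypothetical. It counts every combination of two variables, including combinations that never occur.

```python
from collections import Counter
from itertools import product

# Hypothetical individual records: (sex, disease status).
records = [("F", "yes"), ("F", "no"), ("M", "no"), ("F", "yes"), ("M", "no")]

counts = Counter(records)
sexes = sorted({sex for sex, _ in records})
statuses = sorted({status for _, status in records})

# Frequency data: one count per possible combination of values.
# Counter returns 0 for combinations that were never observed.
freq = {(sex, status): counts[(sex, status)]
        for sex, status in product(sexes, statuses)}
```

Keeping the zero cells matters because a model fitted to frequency data needs to see the combinations with no observations as well as those with some.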


Analysis of follow-up data

Epidemiology with R ◽

10.1093/oso/9780198841326.003.0006 ◽

2020 ◽

pp. 93-118

Author(s):

Bendix Carstensen

Keyword(s):

Data Collection ◽

Time Scale ◽

Poisson Regression ◽

Time Dependent ◽

Follow Up Study ◽

Dependent Variables ◽

The Status ◽

Over Time

This chapter assesses the analysis and representation of follow-up data. Follow-up refers to the process of monitoring persons over time for occurrence of a (set of) prespecified event(s). Practical data collection is often via look-up in registers or databases. The basic requirements for recordings in a follow-up study include date of entry to the study, date of exit from the study, and the status of the person at the exit date. The chapter then explains the likelihood from a follow-up study and why one can analyse rates using Poisson regression. The likelihood contribution from a single person's follow-up can be subdivided into contributions from subintervals of the follow-up. The chapter details the task of splitting the follow-up time along a time-scale. Finally, it considers time-dependent variables.
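Splitting follow-up along a time-scale can be sketched as follows. This is a hedged Python illustration, not the book's R code; `split_follow_up` is a hypothetical helper, and the entry time, exit time, and break points are made-up values.

```python
def split_follow_up(entry_time, exit_time, event, breaks):
    """Split one person's follow-up at the break points of a time-scale.

    Returns a list of (start, stop, event) pieces. The exit status is
    attached to the last piece only; earlier pieces end without an event.
    """
    inner = [b for b in sorted(breaks) if entry_time < b < exit_time]
    cuts = [entry_time] + inner + [exit_time]
    intervals = list(zip(cuts, cuts[1:]))
    last = len(intervals) - 1
    return [(start, stop, event if i == last else 0)
            for i, (start, stop) in enumerate(intervals)]


# Hypothetical person: enters at time 1.5, exits with an event at time 7.0.
pieces = split_follow_up(entry_time=1.5, exit_time=7.0, event=1,
                         breaks=[0, 5, 10])

# Person-time and events are preserved by the split.
person_time = sum(stop - start for start, stop, _ in pieces)
n_events = sum(d for _, _, d in pieces)
```

Each piece carries its own risk time and event indicator, so the pieces can be treated as separate records when rates are modelled as varying between intervals.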


Regression models

Epidemiology with R ◽

10.1093/oso/9780198841326.003.0005 ◽

2020 ◽

pp. 65-92

Author(s):

Bendix Carstensen

Keyword(s):

Linear Regression ◽

Regression Model ◽

Generalized Linear Models ◽

Linear Regression Model ◽

Regression Models ◽

Linear Models ◽

General Linear Model ◽

Simple Linear Regression ◽

Response Variable ◽

Quantitative Response

This chapter evaluates regression models, focusing on the normal linear regression model. The normal linear regression model establishes a relationship between a quantitative response (also called outcome or dependent) variable, assumed to be normally distributed, and one or more explanatory (also called regression, predictor, or independent) variables about which no distributional assumptions are made. The model is usually referred to as 'the general linear model'. The chapter then differentiates between simple linear regression and multiple regression. The term 'simple linear regression' covers the regression model where there is one response variable and one explanatory variable, assuming a linear relationship between the two. The chapter also discusses the model formulae in R; generalized linear models; collinearity and aliasing; and logarithmic transformations.
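For simple linear regression, with one response and one explanatory variable, the least-squares estimates have a closed form. The Python sketch below is illustrative only; the function name and the four data points are hypothetical, chosen so that the fitted line is exact.

```python
def simple_linear_regression(x, y):
    """Least-squares intercept and slope for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data lying exactly on the line y = 1 + 2x.
x = [1, 2, 3, 4]
y = [3, 5, 7, 9]
intercept, slope = simple_linear_regression(x, y)
```

Note that nothing in the calculation assumes a distribution for `x`, in line with the statement that no distributional assumptions are made about explanatory variables.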


Do not group quantitative variables

Epidemiology with R ◽

10.1093/oso/9780198841326.003.0010 ◽

2020 ◽

pp. 219-222

Author(s):

Bendix Carstensen

Keyword(s):

Utility Function ◽

The Other ◽

Optimal Decision ◽

Continuous Variables ◽

Decision Point ◽

Independent Variables ◽

Optimal Decisions ◽

The Relationship ◽

Continuous Predictor ◽

Predicted Risk

This chapter explores the problems caused by categorizing quantitative variables (here termed continuous variables). Optimum decisions are made by applying a utility function to a predicted value. At the decision point, one can solve for the personalized cutpoint for predicted risk that optimizes the decision. Dichotomization on independent variables is completely at odds with making optimal decisions. To make an optimal decision, the cutpoint for a predictor would necessarily be a function of the continuous values of all the other predictors. Moreover, categorization assumes that the relationship between the predictor and the response is flat within intervals; this assumption is far less reasonable than a linearity assumption in most cases. Categorization of continuous variables using percentiles is particularly hazardous. If categorization is used nonetheless, multiple intervals are required to model a continuous predictor more accurately.


Parametrization and prediction of rates

Epidemiology with R ◽

10.1093/oso/9780198841326.003.0007 ◽

2020 ◽

pp. 119-158

Author(s):

Bendix Carstensen

Keyword(s):

Mortality Rate ◽

Odds Ratio ◽

Quantitative Measurement ◽

Reporting Rate ◽

Outcome Variable ◽

Categorical Variables ◽

Linear Predictor ◽

Covariate Effects ◽

Rate Ratios

This chapter provides an overview of parametrizing quantitative covariate effects, that is, describing how variables (covariates) influence the outcome variable, be that disease odds, rates, or some quantitative measurement. It begins by differentiating between predictions and contrasts. When reporting rate ratios between two groups or the odds ratio of a disease associated with a certain difference in exposure, one is using contrasts of the outcome variable between different values of a covariate to describe the effect. Thus, one uses ratios or differences. But when reporting the mortality rate in, say, 60-year-old males, one is making a prediction of the outcome. This requires a set of values for all covariates in a model. When describing the effects of covariates by a model, one normally uses a linear predictor. The chapter then discusses the prediction of a single rate; categorical variables; the task of modelling the effect of quantitative variables; quantitative predictors; and quantitative interactions.
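The distinction between a prediction and a contrast can be sketched with a log-linear rate model. The coefficients below are hypothetical values chosen for illustration and do not come from the book; the model form (log rate as a linear predictor in age and sex) is likewise an assumption.

```python
import math

# Hypothetical coefficients of a log-linear rate model:
# log(rate) = intercept + age * (age - 60) + male * (1 if male else 0)
coef = {"intercept": math.log(0.01), "age": 0.08, "male": 0.4}


def predicted_rate(age, male):
    """Prediction: needs a value for every covariate in the model."""
    linear_predictor = (coef["intercept"]
                        + coef["age"] * (age - 60)
                        + coef["male"] * male)
    return math.exp(linear_predictor)


# Prediction: the rate for 60-year-old males.
rate_m60 = predicted_rate(age=60, male=1)

# Contrast: the male/female rate ratio depends on one coefficient only
# and is the same at every age in this model without interaction.
rate_ratio_male = math.exp(coef["male"])
```

The prediction changes with every covariate value supplied, whereas the contrast does not, which is why the two need to be reported differently.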


Introduction

Epidemiology with R ◽

10.1093/oso/9780198841326.003.0001 ◽

2020 ◽

pp. 1-2

Author(s):

Bendix Carstensen

Keyword(s):

Data Structures ◽

The Way

The code shown in each chapter is available at the website http://bendixcarstensen.com/EwR (this URL is case-sensitive). You will benefit from running it on your own computer and using R to look further into the various data structures you create on the way (‘objects’ as we call them)....


Case-control and case-cohort studies

Epidemiology with R ◽

10.1093/oso/9780198841326.003.0008 ◽

2020 ◽

pp. 159-180

Author(s):

Bendix Carstensen

Keyword(s):

Cohort Studies ◽

Odds Ratio ◽

Case Control Study ◽

Disease Incidence ◽

Case Control ◽

Case Control Studies ◽

Entire Cohort ◽

Nested Case Control ◽

Control Study

This chapter addresses case-control and case-cohort studies. In a case-control study, one samples persons based on their disease outcome, so the fraction of diseased persons in a case-control study is usually known (at least approximately) before data collection. In a cohort (follow-up) study, the relationship between some exposure and disease incidence is investigated by following the entire cohort and measuring the rate of occurrence of new cases in the different exposure groups. The follow-up records all persons who develop the disease during the study period. Implicit in this is that the relevant exposure information is available at all times for all persons under follow-up. The chapter then looks at the statistical model for the odds ratio, before differentiating between odds ratio and rate ratio. It also considers confounding and stratified sampling; individually matched studies; and nested case-control studies.
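As a small illustration of the odds ratio from an unmatched case-control study, the Python sketch below computes it from a 2×2 table, with an approximate 95% confidence interval on the log scale (the standard Woolf-type interval, used here as an assumption rather than as the chapter's method). The counts are hypothetical.

```python
import math


def odds_ratio(exp_cases, unexp_cases, exp_controls, unexp_controls):
    """Odds ratio with an approximate 95% CI computed on the log scale."""
    estimate = (exp_cases * unexp_controls) / (unexp_cases * exp_controls)
    # Standard error of log(OR): square root of summed reciprocal cell counts.
    se_log = math.sqrt(1 / exp_cases + 1 / unexp_cases
                       + 1 / exp_controls + 1 / unexp_controls)
    lower = math.exp(math.log(estimate) - 1.96 * se_log)
    upper = math.exp(math.log(estimate) + 1.96 * se_log)
    return estimate, lower, upper


# Hypothetical table: 40 exposed and 60 unexposed cases,
# 20 exposed and 80 unexposed controls.
or_est, or_lower, or_upper = odds_ratio(40, 60, 20, 80)
```

Because cases and controls are sampled by outcome, the table gives no direct estimate of disease risk, but the exposure odds ratio it yields is still interpretable.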


Measures of disease occurrence

Epidemiology with R ◽

10.1093/oso/9780198841326.003.0003 ◽

2020 ◽

pp. 41-46

Author(s):

Bendix Carstensen

Keyword(s):

Mortality Rate ◽

General Population ◽

Incidence Rate ◽

Mortality Rates ◽

Incidence Rates ◽

Standardized Mortality Ratio ◽

Newly Diagnosed ◽

Disease Occurrence ◽

Mortality Ratio ◽

Incident Cases

This chapter provides a brief introduction to some of the most common measures of disease occurrence used in epidemiology, both the empirical and theoretical versions of the measures. It begins with the prevalence of a disease in a population, which is the fraction of the population that has the disease at a given date. The chapter then considers mortality rate, incidence rate, standardized mortality ratio (SMR), and survival. Mortality is typically reported as the number of people who have died in a population of a certain size. Incidence rates are defined exactly as mortality rates, except that one counts incident cases, that is, newly diagnosed cases of a particular disease. Meanwhile, the SMR is a measure of the mortality in a group of persons as compared to the general population. Finally, the survival after diagnosis of a disease is defined as the fraction of diagnosed individuals alive at a given time after diagnosis.
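One common way to compute an SMR is to divide the observed number of deaths in the group by the number expected if the group had experienced general-population rates. The Python sketch below assumes that definition; the age groups, reference rates, person-years, and death count are all hypothetical.

```python
# Hypothetical general-population mortality rates (deaths per person-year).
reference_rates = {"40-59": 0.002, "60-79": 0.010}

# Hypothetical person-years of follow-up in the study group, by age group.
person_years = {"40-59": 5000, "60-79": 2000}

observed_deaths = 45

# Expected deaths if the group had the general-population rates.
expected_deaths = sum(reference_rates[age] * person_years[age]
                      for age in reference_rates)

# SMR above 1 means higher mortality than in the general population.
smr = observed_deaths / expected_deaths
```

With these made-up numbers the group has 45 deaths against about 30 expected, so its mortality is roughly one and a half times that of the reference population.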



Published By Oxford University Press
