What does risk mean in public health?

The epidemiologist primarily studies transitions between states of health and disease. The purpose of the present article is to define a foundational parameter for such studies, namely risk. We begin simply and build to the setting in which there is more than 1 event type (i.e., competing risks or competing events), as well as more than 1 treatment or exposure level of interest. In the presence of competing events, the risks are a set of counterfactual cumulative incidence functions for each treatment. These risks can be depicted visually and summarized numerically. We use an example from the study of human immunodeficiency virus to illustrate concepts.

The epidemiologist chiefly studies transitions between states of health and disease in populo (1) using observational (or cohort) studies. Therefore, in a well-defined study population, we are concerned with the time between 2 events, namely the origin and an outcome event of interest. The purpose of the present article is to define a foundational parameter for such epidemiologic studies, namely risk. In epidemiology, risk has been defined as “the probability of an event during a specified period of time” (2, p. 10). Below, we define risk as a function of time, allowing for competing risks (hereafter referred to as competing events) and more than 1 treatment (or exposure level) of interest. We conclude with a brief discussion of several points that are somewhat nuanced, including identification conditions, generalizability, risks versus hazards and rates, and competing events versus censored observations.

RISK

We begin with the simple setting in which there is no treatment and only a single type of event, which is the outcome of interest (i.e., there are no competing events). By “no treatment,” we mean the natural history or natural course (3), which is the observed or factual experience of the sample of n participants. To fix ideas, suppose we are interested in the t-year risk of mortality (from any cause) after initiation of antiretroviral therapy among adults infected with human immunodeficiency virus in a developed country like the United States. This is the population-average risk of all-cause mortality.

In this setting, the risk F, considered as a function of time t, is the cumulative distribution function for the time to death, also known as the complement of the survival function. The notation R was used in the companion paper (4). This risk is written formally as

$F(t) = P(T \le t)$,

where T is a random variable denoting time from the origin to the event and P(·) is the probability function. We define probabilities as proportions in a hypothetical, arbitrarily large population from which our observed n participants are assumed to be a random sample (5). This risk F is a 1-dimensional function of time (i.e., the function has domain t ∈ [0, ∞)) and is bounded by 0 and 1 (i.e., the function has range 0 ≤ F(t) ≤ 1). The specific values of t for which we present numerical summaries depend on the scientific context (e.g., 5-year all-cause mortality risk).
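
To make this definition concrete, here is a minimal Python sketch. The event times are hypothetical, and we assume complete follow-up (no censoring), so the risk at time t is simply the proportion of participants whose event time is at most t.

```python
def risk(event_times, t):
    """Empirical risk F(t) = P(T <= t): proportion of event times at most t.

    Toy simplification: assumes complete follow-up (no censoring).
    """
    return sum(1 for time in event_times if time <= t) / len(event_times)

# Hypothetical years from therapy initiation to death for 8 participants.
times = [0.5, 1.2, 2.0, 3.1, 4.4, 6.0, 7.5, 9.0]
five_year_risk = risk(times, 5)  # 5 of 8 died by year 5 -> 0.625
```

With censoring or late entry, this simple proportion is no longer valid and survival-analysis estimators (discussed below) are needed.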

RISK WITH COMPETING EVENTS

Next consider the more complex setting in which there is more than 1 event type, denoted by J, with possible levels j ∈ {1, 2, …, m}. For the time being, we will continue to consider the (descriptive) natural course. To revisit our example, instead of the population-average risk of all-cause mortality after therapy initiation, suppose we are interested in the population-average risks of a diagnosis of acquired immunodeficiency syndrome (AIDS) (j = 1) or death from any cause without an AIDS diagnosis (j = 2) after initiation of therapy. In the presence of competing events, the outcome becomes the pair of event time and event type. We note that AIDS and all-cause death are not mutually exclusive competing events because a person infected with human immunodeficiency virus can die with or without AIDS; however, the definition of “death without an AIDS diagnosis” induces mutually exclusive competing events.

The set of risks F(t, j), 1 for each level j, are now defined by the subdistribution functions, also known as the cumulative incidence functions (6), and written formally as

$F(t, j) = P(T \le t, J = j)$,

where T and J denote the time to the first event of any type and the event type, respectively. The risk F(t, j) is now a function with 2-dimensional domain [0, ∞) × {1, 2, …, m}, where m denotes the number of possible competing events.

The m events are competing in the sense that if 1 occurs, the others are precluded (7–9). The preclusion of a competing event is distinct from right censoring of an event; in the former case, the competing event can no longer occur, whereas in the latter case, the censored event can occur but cannot be observed. The risks F(t, j) can equivalently be written as a function of the cause-specific hazard functions λ(t, j), namely

$F(t, j) = \int_0^t \lambda(u, j) S(u) \, du$,

where

$S(t) = \exp\left\{ -\int_0^t \sum_{j=1}^{m} \lambda(u, j) \, du \right\} = 1 - F(t)$

is the overall survival function. In this form, it is clear that the risk of 1 event type j depends on the cause-specific hazards for the competing events through the overall survival function.
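
A discrete-time analogue of this relation between cause-specific hazards and risks can be sketched in a few lines of Python. The data are hypothetical, and for simplicity we assume complete follow-up and distinct event times.

```python
def cumulative_incidence(times, causes, t, j):
    """Accumulate (cause-specific hazard at u) x S(u-) over event times u <= t.

    Toy discrete-time sketch: complete follow-up, distinct event times.
    """
    at_risk = len(times)
    surv = 1.0  # overall survival S(u-), across all causes
    cif = 0.0
    for u, cause in sorted(zip(times, causes)):
        if u > t:
            break
        if cause == j:
            cif += surv * (1 / at_risk)  # cause-specific hazard times S(u-)
        surv *= 1 - 1 / at_risk          # overall survival drops for any event
        at_risk -= 1
    return cif

# Hypothetical event times and types (1 = AIDS, 2 = death without AIDS).
times, causes = [1, 2, 3, 4], [1, 2, 1, 2]
risk_aids = cumulative_incidence(times, causes, 10, 1)
risk_death = cumulative_incidence(times, causes, 10, 2)
# With complete follow-up, the event-specific risks sum to the all-cause risk.
```

Note that the overall survival term depends on events of every type, which is the sense in which one event-specific risk depends on the hazards of its competitors.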

The risks as defined above can essentially be viewed as a set of m 2-dimensional functions. A natural extension of the complement of the survival function is depicted in Figure 1A in the article by Cole et al. (4). (Figure 1A refers to those who report not using injection drugs, but for our purposes let us ignore that detail for the time being.) In that figure, the m event type–specific (estimated) risks are “stacked.” By stacked, we mean that the jth event-specific risk is given by the vertical distance between the jth and (j − 1)th stacked risks, where the first stacked risk is simply the risk for j = 1.

The specific values of t for which we present numerical summaries depend on the scientific context. However, if there is more than 1 event type, there is no single summary measure of risk at a given time t. To summarize the risks in a single measure at time t, one must combine the m event types in some manner (10). Such a combination can be achieved using expected utility (11), which is a way to assign preferences to the different competing events. Specifically, expected utility may take as inputs the m event-specific risk functions F(t, j) and a user-defined utility 0 ≤ u(j) ≤ 1 assigned to each event type j, according to its importance (12, p. 158). Expected utility therefore combines the m event-specific risk functions into a single weighted sum at each time t, or $\sum_{j=1}^{m} F(t, j) u(j)$. For example, when an equal utility of u(j) = 1 is placed on each of the m event types, the resultant expected utility is equivalent to the distribution function F(t), which corresponds to the widely used “composite endpoint” (i.e., the top/dotted stacked line in Figure 1A of Cole et al. (4)). For our example, this composite endpoint would be the complement of AIDS-free survival, with the components in the composite being the competing events of AIDS diagnosis and all-cause mortality while free of AIDS. Such composite endpoints are often used in the presence of competing events. Alternatively, when a utility of 1 is placed on only 1 particular event type j (and utilities of 0 on all other events), the resultant expected utility reduces to the cumulative incidence function for that event type. For our example, this could be the cumulative incidence function for AIDS (or the bottom/solid line in Figure 1A) or the cumulative incidence function for death without AIDS. In principle, the utilities can be extended to be time-varying u(t, j). Determination of utilities is context-dependent and is a key area in which epidemiologists can ally productively with policy experts.
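
The expected-utility summary is a short computation; the risks and utilities below are hypothetical.

```python
def expected_utility(risks, utilities):
    """Weighted sum of event-specific risks F(t, j) with utilities u(j)."""
    return sum(f * u for f, u in zip(risks, utilities))

# Hypothetical 5-year risks: AIDS (j = 1) and death without AIDS (j = 2).
risks_5yr = [0.20, 0.10]
composite = expected_utility(risks_5yr, [1, 1])  # composite endpoint
aids_only = expected_utility(risks_5yr, [1, 0])  # cumulative incidence of AIDS
```

Equal utilities recover the composite endpoint; a single utility of 1 recovers one cumulative incidence function, exactly as described above.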

RISK WITH TREATMENTS

Now consider the setting in which we are interested in contrasting the risks across levels of a treatment denoted as A with possible levels a, where the value of A is fixed at the origin. To extend our human immunodeficiency virus example, say we are interested in the causal effect on total mortality of injection drug use (a = 1) or no such use (a = 0) at therapy initiation. From here on, we refer to potential outcomes (i.e., factual and counterfactual outcomes) and causal effects, rather than solely observed or factual outcomes (13). In the absence of competing events, these risks are then a set of counterfactual cumulative distribution functions, 1 for each level a, or formally

$F_a(t) = P(T_a \le t)$,

where $T_a$ is a random variable denoting time from the origin to the event when an individual receives treatment level a. The function $F_a(t)$ has a 2-dimensional domain [0, ∞) × {0, 1, …, k − 1}, where k denotes the number of treatment levels indexed by a. Causal risk differences (and ratios) for a chosen time t may be defined as functions of $F_a(t)$ and $F_{a'}(t)$, where a′ is an alternate treatment (or reference) level. For example, the risk ratio function for a binary treatment is $RR(t) = F_{a=1}(t)/F_{a=0}(t)$ (e.g., Figure 3 in the article by Cole et al. (4)). The choice of the treatment reference level a′ is context-dependent. However, no choice need be made between the risk difference or ratio, as both may be presented compactly side-by-side (e.g., Table 3 in Cole et al. (14)).
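
Given the two counterfactual risks at a chosen time t, the contrasts are immediate; the values below are hypothetical.

```python
def risk_difference(risk_a1, risk_a0):
    """Causal risk difference at a chosen time t (absolute scale)."""
    return risk_a1 - risk_a0

def risk_ratio(risk_a1, risk_a0):
    """Causal risk ratio RR(t) at a chosen time t (relative scale)."""
    return risk_a1 / risk_a0

# Hypothetical 5-year mortality risks under a = 1 and a = 0.
rd = risk_difference(0.30, 0.20)
rr = risk_ratio(0.30, 0.20)
# Both contrasts can be reported side-by-side; neither need be chosen over the other.
```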

RISK WITH COMPETING EVENTS AND TREATMENTS

Finally, suppose we are interested in the relationships between injection drug use (a = 1) or no such use (a = 0) at therapy initiation and AIDS diagnosis (j = 1) or death from any cause without an AIDS diagnosis (j = 2). The risks are now defined as the set of counterfactual cumulative incidence functions, or formally

$F_a(t, j) = P(T_a \le t, J_a = j)$,

where $T_a$ and $J_a$ are random variables denoting time from the origin to the first of m possible events and the event type, respectively, when an individual receives treatment level a. The above risks are a function with 3-dimensional domain, specifically [0, ∞) × {1, 2, …, m} × {0, 1, …, k − 1}, representing time t, event type j, and treatment a. Again a visual representation is helpful. The panels of Figures 1 and 2 in Cole et al. (4) provide a visual display of the risk of each of the m = 2 event types, with 1 panel for each treatment level. When there is more than 1 event type, there is no single summary measure of risk at a given time t for a given treatment level a, and expected utility may again be helpful.

DISCUSSION

We have concentrated on defining risk as a central parameter in cohort studies of the timing between an origin and outcomes of interest, which is the prototypical study design for epidemiologic research (15). However, these same risk parameters are of central interest in other epidemiologic study designs (e.g., case-cohort, case-control), which have long been viewed as designs that strategically sample from an explicit or implicit cohort. Of course, there are scenarios in which epidemiologists are concerned with the evolution of a biomarker (or biomarkers) as the outcome of interest rather than the risk. Although we did not address such scenarios, approaches similar to those above for risk can be applied.

To identify the general risks Fa(t, j), we must make assumptions that are untestable in the observed data (Appendix 2 in Cole and Hernán (16)). Alternate sets of identification assumptions may be possible. The following is a set that we find useful. First, we assume no interference (17) and either no versions of treatment (18) or treatment-variation irrelevance (19). No interference means that the potential outcomes of 1 participant do not depend on the treatment values of other participants. This assumption can be relaxed. Treatment-variation irrelevance means that for a fixed treatment level a (e.g., injection drug use), additional variations of the treatment a (e.g., daily vs. weekly use) do not affect the potential outcomes. This assumption can be relaxed (e.g., by sharpening the treatment definition). Second, we assume negligible measurement/information bias (20). Third, with respect to confounding, we may (rightly or wrongly) assume unconditional exchangeability or conditional exchangeability with positivity. Conditional exchangeability with positivity assumes that treatment was received at random, independent of potential outcomes, conditional on a vector of measured confounders W and also that there are treated and untreated individuals at every level of the measured confounders. Fourth, if we have selection (due to late entries, dropout, or missing data), then we extend the conditional exchangeability assumption to assume selection at random conditional on measured covariates. Fifth, if we use 1 or more parametric models to account for components of the conditional exchangeability assumption, then we must assume that these models are correctly specified. Each of these identification assumptions, along with the assumed population sampling model, is not testable in the observed data. However, some of these assumptions are, in principle, empirically testable.
By empirically testable, we mean that another experiment or observational study can be envisaged (even if infeasible or unethical) that could be used to alter our belief that the assumption is correct. In the absence of an empirical test, bounds for the logically possible range of findings are calculable (21–23). Such bounds are often so wide as to be largely uninformative, and in such cases the sensitivity of findings to each of these assumptions can be explored as a function of an unknown bias parameter (24–27). Results from sensitivity analyses can be synthesized, for example by using multiple bias modeling (28).

Under the above assumptions, asymptotically consistent nonparametric estimation of the counterfactual risks can be accomplished. The above assumptions (including unconditional exchangeability) likely hold in large randomized experiments with no (or minimal) dropout and perfect (or near-perfect) compliance. Therefore, the Aalen-Johansen estimator (29) (which can be viewed as a generalization of the Kaplan-Meier estimator under competing events) can be used for nonparametric inference. However, in broken experiments or pseudoexperiments (i.e., observational studies), many of the assumptions described above will not hold. In particular, we often must assume conditional exchangeability as stated above. In practice, then, we often wish to standardize the crude Aalen-Johansen estimator to the total study population under the joint treatment plan of a specified treatment level and no selection bias due to missing data (e.g., loss to follow-up). By joint treatment plan, we mean more than 1 set of treatments, as exemplified by factorial randomized experiments. In realistic epidemiologic settings with multiple and continuous covariates, standardization can be accomplished using the g-formula or inverse-probability weighting of a marginal structural model (13). In the latter case, the estimator is then a semiparametric inverse probability of treatment-and-selection–weighted survival curve (30, 31), perhaps extended to allow for competing events (6) and left truncation (32, 33).
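
As a minimal sketch of such standardization (hypothetical data; a single binary confounder; nonparametric treatment probabilities; complete follow-up), the crude risk among those at a treatment level can be reweighted by 1/P(A = a | W = w):

```python
def ipw_risk(records, t, a):
    """Inverse-probability-of-treatment-weighted risk at time t.

    records: (event_time, event_indicator, treatment, confounder) tuples.
    Toy sketch: one binary confounder, complete follow-up assumed.
    """
    # Nonparametric treatment probabilities P(A = 1 | W = w) by confounder level.
    probs = {}
    for w in {r[3] for r in records}:
        group = [r for r in records if r[3] == w]
        probs[w] = sum(1 for r in group if r[2] == 1) / len(group)
    num = den = 0.0
    for time, event, treatment, w in records:
        if treatment != a:
            continue
        p = probs[w] if a == 1 else 1 - probs[w]
        weight = 1 / p  # requires positivity: 0 < p < 1 at every level of w
        den += weight
        num += weight * (1 if event == 1 and time <= t else 0)
    return num / den

# Hypothetical records: (years to event, event indicator, treatment, confounder).
records = [
    (1, 1, 1, 0), (2, 0, 0, 0), (3, 0, 0, 0), (4, 1, 0, 0),
    (1, 1, 1, 1), (2, 1, 1, 1), (3, 0, 1, 1), (4, 1, 0, 1),
]
risk_treated = ipw_risk(records, 5, 1)
risk_untreated = ipw_risk(records, 5, 0)
```

In realistic settings the treatment (and selection) probabilities would come from fitted models, and time-to-event weighting would be built on the Aalen-Johansen or Kaplan-Meier machinery rather than a simple proportion.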

Both the g-formula and inverse-probability–weighted estimators can be sensitive to the specification of the parametric models used to construct the estimators. Moreover, standard inverse-probability–weighted approaches for estimating the risks are not semiparametric efficient (34). Doubly robust and more efficient estimators of the risks are available by drawing on the theory of semiparametric inference, including augmented inverse-probability weighting (35, 36) and targeted minimum-loss estimation (37).

Epidemiologists often must combine information about risks from multiple studies (38, 39) or add a current estimate to the existing knowledge base while acknowledging the dangers of combining heterogeneous estimates. When combining is appropriate (40), semi-Bayes methods (41) are a reasonable approach to combine estimates (42) and may be viewed as an extension of inverse-variance weighting (43). Such “semi-Bayes semiparametric” methods will also be desirable whenever the parameters in a structural model would benefit from penalization (44).
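
Fixed-effect inverse-variance weighting, which the semi-Bayes approach can be viewed as extending (a prior enters as one more pseudo-study), is a short computation; the study-specific estimates below are hypothetical.

```python
import math

def inverse_variance_pool(estimates, variances):
    """Fixed-effect inverse-variance-weighted pooled estimate and its variance."""
    weights = [1 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    return pooled, 1 / sum(weights)

# Hypothetical log risk ratios and variances from 2 studies.
log_rrs = [math.log(1.5), math.log(2.0)]
pooled, pooled_var = inverse_variance_pool(log_rrs, [0.04, 0.08])
# The pooled estimate lies between the study estimates, nearer the more precise one.
```

Pooling on the log scale and back-transforming is the usual convention for ratio measures; whether pooling is appropriate at all depends on heterogeneity, as noted above.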

Defining and identifying the risks in the study population and estimating a risk difference or ratio in the study sample is not sufficient for valid inference if the population implied by the study sample and the target population differ on factors for which there is treatment-effect heterogeneity (or there are differences in treatment versions or interference patterns (45)). Inverse-probability weights might be useful in addressing generalizability under treatment-effect heterogeneity (46), although one must be cautious in such generalization (47).

In the present article, we have focused on the risk rather than the hazard because of the hazard's conditional nature. The hazard at time t is the instantaneous rate of the outcome at time t conditional on being at risk at time t. This conditioning on survival causes the hazard ratio to be a noncollapsible parameter (48, 49). The value of a noncollapsible parameter depends on the set of covariates that are conditioned on by design or analysis, even in the absence of systematic errors (i.e., confounding, selection, and measurement/information bias). Summarizing a knowledge base built from a set of noncollapsible parameter estimates is more complicated than summarizing a knowledge base built from analogous collapsible parameter estimates because differences in noncollapsible estimates may be due to noncollapsibility in addition to random and systematic errors. Relatedly, we also prefer the risk to the incidence density because the incidence density can be defined as a hazard averaged over a period of time and therefore inherits the complications due to noncollapsibility (50, 51). There are also additional complications for the incidence density due to time averaging (52).

When faced with competing events, sometimes epidemiologists treat competing events the same as drop out by right censoring such events. This ad hoc approach only provides a reasonable approximation to the risk defined above when all competing events are rare. When competing events are not rare, absolute risks will not be consistently estimated by censoring the competing events.
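
A toy numerical sketch (hypothetical data, complete follow-up) illustrates the point: when competing events are common, the complement of a Kaplan-Meier curve that censors competing events exceeds the cumulative incidence function.

```python
def one_minus_km(times, causes, t, j):
    """Complement of Kaplan-Meier for cause j, censoring competing events."""
    at_risk = len(times)
    surv = 1.0
    for u, cause in sorted(zip(times, causes)):
        if u > t:
            break
        if cause == j:
            surv *= 1 - 1 / at_risk
        at_risk -= 1  # events of any type leave the risk set
    return 1 - surv

def cumulative_incidence(times, causes, t, j):
    """Aalen-Johansen-type cumulative incidence (complete follow-up)."""
    at_risk = len(times)
    surv = 1.0
    cif = 0.0
    for u, cause in sorted(zip(times, causes)):
        if u > t:
            break
        if cause == j:
            cif += surv * (1 / at_risk)
        surv *= 1 - 1 / at_risk
        at_risk -= 1
    return cif

# Hypothetical data in which half the events are of the competing type (j = 2).
times, causes = [1, 2, 3, 4], [2, 1, 2, 1]
naive = one_minus_km(times, causes, 10, 1)          # overstates the risk
proper = cumulative_incidence(times, causes, 10, 1)  # bounded by the event frequency
```

In this toy data, only 2 of 4 participants experience the type-1 event, yet the censoring approach can push the apparent risk well above that proportion; the gap shrinks only when the competing events are rare.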

G-methods, developed by Robins et al. (53, 54), can be used when the treatment of interest is time-fixed or time-varying. They are quantitative methods to estimate causal effects of static or dynamic, time-fixed or time-varying treatments or exposures, and include the aforementioned g-formula (14), g-estimation of structural nested models (55), and inverse-probability–weighted estimation of marginal structural models (56). Modifiable risk factors are often time-varying treatments. The risks defined above can be generalized to compare time-varying treatment plans, in which case the assumptions required for inference would also need to be generalized (54). In conclusion, our definition of risk is not new (53, 57), but a deepened appreciation of risk and the implications discussed above is beneficial for epidemiology.

ACKNOWLEDGMENTS

Author affiliations: Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina (Stephen R. Cole, M. Alan Brookhart, Daniel Westreich); and Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina (Michael G. Hudgens).

S.R.C. was supported in part by National Institutes of Health (NIH) grants R01AI100654, R24AI067039, U01AI103390, and P30AI50410. M.G.H. was supported in part by NIH grants R01AI100654 and P30AI50410. D.W. was supported in part by NIH grants R01AI100654 and DP2HD084070. M.A.B. was supported in part by NIH grants R01AG042845, R21HD080214, and R01AG023178 and through contracts with the Agency for Healthcare Research and Quality's Developing Evidence to Inform Decisions About Effectiveness program and the Patient Centered Outcomes Research Institute.

We thank Drs. Lauren E. Cain, Jessie K. Edwards, Sander Greenland, Miguel A. Hernán, Chanelle J. Howe, Timothy L. Lash, Bryan Lau, Ashley I. Naimi, Robert W. Platt, Charles Poole, David B. Richardson, James M. Robins, Enrique F. Schisterman, Bryan E. Shepherd, and Elizabeth Stuart for expert advice.

Conflict of interest: S.R.C. and D.W. have provided ad hoc consulting on epidemiologic methods to the NIH/Eunice Kennedy Shriver National Institute of Child Health and Human Development.

REFERENCES

3. The parametric g-formula to estimate the effect of highly active antiretroviral therapy on incident AIDS or death.

4. Estimation of the standardized risk difference and ratio in a competing risks framework: application to injection drug use and progression to AIDS after initiation of antiretroviral therapy.

5. Confidence intervals for causal parameters.

6. A class of k-sample tests for comparing the cumulative incidence of a competing risk.

7. Competing risk regression models for epidemiologic data.

8. Tutorial in biostatistics: competing risks and multi-state models.

9. Competing risks in epidemiology: possibilities and pitfalls.

10. Epidemiologic measures and policy formulation: lessons from potential outcomes.

11. Comment on: causal inference without counterfactuals.

12. Understanding Uncertainty.

13. Marginal structural models and causal inference in epidemiology.

14. Analysis of occupational asbestos exposure and lung cancer mortality using the g formula.

15. Evolution of the cohort study.

16. Constructing inverse probability weights for marginal structural models.

17. Toward causal inference with interference.

18. Discussion of “Randomized analysis of experimental data: the Fisher randomization test” by Basu D.

19. Concerning the consistency assumption in causal inference.

20. Invited commentary: causal diagrams and measurement bias.

21. The analysis of randomized and nonrandomized AIDS treatment trials using a new approach to causal inference in longitudinal studies. In: Health Service Research Methodology: A Focus on AIDS.

22. Nonparametric bounds on treatment effects.

23. Bounds on treatment effects from studies with imperfect compliance.

24. Basic methods for sensitivity analysis of biases.

25. Semi-automated sensitivity analysis to assess systematic errors in observational data.

26. Sensitivity analyses for unmeasured confounding assuming a marginal structural model for repeated measures.

27. Monte Carlo sensitivity analysis and Bayesian analysis of smoking as an unmeasured confounder in a study of silica and lung cancer.

28. Multiple-bias modelling for analysis of observational data. J R Stat Soc Ser A Stat Soc.

29. An empirical transition matrix for non-homogeneous Markov chains based on censored observations.

30. Correcting for noncompliance and dependent censoring in an AIDS clinical trial with inverse probability of censoring weighted (IPCW) log-rank tests.

31. Adjusted survival curves with inverse probability weights. Comput Methods Programs Biomed.

32. The empirical distribution function with arbitrarily grouped, censored and truncated data.

33. Cause-specific cumulative incidence estimation and the Fine and Gray model under both left truncation and right censoring.

34. Semiparametric efficiency bounds.

35. Estimation of regression coefficients when some regressors are not always observed.

36. Semiparametric Theory and Missing Data.

37. Targeted Learning: Causal Inference From Observational and Experimental Data.

38. Quantitative methods in the review of epidemiologic literature.

39. 3rd ed. Lippincott Williams & Wilkins.

40. Random-effects meta-analyses are not always conservative.

41. Generalized conjugate priors for Bayesian analysis of risk and survival regressions.

42. The Bayes/non-Bayes compromise: a brief review.

43. Bayesian perspectives for epidemiological research: I. Foundations and basic methods.

44. Maximum likelihood, profile likelihood, and penalized likelihood: a primer.

45. Compound treatments and transportability of causal inference.

46. Generalizing evidence from randomized clinical trials to target populations: the ACTG 320 trial.

47. Commentary: extending organizational schema for causal effects.

48. Estimating short-term effects of time-varying exposures on survival.

49. The hazards of hazard ratios.

50. Absence of confounding does not correspond to collapsibility of the rate ratio or rate difference.

51. Confounding of incidence density ratio in case-control studies.

52. Events per person-time (incidence rate): a misleading statistic?

53. A new approach to causal inference in mortality studies with a sustained exposure period: application to control of the healthy worker survivor effect.

54. Estimation of the causal effects of time-varying exposures. In: Longitudinal Data Analysis.

55. Effect of acyclovir on herpetic ocular recurrence using a structural nested model.

56. Effect of highly active antiretroviral therapy on time to acquired immunodeficiency syndrome or death using marginal structural models.

57. Estimates of absolute cause-specific risk in cohort studies.

© The Author 2015. Published by Oxford University Press on behalf of the Johns Hopkins Bloomberg School of Public Health. All rights reserved. For permissions, please e-mail: [email protected].