Methods/Statistics
Estimating absolute risks from case-cohort designs
Caitlin A. Cassidy*, Jessie K. Edwards
Case-cohort designs are useful in epidemiology when exposure or covariate measurement is resource intensive or the outcome of interest is rare. We define a case-cohort design as one that includes all cases from a cohort together with a sub-cohort randomly selected from the entire risk set at baseline. Cox proportional hazards models are typically used to estimate hazard ratios from this design. However, the absolute risks under each exposure are often also of interest. We describe a nonparametric approach to estimate these risks and the risk difference (RD) directly from case-cohort data.
We propose a weighted Kaplan-Meier estimator to estimate risk and survival functions under each exposure of interest in a case-cohort design. We compare the proposed estimator to a weighted Kaplan-Meier estimator that includes only individuals from the random sub-cohort. The weights account for the probability of inclusion in the sub-cohort and confounding.
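As a rough illustration of the weighted Kaplan-Meier idea described above, the following Python sketch computes a weighted risk at a fixed time horizon. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `weighted_km_risk` and `case_cohort_weights` are hypothetical, and the weighting scheme shown (weight 1 for cases, inverse sampling fraction for sub-cohort non-cases) is one common choice that the abstract does not specify. Confounding weights, such as inverse probability of exposure weights, would be multiplied into `weight` before calling the estimator.

```python
import numpy as np


def weighted_km_risk(time, event, weight, horizon):
    """Weighted Kaplan-Meier risk (1 - survival) at `horizon`.

    time    : follow-up time for each sampled individual
    event   : True/1 if the outcome occurred at `time`, False/0 if censored
    weight  : analysis weight for each individual
    horizon : time point at which the risk is evaluated
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    weight = np.asarray(weight, dtype=float)

    surv = 1.0
    # Step through the distinct event times up to the horizon.
    for t in np.unique(time[event & (time <= horizon)]):
        at_risk = weight[time >= t].sum()             # weighted risk set
        events = weight[event & (time == t)].sum()    # weighted events at t
        surv *= 1.0 - events / at_risk
    return 1.0 - surv


def case_cohort_weights(event, in_subcohort, sampling_fraction):
    """Assumed sampling weights: 1 for cases (always included),
    1 / sampling_fraction for sub-cohort non-cases, 0 otherwise."""
    event = np.asarray(event, dtype=bool)
    in_subcohort = np.asarray(in_subcohort, dtype=bool)
    return np.where(
        event, 1.0, np.where(in_subcohort, 1.0 / sampling_fraction, 0.0)
    )
```

Under these assumptions, the RD at a given horizon would be obtained by calling `weighted_km_risk` separately on the individuals with A=1 and A=0 and subtracting the two risks.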
We simulate 2000 replicates of a cohort from which a sub-cohort is randomly selected. Event times are generated from a Weibull distribution with an increasing hazard function. We estimate the RD for a binary outcome Y, comparing a binary exposure A=1 with A=0. We vary the true RD, the sub-cohort sampling method, and the sampling fraction. We also consider censoring and confounding by a binary covariate X. We compare the bias and precision of the two estimators of the RD, and we illustrate both estimators in an applied example using data from the Women’s Interagency HIV Study.
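One replicate of a simulation of this kind might be generated along the following lines. This is only a sketch: the function name `simulate_cohort`, every parameter value (Weibull shape and scales, exposure probabilities, follow-up length), the use of administrative censoring only, and Bernoulli sampling of the sub-cohort are assumptions made for illustration, since the abstract does not report them. A Weibull shape parameter above 1 gives the increasing hazard described above.

```python
import numpy as np


def simulate_cohort(n, sampling_fraction, shape=1.5, scale0=10.0,
                    scale1=7.0, follow_up=5.0, seed=0):
    """Simulate one cohort and draw a case-cohort sample from it.

    All default values are hypothetical, chosen only for illustration.
    """
    rng = np.random.default_rng(seed)

    # Binary confounder X, and binary exposure A that depends on X.
    x = rng.binomial(1, 0.5, size=n)
    a = rng.binomial(1, np.where(x == 1, 0.6, 0.4))

    # Weibull event times; shape > 1 implies an increasing hazard.
    scale = np.where(a == 1, scale1, scale0) * np.where(x == 1, 0.8, 1.0)
    t_event = scale * rng.weibull(shape, size=n)

    # Administrative censoring at the end of follow-up.
    time = np.minimum(t_event, follow_up)
    event = t_event <= follow_up

    # Random sub-cohort drawn at baseline (Bernoulli sampling assumed);
    # the case-cohort sample is every case plus the sub-cohort.
    in_subcohort = rng.random(n) < sampling_fraction
    selected = event | in_subcohort

    return {"x": x, "a": a, "time": time, "event": event,
            "in_subcohort": in_subcohort, "selected": selected}
```

Repeating this with different seeds, and applying an estimator to the rows where `selected` is true versus the rows where `in_subcohort` is true, would mirror the comparison of the two estimators described above.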
In most scenarios, both estimators demonstrate low average bias (≤ 0.003). Bias decreases as sample sizes and sampling fractions increase. The inclusion of a confounding variable and censoring leads to marginal increases in bias in most scenarios (≤ 0.007). The proposed estimator utilizing the case-cohort design demonstrates average standard errors that are less than half of those from the estimator utilizing only individuals in the random sub-cohort.
Risk functions and risk differences can be directly estimated from a case-cohort design using a weighted Kaplan-Meier approach. A case-cohort design yields greater statistical efficiency compared to a design using a random sub-cohort and decreases the study resources required to estimate absolute risks and risk differences compared to a study of a full cohort.