Title: | Doubly Robust Estimation of Survival Differences with Censored Data |
---|---|
Description: | An implementation of several doubly robust estimators for the survival difference at a given time point and one more complex doubly robust estimator for the survival curve process. The estimators are doubly robust in the sense that they are consistent if the censoring model is correctly specified for censoring and either the outcome model is correctly specified for confounding or the exposure model is correctly specified for confounding. See <doi:10.48550/arXiv.2310.16207> for more details and examples. |
Authors: | Michael C Sachs [aut, cre], Arvid Sjölander [aut], Erin E Gabriel [aut] |
Maintainer: | Michael C Sachs <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.1.0 |
Built: | 2024-11-19 05:57:46 UTC |
Source: | https://github.com/sachsmc/drsurv |
Doubly robust estimation of the survival probabilities in two exposure groups
and their difference using the method of Blanche et al 2023. By default it is
assumed that censoring is completely independent, but covariate-dependence
can be modeled by either specifying the right side of cformula
which
will be passed to a Cox model for censoring, or estimating censoring weights
externally and passing them to the cens.weights
argument.
blanche_survdiff( oformula, cformula = ~1, cens.weights = NULL, eformula, times, data, weights = NULL, subset = NULL )
blanche_survdiff( oformula, cformula = ~1, cens.weights = NULL, eformula, times, data, weights = NULL, subset = NULL )
oformula |
The outcome formula, the left side should be a call to Event |
cformula |
The censoring model formula, a one sided formula with no outcome |
cens.weights |
A vector of inverse probability of censoring weights. If not NULL, then |
eformula |
The exposure model formula, which will be fit by logistic regression in a call to glm |
times |
Vector of times at which to estimate the survival probabilities |
data |
Data frame in which to find the variables in the model formulas |
weights |
Vector of case weights |
subset |
Logical vector |
As presented in Blanche et al. 2023, we model the causal survival
probabilities . We assume there is a way to correctly
specify for confounding a model for
as in a logistic
regression model
The working propensity model for exposure is a logistic regression model. In
addition to these models, we
model the censoring distribution as If
cens.weights
is not NULL, then we estimate
by a Cox model. This can be done externally by Aalen's additive hazard model,
nonparametric estimators stratified on covariates, or if censoring is assumed
independent of covariates,
may be estimated with the Kaplan-Meier
estimator or any other nonparametric estimator and passed in the
cens.weights
argument. Let the estimated inverse
probability of censoring weights be
Then let denote the parameter estimates resulting
from the inverse probability of censoring weighted, but propensity score
unweighted, score equations for the logistic model estimated via maximum
likelihood among the subset of individuals who either had the event before
time t or remained under observation until at least time t. Then the doubly
robust estimator of the causal event probability (one minus survival) under
exposure level
is
where the term is defined to be 0
for individuals censored before time
, but it is otherwise fully
observed. Similarly, we can substitute in
and take the difference
between them to obtain estimates of differences in survival probabilities. A
standard error estimator is suggested in Blanche et al. 2023 and implemented
in the R package logitIPCWATE. This function is basically a
wrapper to that function to be consistent with the other estimators.
A list with estimated survival probabilities in each group, their difference, and estimated standard errors.
Blanche, Paul Frédéric, Anders Holt, and Thomas Scheike. "On logistic regression with right censored data, with or without competing risks, and its use for estimating treatment effects." Lifetime data analysis 29(2):441-482, 2023.
df <- rotterdam df$time <- pmin(df$rtime, df$dtime) / 365.25 df$status <- ifelse(df$recur == 1 | df$death == 1, 1, 0) df$censor <- 1 - df$status df$chemo <- factor(df$chemo) drFit <- blanche_survdiff( oformula = Event(time, status) ~ chemo + year + age + meno + size + factor(grade) + nodes + pgr + er + hormon, cformula = ~ chemo + meno + size, eformula = chemo ~ year + age + meno + size + factor(grade) + nodes + pgr + er + hormon, times = c(2.5, 5, 7.5), data = df ) drFit
df <- rotterdam df$time <- pmin(df$rtime, df$dtime) / 365.25 df$status <- ifelse(df$recur == 1 | df$death == 1, 1, 0) df$censor <- 1 - df$status df$chemo <- factor(df$chemo) drFit <- blanche_survdiff( oformula = Event(time, status) ~ chemo + year + age + meno + size + factor(grade) + nodes + pgr + er + hormon, cformula = ~ chemo + meno + size, eformula = chemo ~ year + age + meno + size + factor(grade) + nodes + pgr + er + hormon, times = c(2.5, 5, 7.5), data = df ) drFit
Doubly robust estimation of the survival probabilities in two exposure groups
and their difference using the method of Wang (2018). By default it is
assumed that censoring is completely independent, but covariate-dependence
can be modeled by either specifying the right side of cformula
and cens.method
which
will be passed to cumincglm.
pseudo_survdiff( oformula, cformula = ~1, cens.method = NULL, eformula, times, data, weights = NULL, subset = NULL, R = 50, parallel = "no", cl = NULL, ncpus = NULL )
pseudo_survdiff( oformula, cformula = ~1, cens.method = NULL, eformula, times, data, weights = NULL, subset = NULL, R = 50, parallel = "no", cl = NULL, ncpus = NULL )
oformula |
The outcome formula, the left side should be a call to Surv |
cformula |
The censoring model formula, a one sided formula with no outcome |
cens.method |
Method to use for the censoring model, passed to cumincglm |
eformula |
The exposure model formula, which will be fit by logistic regression in a call to glm |
times |
Vector of times at which to estimate the survival probabilities |
data |
Data frame in which to find the variables in the model formulas |
weights |
Vector of case weights |
subset |
Logical vector |
R |
Number of bootstrap replicates for standard error |
parallel |
Parallel processing for bootstrap, see boot |
cl |
Optional cluster if parallel processing, see boot |
ncpus |
Number of cpus to use for parallel processing, see boot |
As presented in Wang 2018, let the jackknife pseudo observation for the $i$th subject be
where
is an estimator of the survival probability at time
using all
subjects, and
is an estimator of the
survival probability at time
leaving the
th subject out of the
estimation procedure. The assumption about censoring is implied by the way
these pseudo observations are computed, i.e., these could be inverse
probability of censoring weighted estimators where the censoring distribution
may depend on covariates, nonparametric estimators stratified on covariates,
or simply the Kaplan-Meier estimators if censoring is assumed to be
independent of covariates. We then use a working outcome model for the
conditional survival probability:
where
is the link function that coincides with how a
proportional hazards model fits survival at this time point. It is of note
that, unlike the Cox model, this estimator does not assume anything about the
survival function prior to time point
. The parameters in the above
model are estimated by solving the generalized estimating equations with
used as the
outcome variable. Let
. A logistic regression model for the propensity of the exposure is assumed and estimated. The doubly robust
estimator of the difference in survival probability at time
is
Although Wang proposed a variance estimator, it was found to be biased upward, and they suggest instead using bootstrap variance estimation. Thus, we implement the bootstrap variance estimator.
A list with estimated survival probabilities in each group, their difference, and estimated standard errors.
Wang, J. A simple, doubly robust, efficient estimator for survival functions using pseudo observations. Pharmaceutical Statistics. 2018; 17: 38–48. https://doi.org/10.1002/pst.1834
df <- rotterdam df$time <- pmin(df$rtime, df$dtime) / 365.25 df$status <- ifelse(df$recur == 1 | df$death == 1, 1, 0) df$censor <- 1 - df$status drFit <- pseudo_survdiff( oformula = Surv(time, status) ~ chemo + year + age + meno + size + factor(grade) + nodes + pgr + er + hormon, cformula = ~ chemo + meno + size, eformula = chemo ~ year + age + meno + size + factor(grade) + nodes + pgr + er + hormon, times = c(2.5, 5, 7.5), data = df ) drFit
df <- rotterdam df$time <- pmin(df$rtime, df$dtime) / 365.25 df$status <- ifelse(df$recur == 1 | df$death == 1, 1, 0) df$censor <- 1 - df$status drFit <- pseudo_survdiff( oformula = Surv(time, status) ~ chemo + year + age + meno + size + factor(grade) + nodes + pgr + er + hormon, cformula = ~ chemo + meno + size, eformula = chemo ~ year + age + meno + size + factor(grade) + nodes + pgr + er + hormon, times = c(2.5, 5, 7.5), data = df ) drFit
Rotterdam dataset from survival
rotterdam
rotterdam
A data frame with 2982 rows and 15 variables See rotterdam
Doubly robust estimation of the survival curves in two exposure groups and their difference using the method of Sjolander and Vansteelandt (2017) .
sjovan_survdiff( oformula = NULL, ofunc = "survreg", oarg = list(), cformula = NULL, cfunc = "survreg", carg = list(), eformula = NULL, earg = list(), method = "DR", times = NULL, rel.tol = .Machine$double.eps^0.1, jacobian.method = "simple", se.type = "none", data = NULL, weights = NULL, subset = NULL, R = 50, parallel = "no", cl = NULL, ncpus = NULL )
sjovan_survdiff( oformula = NULL, ofunc = "survreg", oarg = list(), cformula = NULL, cfunc = "survreg", carg = list(), eformula = NULL, earg = list(), method = "DR", times = NULL, rel.tol = .Machine$double.eps^0.1, jacobian.method = "simple", se.type = "none", data = NULL, weights = NULL, subset = NULL, R = 50, parallel = "no", cl = NULL, ncpus = NULL )
oformula |
The outcome formula, the left side should be a call to Surv |
ofunc |
The model type for the outcome, one of "coxph" or "survreg" |
oarg |
Arguments passed to |
cformula |
The censoring model formula, the left side should be a call to Surv |
cfunc |
The model type for the censoring model, either "coxph" or "survreg" |
carg |
Arguments passed to |
eformula |
The exposure model formula, which will be fit by logistic regression in a call to glm |
earg |
Arguments passed to the glm for the exposure model |
method |
Estimation method, either "IPW" or "DR" |
times |
Vector of times at which to estimate the survival curves |
rel.tol |
Convergence tolerance |
jacobian.method |
Method of computing the jacobian, passed to jacobian |
se.type |
Method of calculating standard errors, either "none" for no standard errors, "sandwich", or "boot" |
data |
Data frame in which to find the variables in the model formulas |
weights |
Vector of case weights |
subset |
Logical vector |
R |
Number of bootstrap replicates, used if |
parallel |
Parallel processing for bootstrap, see boot |
cl |
Optional cluster if parallel processing, see boot |
ncpus |
Number of cpus to use for parallel processing, see boot |
Following Bai et al. 2013 and Sjölander and Vansteelandt 2017, consider the
following estimating equation for :
where ,
is a model for
, and
is the martingale increment for the censoring distribution. The
above is an unbiased estimating equation for
if either
is correctly specified for both censoring and
confounding, i.e., is a correctly specified model for
or both
and
are correctly specified for
censoring and confounding, respectively. To obtain estimates of the
difference in survival probabilities, one must specify models for the unknown
functions
,
, and
, get estimates of those, plug them
into the estimating equations, and solve for
under
. In this package, one can use semiparametric Cox models or
parametric survival models for the outcome and the censoring distributions,
and logistic regression for the propensity score model. Bai et al. 2013
provide an expression for a variance estimator that accounts for the
uncertainty due to the estimation of the propensity score
and the
censoring distribution
.
A list with the estimates survival probabilities in each group, their difference, and standard errors, if requested
Xiaofei Bai, Anastasios A Tsiatis, and Sean~M O'Brien. "Doubly-robust estimators of treatment-specific survival distributions in observational studies with stratified sampling." Biometrics, 69(4):830–839, 2013.
Sjölander, Arvid, and Stijn Vansteelandt. "Doubly robust estimation of attributable fractions in survival analysis." Statistical methods in medical research 26(2):948-969, 2017.
df <- rotterdam df$time <- pmin(df$rtime, df$dtime) / 365.25 df$status <- ifelse(df$recur == 1 | df$death == 1, 1, 0) df$censor <- 1 - df$status drFit <- sjovan_survdiff( oformula = Surv(time, status) ~ chemo + year + age + meno + size + factor(grade) + nodes + pgr + er + hormon, ofunc = "survreg", cformula = Surv(time, censor) ~ chemo + year + age, cfunc = "survreg", eformula = chemo ~ year + age + meno + size + factor(grade) + nodes + pgr + er + hormon, method = "DR", times = c(2.5, 5, 7.5), se.type = "sandwich", data = df ) drFit
df <- rotterdam df$time <- pmin(df$rtime, df$dtime) / 365.25 df$status <- ifelse(df$recur == 1 | df$death == 1, 1, 0) df$censor <- 1 - df$status drFit <- sjovan_survdiff( oformula = Surv(time, status) ~ chemo + year + age + meno + size + factor(grade) + nodes + pgr + er + hormon, ofunc = "survreg", cformula = Surv(time, censor) ~ chemo + year + age, cfunc = "survreg", eformula = chemo ~ year + age + meno + size + factor(grade) + nodes + pgr + er + hormon, method = "DR", times = c(2.5, 5, 7.5), se.type = "sandwich", data = df ) drFit