| Title: | Regression Standardization for Causal Inference |
|---|---|
| Description: | Contains more modern tools for causal inference using regression standardization. Four general classes of models are implemented; generalized linear models, conditional generalized estimating equation models, Cox proportional hazards models, and shared frailty gamma-Weibull models. Methodological details are described in Sjölander, A. (2016) <doi:10.1007/s10654-016-0157-3>. Also includes functionality for doubly robust estimation for generalized linear models in some special cases, and the ability to implement custom models. |
| Authors: | Michael C Sachs [aut, cre], Arvid Sjölander [aut], Erin E Gabriel [aut], Johan Sebastian Ohlendorff [aut], Adam Brand [aut] |
| Maintainer: | Michael C Sachs <[email protected]> |
| License: | AGPL (>= 3) |
| Version: | 1.0.8 |
| Built: | 2026-05-28 15:03:29 UTC |
| Source: | https://github.com/sachsmc/stdReg2 |
Contains more modern tools for causal inference using regression standardization. Four general classes of models are implemented; generalized linear models, conditional generalized estimating equation models, Cox proportional hazards models, and shared frailty gamma-Weibull models. Methodological details are described in Sjölander, A. (2016) doi:10.1007/s10654-016-0157-3. Also includes functionality for doubly robust estimation for generalized linear models in some special cases, and the ability to implement custom models.
Maintainer: Michael C Sachs [email protected]
Authors:
Arvid Sjölander
Erin E Gabriel
Johan Sebastian Ohlendorff
Adam Brand
Useful links:
Report bugs at https://github.com/sachsmc/stdReg2/issues
This dataset is borrowed from "An introduction to Stata for health reserachers" (Juul and Frydenberg, 2010). The dataset contains data on 189 mothers who have given birth to one or several children. In total, the dataset contains data on 487 births.
data(clslowbwt)data(clslowbwt)
The dataset is structured so that each row corresponds to one birth/child. It contains the following variables:
the identification number of the mother.
the number of the birth, i.e. "1" for the mother's first birth, "2" for the mother's second birth etc.
a categorical variable indicating if the mother is a smoker or not with levels "0. No" and "1. Yes".
the race of the mother with levels "1. White", "2. Black" or "3. Other".
the age of the mother at childbirth.
weight of the mother at last menstruational period (in pounds).
birthweight of the newborn.
a categorical variable indicating if the newborn is categorized as a low birthweight baby (<2500 grams) or not with levels "0. No" and "1. Yes".
a numeric indicator if the mother is a smoker or not. Recoded version of the variable "smoke" where "0.No" is recoded as "0" and "1.Yes" is recoded as "1".
a numeric indicator of whether the newborn is categorized as a low birthweight baby (<2500 grams) or not. Recoded version of the variable "low" where "0.No" is recoded as "0" and "1.Yes" is recoded as "1".
The following changes have been made to the original data in Juul & Frydenberg (2010):
- The variable "low" is recoded into the numeric indicator variable "lbw":
clslowbwt$lbw <- as.numeric(clslowbwt$low == "1. Yes")
- The variable "smoke" is recoded into the numeric indicator variable "smoker":
clslowbwt$smoker <- as.numeric(clslowbwt$smoke == "1. Yes")
Juul, Svend & Frydenberg, Morten (2010). An introduction to Stata for health researchers, Texas, Stata press, 2010 (Third edition).
parfrailty fits shared frailty gamma-Weibull models. It is
specifically designed to work with the function standardize_parfrailty, which
performs regression standardization in shared frailty gamma-Weibull models.
parfrailty(formula, data, clusterid, init)parfrailty(formula, data, clusterid, init)
formula |
an object of class " |
data |
a data frame containing the variables in the model. |
clusterid |
a string containing the name of a cluster identification variable. |
init |
an optional vector of initial values for the model parameters. |
parfrailty fits the shared frailty gamma-Weibull model
where and are the survival time and covariate
vector for subject in cluster , respectively.
is the Weibull baseline hazard function
where is the shape
parameter and is the scale parameter. is the
unobserved frailty term for cluster , which is assumed to have a
gamma distribution with scale = 1/shape = . is
the regression function as specified by the formula argument,
parameterized by a vector . The ML estimates
are
obtained by maximizing the marginal (over ) likelihood.
An object of class "parfrailty" which is a list containing:
est |
the Maximum Likelihood (ML) estimates |
vcov |
the variance-covariance vector of the ML estimates. |
score |
a matrix containing the cluster-specific contributions to the ML score equations. |
If left truncation is present, it is assumed that it is strong left truncation. This means that even if the truncation time may be subject-specific, the whole cluster is unobserved if at least one subject in the cluster dies before his/her truncation time. If all subjects in the cluster survive beyond their subject-specific truncation times, then the whole cluster is observed (Van den Berg and Drepper, 2016).
Arvid Sjölander and Elisabeth Dahlqwist.
Dahlqwist E., Pawitan Y., Sjölander A. (2019). Regression standardization and attributable fraction estimation with between-within frailty models for clustered survival data. Statistical Methods in Medical Research 28(2), 462-485.
Van den Berg G.J., Drepper B. (2016). Inference for shared frailty survival models with left-truncated data. Econometric Reviews, 35(6), 1075-1098.
require(survival) # simulate data set.seed(5) n <- 200 m <- 3 alpha <- 1.5 eta <- 1 phi <- 0.5 beta <- 1 id <- rep(1:n, each = m) U <- rep(rgamma(n, shape = 1 / phi, scale = phi), each = m) X <- rnorm(n * m) # reparameterize scale as in rweibull function weibull.scale <- alpha / (U * exp(beta * X))^(1 / eta) T <- rweibull(n * m, shape = eta, scale = weibull.scale) # right censoring C <- runif(n * m, 0, 10) D <- as.numeric(T < C) T <- pmin(T, C) # strong left-truncation L <- runif(n * m, 0, 2) incl <- T > L incl <- ave(x = incl, id, FUN = sum) == m dd <- data.frame(L, T, D, X, id) dd <- dd[incl, ] fit <- parfrailty(formula = Surv(L, T, D) ~ X, data = dd, clusterid = "id") print(fit)require(survival) # simulate data set.seed(5) n <- 200 m <- 3 alpha <- 1.5 eta <- 1 phi <- 0.5 beta <- 1 id <- rep(1:n, each = m) U <- rep(rgamma(n, shape = 1 / phi, scale = phi), each = m) X <- rnorm(n * m) # reparameterize scale as in rweibull function weibull.scale <- alpha / (U * exp(beta * X))^(1 / eta) T <- rweibull(n * m, shape = eta, scale = weibull.scale) # right censoring C <- runif(n * m, 0, 10) D <- as.numeric(T < C) T <- pmin(T, C) # strong left-truncation L <- runif(n * m, 0, 2) incl <- T > L incl <- ave(x = incl, id, FUN = sum) == m dd <- data.frame(L, T, D, X, id) dd <- dd[incl, ] fit <- parfrailty(formula = Surv(L, T, D) ~ X, data = dd, clusterid = "id") print(fit)
This is a plot method for class "std_glm".
## S3 method for class 'std_glm' plot( x, plot_ci = TRUE, ci_type = "plain", ci_level = 0.95, transform = NULL, contrast = NULL, reference = NULL, summary_fun = "summary_std_glm", ... )## S3 method for class 'std_glm' plot( x, plot_ci = TRUE, ci_type = "plain", ci_level = 0.95, transform = NULL, contrast = NULL, reference = NULL, summary_fun = "summary_std_glm", ... )
x |
An object of class |
plot_ci |
if |
ci_type |
A string, indicating the type of confidence intervals. Either "plain", which gives untransformed intervals, or "log", which gives log-transformed intervals. |
ci_level |
Coverage probability of confidence intervals. |
transform |
If set to |
contrast |
If set to |
reference |
If |
summary_fun |
For internal use only. Do not change. |
... |
Unused. |
None. Creates a plot as a side effect
# see standardize_glm# see standardize_glm
This is a plot method for class "std_surv".
## S3 method for class 'std_surv' plot( x, plot_ci = TRUE, ci_type = "plain", ci_level = 0.95, transform = NULL, contrast = NULL, reference = NULL, legendpos = "bottomleft", summary_fun = "summary_std_coxph", ... )## S3 method for class 'std_surv' plot( x, plot_ci = TRUE, ci_type = "plain", ci_level = 0.95, transform = NULL, contrast = NULL, reference = NULL, legendpos = "bottomleft", summary_fun = "summary_std_coxph", ... )
x |
An object of class |
plot_ci |
if |
ci_type |
A string, indicating the type of confidence intervals. Either "plain", which gives untransformed intervals, or "log", which gives log-transformed intervals. |
ci_level |
Coverage probability of confidence intervals. |
transform |
If set to |
contrast |
If set to |
reference |
If |
legendpos |
position of the legend; see legend. |
summary_fun |
For internal use only. Do not change. |
... |
Unused. |
None. Creates a plot as a side effect
Prints summary of regression standardization fit
## S3 method for class 'std_surv' print(x, ...) ## S3 method for class 'std_glm' print(x, ...) ## S3 method for class 'std_custom' print(x, ...)## S3 method for class 'std_surv' print(x, ...) ## S3 method for class 'std_glm' print(x, ...) ## S3 method for class 'std_custom' print(x, ...)
x |
an object of class |
... |
unused |
The object being printed, invisibly.
Print method for parametric frailty fits
## S3 method for class 'summary.parfrailty' print(x, digits = max(3L, getOption("digits") - 3L), ...)## S3 method for class 'summary.parfrailty' print(x, digits = max(3L, getOption("digits") - 3L), ...)
x |
An object of class "parfrailty" |
digits |
Number of digits to print |
... |
Not used |
The object being printed, invisibly
Compute the sandwich variance components from a model fit
sandwich(fit, data, weights, t, fit.detail)sandwich(fit, data, weights, t, fit.detail)
fit |
|
data |
The data used to fit the model |
weights |
Optional weights |
t |
Optional fixed time point for survival objects |
fit.detail |
For Cox models, the result of running coxph.detail on the model fit |
A list consisting of the Fisher information matrix (I) and the Score equations (U)
This dataset is borrowed from "Aetiological factors in oesophageal cancer in Singapore Chinese" by De Jong UW, Breslow N, Hong JG, Sridharan M, Shanmugaratnam K (1974).
data(singapore)data(singapore)
The dataset contains the following variables:
age of the patient.
dialect group where 1 represent "Hokhien/Teochew" and 0 represent "Cantonese/Other".
a numeric indicator of whether the patient consumes Samsu wine or not.
number of cigarettes smoked per day.
number of beverage at "burning hot" temperatures ranging between 0 to 3 different drinks per day.
a numeric indicator of whether the patients ever drinks "burning hot beverage" or not. Recoded from the variable "Bev".
matched set identification number.
a numeric variable where 1 represent if the patient is a case, 2 represent if the patient is a control from the same ward as the case and 3 represent if the patient is control from orthopedic hospital.
a numeric indicator variable of whether the patient is a case of oesophageal cancer or not.
The following changes have been made to the data from the original data in De Jong UW (1974):
- The variable "Bev" is recoded into the numeric indicator variable "Everhotbev":
singapore$Everhotbev <- ifelse(singapore$Bev >= 1, 1, 0)
De Jong UW, Breslow N, Hong JG, Sridharan M, Shanmugaratnam K. (1974). Aetiological factors in oesophageal cancer in Singapore Chinese. Int J Cancer Mar 15;13(3), 291-303.
Get standardized estimates using the g-formula with a custom model
standardize( fitter, arguments, predict_fun, data, values, B = NULL, ci_level = 0.95, contrasts = NULL, reference = NULL, seed = NULL, times = NULL, transforms = NULL, progressbar = TRUE )standardize( fitter, arguments, predict_fun, data, values, B = NULL, ci_level = 0.95, contrasts = NULL, reference = NULL, seed = NULL, times = NULL, transforms = NULL, progressbar = TRUE )
fitter |
The function to call to fit the data. |
arguments |
The arguments to be used in the fitter function as a |
predict_fun |
The function used to predict the means/probabilities for a new data set on the response level. For survival data, this should be a matrix where each column is the time, and each row the data. |
data |
The data. |
values |
A named list or data.frame specifying the variables and values at which marginal means of the outcome will be estimated. |
B |
Number of nonparametric bootstrap resamples. Default is |
ci_level |
Coverage probability of confidence intervals. |
contrasts |
A vector of contrasts in the following format:
If set to |
reference |
A vector of reference levels in the following format:
If |
seed |
The seed to use with the nonparametric bootstrap. |
times |
For use with survival data. Set to |
transforms |
A vector of transforms in the following format:
If set to |
progressbar |
Logical, if TRUE will print bootstrapping progress to the console |
Let , , and be the outcome, the exposure, and a
vector of covariates, respectively.
standardize uses a
model to estimate the standardized
mean ,
where is a specific value of ,
and the outer expectation is over the marginal distribution of .
With survival data, ,
and a vector of different time points times () can be given,
where is the uncensored survival time.
Note that the nonparametric bootstrap may not provide valid inference if the outcome model is data-adaptive, e.g., based on machine learning algorithms. In such situations alternative inference methods may be required.
An object of class std_custom. Obtain numeric results using tidy.std_custom.
This is a list with the following components:
An unnamed list with one element for each of the requested contrasts. Each element is itself a list with the elements:
The number of bootstrap replicates
Estimated counterfactual means and standard errors for each exposure level
The estimated regression model for the outcome
A list of estimates, one for each bootstrap resample
A character vector of the exposure variable names
The vector of times at which the calculation is done, if relevant
Data.frame of the estimates of the contrast with inference
The transform argument used for this contrast
The requested contrast type
The reference level of the exposure
Confidence interval level
A named list with the elements:
The number of bootstrap replicates
Estimated counterfactual means and standard errors for each exposure level
The estimated regression model for the outcome
A list of estimates, one for each bootstrap resample
A character vector of the exposure variable names
The vector of times at which the calculation is done, if relevant
Rothman K.J., Greenland S., Lash T.L. (2008). Modern Epidemiology, 3rd edition. Lippincott, Williams & Wilkins.
Sjölander A. (2016). Regression standardization with the R-package stdReg. European Journal of Epidemiology 31(6), 563-574.
Sjölander A. (2016). Estimation of causal effect measures with the R-package stdReg. European Journal of Epidemiology 33(9), 847-858.
set.seed(6) n <- 100 Z <- rnorm(n) X <- rnorm(n, mean = Z) Y <- rbinom(n, 1, prob = (1 + exp(X + Z))^(-1)) dd <- data.frame(Z, X, Y) prob_predict.glm <- function(...) predict.glm(..., type = "response") x <- standardize( fitter = "glm", arguments = list( formula = Y ~ X * Z, family = "binomial" ), predict_fun = prob_predict.glm, data = dd, values = list(X = seq(-1, 1, 0.1)), B = 100, reference = 0, contrasts = "difference" ) x require(survival) prob_predict.coxph <- function(object, newdata, times) { fit.detail <- suppressWarnings(basehaz(object)) cum.haz <- fit.detail$hazard[sapply(times, function(x) max(which(fit.detail$time <= x)))] predX <- predict(object = object, newdata = newdata, type = "risk") res <- matrix(NA, ncol = length(times), nrow = length(predX)) for (ti in seq_len(length(times))) { res[, ti] <- exp(-predX * cum.haz[ti]) } res } set.seed(68) n <- 500 Z <- rnorm(n) X <- rnorm(n, mean = Z) T <- rexp(n, rate = exp(X + Z + X * Z)) # survival time C <- rexp(n, rate = exp(X + Z + X * Z)) # censoring time U <- pmin(T, C) # time at risk D <- as.numeric(T < C) # event indicator dd <- data.frame(Z, X, U, D) x <- standardize( fitter = "coxph", arguments = list( formula = Surv(U, D) ~ X + Z + X * Z, method = "breslow", x = TRUE, y = TRUE ), predict_fun = prob_predict.coxph, data = dd, times = 1:5, values = list(X = c(-1, 0, 1)), B = 100, reference = 0, contrasts = "difference" ) xset.seed(6) n <- 100 Z <- rnorm(n) X <- rnorm(n, mean = Z) Y <- rbinom(n, 1, prob = (1 + exp(X + Z))^(-1)) dd <- data.frame(Z, X, Y) prob_predict.glm <- function(...) predict.glm(..., type = "response") x <- standardize( fitter = "glm", arguments = list( formula = Y ~ X * Z, family = "binomial" ), predict_fun = prob_predict.glm, data = dd, values = list(X = seq(-1, 1, 0.1)), B = 100, reference = 0, contrasts = "difference" ) x require(survival) prob_predict.coxph <- function(object, newdata, times) { fit.detail <- suppressWarnings(basehaz(object)) cum.haz <- fit.detail$hazard[sapply(times, function(x) max(which(fit.detail$time <= x)))] predX <- predict(object = object, newdata = newdata, type = "risk") res <- matrix(NA, ncol = length(times), nrow = length(predX)) for (ti in seq_len(length(times))) { res[, ti] <- exp(-predX * cum.haz[ti]) } res } set.seed(68) n <- 500 Z <- rnorm(n) X <- rnorm(n, mean = Z) T <- rexp(n, rate = exp(X + Z + X * Z)) # survival time C <- rexp(n, rate = exp(X + Z + X * Z)) # censoring time U <- pmin(T, C) # time at risk D <- as.numeric(T < C) # event indicator dd <- data.frame(Z, X, U, D) x <- standardize( fitter = "coxph", arguments = list( formula = Surv(U, D) ~ X + Z + X * Z, method = "breslow", x = TRUE, y = TRUE ), predict_fun = prob_predict.coxph, data = dd, times = 1:5, values = list(X = c(-1, 0, 1)), B = 100, reference = 0, contrasts = "difference" ) x
standardize_coxph performs regression standardization in Cox proportional
hazards models at specified values of the exposure over the sample
covariate distribution. Let , , and be the survival
outcome, the exposure, and a vector of covariates, respectively.
standardize_coxph fits a Cox proportional hazards model and the Breslow estimator
of the baseline hazard in order to estimate the
standardized survival function when measure = "survival" or the standardized restricted mean survival up to time t when measure = "rmean", where
is a specific value of , is a specific value of
, and the expectation is over the marginal distribution of .
standardize_coxph( formula, data, values, times, measure = c("survival", "rmean"), clusterid, ci_level = 0.95, ci_type = "plain", contrasts = NULL, family = "gaussian", reference = NULL, transforms = NULL )standardize_coxph( formula, data, values, times, measure = c("survival", "rmean"), clusterid, ci_level = 0.95, ci_type = "plain", contrasts = NULL, family = "gaussian", reference = NULL, transforms = NULL )
formula |
The formula which is used to fit the model for the outcome. |
data |
The data. |
values |
A named list or data.frame specifying the variables and values at which marginal means of the outcome will be estimated. |
times |
A vector containing the specific values of |
measure |
Either "survival" to estimate the survival function at times or "rmean" for the restricted mean survival up to the largest of times. |
clusterid |
An optional string containing the name of a cluster identification variable when data are clustered. |
ci_level |
Coverage probability of confidence intervals. |
ci_type |
A string, indicating the type of confidence intervals. Either "plain", which gives untransformed intervals, or "log", which gives log-transformed intervals. |
contrasts |
A vector of contrasts in the following format:
If set to |
family |
The family argument which is used to fit the glm model for the outcome. |
reference |
A vector of reference levels in the following format:
If |
transforms |
A vector of transforms in the following format:
If set to |
standardize_coxph fits the Cox proportional hazards model
Breslow's estimator of the cumulative baseline hazard
is used together with the partial
likelihood estimate of to obtain estimates of the survival
function if measure = "survival":
For each in the t argument and for each in the
x argument, these estimates are averaged across all subjects (i.e.
all observed values of ) to produce estimates
where
is the value of for subject , . The variance
for is obtained by the sandwich formula.
If measure = "rmean", then
is used together with the partial
likelihood estimate of to obtain estimates of the restricted mean survival
up to time t: for each element of times. The estimation
and inference is done using the method described in Chen and Tsiatis 2001.
Currently, we can only estimate the difference in RMST for a single binary
exposure. Two separate Cox models are fit for each level of the exposure,
which is expected to be coded as 0/1.
An object of class std_surv. Obtain numeric results by using tidy.std_surv.
This is a list with the following components:
An unnamed list with one element for each of the requested contrasts. Each element is itself a list with the elements:
The function call
A list with components used in the estimation
Either "survival" or "rmean"
Estimated counterfactual means and standard errors for each exposure level
Estimated covariance matrix of counterfactual means for each time
Data.frame of the estimates of the contrast with inference
The vector of times used in the calculation
The transform argument used for this contrast
The requested contrast type
The reference level of the exposure
Confidence interval type
Confidence interval level
A named list with the elements:
The function call
A list with components used in the estimation
Either "survival" or "rmean"
Estimated counterfactual means and standard errors for each exposure level
Estimated covariance matrix of counterfactual means for each time
Standardized survival functions are sometimes referred to as (direct) adjusted survival functions in the literature.
standardize_coxph/standardize_parfrailty does not currently handle time-varying exposures or
covariates.
standardize_coxph/standardize_parfrailty internally loops over all values in the t argument.
Therefore, the function will usually be considerably faster if
length(t) is small.
The variance calculation performed by standardize_coxph does not condition on
the observed covariates . To see how this
matters, note that
The usual parameter in a Cox proportional hazards model does not
depend on . Thus, is independent
of as well (since ), so that
the term in the corresponding variance
decomposition for becomes equal to 0. However,
depends on through the average over the
sample distribution for , and thus the term
is not 0, unless one conditions on
. The variance calculation by Gail and Byar (1986) ignores this
term, and thus effectively conditions on .
Arvid Sjölander, Adam Brand, Michael Sachs
Chang I.M., Gelman G., Pagano M. (1982). Corrected group prognostic curves and summary statistics. Journal of Chronic Diseases 35, 669-674.
Gail M.H. and Byar D.P. (1986). Variance calculations for direct adjusted survival curves, with applications to testing for no treatment effect. Biometrical Journal 28(5), 587-599.
Makuch R.W. (1982). Adjusted survival curve estimation using covariates. Journal of Chronic Diseases 35, 437-443.
Sjölander A. (2016). Regression standardization with the R-package stdReg. European Journal of Epidemiology 31(6), 563-574.
Sjölander A. (2018). Estimation of causal effect measures with the R-package stdReg. European Journal of Epidemiology 33(9), 847-858.
Chen, P. Y., Tsiatis, A. A. (2001). Causal inference on the difference of the restricted mean lifetime between two groups. Biometrics, 57(4), 1030-1038.
require(survival) set.seed(7) n <- 300 Z <- rnorm(n) Zbin <- rbinom(n, 1, .3) X <- rnorm(n, mean = Z) T <- rexp(n, rate = exp(X + Z + X * Z)) # survival time C <- rexp(n, rate = exp(X + Z + X * Z)) # censoring time fact <- factor(sample(letters[1:3], n, replace = TRUE)) U <- pmin(T, C) # time at risk D <- as.numeric(T < C) # event indicator dd <- data.frame(Z, Zbin, X, U, D, fact) fit.std.surv <- standardize_coxph( formula = Surv(U, D) ~ X + Z + X * Z, data = dd, values = list(X = seq(-1, 1, 0.5)), times = 1:5 ) print(fit.std.surv) plot(fit.std.surv) tidy(fit.std.surv) fit.std <- standardize_coxph( formula = Surv(U, D) ~ X + Zbin + X * Zbin + fact, data = dd, values = list(Zbin = 0:1), times = 1.5, measure = "rmean", contrast = "difference", reference = 0 ) print(fit.std) tidy(fit.std)require(survival) set.seed(7) n <- 300 Z <- rnorm(n) Zbin <- rbinom(n, 1, .3) X <- rnorm(n, mean = Z) T <- rexp(n, rate = exp(X + Z + X * Z)) # survival time C <- rexp(n, rate = exp(X + Z + X * Z)) # censoring time fact <- factor(sample(letters[1:3], n, replace = TRUE)) U <- pmin(T, C) # time at risk D <- as.numeric(T < C) # event indicator dd <- data.frame(Z, Zbin, X, U, D, fact) fit.std.surv <- standardize_coxph( formula = Surv(U, D) ~ X + Z + X * Z, data = dd, values = list(X = seq(-1, 1, 0.5)), times = 1:5 ) print(fit.std.surv) plot(fit.std.surv) tidy(fit.std.surv) fit.std <- standardize_coxph( formula = Surv(U, D) ~ X + Zbin + X * Zbin + fact, data = dd, values = list(Zbin = 0:1), times = 1.5, measure = "rmean", contrast = "difference", reference = 0 ) print(fit.std) tidy(fit.std)
standardize_gee performs regression standardization in linear and log-linear
fixed effects models, at specified values of the exposure, over the sample
covariate distribution. Let , , and be the outcome,
the exposure, and a vector of covariates, respectively. It is assumed that
data are clustered with a cluster indicator . standardize_gee uses
fitted fixed effects model, with cluster-specific intercept (see
details), to estimate the standardized mean
, where is a specific value of
, and the outer expectation is over the marginal distribution of
.
standardize_gee( formula, link = "identity", data, values, clusterid, case_control = FALSE, ci_level = 0.95, ci_type = "plain", contrasts = NULL, family = "gaussian", reference = NULL, transforms = NULL )standardize_gee( formula, link = "identity", data, values, clusterid, case_control = FALSE, ci_level = 0.95, ci_type = "plain", contrasts = NULL, family = "gaussian", reference = NULL, transforms = NULL )
formula |
A formula to be used with |
link |
The link function to be used with |
data |
The data. |
values |
A named list or data.frame specifying the variables and values at which marginal means of the outcome will be estimated. |
clusterid |
An optional string containing the name of a cluster identification variable when data are clustered. |
case_control |
Whether the data comes from a case-control study. |
ci_level |
Coverage probability of confidence intervals. |
ci_type |
A string, indicating the type of confidence intervals. Either "plain", which gives untransformed intervals, or "log", which gives log-transformed intervals. |
contrasts |
A vector of contrasts in the following format:
If set to |
family |
The family argument which is used to fit the glm model for the outcome. |
reference |
A vector of reference levels in the following format:
If |
transforms |
A vector of transforms in the following format:
If set to |
standardize_gee assumes that a fixed effects model
has been fitted. The link
function is assumed to be the identity link or the log link. The
conditional generalized estimating equation (CGEE) estimate of
is used to obtain estimates of the cluster-specific means:
where
if is the
identity link, and
if is the log link, and is the value of
for subject in cluster , ,
. The CGEE estimate of and the estimate of
are used to estimate the mean :
For
each in the x argument, these estimates are averaged across
all subjects (i.e. all observed values of and all estimated values
of ) to produce estimates
where .
The variance for is obtained by the sandwich formula.
An object of class std_glm. Obtain numeric results in a data frame with the tidy.std_glm function.
This is a list with the following components:
An unnamed list with one element for each of the requested contrasts. Each element is itself a list with the elements:
Estimated counterfactual means and standard errors for each exposure level
Estimated covariance matrix of counterfactual means
The estimated regression model for the outcome
The estimated exposure model
A character vector of the exposure variable names
Data.frame of the estimates of the contrast with inference
The transform argument used for this contrast
The requested contrast type
The reference level of the exposure
Confidence interval type
Confidence interval level
A named list with the elements:
Estimated counterfactual means and standard errors for each exposure level
Estimated covariance matrix of counterfactual means
The estimated regression model for the outcome
The estimated exposure model
A character vector of the exposure variable names
The variance calculation performed by standardize_gee does not condition
on the observed covariates . To see how
this matters, note that
The usual parameter in a generalized linear model does not
depend on . Thus, is independent
of as well (since ), so that
the term in the corresponding variance
decomposition for becomes equal to 0. However,
depends on through the average over the sample
distribution for , and thus the term
is not 0, unless one conditions on
.
Arvid Sjölander.
Goetgeluk S. and Vansteelandt S. (2008). Conditional generalized estimating equations for the analysis of clustered and longitudinal data. Biometrics 64(3), 772-780.
Martin R.S. (2017). Estimation of average marginal effects in multiplicative unobserved effects panel models. Economics Letters 160, 16-19.
Sjölander A. (2019). Estimation of marginal causal effects in the presence of confounding by cluster. Biostatistics doi: 10.1093/biostatistics/kxz054
require(drgee) set.seed(4) n <- 300 ni <- 2 id <- rep(1:n, each = ni) ai <- rep(rnorm(n), each = ni) Z <- rnorm(n * ni) X <- rnorm(n * ni, mean = ai + Z) Y <- rnorm(n * ni, mean = ai + X + Z + 0.1 * X^2) dd <- data.frame(id, Z, X, Y) fit.std <- standardize_gee( formula = Y ~ X + Z + I(X^2), link = "identity", data = dd, values = list(X = seq(-3, 3, 0.5)), clusterid = "id" ) print(fit.std) plot(fit.std)require(drgee) set.seed(4) n <- 300 ni <- 2 id <- rep(1:n, each = ni) ai <- rep(rnorm(n), each = ni) Z <- rnorm(n * ni) X <- rnorm(n * ni, mean = ai + Z) Y <- rnorm(n * ni, mean = ai + X + Z + 0.1 * X^2) dd <- data.frame(id, Z, X, Y) fit.std <- standardize_gee( formula = Y ~ X + Z + I(X^2), link = "identity", data = dd, values = list(X = seq(-3, 3, 0.5)), clusterid = "id" ) print(fit.std) plot(fit.std)
Get regression standardized estimates from a glm
standardize_glm( formula, data, values, clusterid, matched_density_cases, matched_density_controls, matching_variable, p_population, case_control = FALSE, ci_level = 0.95, ci_type = "plain", contrasts = NULL, family = "gaussian", reference = NULL, transforms = NULL )standardize_glm( formula, data, values, clusterid, matched_density_cases, matched_density_controls, matching_variable, p_population, case_control = FALSE, ci_level = 0.95, ci_type = "plain", contrasts = NULL, family = "gaussian", reference = NULL, transforms = NULL )
formula |
The formula which is used to fit the model for the outcome. |
data |
The data. |
values |
A named list or data.frame specifying the variables and values at which marginal means of the outcome will be estimated. |
clusterid |
An optional string containing the name of a cluster identification variable when data are clustered. |
matched_density_cases |
A function of the matching variable. The probability (or density) of the matched variable among the cases. |
matched_density_controls |
A function of the matching variable. The probability (or density) of the matched variable among the controls. |
matching_variable |
The matching variable extracted from the data set. |
p_population |
Specifies the incidence in the population when |
case_control |
Whether the data comes from a case-control study. |
ci_level |
Coverage probability of confidence intervals. |
ci_type |
A string, indicating the type of confidence intervals. Either "plain", which gives untransformed intervals, or "log", which gives log-transformed intervals. |
contrasts |
A vector of contrasts in the following format:
If set to |
family |
The family argument which is used to fit the glm model for the outcome. |
reference |
A vector of reference levels in the following format:
If |
transforms |
A vector of transforms in the following format:
If set to |
standardize_glm performs regression standardization
in generalized linear models,
at specified values of the exposure, over the sample covariate distribution.
Let , , and be the outcome, the exposure, and a
vector of covariates, respectively.
standardize_glm uses a fitted generalized linear
model to estimate the standardized
mean ,
where is a specific value of ,
and the outer expectation is over the marginal distribution of .
An object of class std_glm. Obtain numeric results in a data frame with the tidy.std_glm function.
This is a list with the following components:
An unnamed list with one element for each of the requested contrasts. Each element is itself a list with the elements:
Estimated counterfactual means and standard errors for each exposure level
Estimated covariance matrix of counterfactual means
The estimated regression model for the outcome
The estimated exposure model
A character vector of the exposure variable names
Data.frame of the estimates of the contrast with inference
The transform argument used for this contrast
The requested contrast type
The reference level of the exposure
Confidence interval type
Confidence interval level
A named list with the elements:
Estimated counterfactual means and standard errors for each exposure level
Estimated covariance matrix of counterfactual means
The estimated regression model for the outcome
The estimated exposure model
A character vector of the exposure variable names
Rothman K.J., Greenland S., Lash T.L. (2008). Modern Epidemiology, 3rd edition. Lippincott, Williams & Wilkins.
Sjölander A. (2016). Regression standardization with the R-package stdReg. European Journal of Epidemiology 31(6), 563-574.
Sjölander A. (2016). Estimation of causal effect measures with the R-package stdReg. European Journal of Epidemiology 33(9), 847-858.
# basic example # needs to correctly specify the outcome model and no unmeasered confounders # (+ standard causal assunmptions) set.seed(6) n <- 100 Z <- rnorm(n) X <- cut(rnorm(n, mean = Z), breaks = c(-Inf, 0, Inf), labels = c("low", "high")) Y <- rbinom(n, 1, prob = (1 + exp(as.numeric(X) + Z))^(-1)) dd <- data.frame(Z, X, Y) x <- standardize_glm( formula = Y ~ X * Z, family = "binomial", data = dd, values = list(X = c("low", "high")), contrasts = c("difference", "ratio"), reference = "low" ) x # different transformations of causal effects # example from Sjölander (2016) with case-control data # here the matching variable needs to be passed as an argument Mi <- singapore$Age m <- mean(Mi) s <- sd(Mi) d <- 5 standardize_glm( formula = Oesophagealcancer ~ (Everhotbev + Age + Dial + Samsu + Cigs)^2, family = binomial, data = singapore, values = list(Everhotbev = 0:1), clusterid = "Set", case_control = TRUE, matched_density_cases = function(x) dnorm(x, m, s), matched_density_controls = function(x) dnorm(x, m - d, s), matching_variable = Mi, p_population = 19.3 / 100000 ) # multiple exposures set.seed(7) n <- 100 Z <- rnorm(n) X1 <- rnorm(n, mean = Z) X2 <- rnorm(n) Y <- rbinom(n, 1, prob = (1 + exp(X1 + X2 + Z))^(-1)) dd <- data.frame(Z, X1, X2, Y) x <- standardize_glm( formula = Y ~ X1 + X2 + Z, family = "binomial", data = dd, values = list(X1 = 0:1, X2 = 0:1), contrasts = c("difference", "ratio"), reference = c(X1 = 0, X2 = 0) ) x tidy(x) # continuous exposure set.seed(2) n <- 100 Z <- rnorm(n) X <- rnorm(n, mean = Z) Y <- rnorm(n, mean = X + Z + 0.1 * X^2) dd <- data.frame(Z, X, Y) x <- standardize_glm( formula = Y ~ X * Z, family = "gaussian", data = dd, values = list(X = seq(-1, 1, 0.1)) ) # plot standardized mean as a function of x plot(x) # plot standardized mean - standardized mean at x = 0 as a function of x plot(x, contrast = "difference", reference = 0)# basic example # needs to correctly specify the outcome model and no unmeasered confounders # (+ standard causal assunmptions) set.seed(6) n <- 100 Z <- rnorm(n) X <- cut(rnorm(n, mean = Z), breaks = c(-Inf, 0, Inf), labels = c("low", "high")) Y <- rbinom(n, 1, prob = (1 + exp(as.numeric(X) + Z))^(-1)) dd <- data.frame(Z, X, Y) x <- standardize_glm( formula = Y ~ X * Z, family = "binomial", data = dd, values = list(X = c("low", "high")), contrasts = c("difference", "ratio"), reference = "low" ) x # different transformations of causal effects # example from Sjölander (2016) with case-control data # here the matching variable needs to be passed as an argument Mi <- singapore$Age m <- mean(Mi) s <- sd(Mi) d <- 5 standardize_glm( formula = Oesophagealcancer ~ (Everhotbev + Age + Dial + Samsu + Cigs)^2, family = binomial, data = singapore, values = list(Everhotbev = 0:1), clusterid = "Set", case_control = TRUE, matched_density_cases = function(x) dnorm(x, m, s), matched_density_controls = function(x) dnorm(x, m - d, s), matching_variable = Mi, p_population = 19.3 / 100000 ) # multiple exposures set.seed(7) n <- 100 Z <- rnorm(n) X1 <- rnorm(n, mean = Z) X2 <- rnorm(n) Y <- rbinom(n, 1, prob = (1 + exp(X1 + X2 + Z))^(-1)) dd <- data.frame(Z, X1, X2, Y) x <- standardize_glm( formula = Y ~ X1 + X2 + Z, family = "binomial", data = dd, values = list(X1 = 0:1, X2 = 0:1), contrasts = c("difference", "ratio"), reference = c(X1 = 0, X2 = 0) ) x tidy(x) # continuous exposure set.seed(2) n <- 100 Z <- rnorm(n) X <- rnorm(n, mean = Z) Y <- rnorm(n, mean = X + Z + 0.1 * X^2) dd <- data.frame(Z, X, Y) x <- standardize_glm( formula = Y ~ X * Z, family = "gaussian", data = dd, values = list(X = seq(-1, 1, 0.1)) ) # plot standardized mean as a function of x plot(x) # plot standardized mean - standardized mean at x = 0 as a function of x plot(x, contrast = "difference", reference = 0)
Get regression standardized doubly-robust estimates from a glm
standardize_glm_dr( formula_outcome, formula_exposure, data, values, ci_level = 0.95, ci_type = "plain", contrasts = NULL, family_outcome = "gaussian", family_exposure = "binomial", reference = NULL, transforms = NULL )standardize_glm_dr( formula_outcome, formula_exposure, data, values, ci_level = 0.95, ci_type = "plain", contrasts = NULL, family_outcome = "gaussian", family_exposure = "binomial", reference = NULL, transforms = NULL )
formula_outcome |
The formula which is used to fit the glm model for the outcome. |
formula_exposure |
The formula which is used to fit the glm model for the exposure.
If not |
data |
The data. |
values |
A named list or data.frame specifying the variables and values at which marginal means of the outcome will be estimated. |
ci_level |
Coverage probability of confidence intervals. |
ci_type |
A string, indicating the type of confidence intervals. Either "plain", which gives untransformed intervals, or "log", which gives log-transformed intervals. |
contrasts |
A vector of contrasts in the following format:
If set to |
family_outcome |
The family argument which is used to fit the glm model for the outcome. |
family_exposure |
The family argument which is used to fit the glm model for the exposure. |
reference |
A vector of reference levels in the following format:
If |
transforms |
A vector of transforms in the following format:
If set to |
standardize_glm_dr performs regression standardization
in generalized linear models, see e.g., documentation for standardize_glm_dr. Specifically,
this version uses a doubly robust estimator for standardization, meaning inference is valid
when either the outcome regression or the exposure model is correctly specified
and there is no unmeasured confounding.
An object of class std_glm. Obtain numeric results in a data frame with the tidy.std_glm function.
This is a list with the following components:
An unnamed list with one element for each of the requested contrasts. Each element is itself a list with the elements:
Estimated counterfactual means and standard errors for each exposure level
Estimated covariance matrix of counterfactual means
The estimated regression model for the outcome
The estimated exposure model
A character vector of the exposure variable names
Data.frame of the estimates of the contrast with inference
The transform argument used for this contrast
The requested contrast type
The reference level of the exposure
Confidence interval type
Confidence interval level
A named list with the elements:
Estimated counterfactual means and standard errors for each exposure level
Estimated covariance matrix of counterfactual means
The estimated regression model for the outcome
The estimated exposure model
A character vector of the exposure variable names
Gabriel E.E., Sachs, M.C., Martinussen T., Waernbaum I., Goetghebeur E., Vansteelandt S., Sjölander A. (2024), Inverse probability of treatment weighting with generalized linear outcome models for doubly robust estimation. Statistics in Medicine, 43(3):534–547.
# doubly robust estimator # needs to correctly specify either the outcome model or the exposure model # for confounding # NOTE: only works with binary exposures x <- standardize_glm_dr( formula_outcome = bwt ~ smoker * (race + age + lwt) + I(age^2) + I(lwt^2), formula_exposure = smoker ~ race * age * lwt + I(age^2) + I(lwt^2), family_outcome = "gaussian", family_exposure = "binomial", data = clslowbwt, values = list(smoker = c(0, 1)), contrasts = "difference", reference = 0 ) set.seed(6) n <- 100 Z <- rnorm(n) X <- rbinom(n, 1, prob = (1 + exp(Z))^(-1)) Y <- rbinom(n, 1, prob = (1 + exp(as.numeric(X) + Z))^(-1)) dd <- data.frame(Z, X, Y) x <- standardize_glm_dr( formula_outcome = Y ~ X * Z, formula_exposure = X ~ Z, family_outcome = "binomial", data = dd, values = list(X = 0:1), reference = 0, contrasts = c("difference"), transforms = c("odds") )# doubly robust estimator # needs to correctly specify either the outcome model or the exposure model # for confounding # NOTE: only works with binary exposures x <- standardize_glm_dr( formula_outcome = bwt ~ smoker * (race + age + lwt) + I(age^2) + I(lwt^2), formula_exposure = smoker ~ race * age * lwt + I(age^2) + I(lwt^2), family_outcome = "gaussian", family_exposure = "binomial", data = clslowbwt, values = list(smoker = c(0, 1)), contrasts = "difference", reference = 0 ) set.seed(6) n <- 100 Z <- rnorm(n) X <- rbinom(n, 1, prob = (1 + exp(Z))^(-1)) Y <- rbinom(n, 1, prob = (1 + exp(as.numeric(X) + Z))^(-1)) dd <- data.frame(Z, X, Y) x <- standardize_glm_dr( formula_outcome = Y ~ X * Z, formula_exposure = X ~ Z, family_outcome = "binomial", data = dd, values = list(X = 0:1), reference = 0, contrasts = c("difference"), transforms = c("odds") )
Get standardized estimates using the g-formula with and separate models for each exposure level in the data
standardize_level( fitter_list, arguments, predict_fun_list, data, values, B = NULL, ci_level = 0.95, contrasts = NULL, reference = NULL, seed = NULL, times = NULL, transforms = NULL, progressbar = TRUE )standardize_level( fitter_list, arguments, predict_fun_list, data, values, B = NULL, ci_level = 0.95, contrasts = NULL, reference = NULL, seed = NULL, times = NULL, transforms = NULL, progressbar = TRUE )
fitter_list |
The function to call to fit the data (as a list). |
arguments |
The arguments to be used in the fitter function as a |
predict_fun_list |
The function used to predict the means/probabilities for a new data set on the response level. For survival data, this should be a matrix where each column is the time, and each row the data (as a list). |
data |
The data. |
values |
A named list or data.frame specifying the variables and values at which marginal means of the outcome will be estimated. |
B |
Number of nonparametric bootstrap resamples. Default is |
ci_level |
Coverage probability of confidence intervals. |
contrasts |
A vector of contrasts in the following format:
If set to |
reference |
A vector of reference levels in the following format:
If |
seed |
The seed to use with the nonparametric bootstrap. |
times |
For use with survival data. Set to |
transforms |
A vector of transforms in the following format:
If set to |
progressbar |
Logical, if TRUE will print bootstrapping progress to the console |
See standardize. The difference is here that different models
can be fitted for each value of x in values.
An object of class std_custom. Obtain numeric results using tidy.std_custom.
This is a list with the following components:
An unnamed list with one element for each of the requested contrasts. Each element is itself a list with the elements:
The number of bootstrap replicates
Estimated counterfactual means and standard errors for each exposure level
The estimated regression model for the outcome
A list of estimates, one for each bootstrap resample
A character vector of the exposure variable names
The vector of times at which the calculation is done, if relevant
Data.frame of the estimates of the contrast with inference
The transform argument used for this contrast
The requested contrast type
The reference level of the exposure
Confidence interval level
A named list with the elements:
The number of bootstrap replicates
Estimated counterfactual means and standard errors for each exposure level
The estimated regression model for the outcome
A list of estimates, one for each bootstrap resample
A character vector of the exposure variable names
The vector of times at which the calculation is done, if relevant
Rothman K.J., Greenland S., Lash T.L. (2008). Modern Epidemiology, 3rd edition. Lippincott, Williams & Wilkins.
Sjölander A. (2016). Regression standardization with the R-package stdReg. European Journal of Epidemiology 31(6), 563-574.
Sjölander A. (2016). Estimation of causal effect measures with the R-package stdReg. European Journal of Epidemiology 33(9), 847-858.
require(survival) prob_predict.coxph <- function(object, newdata, times) { fit.detail <- suppressWarnings(basehaz(object)) cum.haz <- fit.detail$hazard[sapply(times, function(x) max(which(fit.detail$time <= x)))] predX <- predict(object = object, newdata = newdata, type = "risk") res <- matrix(NA, ncol = length(times), nrow = length(predX)) for (ti in seq_len(length(times))) { res[, ti] <- exp(-predX * cum.haz[ti]) } res } set.seed(68) n <- 500 Z <- rnorm(n) X <- rbinom(n, 1, prob = 0.5) T <- rexp(n, rate = exp(X + Z + X * Z)) # survival time C <- rexp(n, rate = exp(X + Z + X * Z)) # censoring time U <- pmin(T, C) # time at risk D <- as.numeric(T < C) # event indicator dd <- data.frame(Z, X, U, D) x <- standardize_level( fitter_list = list("coxph", "coxph"), arguments = list( list( formula = Surv(U, D) ~ X + Z + X * Z, method = "breslow", x = TRUE, y = TRUE ), list( formula = Surv(U, D) ~ X, method = "breslow", x = TRUE, y = TRUE ) ), predict_fun_list = list(prob_predict.coxph, prob_predict.coxph), data = dd, times = seq(1, 5, 0.1), values = list(X = c(0, 1)), B = 100, reference = 0, contrasts = "difference" ) print(x)require(survival) prob_predict.coxph <- function(object, newdata, times) { fit.detail <- suppressWarnings(basehaz(object)) cum.haz <- fit.detail$hazard[sapply(times, function(x) max(which(fit.detail$time <= x)))] predX <- predict(object = object, newdata = newdata, type = "risk") res <- matrix(NA, ncol = length(times), nrow = length(predX)) for (ti in seq_len(length(times))) { res[, ti] <- exp(-predX * cum.haz[ti]) } res } set.seed(68) n <- 500 Z <- rnorm(n) X <- rbinom(n, 1, prob = 0.5) T <- rexp(n, rate = exp(X + Z + X * Z)) # survival time C <- rexp(n, rate = exp(X + Z + X * Z)) # censoring time U <- pmin(T, C) # time at risk D <- as.numeric(T < C) # event indicator dd <- data.frame(Z, X, U, D) x <- standardize_level( fitter_list = list("coxph", "coxph"), arguments = list( list( formula = Surv(U, D) ~ X + Z + X * Z, method = "breslow", x = TRUE, y = TRUE ), list( formula = Surv(U, D) ~ X, method = "breslow", x = TRUE, y = TRUE ) ), predict_fun_list = list(prob_predict.coxph, prob_predict.coxph), data = dd, times = seq(1, 5, 0.1), values = list(X = c(0, 1)), B = 100, reference = 0, contrasts = "difference" ) print(x)
standardize_parfrailty performs regression standardization in shared frailty
gamma-Weibull models, at specified values of the exposure, over the sample
covariate distribution. Let , , and be the survival
outcome, the exposure, and a vector of covariates, respectively.
standardize_parfrailty fits a parametric frailty model to
estimate the standardized survival function
, where is a specific value of
, is a specific value of , and the expectation is over
the marginal distribution of .
standardize_parfrailty( formula, data, values, times, clusterid, ci_level = 0.95, ci_type = "plain", contrasts = NULL, family = "gaussian", reference = NULL, transforms = NULL )standardize_parfrailty( formula, data, values, times, clusterid, ci_level = 0.95, ci_type = "plain", contrasts = NULL, family = "gaussian", reference = NULL, transforms = NULL )
formula |
The formula which is used to fit the model for the outcome. |
data |
The data. |
values |
A named list or data.frame specifying the variables and values at which marginal means of the outcome will be estimated. |
times |
A vector containing the specific values of |
clusterid |
An optional string containing the name of a cluster identification variable when data are clustered. |
ci_level |
Coverage probability of confidence intervals. |
ci_type |
A string, indicating the type of confidence intervals. Either "plain", which gives untransformed intervals, or "log", which gives log-transformed intervals. |
contrasts |
A vector of contrasts in the following format:
If set to |
family |
The family argument which is used to fit the glm model for the outcome. |
reference |
A vector of reference levels in the following format:
If |
transforms |
A vector of transforms in the following format:
If set to |
standardize_parfrailty fits a shared frailty gamma-Weibull model
, with parameterization as described in the help section for parfrailty. Integrating out the gamma frailty gives the survival function
where is the cumulative baseline hazard
The ML estimates of
are used to obtain estimates of the survival function :
For each in the t argument and for each in the
x argument, these estimates are averaged across all subjects (i.e.
all observed values of ) to produce estimates
The variance for
is obtained by the sandwich formula.
An object of class std_surv. Obtain numeric results by using tidy.std_surv.
This is a list with the following components:
An unnamed list with one element for each of the requested contrasts. Each element is itself a list with the elements:
The function call
A list with components used in the estimation
Either "survival" or "rmean"
Estimated counterfactual means and standard errors for each exposure level
Estimated covariance matrix of counterfactual means for each time
Data.frame of the estimates of the contrast with inference
The vector of times used in the calculation
The transform argument used for this contrast
The requested contrast type
The reference level of the exposure
Confidence interval type
Confidence interval level
A named list with the elements:
The function call
A list with components used in the estimation
Either "survival" or "rmean"
Estimated counterfactual means and standard errors for each exposure level
Estimated covariance matrix of counterfactual means for each time
Standardized survival functions are sometimes referred to as (direct) adjusted survival functions in the literature.
standardize_coxph/standardize_parfrailty does not currently handle time-varying exposures or
covariates.
standardize_coxph/standardize_parfrailty internally loops over all values in the t argument.
Therefore, the function will usually be considerably faster if
length(t) is small.
The variance calculation performed by standardize_coxph does not condition on
the observed covariates . To see how this
matters, note that
The usual parameter in a Cox proportional hazards model does not
depend on . Thus, is independent
of as well (since ), so that
the term in the corresponding variance
decomposition for becomes equal to 0. However,
depends on through the average over the
sample distribution for , and thus the term
is not 0, unless one conditions on
. The variance calculation by Gail and Byar (1986) ignores this
term, and thus effectively conditions on .
Arvid Sjölander
Chang I.M., Gelman G., Pagano M. (1982). Corrected group prognostic curves and summary statistics. Journal of Chronic Diseases 35, 669-674.
Dahlqwist E., Pawitan Y., Sjölander A. (2019). Regression standardization and attributable fraction estimation with between-within frailty models for clustered survival data. Statistical Methods in Medical Research 28(2), 462-485.
Gail M.H. and Byar D.P. (1986). Variance calculations for direct adjusted survival curves, with applications to testing for no treatment effect. Biometrical Journal 28(5), 587-599.
Makuch R.W. (1982). Adjusted survival curve estimation using covariates. Journal of Chronic Diseases 35, 437-443.
require(survival) # simulate data set.seed(6) n <- 300 m <- 3 alpha <- 1.5 eta <- 1 phi <- 0.5 beta <- 1 id <- rep(1:n, each = m) U <- rep(rgamma(n, shape = 1 / phi, scale = phi), each = m) X <- rnorm(n * m) # reparameterize scale as in rweibull function weibull.scale <- alpha / (U * exp(beta * X))^(1 / eta) T <- rweibull(n * m, shape = eta, scale = weibull.scale) # right censoring C <- runif(n * m, 0, 10) D <- as.numeric(T < C) T <- pmin(T, C) # strong left-truncation L <- runif(n * m, 0, 2) incl <- T > L incl <- ave(x = incl, id, FUN = sum) == m dd <- data.frame(L, T, D, X, id) dd <- dd[incl, ] fit.std <- standardize_parfrailty( formula = Surv(L, T, D) ~ X, data = dd, values = list(X = seq(-1, 1, 0.5)), times = 1:5, clusterid = "id" ) print(fit.std) plot(fit.std)require(survival) # simulate data set.seed(6) n <- 300 m <- 3 alpha <- 1.5 eta <- 1 phi <- 0.5 beta <- 1 id <- rep(1:n, each = m) U <- rep(rgamma(n, shape = 1 / phi, scale = phi), each = m) X <- rnorm(n * m) # reparameterize scale as in rweibull function weibull.scale <- alpha / (U * exp(beta * X))^(1 / eta) T <- rweibull(n * m, shape = eta, scale = weibull.scale) # right censoring C <- runif(n * m, 0, 10) D <- as.numeric(T < C) T <- pmin(T, C) # strong left-truncation L <- runif(n * m, 0, 2) incl <- T > L incl <- ave(x = incl, id, FUN = sum) == m dd <- data.frame(L, T, D, X, id) dd <- dd[incl, ] fit.std <- standardize_parfrailty( formula = Surv(L, T, D) ~ X, data = dd, values = list(X = seq(-1, 1, 0.5)), times = 1:5, clusterid = "id" ) print(fit.std) plot(fit.std)
This is a summary method for class "parfrailty".
## S3 method for class 'parfrailty' summary( object, ci_type = "plain", ci_level = 0.95, digits = max(3L, getOption("digits") - 3L), ... )## S3 method for class 'parfrailty' summary( object, ci_type = "plain", ci_level = 0.95, digits = max(3L, getOption("digits") - 3L), ... )
object |
an object of class |
ci_type |
string, indicating the type of confidence intervals. Either "plain", which gives untransformed intervals, or "log", which gives log-transformed intervals. |
ci_level |
desired coverage probability of confidence intervals, in decimal form. |
digits |
the number of significant digits to use when printing.. |
... |
not used. |
An object of class "summary.parfrailty", which is a list that contains relevant summary statistics about the fitted model
Arvid Sjölander and Elisabeth Dahlqwist.
## See documentation for parfrailty## See documentation for parfrailty
Tidy summarizes information about the components of the standardized regression fit.
## S3 method for class 'std_custom' tidy(x, ...)## S3 method for class 'std_custom' tidy(x, ...)
x |
An object of class std_custom |
... |
Not currently used |
A data.frame
set.seed(6) n <- 100 Z <- rnorm(n) X <- rnorm(n, mean = Z) Y <- rbinom(n, 1, prob = (1 + exp(X + Z))^(-1)) dd <- data.frame(Z, X, Y) prob_predict.glm <- function(...) predict.glm(..., type = "response") x <- standardize( fitter = "glm", arguments = list( formula = Y ~ X * Z, family = "binomial" ), predict_fun = prob_predict.glm, data = dd, values = list(X = seq(-1, 1, 0.1)), B = 100, reference = 0, contrasts = "difference" ) tidy(x)set.seed(6) n <- 100 Z <- rnorm(n) X <- rnorm(n, mean = Z) Y <- rbinom(n, 1, prob = (1 + exp(X + Z))^(-1)) dd <- data.frame(Z, X, Y) prob_predict.glm <- function(...) predict.glm(..., type = "response") x <- standardize( fitter = "glm", arguments = list( formula = Y ~ X * Z, family = "binomial" ), predict_fun = prob_predict.glm, data = dd, values = list(X = seq(-1, 1, 0.1)), B = 100, reference = 0, contrasts = "difference" ) tidy(x)
Tidy summarizes information about the components of the standardized regression fit.
## S3 method for class 'std_glm' tidy(x, ...)## S3 method for class 'std_glm' tidy(x, ...)
x |
An object of class std_glm |
... |
Not currently used |
A data.frame
set.seed(6) n <- 100 Z <- rnorm(n) X <- rnorm(n, mean = Z) Y <- rbinom(n, 1, prob = (1 + exp(X + Z))^(-1)) dd <- data.frame(Z, X, Y) x <- standardize_glm( formula = Y ~ X * Z, family = "binomial", data = dd, values = list(X = 0:1), contrasts = c("difference", "ratio"), reference = 0 ) tidy(x)set.seed(6) n <- 100 Z <- rnorm(n) X <- rnorm(n, mean = Z) Y <- rbinom(n, 1, prob = (1 + exp(X + Z))^(-1)) dd <- data.frame(Z, X, Y) x <- standardize_glm( formula = Y ~ X * Z, family = "binomial", data = dd, values = list(X = 0:1), contrasts = c("difference", "ratio"), reference = 0 ) tidy(x)
Tidy summarizes information about the components of the standardized model fit.
## S3 method for class 'std_surv' tidy(x, ...)## S3 method for class 'std_surv' tidy(x, ...)
x |
An object of class std_surv |
... |
Not currently used |
A data.frame
require(survival) set.seed(8) n <- 300 Z <- rnorm(n) X <- rnorm(n, mean = Z) time <- rexp(n, rate = exp(X + Z + X * Z)) # survival time C <- rexp(n, rate = exp(X + Z + X * Z)) # censoring time U <- pmin(time, C) # time at risk D <- as.numeric(time < C) # event indicator dd <- data.frame(Z, X, U, D) x <- standardize_coxph( formula = Surv(U, D) ~ X + Z + X * Z, data = dd, values = list(X = seq(-1, 1, 0.5)), times = c(2,3,4) ) tidy(x)require(survival) set.seed(8) n <- 300 Z <- rnorm(n) X <- rnorm(n, mean = Z) time <- rexp(n, rate = exp(X + Z + X * Z)) # survival time C <- rexp(n, rate = exp(X + Z + X * Z)) # censoring time U <- pmin(time, C) # time at risk D <- as.numeric(time < C) # event indicator dd <- data.frame(Z, X, U, D) x <- standardize_coxph( formula = Surv(U, D) ~ X + Z + X * Z, data = dd, values = list(X = seq(-1, 1, 0.5)), times = c(2,3,4) ) tidy(x)