Title: | Methods for Evaluating Principal Surrogates of Treatment Response |
---|---|
Description: | Contains the core methods for the evaluation of principal surrogates in a single clinical trial. Provides a flexible interface for defining models for the risk given treatment and the surrogate, the models for integration over the missing counterfactual surrogate responses, and the estimation methods. Estimated maximum likelihood and pseudo-score can be used for estimation, and the bootstrap for inference. A variety of post-estimation summary methods are provided, including print, summary, plot, and testing. |
Authors: | Michael C Sachs [aut, cre], Erin E Gabriel [aut] |
Maintainer: | Michael C Sachs <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.3.1 |
Built: | 2025-01-21 04:10:59 UTC |
Source: | https://github.com/sachsmc/pseval |
This operator allows you to add objects to a psdesign object, such as integration models and risk models
## S3 method for class 'ps' p1 + p2
## S3 method for class 'ps' p1 + p2
p1 |
An object of class psdesign |
p2 |
Another object to be added to If the first object is an object of class
|
Bootstrap resampling parameters
add_bootstrap(psdesign, bootstrap)
add_bootstrap(psdesign, bootstrap)
psdesign |
A psdesign object, it must have risk model and integration model components |
bootstrap |
A bootstrap object created by ps_bootstrap |
## Not run: test <- psdesign(generate_example_data(n = 100), Z = Z, Y = Y.obs, S = S.obs, BIP = BIP) est1 <- test + integrate_parametric(S.1 ~ BIP) + risk_binary() + ps_estimate(method = "BFGS") est1 + ps_bootstrap(method = "BFGS", start = est1$estimates$par, n.boots = 50) ## End(Not run)
## Not run: test <- psdesign(generate_example_data(n = 100), Z = Z, Y = Y.obs, S = S.obs, BIP = BIP) est1 <- test + integrate_parametric(S.1 ~ BIP) + risk_binary() + ps_estimate(method = "BFGS") est1 + ps_bootstrap(method = "BFGS", start = est1$estimates$par, n.boots = 50) ## End(Not run)
Estimate parameters
add_estimate(psdesign, estimate)
add_estimate(psdesign, estimate)
psdesign |
A psdesign object, it must have risk model and integration model components |
estimate |
An estimate object created by ps_estimate |
test <- psdesign(generate_example_data(n = 100), Z = Z, Y = Y.obs, S = S.obs, BIP = BIP) test + integrate_parametric(S.1 ~ BIP) + risk_binary(D = 50) + ps_estimate(method = "BFGS")
test <- psdesign(generate_example_data(n = 100), Z = Z, Y = Y.obs, S = S.obs, BIP = BIP) test + integrate_parametric(S.1 ~ BIP) + risk_binary(D = 50) + ps_estimate(method = "BFGS")
Add integration model to a psdesign object
add_integration(psdesign, integration)
add_integration(psdesign, integration)
psdesign |
A psdesign object |
integration |
An integration object |
This is a list of the available integration models. The fundamental problem in surrogate evaluation is that there are unobserved values of the counterfactual surrogate responses S(1). In the estimated maximum likelihood framework, for subjects missing the S(1) values, we use an auxiliary pre-treatment variable or set of variables W that is observed for every subject to estimate the distribution of S(1) | W. Typically, this W is a BIP. Then for each missing S(1), we integrate likelihood contributions over each non-missing S(1) given their value of W, and average over the contributions.
integrate_parametric This is a parametric integration model that fits a linear model for the mean of S(1) | W and assumes a Gaussian distribution.
integrate_bivnorm This is another parametric integration model that assumes that S(1) and W are jointly normally distributed. The user must specify their mean, variances and correlation.
integrate_nonparametric This is a non-parametric integration model that is only valid for categorical S(1) and W. It uses the observed proportions to estimate the joint distribution of S(1), W.
integrate_semiparametric This is a semi-parametric model that uses the semi-parametric location scale model of Heagerty and Pepe (1999). Models are specified for the location of S(1) | W and the scale of S(1) | W. Then integrations are drawn from the empirical distribution of the residuals from that model, which are then transformed to the appropriate location and scale.
test <- psdesign(generate_example_data(n = 100), Z = Z, Y = Y.obs, S = S.obs, BIP = BIP) add_integration(test, integrate_parametric(S.1 ~ BIP)) test + integrate_parametric(S.1 ~ BIP) # same as above
test <- psdesign(generate_example_data(n = 100), Z = Z, Y = Y.obs, S = S.obs, BIP = BIP) add_integration(test, integrate_parametric(S.1 ~ BIP)) test + integrate_parametric(S.1 ~ BIP) # same as above
Add risk model to a psdesign object
add_riskmodel(psdesign, riskmodel)
add_riskmodel(psdesign, riskmodel)
psdesign |
A psdesign object |
riskmodel |
A risk model object, from the list above |
The risk model component specifies the likelihood for the data. This involves specifying the distribution of the outcome variable, whether it is binary or time-to-event, and specifying how the surrogate S(1) and the treatment Z interact and affect the outcome. We use the formula notation to be consistent with other regression type models in R. Below is a list of available risk models.
risk_binary This is a generic risk model for binary outcomes. The user can specify the formula, and link function using either risk.logit for the logistic link, or risk.probit for the probit link. Custom link functions may also be specified, which take a single numeric vector argument, and returns a vector of corresponding probabilities.
risk_weibull This is a parameterization of the Weibull model for time-to-event outcomes that is consistent with that of rweibull. The user specifies the formula for the linear predictor of the scale parameter.
risk_exponential This is a simple exponential model for a time-to-event outcome.
risk_poisson This is a Poisson model for count outcomes. It allows for offsets in the formula.
risk_continuous This is a Gaussian model for continuous outcomes. It assumes that larger values of the outcome are harmful (e.g. blood pressure)
test <- psdesign(generate_example_data(n = 100), Z = Z, Y = Y.obs, S = S.obs, BIP = BIP) + integrate_parametric(S.1 ~ BIP) add_riskmodel(test, risk_binary()) test + risk_binary() # same as above
test <- psdesign(generate_example_data(n = 100), Z = Z, Y = Y.obs, S = S.obs, BIP = BIP) + integrate_parametric(S.1 ~ BIP) add_riskmodel(test, risk_binary()) test + risk_binary() # same as above
Computes the treatment efficacy (TE) and other functions of the risk in each treatment arm over the range of surrogate values observed in the data. TE(s) is defined as 1 - risk(s, z = 1)/risk(s, z = 0), where z is the treatment indicator. If any other variables are present in the risk model, then the risk is computed at their median value.
calc_risk(psdesign, contrast = "TE", t, sig.level = 0.05, CI.type = "band", n.samps = 5000, bootstraps = TRUE, newdata = NULL)
calc_risk(psdesign, contrast = "TE", t, sig.level = 0.05, CI.type = "band", n.samps = 5000, bootstraps = TRUE, newdata = NULL)
psdesign |
A psdesign object. It must contain a risk model, an integration model, and estimated parameters. Bootstrapped parameters are optional |
contrast |
The contrast function, or the name of the contrast function. See details. |
t |
For time to event outcomes, a fixed time |
sig.level |
Significance level for bootstrap confidence intervals |
CI.type |
Character string, "pointwise" for pointwise confidence intervals, and "band" for simultaneous confidence band. |
n.samps |
The number of samples to take over the range of S.1 at which the contrast is calculated |
bootstraps |
If true, and bootstrapped estimates are present, will calculate bootstrap standard errors and confidence bands. |
newdata |
Vector of S values. If present, will calculate the contrast function at values of newdata instead of the observed S.1 |
The contrast function is a function that takes 2 inputs, the risk_0
and risk_1, and returns some one dimensional function of those two inputs.
It must be vectorized. Some built-in functions are "TE"
for treatment
efficacy = 1 - risk_1(s)/risk_0(s), "RR"
for relative risk =
risk_1(s)/risk_0(s), "logRR"
for log of the relative risk, and
"RD"
for the risk difference = risk_1(s) - risk_0(s).
A data frame containing columns for the S values, the computed contrast function at S, R0, and R1 at those S values, and optionally standard errors and confidence intervals computed using bootstrapped estimates.
## Not run: # same result passing function name or function calc_risk(binary.boot, contrast = "TE", n.samps = 20) calc_risk(binary.boot, contrast = function(R0, R1) 1 - R1/R0, n.samps = 20) ## End(Not run)
## Not run: # same result passing function name or function calc_risk(binary.boot, contrast = "TE", n.samps = 20) calc_risk(binary.boot, contrast = function(R0, R1) 1 - R1/R0, n.samps = 20) ## End(Not run)
Computes the standardized total gain for the risk difference. Optionally produces bootstrap standard errors and confidence intervals. The standardized total gain is the area between the risk difference curve and the horizontal line at the marginal risk difference. If the outcome is time to event then the STG is time-dependent, and a time point for evaluation is needed. If one is not provided then the restricted mean survival is estimated from the data and used.
calc_STG(psdesign, t, sig.level = 0.05, n.samps = 5000, bootstraps = TRUE, permute = TRUE, permute.times = 2000, progress.bar = TRUE)
calc_STG(psdesign, t, sig.level = 0.05, n.samps = 5000, bootstraps = TRUE, permute = TRUE, permute.times = 2000, progress.bar = TRUE)
psdesign |
A psdesign object. It must contain a risk model, an integration model, and estimated parameters. Bootstrapped parameters are optional |
t |
For time to event outcomes, a fixed time |
sig.level |
Significance level for bootstrap confidence intervals |
n.samps |
The number of samples to take over the range of S.1 at which the VE is calculated |
bootstraps |
If true, and bootstrapped estimates are present, will calculate bootstrap standard errors and confidence interval. |
permute |
Not used, included for backwards compatibility |
permute.times |
Not used, included for backwards compatibility |
progress.bar |
Not used, included for backwards compatibility |
Compute the empirical Treatment Efficacy
empirical_TE(psdesign, t)
empirical_TE(psdesign, t)
psdesign |
An object of class psdesign |
t |
Fixed time for time to event outcomes to compute TE. If missing, uses restricted mean survival. |
Included for backwards compatibility
empirical_VE(psdesign, t)
empirical_VE(psdesign, t)
psdesign |
An object of class psdesign |
t |
Fixed time for time to event outcomes to compute TE. If missing, uses restricted mean survival. |
Expand augmented data using the integration function
expand_augdata(model, psdesign, D = 500)
expand_augdata(model, psdesign, D = 500)
model |
Formula defining the risk model |
psdesign |
An object of class psdesign, that contains at least 1 integration model |
D |
The number of samples to take for the simulated annealing |
Generate sample data used for testing
generate_example_data(n)
generate_example_data(n)
n |
Integer, the sample size |
This model assumes that the pair [S(1), W] is bivariate normal, where W is the BIP. The means, standard deviations, and correlation are estimated or fixed before calling this function. Then the conditional normal formula is applied in order to get the distribution of S(1) | W. That distribution is used to integrate over the missing S(1) values. This method requires a BIP in the design.
integrate_bivnorm(x = "S.1", mu = c(0, 0), sd = c(1, 1), rho = 0.2)
integrate_bivnorm(x = "S.1", mu = c(0, 0), sd = c(1, 1), rho = 0.2)
x |
expression identifying the variable to be integrated. Typically this is S.1 or S.0 |
mu |
means of the pair of surrogates, missing one first |
sd |
standard deviations of the pair, missing one first |
rho |
the correlation between X1 and X2 |
Both S(1) and the BIP or set of BIPs must be categorical. This model integrates over the estimated distribution of S(1) | BIP
integrate_nonparametric(formula, ...)
integrate_nonparametric(formula, ...)
formula |
Formula specifying the integration model for the surrogate under treatment. Generally the candidate surrogate will be on the left side in the formula, and the BIP or BIPs will be on the right side. In this case the BIP and the S(1) must be categorical. |
... |
Not currently used |
Parametric integration model for the missing S(1)
integrate_parametric(formula, family = gaussian, ...)
integrate_parametric(formula, family = gaussian, ...)
formula |
Formula specifying the integration model for the surrogate under treatment. Generally the candidate surrogate will be on the left side in the formula, and the BIP or BIPs will be on the right side |
family |
Assumed distribution for the integration model. Must be
compatible with the |
... |
Arguments passed to glm |
Semiparametric integration model using the location-scale model
integrate_semiparametric(formula.location, formula.scale, ...)
integrate_semiparametric(formula.location, formula.scale, ...)
formula.location |
Formula specifying the integration model for the location component of the surrogate under treatment. Generally the candidate surrogate will be on the left side in the formula, and the BIP or BIPs will be on the right side |
formula.scale |
Formula specifying the integration model for the scale component of the surrogate under treatment. Generally the candidate surrogate will be on the left side in the formula, and the BIP or BIPs will be on the right side |
... |
Other parameters passed to sp_locscale |
Plot the treatment efficacy or another contrast of risk versus S.1 for an estimated psdesign object
## S3 method for class 'psdesign' plot(x, t, contrast = "TE", sig.level = 0.05, CI.type = "band", n.samps = 500, xlab = "S.1", ylab = contrast, col = 1, lty = 1, lwd = 1, ...)
## S3 method for class 'psdesign' plot(x, t, contrast = "TE", sig.level = 0.05, CI.type = "band", n.samps = 500, xlab = "S.1", ylab = contrast, col = 1, lty = 1, lwd = 1, ...)
x |
A psdesign object that contains a risk model, integration model, and valid estimates |
t |
For time to event outcomes, a fixed time |
contrast |
Name of contrast function to plot. |
sig.level |
Significance level used for confidence bands on the contrast curve. This is only used if bootstrapped estimates are available. |
CI.type |
Character string, "pointwise" for pointwise confidence intervals, and "band" for simultaneous confidence band. |
n.samps |
Number of samples to use over the range of S.1 for plotting the curve |
xlab |
X-axis label |
ylab |
Y-axis label |
col |
Vector of integers specifying colors for each curve. |
lty |
Vector of integers specifying linetypes for each curve. |
lwd |
Vector of numeric values for line widths. |
... |
Other arguments passed to plot |
Concisely print information about a psdesign object
## S3 method for class 'psdesign' print(x, digits = 3, sig.level = 0.05, ...)
## S3 method for class 'psdesign' print(x, digits = 3, sig.level = 0.05, ...)
x |
An object of class psdesign |
digits |
Number of significant digits to display |
sig.level |
Significance level to use for computing bootstrapped confidence intervals |
... |
Currently unused |
Estimate parameters from a specified model using bootstrap resampling and estimated maximum likelihood
ps_bootstrap(n.boots = 200, progress.bar = TRUE, start = NULL, method = "BFGS", control = list(), ...)
ps_bootstrap(n.boots = 200, progress.bar = TRUE, start = NULL, method = "BFGS", control = list(), ...)
n.boots |
Number of bootstrap replicates |
progress.bar |
Logical, if true will display a progress bar in the console |
start |
Vector of starting values, if NULL, will come up with starting values |
method |
Method to use for optimization, can be "pseudo-score" for categorical S with nonparametric integration, or any of the methods available in optim. Defaults to "BFGS" |
control |
List of control parameters for passed to optim |
... |
Arguments passed to optim |
Estimate parameters from a specified model using estimated maximum likelihood
ps_estimate(start = NULL, method = "BFGS", control = list(), ...)
ps_estimate(start = NULL, method = "BFGS", control = list(), ...)
start |
Vector of starting values, if NULL, will come up with starting values |
method |
Method to use for optimization, can be "pseudo-score" for categorical BIP, or any of the methods available in optim. Defaults to "BFGS" |
control |
List of control parameters for passed to optim |
... |
Arguments passed to optim or pseudo_score. |
Generate mappings that describe how variables in the data are mapped to
components of the principal surrogate analysis. Other than data
, this
is a list of key-value pairs describing the common elements of a ps analysis.
The required keys are Z, Y, and S. Optional keys are BIP, CPV, BSM, and
weights. These elements are described in details below. Additional keys-value
pairs can be included in ...
. This function generates an augmented
dataset and additional information for subsequent steps in the analysis. In
the subsequent steps, refer to the variables by the keys. See
add_integration and add_riskmodel for information on how to
proceed in the analysis.
psdesign(data, Z, Y, S, BIP = NULL, CPV = NULL, BSM = NULL, weights = NULL, tau, ...)
psdesign(data, Z, Y, S, BIP = NULL, CPV = NULL, BSM = NULL, weights = NULL, tau, ...)
data |
Data frame containing data to be analyzed |
Z |
Expression defining the treatment variable which has 2 levels |
Y |
Expression defining the outcome variable. For binary events this should be coded as 0/1 or a factor with 2 levels. For censored time-to-event outcomes this can be a call to Surv |
S |
Expression defining the candidate surrogate |
BIP |
Optional expression defining the baseline irrelevant predictor |
CPV |
Optional expression defining the closeout placebo vaccination measurement |
BSM |
Optional expression defining the baseline surrogate measurement |
weights |
optional expression defining weights to accommodate nonrandom subsampling, such as case control or two phase |
tau |
numeric, When the outcome Y is a survival time, it is possible that the surrogate was measured at some time tau after enrollment. Use the argument tau to specify the time when the surrogate was measured, in study time. Not required for binary Y. |
... |
Other key-value pairs that will be included in the augmented data, e.g. additional candidate surrogates, covariates for adjustment, variables used for integration |
Estimate parameters from a specified model using pseudo-score
pseudo_score(psdesign, start = NULL, epsilon = 1e-05, maxit = 50)
pseudo_score(psdesign, start = NULL, epsilon = 1e-05, maxit = 50)
psdesign |
An object of class psdesign |
start |
Vector of starting values, if NULL, will come up with starting values |
epsilon |
Convergence criteria |
maxit |
Maximum number of iterations |
Risk model for binary outcome
risk_binary(model = Y ~ S.1 * Z, D = 5000, risk = risk.logit)
risk_binary(model = Y ~ S.1 * Z, D = 5000, risk = risk.logit)
model |
Formula specifying the risk model |
D |
number of samples for the simulated annealing integration |
risk |
Function for transforming a linear predictor into a probability. E.g., risk.logit for the logistic model, risk.probit for the probit model |
This model assumes that the outcome Y is normally distributed conditional on S.1 and Z, with mean determined by the model formula. It also assumes that larger values of Y are more indicative of poor outcomes, e.g., blood pressure.
risk_continuous(model = Y ~ S.1 * Z, D = 5000)
risk_continuous(model = Y ~ S.1 * Z, D = 5000)
model |
Formula specifying the risk model for the mean |
D |
number of samples for the simulated annealing integration |
Exponential risk model for time to event outcome
risk_exponential(model = Y ~ S.1 * Z, D = 5000)
risk_exponential(model = Y ~ S.1 * Z, D = 5000)
model |
Formula specifying the risk model. The outcome should be a Surv object specifying right censoring |
D |
number of samples for simulated annealing |
Poisson risk model for count outcomes
risk_poisson(model = Y ~ S.1 * Z, D = 5000)
risk_poisson(model = Y ~ S.1 * Z, D = 5000)
model |
Formula specifying the risk model. The outcome should be an integer of counts. This right side of the formula may contain an offset term. |
D |
number of samples for simulated annealing |
Weibull risk model for time to event outcome
risk_weibull(model = Y ~ S.1 * Z, D = 5000)
risk_weibull(model = Y ~ S.1 * Z, D = 5000)
model |
Formula specifying the risk model. The outcome should be a Surv object specifying right censoring |
D |
number of samples for simulated annealing |
Logit link function
risk.logit(x)
risk.logit(x)
x |
A vector of linear predictors |
A vector of probabilities
Probit link function
risk.probit(x)
risk.probit(x)
x |
A vector of linear predictors |
A vector of probabilities
Calculate risks with handlers for survival data
riskcalc(risk.function, Y, par, t, dat0, dat1)
riskcalc(risk.function, Y, par, t, dat0, dat1)
risk.function |
Function taking three arguments, a data.frame, parameters, and time. It should return a vector the same number of rows as the data frame |
Y |
The outcome variable |
par |
the vector of parameter values |
t |
Time for a survival outcome, may be missing |
dat0 |
Data frame containing S and Z = 1 |
dat1 |
Data frame containing S and Z = 0 |
This estimates the location-scale model as described in Heagerty and Pepe (1999) using the Newton-Raphson method. The location and scale formulas must have the same outcome, but they may have different predictors.
sp_locscale(formula.location, formula.scale, data, weights, tol = 1e-06, maxit = 100)
sp_locscale(formula.location, formula.scale, data, weights, tol = 1e-06, maxit = 100)
formula.location |
Formula specifying the model for the location |
formula.scale |
Formula specifying the model for the scale |
data |
Data used to estimate the model |
weights |
Weights applied to the estimating equations |
tol |
Convergence tolerance |
maxit |
Maximum number of iterations |
A list containing the parameter estimates, the convergence indicator, and residuals
Compute the standardized total gain
stg(R1, R0, stand = TRUE)
stg(R1, R0, stand = TRUE)
R1 |
Risk in the treatment group |
R0 |
Risk in the control group |
stand |
Standardize? |
Summarize bootstrap samples
summarize_bs(bootdf, estdf = NULL, sig.level = 0.05, CI.type = "band")
summarize_bs(bootdf, estdf = NULL, sig.level = 0.05, CI.type = "band")
bootdf |
Data frame containing bootstrapped estimates, with a column containing a convergence indicator |
estdf |
Data frame containing full sample estimate |
sig.level |
Significance level to use for confidence intervals |
CI.type |
Character string, "pointwise" for pointwise confidence intervals, and "band" for simultaneous confidence band. |
Summary method for psdesign objects
## S3 method for class 'psdesign' summary(object, digits = 3, sig.level = 0.05, ...)
## S3 method for class 'psdesign' summary(object, digits = 3, sig.level = 0.05, ...)
object |
An object of class psdesign |
digits |
Number of significant digits to display |
sig.level |
Significance level to use for computing bootstrapped confidence intervals |
... |
Currently unused |
Invisibly returns the printed table, along with the three estimates of vaccine efficacy. The empirical TE is 1 minus the relative risk comparing the treatment arm to the control arm. The risk is estimated as the proportion in the binary outcome case, or with the Kaplan-Meier estimate at the restricted mean survival in the time-to-event case. The marginal TE estimate is the TE estimate under the specified parametric risk model, ignoring the effect of S.1. The model based average TE is the TE estimate from the specified risk model, averaged over the distribution of S.1. The point of displaying these three is to assess the validity of the parametric model, and to assess the validity of the model estimation. Wild differences among these estimates may indicate problems with the model or convergence.
Treatment efficacy contrast functions
TE(R0, R1)
TE(R0, R1)
R0 |
A vector of risks in the control arm |
R1 |
A vector of risks in the treatment arm |
These functions take the risk in the two treatment arms, and
computes a one-dimensional summary of those risks. Built-in choices are
"TE"
for treatment efficacy = 1 - risk_1(s)/risk_0(s), "RR"
for
relative risk = risk_0(s)/risk_1(s), "logRR"
for log of the relative
risk, and "RD"
for the risk difference = risk_0(s) - risk_1(s).
A vector the same length as R0 and R1.
Checks for two classes and gives a warning message indicating which level is assumed to be 0/1
verify_trt(D)
verify_trt(D)
D |
Vector that will be checked for 2-class labels |
This runs a multivariate Wald test on the interaction terms of the model, using the bootstrap covariance
wem_test(x)
wem_test(x)
x |
An object of class psdesign with bootstrap replicates |