Title: | Exact Inference for Real-Valued Functionals of k-Sample Multinomial Parameters |
---|---|
Description: | We consider the k sample multinomial problem where we observe k vectors (possibly of different lengths), each representing an independent multinomial random vector. For a given function psi which takes in the concatenated vector of multinomial probabilities and outputs a real number, this is a Monte Carlo estimation procedure of an exact p-value and confidence interval. The resulting inference is valid even in small samples, and when the parameter is on the boundary, and when the function is nondifferentiable at the parameter value, all situations where asymptotic methods and the bootstrap would fail. |
Authors: | Michael C Sachs [aut, cre], Michael P Fay [aut] |
Maintainer: | Michael C Sachs <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.4.1 |
Built: | 2025-01-17 14:30:26 UTC |
Source: | https://github.com/sachsmc/xactonomial |
We consider the k sample multinomial problem where we observe k vectors (possibly of different lengths), each representing an independent multinomial random vector. For a given function psi which takes in the concatenated vector of multinomial probabilities and outputs a real number, this is a Monte Carlo estimation procedure of an exact p-value and confidence interval. The resulting inference is valid even in small samples, and when the parameter is on the boundary, and when the function is nondifferentiable at the parameter value, all situations where asymptotic methods and the bootstrap would fail.
Maintainer: Michael C Sachs [email protected]
Authors:
Michael P Fay
Useful links:
Given a set of candidate parameter vectors, check if the null is satisfied, and if so, compute the probability for each element of
the sample space
calc_prob_null(theta_cands, psi, psi0, minus1, SSpacearr, logC, II)
calc_prob_null(theta_cands, psi, psi0, minus1, SSpacearr, logC, II)
theta_cands |
A matrix with samples in the rows and the parameters in the columns |
psi |
The function of interest mapping parameters to the real line |
psi0 |
The null boundary for testing psi <= psi0 |
minus1 |
Either plus or minus 1 |
SSpacearr |
A matrix with the sample space for the given size of the problem |
logC |
log multinomial coefficient for each element of the sample space |
II |
logical vector of sample space psi being more extreme than the observed psi |
A numeric vector of probabilities
Given a set of candidate parameter vectors, check if the null is satisfied, and if so, compute the probability for each element of
the sample space
calc_prob_null2( theta_cands, psi, psi0, minus1, SSpacearr, logC, II, psi_v = FALSE )
calc_prob_null2( theta_cands, psi, psi0, minus1, SSpacearr, logC, II, psi_v = FALSE )
theta_cands |
A matrix with samples in the rows and the parameters in the columns |
psi |
The function of interest mapping parameters to the real line |
psi0 |
The null boundary for testing psi <= psi0 |
minus1 |
Either plus or minus 1 |
SSpacearr |
A matrix with the sample space for the given size of the problem |
logC |
log multinomial coefficient for each element of the sample space |
II |
logical vector of sample space psi being more extreme than the observed psi |
psi_v |
Is psi vectorized by row? |
A numeric vector of probabilities
calculate multinomial probabilities
calc_probs_rust(sar, logt, logc, d, n, nt)
calc_probs_rust(sar, logt, logc, d, n, nt)
Calculate combinations of multinomials
combinate(X, Y)
combinate(X, Y)
X |
Matrix 1 |
Y |
Matrix 2 |
A list of arrays
Like combinate but add to existing
combinate2(X, Y)
combinate2(X, Y)
X |
A list as returned by combinate |
Y |
Matrix 2 |
A list of arrays
This is basically the same as expand.grid, but faster for integers
expand_index(lengths)
expand_index(lengths)
lengths |
A vector with the lengths of each index to expand |
A matrix with length(lengths) columns and prod(lengths) rows
expand_index(c(2, 3, 4)) ## the same as expand.grid(1:2, 1:3, 1:4)
expand_index(c(2, 3, 4)) ## the same as expand.grid(1:2, 1:3, 1:4)
Sample from the unit simplex in d dimensions
get_theta_random(d = 4, nsamp = 75)
get_theta_random(d = 4, nsamp = 75)
d |
the dimension |
nsamp |
the number of samples to take uniformly in the d space |
The grid over Theta, the parameter space. A matrix with d columns and nsamp rows
get_theta_random(3, 10)
get_theta_random(3, 10)
This finds the value such that
using the one-dimensional root finding ITP method (Interpolate Truncate Project). Also see itp.
itp_root( f, a, b, k1 = 0.1, k2 = 2, n0 = 1, eps = 0.005, maxiter = 100, fa = NULL, fb = NULL, verbose = FALSE, ... )
itp_root( f, a, b, k1 = 0.1, k2 = 2, n0 = 1, eps = 0.005, maxiter = 100, fa = NULL, fb = NULL, verbose = FALSE, ... )
f |
The function to find the root of in terms of its first (one-dimensional) argument |
a |
The lower limit |
b |
The upper limit |
k1 |
A tuning parameter |
k2 |
Another tuning parameter |
n0 |
Another tuning parameter |
eps |
Convergence tolerance |
maxiter |
Maximum number of iterations |
fa |
The value of f(a), if NULL then will be calculated |
fb |
The value of f(b), if NULL then will be calculated |
verbose |
Prints out information during iteration |
... |
Other arguments passed on to f |
A numeric vector of length 1, the root at the last iteration
I. F. D. Oliveira and R. H. C. Takahashi. 2020. An Enhancement of the Bisection Method Average Performance Preserving Minmax Optimality. ACM Trans. Math. Softw. 47, 1, Article 5 (March 2021), 24 pages. https://doi.org/10.1145/3423597
fpoly <- function(x) x^3 - x - 2 ## example from the ITP_method wikipedia entry itp_root(fpoly, 1, 2, eps = .0001, verbose = TRUE)
fpoly <- function(x) x^3 - x - 2 ## example from the ITP_method wikipedia entry itp_root(fpoly, 1, 2, eps = .0001, verbose = TRUE)
Calculate log of multinomial coefficient
log_multinom_coef(x, sumx)
log_multinom_coef(x, sumx)
x |
Vector of observed counts in each cell |
size |
Total count |
The log multinomial coefficient
#' @examples S0 <- sspace_multinom(4, 6) S1 <- sspace_multinom(4, 7) logC0<- apply(S0,1,log_multinom_coef,sumx=6) logC1<- apply(S1,1,log_multinom_coef,sumx=7) logC<- outer(logC0,logC1,'+')
#' @examples S0 <- sspace_multinom(4, 6) S1 <- sspace_multinom(4, 7) logC0<- apply(S0,1,log_multinom_coef,sumx=6) logC1<- apply(S1,1,log_multinom_coef,sumx=7) logC<- outer(logC0,logC1,'+')
Compute a p value for the test of psi <= psi0 (lower = TRUE) or psi >= psi0 (lower = FALSE)
pvalue_psi0( psi0, psi, psi_hat, psi_obs, maxit, chunksize, lower = TRUE, target, SSpacearr, logC, d_k, psi_v = FALSE )
pvalue_psi0( psi0, psi, psi_hat, psi_obs, maxit, chunksize, lower = TRUE, target, SSpacearr, logC, d_k, psi_v = FALSE )
psi0 |
The null value |
psi |
The function of interest |
psi_hat |
The vector of psi values at each element of the sample space |
psi_obs |
The observed estimate |
maxit |
Maximum iterations |
chunksize |
Chunk size |
lower |
Do a one sided test of the null that it is less than psi0, otherwise greater. |
target |
Stop the algorithm if p >= target (for speed) |
SSpacearr |
The sample space array |
logC |
The log multinomial coefficient |
d_k |
The vector of dimensions |
psi_v |
Is psi vectorized by row? |
A p-value
Return a random sample from the d unit simplex
sample_unit_simplex(d)
sample_unit_simplex(d)
Return a random sample from the d unit simplex
sample_unit_simplex2(d)
sample_unit_simplex2(d)
Return n random samples from the d unit simplex
sample_unit_simplexn(d, n)
sample_unit_simplexn(d, n)
We have mutually exclusive outcomes and
independent trials.
This function enumerates all possible vectors of length
of counts of
each outcome for
trials, i.e., the sample space. The result is output
as a matrix with
columns where each row represents a possible
observation.
sspace_multinom(d, n)
sspace_multinom(d, n)
d |
Dimension |
n |
Size |
A matrix with d columns
d4s <- sspace_multinom(4, 8) stopifnot(abs(sum(apply(d4s, 1, dmultinom, prob = rep(.25, 4))) - 1) < 1e-12)
d4s <- sspace_multinom(4, 8) stopifnot(abs(sum(apply(d4s, 1, dmultinom, prob = rep(.25, 4))) - 1) < 1e-12)
We consider the k sample multinomial problem where we observe k vectors (possibly of different lengths), each representing an independent sample from a multinomial. For a given function psi which takes in the concatenated vector of multinomial probabilities and outputs a real number, we are interested in constructing a confidence interval for psi.
xactonomial( psi, data, psi0 = NULL, alternative = c("two.sided", "less", "greater"), alpha = 0.05, psi_limits, maxit = 50, chunksize = 500, conf.int = TRUE, psi_is_vectorized = FALSE )
xactonomial( psi, data, psi0 = NULL, alternative = c("two.sided", "less", "greater"), alpha = 0.05, psi_limits, maxit = 50, chunksize = 500, conf.int = TRUE, psi_is_vectorized = FALSE )
psi |
Function that takes in a vector of parameters and outputs a real valued number |
data |
A list with k elements representing the vectors of counts of a k-sample multinomial |
psi0 |
The null hypothesis value for the parameter being tested. A p value for a test of psi <= psi0 is computed. If NULL only a confidence interval is computed. |
alternative |
a character string specifying the alternative hypothesis, must be one of "two.sided" (default), "greater" or "less" |
alpha |
A 1 - alpha percent confidence interval will be computed |
psi_limits |
A vector of length 2 giving the lower and upper limits of
the range of |
maxit |
Maximum number of iterations of the stochastic procedure |
chunksize |
The number of samples taken from the parameter space at each iteration |
conf.int |
Logical. If FALSE, no confidence interval is calculated, only the p-value. |
psi_is_vectorized |
Logical. If TRUE, expect that psi can take a matrix as input, and return a vector of length the number of rows, computing the statistic for each row of the matrix. If possible, this will substantially speed up the computation. See examples. |
Let be distributed
for
and denote
and
. The subscript
denotes the dimension of the multinomial. Suppose one is interested
in the parameter
. Given a sample of size
from
, one
can estimate
with the sample proportions as
and hence
. This function constructs a
percent confidence interval for
and
provides a function to calculate a p value for a test of the null hypothesis
for the two sided case,
for the case
alternative = "greater"
, and
for the case
alternative = "less"
.
We make no assumptions and do not rely on large sample approximations.
The computation is somewhat involved so it is best for small sample sizes.
A list with 3 elements: the estimate, the 1 - alpha percent confidence interval, and p-value
psi_ba <- function(theta) { theta1 <- theta[1:4] theta2 <- theta[5:8] sum(sqrt(theta1 * theta2)) } data <- list(T1 = c(2,1,2,1), T2 = c(0,1,3,3)) xactonomial(psi_ba, data, psi_limits = c(0, 1), maxit = 5, chunksize = 20) psi_ba_v <- function(theta) { theta1 <- theta[,1:4, drop = FALSE] theta2 <- theta[,5:8, drop = FALSE] rowSums(sqrt(theta1 * theta2)) } data <- list(T1 = c(2,1,2,1), T2 = c(0,1,3,3)) xactonomial(psi_ba_v, data, psi_limits = c(0, 1), maxit = 5, chunksize = 20, psi_is_vectorized = TRUE)
psi_ba <- function(theta) { theta1 <- theta[1:4] theta2 <- theta[5:8] sum(sqrt(theta1 * theta2)) } data <- list(T1 = c(2,1,2,1), T2 = c(0,1,3,3)) xactonomial(psi_ba, data, psi_limits = c(0, 1), maxit = 5, chunksize = 20) psi_ba_v <- function(theta) { theta1 <- theta[,1:4, drop = FALSE] theta2 <- theta[,5:8, drop = FALSE] rowSums(sqrt(theta1 * theta2)) } data <- list(T1 = c(2,1,2,1), T2 = c(0,1,3,3)) xactonomial(psi_ba_v, data, psi_limits = c(0, 1), maxit = 5, chunksize = 20, psi_is_vectorized = TRUE)