Package 'xactonomial'

Title: Exact Inference for Real-Valued Functionals of k-Sample Multinomial Parameters
Description: We consider the k sample multinomial problem where we observe k vectors (possibly of different lengths), each representing an independent multinomial random vector. For a given function psi which takes in the concatenated vector of multinomial probabilities and outputs a real number, this is a Monte Carlo estimation procedure of an exact p-value and confidence interval. The resulting inference is valid even in small samples, and when the parameter is on the boundary, and when the function is nondifferentiable at the parameter value, all situations where asymptotic methods and the bootstrap would fail.
Authors: Michael C Sachs [aut, cre], Michael P Fay [aut]
Maintainer: Michael C Sachs <[email protected]>
License: MIT + file LICENSE
Version: 0.4.1
Built: 2025-01-17 14:30:26 UTC
Source: https://github.com/sachsmc/xactonomial

Help Index


xactonomial: Exact Inference for Real-Valued Functionals of k-Sample Multinomial Parameters

Description

We consider the k sample multinomial problem where we observe k vectors (possibly of different lengths), each representing an independent multinomial random vector. For a given function psi which takes in the concatenated vector of multinomial probabilities and outputs a real number, this is a Monte Carlo estimation procedure of an exact p-value and confidence interval. The resulting inference is valid even in small samples, and when the parameter is on the boundary, and when the function is nondifferentiable at the parameter value, all situations where asymptotic methods and the bootstrap would fail.

Author(s)

Maintainer: Michael C Sachs [email protected]

Authors:

  • Michael P Fay

See Also

Useful links:


Calculate probability for given parameters

Description

Given a set of candidate parameter vectors, check if the null ψψ0\psi \leq \psi_0 is satisfied, and if so, compute the probability for each element of the sample space

Usage

calc_prob_null(theta_cands, psi, psi0, minus1, SSpacearr, logC, II)

Arguments

theta_cands

A matrix with samples in the rows and the parameters in the columns

psi

The function of interest mapping parameters to the real line

psi0

The null boundary for testing psi <= psi0

minus1

Either plus or minus 1

SSpacearr

A matrix with the sample space for the given size of the problem

logC

log multinomial coefficient for each element of the sample space

II

logical vector of sample space psi being more extreme than the observed psi

Value

A numeric vector of probabilities


Calculate probability for given parameters

Description

Given a set of candidate parameter vectors, check if the null ψψ0\psi \leq \psi_0 is satisfied, and if so, compute the probability for each element of the sample space

Usage

calc_prob_null2(
  theta_cands,
  psi,
  psi0,
  minus1,
  SSpacearr,
  logC,
  II,
  psi_v = FALSE
)

Arguments

theta_cands

A matrix with samples in the rows and the parameters in the columns

psi

The function of interest mapping parameters to the real line

psi0

The null boundary for testing psi <= psi0

minus1

Either plus or minus 1

SSpacearr

A matrix with the sample space for the given size of the problem

logC

log multinomial coefficient for each element of the sample space

II

logical vector of sample space psi being more extreme than the observed psi

psi_v

Is psi vectorized by row?

Value

A numeric vector of probabilities


calculate multinomial probabilities

Description

calculate multinomial probabilities

Usage

calc_probs_rust(sar, logt, logc, d, n, nt)

Calculate combinations of multinomials

Description

Calculate combinations of multinomials

Usage

combinate(X, Y)

Arguments

X

Matrix 1

Y

Matrix 2

Value

A list of arrays


Like combinate but add to existing

Description

Like combinate but add to existing

Usage

combinate2(X, Y)

Arguments

X

A list as returned by combinate

Y

Matrix 2

Value

A list of arrays


Get a matrix of indices for all possible combinations of vectors of lengths

Description

This is basically the same as expand.grid, but faster for integers

Usage

expand_index(lengths)

Arguments

lengths

A vector with the lengths of each index to expand

Value

A matrix with length(lengths) columns and prod(lengths) rows

Examples

expand_index(c(2, 3, 4))
## the same as
expand.grid(1:2, 1:3, 1:4)

Sample from the unit simplex in d dimensions

Description

Sample from the unit simplex in d dimensions

Usage

get_theta_random(d = 4, nsamp = 75)

Arguments

d

the dimension

nsamp

the number of samples to take uniformly in the d space

Value

The grid over Theta, the parameter space. A matrix with d columns and nsamp rows

Examples

get_theta_random(3, 10)

Find the root of the function f

Description

This finds the value x[a,b]x \in [a, b] such that f(x)=0f(x) = 0 using the one-dimensional root finding ITP method (Interpolate Truncate Project). Also see itp.

Usage

itp_root(
  f,
  a,
  b,
  k1 = 0.1,
  k2 = 2,
  n0 = 1,
  eps = 0.005,
  maxiter = 100,
  fa = NULL,
  fb = NULL,
  verbose = FALSE,
  ...
)

Arguments

f

The function to find the root of in terms of its first (one-dimensional) argument

a

The lower limit

b

The upper limit

k1

A tuning parameter

k2

Another tuning parameter

n0

Another tuning parameter

eps

Convergence tolerance

maxiter

Maximum number of iterations

fa

The value of f(a), if NULL then will be calculated

fb

The value of f(b), if NULL then will be calculated

verbose

Prints out information during iteration

...

Other arguments passed on to f

Value

A numeric vector of length 1, the root at the last iteration

References

I. F. D. Oliveira and R. H. C. Takahashi. 2020. An Enhancement of the Bisection Method Average Performance Preserving Minmax Optimality. ACM Trans. Math. Softw. 47, 1, Article 5 (March 2021), 24 pages. https://doi.org/10.1145/3423597

Examples

fpoly <- function(x) x^3 - x - 2 ## example from the ITP_method wikipedia entry
itp_root(fpoly, 1, 2, eps = .0001, verbose = TRUE)

Calculate log of multinomial coefficient

Description

Calculate log of multinomial coefficient

Usage

log_multinom_coef(x, sumx)

Arguments

x

Vector of observed counts in each cell

size

Total count

Value

The log multinomial coefficient

Examples

#' @examples
S0 <- sspace_multinom(4, 6)
S1 <- sspace_multinom(4, 7)
logC0<- apply(S0,1,log_multinom_coef,sumx=6)
logC1<- apply(S1,1,log_multinom_coef,sumx=7)
logC<- outer(logC0,logC1,'+')

Compute a p value for the test of psi <= psi0 (lower = TRUE) or psi >= psi0 (lower = FALSE)

Description

Compute a p value for the test of psi <= psi0 (lower = TRUE) or psi >= psi0 (lower = FALSE)

Usage

pvalue_psi0(
  psi0,
  psi,
  psi_hat,
  psi_obs,
  maxit,
  chunksize,
  lower = TRUE,
  target,
  SSpacearr,
  logC,
  d_k,
  psi_v = FALSE
)

Arguments

psi0

The null value

psi

The function of interest

psi_hat

The vector of psi values at each element of the sample space

psi_obs

The observed estimate

maxit

Maximum iterations

chunksize

Chunk size

lower

Do a one sided test of the null that it is less than psi0, otherwise greater.

target

Stop the algorithm if p >= target (for speed)

SSpacearr

The sample space array

logC

The log multinomial coefficient

d_k

The vector of dimensions

psi_v

Is psi vectorized by row?

Value

A p-value


Return a random sample from the d unit simplex

Description

Return a random sample from the d unit simplex

Usage

sample_unit_simplex(d)

Return a random sample from the d unit simplex

Description

Return a random sample from the d unit simplex

Usage

sample_unit_simplex2(d)

Return n random samples from the d unit simplex

Description

Return n random samples from the d unit simplex

Usage

sample_unit_simplexn(d, n)

Enumerate the sample space of a multinomial

Description

We have dd mutually exclusive outcomes and nn independent trials. This function enumerates all possible vectors of length dd of counts of each outcome for nn trials, i.e., the sample space. The result is output as a matrix with dd columns where each row represents a possible observation.

Usage

sspace_multinom(d, n)

Arguments

d

Dimension

n

Size

Value

A matrix with d columns

Examples

d4s <- sspace_multinom(4, 8)
stopifnot(abs(sum(apply(d4s, 1, dmultinom, prob = rep(.25, 4))) - 1) < 1e-12)

Exact inference for a real-valued function of multinomial parameters

Description

We consider the k sample multinomial problem where we observe k vectors (possibly of different lengths), each representing an independent sample from a multinomial. For a given function psi which takes in the concatenated vector of multinomial probabilities and outputs a real number, we are interested in constructing a confidence interval for psi.

Usage

xactonomial(
  psi,
  data,
  psi0 = NULL,
  alternative = c("two.sided", "less", "greater"),
  alpha = 0.05,
  psi_limits,
  maxit = 50,
  chunksize = 500,
  conf.int = TRUE,
  psi_is_vectorized = FALSE
)

Arguments

psi

Function that takes in a vector of parameters and outputs a real valued number

data

A list with k elements representing the vectors of counts of a k-sample multinomial

psi0

The null hypothesis value for the parameter being tested. A p value for a test of psi <= psi0 is computed. If NULL only a confidence interval is computed.

alternative

a character string specifying the alternative hypothesis, must be one of "two.sided" (default), "greater" or "less"

alpha

A 1 - alpha percent confidence interval will be computed

psi_limits

A vector of length 2 giving the lower and upper limits of the range of ψ(θ)\psi(\theta)

maxit

Maximum number of iterations of the stochastic procedure

chunksize

The number of samples taken from the parameter space at each iteration

conf.int

Logical. If FALSE, no confidence interval is calculated, only the p-value.

psi_is_vectorized

Logical. If TRUE, expect that psi can take a matrix as input, and return a vector of length the number of rows, computing the statistic for each row of the matrix. If possible, this will substantially speed up the computation. See examples.

Details

Let TjT_j be distributed Multinomialdj(θj,nj)\mbox{Multinomial}_{d_j}(\boldsymbol{\theta}_j, n_j) for j=1,,kj = 1, \ldots, k and denote T=(T1,,Tk)\boldsymbol{T} = (T_1, \ldots, T_k) and θ=(θ1,,θk)\boldsymbol{\theta} = (\theta_1, \ldots, \theta_k). The subscript djd_j denotes the dimension of the multinomial. Suppose one is interested in the parameter ψ(θ)ΘR\psi(\boldsymbol{\theta}) \in \Theta \subseteq \mathbb{R}. Given a sample of size nn from T\boldsymbol{T}, one can estimate θ\boldsymbol{\theta} with the sample proportions as θ^\hat{\boldsymbol{\theta}} and hence π(θ^)\pi(\hat{\boldsymbol{\theta}}). This function constructs a 1α1 - \alpha percent confidence interval for ψ(θ)\psi(\boldsymbol{\theta}) and provides a function to calculate a p value for a test of the null hypothesis H0:ψ(θ)ψ0H_0: \psi(\boldsymbol{\theta}) \neq \psi_0 for the two sided case, H0:ψ(θ)ψ0H_0: \psi(\boldsymbol{\theta}) \leq \psi_0 for the case alternative = "greater", and H0:ψ(θ)ψ0H_0: \psi(\boldsymbol{\theta}) \geq \psi_0 for the case alternative = "less". We make no assumptions and do not rely on large sample approximations. The computation is somewhat involved so it is best for small sample sizes.

Value

A list with 3 elements: the estimate, the 1 - alpha percent confidence interval, and p-value

Examples

psi_ba <- function(theta) {
  theta1 <- theta[1:4]
  theta2 <- theta[5:8]
  sum(sqrt(theta1 * theta2))
  }
data <- list(T1 = c(2,1,2,1), T2 = c(0,1,3,3))
xactonomial(psi_ba, data, psi_limits = c(0, 1), maxit = 5, chunksize = 20)

psi_ba_v <- function(theta) {
theta1 <- theta[,1:4, drop = FALSE]
theta2 <- theta[,5:8, drop = FALSE]
rowSums(sqrt(theta1 * theta2))
}
data <- list(T1 = c(2,1,2,1), T2 = c(0,1,3,3))
xactonomial(psi_ba_v, data, psi_limits = c(0, 1), maxit = 5, chunksize = 20, psi_is_vectorized = TRUE)