Posterior Predictive Infit Statistic for the Hurdle Partial Credit Model

Computes conditional item infit statistics separately for the two submodels of a hurdle partial credit model fitted with brms using the hurdle_acat custom family: (i) the hurdle submodel for $P(Y > 0)$ (Bernoulli) and (ii) the partial credit severity submodel for $P(Y = k \mid Y > 0)$ on the positive categories. For each posterior draw, expected values and variances are derived from the submodel-specific category probabilities, and variance-weighted standardised residuals are computed for both observed and replicated data.

Usage

infit_statistic_hpcm(
  model,
  item_var = item,
  person_var = id,
  ndraws_use = NULL,
  outfit = FALSE
)

Arguments

model: A fitted brmsfit object using the hurdle_acat custom family (i.e., posterior_epred returns an S x N x K_total array whose first category is the hurdle / zero probability).
item_var: An unquoted variable name identifying the item grouping variable in the model data (e.g., item).
person_var: An unquoted variable name identifying the person grouping variable in the model data (e.g., id).
ndraws_use: Optional positive integer. If specified, a random subset of posterior draws of this size is used. If NULL (the default), all draws are used.
outfit: Logical. If TRUE, outfit statistics are computed alongside infit. Default is FALSE.

Value

A list with two elements, each a tibble in the same format as the output of infit_statistic (and directly compatible with infit_post):

hurdle: Item infit for the Bernoulli hurdle submodel on $1[Y > 0]$, evaluated on all observations.
pcm: Item infit for the partial credit severity submodel on $P(Y = k \mid Y > 0)$, evaluated only on the observations with $Y_{obs} > 0$.

Details

The hurdle PCM splits the generative process into:

A Bernoulli hurdle with $hu = P(Y = 0)$.
A partial credit / acat-logit severity process over the positive categories $1, \ldots, K - 1$, applied only when the hurdle is crossed.

posterior_epred for the hurdle_acat family returns an S x N x K_total array whose first category is $hu$ and whose remaining categories are $(1 - hu) \cdot P_{sev}(k)$. The two submodel infits are computed as follows:
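The array layout described above can be illustrated with a minimal base R sketch. The array here is simulated rather than taken from a fitted model, and the names hu_hat, p_sev and p_cond are hypothetical and chosen only for this example.

```r
# Toy stand-in for the S x N x K_total array returned by posterior_epred().
# All numbers are simulated for illustration; nothing here comes from a real fit.
set.seed(1)
S <- 4; N <- 6; K_total <- 4

hu <- matrix(runif(S * N, 0.1, 0.6), S, N)                 # P(Y = 0)
p_sev <- array(runif(S * N * (K_total - 1)), c(S, N, K_total - 1))
p_sev <- p_sev / as.vector(apply(p_sev, c(1, 2), sum))     # normalise over positive categories

# Assemble the joint array: slice 1 is hu, slices 2..K_total are (1 - hu) * P_sev(k).
epred <- array(NA_real_, c(S, N, K_total))
epred[, , 1] <- hu
for (k in seq_len(K_total - 1)) {
  epred[, , k + 1] <- (1 - hu) * p_sev[, , k]
}

# Recover the two submodels from the joint array.
hu_hat <- epred[, , 1]
p_cond <- epred[, , -1, drop = FALSE] / as.vector(1 - hu_hat)
```

In this sketch p_cond reproduces p_sev, and each (draw, observation) row of epred sums to 1.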

Hurdle submodel. All observations contribute. The Bernoulli moments are $E_{hurdle} = 1 - hu$ and $Var_{hurdle} = hu \cdot (1 - hu)$. Observed and replicated scores are $1[Y_{obs} > 0]$ and $1[Y_{rep} > 0]$ with $Y_{rep}$ obtained from posterior_predict.
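A rough sketch of the hurdle computation on toy data follows. It assumes a simulated S x N matrix of hu values and uses rbinom as a stand-in for the replicated indicators that would otherwise come from posterior_predict. The helper infit_by_item is hypothetical and is not part of the package.

```r
# Simulated inputs for illustration only.
set.seed(2)
S <- 3; N <- 8
item  <- rep(c("i1", "i2"), each = 4)          # item label per observation
hu    <- matrix(runif(S * N, 0.2, 0.7), S, N)  # P(Y = 0) per draw and observation
y_obs <- c(0, 2, 1, 0, 3, 0, 1, 2)

# Bernoulli moments of the indicator 1[Y > 0].
E <- 1 - hu
V <- hu * (1 - hu)

# Observed indicator (same for every draw) and a simulated replicate.
x_obs <- matrix(as.numeric(y_obs > 0), S, N, byrow = TRUE)
x_rep <- matrix(rbinom(S * N, 1, as.vector(E)), S, N)

# Hypothetical helper: sum of squared residuals over sum of variances, per draw and item.
infit_by_item <- function(x, E, V, item) {
  num <- t(rowsum(t((x - E)^2), item))  # S x n_items
  den <- t(rowsum(t(V), item))          # S x n_items
  num / den
}

infit_obs <- infit_by_item(x_obs, E, V, item)
infit_rep <- infit_by_item(x_rep, E, V, item)
```

Both results are S x n_items matrices, one row per posterior draw, so observed and replicated infit values can be compared draw by draw.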

Partial credit submodel. Only observations with $Y_{obs} > 0$ contribute. Conditional probabilities are obtained by renormalising the positive-category slices of the array: $$P(Y = k \mid Y > 0) = \frac{\mathrm{epred}[\,,\,,k+1]}{1 - hu}, \quad k = 1, \ldots, K - 1.$$ The conditional expected value and variance use category scores $1, \ldots, K - 1$. Replicated severity values are drawn independently for each (draw, observation) pair from these conditional probabilities via inverse-CDF sampling, so the partial credit posterior predictive check is not contaminated by hurdle misfit.
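The conditional moments and the inverse-CDF draw can be sketched in base R as below. This is only an illustration under stated assumptions: the conditional probabilities are simulated, K_pos is a hypothetical name for the number of positive categories $K - 1$, and the package's internal implementation may differ.

```r
# Simulated conditional probabilities P(Y = k | Y > 0), S x N x K_pos.
set.seed(3)
S <- 3; N <- 5; K_pos <- 3
p_cond <- array(runif(S * N * K_pos), c(S, N, K_pos))
p_cond <- p_cond / as.vector(apply(p_cond, c(1, 2), sum))

# Conditional expected value and variance with category scores 1..K_pos.
scores <- seq_len(K_pos)
E <- apply(p_cond, c(1, 2), function(p) sum(scores * p))
V <- apply(p_cond, c(1, 2), function(p) sum(scores^2 * p)) - E^2

# Inverse-CDF sampling: one uniform per (draw, observation); the sampled
# category is 1 plus the number of cumulative probabilities below the uniform.
cum_p <- aperm(apply(p_cond, c(1, 2), cumsum), c(2, 3, 1))  # back to S x N x K_pos
u <- matrix(runif(S * N), S, N)
# The last cumulative value is 1 by construction, so it is dropped to guard
# against floating-point round-off.
y_rep <- 1 + apply(cum_p[, , -K_pos, drop = FALSE] < as.vector(u), c(1, 2), sum)
```

Because y_rep is drawn from the conditional distribution only, it never contains zeros, which is what keeps the severity check separate from the hurdle.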

Within each submodel the infit per item is $$Infit_i^{(s)} = \sum_v (X_{vi} - E_{vi}^{(s)})^2 / \sum_v Var_{vi}^{(s)},$$ where $s$ indexes posterior draws, $v$ persons and $i$ items, and $X_{vi}$ is the observed or replicated score. The sum is restricted to the rows the submodel applies to (all rows for the hurdle; rows with $Y_{obs} > 0$ for partial credit).
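As a small worked example of the infit formula (the numbers are invented purely for illustration), take a single draw of the hurdle submodel for an item answered by three persons with $hu = (0.5, 0.2, 0.8)$ and observed indicators $X = (1, 1, 0)$. Then $E = (0.5, 0.8, 0.2)$ and $Var = (0.25, 0.16, 0.16)$, so $$Infit = \frac{(1 - 0.5)^2 + (1 - 0.8)^2 + (0 - 0.2)^2}{0.25 + 0.16 + 0.16} = \frac{0.33}{0.57} \approx 0.58.$$ A value near 1 would indicate that the squared residuals match the model-implied variance for that item and draw.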

References

Christensen, K. B., Kreiner, S. & Mesbah, M. (Eds.) (2013). Rasch Models in Health. ISTE and Wiley, pp. 86–90.

Kreiner, S. & Christensen, K. B. (2011). Exact evaluation of bias in Rasch model residuals. Advances in Mathematics Research, 12, 19–40.

Magnus, B. E. & Garnier-Villarreal, M. (2022). A multidimensional zero-inflated graded response model for ordinal symptom data. Psychological Methods, 27(2), 261–279. doi:10.1037/met0000395