Skip to contents

Several metrics are reported, RMU, PSI, and 'empirical'. It is recommended to also use the function RIrelRep() to evaluate conditional reliability. RMU seems like the main metric to report.

Usage

RIreliability(
  data,
  conf_int = 0.95,
  draws = 1000,
  estim = "WLE",
  boot = FALSE,
  cpu = 4,
  pv = "mirt",
  iter = 50,
  verbose = TRUE,
  theta_range = c(-10, 10)
)

Arguments

data

Dataframe/tibble with only item response data coded as integers

conf_int

Desired confidence interval (HDCI)

draws

Number of plausible values to generate

estim

Estimation method for theta (latent scores)

boot

Optional non-parametric bootstrap for empirical reliability

cpu

Number of cpu cores to use for bootstrap method

pv

Choice of R package. Optional "TAM", requires that you have TAM installed

iter

Number of times the RMU estimation is done on the draws

verbose

Set to FALSE to avoid the messages

theta_range

The range of possible theta values

Details

RMU, Relative Measurement Uncertainty: This function uses the mirt library to estimate the Rasch model using Marginal Maximum Likelihood and then generates plausible values (PVs; Mislevy, 1991). The function uses borrowed code, see ?RMUreliability.

The PVs are then used with the RMU method described by Bignardi et al. (2025) to estimate a mean and confidence interval. The mean is similar to the expected a posteriori (EAP) reliability point estimate (Adams, 2005). The confidence interval uses the 95% highest continuous density interval (HDCI) based on the distribution of correlations.

Default setting is to generate 1000 PVs. More are recommended for stable estimates/CIs. How many more has not been systematically evaluated, but 4000 might be a good starting point. For smaller samples, more PVs is not very demanding computationally, but be wary of the time it takes to create thousands of PVs for each respondent in large samples.

PSI, Person Separation Index: Estimated using functions in the eRm package, see ?eRm::SepRel. Note that this excludes min/max scoring individuals, which may result in unexpected results, especially compared to other methods.

Empirical: Estimated using mirt::empirical_rxx(), see https://stats.stackexchange.com/questions/427631/difference-between-empirical-and-marginal-reliability-of-an-irt-model

References

  • Bignardi, G., Kievit, R., & Bürkner, P. C. (2025). A general method for estimating reliability using Bayesian Measurement Uncertainty. PsyArXiv. doi:10.31234/osf.io/h54k8

  • Mislevy, R. J. (1991). Randomization-Based Inference about Latent Variables from Complex Samples. Psychometrika, 56(2), 177–196. doi:10.1007/BF02294457

  • Adams, R. J. (2005). Reliability as a measurement design effect. Studies in Educational Evaluation, 31(2), 162–172. doi:10.1016/j.stueduc.2005.05.008

Examples

if (FALSE) { # \dontrun{
# comparison of a fully Bayesian Rasch model and PVs
df <- eRm::raschdat1[,1:20] %>%
  rownames_to_column("id") %>%
  pivot_longer(!id, names_to = "item")

library(brms)
brms_model <- brm(
  value ~ 1 + (1 | item) + (1 | id),
  data    = df,
  chains  = 4,
  cores   = 4,
  family = "bernoulli"
)

posterior_draws <- brms_model %>%
  as_draws_df() %>%
  dplyr::select(starts_with("r_id")) %>%
  t()

RMUreliability(posterior_draws)
RIreliability(eRm::raschdat1[,1:20], draws = 4000)
} # }