Several metrics are reported: RMU, PSI, and 'empirical'. It is recommended to also
use the function RIrelRep() to evaluate conditional reliability. RMU is
recommended as the main metric to report.
Usage
RIreliability(
data,
conf_int = 0.95,
draws = 1000,
estim = "WLE",
boot = FALSE,
cpu = 4,
pv = "mirt",
iter = 50,
verbose = TRUE,
theta_range = c(-10, 10)
)
Arguments
- data
Dataframe/tibble with only item response data coded as integers
- conf_int
Desired confidence interval (HDCI)
- draws
Number of plausible values to generate
- estim
Estimation method for theta (latent scores)
- boot
Optional non-parametric bootstrap for empirical reliability
- cpu
Number of cpu cores to use for bootstrap method
- pv
Choice of R package for generating plausible values; defaults to "mirt". Optionally "TAM", which requires that TAM is installed
- iter
Number of times the RMU estimation is done on the draws
- verbose
Set to FALSE to avoid the messages
- theta_range
The range of possible theta values
Details
RMU, Relative Measurement Uncertainty:
This function uses the mirt library to estimate the Rasch model using
Marginal Maximum Likelihood and then generates plausible values
(PVs; Mislevy, 1991). The function uses borrowed code, see ?RMUreliability.
The PVs are then used with the RMU method described by Bignardi et al. (2025) to estimate a mean and confidence interval. The mean is similar to the expected a posteriori (EAP) reliability point estimate (Adams, 2005). The confidence interval is the 95% highest density continuous interval (HDCI) based on the distribution of correlations.
The default setting is to generate 1000 PVs; more are recommended for stable estimates and CIs. How many more has not been systematically evaluated, but 4000 may be a good starting point. For smaller samples, generating more PVs is not very demanding computationally, but be wary of the time it takes to create thousands of PVs per respondent in large samples.
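As a rough sketch of the PV step (assuming the default pv = "mirt" backend and item response data in a data frame called data), plausible values can be drawn with mirt along these lines:

library(mirt)
# fit the Rasch model with marginal maximum likelihood
mod <- mirt(data, 1, itemtype = "Rasch", verbose = FALSE)
# plausible.draws returns a list with one vector of theta draws per PV
pvs <- fscores(mod, plausible.draws = 1000)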
PSI, Person Separation Index:
Estimated using functions in the eRm package, see ?eRm::SepRel. Note that
this excludes individuals with minimum or maximum scores, which may produce
unexpected results, especially when compared to the other methods.
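For reference, a minimal PSI computation with eRm (again assuming item response data in data) might look like:

library(eRm)
fit <- RM(data)              # conditional ML Rasch fit
pp <- person.parameter(fit)  # person parameters; min/max scorers are excluded
SepRel(pp)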
Empirical:
Estimated using mirt::empirical_rxx(), see https://stats.stackexchange.com/questions/427631/difference-between-empirical-and-marginal-reliability-of-an-irt-model
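A minimal sketch of the empirical reliability computation (assuming a fitted mirt model as above):

library(mirt)
mod <- mirt(data, 1, itemtype = "Rasch", verbose = FALSE)
# empirical_rxx() needs the factor scores together with their standard errors
theta_se <- fscores(mod, full.scores.SE = TRUE)
empirical_rxx(theta_se)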
References
Bignardi, G., Kievit, R., & Bürkner, P. C. (2025). A general method for estimating reliability using Bayesian Measurement Uncertainty. PsyArXiv. doi:10.31234/osf.io/h54k8
Mislevy, R. J. (1991). Randomization-Based Inference about Latent Variables from Complex Samples. Psychometrika, 56(2), 177–196. doi:10.1007/BF02294457
Adams, R. J. (2005). Reliability as a measurement design effect. Studies in Educational Evaluation, 31(2), 162–172. doi:10.1016/j.stueduc.2005.05.008
Examples
if (FALSE) { # \dontrun{
# comparison of a fully Bayesian Rasch model and PVs
library(dplyr)
library(tibble)
library(tidyr)
library(brms)
# reshape the response data to long format for brms
df <- eRm::raschdat1[, 1:20] %>%
  rownames_to_column("id") %>%
  pivot_longer(!id, names_to = "item")
# fit a Bayesian Rasch model with crossed random effects
brms_model <- brm(
  value ~ 1 + (1 | item) + (1 | id),
  data = df,
  chains = 4,
  cores = 4,
  family = "bernoulli"
)
# extract posterior draws of person parameters, with persons in rows
posterior_draws <- brms_model %>%
  as_draws_df() %>%
  dplyr::select(starts_with("r_id")) %>%
  t()
RMUreliability(posterior_draws)
RIreliability(eRm::raschdat1[, 1:20], draws = 4000)
} # }