Simulation-Based Infit MSQ Cutoff Determination for Multiply Imputed Data
Source: R/infit_cutoff_mi.R
RMitemInfitCutoffMI.Rd
Extends RMitemInfitCutoff to work with multiply imputed datasets
produced by the mice package. Runs the parametric bootstrap simulation on
each imputed dataset and stacks the resulting distributions, so that the
final cutoff intervals reflect both sampling variability and imputation
uncertainty.
Usage
RMitemInfitCutoffMI(
  mids_object,
  iterations = 500,
  parallel = TRUE,
  n_cores = NULL,
  verbose = FALSE,
  seed = NULL,
  cutoff_method = "hdci",
  hdci_width = 0.999
)
Arguments
- mids_object
  A mids object (multiply imputed dataset) as returned by mice::mice(). Each completed dataset must contain only the item response columns to be analysed (i.e., no ID or grouping variables). Items must be scored as non-negative integers starting at 0.
- iterations
  Integer. Total number of simulation iterations to run across all imputations. These are distributed approximately evenly across the m imputed datasets (default 500).
- parallel
  Logical. Use parallel processing via mirai within each imputed dataset (default TRUE). Passed to RMitemInfitCutoff.
- n_cores
  Integer or NULL. Number of parallel workers. Passed to RMitemInfitCutoff.
- verbose
  Logical. Show progress messages (default FALSE).
- seed
  Integer or NULL. Master random seed for reproducibility. A unique per-imputation seed is derived from this value.
- cutoff_method
  Character string specifying how cutoff intervals are computed from the stacked distribution. Either "hdci" (default) for the Highest Density Continuous Interval via ggdist::hdci(), or "quantile" for the 2.5th/97.5th percentiles via stats::quantile().
- hdci_width
  Numeric. Width of the HDCI when cutoff_method = "hdci". Default is 0.999 (99.9% HDCI). Ignored when cutoff_method = "quantile".
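The two cutoff_method options can be compared on a toy vector. This is an illustrative sketch only: `stacked` is made-up data standing in for one item's stacked infit values, not output from RMitemInfitCutoffMI, and the ggdist call is skipped when that package is not installed.

```r
# Hypothetical stand-in for one item's stacked simulated infit MSQ values
set.seed(1)
stacked <- c(rnorm(250, mean = 1.00, sd = 0.05),
             rnorm(250, mean = 1.01, sd = 0.06))

# cutoff_method = "quantile": 2.5th and 97.5th percentiles
quantile_cutoff <- stats::quantile(stacked, probs = c(0.025, 0.975))
print(quantile_cutoff)

# cutoff_method = "hdci": highest-density continuous interval of the given width
if (requireNamespace("ggdist", quietly = TRUE)) {
  hdci_cutoff <- ggdist::hdci(stacked, .width = 0.999)
  print(hdci_cutoff)
}
```

With a width of 0.999 the HDCI covers nearly the full range of the simulated values, so it is typically wider than the 95% quantile interval.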
Value
A list with the same structure as RMitemInfitCutoff, so
that the result can be passed directly to RMitemInfit,
RMitemInfitMI, and RMitemInfitCutoffPlot:
- results
  data.frame with columns iteration, imputation, Item, InfitMSQ, OutfitMSQ: the stacked simulation results from all imputed datasets.
- item_cutoffs
  data.frame with per-item cutoff summaries: Item, infit_low, infit_high, outfit_low, outfit_high. Computed from the stacked distribution.
- actual_iterations
  Total number of successful iterations across all imputations.
- sample_n
  Number of rows (respondents) per imputed dataset.
- sample_summary
  Summary statistics of estimated person parameters from the first imputed dataset.
- item_names
  Character vector of item names.
- cutoff_method
  The method used to compute cutoffs.
- hdci_width
  The HDCI width used.
- n_imputations
  Number of imputed datasets used.
- iterations_per_imputation
  Integer vector of requested iterations per imputed dataset.
- actual_iterations_per_imputation
  Integer vector of successful iterations per imputed dataset.
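The requested and successful iteration counts can be compared to see where simulation iterations were lost. The sketch below uses a hand-built list as a hypothetical stand-in for the returned object (the numbers are invented) and assumes both vectors have one entry per imputed dataset in the same order.

```r
# Hypothetical stand-in for the relevant elements of the returned list
cutoff_mi <- list(
  n_imputations = 2L,
  iterations_per_imputation = c(25L, 25L),
  actual_iterations_per_imputation = c(25L, 23L),
  actual_iterations = 48L
)

# Requested minus successful iterations, per imputed dataset
shortfall <- cutoff_mi$iterations_per_imputation -
  cutoff_mi$actual_iterations_per_imputation

# Indices of imputed datasets with at least one failed iteration
affected <- which(shortfall > 0)
print(affected)
```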
Details
The function completes each of the m imputed datasets via
mice::complete(), then calls RMitemInfitCutoff on each one. The
total number of iterations is split approximately evenly across imputations
(i.e., each imputed dataset receives ceiling(iterations / m) or
floor(iterations / m) iterations). The per-imputation simulation results
are stacked into a single distribution from which cutoff intervals are
computed, naturally incorporating imputation uncertainty.
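The ceiling/floor split described above can be sketched as follows. `split_iterations()` is a hypothetical helper written for illustration, not a function from the package, and which datasets receive the extra iteration is an assumption here.

```r
# Hypothetical helper: distribute `iterations` across `m` imputed datasets
# so that each gets floor(iterations / m) or ceiling(iterations / m)
split_iterations <- function(iterations, m) {
  base  <- iterations %/% m   # iterations every dataset receives
  extra <- iterations %% m    # leftover iterations to hand out one by one
  rep(base, m) + as.integer(seq_len(m) <= extra)
}

split_iterations(500, 3)
split_iterations(50, 2)
```

The counts always sum to the requested total, so the stacked distribution has the intended size when every iteration succeeds.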
Imputed datasets that cause model convergence failures are dropped with a warning. If all imputations fail, the function stops with an error.
The mice package must be installed (it is in Suggests, not Imports).
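The drop-with-warning and stacking behaviour can be illustrated in base R. Everything in this sketch is an assumption made for illustration: `simulate_one()` is a hypothetical stand-in for the per-dataset simulation, the data frames in `completed` stand in for datasets obtained with mice::complete(), and the failure of the second dataset is forced by hand.

```r
# Hypothetical stand-in for a per-dataset simulation; fails on request
simulate_one <- function(data, n_iter, fail = FALSE) {
  if (fail) stop("model did not converge")
  data.frame(
    iteration = rep(seq_len(n_iter), each = ncol(data)),
    Item      = rep(colnames(data), times = n_iter),
    InfitMSQ  = rnorm(n_iter * ncol(data), mean = 1, sd = 0.05)
  )
}

set.seed(1)
toy <- data.frame(Item1 = rbinom(20, 1, 0.5), Item2 = rbinom(20, 1, 0.5))
completed <- list(toy, toy, toy)   # stand-ins for completed datasets
fails     <- c(FALSE, TRUE, FALSE) # pretend the second dataset fails

per_imp <- lapply(seq_along(completed), function(i) {
  tryCatch({
    res <- simulate_one(completed[[i]], n_iter = 10, fail = fails[i])
    res$imputation <- i
    res
  }, error = function(e) {
    # Drop the failing imputation but tell the user about it
    warning("Imputation ", i, " dropped: ", conditionMessage(e), call. = FALSE)
    NULL
  })
})
per_imp <- Filter(Negate(is.null), per_imp)
if (length(per_imp) == 0) stop("All imputations failed.")

# Stack the surviving results and summarise per item
stacked <- do.call(rbind, per_imp)
cutoffs <- aggregate(InfitMSQ ~ Item, data = stacked,
                     FUN = quantile, probs = c(0.025, 0.975))
```

Here the quantile method is used for the per-item summary only to keep the sketch free of extra dependencies.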
Examples
# \donttest{
if (requireNamespace("mice", quietly = TRUE)) {
  # Create example data with missing values
  set.seed(42)
  sim_data <- as.data.frame(
    matrix(sample(0:1, 200 * 8, replace = TRUE), nrow = 200, ncol = 8)
  )
  colnames(sim_data) <- paste0("Item", 1:8)
  # Introduce ~10% MCAR missingness
  sim_data[sample(length(sim_data), 0.10 * length(sim_data))] <- NA
  # Impute (use more imputations, e.g. m = 5+, in real analyses)
  imp <- mice::mice(sim_data, m = 2, method = "polr", seed = 123,
                    printFlag = FALSE)
  # Compute simulation-based cutoffs across imputations
  # (use more iterations, e.g. 250+, in real analyses)
  cutoff_mi <- RMitemInfitCutoffMI(imp, iterations = 50, parallel = FALSE,
                                   seed = 42)
  cutoff_mi$item_cutoffs
  # Use with RMitemInfitMI()
  RMitemInfitMI(imp, cutoff = cutoff_mi)
}
#>
#>
#> Table: Pooled MSQ values from 2 imputations (Rubin's rules). n = 200 per imputed dataset. Cutoff values based on 50 total simulation iterations across 2 imputations (99.9% HDCI).
#>
#> |Item | Infit MSQ| Infit SE| Infit low| Infit high|Flagged | Relative location|
#> |:-----|---------:|--------:|---------:|----------:|:-------|-----------------:|
#> |Item1 | 1.015| 0.049| 0.869| 1.184|FALSE | -0.23|
#> |Item2 | 1.000| 0.048| 0.870| 1.109|FALSE | 0.13|
#> |Item3 | 1.009| 0.047| 0.921| 1.102|FALSE | -0.01|
#> |Item4 | 1.038| 0.047| 0.920| 1.104|FALSE | -0.05|
#> |Item5 | 0.973| 0.048| 0.887| 1.111|FALSE | 0.09|
#> |Item6 | 1.026| 0.047| 0.913| 1.101|FALSE | -0.09|
#> |Item7 | 0.929| 0.047| 0.859| 1.149|FALSE | -0.05|
#> |Item8 | 1.009| 0.047| 0.904| 1.093|FALSE | -0.03|
# }