Differential Item Functioning (DIF) Analysis for Bayesian IRT Models
Source:R/brms_dif.R
dif_statistic.RdTests for differential item functioning (DIF) in Bayesian Rasch-family models fitted with brms by comparing item parameters across subgroups defined by an exogenous variable. The function fits a DIF model that includes group-by-item interactions and summarizes the posterior distribution of the DIF effects.
Usage
dif_statistic(
model,
group_var,
item_var = item,
person_var = id,
data = NULL,
dif_type = c("uniform", "non-uniform"),
prob = 0.95,
rope = 0.5,
refit = TRUE,
...
)Arguments
- model
A fitted
brmsfitobject from the baseline (no-DIF) model.- group_var
An unquoted variable name identifying the grouping variable for DIF testing (e.g.,
gender). Must be a factor or character variable with exactly 2 levels in the current implementation.- item_var
An unquoted variable name identifying the item grouping variable. Default is
item.- person_var
An unquoted variable name identifying the person grouping variable. Default is
id.- data
An optional data frame containing all variables needed for the DIF model, including the group variable. If
NULL(the default), the function attempts to usemodel$data. Since the baseline model formula typically does not include the group variable, brms will have dropped it from the stored model data. In that case, you must supply the original data frame here.- dif_type
Character. For polytomous ordinal models only.
"uniform"(the default) tests for a uniform location shift per item via agroup:itemfixed-effect interaction."non-uniform"fits group-specific thresholds per item and computes per-threshold DIF effects as the difference between groups. Ignored for dichotomous models.- prob
Numeric in \((0, 1)\). Width of the credible intervals. Default is 0.95.
- rope
Numeric. Half-width of the Region of Practical Equivalence (ROPE) around zero for DIF effects, on the logit scale. Default is 0.5, corresponding to a practically negligible DIF effect. Set to 0 to skip ROPE analysis.
- refit
Logical. If
TRUE(the default), the DIF model is fitted automatically by updating the baseline model viaupdate, which reuses the compiled Stan code for faster sampling. IfFALSE, only the DIF model formula is returned (useful for manual fitting with custom settings).- ...
Additional arguments passed to
update.brmsfitwhen refitting the DIF model (e.g.,cores,control).
Value
A list with the following elements:
- summary
A
tibblewith one row per item (for uniform DIF) or per item × threshold (for non-uniform DIF) containing:item, optionallythreshold,dif_estimate(posterior mean),dif_lower,dif_upper(credible interval),dif_sd(posterior SD),pd(probability of direction),rope_percentage(proportion inside ROPE), andflag(classification).- dif_draws
A matrix of posterior draws for the DIF effects (draws × effects), for further analysis.
- dif_model
The fitted DIF
brmsfitobject (ifrefit = TRUE), orNULL.- dif_formula
The
brmsformulaused for the DIF model.- baseline_model
The original baseline model.
- plot
A
ggplotforest plot of DIF effects with credible intervals and ROPE.
Details
For polytomous models, two types of DIF can be tested:
- Uniform DIF (
dif_type = "uniform", default) A single location shift per item across groups, modelled as a
group:itemfixed-effect interaction. This tests whether the average item difficulty differs between groups.- Non-uniform / threshold-level DIF
(
dif_type = "non-uniform") Each item receives group-specific thresholds via
thres(gr = interaction(item, group)). DIF effects are computed as the difference in each threshold between groups, revealing whether DIF affects specific response categories.
The function constructs a DIF model by adding a group-by-item interaction to the baseline model:
Dichotomous models (
family = bernoulli()): The baselineresponse ~ 1 + (1 | item) + (1 | id)becomesresponse ~ 1 + group + (1 + group | item) + (1 | id), where the group slope varying by item captures item-specific DIF.Polytomous uniform DIF (
dif_type = "uniform"): The baselineresponse | thres(gr = item) ~ 1 + (1 | id)becomesresponse | thres(gr = item) ~ 1 + group:item + (1 | id).Polytomous non-uniform DIF (
dif_type = "non-uniform"): The baseline becomesresponse | thres(gr = item_group) ~ 1 + (1 | id), whereitem_group = interaction(item, group). Each item × group combination gets its own thresholds. DIF effects are the differences between group-specific thresholds for each item, computed draw-by-draw from the posterior.
DIF effects are summarized using:
- Probability of Direction (pd)
The proportion of the posterior on the dominant side of zero. Values > 0.975 indicate strong directional evidence.
- ROPE
The Region of Practical Equivalence (Kruschke, 2018). If > 95\ the DIF effect is practically negligible. If > 95\ outside, the effect is practically significant.
- Credible Interval
If the CI excludes zero, there is evidence of DIF at the specified credibility level.
References
Kruschke, J. K. (2018). Rejecting or accepting parameter values in Bayesian estimation. Advances in Methods and Practices in Psychological Science, 1(2), 270–280.
Bürkner, P.-C. (2021). Bayesian Item Response Modeling in R with brms and Stan. Journal of Statistical Software, 100, 1–54. doi:10.18637/jss.v100.i05
See also
infit_statistic for item fit,
q3_statistic for local dependence,
brm,
hypothesis.
Examples
# \donttest{
library(brms)
library(dplyr)
library(tidyr)
library(tibble)
# --- Dichotomous Rasch with DIF testing ---
set.seed(123)
df <- expand.grid(id = 1:200, item = paste0("I", 1:10)) %>%
mutate(
gender = rep(sample(c("M", "F"), 200, TRUE), each = 10),
theta = rep(rnorm(200), each = 10),
delta = rep(seq(-2, 2, length.out = 10), 200),
dif = ifelse(item == "I3" & gender == "F", 1.0,
ifelse(item == "I7" & gender == "F", -0.8, 0)),
p = plogis(theta - delta - dif),
response = rbinom(n(), 1, p)
)
fit_base <- brm(
response ~ 1 + (1 | item) + (1 | id),
data = df,
family = bernoulli(),
chains = 4,
cores = 2, # use more cores if you have
iter = 1000 # use at least 2000
)
#> Compiling Stan program...
#> Error in .fun(model_code = .x1): Boost not found; call install.packages('BH')
dif_result <- dif_statistic(
model = fit_base,
group_var = gender,
data = df
)
#> Error: object 'fit_base' not found
dif_result$summary
#> Error: object 'dif_result' not found
dif_result$plot
#> Error: object 'dif_result' not found
# --- Partial Credit Model: uniform DIF ---
df_pcm <- eRm::pcmdat2 %>%
mutate(across(everything(), ~ .x + 1)) %>%
rownames_to_column("id") %>%
mutate(gender = sample(c("M", "F"), n(), TRUE)) %>%
pivot_longer(!c(id, gender),
names_to = "item", values_to = "response")
fit_pcm <- brm(
response | thres(gr = item) ~ 1 + (1 | id),
data = df_pcm,
family = acat,
chains = 4,
cores = 2, # use more cores if you have
iter = 1000 # use at least 2000
)
#> Compiling Stan program...
#> Error in .fun(model_code = .x1): Boost not found; call install.packages('BH')
# Uniform DIF (default): one shift per item
dif_uni <- dif_statistic(fit_pcm, group_var = gender, data = df_pcm)
#> Error: object 'fit_pcm' not found
dif_uni$plot
#> Error: object 'dif_uni' not found
# Non-uniform DIF: threshold-level effects
dif_nu <- dif_statistic(fit_pcm, group_var = gender, data = df_pcm,
dif_type = "non-uniform")
#> Error: object 'fit_pcm' not found
dif_nu$summary
#> Error: object 'dif_nu' not found
dif_nu$plot
#> Error: object 'dif_nu' not found
# }