Rasch DIF magnitude & item split

Brief exploration of Differential Item Functioning in R

Author

Affiliation

Magnus Johansson

RISE Research Institutes of Sweden

Published

October 27, 2024

1 Background

Out of interest of better understanding som aspects of differential item functioning (DIF, a form of invariance test):

how much does DIF affect estimated thetas (factor scores)?
how to do item-split in R (creating separate items for subgroups from one item with problematic DIF)
how an item-split compares to removing the item (and keeping the DIF item) in terms of absolute differences in estimated thetas

We’ll use simulated data in order to have knowledge of the thetas used to generate response data (“input thetas” in the text below), and make objective comparisons using the different estimated thetas.

Ideally, this would be a simulation study where we create lots of datasets with a systematic variation in some parameters to investigate effects. But maybe this is a first step towards that.

Code

library(tidyverse)
library(eRm)
library(iarm)
library(mirt)
library(catR)
library(easyRasch) # devtools::install_github("pgmj/easyRasch", dependencies = TRUE)
library(summarytools)
library(ghibli)
library(broom)

### some commands exist in multiple packages, here we define preferred ones that are frequently used
select <- dplyr::select
count <- dplyr::count
recode <- car::recode
rename <- dplyr::rename

theme_rise <- function(fontfamily = "Lato", axissize = 13, titlesize = 15,
                       margins = 12, axisface = "plain", panelDist = 0.6, ...) {
  theme_minimal() +
  theme(
    text = element_text(family = fontfamily),
    axis.title.x = element_text(
      margin = margin(t = margins),
      size = axissize
    ),
    axis.title.y = element_text(
      margin = margin(r = margins),
      size = axissize
    ),
    plot.title = element_text(
      face = "bold",
      size = titlesize
    ),
    axis.title = element_text(
      face = axisface
    ),
    plot.caption = element_text(
      face = "italic"
    ),
    legend.text = element_text(family = fontfamily),
    legend.background = element_rect(color = "lightgrey"),
    strip.background = element_rect(color = "lightgrey"),
    panel.spacing = unit(panelDist, "cm", data = NULL),
    panel.border = element_rect(color = "grey", fill = NA),
    ...
  )
}

theme_set(theme_rise())

1.1 Simulate response data

First define input parameters for items.

Code

# make a tibble/dataframe also, for possible later use
inputParams1 <- tibble(
  q1  = c(1.2, 1.8, 2.4),
  q2  = c(-1.3, -0.5, 0.5),
  q3a = c(-0.3, 0.3, 1.2), # this is the DIF item
  q3b = c(-0.3+1, 0.3+1, 1.2+1), # this is the DIF item
  q4  = c(0.1, 0.6, 1.6),
  q5  = c(-0.3, 0.7, 1.5),
  q6  = c(-1.6, -1, -0.3),
  q7  = c(1, 1.8, 2.5),
  q8  = c(-1.3, -0.7, 0.4),
  q9  = c(-0.8, 1.4, 1.9),
  q10 = c(0.25, 1.25, 2.15)
) %>%
  t() %>%
  as.matrix()

# center to 0
inputParams1c <- inputParams1 - mean(inputParams1)

# item list for simulation for group 1
tlist1 <- list(
  q1 =  list(inputParams1c[1,]),
  q2 =  list(inputParams1c[2,]),
  q3 =  list(inputParams1c[3,]), # this is the DIF item
  q4 =  list(inputParams1c[5,]),
  q5 =  list(inputParams1c[6,]),
  q6 =  list(inputParams1c[7,]),
  q7 =  list(inputParams1c[8,]),
  q8 =  list(inputParams1c[9,]),
  q9 =  list(inputParams1c[10,]),
  q10 = list(inputParams1c[11,])
)

# item list for simulation for group 2
tlist2 <- list(
  q1 =  list(inputParams1c[1,]),
  q2 =  list(inputParams1c[2,]),
  q3 =  list(inputParams1c[4,]), # this is the DIF item
  q4 =  list(inputParams1c[5,]),
  q5 =  list(inputParams1c[6,]),
  q6 =  list(inputParams1c[7,]),
  q7 =  list(inputParams1c[8,]),
  q8 =  list(inputParams1c[9,]),
  q9 =  list(inputParams1c[10,]),
  q10 = list(inputParams1c[11,])
)

Then generate random thetas that we save to file to be able to reproduce the analysis.

Code

# simulate thetas
thetas1 <- rnorm(300, mean = 0, sd = 1.5)
thetas2 <- rnorm(300, mean = 0, sd = 1.5)

input_thetas <- c(thetas1,thetas2)

# simulate response data based on the above defined item thresholds
td1 <- SimPartialScore(
  deltaslist = tlist1,
  thetavec = thetas1
) %>%
  as.data.frame()

td2 <- SimPartialScore(
  deltaslist = tlist2,
  thetavec = thetas2
) %>%
  as.data.frame()

d <- rbind(td1,td2) %>% 
  add_column(group = rep(1:2, each = 300))

dif.group <- factor(d$group)
d$group <- NULL

all_data <- list(simResponses = d,
                 dif_group = dif.group,
                 input_thetas = input_thetas)
# save simulated data for reproducibility
#saveRDS(all_data,"dif_magnitude_1_0.Rdata")

Code

# read simulated data for reprodubility
all_data <- readRDS("dif_magnitude_1_0.Rdata")

d <- all_data$simResponses
dif.group <- all_data$dif_group
input_thetas <- all_data$input_thetas

We now have 10 items with 4 categories each. There are 600 respondents in all, with 300 showing differential item functioning for one item (item q3). DIF is induced at +1 logit uniform difference in location (all thresholds for item q3 are unformly +1 logits).

2 DIF assessment

Let’s test for DIF with some different methods.

2.1 LR-test

Code

RIdifTableLR(d, dif.group)

	Item locations				Standard errors
Item	1	2	MaxDiff	All	SE_1	SE_2	SE_All
q1	1.561	1.309	0.252	1.425	0.215	0.221	0.153
q2	-0.691	-0.888	0.197	-0.792	0.205	0.198	0.142
q3	0.14	0.899	0.759	0.492	0.186	0.196	0.132
q4	0.641	0.628	0.013	0.634	0.184	0.192	0.131
q5	0.345	0.133	0.212	0.233	0.183	0.181	0.128
q6	-1.285	-1.345	0.06	-1.307	0.244	0.222	0.164
q7	1.411	1.46	0.049	1.419	0.206	0.223	0.151
q8	-0.87	-0.891	0.021	-0.873	0.219	0.197	0.146
q9	0.546	0.445	0.101	0.485	0.183	0.185	0.129
q10	0.919	0.927	0.008	0.916	0.187	0.195	0.134
Note:
Values highlighted in red are above the chosen cutoff 0.5 logits. Background color brown and blue indicate the lowest and highest values among the DIF groups.

Code

RIdifThreshFigLR(d, dif.group)

2.2 Partial gamma

Code

RIpartgamDIF(d, dif.group)

Item	Partial gamma	SE	Lower CI	Upper CI	Adjusted p-value (BH)
q3	-0.454	0.071	-0.594	-0.314	0

2.3 Psychotree

Code

RIdifTable(d, dif.group)

Item	2	3	Mean location	StDev	MaxDiff
q1	1.289	1.041	1.165	0.176	0.248
q2	-0.962	-1.155	-1.059	0.136	0.193
q3	-0.132	0.631	0.250	0.540	0.763
q4	0.369	0.360	0.365	0.006	0.009
q5	0.073	-0.135	-0.031	0.147	0.208
q6	-1.557	-1.613	-1.585	0.039	0.056
q7	1.140	1.192	1.166	0.037	0.053
q8	-1.142	-1.159	-1.150	0.012	0.017
q9	0.274	0.177	0.226	0.068	0.097
q10	0.647	0.659	0.653	0.008	0.012

DIF clearly shown, however it is closer to 0.8 logits than the 1.0 used in the input values. This is still generally considered to be a large DIF size, so it should serve our purpose.

2.4 Item split

Now, we’ll do an item split and compare thetas for both groups with and without split, and also with the DIF item removed.

Code

thetas_together <- RIestThetas(d)

thetas_q3_removed <- RIestThetas(d %>% select(!q3))

d2 <- d %>% 
  add_column(group = dif.group) %>% 
  mutate(q3a = if_else(group == 1, q3, NA), 
         q3b = if_else(group == 2, q3, NA)
         ) %>% 
  select(!group) %>%
  select(!q3)

Let’s look at the data and a targeting plot.

Code

RItileplot(d2)

Code

RImissing(d2)

Code

RItargeting(d2)

Item threshold locations for the q3 split items range from -1 logits to +1.5 logits along the theta/latent continuum. This is roughly the range where we expect some impact from DIF.

Comparing targeting to non-split data.

Code

RItargeting(d)

Code

itemlabels <- data.frame(itemnr = names(d), item = "")
RIitemHierarchy(d)

2.5 Estimating thetas

First, we’ll use the convenient function RIestThetas() that estimates item parameters (using eRm) and thetas (using iarm), to see how that works when we have a split item, each with missing data for 50% of respondents.

Code

thetas_separate <- RIestThetas(d2)

hist(thetas_separate$WLE, breaks = 30, col = "lightblue")
hist(thetas_together$WLE, breaks = 30, col = "lightpink")
hist(thetas_q3_removed$WLE, breaks = 30, , col = "sienna4")

The upper range is rather different for the item split subgroup when using this method, with max score of about 2.2, compared to 4.1 for the item set with the original q3 item and entirely without q3.

Let’s review the estimated threshold locations.

2.6 Comparing item parameters

Code

RIitemparams(d)

	Threshold 1	Threshold 2	Threshold 3	Item location
q1	0.74	1.14	1.61	1.16
q2	-1.94	-1.12	-0.10	-1.05
q3	-0.28	0.16	0.80	0.23
q4	-0.28	0.13	1.27	0.37
q5	-0.89	0.04	0.75	-0.03
q6	-2.25	-1.67	-0.78	-1.57
q7	0.65	1.01	1.80	1.16
q8	-1.82	-1.39	-0.20	-1.14
q9	-1.33	0.58	1.42	0.22
q10	-0.24	0.62	1.58	0.65
Note:
Item location is the average of the thresholds for each item.

Code

RIitemparams(d2)

	Threshold 1	Threshold 2	Threshold 3	Item location
q1	0.71	1.12	1.60	1.14
q2	-1.97	-1.15	-0.13	-1.08
q4	-0.32	0.10	1.25	0.35
q5	-0.92	0.02	0.73	-0.06
q6	-2.29	-1.70	-0.81	-1.6
q7	0.62	0.99	1.80	1.14
q8	-1.85	-1.42	-0.22	-1.17
q9	-1.37	0.55	1.41	0.2
q10	-0.27	0.59	1.57	0.63
q3a	-0.88	-0.04	0.35	-0.19
q3b	0.10	0.36	1.47	0.64
Note:
Item location is the average of the thresholds for each item.

Code

RIitemparams(d %>% select(!q3))

	Threshold 1	Threshold 2	Threshold 3	Item location
q1	0.76	1.19	1.64	1.19
q2	-1.91	-1.11	-0.08	-1.03
q4	-0.27	0.15	1.31	0.4
q5	-0.87	0.06	0.79	-0.01
q6	-2.22	-1.66	-0.76	-1.55
q7	0.67	1.05	1.83	1.19
q8	-1.79	-1.38	-0.17	-1.12
q9	-1.32	0.60	1.46	0.25
q10	-0.22	0.65	1.62	0.68
Note:
Item location is the average of the thresholds for each item.

Code

dp <- RIitemparams(d, output = "dataframe") %>% 
  select(!Location) %>% 
  set_names(paste0("orig_",1:3)) %>% 
  rownames_to_column("item") %>% 
  filter(!str_detect(item,"q3"))
dp2 <- RIitemparams(d2, output = "dataframe") %>% 
  select(!Location) %>% 
  set_names(paste0("split_",1:3)) %>% 
  rownames_to_column("item") %>% 
  filter(!str_detect(item,"q3"))
dp3 <- RIitemparams(d %>% select(!q3), output = "dataframe") %>% 
  select(!Location) %>% 
  set_names(paste0("q3rem_",1:3))  %>% 
  rownames_to_column("item") %>% 
  filter(!str_detect(item,"q3"))

d_params <- cbind(dp,dp2[,-1],dp3[,-1]) # bind columns, keeping one "item" column

dpin <- inputParams1c %>% 
  as.data.frame() %>% 
  set_names(paste0("input_",1:3)) %>% 
  rownames_to_column("item") %>% 
  filter(!str_detect(item,"q3"))

dpin_long <- dpin %>% 
  pivot_longer(!item,
               names_sep = "_",
               names_to = c("source","threshold"),
               values_to = "input")

Code

d_params %>% 
  pivot_longer(!item, 
               names_sep = "_",
               names_to = c("source","threshold")) %>% 
  left_join(dpin_long[,-2], by = c("item","threshold")) %>% 
  group_by(source,item,threshold) %>% 
  summarise(abs_diff = abs(input - value)) %>% 
  
  ggplot(aes(x = threshold, y = abs_diff, color = source)) +
  geom_point(size = 2, alpha = 0.85) +
  geom_line(aes(group = source)) +
  facet_wrap(~ item) +
  labs(x = "Absolute bias (logits)")

Code

itemlocs <- RIitemparams(d %>% select(!q3), output = "dataframe") %>% 
  pull(Location)

item_order <- sort(as.numeric(itemlocs))

d_params %>% 
  pivot_longer(!item, 
               names_sep = "_",
               names_to = c("source","threshold")) %>% 
  left_join(dpin_long[,-2], by = c("item","threshold")) %>% 
  group_by(source,item,threshold) %>% 
  summarise(abs_diff = abs(input - value)) %>% 
  ungroup() %>% 
  group_by(source,item) %>% 
  summarise(sum_diff = sum(abs_diff)) %>% 
  ungroup() %>% 
  pivot_wider(values_from = "sum_diff",
              names_from = "source",
              id_cols = "item") %>% 
  pivot_longer(!item) %>% 
  mutate(item = factor(item, levels = names(itemlocs), 
                        labels = paste0(names(itemlocs),"_",as.numeric(itemlocs)))) %>% 
  ggplot(aes(x = item, 
             y = value, 
             color = name)
         ) +
  geom_point(size = 3) +
  labs(title = "Total estimation bias for each item (all thresholds) and item set",
       y = "Absolute bias (logits)",
       x = "Item (and average location)") +
  scale_y_continuous(limits = c(0,NA))

2.6.1 `mirt` comparison

Perhaps the mirt package could be more accurate in item threshold estimation when we have item split?

Code

mirt_out <- mirt(data = d, model = 1, itemtype = "Rasch", verbose = FALSE)
mirt_params <- coef(mirt_out, simplify = TRUE, IRTpars = TRUE)$items %>%
  as.data.frame() %>%
  select(!a) %>%
  set_names(paste0("orig_",1:3)) %>%   
  as.matrix()
dpm <- mirt_params - mean(mirt_params)

dpm <- dpm %>% 
  as.data.frame() %>% 
  rownames_to_column("item") %>% 
  filter(!str_detect(item,"q3"))

mirt_out <- mirt(data = d2, model = 1, itemtype = "Rasch", verbose = FALSE)
mirt_params <- coef(mirt_out, simplify = TRUE, IRTpars = TRUE)$items %>%
  as.data.frame() %>%
  select(!a) %>%
  set_names(paste0("split_",1:3)) %>% 
  as.matrix()
dpm2 <- mirt_params - mean(mirt_params)

dpm2 <- dpm2 %>% 
  as.data.frame() %>% 
  rownames_to_column("item") %>% 
  filter(!str_detect(item,"q3"))

mirt_out <- mirt(data = d %>% select(!q3), model = 1, itemtype = "Rasch", verbose = FALSE)
mirt_params <- coef(mirt_out, simplify = TRUE, IRTpars = TRUE)$items %>%
  as.data.frame() %>%
  select(!a) %>%
  set_names(paste0("q3rem_",1:3)) %>% 
  as.matrix()
dpm3 <- mirt_params - mean(mirt_params)

dpm3 <- dpm3 %>% 
  as.data.frame() %>% 
  rownames_to_column("item") %>% 
  filter(!str_detect(item,"q3"))


d_params2 <- cbind(dpm,dpm2[,-1],dpm3[,-1]) # bind columns, keeping one "item" column

Plot estimation bias
Summary plot estimation bias

Code

d_params2 %>% 
  pivot_longer(!item, 
               names_sep = "_",
               names_to = c("source","threshold")) %>% 
  left_join(dpin_long[,-2], by = c("item","threshold")) %>% 
  group_by(source,item,threshold) %>% 
  summarise(abs_diff = abs(input - value)) %>% 
  
  ggplot(aes(x = threshold, y = abs_diff, color = source)) +
  geom_point(alpha = 0.85) +
  facet_wrap(~ item)

Seems indentical to eRm estimates.

Code

d_params2 %>% 
  pivot_longer(!item, 
               names_sep = "_",
               names_to = c("source","threshold")) %>% 
  left_join(dpin_long[,-2], by = c("item","threshold")) %>% 
  group_by(source,item,threshold) %>% 
  summarise(abs_diff = abs(input - value)) %>% 
  ungroup() %>% 
  group_by(source,item) %>% 
  summarise(sum_diff = sum(abs_diff)) %>% 
  ungroup() %>% 
  pivot_wider(values_from = "sum_diff",
              names_from = "source",
              id_cols = "item") %>% 
  pivot_longer(!item) %>% 
  mutate(item = factor(item, levels = names(itemlocs), 
                        labels = paste0(names(itemlocs),"_",as.numeric(itemlocs)))) %>% 
  ggplot(aes(x = item, 
             y = value, 
             color = name)
         ) +
  geom_point(size = 3) +
  labs(title = "mirt: Total estimation bias for each item and item set")

No difference here either.

2.7 Theta estimation investigated

The method for estimating item parameters and thetas used in the function RIestThetas() may be at fault for the odd results in thetas estimated by the item set with item split? We can separate the two steps, and use a separate function for theta estimation with manual input of item parameters.

Code

itemps <- RIitemparams(d2, output = "dataframe") %>% 
  select(!Location) %>% 
  as.matrix()

thetas_separate_catR <- RIestThetasOLD(d2, itemParams = itemps, theta_range = c(-8,8))

c <- tibble(together_RIestThetas = thetas_together$WLE,
           together_q3_rem = thetas_q3_removed$WLE,
           separate_catR = thetas_separate_catR,
           separate_RIestThetas = thetas_separate$WLE,
           input_thetas = input_thetas)

c %>% 
  pivot_longer(everything(),
               names_to = "method",
               values_to = "theta") %>% 
  ggplot(aes(x = theta)) +
  geom_histogram(bins = 50) +
  facet_wrap(~method, axes = "all_x") +
  scale_x_continuous(breaks = seq(-5,5,1))

Looks like the two step approach worked a lot better. Since the item parameter estimation is identical (both are using eRm::PCM()), the reason should be the difference in theta estimation. The two-step approach uses catR::thetaEst() for theta estimation, which is probably handling missing data better than iarm::person_estimates(). Note: both approaches use the Weighted Likelihood Estimation to minimize bias (Warm, 1989).

3 Results

3.1 Summarised

First, absolute differences in estimated thetas compared to input thetas. By using absolute differences we can assess both DIF groups simultaneously.

Code

c_diff <- c %>% 
  mutate(with_q3 = abs(together_RIestThetas - input_thetas),
         q3_removed = abs(together_q3_rem - input_thetas),
         q3_split = abs(separate_catR - input_thetas)) %>% 
  select(!names(c))
  
c_diff %>% 
  pivot_longer(everything()) %>% 
  ggplot(aes(x = value)) +
  geom_histogram(bins = 100) +
  facet_wrap(~ name, ncol = 1) +
  labs(x = "Absolute difference in logits",
       title = "Comparing input thetas to estimated",
       subtitle = "Distribution of bias")

Code

c_diff_descr <- descr(c_diff) %>% 
  as.data.frame() %>% 
  rownames_to_column("Parameter") 

c_diff_descr[1:9,] %>% 
  pivot_longer(!Parameter) %>% 
  
  ggplot(aes(x = Parameter, y = value, color = name)) +
  geom_point(alpha = 0.85) +
  scale_color_viridis_d('Item set', end = 0.8) +
  labs(y = "Logits",
       x = "Descriptive metric")

Code

c_diff_descr[1:9,] %>% 
  mutate_if(is.numeric, round, 3) %>% 
  kbl_rise(tbl_width = 50)

Parameter	q3_removed	q3_split	with_q3
Mean	0.378	0.354	0.358
Std.Dev	0.288	0.275	0.278
Min	0.000	0.001	0.000
Q1	0.159	0.125	0.136
Median	0.316	0.292	0.302
Q3	0.546	0.525	0.518
Max	1.729	1.732	1.770
MAD	0.283	0.272	0.278
IQR	0.385	0.400	0.381

No real differences in these summary metrics.

We should look more closely at the particular region where the DIF item is located, since it should have the most impact there.

3.2 Across the latent continuum

First, the test information function (TIF) curve could be of interest to understand what to expect in terms of estimation bias due to reliability limitations. Even more interesting is the table showing range of SEM.

Code

RItif(d, samplePSI = TRUE)

Code

sem_orig <- RIscoreSE(d, output = "dataframe") %>% 
  clean_names() %>% 
  round(3)

sem_orig_min <- min(sem_orig$logit_std_error)

sem_orig %>% 
  ggplot(aes(x = logit_score, y = logit_std_error)) +
  geom_point(size = 2.5) +
  geom_line() +
  geom_text(data = . %>% 
              filter(logit_std_error == sem_orig_min),
            aes(label = sem_orig_min), 
            position = position_nudge(y = -0.03)) +
  scale_x_continuous(breaks = seq(-5,5,0.5)) +
  scale_y_continuous(limits = c(0,NA), breaks = seq(0,1,0.1))

Code

RItif(d %>% select(!q3), samplePSI = TRUE)

Code

sem_noq3 <- RIscoreSE(d %>% select(!q3), output = "dataframe") %>% 
  clean_names() %>% 
  round(3)

sem_noq3_min <- min(sem_noq3$logit_std_error)

sem_noq3 %>% 
  ggplot(aes(x = logit_score, y = logit_std_error)) +
  geom_point(size = 2.5) +
  geom_line() +
  geom_text(data = . %>% 
              filter(logit_std_error == sem_noq3_min) %>% 
              slice(1),
            aes(label = sem_noq3_min), 
            position = position_nudge(y = -0.03)) +
  scale_x_continuous(breaks = seq(-5,5,0.5)) +
  scale_y_continuous(limits = c(0,NA), breaks = seq(0,1,0.1))

The lowest SEM with q3 is 0.384, at logit score 0.356.

Lowest SEM without q3 is 0.416, at logit score 0.322 to 0.491, which makes a difference in minimal SEM of about 0.032 compared to including the DIF item. 0.032 * 1.96 = 0.063 for a 95% CI.

3.3 Theta estimation bias

Code

c_diff %>% 
  add_column(Theta = input_thetas) %>% 
  pivot_longer(!Theta) %>% 
  
  ggplot(aes(x = Theta, y = value, color = factor(name), fill = factor(name))) +
  #geom_point(alpha = 0.85) +
  geom_smooth(method = "loess",
              aes(linetype = factor(name)), alpha = 0.15) +
  scale_color_ghibli_d("MononokeMedium", direction = -1) +
  scale_fill_ghibli_d("MononokeMedium", direction = -1) +
  labs(color = "Item set",
       title = "Theta estimation bias",
       y = "Absolute difference (logits)") +
  guides(fill = "none", linetype = "none") +
  scale_x_continuous(breaks = seq(-5,5,1)) +
  scale_y_continuous(limits = c(0,NA), breaks = seq(0,1.2,0.1)) +
  theme_rise()

Code

c_diff %>% 
  add_column(Theta = input_thetas) %>% 
  pivot_longer(!Theta) %>% 
  
  ggplot(aes(x = Theta, y = value, color = factor(name), fill = factor(name))) +
  geom_point(alpha = 0.85) +
  scale_color_ghibli_d("MononokeMedium", direction = -1) +
  scale_fill_ghibli_d("MononokeMedium", direction = -1) +
  labs(color = "Item set",
       title = "Theta estimation bias",
       y = "Absolute difference in logits") +
  guides(fill = "none", linetype = "none") +
  scale_x_continuous(breaks = seq(-5,5,1)) +
  scale_y_continuous(limits = c(0,NA), breaks = seq(0,2,0.1)) +
  theme_rise()

Code

c_diff %>% 
  add_column(Theta = input_thetas) %>% 
  pivot_longer(!Theta) %>% 
  
  ggplot(aes(x = Theta, y = value, color = factor(name), fill = factor(name))) +
  #geom_point(alpha = 0.85) +
  geom_smooth(method = "loess",
              aes(linetype = factor(name)), alpha = 0.15) +
  scale_color_ghibli_d("MononokeMedium", direction = -1) +
  scale_fill_ghibli_d("MononokeMedium", direction = -1) +
  labs(color = "Item set",
       title = "Theta estimation bias",
       subtitle = "Cropped to improve readability",
       y = "Absolute difference (logits)") +
  guides(fill = "none", linetype = "none") +
  scale_x_continuous(limits = c(-2,2), breaks = seq(-5,5,1)) +
  scale_y_continuous(breaks = seq(0,1.2,0.05)) +
  theme_rise()

Item split slightly reduces bias compared to keeping the DIF item (3), while removing item 3 increases bias in the theta range of about -1 to +1.5 logits. The maximum bias looks like approximately 0.055 logits (at theta = 0), according to the loess smoothed line.

3.4 Statistical analysis of means (limited range)

While the practical impact of theta estimation bias induced by a DIF variable should be judged by how problematic the maximum bias is for the intended use and need for precision, it could be interesting to quantify the differences using statistical analysis. It is sometimes suggested to look at the mean of the groups, but I think this is mistaken. The bias is local and related to the DIF item’s location, which makes it relevant to look at that region separately.

As such, the comparison below is about mean differences in estimated thetas to input thetas, limited to the theta range from -0.5 to +1 logits, where differences seem the biggest according to Figure 1. Note that the figure shows loess smoothed lines that include both groups.

We need to do this separately for the two DIF groups, since the groups will have opposite effects of the DIF.

Code

c %>% 
  add_column(group = dif.group) %>% 
  filter(group == 1) %>% 
  select(c(separate_catR,input_thetas)) %>%
  filter(input_thetas > -0.5 & input_thetas < 1) %>% 
  nrow()

[1] 130

Code

c %>% 
  add_column(group = dif.group) %>% 
  filter(group == 2) %>% 
  select(c(separate_catR,input_thetas)) %>%
  filter(input_thetas > -0.5 & input_thetas < 1) %>% 
  nrow()

[1] 115

n = 130 and 115. More than a third of each group is located within the range affected by DIF.

Code

lm_all <- c %>% 
  add_column(group = dif.group) %>% 
  filter(group == 1) %>% 
  select(c(together_RIestThetas,input_thetas)) %>%
  filter(input_thetas > -0.5 & input_thetas < 1) %>% 
  pivot_longer(everything()) %>% 
  lm(value ~ name, data = .)

lm_q3rem <- c %>% 
  add_column(group = dif.group) %>% 
  filter(group == 1) %>% 
  select(c(together_q3_rem,input_thetas)) %>%
  filter(input_thetas > -0.5 & input_thetas < 1) %>% 
  pivot_longer(everything()) %>% 
  lm(value ~ name, data = .)

lm_q3split <- c %>% 
  add_column(group = dif.group) %>% 
  filter(group == 1) %>% 
  select(c(separate_catR,input_thetas)) %>%
  filter(input_thetas > -0.5 & input_thetas < 1) %>% 
  pivot_longer(everything()) %>% 
  lm(value ~ name, data = .)

g1 <- bind_rows(tidy(lm_all, conf.int = TRUE, conf.level = .84) %>% slice(2),
          tidy(lm_q3rem, conf.int = TRUE, conf.level = .84) %>% slice(2),
          tidy(lm_q3split, conf.int = TRUE, conf.level = .84) %>% slice(2)
          ) %>% 
  select(!term) %>% 
  round(3) %>% 
  add_column(Model = c("All items","Q3 removed","Q3 split"), .before = "estimate")

Code

lm_all2 <- c %>% 
  add_column(group = dif.group) %>% 
  filter(group == 2) %>% 
  select(c(together_RIestThetas,input_thetas)) %>%
  filter(input_thetas > -0.5 & input_thetas < 1) %>% 
  pivot_longer(everything()) %>% 
  lm(value ~ name, data = .)

lm_q3rem2 <- c %>% 
  add_column(group = dif.group) %>% 
  filter(group == 2) %>% 
  select(c(together_q3_rem,input_thetas)) %>%
  filter(input_thetas > -0.5 & input_thetas < 1) %>% 
  pivot_longer(everything()) %>% 
  lm(value ~ name, data = .)

lm_q3split2 <- c %>% 
  add_column(group = dif.group) %>% 
  filter(group == 2) %>% 
  select(c(separate_catR,input_thetas)) %>%
  filter(input_thetas > -0.5 & input_thetas < 1) %>% 
  pivot_longer(everything()) %>% 
  lm(value ~ name, data = .)

g2 <- bind_rows(tidy(lm_all2, conf.int = TRUE, conf.level = .84) %>% slice(2),
          tidy(lm_q3rem2, conf.int = TRUE, conf.level = .84) %>% slice(2),
          tidy(lm_q3split2, conf.int = TRUE, conf.level = .84) %>% slice(2)
          ) %>% 
  select(!term) %>% 
  round(3) %>% 
  add_column(Model = c("All items","Q3 removed","Q3 split"), .before = "estimate")

3.4.1 Results figure

Code

rbind(g1,g2) %>% 
  add_column(Group = rep(c("Group 1","Group 2"), each = 3)) %>% 
  
  ggplot(aes(x = Group, y = estimate, color = Model)) +
  geom_point(position = position_dodge(width = 0.2)) +
  geom_errorbar(aes(ymin = conf.low, ymax = conf.high),
                width = 0.1,
                position = position_dodge(width = 0.2)) +
  scale_color_ghibli_d("MononokeMedium", direction = -1) +
  labs(color = "Item set",
       title = "Mean bias in theta estimation",
       subtitle = "Across a limited theta region (-0.5 to 1.0 logits)",
       y = "Model estimate", 
       x = "")

Since we are primarily interested in comparing the different item sets to each other, it is not the difference from input thetas (estimate = 0) that is most relevant here. As such, I chose to display 84% confidence intervals to be able to assess differences between item sets in each group by looking at whether the CI’s overlap or not.

3.4.2 Theta bias - loess by group

Code

c_diff %>% 
  add_column(Theta = input_thetas) %>% 
  mutate(Group = case_match(dif.group, 
                            "1" ~ "Group 1",
                            "2" ~ "Group 2")) %>% 
  pivot_longer(!(c("Theta","Group"))) %>% 
  
  ggplot(aes(x = Theta, y = value, color = factor(name), fill = factor(name))) +
  #geom_point(alpha = 0.85) +
  geom_smooth(method = "loess",
              aes(linetype = factor(name)), alpha = 0.15) +
  scale_color_ghibli_d("MononokeMedium", direction = -1) +
  scale_fill_ghibli_d("MononokeMedium", direction = -1) +
  labs(color = "Item set",
       title = "Theta estimation bias",
       subtitle = "Cropped to improve readability",
       y = "Absolute difference (logits)") +
  guides(fill = "none", linetype = "none") +
  scale_x_continuous(limits = c(-2,2), breaks = seq(-5,5,1)) +
  scale_y_continuous(breaks = seq(0,1.2,0.05)) +
  facet_wrap(~Group, ncol = 1,) +
  theme_rise()

4 Session info

Code

sessionInfo()

R version 4.4.1 (2024-06-14)
Platform: aarch64-apple-darwin20
Running under: macOS Sonoma 14.6.1

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/Stockholm
tzcode source: internal

attached base packages:
[1] grid      stats4    stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] broom_1.0.7          ghibli_0.3.4         summarytools_1.0.1  
 [4] RISEkbmRasch_0.2.4.6 janitor_2.2.0        hexbin_1.28.4       
 [7] glue_1.7.0           ggrepel_0.9.6        patchwork_1.3.0     
[10] reshape_0.8.9        matrixStats_1.4.1    psychotree_0.16-1   
[13] psychotools_0.7-4    partykit_1.2-22      mvtnorm_1.3-1       
[16] libcoin_1.0-10       psych_2.4.6.26       kableExtra_1.4.0    
[19] formattable_0.2.1    catR_3.17            mirt_1.42           
[22] lattice_0.22-6       iarm_0.4.3           eRm_1.0-6           
[25] lubridate_1.9.3      forcats_1.0.0        stringr_1.5.1       
[28] dplyr_1.1.4          purrr_1.0.2          readr_2.1.5         
[31] tidyr_1.3.1          tibble_3.2.1         ggplot2_3.5.1       
[34] tidyverse_2.0.0     

loaded via a namespace (and not attached):
  [1] vcd_1.4-12           rstudioapi_0.16.0    audio_0.1-11        
  [4] jsonlite_1.8.9       magrittr_2.0.3       magick_2.8.5        
  [7] farver_2.1.2         rmarkdown_2.28       vctrs_0.6.5         
 [10] base64enc_0.1-3      htmltools_0.5.8.1    curl_5.2.3          
 [13] cellranger_1.1.0     Formula_1.2-5        dcurver_0.9.2       
 [16] parallelly_1.38.0    htmlwidgets_1.6.4    plyr_1.8.9          
 [19] testthat_3.2.1.1     zoo_1.8-12           lifecycle_1.0.4     
 [22] pkgconfig_2.0.3      Matrix_1.7-0         R6_2.5.1            
 [25] fastmap_1.2.0        snakecase_0.11.1     future_1.34.0       
 [28] digest_0.6.37        colorspace_2.1-1     rprojroot_2.0.4     
 [31] prismatic_1.1.2      Hmisc_5.1-3          vegan_2.6-8         
 [34] labeling_0.4.3       progressr_0.14.0     fansi_1.0.6         
 [37] timechange_0.3.0     abind_1.4-5          mgcv_1.9-1          
 [40] compiler_4.4.1       here_1.0.1           vcdExtra_0.8-5      
 [43] pander_0.6.5         withr_3.0.1          htmlTable_2.4.3     
 [46] backports_1.5.0      carData_3.0-5        relimp_1.0-5        
 [49] R.utils_2.12.3       MASS_7.3-61          sessioninfo_1.2.2   
 [52] GPArotation_2024.3-1 permute_0.9-7        tools_4.4.1         
 [55] foreign_0.8-87       lmtest_0.9-40        future.apply_1.11.2 
 [58] nnet_7.3-19          R.oo_1.26.0          nlme_3.1-165        
 [61] inum_1.0-5           checkmate_2.3.2      reshape2_1.4.4      
 [64] cluster_2.1.6        generics_0.1.3       snow_0.4-4          
 [67] gtable_0.3.5         RPushbullet_0.3.4    tzdb_0.4.0          
 [70] R.methodsS3_1.8.2    ca_0.71.1            data.table_1.16.0   
 [73] hms_1.1.3            car_3.1-2            xml2_1.3.6          
 [76] Deriv_4.1.3          utf8_1.2.4           pillar_1.9.0        
 [79] splines_4.4.1        pryr_0.1.6           survival_3.7-0      
 [82] tidyselect_1.2.1     pbapply_1.7-2        knitr_1.48          
 [85] gridExtra_2.3        svglite_2.1.3        xfun_0.46           
 [88] gnm_1.1-5            rapportools_1.1      brio_1.1.5          
 [91] stringi_1.8.4        yaml_2.3.10          evaluate_0.24.0     
 [94] codetools_0.2-20     beepr_2.0            tcltk_4.4.1         
 [97] cli_3.6.3            rpart_4.1.23         qvcalc_1.0.3        
[100] systemfonts_1.1.0    munsell_0.5.1        Rcpp_1.0.13         
[103] readxl_1.4.3         globals_0.16.3       parallel_4.4.1      
[106] listenv_0.9.1        viridisLite_0.4.2    SimDesign_2.17.1    
[109] scales_1.3.0         rlang_1.1.4          mnormt_2.1.1

Reuse

CC BY 4.0

Citation

BibTeX citation:

@online{johansson2024,
  author = {Johansson, Magnus},
  title = {Rasch {DIF} Magnitude \& Item Split},
  date = {2024-10-27},
  url = {https://pgmj.github.io/dif_magnitude.html},
  langid = {en}
}

For attribution, please cite this work as:

Johansson, M. (2024, October 27). Rasch DIF magnitude & item split. https://pgmj.github.io/dif_magnitude.html

1 Background

1.1 Simulate response data

2 DIF assessment

2.1 LR-test

2.2 Partial gamma

2.3 Psychotree

2.4 Item split

2.5 Estimating thetas

2.6 Comparing item parameters

2.6.1 mirt comparison

2.7 Theta estimation investigated

3 Results

3.1 Summarised

3.2 Across the latent continuum

3.3 Theta estimation bias

3.4 Statistical analysis of means (limited range)

3.4.1 Results figure

3.4.2 Theta bias - loess by group

4 Session info

Reuse

Citation

2.6.1 `mirt` comparison