R, Rasch, etc
  • Home
  • easyRasch vignette
  • About
  • Blog
    • Automating reports with Quarto
    • What’s in a latent trait score?
    • Data retrieval with R using API calls
    • Power analysis for multilevel models
    • Data wrangling for psychometrics in R
    • Simulation based cutoff values for Rasch item fit and residual correlations
    • Comparing Rasch packages/estimators
    • Rasch DIF magnitude & item split

Note: The RISEkbmRasch R package is now known as easyRasch.

Table of contents

  • 1 Background
    • 1.1 Simulate response data
  • 2 DIF assessment
    • 2.1 LR-test
    • 2.2 Partial gamma
    • 2.3 Psychotree
    • 2.4 Item split
    • 2.5 Estimating thetas
    • 2.6 Comparing item parameters
    • 2.7 Theta estimation investigated
  • 3 Results
    • 3.1 Summarised
    • 3.2 Across the latent continuum
    • 3.3 Theta estimation bias
    • 3.4 Statistical analysis of means (limited range)
  • 4 Session info

Rasch DIF magnitude & item split

  • Show All Code
  • Hide All Code

  • View Source

Brief exploration of Differential Item Functioning in R

Author
Affiliation

Magnus Johansson

RISE Research Institutes of Sweden

Published

October 27, 2024

1 Background

Out of interest of better understanding som aspects of differential item functioning (DIF, a form of invariance test):

  • how much does DIF affect estimated thetas (factor scores)?
  • how to do item-split in R (creating separate items for subgroups from one item with problematic DIF)
  • how an item-split compares to removing the item (and keeping the DIF item) in terms of absolute differences in estimated thetas

We’ll use simulated data in order to have knowledge of the thetas used to generate response data (“input thetas” in the text below), and make objective comparisons using the different estimated thetas.

Ideally, this would be a simulation study where we create lots of datasets with a systematic variation in some parameters to investigate effects. But maybe this is a first step towards that.

Code
library(tidyverse)
library(eRm)
library(iarm)
library(mirt)
library(catR)
library(easyRasch) # devtools::install_github("pgmj/easyRasch", dependencies = TRUE)
library(summarytools)
library(ghibli)
library(broom)

### some commands exist in multiple packages, here we define preferred ones that are frequently used
select <- dplyr::select
count <- dplyr::count
recode <- car::recode
rename <- dplyr::rename

theme_rise <- function(fontfamily = "Lato", axissize = 13, titlesize = 15,
                       margins = 12, axisface = "plain", panelDist = 0.6, ...) {
  theme_minimal() +
  theme(
    text = element_text(family = fontfamily),
    axis.title.x = element_text(
      margin = margin(t = margins),
      size = axissize
    ),
    axis.title.y = element_text(
      margin = margin(r = margins),
      size = axissize
    ),
    plot.title = element_text(
      face = "bold",
      size = titlesize
    ),
    axis.title = element_text(
      face = axisface
    ),
    plot.caption = element_text(
      face = "italic"
    ),
    legend.text = element_text(family = fontfamily),
    legend.background = element_rect(color = "lightgrey"),
    strip.background = element_rect(color = "lightgrey"),
    panel.spacing = unit(panelDist, "cm", data = NULL),
    panel.border = element_rect(color = "grey", fill = NA),
    ...
  )
}

theme_set(theme_rise())

1.1 Simulate response data

First define input parameters for items.

Code
# make a tibble/dataframe also, for possible later use
inputParams1 <- tibble(
  q1  = c(1.2, 1.8, 2.4),
  q2  = c(-1.3, -0.5, 0.5),
  q3a = c(-0.3, 0.3, 1.2), # this is the DIF item
  q3b = c(-0.3+1, 0.3+1, 1.2+1), # this is the DIF item
  q4  = c(0.1, 0.6, 1.6),
  q5  = c(-0.3, 0.7, 1.5),
  q6  = c(-1.6, -1, -0.3),
  q7  = c(1, 1.8, 2.5),
  q8  = c(-1.3, -0.7, 0.4),
  q9  = c(-0.8, 1.4, 1.9),
  q10 = c(0.25, 1.25, 2.15)
) %>%
  t() %>%
  as.matrix()

# center to 0
inputParams1c <- inputParams1 - mean(inputParams1)

# item list for simulation for group 1
tlist1 <- list(
  q1 =  list(inputParams1c[1,]),
  q2 =  list(inputParams1c[2,]),
  q3 =  list(inputParams1c[3,]), # this is the DIF item
  q4 =  list(inputParams1c[5,]),
  q5 =  list(inputParams1c[6,]),
  q6 =  list(inputParams1c[7,]),
  q7 =  list(inputParams1c[8,]),
  q8 =  list(inputParams1c[9,]),
  q9 =  list(inputParams1c[10,]),
  q10 = list(inputParams1c[11,])
)

# item list for simulation for group 2
tlist2 <- list(
  q1 =  list(inputParams1c[1,]),
  q2 =  list(inputParams1c[2,]),
  q3 =  list(inputParams1c[4,]), # this is the DIF item
  q4 =  list(inputParams1c[5,]),
  q5 =  list(inputParams1c[6,]),
  q6 =  list(inputParams1c[7,]),
  q7 =  list(inputParams1c[8,]),
  q8 =  list(inputParams1c[9,]),
  q9 =  list(inputParams1c[10,]),
  q10 = list(inputParams1c[11,])
)

Then generate random thetas that we save to file to be able to reproduce the analysis.

Code
# simulate thetas
thetas1 <- rnorm(300, mean = 0, sd = 1.5)
thetas2 <- rnorm(300, mean = 0, sd = 1.5)

input_thetas <- c(thetas1,thetas2)

# simulate response data based on the above defined item thresholds
td1 <- SimPartialScore(
  deltaslist = tlist1,
  thetavec = thetas1
) %>%
  as.data.frame()

td2 <- SimPartialScore(
  deltaslist = tlist2,
  thetavec = thetas2
) %>%
  as.data.frame()

d <- rbind(td1,td2) %>% 
  add_column(group = rep(1:2, each = 300))

dif.group <- factor(d$group)
d$group <- NULL

all_data <- list(simResponses = d,
                 dif_group = dif.group,
                 input_thetas = input_thetas)
# save simulated data for reproducibility
#saveRDS(all_data,"dif_magnitude_1_0.Rdata")
Code
# read simulated data for reprodubility
all_data <- readRDS("dif_magnitude_1_0.Rdata")

d <- all_data$simResponses
dif.group <- all_data$dif_group
input_thetas <- all_data$input_thetas

We now have 10 items with 4 categories each. There are 600 respondents in all, with 300 showing differential item functioning for one item (item q3). DIF is induced at +1 logit uniform difference in location (all thresholds for item q3 are unformly +1 logits).

2 DIF assessment

Let’s test for DIF with some different methods.

2.1 LR-test

Code
RIdifTableLR(d, dif.group)
Item locations
Standard errors
Item 1 2 MaxDiff All SE_1 SE_2 SE_All
q1 1.561 1.309 0.252 1.425 0.215 0.221 0.153
q2 -0.691 -0.888 0.197 -0.792 0.205 0.198 0.142
q3 0.14 0.899 0.759 0.492 0.186 0.196 0.132
q4 0.641 0.628 0.013 0.634 0.184 0.192 0.131
q5 0.345 0.133 0.212 0.233 0.183 0.181 0.128
q6 -1.285 -1.345 0.06 -1.307 0.244 0.222 0.164
q7 1.411 1.46 0.049 1.419 0.206 0.223 0.151
q8 -0.87 -0.891 0.021 -0.873 0.219 0.197 0.146
q9 0.546 0.445 0.101 0.485 0.183 0.185 0.129
q10 0.919 0.927 0.008 0.916 0.187 0.195 0.134
Note:
Values highlighted in red are above the chosen cutoff 0.5 logits. Background color brown and blue indicate the lowest and highest values among the DIF groups.
Code
RIdifThreshFigLR(d, dif.group)

2.2 Partial gamma

Code
RIpartgamDIF(d, dif.group)
Item Partial gamma SE Lower CI Upper CI Adjusted p-value (BH)
q3 -0.454 0.071 -0.594 -0.314 0

2.3 Psychotree

Code
RIdifTable(d, dif.group)

Item 2 3 Mean location StDev MaxDiff
q1 1.289 1.041 1.165 0.176 0.248
q2 -0.962 -1.155 -1.059 0.136 0.193
q3 -0.132 0.631 0.250 0.540 0.763
q4 0.369 0.360 0.365 0.006 0.009
q5 0.073 -0.135 -0.031 0.147 0.208
q6 -1.557 -1.613 -1.585 0.039 0.056
q7 1.140 1.192 1.166 0.037 0.053
q8 -1.142 -1.159 -1.150 0.012 0.017
q9 0.274 0.177 0.226 0.068 0.097
q10 0.647 0.659 0.653 0.008 0.012

DIF clearly shown, however it is closer to 0.8 logits than the 1.0 used in the input values. This is still generally considered to be a large DIF size, so it should serve our purpose.

2.4 Item split

Now, we’ll do an item split and compare thetas for both groups with and without split, and also with the DIF item removed.

Code
thetas_together <- RIestThetas(d)

thetas_q3_removed <- RIestThetas(d %>% select(!q3))

d2 <- d %>% 
  add_column(group = dif.group) %>% 
  mutate(q3a = if_else(group == 1, q3, NA), 
         q3b = if_else(group == 2, q3, NA)
         ) %>% 
  select(!group) %>%
  select(!q3)

Let’s look at the data and a targeting plot.

Code
RItileplot(d2)

Code
RImissing(d2)

Code
RItargeting(d2)

Item threshold locations for the q3 split items range from -1 logits to +1.5 logits along the theta/latent continuum. This is roughly the range where we expect some impact from DIF.

Comparing targeting to non-split data.

Code
RItargeting(d)

Code
itemlabels <- data.frame(itemnr = names(d), item = "")
RIitemHierarchy(d)

2.5 Estimating thetas

First, we’ll use the convenient function RIestThetas() that estimates item parameters (using eRm) and thetas (using iarm), to see how that works when we have a split item, each with missing data for 50% of respondents.

Code
thetas_separate <- RIestThetas(d2)

hist(thetas_separate$WLE, breaks = 30, col = "lightblue")
hist(thetas_together$WLE, breaks = 30, col = "lightpink")
hist(thetas_q3_removed$WLE, breaks = 30, , col = "sienna4")

The upper range is rather different for the item split subgroup when using this method, with max score of about 2.2, compared to 4.1 for the item set with the original q3 item and entirely without q3.

Let’s review the estimated threshold locations.

2.6 Comparing item parameters

  • Original data
  • Item split
  • DIF item removed
  • Plot estimation bias
  • Summary plot estimation bias
Code
RIitemparams(d)
Threshold 1 Threshold 2 Threshold 3 Item location
q1 0.74 1.14 1.61 1.16
q2 -1.94 -1.12 -0.10 -1.05
q3 -0.28 0.16 0.80 0.23
q4 -0.28 0.13 1.27 0.37
q5 -0.89 0.04 0.75 -0.03
q6 -2.25 -1.67 -0.78 -1.57
q7 0.65 1.01 1.80 1.16
q8 -1.82 -1.39 -0.20 -1.14
q9 -1.33 0.58 1.42 0.22
q10 -0.24 0.62 1.58 0.65
Note:
Item location is the average of the thresholds for each item.
Code
RIitemparams(d2)
Threshold 1 Threshold 2 Threshold 3 Item location
q1 0.71 1.12 1.60 1.14
q2 -1.97 -1.15 -0.13 -1.08
q4 -0.32 0.10 1.25 0.35
q5 -0.92 0.02 0.73 -0.06
q6 -2.29 -1.70 -0.81 -1.6
q7 0.62 0.99 1.80 1.14
q8 -1.85 -1.42 -0.22 -1.17
q9 -1.37 0.55 1.41 0.2
q10 -0.27 0.59 1.57 0.63
q3a -0.88 -0.04 0.35 -0.19
q3b 0.10 0.36 1.47 0.64
Note:
Item location is the average of the thresholds for each item.
Code
RIitemparams(d %>% select(!q3))
Threshold 1 Threshold 2 Threshold 3 Item location
q1 0.76 1.19 1.64 1.19
q2 -1.91 -1.11 -0.08 -1.03
q4 -0.27 0.15 1.31 0.4
q5 -0.87 0.06 0.79 -0.01
q6 -2.22 -1.66 -0.76 -1.55
q7 0.67 1.05 1.83 1.19
q8 -1.79 -1.38 -0.17 -1.12
q9 -1.32 0.60 1.46 0.25
q10 -0.22 0.65 1.62 0.68
Note:
Item location is the average of the thresholds for each item.
Code
dp <- RIitemparams(d, output = "dataframe") %>% 
  select(!Location) %>% 
  set_names(paste0("orig_",1:3)) %>% 
  rownames_to_column("item") %>% 
  filter(!str_detect(item,"q3"))
dp2 <- RIitemparams(d2, output = "dataframe") %>% 
  select(!Location) %>% 
  set_names(paste0("split_",1:3)) %>% 
  rownames_to_column("item") %>% 
  filter(!str_detect(item,"q3"))
dp3 <- RIitemparams(d %>% select(!q3), output = "dataframe") %>% 
  select(!Location) %>% 
  set_names(paste0("q3rem_",1:3))  %>% 
  rownames_to_column("item") %>% 
  filter(!str_detect(item,"q3"))

d_params <- cbind(dp,dp2[,-1],dp3[,-1]) # bind columns, keeping one "item" column

dpin <- inputParams1c %>% 
  as.data.frame() %>% 
  set_names(paste0("input_",1:3)) %>% 
  rownames_to_column("item") %>% 
  filter(!str_detect(item,"q3"))

dpin_long <- dpin %>% 
  pivot_longer(!item,
               names_sep = "_",
               names_to = c("source","threshold"),
               values_to = "input")
Code
d_params %>% 
  pivot_longer(!item, 
               names_sep = "_",
               names_to = c("source","threshold")) %>% 
  left_join(dpin_long[,-2], by = c("item","threshold")) %>% 
  group_by(source,item,threshold) %>% 
  summarise(abs_diff = abs(input - value)) %>% 
  
  ggplot(aes(x = threshold, y = abs_diff, color = source)) +
  geom_point(size = 2, alpha = 0.85) +
  geom_line(aes(group = source)) +
  facet_wrap(~ item) +
  labs(x = "Absolute bias (logits)")

Code
itemlocs <- RIitemparams(d %>% select(!q3), output = "dataframe") %>% 
  pull(Location)

item_order <- sort(as.numeric(itemlocs))

d_params %>% 
  pivot_longer(!item, 
               names_sep = "_",
               names_to = c("source","threshold")) %>% 
  left_join(dpin_long[,-2], by = c("item","threshold")) %>% 
  group_by(source,item,threshold) %>% 
  summarise(abs_diff = abs(input - value)) %>% 
  ungroup() %>% 
  group_by(source,item) %>% 
  summarise(sum_diff = sum(abs_diff)) %>% 
  ungroup() %>% 
  pivot_wider(values_from = "sum_diff",
              names_from = "source",
              id_cols = "item") %>% 
  pivot_longer(!item) %>% 
  mutate(item = factor(item, levels = names(itemlocs), 
                        labels = paste0(names(itemlocs),"_",as.numeric(itemlocs)))) %>% 
  ggplot(aes(x = item, 
             y = value, 
             color = name)
         ) +
  geom_point(size = 3) +
  labs(title = "Total estimation bias for each item (all thresholds) and item set",
       y = "Absolute bias (logits)",
       x = "Item (and average location)") +
  scale_y_continuous(limits = c(0,NA))

2.6.1 mirt comparison

Perhaps the mirt package could be more accurate in item threshold estimation when we have item split?

Code
mirt_out <- mirt(data = d, model = 1, itemtype = "Rasch", verbose = FALSE)
mirt_params <- coef(mirt_out, simplify = TRUE, IRTpars = TRUE)$items %>%
  as.data.frame() %>%
  select(!a) %>%
  set_names(paste0("orig_",1:3)) %>%   
  as.matrix()
dpm <- mirt_params - mean(mirt_params)

dpm <- dpm %>% 
  as.data.frame() %>% 
  rownames_to_column("item") %>% 
  filter(!str_detect(item,"q3"))

mirt_out <- mirt(data = d2, model = 1, itemtype = "Rasch", verbose = FALSE)
mirt_params <- coef(mirt_out, simplify = TRUE, IRTpars = TRUE)$items %>%
  as.data.frame() %>%
  select(!a) %>%
  set_names(paste0("split_",1:3)) %>% 
  as.matrix()
dpm2 <- mirt_params - mean(mirt_params)

dpm2 <- dpm2 %>% 
  as.data.frame() %>% 
  rownames_to_column("item") %>% 
  filter(!str_detect(item,"q3"))

mirt_out <- mirt(data = d %>% select(!q3), model = 1, itemtype = "Rasch", verbose = FALSE)
mirt_params <- coef(mirt_out, simplify = TRUE, IRTpars = TRUE)$items %>%
  as.data.frame() %>%
  select(!a) %>%
  set_names(paste0("q3rem_",1:3)) %>% 
  as.matrix()
dpm3 <- mirt_params - mean(mirt_params)

dpm3 <- dpm3 %>% 
  as.data.frame() %>% 
  rownames_to_column("item") %>% 
  filter(!str_detect(item,"q3"))


d_params2 <- cbind(dpm,dpm2[,-1],dpm3[,-1]) # bind columns, keeping one "item" column
  • Plot estimation bias
  • Summary plot estimation bias
Code
d_params2 %>% 
  pivot_longer(!item, 
               names_sep = "_",
               names_to = c("source","threshold")) %>% 
  left_join(dpin_long[,-2], by = c("item","threshold")) %>% 
  group_by(source,item,threshold) %>% 
  summarise(abs_diff = abs(input - value)) %>% 
  
  ggplot(aes(x = threshold, y = abs_diff, color = source)) +
  geom_point(alpha = 0.85) +
  facet_wrap(~ item)

Seems indentical to eRm estimates.

Code
d_params2 %>% 
  pivot_longer(!item, 
               names_sep = "_",
               names_to = c("source","threshold")) %>% 
  left_join(dpin_long[,-2], by = c("item","threshold")) %>% 
  group_by(source,item,threshold) %>% 
  summarise(abs_diff = abs(input - value)) %>% 
  ungroup() %>% 
  group_by(source,item) %>% 
  summarise(sum_diff = sum(abs_diff)) %>% 
  ungroup() %>% 
  pivot_wider(values_from = "sum_diff",
              names_from = "source",
              id_cols = "item") %>% 
  pivot_longer(!item) %>% 
  mutate(item = factor(item, levels = names(itemlocs), 
                        labels = paste0(names(itemlocs),"_",as.numeric(itemlocs)))) %>% 
  ggplot(aes(x = item, 
             y = value, 
             color = name)
         ) +
  geom_point(size = 3) +
  labs(title = "mirt: Total estimation bias for each item and item set")

No difference here either.

2.7 Theta estimation investigated

The method for estimating item parameters and thetas used in the function RIestThetas() may be at fault for the odd results in thetas estimated by the item set with item split? We can separate the two steps, and use a separate function for theta estimation with manual input of item parameters.

Code
itemps <- RIitemparams(d2, output = "dataframe") %>% 
  select(!Location) %>% 
  as.matrix()

thetas_separate_catR <- RIestThetasOLD(d2, itemParams = itemps, theta_range = c(-8,8))

c <- tibble(together_RIestThetas = thetas_together$WLE,
           together_q3_rem = thetas_q3_removed$WLE,
           separate_catR = thetas_separate_catR,
           separate_RIestThetas = thetas_separate$WLE,
           input_thetas = input_thetas)

c %>% 
  pivot_longer(everything(),
               names_to = "method",
               values_to = "theta") %>% 
  ggplot(aes(x = theta)) +
  geom_histogram(bins = 50) +
  facet_wrap(~method, axes = "all_x") +
  scale_x_continuous(breaks = seq(-5,5,1))

Looks like the two step approach worked a lot better. Since the item parameter estimation is identical (both are using eRm::PCM()), the reason should be the difference in theta estimation. The two-step approach uses catR::thetaEst() for theta estimation, which is probably handling missing data better than iarm::person_estimates(). Note: both approaches use the Weighted Likelihood Estimation to minimize bias (Warm, 1989).

3 Results

3.1 Summarised

First, absolute differences in estimated thetas compared to input thetas. By using absolute differences we can assess both DIF groups simultaneously.

Code
c_diff <- c %>% 
  mutate(with_q3 = abs(together_RIestThetas - input_thetas),
         q3_removed = abs(together_q3_rem - input_thetas),
         q3_split = abs(separate_catR - input_thetas)) %>% 
  select(!names(c))
  
c_diff %>% 
  pivot_longer(everything()) %>% 
  ggplot(aes(x = value)) +
  geom_histogram(bins = 100) +
  facet_wrap(~ name, ncol = 1) +
  labs(x = "Absolute difference in logits",
       title = "Comparing input thetas to estimated",
       subtitle = "Distribution of bias")

Code
c_diff_descr <- descr(c_diff) %>% 
  as.data.frame() %>% 
  rownames_to_column("Parameter") 

c_diff_descr[1:9,] %>% 
  pivot_longer(!Parameter) %>% 
  
  ggplot(aes(x = Parameter, y = value, color = name)) +
  geom_point(alpha = 0.85) +
  scale_color_viridis_d('Item set', end = 0.8) +
  labs(y = "Logits",
       x = "Descriptive metric")

Code
c_diff_descr[1:9,] %>% 
  mutate_if(is.numeric, round, 3) %>% 
  kbl_rise(tbl_width = 50)
Parameter q3_removed q3_split with_q3
Mean 0.378 0.354 0.358
Std.Dev 0.288 0.275 0.278
Min 0.000 0.001 0.000
Q1 0.159 0.125 0.136
Median 0.316 0.292 0.302
Q3 0.546 0.525 0.518
Max 1.729 1.732 1.770
MAD 0.283 0.272 0.278
IQR 0.385 0.400 0.381

No real differences in these summary metrics.

We should look more closely at the particular region where the DIF item is located, since it should have the most impact there.

3.2 Across the latent continuum

First, the test information function (TIF) curve could be of interest to understand what to expect in terms of estimation bias due to reliability limitations. Even more interesting is the table showing range of SEM.

  • TIF original data
  • SEM original data
  • TIF without q3
  • SEM without q3
Code
RItif(d, samplePSI = TRUE)

Code
sem_orig <- RIscoreSE(d, output = "dataframe") %>% 
  clean_names() %>% 
  round(3)

sem_orig_min <- min(sem_orig$logit_std_error)

sem_orig %>% 
  ggplot(aes(x = logit_score, y = logit_std_error)) +
  geom_point(size = 2.5) +
  geom_line() +
  geom_text(data = . %>% 
              filter(logit_std_error == sem_orig_min),
            aes(label = sem_orig_min), 
            position = position_nudge(y = -0.03)) +
  scale_x_continuous(breaks = seq(-5,5,0.5)) +
  scale_y_continuous(limits = c(0,NA), breaks = seq(0,1,0.1))

Code
RItif(d %>% select(!q3), samplePSI = TRUE)

Code
sem_noq3 <- RIscoreSE(d %>% select(!q3), output = "dataframe") %>% 
  clean_names() %>% 
  round(3)

sem_noq3_min <- min(sem_noq3$logit_std_error)

sem_noq3 %>% 
  ggplot(aes(x = logit_score, y = logit_std_error)) +
  geom_point(size = 2.5) +
  geom_line() +
  geom_text(data = . %>% 
              filter(logit_std_error == sem_noq3_min) %>% 
              slice(1),
            aes(label = sem_noq3_min), 
            position = position_nudge(y = -0.03)) +
  scale_x_continuous(breaks = seq(-5,5,0.5)) +
  scale_y_continuous(limits = c(0,NA), breaks = seq(0,1,0.1))

The lowest SEM with q3 is 0.384, at logit score 0.356.

Lowest SEM without q3 is 0.416, at logit score 0.322 to 0.491, which makes a difference in minimal SEM of about 0.032 compared to including the DIF item. 0.032 * 1.96 = 0.063 for a 95% CI.

3.3 Theta estimation bias

  • Loess smoothing
  • Scatterplot
  • Loess “zoomed in”
Code
c_diff %>% 
  add_column(Theta = input_thetas) %>% 
  pivot_longer(!Theta) %>% 
  
  ggplot(aes(x = Theta, y = value, color = factor(name), fill = factor(name))) +
  #geom_point(alpha = 0.85) +
  geom_smooth(method = "loess",
              aes(linetype = factor(name)), alpha = 0.15) +
  scale_color_ghibli_d("MononokeMedium", direction = -1) +
  scale_fill_ghibli_d("MononokeMedium", direction = -1) +
  labs(color = "Item set",
       title = "Theta estimation bias",
       y = "Absolute difference (logits)") +
  guides(fill = "none", linetype = "none") +
  scale_x_continuous(breaks = seq(-5,5,1)) +
  scale_y_continuous(limits = c(0,NA), breaks = seq(0,1.2,0.1)) +
  theme_rise()
Figure 1
Code
c_diff %>% 
  add_column(Theta = input_thetas) %>% 
  pivot_longer(!Theta) %>% 
  
  ggplot(aes(x = Theta, y = value, color = factor(name), fill = factor(name))) +
  geom_point(alpha = 0.85) +
  scale_color_ghibli_d("MononokeMedium", direction = -1) +
  scale_fill_ghibli_d("MononokeMedium", direction = -1) +
  labs(color = "Item set",
       title = "Theta estimation bias",
       y = "Absolute difference in logits") +
  guides(fill = "none", linetype = "none") +
  scale_x_continuous(breaks = seq(-5,5,1)) +
  scale_y_continuous(limits = c(0,NA), breaks = seq(0,2,0.1)) +
  theme_rise()

Code
c_diff %>% 
  add_column(Theta = input_thetas) %>% 
  pivot_longer(!Theta) %>% 
  
  ggplot(aes(x = Theta, y = value, color = factor(name), fill = factor(name))) +
  #geom_point(alpha = 0.85) +
  geom_smooth(method = "loess",
              aes(linetype = factor(name)), alpha = 0.15) +
  scale_color_ghibli_d("MononokeMedium", direction = -1) +
  scale_fill_ghibli_d("MononokeMedium", direction = -1) +
  labs(color = "Item set",
       title = "Theta estimation bias",
       subtitle = "Cropped to improve readability",
       y = "Absolute difference (logits)") +
  guides(fill = "none", linetype = "none") +
  scale_x_continuous(limits = c(-2,2), breaks = seq(-5,5,1)) +
  scale_y_continuous(breaks = seq(0,1.2,0.05)) +
  theme_rise()

Item split slightly reduces bias compared to keeping the DIF item (3), while removing item 3 increases bias in the theta range of about -1 to +1.5 logits. The maximum bias looks like approximately 0.055 logits (at theta = 0), according to the loess smoothed line.

3.4 Statistical analysis of means (limited range)

While the practical impact of theta estimation bias induced by a DIF variable should be judged by how problematic the maximum bias is for the intended use and need for precision, it could be interesting to quantify the differences using statistical analysis. It is sometimes suggested to look at the mean of the groups, but I think this is mistaken. The bias is local and related to the DIF item’s location, which makes it relevant to look at that region separately.

As such, the comparison below is about mean differences in estimated thetas to input thetas, limited to the theta range from -0.5 to +1 logits, where differences seem the biggest according to Figure 1. Note that the figure shows loess smoothed lines that include both groups.

We need to do this separately for the two DIF groups, since the groups will have opposite effects of the DIF.

Code
c %>% 
  add_column(group = dif.group) %>% 
  filter(group == 1) %>% 
  select(c(separate_catR,input_thetas)) %>%
  filter(input_thetas > -0.5 & input_thetas < 1) %>% 
  nrow()
[1] 130
Code
c %>% 
  add_column(group = dif.group) %>% 
  filter(group == 2) %>% 
  select(c(separate_catR,input_thetas)) %>%
  filter(input_thetas > -0.5 & input_thetas < 1) %>% 
  nrow()
[1] 115

n = 130 and 115. More than a third of each group is located within the range affected by DIF.

Code
lm_all <- c %>% 
  add_column(group = dif.group) %>% 
  filter(group == 1) %>% 
  select(c(together_RIestThetas,input_thetas)) %>%
  filter(input_thetas > -0.5 & input_thetas < 1) %>% 
  pivot_longer(everything()) %>% 
  lm(value ~ name, data = .)

lm_q3rem <- c %>% 
  add_column(group = dif.group) %>% 
  filter(group == 1) %>% 
  select(c(together_q3_rem,input_thetas)) %>%
  filter(input_thetas > -0.5 & input_thetas < 1) %>% 
  pivot_longer(everything()) %>% 
  lm(value ~ name, data = .)

lm_q3split <- c %>% 
  add_column(group = dif.group) %>% 
  filter(group == 1) %>% 
  select(c(separate_catR,input_thetas)) %>%
  filter(input_thetas > -0.5 & input_thetas < 1) %>% 
  pivot_longer(everything()) %>% 
  lm(value ~ name, data = .)

g1 <- bind_rows(tidy(lm_all, conf.int = TRUE, conf.level = .84) %>% slice(2),
          tidy(lm_q3rem, conf.int = TRUE, conf.level = .84) %>% slice(2),
          tidy(lm_q3split, conf.int = TRUE, conf.level = .84) %>% slice(2)
          ) %>% 
  select(!term) %>% 
  round(3) %>% 
  add_column(Model = c("All items","Q3 removed","Q3 split"), .before = "estimate")
Code
lm_all2 <- c %>% 
  add_column(group = dif.group) %>% 
  filter(group == 2) %>% 
  select(c(together_RIestThetas,input_thetas)) %>%
  filter(input_thetas > -0.5 & input_thetas < 1) %>% 
  pivot_longer(everything()) %>% 
  lm(value ~ name, data = .)

lm_q3rem2 <- c %>% 
  add_column(group = dif.group) %>% 
  filter(group == 2) %>% 
  select(c(together_q3_rem,input_thetas)) %>%
  filter(input_thetas > -0.5 & input_thetas < 1) %>% 
  pivot_longer(everything()) %>% 
  lm(value ~ name, data = .)

lm_q3split2 <- c %>% 
  add_column(group = dif.group) %>% 
  filter(group == 2) %>% 
  select(c(separate_catR,input_thetas)) %>%
  filter(input_thetas > -0.5 & input_thetas < 1) %>% 
  pivot_longer(everything()) %>% 
  lm(value ~ name, data = .)

g2 <- bind_rows(tidy(lm_all2, conf.int = TRUE, conf.level = .84) %>% slice(2),
          tidy(lm_q3rem2, conf.int = TRUE, conf.level = .84) %>% slice(2),
          tidy(lm_q3split2, conf.int = TRUE, conf.level = .84) %>% slice(2)
          ) %>% 
  select(!term) %>% 
  round(3) %>% 
  add_column(Model = c("All items","Q3 removed","Q3 split"), .before = "estimate")

3.4.1 Results figure

Code
rbind(g1,g2) %>% 
  add_column(Group = rep(c("Group 1","Group 2"), each = 3)) %>% 
  
  ggplot(aes(x = Group, y = estimate, color = Model)) +
  geom_point(position = position_dodge(width = 0.2)) +
  geom_errorbar(aes(ymin = conf.low, ymax = conf.high),
                width = 0.1,
                position = position_dodge(width = 0.2)) +
  scale_color_ghibli_d("MononokeMedium", direction = -1) +
  labs(color = "Item set",
       title = "Mean bias in theta estimation",
       subtitle = "Across a limited theta region (-0.5 to 1.0 logits)",
       y = "Model estimate", 
       x = "")

Since we are primarily interested in comparing the different item sets to each other, it is not the difference from input thetas (estimate = 0) that is most relevant here. As such, I chose to display 84% confidence intervals to be able to assess differences between item sets in each group by looking at whether the CI’s overlap or not.

3.4.2 Theta bias - loess by group

Code
c_diff %>% 
  add_column(Theta = input_thetas) %>% 
  mutate(Group = case_match(dif.group, 
                            "1" ~ "Group 1",
                            "2" ~ "Group 2")) %>% 
  pivot_longer(!(c("Theta","Group"))) %>% 
  
  ggplot(aes(x = Theta, y = value, color = factor(name), fill = factor(name))) +
  #geom_point(alpha = 0.85) +
  geom_smooth(method = "loess",
              aes(linetype = factor(name)), alpha = 0.15) +
  scale_color_ghibli_d("MononokeMedium", direction = -1) +
  scale_fill_ghibli_d("MononokeMedium", direction = -1) +
  labs(color = "Item set",
       title = "Theta estimation bias",
       subtitle = "Cropped to improve readability",
       y = "Absolute difference (logits)") +
  guides(fill = "none", linetype = "none") +
  scale_x_continuous(limits = c(-2,2), breaks = seq(-5,5,1)) +
  scale_y_continuous(breaks = seq(0,1.2,0.05)) +
  facet_wrap(~Group, ncol = 1,) +
  theme_rise()

4 Session info

Code
sessionInfo()
R version 4.4.1 (2024-06-14)
Platform: aarch64-apple-darwin20
Running under: macOS Sonoma 14.6.1

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/Stockholm
tzcode source: internal

attached base packages:
[1] grid      stats4    stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] broom_1.0.7          ghibli_0.3.4         summarytools_1.0.1  
 [4] RISEkbmRasch_0.2.4.6 janitor_2.2.0        hexbin_1.28.4       
 [7] glue_1.7.0           ggrepel_0.9.6        patchwork_1.3.0     
[10] reshape_0.8.9        matrixStats_1.4.1    psychotree_0.16-1   
[13] psychotools_0.7-4    partykit_1.2-22      mvtnorm_1.3-1       
[16] libcoin_1.0-10       psych_2.4.6.26       kableExtra_1.4.0    
[19] formattable_0.2.1    catR_3.17            mirt_1.42           
[22] lattice_0.22-6       iarm_0.4.3           eRm_1.0-6           
[25] lubridate_1.9.3      forcats_1.0.0        stringr_1.5.1       
[28] dplyr_1.1.4          purrr_1.0.2          readr_2.1.5         
[31] tidyr_1.3.1          tibble_3.2.1         ggplot2_3.5.1       
[34] tidyverse_2.0.0     

loaded via a namespace (and not attached):
  [1] vcd_1.4-12           rstudioapi_0.16.0    audio_0.1-11        
  [4] jsonlite_1.8.9       magrittr_2.0.3       magick_2.8.5        
  [7] farver_2.1.2         rmarkdown_2.28       vctrs_0.6.5         
 [10] base64enc_0.1-3      htmltools_0.5.8.1    curl_5.2.3          
 [13] cellranger_1.1.0     Formula_1.2-5        dcurver_0.9.2       
 [16] parallelly_1.38.0    htmlwidgets_1.6.4    plyr_1.8.9          
 [19] testthat_3.2.1.1     zoo_1.8-12           lifecycle_1.0.4     
 [22] pkgconfig_2.0.3      Matrix_1.7-0         R6_2.5.1            
 [25] fastmap_1.2.0        snakecase_0.11.1     future_1.34.0       
 [28] digest_0.6.37        colorspace_2.1-1     rprojroot_2.0.4     
 [31] prismatic_1.1.2      Hmisc_5.1-3          vegan_2.6-8         
 [34] labeling_0.4.3       progressr_0.14.0     fansi_1.0.6         
 [37] timechange_0.3.0     abind_1.4-5          mgcv_1.9-1          
 [40] compiler_4.4.1       here_1.0.1           vcdExtra_0.8-5      
 [43] pander_0.6.5         withr_3.0.1          htmlTable_2.4.3     
 [46] backports_1.5.0      carData_3.0-5        relimp_1.0-5        
 [49] R.utils_2.12.3       MASS_7.3-61          sessioninfo_1.2.2   
 [52] GPArotation_2024.3-1 permute_0.9-7        tools_4.4.1         
 [55] foreign_0.8-87       lmtest_0.9-40        future.apply_1.11.2 
 [58] nnet_7.3-19          R.oo_1.26.0          nlme_3.1-165        
 [61] inum_1.0-5           checkmate_2.3.2      reshape2_1.4.4      
 [64] cluster_2.1.6        generics_0.1.3       snow_0.4-4          
 [67] gtable_0.3.5         RPushbullet_0.3.4    tzdb_0.4.0          
 [70] R.methodsS3_1.8.2    ca_0.71.1            data.table_1.16.0   
 [73] hms_1.1.3            car_3.1-2            xml2_1.3.6          
 [76] Deriv_4.1.3          utf8_1.2.4           pillar_1.9.0        
 [79] splines_4.4.1        pryr_0.1.6           survival_3.7-0      
 [82] tidyselect_1.2.1     pbapply_1.7-2        knitr_1.48          
 [85] gridExtra_2.3        svglite_2.1.3        xfun_0.46           
 [88] gnm_1.1-5            rapportools_1.1      brio_1.1.5          
 [91] stringi_1.8.4        yaml_2.3.10          evaluate_0.24.0     
 [94] codetools_0.2-20     beepr_2.0            tcltk_4.4.1         
 [97] cli_3.6.3            rpart_4.1.23         qvcalc_1.0.3        
[100] systemfonts_1.1.0    munsell_0.5.1        Rcpp_1.0.13         
[103] readxl_1.4.3         globals_0.16.3       parallel_4.4.1      
[106] listenv_0.9.1        viridisLite_0.4.2    SimDesign_2.17.1    
[109] scales_1.3.0         rlang_1.1.4          mnormt_2.1.1        

Reuse

CC BY 4.0

Citation

BibTeX citation:
@online{johansson2024,
  author = {Johansson, Magnus},
  title = {Rasch {DIF} Magnitude \& Item Split},
  date = {2024-10-27},
  url = {https://pgmj.github.io/dif_magnitude.html},
  langid = {en}
}
For attribution, please cite this work as:
Johansson, M. (2024, October 27). Rasch DIF magnitude & item split. https://pgmj.github.io/dif_magnitude.html
Source Code
---
title: "Rasch DIF magnitude & item split"
subtitle: "Brief exploration of Differential Item Functioning in R"
author: 
  name: Magnus Johansson
  affiliation: RISE Research Institutes of Sweden
  affiliation-url: https://www.ri.se/en/shic
  orcid: 0000-0003-1669-592X
date: 2024-10-27
citation:
  type: 'webpage'
csl: apa.csl
execute: 
  cache: true
  warning: false
  message: false
editor: 
  markdown: 
    wrap: 72
editor_options: 
  chunk_output_type: console
---

## Background

Out of interest of better understanding som aspects of differential item functioning (DIF, a form of invariance test):

- how much does DIF affect estimated thetas (factor scores)?
- how to do item-split in R (creating separate items for subgroups from one item with problematic DIF)
- how an item-split compares to removing the item (and keeping the DIF item) in terms of absolute differences in estimated thetas

We'll use simulated data in order to have knowledge of the thetas used to generate response data ("input thetas" in the text below), and make objective comparisons using the different estimated thetas.

Ideally, this would be a simulation study where we create lots of datasets with a systematic variation in some parameters to investigate effects. But maybe this is a first step towards that.


```{r}
#| code-fold: true
library(tidyverse)
library(eRm)
library(iarm)
library(mirt)
library(catR)
library(easyRasch) # devtools::install_github("pgmj/easyRasch", dependencies = TRUE)
library(summarytools)
library(ghibli)
library(broom)

### some commands exist in multiple packages, here we define preferred ones that are frequently used
select <- dplyr::select
count <- dplyr::count
recode <- car::recode
rename <- dplyr::rename

theme_rise <- function(fontfamily = "Lato", axissize = 13, titlesize = 15,
                       margins = 12, axisface = "plain", panelDist = 0.6, ...) {
  theme_minimal() +
  theme(
    text = element_text(family = fontfamily),
    axis.title.x = element_text(
      margin = margin(t = margins),
      size = axissize
    ),
    axis.title.y = element_text(
      margin = margin(r = margins),
      size = axissize
    ),
    plot.title = element_text(
      face = "bold",
      size = titlesize
    ),
    axis.title = element_text(
      face = axisface
    ),
    plot.caption = element_text(
      face = "italic"
    ),
    legend.text = element_text(family = fontfamily),
    legend.background = element_rect(color = "lightgrey"),
    strip.background = element_rect(color = "lightgrey"),
    panel.spacing = unit(panelDist, "cm", data = NULL),
    panel.border = element_rect(color = "grey", fill = NA),
    ...
  )
}

theme_set(theme_rise())
```

### Simulate response data

First define input parameters for items.

```{r}
# make a tibble/dataframe also, for possible later use
inputParams1 <- tibble(
  q1  = c(1.2, 1.8, 2.4),
  q2  = c(-1.3, -0.5, 0.5),
  q3a = c(-0.3, 0.3, 1.2), # this is the DIF item
  q3b = c(-0.3+1, 0.3+1, 1.2+1), # this is the DIF item
  q4  = c(0.1, 0.6, 1.6),
  q5  = c(-0.3, 0.7, 1.5),
  q6  = c(-1.6, -1, -0.3),
  q7  = c(1, 1.8, 2.5),
  q8  = c(-1.3, -0.7, 0.4),
  q9  = c(-0.8, 1.4, 1.9),
  q10 = c(0.25, 1.25, 2.15)
) %>%
  t() %>%
  as.matrix()

# center to 0
inputParams1c <- inputParams1 - mean(inputParams1)

# item list for simulation for group 1
tlist1 <- list(
  q1 =  list(inputParams1c[1,]),
  q2 =  list(inputParams1c[2,]),
  q3 =  list(inputParams1c[3,]), # this is the DIF item
  q4 =  list(inputParams1c[5,]),
  q5 =  list(inputParams1c[6,]),
  q6 =  list(inputParams1c[7,]),
  q7 =  list(inputParams1c[8,]),
  q8 =  list(inputParams1c[9,]),
  q9 =  list(inputParams1c[10,]),
  q10 = list(inputParams1c[11,])
)

# item list for simulation for group 2
tlist2 <- list(
  q1 =  list(inputParams1c[1,]),
  q2 =  list(inputParams1c[2,]),
  q3 =  list(inputParams1c[4,]), # this is the DIF item
  q4 =  list(inputParams1c[5,]),
  q5 =  list(inputParams1c[6,]),
  q6 =  list(inputParams1c[7,]),
  q7 =  list(inputParams1c[8,]),
  q8 =  list(inputParams1c[9,]),
  q9 =  list(inputParams1c[10,]),
  q10 = list(inputParams1c[11,])
)
```

Then generate random thetas that we save to file to be able to reproduce the analysis.

```{r}
#| eval: false
# simulate thetas
thetas1 <- rnorm(300, mean = 0, sd = 1.5)
thetas2 <- rnorm(300, mean = 0, sd = 1.5)

input_thetas <- c(thetas1,thetas2)

# simulate response data based on the above defined item thresholds
td1 <- SimPartialScore(
  deltaslist = tlist1,
  thetavec = thetas1
) %>%
  as.data.frame()

td2 <- SimPartialScore(
  deltaslist = tlist2,
  thetavec = thetas2
) %>%
  as.data.frame()

d <- rbind(td1,td2) %>% 
  add_column(group = rep(1:2, each = 300))

dif.group <- factor(d$group)
d$group <- NULL

all_data <- list(simResponses = d,
                 dif_group = dif.group,
                 input_thetas = input_thetas)
# save simulated data for reproducibility
#saveRDS(all_data,"dif_magnitude_1_0.Rdata")
```

```{r}
# read simulated data for reprodubility
all_data <- readRDS("dif_magnitude_1_0.Rdata")

d <- all_data$simResponses
dif.group <- all_data$dif_group
input_thetas <- all_data$input_thetas

```

We now have 10 items with 4 categories each. There are 600 respondents in all, with 300 showing differential item functioning for one item (item q3). DIF is induced at +1 logit uniform difference in location (all thresholds for item q3 are unformly +1 logits).

## DIF assessment

Let's test for DIF with some different methods.

### LR-test
```{r}
RIdifTableLR(d, dif.group)
RIdifThreshFigLR(d, dif.group)
```

### Partial gamma
```{r}
RIpartgamDIF(d, dif.group)
```

### Psychotree
```{r}
RIdifTable(d, dif.group)
```

DIF clearly shown, however it is closer to 0.8 logits than the 1.0 used in the input values. This is still generally considered to be a large DIF size, so it should serve our purpose.

### Item split

Now, we'll do an item split and compare thetas for both groups with and without split, and also with the DIF item removed.

``` {r}
thetas_together <- RIestThetas(d)

thetas_q3_removed <- RIestThetas(d %>% select(!q3))

d2 <- d %>% 
  add_column(group = dif.group) %>% 
  mutate(q3a = if_else(group == 1, q3, NA), 
         q3b = if_else(group == 2, q3, NA)
         ) %>% 
  select(!group) %>%
  select(!q3)
```

Let's look at the data and a targeting plot.

``` {r}
RItileplot(d2)
RImissing(d2)
RItargeting(d2)
```

Item threshold locations for the q3 split items range from -1 logits to +1.5 logits along the theta/latent continuum. This is roughly the range where we expect some impact from DIF.

Comparing targeting to non-split data.

```{r}
RItargeting(d)
itemlabels <- data.frame(itemnr = names(d), item = "")
RIitemHierarchy(d)
```

### Estimating thetas

First, we'll use the convenient function `RIestThetas()` that estimates item parameters (using `eRm`) and thetas (using `iarm`), to see how that works when we have a split item, each with missing data for 50% of respondents.

```{r}
#| layout-ncol: 2
thetas_separate <- RIestThetas(d2)

hist(thetas_separate$WLE, breaks = 30, col = "lightblue")
hist(thetas_together$WLE, breaks = 30, col = "lightpink")
hist(thetas_q3_removed$WLE, breaks = 30, , col = "sienna4")
```

The upper range is rather different for the item split subgroup when using this method, with max score of about 2.2, compared to 4.1 for the item set with the original q3 item and entirely without q3. 

Let's review the estimated threshold locations.

### Comparing item parameters

::: panel-tabset
#### Original data
```{r}
RIitemparams(d)
```
#### Item split
```{r}
RIitemparams(d2)
```
#### DIF item removed
```{r}
RIitemparams(d %>% select(!q3))
```
#### Plot estimation bias
```{r}
#| code-fold: true
dp <- RIitemparams(d, output = "dataframe") %>% 
  select(!Location) %>% 
  set_names(paste0("orig_",1:3)) %>% 
  rownames_to_column("item") %>% 
  filter(!str_detect(item,"q3"))
dp2 <- RIitemparams(d2, output = "dataframe") %>% 
  select(!Location) %>% 
  set_names(paste0("split_",1:3)) %>% 
  rownames_to_column("item") %>% 
  filter(!str_detect(item,"q3"))
dp3 <- RIitemparams(d %>% select(!q3), output = "dataframe") %>% 
  select(!Location) %>% 
  set_names(paste0("q3rem_",1:3))  %>% 
  rownames_to_column("item") %>% 
  filter(!str_detect(item,"q3"))

d_params <- cbind(dp,dp2[,-1],dp3[,-1]) # bind columns, keeping one "item" column

dpin <- inputParams1c %>% 
  as.data.frame() %>% 
  set_names(paste0("input_",1:3)) %>% 
  rownames_to_column("item") %>% 
  filter(!str_detect(item,"q3"))

dpin_long <- dpin %>% 
  pivot_longer(!item,
               names_sep = "_",
               names_to = c("source","threshold"),
               values_to = "input")
```


```{r}
#| code-fold: true
d_params %>% 
  pivot_longer(!item, 
               names_sep = "_",
               names_to = c("source","threshold")) %>% 
  left_join(dpin_long[,-2], by = c("item","threshold")) %>% 
  group_by(source,item,threshold) %>% 
  summarise(abs_diff = abs(input - value)) %>% 
  
  ggplot(aes(x = threshold, y = abs_diff, color = source)) +
  geom_point(size = 2, alpha = 0.85) +
  geom_line(aes(group = source)) +
  facet_wrap(~ item) +
  labs(x = "Absolute bias (logits)")
  
```
#### Summary plot estimation bias
```{r}
#| code-fold: true
itemlocs <- RIitemparams(d %>% select(!q3), output = "dataframe") %>% 
  pull(Location)

item_order <- sort(as.numeric(itemlocs))

d_params %>% 
  pivot_longer(!item, 
               names_sep = "_",
               names_to = c("source","threshold")) %>% 
  left_join(dpin_long[,-2], by = c("item","threshold")) %>% 
  group_by(source,item,threshold) %>% 
  summarise(abs_diff = abs(input - value)) %>% 
  ungroup() %>% 
  group_by(source,item) %>% 
  summarise(sum_diff = sum(abs_diff)) %>% 
  ungroup() %>% 
  pivot_wider(values_from = "sum_diff",
              names_from = "source",
              id_cols = "item") %>% 
  pivot_longer(!item) %>% 
  mutate(item = factor(item, levels = names(itemlocs), 
                        labels = paste0(names(itemlocs),"_",as.numeric(itemlocs)))) %>% 
  ggplot(aes(x = item, 
             y = value, 
             color = name)
         ) +
  geom_point(size = 3) +
  labs(title = "Total estimation bias for each item (all thresholds) and item set",
       y = "Absolute bias (logits)",
       x = "Item (and average location)") +
  scale_y_continuous(limits = c(0,NA))
```
:::


#### `mirt` comparison

Perhaps the `mirt` package could be more accurate in item threshold estimation when we have item split?

```{r}
#| code-fold: true
mirt_out <- mirt(data = d, model = 1, itemtype = "Rasch", verbose = FALSE)
mirt_params <- coef(mirt_out, simplify = TRUE, IRTpars = TRUE)$items %>%
  as.data.frame() %>%
  select(!a) %>%
  set_names(paste0("orig_",1:3)) %>%   
  as.matrix()
dpm <- mirt_params - mean(mirt_params)

dpm <- dpm %>% 
  as.data.frame() %>% 
  rownames_to_column("item") %>% 
  filter(!str_detect(item,"q3"))

mirt_out <- mirt(data = d2, model = 1, itemtype = "Rasch", verbose = FALSE)
mirt_params <- coef(mirt_out, simplify = TRUE, IRTpars = TRUE)$items %>%
  as.data.frame() %>%
  select(!a) %>%
  set_names(paste0("split_",1:3)) %>% 
  as.matrix()
dpm2 <- mirt_params - mean(mirt_params)

dpm2 <- dpm2 %>% 
  as.data.frame() %>% 
  rownames_to_column("item") %>% 
  filter(!str_detect(item,"q3"))

mirt_out <- mirt(data = d %>% select(!q3), model = 1, itemtype = "Rasch", verbose = FALSE)
mirt_params <- coef(mirt_out, simplify = TRUE, IRTpars = TRUE)$items %>%
  as.data.frame() %>%
  select(!a) %>%
  set_names(paste0("q3rem_",1:3)) %>% 
  as.matrix()
dpm3 <- mirt_params - mean(mirt_params)

dpm3 <- dpm3 %>% 
  as.data.frame() %>% 
  rownames_to_column("item") %>% 
  filter(!str_detect(item,"q3"))


d_params2 <- cbind(dpm,dpm2[,-1],dpm3[,-1]) # bind columns, keeping one "item" column
```
::: panel-tabset
#### Plot estimation bias
```{r}
d_params2 %>% 
  pivot_longer(!item, 
               names_sep = "_",
               names_to = c("source","threshold")) %>% 
  left_join(dpin_long[,-2], by = c("item","threshold")) %>% 
  group_by(source,item,threshold) %>% 
  summarise(abs_diff = abs(input - value)) %>% 
  
  ggplot(aes(x = threshold, y = abs_diff, color = source)) +
  geom_point(alpha = 0.85) +
  facet_wrap(~ item)
  
```

Seems indentical to eRm estimates.

#### Summary plot estimation bias
```{r}
d_params2 %>% 
  pivot_longer(!item, 
               names_sep = "_",
               names_to = c("source","threshold")) %>% 
  left_join(dpin_long[,-2], by = c("item","threshold")) %>% 
  group_by(source,item,threshold) %>% 
  summarise(abs_diff = abs(input - value)) %>% 
  ungroup() %>% 
  group_by(source,item) %>% 
  summarise(sum_diff = sum(abs_diff)) %>% 
  ungroup() %>% 
  pivot_wider(values_from = "sum_diff",
              names_from = "source",
              id_cols = "item") %>% 
  pivot_longer(!item) %>% 
  mutate(item = factor(item, levels = names(itemlocs), 
                        labels = paste0(names(itemlocs),"_",as.numeric(itemlocs)))) %>% 
  ggplot(aes(x = item, 
             y = value, 
             color = name)
         ) +
  geom_point(size = 3) +
  labs(title = "mirt: Total estimation bias for each item and item set")
```

No difference here either.
:::

### Theta estimation investigated

The method for estimating item parameters and thetas used in the function `RIestThetas()` may be at fault for the odd results in thetas estimated by the item set with item split? We can separate the two steps, and use a separate function for theta estimation with manual input of item parameters.

```{r}
itemps <- RIitemparams(d2, output = "dataframe") %>% 
  select(!Location) %>% 
  as.matrix()

thetas_separate_catR <- RIestThetasOLD(d2, itemParams = itemps, theta_range = c(-8,8))

c <- tibble(together_RIestThetas = thetas_together$WLE,
           together_q3_rem = thetas_q3_removed$WLE,
           separate_catR = thetas_separate_catR,
           separate_RIestThetas = thetas_separate$WLE,
           input_thetas = input_thetas)

c %>% 
  pivot_longer(everything(),
               names_to = "method",
               values_to = "theta") %>% 
  ggplot(aes(x = theta)) +
  geom_histogram(bins = 50) +
  facet_wrap(~method, axes = "all_x") +
  scale_x_continuous(breaks = seq(-5,5,1))
```

Looks like the two step approach worked a lot better. Since the item parameter estimation is identical (both are using `eRm::PCM()`), the reason should be the difference in theta estimation. The two-step approach uses `catR::thetaEst()` for theta estimation, which is probably handling missing data better than `iarm::person_estimates()`. Note: both approaches use the Weighted Likelihood Estimation to minimize bias (Warm, 1989).

## Results 

### Summarised

First, absolute differences in estimated thetas compared to input thetas. By using absolute differences we can assess both DIF groups simultaneously.

```{r}
c_diff <- c %>% 
  mutate(with_q3 = abs(together_RIestThetas - input_thetas),
         q3_removed = abs(together_q3_rem - input_thetas),
         q3_split = abs(separate_catR - input_thetas)) %>% 
  select(!names(c))
  
c_diff %>% 
  pivot_longer(everything()) %>% 
  ggplot(aes(x = value)) +
  geom_histogram(bins = 100) +
  facet_wrap(~ name, ncol = 1) +
  labs(x = "Absolute difference in logits",
       title = "Comparing input thetas to estimated",
       subtitle = "Distribution of bias")

```

```{r}
c_diff_descr <- descr(c_diff) %>% 
  as.data.frame() %>% 
  rownames_to_column("Parameter") 

c_diff_descr[1:9,] %>% 
  pivot_longer(!Parameter) %>% 
  
  ggplot(aes(x = Parameter, y = value, color = name)) +
  geom_point(alpha = 0.85) +
  scale_color_viridis_d('Item set', end = 0.8) +
  labs(y = "Logits",
       x = "Descriptive metric")
```

```{r}
c_diff_descr[1:9,] %>% 
  mutate_if(is.numeric, round, 3) %>% 
  kbl_rise(tbl_width = 50)
```

No real differences in these summary metrics.

We should look more closely at the particular region where the DIF item is located, since it should have the most impact there.

### Across the latent continuum

First, the test information function (TIF) curve could be of interest to understand what to expect in terms of estimation bias due to reliability limitations. Even more interesting is the table showing range of SEM.

::: panel-tabset
#### TIF original data

```{r}
#| fig-width: 9
RItif(d, samplePSI = TRUE)
```
#### SEM original data
```{r}
#| code-fold: true
sem_orig <- RIscoreSE(d, output = "dataframe") %>% 
  clean_names() %>% 
  round(3)

sem_orig_min <- min(sem_orig$logit_std_error)

sem_orig %>% 
  ggplot(aes(x = logit_score, y = logit_std_error)) +
  geom_point(size = 2.5) +
  geom_line() +
  geom_text(data = . %>% 
              filter(logit_std_error == sem_orig_min),
            aes(label = sem_orig_min), 
            position = position_nudge(y = -0.03)) +
  scale_x_continuous(breaks = seq(-5,5,0.5)) +
  scale_y_continuous(limits = c(0,NA), breaks = seq(0,1,0.1))
``` 

#### TIF without q3
```{r}
#| fig-width: 9
RItif(d %>% select(!q3), samplePSI = TRUE)
```
#### SEM without q3
```{r}
#| code-fold: true
sem_noq3 <- RIscoreSE(d %>% select(!q3), output = "dataframe") %>% 
  clean_names() %>% 
  round(3)

sem_noq3_min <- min(sem_noq3$logit_std_error)

sem_noq3 %>% 
  ggplot(aes(x = logit_score, y = logit_std_error)) +
  geom_point(size = 2.5) +
  geom_line() +
  geom_text(data = . %>% 
              filter(logit_std_error == sem_noq3_min) %>% 
              slice(1),
            aes(label = sem_noq3_min), 
            position = position_nudge(y = -0.03)) +
  scale_x_continuous(breaks = seq(-5,5,0.5)) +
  scale_y_continuous(limits = c(0,NA), breaks = seq(0,1,0.1))
```
:::

The lowest SEM with q3 is 0.384, at logit score 0.356. 

Lowest SEM without q3 is 0.416, at logit score 0.322 to 0.491, which makes a difference in minimal SEM of about 0.032 compared to including the DIF item. 0.032 * 1.96 = 0.063 for a 95% CI.

### Theta estimation bias

::: panel-tabset
#### Loess smoothing
```{r}
#| code-fold: true
#| label: fig-loess
c_diff %>% 
  add_column(Theta = input_thetas) %>% 
  pivot_longer(!Theta) %>% 
  
  ggplot(aes(x = Theta, y = value, color = factor(name), fill = factor(name))) +
  #geom_point(alpha = 0.85) +
  geom_smooth(method = "loess",
              aes(linetype = factor(name)), alpha = 0.15) +
  scale_color_ghibli_d("MononokeMedium", direction = -1) +
  scale_fill_ghibli_d("MononokeMedium", direction = -1) +
  labs(color = "Item set",
       title = "Theta estimation bias",
       y = "Absolute difference (logits)") +
  guides(fill = "none", linetype = "none") +
  scale_x_continuous(breaks = seq(-5,5,1)) +
  scale_y_continuous(limits = c(0,NA), breaks = seq(0,1.2,0.1)) +
  theme_rise()
```
#### Scatterplot
```{r}
#| code-fold: true
c_diff %>% 
  add_column(Theta = input_thetas) %>% 
  pivot_longer(!Theta) %>% 
  
  ggplot(aes(x = Theta, y = value, color = factor(name), fill = factor(name))) +
  geom_point(alpha = 0.85) +
  scale_color_ghibli_d("MononokeMedium", direction = -1) +
  scale_fill_ghibli_d("MononokeMedium", direction = -1) +
  labs(color = "Item set",
       title = "Theta estimation bias",
       y = "Absolute difference in logits") +
  guides(fill = "none", linetype = "none") +
  scale_x_continuous(breaks = seq(-5,5,1)) +
  scale_y_continuous(limits = c(0,NA), breaks = seq(0,2,0.1)) +
  theme_rise()
```
#### Loess "zoomed in"
```{r}
#| code-fold: true
c_diff %>% 
  add_column(Theta = input_thetas) %>% 
  pivot_longer(!Theta) %>% 
  
  ggplot(aes(x = Theta, y = value, color = factor(name), fill = factor(name))) +
  #geom_point(alpha = 0.85) +
  geom_smooth(method = "loess",
              aes(linetype = factor(name)), alpha = 0.15) +
  scale_color_ghibli_d("MononokeMedium", direction = -1) +
  scale_fill_ghibli_d("MononokeMedium", direction = -1) +
  labs(color = "Item set",
       title = "Theta estimation bias",
       subtitle = "Cropped to improve readability",
       y = "Absolute difference (logits)") +
  guides(fill = "none", linetype = "none") +
  scale_x_continuous(limits = c(-2,2), breaks = seq(-5,5,1)) +
  scale_y_continuous(breaks = seq(0,1.2,0.05)) +
  theme_rise()
```
:::

Item split slightly reduces bias compared to keeping the DIF item (3), while removing item 3 increases bias in the theta range of about -1 to +1.5 logits. The maximum bias looks like approximately 0.055 logits (at theta = 0), according to the loess smoothed line.

### Statistical analysis of means (limited range)

While the practical impact of theta estimation bias induced by a DIF variable should be judged by how problematic the maximum bias is for the intended use and need for precision, it could be interesting to quantify the differences using statistical analysis. It is sometimes suggested to look at the mean of the groups, but I think this is mistaken. The bias is local and related to the DIF item's location, which makes it relevant to look at that region separately. 

As such, the comparison below is about mean differences in estimated thetas to input thetas, limited to the theta range from -0.5 to +1 logits, where differences seem the biggest according to [@fig-loess]. Note that the figure shows loess smoothed lines that include both groups.

We need to do this separately for the two DIF groups, since the groups will have opposite effects of the DIF.

```{r}
#| code-fold: true
c %>% 
  add_column(group = dif.group) %>% 
  filter(group == 1) %>% 
  select(c(separate_catR,input_thetas)) %>%
  filter(input_thetas > -0.5 & input_thetas < 1) %>% 
  nrow()

c %>% 
  add_column(group = dif.group) %>% 
  filter(group == 2) %>% 
  select(c(separate_catR,input_thetas)) %>%
  filter(input_thetas > -0.5 & input_thetas < 1) %>% 
  nrow()
```

n = 130 and 115. More than a third of each group is located within the range affected by DIF.


```{r}
#| code-fold: true
lm_all <- c %>% 
  add_column(group = dif.group) %>% 
  filter(group == 1) %>% 
  select(c(together_RIestThetas,input_thetas)) %>%
  filter(input_thetas > -0.5 & input_thetas < 1) %>% 
  pivot_longer(everything()) %>% 
  lm(value ~ name, data = .)

lm_q3rem <- c %>% 
  add_column(group = dif.group) %>% 
  filter(group == 1) %>% 
  select(c(together_q3_rem,input_thetas)) %>%
  filter(input_thetas > -0.5 & input_thetas < 1) %>% 
  pivot_longer(everything()) %>% 
  lm(value ~ name, data = .)

lm_q3split <- c %>% 
  add_column(group = dif.group) %>% 
  filter(group == 1) %>% 
  select(c(separate_catR,input_thetas)) %>%
  filter(input_thetas > -0.5 & input_thetas < 1) %>% 
  pivot_longer(everything()) %>% 
  lm(value ~ name, data = .)

g1 <- bind_rows(tidy(lm_all, conf.int = TRUE, conf.level = .84) %>% slice(2),
          tidy(lm_q3rem, conf.int = TRUE, conf.level = .84) %>% slice(2),
          tidy(lm_q3split, conf.int = TRUE, conf.level = .84) %>% slice(2)
          ) %>% 
  select(!term) %>% 
  round(3) %>% 
  add_column(Model = c("All items","Q3 removed","Q3 split"), .before = "estimate")

```


```{r}
#| code-fold: true
lm_all2 <- c %>% 
  add_column(group = dif.group) %>% 
  filter(group == 2) %>% 
  select(c(together_RIestThetas,input_thetas)) %>%
  filter(input_thetas > -0.5 & input_thetas < 1) %>% 
  pivot_longer(everything()) %>% 
  lm(value ~ name, data = .)

lm_q3rem2 <- c %>% 
  add_column(group = dif.group) %>% 
  filter(group == 2) %>% 
  select(c(together_q3_rem,input_thetas)) %>%
  filter(input_thetas > -0.5 & input_thetas < 1) %>% 
  pivot_longer(everything()) %>% 
  lm(value ~ name, data = .)

lm_q3split2 <- c %>% 
  add_column(group = dif.group) %>% 
  filter(group == 2) %>% 
  select(c(separate_catR,input_thetas)) %>%
  filter(input_thetas > -0.5 & input_thetas < 1) %>% 
  pivot_longer(everything()) %>% 
  lm(value ~ name, data = .)

g2 <- bind_rows(tidy(lm_all2, conf.int = TRUE, conf.level = .84) %>% slice(2),
          tidy(lm_q3rem2, conf.int = TRUE, conf.level = .84) %>% slice(2),
          tidy(lm_q3split2, conf.int = TRUE, conf.level = .84) %>% slice(2)
          ) %>% 
  select(!term) %>% 
  round(3) %>% 
  add_column(Model = c("All items","Q3 removed","Q3 split"), .before = "estimate")

```

#### Results figure

```{r}
rbind(g1,g2) %>% 
  add_column(Group = rep(c("Group 1","Group 2"), each = 3)) %>% 
  
  ggplot(aes(x = Group, y = estimate, color = Model)) +
  geom_point(position = position_dodge(width = 0.2)) +
  geom_errorbar(aes(ymin = conf.low, ymax = conf.high),
                width = 0.1,
                position = position_dodge(width = 0.2)) +
  scale_color_ghibli_d("MononokeMedium", direction = -1) +
  labs(color = "Item set",
       title = "Mean bias in theta estimation",
       subtitle = "Across a limited theta region (-0.5 to 1.0 logits)",
       y = "Model estimate", 
       x = "")
```

Since we are primarily interested in comparing the different item sets to each other, it is not the difference from input thetas (estimate = 0) that is most relevant here. As such, I chose to display 84% confidence intervals to be able to assess differences between item sets in each group by looking at whether the CI's overlap or not.

#### Theta bias - loess by group

```{r}
#| code-fold: true
c_diff %>% 
  add_column(Theta = input_thetas) %>% 
  mutate(Group = case_match(dif.group, 
                            "1" ~ "Group 1",
                            "2" ~ "Group 2")) %>% 
  pivot_longer(!(c("Theta","Group"))) %>% 
  
  ggplot(aes(x = Theta, y = value, color = factor(name), fill = factor(name))) +
  #geom_point(alpha = 0.85) +
  geom_smooth(method = "loess",
              aes(linetype = factor(name)), alpha = 0.15) +
  scale_color_ghibli_d("MononokeMedium", direction = -1) +
  scale_fill_ghibli_d("MononokeMedium", direction = -1) +
  labs(color = "Item set",
       title = "Theta estimation bias",
       subtitle = "Cropped to improve readability",
       y = "Absolute difference (logits)") +
  guides(fill = "none", linetype = "none") +
  scale_x_continuous(limits = c(-2,2), breaks = seq(-5,5,1)) +
  scale_y_continuous(breaks = seq(0,1.2,0.05)) +
  facet_wrap(~Group, ncol = 1,) +
  theme_rise()
```


## Session info

```{r}
sessionInfo()
```