R, Rasch, etc
  • Home
  • easyRasch vignette
  • About
  • Blog
    • Automating reports with Quarto
    • What’s in a latent trait score?
    • Data retrieval with R using API calls
    • Power analysis for multilevel models
    • Data wrangling for psychometrics in R
    • Simulation based cutoff values for Rasch item fit and residual correlations
    • Comparing Rasch packages/estimators
    • Rasch DIF magnitude & item split
    • Conditional Likelihood Ratio Test and sample size

Note: The RISEkbmRasch R package is now known as easyRasch.

Table of contents

  • 1 Background
  • 2 Data import
  • 3 Estimation
  • 4 Results
    • 4.1 Summary table MAE
    • 4.2 Summary table RMSE
    • 4.3 Thresholds summary MAE
    • 4.4 Figure
  • 5 Smaller sample
    • 5.1 Results
  • 6 Even smaller sample
    • 6.1 Results
  • 7 References

Comparing Rasch packages/estimators

  • Show All Code
  • Hide All Code

  • View Source

Item threshold parameter estimation

Author
Affiliation

Magnus Johansson

RISE Research Institutes of Sweden

Published

2024-09-06

1 Background

I’m doing comparisons of the bias in the estimators for the Partial Credit Model (PCM) implemented in Rasch R packages and thought I would share a bit of code and simulated data. There will be more extensive simulations and comparisons coming along, varying targeting and sample distribution.

For this example we have a simple setup:

  • 250 datasets
  • each dataset has 9 items and 4 thresholds (5 response categories)
  • each dataset has 720 respondents (20 per threshold estimated, which should be plenty)

The .rds data file contains a list object with the 250 datasets, and each dataset is accompanied by a matrix of item threshold parameters and a vector of theta values used to generate the response data.

  • the vector of theta values is normally distributed with a mean and SD closely matching that of the item parameters

We have four packages in the comparison:

  • eRm - Conditional Maximum Likelihood (CML)
  • TAM - Marginal ML (MML)
  • pairwise - Pairwise CML (PCML)
  • mirt - fixed quadrature expectation-maximization algorithm

Other estimation methods are available in TAM and mirt, but we’ll just test these for now.

The datafile is available in the Github repo.

2 Data import

Code
library(tidyverse)
library(eRm)
library(janitor)
library(TAM)
library(mirt)
library(pairwise)
library(ggdist)
library(doParallel)
library(tinytable)

### some commands exist in multiple packages, here we define preferred ones that are frequently used
select <- dplyr::select
count <- dplyr::count
recode <- car::recode
rename <- dplyr::rename

# read 250 simulated datasets, stored in a list() object
sim_data_4t <- readRDS("sim_data_4t.rds")

3 Estimation

We’ll make use of library(doParallel) for multicore processing.

Code
registerDoParallel(cores = 8)
iterations = 250

sim_results_4t <- list()

sim_results_4t <- foreach(i = 1:iterations) %dopar% {

  input_params <- sim_data_4t[[i]]$input_params
  input_thetas <- sim_data_4t[[i]]$input_thetas
  testData <- sim_data_4t[[i]]$data

  # check mean/centrality (item parameters should have been centered before data generation)
  input_params_correction <- mean(input_params)

  # eRm
  erm_out <- PCM(testData)
  erm_params <- thresholds(erm_out)[[3]][[1]][,-1] %>%
    as.data.frame() %>%
    set_names(c("T1","T2","T3","T4")) %>%
    as.matrix()

  erm_params_c <- erm_params - mean(erm_params) + input_params_correction

  # TAM
  tam_out <- tam(as.matrix(testData), irtmodel = "PCM", verbose = FALSE)

  tam_params <- tam_out$item_irt %>%
    as.data.frame() %>%
    select(starts_with("tau")) %>%
    set_names(c("T1","T2","T3","T4")) %>%
    as.matrix()

  tam_params_c <- tam_params - mean(tam_params) + input_params_correction

  # mirt
  mirt_out <- mirt(data = testData, model = 1, itemtype = "Rasch", verbose = FALSE)
  mirt_params <- coef(mirt_out, simplify = TRUE, IRTpars = TRUE)$items %>%
    as.data.frame() %>%
    select(!a) %>%
    set_names(c("T1","T2","T3","T4")) %>%
    as.matrix()
  mirt_params_c <- mirt_params - mean(mirt_params) + input_params_correction

  # PAIR
  pair_out <- pair(testData)
  pair_params <- deltapar(pair_out) %>%
    as.data.frame()

  pair_params <- pair_params[,-1] %>%
    set_names(c("T1","T2","T3","T4")) %>%
    as.matrix()

  pair_params_c <- pair_params - mean(pair_params) + input_params_correction

  # combine item parameters and calculate lowest to highest threshold distances
  input_params %>%
    as.data.frame() %>%
    rename(T1 = V1,
           T2 = V2,
           T3 = V3,
           T4 = V4) %>%
    add_column(Item = c(1:9),
               Type = "Input") %>%
    bind_rows(
      erm_params_c %>%
        as.data.frame() %>%
        add_column(Item = c(1:9),
                   Type = "eRm")
    ) %>%
    bind_rows(
      tam_params_c %>%
        as.data.frame() %>%
        add_column(Item = c(1:9),
                   Type = "TAM")
    ) %>%
    bind_rows(
      mirt_params_c %>%
        as.data.frame() %>%
        add_column(Item = c(1:9),
                   Type = "mirt")
    ) %>%
    bind_rows(
      pair_params_c %>%
        as.data.frame() %>%
        add_column(Item = c(1:9),
                   Type = "PAIR")
    ) %>%
    remove_rownames()
}

4 Results

Code
# collect item threshold parameters to a single dataframe
results_params_4t <- map_df(1:iterations, ~ sim_results_4t[[.x]] %>%
                               mutate(iteration = .x))

4.1 Summary table MAE

Mean Absolute Error

Code
# calculate absolute differences between input and estimated item thresholds per estimator, average by item
results_params_diff_4t <- results_params_4t %>%
  pivot_wider(
    values_from = c("T1", "T2", "T3", "T4"),
    names_from = "Type",
    id_cols = c("iteration", "Item")
  ) %>%
  group_by(iteration, Item) %>%
  summarise(
    diff_eRm = sum(abs(T1_Input - T1_eRm) + abs(T2_Input - T2_eRm) + abs(T3_Input - T3_eRm) + abs(T4_Input - T4_eRm))/4,
    diff_TAM = sum(abs(T1_Input - T1_TAM) + abs(T2_Input - T2_TAM) + abs(T3_Input - T3_TAM) + abs(T4_Input - T4_TAM))/4,
    diff_mirt = sum(abs(T1_Input - T1_mirt) + abs(T2_Input - T2_mirt) + abs(T3_Input - T3_mirt) + abs(T4_Input - T4_mirt))/4,
    diff_PAIR = sum(abs(T1_Input - T1_PAIR) + abs(T2_Input - T2_PAIR) + abs(T3_Input - T3_PAIR) + abs(T4_Input - T4_PAIR))/4
  ) %>%
  ungroup() %>%
  pivot_longer(
    cols = c(diff_eRm, diff_TAM,diff_mirt,diff_PAIR),
    names_to = c(NA, "Estimator"),
    names_sep = "_",
    values_to = "diff"
  )

# produce table with summary stats per estimator/package
results_params_diff_4t %>%
  group_by(Estimator) %>%
  summarise(
    Median = median(diff),
    MAD = mad(diff),
    IQR = IQR(diff),
    Mean = mean(diff),
    SD = sd(diff)
  ) %>%
  mutate(across(where(is.numeric), ~ round(.x,3))) %>%
  rename(Package = Estimator) %>%
  arrange(Median) %>%
  tt()
tinytable_0po126oqgr0buj0q9gye
Package Median MAD IQR Mean SD
mirt 0.101 0.043 0.059 0.106 0.043
eRm 0.102 0.043 0.059 0.106 0.043
TAM 0.140 0.064 0.087 0.152 0.069
PAIR 0.146 0.065 0.091 0.157 0.070

4.2 Summary table RMSE

Root Mean Squared Error

Code
# RMSE
results_params_4t %>%
  pivot_wider(
    values_from = c("T1", "T2", "T3", "T4"),
    names_from = "Type",
    id_cols = c("iteration", "Item")
  ) %>%
  group_by(iteration, Item) %>% # sqrt(mean((data$actual - data$predicted)^2))
  summarise(
    rmse_eRm = sqrt(mean(((T1_Input - T1_eRm)^2) + ((T2_Input - T2_eRm)^2) + ((T3_Input - T3_eRm)^2) + ((T4_Input - T4_eRm))^2)),
        rmse_TAM = sqrt(mean(((T1_Input - T1_TAM)^2) + ((T2_Input - T2_TAM)^2) + ((T3_Input - T3_TAM)^2) + ((T4_Input - T4_TAM))^2)),
        rmse_mirt = sqrt(mean(((T1_Input - T1_mirt)^2) + ((T2_Input - T2_mirt)^2) + ((T3_Input - T3_mirt)^2) + ((T4_Input - T4_mirt))^2)),
        rmse_PAIR = sqrt(mean(((T1_Input - T1_PAIR)^2) + ((T2_Input - T2_PAIR)^2) + ((T3_Input - T3_PAIR)^2) + ((T4_Input - T4_PAIR))^2))
  ) %>%
  ungroup() %>%
  pivot_longer(
    cols = c(rmse_eRm, rmse_TAM,rmse_mirt,rmse_PAIR),
    names_to = c(NA, "Estimator"),
    names_sep = "_",
    values_to = "rmse"
  ) %>% 
  group_by(Estimator) %>%
  summarise(
    Median = median(rmse),
    MAD = mad(rmse),
    IQR = IQR(rmse),
    Mean = mean(rmse),
    SD = sd(rmse)
  ) %>%
  mutate(across(where(is.numeric), ~ round(.x,3))) %>%
  rename(Package = Estimator) %>%
  arrange(Median) %>% 
  tt()
tinytable_q26dyrtpg74yhyxvj2vb
Package Median MAD IQR Mean SD
mirt 0.240 0.103 0.138 0.252 0.101
eRm 0.242 0.102 0.137 0.253 0.101
TAM 0.330 0.147 0.200 0.350 0.148
PAIR 0.344 0.149 0.204 0.364 0.151

4.3 Thresholds summary MAE

Code
# table summarizing all thresholds and estimators
results_params_4t %>%
  pivot_wider(values_from = c("T1", "T2", "T3", "T4"),
              names_from = "Type",
              id_cols = c("iteration","Item")
  ) %>%
  group_by(iteration,Item) %>%
  summarise(
    diff_eRm.T1 = abs(T1_eRm - T1_Input),
    diff_eRm.T2 = abs(T2_eRm - T2_Input),
    diff_eRm.T3 = abs(T3_eRm - T3_Input),
    diff_eRm.T4 = abs(T4_eRm - T4_Input),
    diff_TAM.T1 = abs(T1_TAM - T1_Input),
    diff_TAM.T2 = abs(T2_TAM - T2_Input),
    diff_TAM.T3 = abs(T3_TAM - T3_Input),
    diff_TAM.T4 = abs(T4_TAM - T4_Input),
    diff_mirt.T1 = abs(T1_mirt - T1_Input),
    diff_mirt.T2 = abs(T2_mirt - T2_Input),
    diff_mirt.T3 = abs(T3_mirt - T3_Input),
    diff_mirt.T4 = abs(T4_mirt - T4_Input),
    diff_PAIR.T1 = abs(T1_PAIR - T1_Input),
    diff_PAIR.T2 = abs(T2_PAIR - T2_Input),
    diff_PAIR.T3 = abs(T3_PAIR - T3_Input),
    diff_PAIR.T4 = abs(T4_PAIR - T4_Input)
  ) %>%
  ungroup() %>%
  pivot_longer(cols = starts_with("diff"),
               names_to = c("Estimator","Threshold"),
               names_sep = "\\.",
               values_to = "diff") %>%
  mutate(Package = gsub("diff_","",Estimator)) %>%
  group_by(Package, Threshold) %>%
  summarise(
    Median = median(diff),
    MAD = mad(diff),
    IQR = IQR(diff),
    Mean = mean(diff),
    SD = sd(diff)
  ) %>%
  mutate(across(where(is.numeric), ~ round(.x,3))) %>%
  arrange(Threshold,Median) %>%
  tt() %>% 
  style_tt(i = c(1, 5, 9, 13), j = 2, rowspan = 4, alignv = "t")
tinytable_bebptriuqhnmt4cb6k99
Package Threshold Median MAD IQR Mean SD
eRm T1 0.106 0.093 0.130 0.127 0.097
mirt T1 0.106 0.094 0.131 0.126 0.097
TAM T1 0.133 0.116 0.160 0.157 0.118
PAIR T1 0.142 0.118 0.166 0.164 0.121
eRm T2 0.072 0.063 0.088 0.085 0.064
mirt T2 0.072 0.062 0.087 0.085 0.063
TAM T2 0.123 0.108 0.153 0.144 0.106
PAIR T2 0.129 0.113 0.158 0.148 0.109
eRm T3 0.072 0.066 0.094 0.086 0.066
mirt T3 0.073 0.067 0.091 0.085 0.065
TAM T3 0.119 0.109 0.153 0.144 0.110
PAIR T3 0.120 0.108 0.155 0.147 0.113
mirt T4 0.105 0.091 0.131 0.127 0.097
eRm T4 0.106 0.093 0.130 0.128 0.097
TAM T4 0.138 0.117 0.164 0.163 0.121
PAIR T4 0.144 0.121 0.171 0.170 0.129

4.4 Figure

Code
results_params_4t %>%
  pivot_wider(values_from = c("T1", "T2", "T3", "T4"),
              names_from = "Type",
              id_cols = c("iteration","Item")
  ) %>%
  group_by(iteration,Item) %>%
  summarise(
    diff_eRm.T1 = T1_eRm - T1_Input,
    diff_eRm.T2 = T2_eRm - T2_Input,
    diff_eRm.T3 = T3_eRm - T3_Input,
    diff_eRm.T4 = T4_eRm - T4_Input,
    diff_TAM.T1 = T1_TAM - T1_Input,
    diff_TAM.T2 = T2_TAM - T2_Input,
    diff_TAM.T3 = T3_TAM - T3_Input,
    diff_TAM.T4 = T4_TAM - T4_Input,
    diff_mirt.T1 = T1_mirt - T1_Input,
    diff_mirt.T2 = T2_mirt - T2_Input,
    diff_mirt.T3 = T3_mirt - T3_Input,
    diff_mirt.T4 = T4_mirt - T4_Input,
    diff_PAIR.T1 = T1_PAIR - T1_Input,
    diff_PAIR.T2 = T2_PAIR - T2_Input,
    diff_PAIR.T3 = T3_PAIR - T3_Input,
    diff_PAIR.T4 = T4_PAIR - T4_Input
  ) %>%
  ungroup() %>%
  pivot_longer(cols = starts_with("diff"),
               names_to = c("Estimator","Threshold"),
               names_sep = "\\.",
               values_to = "diff") %>%
  mutate(Package = gsub("diff_","",Estimator)) %>%
  mutate(Threshold = car::recode(Threshold,"'T1'='Threshold 1';'T2'='Threshold 2';'T3'='Threshold 3';'T4'='Threshold 4';")) %>%
  ggplot(aes(x = diff, y = Package, slab_fill = after_stat(level))) +
  stat_dotsinterval(quantiles = 250, point_interval = "median_qi",
                    layout = "weave", slab_color = NA, .width = c(.66,.95)) +
  labs(caption = "Point interval: median_qi (.66 and .95). Based on 250 simulated datasets with 9 items and 720 respondents each.",
       x = "Bias (logit scale)",
       y = "Package",
       title = "Distribution of item threshold estimation bias",
       subtitle = "Rasch Partial Credit Model") +
  scale_color_manual(guide = "none", values = scales::brewer_pal()(3)[-1], aesthetics = "slab_fill") +
  facet_wrap(~Threshold, nrow = 1) +
  geom_vline(xintercept = 0, linetype = "dashed") +
  theme_minimal()

5 Smaller sample

Since the pairwise method has been claimed to be good with small samples, at least compared to MML (Finch and French 2019). Notably, the paper mentions using the ltm package for MML estimation (which is referred to as “standard MLE” in the paper), for PCM estimation, but PCM is not available in ltm.

I was interested to have a quick look at this. We’ll use the same 250 datasets again, but randomly select 108 respondents (3 per threshold estimated) from each dataset.

Code
registerDoParallel(cores = 8)
iterations = 250
samplesize = 108 # n = 3 per threshold (instead of 20/threshold)

sim_results_4t2 <- list()

sim_results_4t2 <- foreach(i = 1:iterations) %dopar% {

  input_params <- sim_data_4t[[i]]$input_params
  testData <- sim_data_4t[[i]]$data[sample(1:720, samplesize), ]

  # check mean/centrality (item parameters should have been centered before data generation)
  input_params_correction <- mean(input_params)

  # eRm
  erm_out <- PCM(testData)
  erm_params <- thresholds(erm_out)[[3]][[1]][,-1] %>%
    as.data.frame() %>%
    set_names(c("T1","T2","T3","T4")) %>%
    as.matrix()

  erm_params_c <- erm_params - mean(erm_params) + input_params_correction

  # TAM
  tam_out <- tam(as.matrix(testData), irtmodel = "PCM", verbose = FALSE)

  tam_params <- tam_out$item_irt %>%
    as.data.frame() %>%
    select(starts_with("tau")) %>%
    set_names(c("T1","T2","T3","T4")) %>%
    as.matrix()

  tam_params_c <- tam_params - mean(tam_params) + input_params_correction

  # mirt
  mirt_out <- mirt(data = testData, model = 1, itemtype = "Rasch", verbose = FALSE)
  mirt_params <- coef(mirt_out, simplify = TRUE, IRTpars = TRUE)$items %>%
    as.data.frame() %>%
    select(!a) %>%
    set_names(c("T1","T2","T3","T4")) %>%
    as.matrix()
  mirt_params_c <- mirt_params - mean(mirt_params) + input_params_correction

  # PAIR
  pair_out <- pair(testData, m = 5)
  pair_params <- deltapar(pair_out) %>%
    as.data.frame()

  pair_params <- pair_params[,-1] %>%
    set_names(c("T1","T2","T3","T4")) %>%
    as.matrix()

  pair_params_c <- pair_params - mean(pair_params) + input_params_correction

  # combine item parameters and calculate lowest to highest threshold distances
  input_params %>%
    as.data.frame() %>%
    rename(T1 = V1,
           T2 = V2,
           T3 = V3,
           T4 = V4) %>%
    add_column(Item = c(1:9),
               Type = "Input") %>%
    bind_rows(
      erm_params_c %>%
        as.data.frame() %>%
        add_column(Item = c(1:9),
                   Type = "eRm")
    ) %>%
    bind_rows(
      tam_params_c %>%
        as.data.frame() %>%
        add_column(Item = c(1:9),
                   Type = "TAM")
    ) %>%
    bind_rows(
      mirt_params_c %>%
        as.data.frame() %>%
        add_column(Item = c(1:9),
                   Type = "mirt")
    ) %>%
    bind_rows(
      pair_params_c %>%
        as.data.frame() %>%
        add_column(Item = c(1:9),
                   Type = "PAIR")
    ) %>%
    remove_rownames()
}

5.1 Results

Code
# collect item threshold parameters to a single dataframe
results_params_4t2 <- map_df(1:iterations, ~ sim_results_4t2[[.x]] %>%
                               mutate(iteration = .x))

5.1.1 Summary table MAE

Code
# calculate absolute differences between input and estimated item thresholds per estimator, average by item
results_params_diff_4t2 <- results_params_4t2 %>%
  pivot_wider(
    values_from = c("T1", "T2", "T3", "T4"),
    names_from = "Type",
    id_cols = c("iteration", "Item")
  ) %>%
  group_by(iteration, Item) %>%
  summarise(
    diff_eRm = sum(abs(T1_Input - T1_eRm) + abs(T2_Input - T2_eRm) + abs(T3_Input - T3_eRm) + abs(T4_Input - T4_eRm))/4,
    diff_TAM = sum(abs(T1_Input - T1_TAM) + abs(T2_Input - T2_TAM) + abs(T3_Input - T3_TAM) + abs(T4_Input - T4_TAM))/4,
    diff_mirt = sum(abs(T1_Input - T1_mirt) + abs(T2_Input - T2_mirt) + abs(T3_Input - T3_mirt) + abs(T4_Input - T4_mirt))/4,
    diff_PAIR = sum(abs(T1_Input - T1_PAIR) + abs(T2_Input - T2_PAIR) + abs(T3_Input - T3_PAIR) + abs(T4_Input - T4_PAIR))/4
  ) %>%
  ungroup() %>%
  pivot_longer(
    cols = c(diff_eRm, diff_TAM,diff_mirt,diff_PAIR),
    names_to = c(NA, "Estimator"),
    names_sep = "_",
    values_to = "diff"
  )

# produce table with summary stats per estimator/package
results_params_diff_4t2 %>%
  group_by(Estimator) %>%
  summarise(
    Median = median(diff, na.rm = TRUE),
    MAD = mad(diff, na.rm = TRUE),
    IQR = IQR(diff, na.rm = TRUE),
    Mean = mean(diff, na.rm = TRUE),
    SD = sd(diff, na.rm = TRUE),
  ) %>%
  mutate(across(where(is.numeric), ~ round(.x,3))) %>%
  rename(Package = Estimator) %>%
  arrange(Median) %>%
  tt()
tinytable_eat8y8n90g0mp1tyxubx
Package Median MAD IQR Mean SD
eRm 0.273 0.121 0.167 0.293 0.127
mirt 0.273 0.121 0.166 0.292 0.126
TAM 0.283 0.122 0.168 0.301 0.131
PAIR 0.297 0.130 0.182 0.326 0.184

5.1.2 Summary table RMSE

Code
# RMSE
results_params_4t2 %>%
  pivot_wider(
    values_from = c("T1", "T2", "T3", "T4"),
    names_from = "Type",
    id_cols = c("iteration", "Item")
  ) %>%
  group_by(iteration, Item) %>% # sqrt(mean((data$actual - data$predicted)^2))
  summarise(
    rmse_eRm = sqrt(mean(((T1_Input - T1_eRm)^2) + ((T2_Input - T2_eRm)^2) + ((T3_Input - T3_eRm)^2) + ((T4_Input - T4_eRm))^2)),
        rmse_TAM = sqrt(mean(((T1_Input - T1_TAM)^2) + ((T2_Input - T2_TAM)^2) + ((T3_Input - T3_TAM)^2) + ((T4_Input - T4_TAM))^2)),
        rmse_mirt = sqrt(mean(((T1_Input - T1_mirt)^2) + ((T2_Input - T2_mirt)^2) + ((T3_Input - T3_mirt)^2) + ((T4_Input - T4_mirt))^2)),
        rmse_PAIR = sqrt(mean(((T1_Input - T1_PAIR)^2) + ((T2_Input - T2_PAIR)^2) + ((T3_Input - T3_PAIR)^2) + ((T4_Input - T4_PAIR))^2))
  ) %>%
  ungroup() %>%
  pivot_longer(
    cols = c(rmse_eRm, rmse_TAM,rmse_mirt,rmse_PAIR),
    names_to = c(NA, "Estimator"),
    names_sep = "_",
    values_to = "rmse"
  ) %>% 
  group_by(Estimator) %>%
  summarise(
    Median = median(rmse, na.rm = T),
    MAD = mad(rmse, na.rm = T),
    IQR = IQR(rmse, na.rm = T),
    Mean = mean(rmse, na.rm = T),
    SD = sd(rmse, na.rm = T)
  ) %>%
  mutate(across(where(is.numeric), ~ round(.x,3))) %>%
  rename(Package = Estimator) %>%
  arrange(Median) %>% 
  tt()
tinytable_1005n7a3vvcypthp701e
Package Median MAD IQR Mean SD
mirt 0.657 0.287 0.394 0.704 0.319
eRm 0.658 0.289 0.400 0.708 0.321
TAM 0.668 0.283 0.380 0.709 0.296
PAIR 0.700 0.302 0.419 0.768 0.418

5.1.3 Thresholds summary MAE

Code
# table summarizing all thresholds and estimators
results_params_4t2 %>%
  pivot_wider(values_from = c("T1", "T2", "T3", "T4"),
              names_from = "Type",
              id_cols = c("iteration","Item")
  ) %>%
  group_by(iteration,Item) %>%
  summarise(
    diff_eRm.T1 = abs(T1_eRm - T1_Input),
    diff_eRm.T2 = abs(T2_eRm - T2_Input),
    diff_eRm.T3 = abs(T3_eRm - T3_Input),
    diff_eRm.T4 = abs(T4_eRm - T4_Input),
    diff_TAM.T1 = abs(T1_TAM - T1_Input),
    diff_TAM.T2 = abs(T2_TAM - T2_Input),
    diff_TAM.T3 = abs(T3_TAM - T3_Input),
    diff_TAM.T4 = abs(T4_TAM - T4_Input),
    diff_mirt.T1 = abs(T1_mirt - T1_Input),
    diff_mirt.T2 = abs(T2_mirt - T2_Input),
    diff_mirt.T3 = abs(T3_mirt - T3_Input),
    diff_mirt.T4 = abs(T4_mirt - T4_Input),
    diff_PAIR.T1 = abs(T1_PAIR - T1_Input),
    diff_PAIR.T2 = abs(T2_PAIR - T2_Input),
    diff_PAIR.T3 = abs(T3_PAIR - T3_Input),
    diff_PAIR.T4 = abs(T4_PAIR - T4_Input)
  ) %>%
  ungroup() %>%
  pivot_longer(cols = starts_with("diff"),
               names_to = c("Estimator","Threshold"),
               names_sep = "\\.",
               values_to = "diff") %>%
  mutate(Package = gsub("diff_","",Estimator)) %>%
  group_by(Package, Threshold) %>%
  summarise(
    Median = median(diff, na.rm = TRUE),
    MAD = mad(diff, na.rm = TRUE),
    IQR = IQR(diff, na.rm = TRUE),
    Mean = mean(diff, na.rm = TRUE),
    SD = sd(diff, na.rm = TRUE)
  ) %>%
  mutate(across(where(is.numeric), ~ round(.x,3))) %>%
  arrange(Threshold,Median) %>%
  tt() %>% 
  style_tt(i = c(1, 5, 9, 13), j = 2, rowspan = 4, alignv = "t")
tinytable_777lxuxj5dm3imapdb4u
Package Threshold Median MAD IQR Mean SD
TAM T1 0.277 0.240 0.332 0.330 0.259
PAIR T1 0.286 0.252 0.353 0.357 0.308
eRm T1 0.290 0.258 0.374 0.360 0.303
mirt T1 0.292 0.261 0.367 0.358 0.299
mirt T2 0.184 0.162 0.233 0.223 0.177
eRm T2 0.187 0.166 0.231 0.224 0.178
TAM T2 0.226 0.206 0.290 0.270 0.209
PAIR T2 0.247 0.214 0.296 0.289 0.229
mirt T3 0.194 0.171 0.241 0.232 0.179
eRm T3 0.196 0.170 0.245 0.234 0.180
TAM T3 0.236 0.207 0.290 0.278 0.211
PAIR T3 0.252 0.220 0.302 0.296 0.239
TAM T4 0.258 0.234 0.335 0.325 0.264
eRm T4 0.281 0.251 0.349 0.353 0.299
mirt T4 0.284 0.250 0.352 0.353 0.297
PAIR T4 0.290 0.262 0.368 0.363 0.360

5.1.4 Figure

Code
results_params_4t2 %>%
  pivot_wider(values_from = c("T1", "T2", "T3", "T4"),
              names_from = "Type",
              id_cols = c("iteration","Item")
  ) %>%
  group_by(iteration,Item) %>%
  summarise(
    diff_eRm.T1 = T1_eRm - T1_Input,
    diff_eRm.T2 = T2_eRm - T2_Input,
    diff_eRm.T3 = T3_eRm - T3_Input,
    diff_eRm.T4 = T4_eRm - T4_Input,
    diff_TAM.T1 = T1_TAM - T1_Input,
    diff_TAM.T2 = T2_TAM - T2_Input,
    diff_TAM.T3 = T3_TAM - T3_Input,
    diff_TAM.T4 = T4_TAM - T4_Input,
    diff_mirt.T1 = T1_mirt - T1_Input,
    diff_mirt.T2 = T2_mirt - T2_Input,
    diff_mirt.T3 = T3_mirt - T3_Input,
    diff_mirt.T4 = T4_mirt - T4_Input,
    diff_PAIR.T1 = T1_PAIR - T1_Input,
    diff_PAIR.T2 = T2_PAIR - T2_Input,
    diff_PAIR.T3 = T3_PAIR - T3_Input,
    diff_PAIR.T4 = T4_PAIR - T4_Input
  ) %>%
  ungroup() %>%
  pivot_longer(cols = starts_with("diff"),
               names_to = c("Estimator","Threshold"),
               names_sep = "\\.",
               values_to = "diff") %>%
  mutate(Package = gsub("diff_","",Estimator)) %>%
  mutate(Threshold = car::recode(Threshold,"'T1'='Threshold 1';'T2'='Threshold 2';'T3'='Threshold 3';'T4'='Threshold 4';")) %>%
  ggplot(aes(x = diff, y = Package, slab_fill = after_stat(level))) +
  stat_dotsinterval(quantiles = 250, point_interval = "median_qi",
                    layout = "weave", slab_color = NA, .width = c(.66,.95)) +
  labs(caption = str_wrap("Point interval: median_qi (.66 and .95). Based on 250 simulated datasets with 9 items and 108 respondents each."),
       x = "Bias (logit scale)",
       y = "Package",
       title = "Distribution of item threshold estimation bias",
       subtitle = "Rasch Partial Credit Model") +
  scale_color_manual(guide = "none", values = scales::brewer_pal()(3)[-1], aesthetics = "slab_fill") +
  facet_wrap(~Threshold, nrow = 1) +
  geom_vline(xintercept = 0, linetype = "dashed") +
  theme_minimal()

6 Even smaller sample

n = 54, 2 per threshold estimated.

Code
registerDoParallel(cores = 8)
iterations = 250
samplesize = 54 # n = 2 per threshold (instead of 20/threshold)

sim_results_4t3 <- list()

sim_results_4t3 <- foreach(i = 1:iterations) %dopar% {

  input_params <- sim_data_4t[[i]]$input_params
  testData <- sim_data_4t[[i]]$data[sample(1:720, samplesize), ]

  # check mean/centrality (item parameters should have been centered before data generation)
  input_params_correction <- mean(input_params)

  # eRm
  erm_out <- PCM(testData)
  erm_params <- thresholds(erm_out)[[3]][[1]][,-1] %>%
    as.data.frame() %>%
    set_names(c("T1","T2","T3","T4")) %>%
    as.matrix()

  erm_params_c <- erm_params - mean(erm_params) + input_params_correction

  # TAM
  tam_out <- tam(as.matrix(testData), irtmodel = "PCM", verbose = FALSE)

  tam_params <- tam_out$item_irt %>%
    as.data.frame() %>%
    select(starts_with("tau")) %>%
    set_names(c("T1","T2","T3","T4")) %>%
    as.matrix()

  tam_params_c <- tam_params - mean(tam_params) + input_params_correction

  # mirt
  mirt_out <- mirt(data = testData, model = 1, itemtype = "Rasch", verbose = FALSE)
  mirt_params <- coef(mirt_out, simplify = TRUE, IRTpars = TRUE)$items %>%
    as.data.frame() %>%
    select(!a) %>%
    set_names(c("T1","T2","T3","T4")) %>%
    as.matrix()
  mirt_params_c <- mirt_params - mean(mirt_params) + input_params_correction

  # PAIR
  pair_out <- pair(testData, m = 5)
  pair_params <- deltapar(pair_out) %>%
    as.data.frame()

  pair_params <- pair_params[,-1] %>%
    set_names(c("T1","T2","T3","T4")) %>%
    as.matrix()

  pair_params_c <- pair_params - mean(pair_params) + input_params_correction

  # combine item parameters and calculate lowest to highest threshold distances
  input_params %>%
    as.data.frame() %>%
    rename(T1 = V1,
           T2 = V2,
           T3 = V3,
           T4 = V4) %>%
    add_column(Item = c(1:9),
               Type = "Input") %>%
    bind_rows(
      erm_params_c %>%
        as.data.frame() %>%
        add_column(Item = c(1:9),
                   Type = "eRm")
    ) %>%
    bind_rows(
      tam_params_c %>%
        as.data.frame() %>%
        add_column(Item = c(1:9),
                   Type = "TAM")
    ) %>%
    bind_rows(
      mirt_params_c %>%
        as.data.frame() %>%
        add_column(Item = c(1:9),
                   Type = "mirt")
    ) %>%
    bind_rows(
      pair_params_c %>%
        as.data.frame() %>%
        add_column(Item = c(1:9),
                   Type = "PAIR")
    ) %>%
    remove_rownames()
}

6.1 Results

Code
# collect item threshold parameters to a single dataframe
results_params_4t3 <- map_df(1:iterations, ~ sim_results_4t3[[.x]] %>%
                               mutate(iteration = .x))

6.1.1 Summary table MAE

Code
# calculate absolute differences between input and estimated item thresholds per estimator, average by item
results_params_diff_4t3 <- results_params_4t3 %>%
  pivot_wider(
    values_from = c("T1", "T2", "T3", "T4"),
    names_from = "Type",
    id_cols = c("iteration", "Item")
  ) %>%
  group_by(iteration, Item) %>%
  summarise(
    diff_eRm = sum(abs(T1_Input - T1_eRm) + abs(T2_Input - T2_eRm) + abs(T3_Input - T3_eRm) + abs(T4_Input - T4_eRm))/4,
    diff_TAM = sum(abs(T1_Input - T1_TAM) + abs(T2_Input - T2_TAM) + abs(T3_Input - T3_TAM) + abs(T4_Input - T4_TAM))/4,
    diff_mirt = sum(abs(T1_Input - T1_mirt) + abs(T2_Input - T2_mirt) + abs(T3_Input - T3_mirt) + abs(T4_Input - T4_mirt))/4,
    diff_PAIR = sum(abs(T1_Input - T1_PAIR) + abs(T2_Input - T2_PAIR) + abs(T3_Input - T3_PAIR) + abs(T4_Input - T4_PAIR))/4
  ) %>%
  ungroup() %>%
  pivot_longer(
    cols = c(diff_eRm, diff_TAM,diff_mirt,diff_PAIR),
    names_to = c(NA, "Estimator"),
    names_sep = "_",
    values_to = "diff"
  )

# produce table with summary stats per estimator/package
results_params_diff_4t3 %>%
  group_by(Estimator) %>%
  summarise(
    Median = median(diff, na.rm = TRUE),
    MAD = mad(diff, na.rm = TRUE),
    IQR = IQR(diff, na.rm = TRUE),
    Mean = mean(diff, na.rm = TRUE),
    SD = sd(diff, na.rm = TRUE),
  ) %>%
  mutate(across(where(is.numeric), ~ round(.x,3))) %>%
  rename(Package = Estimator) %>%
  arrange(Median) %>%
  tt()
tinytable_0avvuv15de1s8xf3hktk
Package Median MAD IQR Mean SD
TAM 0.384 0.167 0.237 0.575 1.471
mirt 0.390 0.170 0.233 0.410 0.177
eRm 0.392 0.177 0.237 0.413 0.180
PAIR 0.422 0.207 0.293 0.535 0.470

6.1.2 Summary table RMSE

Code
# RMSE
results_params_4t3 %>%
  pivot_wider(
    values_from = c("T1", "T2", "T3", "T4"),
    names_from = "Type",
    id_cols = c("iteration", "Item")
  ) %>%
  group_by(iteration, Item) %>% # sqrt(mean((data$actual - data$predicted)^2))
  summarise(
    rmse_eRm = sqrt(mean(((T1_Input - T1_eRm)^2) + ((T2_Input - T2_eRm)^2) + ((T3_Input - T3_eRm)^2) + ((T4_Input - T4_eRm))^2)),
        rmse_TAM = sqrt(mean(((T1_Input - T1_TAM)^2) + ((T2_Input - T2_TAM)^2) + ((T3_Input - T3_TAM)^2) + ((T4_Input - T4_TAM))^2)),
        rmse_mirt = sqrt(mean(((T1_Input - T1_mirt)^2) + ((T2_Input - T2_mirt)^2) + ((T3_Input - T3_mirt)^2) + ((T4_Input - T4_mirt))^2)),
        rmse_PAIR = sqrt(mean(((T1_Input - T1_PAIR)^2) + ((T2_Input - T2_PAIR)^2) + ((T3_Input - T3_PAIR)^2) + ((T4_Input - T4_PAIR))^2))
  ) %>%
  ungroup() %>%
  pivot_longer(
    cols = c(rmse_eRm, rmse_TAM,rmse_mirt,rmse_PAIR),
    names_to = c(NA, "Estimator"),
    names_sep = "_",
    values_to = "rmse"
  ) %>% 
  group_by(Estimator) %>%
  summarise(
    Median = median(rmse, na.rm = T),
    MAD = mad(rmse, na.rm = T),
    IQR = IQR(rmse, na.rm = T),
    Mean = mean(rmse, na.rm = T),
    SD = sd(rmse, na.rm = T)
  ) %>%
  mutate(across(where(is.numeric), ~ round(.x,3))) %>%
  rename(Package = Estimator) %>%
  arrange(Median) %>% 
  tt()
tinytable_zmjrgt9crv6rrmgkpnid
Package Median MAD IQR Mean SD
TAM 0.902 0.370 0.507 1.342 3.381
mirt 0.925 0.400 0.543 0.987 0.430
eRm 0.937 0.405 0.550 0.994 0.439
PAIR 1.009 0.468 0.655 1.250 1.078

6.1.3 Thresholds summary MAE

Code
# table summarizing all thresholds and estimators
results_params_4t3 %>%
  pivot_wider(values_from = c("T1", "T2", "T3", "T4"),
              names_from = "Type",
              id_cols = c("iteration","Item")
  ) %>%
  group_by(iteration,Item) %>%
  summarise(
    diff_eRm.T1 = abs(T1_eRm - T1_Input),
    diff_eRm.T2 = abs(T2_eRm - T2_Input),
    diff_eRm.T3 = abs(T3_eRm - T3_Input),
    diff_eRm.T4 = abs(T4_eRm - T4_Input),
    diff_TAM.T1 = abs(T1_TAM - T1_Input),
    diff_TAM.T2 = abs(T2_TAM - T2_Input),
    diff_TAM.T3 = abs(T3_TAM - T3_Input),
    diff_TAM.T4 = abs(T4_TAM - T4_Input),
    diff_mirt.T1 = abs(T1_mirt - T1_Input),
    diff_mirt.T2 = abs(T2_mirt - T2_Input),
    diff_mirt.T3 = abs(T3_mirt - T3_Input),
    diff_mirt.T4 = abs(T4_mirt - T4_Input),
    diff_PAIR.T1 = abs(T1_PAIR - T1_Input),
    diff_PAIR.T2 = abs(T2_PAIR - T2_Input),
    diff_PAIR.T3 = abs(T3_PAIR - T3_Input),
    diff_PAIR.T4 = abs(T4_PAIR - T4_Input)
  ) %>%
  ungroup() %>%
  pivot_longer(cols = starts_with("diff"),
               names_to = c("Estimator","Threshold"),
               names_sep = "\\.",
               values_to = "diff") %>%
  mutate(Package = gsub("diff_","",Estimator)) %>%
  group_by(Package, Threshold) %>%
  summarise(
    Median = median(diff, na.rm = TRUE),
    MAD = mad(diff, na.rm = TRUE),
    IQR = IQR(diff, na.rm = TRUE),
    Mean = mean(diff, na.rm = TRUE),
    SD = sd(diff, na.rm = TRUE)
  ) %>%
  mutate(across(where(is.numeric), ~ round(.x,3))) %>%
  arrange(Threshold,Median) %>%
  tt() %>% 
  style_tt(i = c(1, 5, 9, 13), j = 2, rowspan = 4, alignv = "t")
tinytable_4111b9318frvetbjqouw
Package Threshold Median MAD IQR Mean SD
TAM T1 0.386 0.343 0.486 0.794 2.957
mirt T1 0.407 0.366 0.512 0.508 0.417
eRm T1 0.412 0.368 0.524 0.514 0.425
PAIR T1 0.433 0.399 0.575 0.619 0.749
mirt T2 0.272 0.239 0.332 0.319 0.249
eRm T2 0.274 0.246 0.339 0.322 0.251
TAM T2 0.312 0.279 0.398 0.479 1.013
PAIR T2 0.353 0.315 0.448 0.451 0.404
mirt T3 0.266 0.237 0.336 0.320 0.249
eRm T3 0.270 0.235 0.329 0.322 0.250
TAM T3 0.302 0.271 0.394 0.473 1.011
PAIR T3 0.342 0.309 0.431 0.446 0.409
TAM T4 0.367 0.339 0.474 0.555 1.025
eRm T4 0.412 0.361 0.503 0.496 0.396
mirt T4 0.416 0.367 0.507 0.494 0.392
PAIR T4 0.425 0.403 0.590 0.626 0.809

6.1.4 Figure

Code
results_params_4t3 %>%
  pivot_wider(values_from = c("T1", "T2", "T3", "T4"),
              names_from = "Type",
              id_cols = c("iteration","Item")
  ) %>%
  group_by(iteration,Item) %>%
  summarise(
    diff_eRm.T1 = T1_eRm - T1_Input,
    diff_eRm.T2 = T2_eRm - T2_Input,
    diff_eRm.T3 = T3_eRm - T3_Input,
    diff_eRm.T4 = T4_eRm - T4_Input,
    diff_TAM.T1 = T1_TAM - T1_Input,
    diff_TAM.T2 = T2_TAM - T2_Input,
    diff_TAM.T3 = T3_TAM - T3_Input,
    diff_TAM.T4 = T4_TAM - T4_Input,
    diff_mirt.T1 = T1_mirt - T1_Input,
    diff_mirt.T2 = T2_mirt - T2_Input,
    diff_mirt.T3 = T3_mirt - T3_Input,
    diff_mirt.T4 = T4_mirt - T4_Input,
    diff_PAIR.T1 = T1_PAIR - T1_Input,
    diff_PAIR.T2 = T2_PAIR - T2_Input,
    diff_PAIR.T3 = T3_PAIR - T3_Input,
    diff_PAIR.T4 = T4_PAIR - T4_Input
  ) %>%
  ungroup() %>%
  pivot_longer(cols = starts_with("diff"),
               names_to = c("Estimator","Threshold"),
               names_sep = "\\.",
               values_to = "diff") %>%
  mutate(Package = gsub("diff_","",Estimator)) %>%
  mutate(Threshold = car::recode(Threshold,"'T1'='Threshold 1';'T2'='Threshold 2';'T3'='Threshold 3';'T4'='Threshold 4';")) %>%
  ggplot(aes(x = diff, y = Package, slab_fill = after_stat(level))) +
  stat_dotsinterval(quantiles = 250, point_interval = "median_qi",
                    layout = "weave", slab_color = NA, .width = c(.66,.95)) +
  labs(caption = str_wrap("Point interval: median_qi (.66 and .95). Based on 250 simulated datasets with 9 items and 54 respondents each. X-axis omits extreme values."),
       x = "Bias (logit scale)",
       y = "Package",
       title = "Distribution of item threshold estimation bias",
       subtitle = "Rasch Partial Credit Model") +
  scale_color_manual(guide = "none", values = scales::brewer_pal()(3)[-1], aesthetics = "slab_fill") +
  facet_wrap(~Threshold, nrow = 1) +
  geom_vline(xintercept = 0, linetype = "dashed") +
  theme_minimal() +
  coord_cartesian(xlim = c(-2.5,2.5))

Code
sessionInfo()
R version 4.4.1 (2024-06-14)
Platform: aarch64-apple-darwin20
Running under: macOS Sonoma 14.6.1

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/Stockholm
tzcode source: internal

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] tinytable_0.3.0   doParallel_1.0.17 iterators_1.0.14  foreach_1.5.2    
 [5] ggdist_3.3.2      pairwise_0.6.1-0  mirt_1.42         lattice_0.22-6   
 [9] TAM_4.2-21        CDM_8.2-6         mvtnorm_1.2-5     janitor_2.2.0    
[13] eRm_1.0-6         lubridate_1.9.3   forcats_1.0.0     stringr_1.5.1    
[17] dplyr_1.1.4       purrr_1.0.2       readr_2.1.5       tidyr_1.3.1      
[21] tibble_3.2.1      ggplot2_3.5.1     tidyverse_2.0.0  

loaded via a namespace (and not attached):
 [1] mnormt_2.1.1         pbapply_1.7-2        gridExtra_2.3       
 [4] permute_0.9-7        testthat_3.2.1.1     rlang_1.1.4         
 [7] magrittr_2.0.3       snakecase_0.11.1     compiler_4.4.1      
[10] mgcv_1.9-1           vctrs_0.6.5          quadprog_1.5-8      
[13] pkgconfig_2.0.3      fastmap_1.2.0        labeling_0.4.3      
[16] utf8_1.2.4           rmarkdown_2.27       sessioninfo_1.2.2   
[19] tzdb_0.4.0           xfun_0.46            jsonlite_1.8.8      
[22] Deriv_4.1.3          psych_2.4.6.26       cluster_2.1.6       
[25] R6_2.5.1             stringi_1.8.4        RColorBrewer_1.1-3  
[28] parallelly_1.38.0    car_3.1-2            brio_1.1.5          
[31] Rcpp_1.0.13          knitr_1.48           future.apply_1.11.2 
[34] snow_0.4-4           audio_0.1-11         R.utils_2.12.3      
[37] polycor_0.8-1        Matrix_1.7-0         splines_4.4.1       
[40] timechange_0.3.0     tidyselect_1.2.1     rstudioapi_0.16.0   
[43] abind_1.4-5          yaml_2.3.10          vegan_2.6-6.1       
[46] codetools_0.2-20     dcurver_0.9.2        curl_5.2.1          
[49] admisc_0.35          listenv_0.9.1        withr_3.0.1         
[52] evaluate_0.24.0      future_1.34.0        pillar_1.9.0        
[55] carData_3.0-5        distributional_0.4.0 generics_0.1.3      
[58] hms_1.1.3            munsell_0.5.1        scales_1.3.0        
[61] globals_0.16.3       glue_1.7.0           RPushbullet_0.3.4   
[64] tools_4.4.1          beepr_2.0            SimDesign_2.17.1    
[67] grid_4.4.1           colorspace_2.1-1     nlme_3.1-165        
[70] cli_3.6.3            fansi_1.0.6          gtable_0.3.5        
[73] R.methodsS3_1.8.2    digest_0.6.36        progressr_0.14.0    
[76] GPArotation_2024.3-1 htmlwidgets_1.6.4    farver_2.1.2        
[79] htmltools_0.5.8.1    R.oo_1.26.0          lifecycle_1.0.4     
[82] MASS_7.3-61         

7 References

Finch, Holmes, and Brian F. French. 2019. “A Comparison of Estimation Techniques for IRT Models with Small Samples.” Applied Measurement in Education 32 (2): 77–96. https://doi.org/10.1080/08957347.2019.1577243.

Reuse

CC BY 4.0

Citation

BibTeX citation:
@online{johansson2024,
  author = {Johansson, Magnus},
  title = {Comparing {Rasch} Packages/Estimators},
  date = {2024-09-06},
  url = {https://pgmj.github.io/est_comp.html},
  langid = {en}
}
For attribution, please cite this work as:
Johansson, Magnus. 2024. “Comparing Rasch Packages/Estimators.” September 6, 2024. https://pgmj.github.io/est_comp.html.
Source Code
---
title: "Comparing Rasch packages/estimators"
subtitle: "Item threshold parameter estimation"
author:
  name: 'Magnus Johansson'
  affiliation: 'RISE Research Institutes of Sweden'
  affiliation-url: 'https://ri.se/shic'
  orcid: '0000-0003-1669-592X'
date: 2024-09-06
date-format: iso
google-scholar: true
citation:
  type: 'webpage'
format: 
  html:
    code-fold: true
execute: 
  cache: true
  warning: false
  message: false
editor_options: 
  chunk_output_type: console
bibliography: est_comp.bib
---

## Background

I'm doing comparisons of the bias in the estimators for the Partial Credit Model (PCM) implemented in Rasch R packages and thought I would share a bit of code and simulated data. There will be more extensive simulations and comparisons coming along, varying targeting and sample distribution.

For this example we have a simple setup:

- 250 datasets
- each dataset has 9 items and 4 thresholds (5 response categories)
- each dataset has 720 respondents (20 per threshold estimated, which should be plenty)

The .rds data file contains a list object with the 250 datasets, and each dataset is accompanied by a matrix of item threshold parameters and a vector of theta values used to generate the response data.

- the vector of theta values is normally distributed with a mean and SD closely matching that of the item parameters

We have four packages in the comparison:

- `eRm` - Conditional Maximum Likelihood (CML)
- `TAM` - Marginal ML (MML)
- `pairwise` - Pairwise CML (PCML)
- `mirt` - fixed quadrature expectation-maximization algorithm

Other estimation methods are available in `TAM` and `mirt`, but we'll just test these for now.

The datafile is available in the [Github repo](https://github.com/pgmj/pgmj.github.io/blob/main/sim_data_4t.rds).

## Data import

```{r}
library(tidyverse)
library(eRm)
library(janitor)
library(TAM)
library(mirt)
library(pairwise)
library(ggdist)
library(doParallel)
library(tinytable)

### some commands exist in multiple packages, here we define preferred ones that are frequently used
select <- dplyr::select
count <- dplyr::count
recode <- car::recode
rename <- dplyr::rename

# read 250 simulated datasets, stored in a list() object
sim_data_4t <- readRDS("sim_data_4t.rds")
```

## Estimation

We'll make use of `library(doParallel)` for multicore processing.

```{r}
registerDoParallel(cores = 8)
iterations = 250

sim_results_4t <- list()

sim_results_4t <- foreach(i = 1:iterations) %dopar% {

  input_params <- sim_data_4t[[i]]$input_params
  input_thetas <- sim_data_4t[[i]]$input_thetas
  testData <- sim_data_4t[[i]]$data

  # check mean/centrality (item parameters should have been centered before data generation)
  input_params_correction <- mean(input_params)

  # eRm
  erm_out <- PCM(testData)
  erm_params <- thresholds(erm_out)[[3]][[1]][,-1] %>%
    as.data.frame() %>%
    set_names(c("T1","T2","T3","T4")) %>%
    as.matrix()

  erm_params_c <- erm_params - mean(erm_params) + input_params_correction

  # TAM
  tam_out <- tam(as.matrix(testData), irtmodel = "PCM", verbose = FALSE)

  tam_params <- tam_out$item_irt %>%
    as.data.frame() %>%
    select(starts_with("tau")) %>%
    set_names(c("T1","T2","T3","T4")) %>%
    as.matrix()

  tam_params_c <- tam_params - mean(tam_params) + input_params_correction

  # mirt
  mirt_out <- mirt(data = testData, model = 1, itemtype = "Rasch", verbose = FALSE)
  mirt_params <- coef(mirt_out, simplify = TRUE, IRTpars = TRUE)$items %>%
    as.data.frame() %>%
    select(!a) %>%
    set_names(c("T1","T2","T3","T4")) %>%
    as.matrix()
  mirt_params_c <- mirt_params - mean(mirt_params) + input_params_correction

  # PAIR
  pair_out <- pair(testData)
  pair_params <- deltapar(pair_out) %>%
    as.data.frame()

  pair_params <- pair_params[,-1] %>%
    set_names(c("T1","T2","T3","T4")) %>%
    as.matrix()

  pair_params_c <- pair_params - mean(pair_params) + input_params_correction

  # combine item parameters and calculate lowest to highest threshold distances
  input_params %>%
    as.data.frame() %>%
    rename(T1 = V1,
           T2 = V2,
           T3 = V3,
           T4 = V4) %>%
    add_column(Item = c(1:9),
               Type = "Input") %>%
    bind_rows(
      erm_params_c %>%
        as.data.frame() %>%
        add_column(Item = c(1:9),
                   Type = "eRm")
    ) %>%
    bind_rows(
      tam_params_c %>%
        as.data.frame() %>%
        add_column(Item = c(1:9),
                   Type = "TAM")
    ) %>%
    bind_rows(
      mirt_params_c %>%
        as.data.frame() %>%
        add_column(Item = c(1:9),
                   Type = "mirt")
    ) %>%
    bind_rows(
      pair_params_c %>%
        as.data.frame() %>%
        add_column(Item = c(1:9),
                   Type = "PAIR")
    ) %>%
    remove_rownames()
}
```

## Results

```{r}
# collect item threshold parameters to a single dataframe
results_params_4t <- map_df(1:iterations, ~ sim_results_4t[[.x]] %>%
                               mutate(iteration = .x))

```

### Summary table MAE

Mean Absolute Error

```{r}
# calculate absolute differences between input and estimated item thresholds per estimator, average by item
results_params_diff_4t <- results_params_4t %>%
  pivot_wider(
    values_from = c("T1", "T2", "T3", "T4"),
    names_from = "Type",
    id_cols = c("iteration", "Item")
  ) %>%
  group_by(iteration, Item) %>%
  summarise(
    diff_eRm = sum(abs(T1_Input - T1_eRm) + abs(T2_Input - T2_eRm) + abs(T3_Input - T3_eRm) + abs(T4_Input - T4_eRm))/4,
    diff_TAM = sum(abs(T1_Input - T1_TAM) + abs(T2_Input - T2_TAM) + abs(T3_Input - T3_TAM) + abs(T4_Input - T4_TAM))/4,
    diff_mirt = sum(abs(T1_Input - T1_mirt) + abs(T2_Input - T2_mirt) + abs(T3_Input - T3_mirt) + abs(T4_Input - T4_mirt))/4,
    diff_PAIR = sum(abs(T1_Input - T1_PAIR) + abs(T2_Input - T2_PAIR) + abs(T3_Input - T3_PAIR) + abs(T4_Input - T4_PAIR))/4
  ) %>%
  ungroup() %>%
  pivot_longer(
    cols = c(diff_eRm, diff_TAM,diff_mirt,diff_PAIR),
    names_to = c(NA, "Estimator"),
    names_sep = "_",
    values_to = "diff"
  )

# produce table with summary stats per estimator/package
results_params_diff_4t %>%
  group_by(Estimator) %>%
  summarise(
    Median = median(diff),
    MAD = mad(diff),
    IQR = IQR(diff),
    Mean = mean(diff),
    SD = sd(diff)
  ) %>%
  mutate(across(where(is.numeric), ~ round(.x,3))) %>%
  rename(Package = Estimator) %>%
  arrange(Median) %>%
  tt()
```

### Summary table RMSE

Root Mean Squared Error

```{r}
# RMSE
results_params_4t %>%
  pivot_wider(
    values_from = c("T1", "T2", "T3", "T4"),
    names_from = "Type",
    id_cols = c("iteration", "Item")
  ) %>%
  group_by(iteration, Item) %>% # sqrt(mean((data$actual - data$predicted)^2))
  summarise(
    rmse_eRm = sqrt(mean(((T1_Input - T1_eRm)^2) + ((T2_Input - T2_eRm)^2) + ((T3_Input - T3_eRm)^2) + ((T4_Input - T4_eRm))^2)),
        rmse_TAM = sqrt(mean(((T1_Input - T1_TAM)^2) + ((T2_Input - T2_TAM)^2) + ((T3_Input - T3_TAM)^2) + ((T4_Input - T4_TAM))^2)),
        rmse_mirt = sqrt(mean(((T1_Input - T1_mirt)^2) + ((T2_Input - T2_mirt)^2) + ((T3_Input - T3_mirt)^2) + ((T4_Input - T4_mirt))^2)),
        rmse_PAIR = sqrt(mean(((T1_Input - T1_PAIR)^2) + ((T2_Input - T2_PAIR)^2) + ((T3_Input - T3_PAIR)^2) + ((T4_Input - T4_PAIR))^2))
  ) %>%
  ungroup() %>%
  pivot_longer(
    cols = c(rmse_eRm, rmse_TAM,rmse_mirt,rmse_PAIR),
    names_to = c(NA, "Estimator"),
    names_sep = "_",
    values_to = "rmse"
  ) %>% 
  group_by(Estimator) %>%
  summarise(
    Median = median(rmse),
    MAD = mad(rmse),
    IQR = IQR(rmse),
    Mean = mean(rmse),
    SD = sd(rmse)
  ) %>%
  mutate(across(where(is.numeric), ~ round(.x,3))) %>%
  rename(Package = Estimator) %>%
  arrange(Median) %>% 
  tt()
```

### Thresholds summary MAE

```{r}
# table summarizing all thresholds and estimators
results_params_4t %>%
  pivot_wider(values_from = c("T1", "T2", "T3", "T4"),
              names_from = "Type",
              id_cols = c("iteration","Item")
  ) %>%
  group_by(iteration,Item) %>%
  summarise(
    diff_eRm.T1 = abs(T1_eRm - T1_Input),
    diff_eRm.T2 = abs(T2_eRm - T2_Input),
    diff_eRm.T3 = abs(T3_eRm - T3_Input),
    diff_eRm.T4 = abs(T4_eRm - T4_Input),
    diff_TAM.T1 = abs(T1_TAM - T1_Input),
    diff_TAM.T2 = abs(T2_TAM - T2_Input),
    diff_TAM.T3 = abs(T3_TAM - T3_Input),
    diff_TAM.T4 = abs(T4_TAM - T4_Input),
    diff_mirt.T1 = abs(T1_mirt - T1_Input),
    diff_mirt.T2 = abs(T2_mirt - T2_Input),
    diff_mirt.T3 = abs(T3_mirt - T3_Input),
    diff_mirt.T4 = abs(T4_mirt - T4_Input),
    diff_PAIR.T1 = abs(T1_PAIR - T1_Input),
    diff_PAIR.T2 = abs(T2_PAIR - T2_Input),
    diff_PAIR.T3 = abs(T3_PAIR - T3_Input),
    diff_PAIR.T4 = abs(T4_PAIR - T4_Input)
  ) %>%
  ungroup() %>%
  pivot_longer(cols = starts_with("diff"),
               names_to = c("Estimator","Threshold"),
               names_sep = "\\.",
               values_to = "diff") %>%
  mutate(Package = gsub("diff_","",Estimator)) %>%
  group_by(Package, Threshold) %>%
  summarise(
    Median = median(diff),
    MAD = mad(diff),
    IQR = IQR(diff),
    Mean = mean(diff),
    SD = sd(diff)
  ) %>%
  mutate(across(where(is.numeric), ~ round(.x,3))) %>%
  arrange(Threshold,Median) %>%
  tt() %>% 
  style_tt(i = c(1, 5, 9, 13), j = 2, rowspan = 4, alignv = "t")
```

### Figure

```{r}
results_params_4t %>%
  pivot_wider(values_from = c("T1", "T2", "T3", "T4"),
              names_from = "Type",
              id_cols = c("iteration","Item")
  ) %>%
  group_by(iteration,Item) %>%
  summarise(
    diff_eRm.T1 = T1_eRm - T1_Input,
    diff_eRm.T2 = T2_eRm - T2_Input,
    diff_eRm.T3 = T3_eRm - T3_Input,
    diff_eRm.T4 = T4_eRm - T4_Input,
    diff_TAM.T1 = T1_TAM - T1_Input,
    diff_TAM.T2 = T2_TAM - T2_Input,
    diff_TAM.T3 = T3_TAM - T3_Input,
    diff_TAM.T4 = T4_TAM - T4_Input,
    diff_mirt.T1 = T1_mirt - T1_Input,
    diff_mirt.T2 = T2_mirt - T2_Input,
    diff_mirt.T3 = T3_mirt - T3_Input,
    diff_mirt.T4 = T4_mirt - T4_Input,
    diff_PAIR.T1 = T1_PAIR - T1_Input,
    diff_PAIR.T2 = T2_PAIR - T2_Input,
    diff_PAIR.T3 = T3_PAIR - T3_Input,
    diff_PAIR.T4 = T4_PAIR - T4_Input
  ) %>%
  ungroup() %>%
  pivot_longer(cols = starts_with("diff"),
               names_to = c("Estimator","Threshold"),
               names_sep = "\\.",
               values_to = "diff") %>%
  mutate(Package = gsub("diff_","",Estimator)) %>%
  mutate(Threshold = car::recode(Threshold,"'T1'='Threshold 1';'T2'='Threshold 2';'T3'='Threshold 3';'T4'='Threshold 4';")) %>%
  ggplot(aes(x = diff, y = Package, slab_fill = after_stat(level))) +
  stat_dotsinterval(quantiles = 250, point_interval = "median_qi",
                    layout = "weave", slab_color = NA, .width = c(.66,.95)) +
  labs(caption = "Point interval: median_qi (.66 and .95). Based on 250 simulated datasets with 9 items and 720 respondents each.",
       x = "Bias (logit scale)",
       y = "Package",
       title = "Distribution of item threshold estimation bias",
       subtitle = "Rasch Partial Credit Model") +
  scale_color_manual(guide = "none", values = scales::brewer_pal()(3)[-1], aesthetics = "slab_fill") +
  facet_wrap(~Threshold, nrow = 1) +
  geom_vline(xintercept = 0, linetype = "dashed") +
  theme_minimal()

```

## Smaller sample

Since the pairwise method has been claimed to be good with small samples, at least compared to MML [@Finch_French_2019]. Notably, the paper mentions using the `ltm` package for MML estimation (which is referred to as "standard MLE" in the paper), for PCM estimation, but PCM is not available in `ltm`.

I was interested to have a quick look at this. We'll use the same 250 datasets again, but randomly select 108 respondents (3 per threshold estimated) from each dataset.

```{r}
registerDoParallel(cores = 8)
iterations = 250
samplesize = 108 # n = 3 per threshold (instead of 20/threshold)

sim_results_4t2 <- list()

sim_results_4t2 <- foreach(i = 1:iterations) %dopar% {

  input_params <- sim_data_4t[[i]]$input_params
  testData <- sim_data_4t[[i]]$data[sample(1:720, samplesize), ]

  # check mean/centrality (item parameters should have been centered before data generation)
  input_params_correction <- mean(input_params)

  # eRm
  erm_out <- PCM(testData)
  erm_params <- thresholds(erm_out)[[3]][[1]][,-1] %>%
    as.data.frame() %>%
    set_names(c("T1","T2","T3","T4")) %>%
    as.matrix()

  erm_params_c <- erm_params - mean(erm_params) + input_params_correction

  # TAM
  tam_out <- tam(as.matrix(testData), irtmodel = "PCM", verbose = FALSE)

  tam_params <- tam_out$item_irt %>%
    as.data.frame() %>%
    select(starts_with("tau")) %>%
    set_names(c("T1","T2","T3","T4")) %>%
    as.matrix()

  tam_params_c <- tam_params - mean(tam_params) + input_params_correction

  # mirt
  mirt_out <- mirt(data = testData, model = 1, itemtype = "Rasch", verbose = FALSE)
  mirt_params <- coef(mirt_out, simplify = TRUE, IRTpars = TRUE)$items %>%
    as.data.frame() %>%
    select(!a) %>%
    set_names(c("T1","T2","T3","T4")) %>%
    as.matrix()
  mirt_params_c <- mirt_params - mean(mirt_params) + input_params_correction

  # PAIR
  pair_out <- pair(testData, m = 5)
  pair_params <- deltapar(pair_out) %>%
    as.data.frame()

  pair_params <- pair_params[,-1] %>%
    set_names(c("T1","T2","T3","T4")) %>%
    as.matrix()

  pair_params_c <- pair_params - mean(pair_params) + input_params_correction

  # combine item parameters and calculate lowest to highest threshold distances
  input_params %>%
    as.data.frame() %>%
    rename(T1 = V1,
           T2 = V2,
           T3 = V3,
           T4 = V4) %>%
    add_column(Item = c(1:9),
               Type = "Input") %>%
    bind_rows(
      erm_params_c %>%
        as.data.frame() %>%
        add_column(Item = c(1:9),
                   Type = "eRm")
    ) %>%
    bind_rows(
      tam_params_c %>%
        as.data.frame() %>%
        add_column(Item = c(1:9),
                   Type = "TAM")
    ) %>%
    bind_rows(
      mirt_params_c %>%
        as.data.frame() %>%
        add_column(Item = c(1:9),
                   Type = "mirt")
    ) %>%
    bind_rows(
      pair_params_c %>%
        as.data.frame() %>%
        add_column(Item = c(1:9),
                   Type = "PAIR")
    ) %>%
    remove_rownames()
}
```

### Results

```{r}
# collect item threshold parameters to a single dataframe
results_params_4t2 <- map_df(1:iterations, ~ sim_results_4t2[[.x]] %>%
                               mutate(iteration = .x))

```

#### Summary table MAE

```{r}
# calculate absolute differences between input and estimated item thresholds per estimator, average by item
results_params_diff_4t2 <- results_params_4t2 %>%
  pivot_wider(
    values_from = c("T1", "T2", "T3", "T4"),
    names_from = "Type",
    id_cols = c("iteration", "Item")
  ) %>%
  group_by(iteration, Item) %>%
  summarise(
    diff_eRm = sum(abs(T1_Input - T1_eRm) + abs(T2_Input - T2_eRm) + abs(T3_Input - T3_eRm) + abs(T4_Input - T4_eRm))/4,
    diff_TAM = sum(abs(T1_Input - T1_TAM) + abs(T2_Input - T2_TAM) + abs(T3_Input - T3_TAM) + abs(T4_Input - T4_TAM))/4,
    diff_mirt = sum(abs(T1_Input - T1_mirt) + abs(T2_Input - T2_mirt) + abs(T3_Input - T3_mirt) + abs(T4_Input - T4_mirt))/4,
    diff_PAIR = sum(abs(T1_Input - T1_PAIR) + abs(T2_Input - T2_PAIR) + abs(T3_Input - T3_PAIR) + abs(T4_Input - T4_PAIR))/4
  ) %>%
  ungroup() %>%
  pivot_longer(
    cols = c(diff_eRm, diff_TAM,diff_mirt,diff_PAIR),
    names_to = c(NA, "Estimator"),
    names_sep = "_",
    values_to = "diff"
  )

# produce table with summary stats per estimator/package
results_params_diff_4t2 %>%
  group_by(Estimator) %>%
  summarise(
    Median = median(diff, na.rm = TRUE),
    MAD = mad(diff, na.rm = TRUE),
    IQR = IQR(diff, na.rm = TRUE),
    Mean = mean(diff, na.rm = TRUE),
    SD = sd(diff, na.rm = TRUE),
  ) %>%
  mutate(across(where(is.numeric), ~ round(.x,3))) %>%
  rename(Package = Estimator) %>%
  arrange(Median) %>%
  tt()
```

#### Summary table RMSE

```{r}
# RMSE
results_params_4t2 %>%
  pivot_wider(
    values_from = c("T1", "T2", "T3", "T4"),
    names_from = "Type",
    id_cols = c("iteration", "Item")
  ) %>%
  group_by(iteration, Item) %>% # sqrt(mean((data$actual - data$predicted)^2))
  summarise(
    rmse_eRm = sqrt(mean(((T1_Input - T1_eRm)^2) + ((T2_Input - T2_eRm)^2) + ((T3_Input - T3_eRm)^2) + ((T4_Input - T4_eRm))^2)),
        rmse_TAM = sqrt(mean(((T1_Input - T1_TAM)^2) + ((T2_Input - T2_TAM)^2) + ((T3_Input - T3_TAM)^2) + ((T4_Input - T4_TAM))^2)),
        rmse_mirt = sqrt(mean(((T1_Input - T1_mirt)^2) + ((T2_Input - T2_mirt)^2) + ((T3_Input - T3_mirt)^2) + ((T4_Input - T4_mirt))^2)),
        rmse_PAIR = sqrt(mean(((T1_Input - T1_PAIR)^2) + ((T2_Input - T2_PAIR)^2) + ((T3_Input - T3_PAIR)^2) + ((T4_Input - T4_PAIR))^2))
  ) %>%
  ungroup() %>%
  pivot_longer(
    cols = c(rmse_eRm, rmse_TAM,rmse_mirt,rmse_PAIR),
    names_to = c(NA, "Estimator"),
    names_sep = "_",
    values_to = "rmse"
  ) %>% 
  group_by(Estimator) %>%
  summarise(
    Median = median(rmse, na.rm = T),
    MAD = mad(rmse, na.rm = T),
    IQR = IQR(rmse, na.rm = T),
    Mean = mean(rmse, na.rm = T),
    SD = sd(rmse, na.rm = T)
  ) %>%
  mutate(across(where(is.numeric), ~ round(.x,3))) %>%
  rename(Package = Estimator) %>%
  arrange(Median) %>% 
  tt()
```


#### Thresholds summary MAE

```{r}
# table summarizing all thresholds and estimators
results_params_4t2 %>%
  pivot_wider(values_from = c("T1", "T2", "T3", "T4"),
              names_from = "Type",
              id_cols = c("iteration","Item")
  ) %>%
  group_by(iteration,Item) %>%
  summarise(
    diff_eRm.T1 = abs(T1_eRm - T1_Input),
    diff_eRm.T2 = abs(T2_eRm - T2_Input),
    diff_eRm.T3 = abs(T3_eRm - T3_Input),
    diff_eRm.T4 = abs(T4_eRm - T4_Input),
    diff_TAM.T1 = abs(T1_TAM - T1_Input),
    diff_TAM.T2 = abs(T2_TAM - T2_Input),
    diff_TAM.T3 = abs(T3_TAM - T3_Input),
    diff_TAM.T4 = abs(T4_TAM - T4_Input),
    diff_mirt.T1 = abs(T1_mirt - T1_Input),
    diff_mirt.T2 = abs(T2_mirt - T2_Input),
    diff_mirt.T3 = abs(T3_mirt - T3_Input),
    diff_mirt.T4 = abs(T4_mirt - T4_Input),
    diff_PAIR.T1 = abs(T1_PAIR - T1_Input),
    diff_PAIR.T2 = abs(T2_PAIR - T2_Input),
    diff_PAIR.T3 = abs(T3_PAIR - T3_Input),
    diff_PAIR.T4 = abs(T4_PAIR - T4_Input)
  ) %>%
  ungroup() %>%
  pivot_longer(cols = starts_with("diff"),
               names_to = c("Estimator","Threshold"),
               names_sep = "\\.",
               values_to = "diff") %>%
  mutate(Package = gsub("diff_","",Estimator)) %>%
  group_by(Package, Threshold) %>%
  summarise(
    Median = median(diff, na.rm = TRUE),
    MAD = mad(diff, na.rm = TRUE),
    IQR = IQR(diff, na.rm = TRUE),
    Mean = mean(diff, na.rm = TRUE),
    SD = sd(diff, na.rm = TRUE)
  ) %>%
  mutate(across(where(is.numeric), ~ round(.x,3))) %>%
  arrange(Threshold,Median) %>%
  tt() %>% 
  style_tt(i = c(1, 5, 9, 13), j = 2, rowspan = 4, alignv = "t")
```


#### Figure

```{r}
results_params_4t2 %>%
  pivot_wider(values_from = c("T1", "T2", "T3", "T4"),
              names_from = "Type",
              id_cols = c("iteration","Item")
  ) %>%
  group_by(iteration,Item) %>%
  summarise(
    diff_eRm.T1 = T1_eRm - T1_Input,
    diff_eRm.T2 = T2_eRm - T2_Input,
    diff_eRm.T3 = T3_eRm - T3_Input,
    diff_eRm.T4 = T4_eRm - T4_Input,
    diff_TAM.T1 = T1_TAM - T1_Input,
    diff_TAM.T2 = T2_TAM - T2_Input,
    diff_TAM.T3 = T3_TAM - T3_Input,
    diff_TAM.T4 = T4_TAM - T4_Input,
    diff_mirt.T1 = T1_mirt - T1_Input,
    diff_mirt.T2 = T2_mirt - T2_Input,
    diff_mirt.T3 = T3_mirt - T3_Input,
    diff_mirt.T4 = T4_mirt - T4_Input,
    diff_PAIR.T1 = T1_PAIR - T1_Input,
    diff_PAIR.T2 = T2_PAIR - T2_Input,
    diff_PAIR.T3 = T3_PAIR - T3_Input,
    diff_PAIR.T4 = T4_PAIR - T4_Input
  ) %>%
  ungroup() %>%
  pivot_longer(cols = starts_with("diff"),
               names_to = c("Estimator","Threshold"),
               names_sep = "\\.",
               values_to = "diff") %>%
  mutate(Package = gsub("diff_","",Estimator)) %>%
  mutate(Threshold = car::recode(Threshold,"'T1'='Threshold 1';'T2'='Threshold 2';'T3'='Threshold 3';'T4'='Threshold 4';")) %>%
  ggplot(aes(x = diff, y = Package, slab_fill = after_stat(level))) +
  stat_dotsinterval(quantiles = 250, point_interval = "median_qi",
                    layout = "weave", slab_color = NA, .width = c(.66,.95)) +
  labs(caption = str_wrap("Point interval: median_qi (.66 and .95). Based on 250 simulated datasets with 9 items and 108 respondents each."),
       x = "Bias (logit scale)",
       y = "Package",
       title = "Distribution of item threshold estimation bias",
       subtitle = "Rasch Partial Credit Model") +
  scale_color_manual(guide = "none", values = scales::brewer_pal()(3)[-1], aesthetics = "slab_fill") +
  facet_wrap(~Threshold, nrow = 1) +
  geom_vline(xintercept = 0, linetype = "dashed") +
  theme_minimal()

```

## Even smaller sample

n = 54, 2 per threshold estimated.

```{r}
registerDoParallel(cores = 8)
iterations = 250
samplesize = 54 # n = 2 per threshold (instead of 20/threshold)

sim_results_4t3 <- list()

sim_results_4t3 <- foreach(i = 1:iterations) %dopar% {

  input_params <- sim_data_4t[[i]]$input_params
  testData <- sim_data_4t[[i]]$data[sample(1:720, samplesize), ]

  # check mean/centrality (item parameters should have been centered before data generation)
  input_params_correction <- mean(input_params)

  # eRm
  erm_out <- PCM(testData)
  erm_params <- thresholds(erm_out)[[3]][[1]][,-1] %>%
    as.data.frame() %>%
    set_names(c("T1","T2","T3","T4")) %>%
    as.matrix()

  erm_params_c <- erm_params - mean(erm_params) + input_params_correction

  # TAM
  tam_out <- tam(as.matrix(testData), irtmodel = "PCM", verbose = FALSE)

  tam_params <- tam_out$item_irt %>%
    as.data.frame() %>%
    select(starts_with("tau")) %>%
    set_names(c("T1","T2","T3","T4")) %>%
    as.matrix()

  tam_params_c <- tam_params - mean(tam_params) + input_params_correction

  # mirt
  mirt_out <- mirt(data = testData, model = 1, itemtype = "Rasch", verbose = FALSE)
  mirt_params <- coef(mirt_out, simplify = TRUE, IRTpars = TRUE)$items %>%
    as.data.frame() %>%
    select(!a) %>%
    set_names(c("T1","T2","T3","T4")) %>%
    as.matrix()
  mirt_params_c <- mirt_params - mean(mirt_params) + input_params_correction

  # PAIR
  pair_out <- pair(testData, m = 5)
  pair_params <- deltapar(pair_out) %>%
    as.data.frame()

  pair_params <- pair_params[,-1] %>%
    set_names(c("T1","T2","T3","T4")) %>%
    as.matrix()

  pair_params_c <- pair_params - mean(pair_params) + input_params_correction

  # combine item parameters and calculate lowest to highest threshold distances
  input_params %>%
    as.data.frame() %>%
    rename(T1 = V1,
           T2 = V2,
           T3 = V3,
           T4 = V4) %>%
    add_column(Item = c(1:9),
               Type = "Input") %>%
    bind_rows(
      erm_params_c %>%
        as.data.frame() %>%
        add_column(Item = c(1:9),
                   Type = "eRm")
    ) %>%
    bind_rows(
      tam_params_c %>%
        as.data.frame() %>%
        add_column(Item = c(1:9),
                   Type = "TAM")
    ) %>%
    bind_rows(
      mirt_params_c %>%
        as.data.frame() %>%
        add_column(Item = c(1:9),
                   Type = "mirt")
    ) %>%
    bind_rows(
      pair_params_c %>%
        as.data.frame() %>%
        add_column(Item = c(1:9),
                   Type = "PAIR")
    ) %>%
    remove_rownames()
}
```

### Results

```{r}
# collect item threshold parameters to a single dataframe
results_params_4t3 <- map_df(1:iterations, ~ sim_results_4t3[[.x]] %>%
                               mutate(iteration = .x))

```

#### Summary table MAE

```{r}
# calculate absolute differences between input and estimated item thresholds per estimator, average by item
results_params_diff_4t3 <- results_params_4t3 %>%
  pivot_wider(
    values_from = c("T1", "T2", "T3", "T4"),
    names_from = "Type",
    id_cols = c("iteration", "Item")
  ) %>%
  group_by(iteration, Item) %>%
  summarise(
    diff_eRm = sum(abs(T1_Input - T1_eRm) + abs(T2_Input - T2_eRm) + abs(T3_Input - T3_eRm) + abs(T4_Input - T4_eRm))/4,
    diff_TAM = sum(abs(T1_Input - T1_TAM) + abs(T2_Input - T2_TAM) + abs(T3_Input - T3_TAM) + abs(T4_Input - T4_TAM))/4,
    diff_mirt = sum(abs(T1_Input - T1_mirt) + abs(T2_Input - T2_mirt) + abs(T3_Input - T3_mirt) + abs(T4_Input - T4_mirt))/4,
    diff_PAIR = sum(abs(T1_Input - T1_PAIR) + abs(T2_Input - T2_PAIR) + abs(T3_Input - T3_PAIR) + abs(T4_Input - T4_PAIR))/4
  ) %>%
  ungroup() %>%
  pivot_longer(
    cols = c(diff_eRm, diff_TAM,diff_mirt,diff_PAIR),
    names_to = c(NA, "Estimator"),
    names_sep = "_",
    values_to = "diff"
  )

# produce table with summary stats per estimator/package
results_params_diff_4t3 %>%
  group_by(Estimator) %>%
  summarise(
    Median = median(diff, na.rm = TRUE),
    MAD = mad(diff, na.rm = TRUE),
    IQR = IQR(diff, na.rm = TRUE),
    Mean = mean(diff, na.rm = TRUE),
    SD = sd(diff, na.rm = TRUE),
  ) %>%
  mutate(across(where(is.numeric), ~ round(.x,3))) %>%
  rename(Package = Estimator) %>%
  arrange(Median) %>%
  tt()
```

#### Summary table RMSE

```{r}
# RMSE
results_params_4t3 %>%
  pivot_wider(
    values_from = c("T1", "T2", "T3", "T4"),
    names_from = "Type",
    id_cols = c("iteration", "Item")
  ) %>%
  group_by(iteration, Item) %>% # sqrt(mean((data$actual - data$predicted)^2))
  summarise(
    rmse_eRm = sqrt(mean(((T1_Input - T1_eRm)^2) + ((T2_Input - T2_eRm)^2) + ((T3_Input - T3_eRm)^2) + ((T4_Input - T4_eRm))^2)),
        rmse_TAM = sqrt(mean(((T1_Input - T1_TAM)^2) + ((T2_Input - T2_TAM)^2) + ((T3_Input - T3_TAM)^2) + ((T4_Input - T4_TAM))^2)),
        rmse_mirt = sqrt(mean(((T1_Input - T1_mirt)^2) + ((T2_Input - T2_mirt)^2) + ((T3_Input - T3_mirt)^2) + ((T4_Input - T4_mirt))^2)),
        rmse_PAIR = sqrt(mean(((T1_Input - T1_PAIR)^2) + ((T2_Input - T2_PAIR)^2) + ((T3_Input - T3_PAIR)^2) + ((T4_Input - T4_PAIR))^2))
  ) %>%
  ungroup() %>%
  pivot_longer(
    cols = c(rmse_eRm, rmse_TAM,rmse_mirt,rmse_PAIR),
    names_to = c(NA, "Estimator"),
    names_sep = "_",
    values_to = "rmse"
  ) %>% 
  group_by(Estimator) %>%
  summarise(
    Median = median(rmse, na.rm = T),
    MAD = mad(rmse, na.rm = T),
    IQR = IQR(rmse, na.rm = T),
    Mean = mean(rmse, na.rm = T),
    SD = sd(rmse, na.rm = T)
  ) %>%
  mutate(across(where(is.numeric), ~ round(.x,3))) %>%
  rename(Package = Estimator) %>%
  arrange(Median) %>% 
  tt()
```

#### Thresholds summary MAE

```{r}
# table summarizing all thresholds and estimators
results_params_4t3 %>%
  pivot_wider(values_from = c("T1", "T2", "T3", "T4"),
              names_from = "Type",
              id_cols = c("iteration","Item")
  ) %>%
  group_by(iteration,Item) %>%
  summarise(
    diff_eRm.T1 = abs(T1_eRm - T1_Input),
    diff_eRm.T2 = abs(T2_eRm - T2_Input),
    diff_eRm.T3 = abs(T3_eRm - T3_Input),
    diff_eRm.T4 = abs(T4_eRm - T4_Input),
    diff_TAM.T1 = abs(T1_TAM - T1_Input),
    diff_TAM.T2 = abs(T2_TAM - T2_Input),
    diff_TAM.T3 = abs(T3_TAM - T3_Input),
    diff_TAM.T4 = abs(T4_TAM - T4_Input),
    diff_mirt.T1 = abs(T1_mirt - T1_Input),
    diff_mirt.T2 = abs(T2_mirt - T2_Input),
    diff_mirt.T3 = abs(T3_mirt - T3_Input),
    diff_mirt.T4 = abs(T4_mirt - T4_Input),
    diff_PAIR.T1 = abs(T1_PAIR - T1_Input),
    diff_PAIR.T2 = abs(T2_PAIR - T2_Input),
    diff_PAIR.T3 = abs(T3_PAIR - T3_Input),
    diff_PAIR.T4 = abs(T4_PAIR - T4_Input)
  ) %>%
  ungroup() %>%
  pivot_longer(cols = starts_with("diff"),
               names_to = c("Estimator","Threshold"),
               names_sep = "\\.",
               values_to = "diff") %>%
  mutate(Package = gsub("diff_","",Estimator)) %>%
  group_by(Package, Threshold) %>%
  summarise(
    Median = median(diff, na.rm = TRUE),
    MAD = mad(diff, na.rm = TRUE),
    IQR = IQR(diff, na.rm = TRUE),
    Mean = mean(diff, na.rm = TRUE),
    SD = sd(diff, na.rm = TRUE)
  ) %>%
  mutate(across(where(is.numeric), ~ round(.x,3))) %>%
  arrange(Threshold,Median) %>%
  tt() %>% 
  style_tt(i = c(1, 5, 9, 13), j = 2, rowspan = 4, alignv = "t")
```


#### Figure

```{r}
results_params_4t3 %>%
  pivot_wider(values_from = c("T1", "T2", "T3", "T4"),
              names_from = "Type",
              id_cols = c("iteration","Item")
  ) %>%
  group_by(iteration,Item) %>%
  summarise(
    diff_eRm.T1 = T1_eRm - T1_Input,
    diff_eRm.T2 = T2_eRm - T2_Input,
    diff_eRm.T3 = T3_eRm - T3_Input,
    diff_eRm.T4 = T4_eRm - T4_Input,
    diff_TAM.T1 = T1_TAM - T1_Input,
    diff_TAM.T2 = T2_TAM - T2_Input,
    diff_TAM.T3 = T3_TAM - T3_Input,
    diff_TAM.T4 = T4_TAM - T4_Input,
    diff_mirt.T1 = T1_mirt - T1_Input,
    diff_mirt.T2 = T2_mirt - T2_Input,
    diff_mirt.T3 = T3_mirt - T3_Input,
    diff_mirt.T4 = T4_mirt - T4_Input,
    diff_PAIR.T1 = T1_PAIR - T1_Input,
    diff_PAIR.T2 = T2_PAIR - T2_Input,
    diff_PAIR.T3 = T3_PAIR - T3_Input,
    diff_PAIR.T4 = T4_PAIR - T4_Input
  ) %>%
  ungroup() %>%
  pivot_longer(cols = starts_with("diff"),
               names_to = c("Estimator","Threshold"),
               names_sep = "\\.",
               values_to = "diff") %>%
  mutate(Package = gsub("diff_","",Estimator)) %>%
  mutate(Threshold = car::recode(Threshold,"'T1'='Threshold 1';'T2'='Threshold 2';'T3'='Threshold 3';'T4'='Threshold 4';")) %>%
  ggplot(aes(x = diff, y = Package, slab_fill = after_stat(level))) +
  stat_dotsinterval(quantiles = 250, point_interval = "median_qi",
                    layout = "weave", slab_color = NA, .width = c(.66,.95)) +
  labs(caption = str_wrap("Point interval: median_qi (.66 and .95). Based on 250 simulated datasets with 9 items and 54 respondents each. X-axis omits extreme values."),
       x = "Bias (logit scale)",
       y = "Package",
       title = "Distribution of item threshold estimation bias",
       subtitle = "Rasch Partial Credit Model") +
  scale_color_manual(guide = "none", values = scales::brewer_pal()(3)[-1], aesthetics = "slab_fill") +
  facet_wrap(~Threshold, nrow = 1) +
  geom_vline(xintercept = 0, linetype = "dashed") +
  theme_minimal() +
  coord_cartesian(xlim = c(-2.5,2.5))

```

```{r}
sessionInfo()
```

## References