Purposes of writing this text:

- to provide an applied example of how to estimate latent/factor scores (thetas) and their Standard Error of Measurement (SEM),
- to demonstrate how SEM relates to the true score, based on simulation, and
- to test the functions in brms that enable one to specify measurement error in a regression model.
This post is mainly about the Standard Error of Measurement (SEM) metric estimated by Item Response Theory (IRT)/Rasch theta estimation tools, but since this will probably not be familiar to most readers I will provide some additional information and context.
I write this post primarily from the perspective of psychological measurement using questionnaires with multiple items to assess a latent variable, such as a depression or well-being questionnaire.
First, I need to point out that the Rasch measurement model allows one to estimate reliability properties of the set of items themselves, independent of a sample. This may seem like a strange thing to many readers, especially if you mostly have experience with psychometrics within the Classical Test Theory (CTT) paradigm (factor analysis and other related methods).
Second, the estimated SEM in Rasch/IRT varies across the latent continuum; it is not a constant point estimate, nor is it the same for every individual (Samejima 1994). The Fisher information function is used to estimate a standard error that depends on the location on the latent continuum (Kreiner and Christensen 2012). This is often illustrated with a curve showing the Test Information Function (TIF; see Figure 6). The connection between item threshold locations and the TIF can also be seen by reviewing the targeting figure (Figure 3) and its middle section, which aggregates the threshold locations.
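In other words, the (asymptotic) standard error at a given theta is approximately the inverse square root of the test information at that location, where the test information is the sum of the item information functions. This is a standard IRT result, stated here for reference:

\[
\mathrm{SEM}(\hat{\theta}) \approx \frac{1}{\sqrt{I(\hat{\theta})}}, \qquad I(\theta) = \sum_{i=1}^{k} I_i(\theta),
\]

where \(I_i(\theta)\) is the information function of item \(i\) in a test of \(k\) items.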
To summarise, here are two things regarding CTT/CFA methods that are well known but very seldom discussed:
- reliability within CTT is sample-dependent. You have most likely never read a CTT psychometrics paper that provides information about the reliability of the measure or test itself.
  - This is due to the inability of the measurement model to separate person and item properties (see “specific objectivity”).
  - This is also a problem with IRT models with more than one parameter.[1]
- reliability within CTT is always[2] a constant value claimed to be the same for all participants at all levels of the latent variable. (I think this is an unreasonable assumption.)
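For contrast, a brief reminder of why CTT measurement error is a single constant: the classical SEM is derived from one sample-level reliability estimate and one sample standard deviation, and is then applied identically to every respondent,

\[
\mathrm{SEM}_{\mathrm{CTT}} = \sigma_X \sqrt{1 - r_{XX'}},
\]

where \(\sigma_X\) is the standard deviation of observed scores in the sample and \(r_{XX'}\) is the reliability estimate (e.g., alpha or omega).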
There has been a lot of discussion within CTT about reliability metrics during the last 15 years or so, mostly justified critique of Cronbach’s alpha (e.g., Sijtsma 2008; McNeish 2018; Flora 2020; Cortina et al. 2020). However, these critiques mostly propose other sample-dependent metrics that provide a point estimate of reliability (some also argue for presenting some form of confidence interval).
This is not to say that we should not talk about sample reliability, we absolutely should! But it is important to differentiate between test/measure properties and sample properties. This is particularly important in the context of psychometric analyses and the “validation” of measures, where the intent usually is for others to be able to make use of a measure.
If you are interested in learning more about psychometrics and IRT/Rasch measurement models, you might find these resources helpful:
Intro slides that are rather verbose and should help clarify the basic concepts.
Our preprint on psychometric criteria, which also includes a Rasch psychometric analysis reporting checklist.
2 Setup

Many of the functions used in this post are available in my R package for Rasch analysis, RISEkbmRasch, including the function that simulates response data. The package is not available on CRAN; see below (or the link) for installation instructions.
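Installation is done from GitHub, for example via devtools. This mirrors the setup code from the post (only the packages used in the code shown below are loaded here):

```r
# install devtools first, if needed
# install.packages("devtools")
devtools::install_github("pgmj/RISEkbmRasch", dependencies = TRUE)

library(RISEkbmRasch) # this also loads several other packages, e.g. eRm and catR
library(tidyverse)
library(eRm)
library(catR)
library(faux)
library(lme4)
library(brms)
```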
3 Ordinal/interval transformation table

The RIscoreSE() function provides us with a lookup table (Table 1) for transforming ordinal sum scores from the seven items to logit scores, with their respective SEM. Multiplying the SEM by 1.96 gives us a 95% confidence interval. Logit scores and SEM are estimated using the Weighted Likelihood (WL) method (Warm 1989), which is less biased than Maximum Likelihood (ML) estimation. WL is virtually unbiased in the middle of the scale, but more biased than ML at extreme theta values (Kreiner and Christensen 2012).

The lookup table works for respondents with complete response data. If you have respondents with missing responses, the estimation method we will use below takes this into account when estimating the SEM. No need for imputation.
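The transformation table can also be stored as a data frame (this object is used in the figure code further down). As a minimal sketch of how it can be used, the 95% interval for each score level is simply the logit score plus or minus 1.96 times its SEM; the ci_lo/ci_hi columns below are my own illustration, not part of the original post:

```r
# transformation table as a data frame; df holds the seven PSS item responses
ord_int_table <- RIscoreSE(df, score_range = c(-5, 5), output = "dataframe") %>%
  janitor::clean_names()

# illustration: 95% confidence limits per score level
ord_int_table %>%
  mutate(ci_lo = logit_score - 1.96 * logit_std_error,
         ci_hi = logit_score + 1.96 * logit_std_error)
```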
4 Simulation

We’ll now simulate a dataset with 2000 respondents based on the item parameters and a vector of latent/factor scores (thetas). Then we’ll estimate thetas and SEM from the generated response data. Finally, we’ll look at the coverage of the estimated values compared to the input vector of scores.
Note
This is just a single simulated dataset for illustration, just as the regression example below also only uses one dataset. This is not a simulation study meant to show generalizable results, so the results should be interpreted with care. However, the code provided could be helpful if you want to conduct such a study.
4.1 Input parameters
The item parameters need to be in a matrix object, with items as rows and their thresholds (which may vary in number) as columns.
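In the post, this matrix is read from a CSV file of previously estimated item parameters (the file path refers to the original post’s repository):

```r
# read item parameters file; items as rows, thresholds as columns
item_params <- read_csv("data/itemParameters.csv") %>%
  as.matrix()
item_params
```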
4.3 Simulating data

Now we can use the item parameters to simulate data. We’ll use rnorm() with a mean equal to the mean of the item parameters (about 0.57) and SD = 1.5 to generate 2000 theta values. These are then used to simulate response data for the 7 items.
```r
set.seed(1437)

# simulation function needs item parameters as a list object
itemlist <- list(
  i1 = list(item_params[1, ]),
  i2 = list(item_params[2, ]),
  i3 = list(item_params[3, ]),
  i4 = list(item_params[4, ]),
  i5 = list(item_params[5, ]),
  i6 = list(item_params[6, ]),
  i7 = list(item_params[7, ])
)

# randomly generated normal distribution of thetas
nsim <- 2000
inputThetas <- rnorm(
  n = nsim,
  mean = mean(item_params), # 0.57
  sd = 1.5
)

# simulate response data based on thetas and items above
simData <- SimPartialScore(
  deltaslist = itemlist,
  thetavec = inputThetas
) %>%
  as.data.frame()
```
4.4 Estimating theta & SEM
Next, we estimate theta values and their SEM from the response data, using the known item parameters. This is also how you would (and should) use a calibrated item set with real data.
```r
# estimate theta values
est_thetas <- RIestThetas(simData,
                          itemParams = item_params,
                          theta_range = c(-5, 5))

# estimate SEM for each estimated theta
est_sem <- map_vec(est_thetas,
                   ~ semTheta(thEst = .x,
                              it = item_params,
                              model = "PCM",
                              method = "WL",
                              range = c(-5, 5)))
```
5 Results
```r
# get all variables in the same dataframe
sim_data <- simData %>%
  add_column(est_thetas = est_thetas,
             est_sem = est_sem,
             input_thetas = inputThetas)
```
As you can see, the generated sample of “input_thetas” has a continuous gaussian distribution, while the estimated thetas are limited to certain values. This is due to the number and distribution of items and item category thresholds (see Figure 3).
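Two additions here. First, a quick check of the point above: under a Rasch-family model with complete response data, the theta estimate is a function of the raw sum score only, so the number of distinct estimated thetas should match the number of distinct sum scores. Second, the coverage computation from the original post (lightly adapted to plain printing), which also defines the uncov value used in the caption of the figure below:

```r
# check: distinct estimated theta values vs distinct raw sum scores
n_distinct(sim_data$est_thetas)
n_distinct(rowSums(simData))

# coverage of 75/90/95% intervals (from the original post)
cov_data <- sim_data %>%
  mutate(absdiff = abs(est_thetas - input_thetas)) %>%
  mutate(coverage_95 = factor(ifelse(absdiff < 1.96 * est_sem, 1, 0)),
         coverage_90 = factor(ifelse(absdiff < 1.645 * est_sem, 1, 0)),
         coverage_75 = factor(ifelse(absdiff < 1.15 * est_sem, 1, 0))) %>%
  pivot_longer(cols = c(coverage_95, coverage_90, coverage_75),
               names_to = "coverage",
               values_to = "coverage_n") %>%
  count(coverage_n, by = coverage) %>%
  filter(coverage_n == 1) %>%
  mutate(`Actual coverage` = n / nsim * 100, .before = "n") %>%
  select(!c(coverage_n, n, by)) %>%
  add_column(`Expected coverage` = c("75%", "90%", "95%"))

uncov <- 100 - cov_data[3, 1] %>% pull()
cov_data
```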
```r
sim_data %>%
  mutate(error = est_thetas - input_thetas,
         absdiff = abs(est_thetas - input_thetas),
         coverage_95 = factor(ifelse(absdiff < 1.96 * est_sem, 1, 0))) %>%
  mutate_if(is.numeric, round, 2) %>%
  left_join(.,
            ord_int_table %>% mutate_if(is.numeric, round, 2),
            by = join_by("est_thetas" == "logit_score")) %>%
  ggplot(aes(x = est_thetas, y = error)) +
  geom_errorbar(aes(ymin = 0 - (1.96 * logit_std_error),
                    ymax = 0 + (1.96 * logit_std_error)),
                width = 0.1, color = "darkgrey", alpha = 0.3) +
  geom_hline(yintercept = 0, linetype = "dashed") +
  geom_point(aes(color = coverage_95), alpha = 0.6) +
  labs(title = "",
       x = "Estimated theta values",
       y = "Measurement error") +
  guides(color = "none") +
  labs(caption = paste0("Red dots (", round(uncov, 2), "%) indicate estimated thetas outside of 95% confidence interval (SEM * 1.96, indicated by grey error bars).")) +
  theme_rise()
```
5.3 SEM and Mean Absolute Error (MAE)
```r
sim_data %>%
  mutate(absdiff = abs(est_thetas - input_thetas)) %>%
  pivot_longer(cols = c(absdiff, est_sem)) %>%
  mutate(name = dplyr::recode(name,
                              "est_sem" = "SEM",
                              "absdiff" = "MAE")) %>%
  ggplot(aes(x = est_thetas, y = value, color = name, group = name)) +
  geom_point() +
  guides(fill = "none") +
  labs(x = "Estimated theta value",
       y = "Error")
```
6 Regression with measurement error

This is just exploratory, and I am no expert in Bayesian models, so please let me know if something should be modified or if there are other models/tools that should be considered.
There is, as far as I know, no implementation of a frequentist regression tool in R that allows one to specify a vector of measurement errors. I have been looking into this recently and found some interesting R packages (galamm, mecor, and simex), but none seem to be (easily) applicable to this type of data and setup. There are also reasonable arguments for using regression models that accommodate heteroscedastic errors (Wang, Xu, and Zhang 2019), and this approach might be added to this text in a future revision.
Wang and colleagues (2019) state that:

> It is known that ignoring the measurement error in \(\hat{\theta}\) when \(\hat{\theta}\) is treated as a dependent variable still yields a consistent and unbiased estimate of fixed effects (i.e., \(\hat{\beta}\)), but the standard error of \(\hat{\beta}\) will be inflated, and the random effects estimates (i.e., \(\hat{\Sigma}_u\)) as well as residual variances will be distorted. (p. 691)
We’ll simulate new data with two “time points”, n = 200 each, that have mean theta locations 1 logit apart, SD = 1.5 for both groups, and ICC = 0.5. Then we’ll run a mixed model to see how well we can recover the input parameters. Since we only have two time points, we’ll use random intercepts for id.
```r
# randomly generated normal distributions of thetas with correlation
data <- rnorm_multi(n = 200,
                    mu = c(-0.5, +0.5),
                    sd = c(1.5, 1.5),
                    r = 0.5,
                    varnames = c("pre", "post"),
                    empirical = FALSE)

# light wrangling to get long format and an id variable
data_long <- data %>%
  add_column(id = 1:(nrow(.))) %>%
  pivot_longer(cols = c("pre", "post"),
               names_to = "time",
               values_to = "input_theta") %>%
  mutate(time = dplyr::recode(time, "pre" = 0, "post" = 1))

# simulate response data based on thetas and items above
sim_data_g <- SimPartialScore(
  deltaslist = itemlist,
  thetavec = data_long$input_theta
) %>%
  as.data.frame()

# estimate theta values
sim_data_g$est_theta <- RIestThetas(sim_data_g,
                                    itemParams = item_params,
                                    theta_range = c(-5, 5))

# estimate SEM
sim_data_g$est_sem <- map_vec(sim_data_g$est_theta,
                              ~ semTheta(thEst = .x,
                                         it = item_params,
                                         model = "PCM",
                                         method = "WL",
                                         range = c(-5, 5)))

sim_data_g <- cbind(sim_data_g, data_long)
```
6.2 Models

We’ll first fit some models without measurement error, with both lmer() and brm(), starting with the input thetas to get a reference model for each. I learnt about measurement error and brms in this excellent online book by Solomon Kurz: https://bookdown.org/content/4857/missing-data-and-other-opportunities.html#measurement-error

There are three different ways to specify measurement error in brms: se(), mi(), and me(). The last one seems to be relevant only when you have measurement error on both predictors and outcome, so we will use the other two.
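For completeness, a hypothetical formula-only sketch of the me() syntax, which attaches a known standard deviation to a predictor; y, x, and sd_x are made-up names, not variables from this post:

```r
library(brms)

# me() models measurement error in a predictor (formula only, not fitted here)
f_me <- bf(y ~ me(x, sd_x) + (1 | id))
f_me
```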
6.2.1 lme4
```r
# reference model with input thetas
lm_fit0 <- lmer(input_theta ~ time + (1 | id),
                data = sim_data_g, REML = TRUE)

# estimated thetas
lm_fit1 <- lmer(est_theta ~ time + (1 | id),
                data = sim_data_g, REML = TRUE)
```
6.2.2 brms
```r
# put the data into a `list()`
d <- list(
  input_theta = sim_data_g$input_theta,
  est_theta = sim_data_g$est_theta,
  est_sem = sim_data_g$est_sem,
  time = sim_data_g$time,
  id = sim_data_g$id
)

# Input/simulated thetas
b0_fit <- brm(data = d,
              family = gaussian,
              input_theta ~ time + (1 | id),
              prior = c(prior(normal(0, 3), class = Intercept),
                        prior(normal(0, 2), class = b),
                        prior(exponential(1), class = sigma),
                        prior(exponential(1 / 0.463), class = sd)),
              iter = 2000, warmup = 1000, cores = 4, chains = 4,
              seed = 15)

# With estimated thetas
b1_fit <- brm(data = d,
              family = gaussian,
              est_theta ~ time + (1 | id),
              prior = c(prior(normal(0, 3), class = Intercept),
                        prior(normal(0, 2), class = b),
                        prior(exponential(1), class = sigma),
                        prior(exponential(1 / 0.463), class = sd)),
              iter = 2000, warmup = 1000, cores = 4, chains = 4,
              seed = 15)

# With measurement error using se()
b2_fit <- brm(data = d,
              family = gaussian,
              est_theta | se(est_sem, sigma = TRUE) ~ time + (1 | id),
              prior = c(prior(normal(0, 3), class = Intercept),
                        prior(normal(0, 2), class = b),
                        prior(exponential(1), class = sigma),
                        prior(exponential(1 / 0.463), class = sd)),
              iter = 2000, warmup = 1000, cores = 4, chains = 4,
              seed = 15)

# With measurement error using mi()
b3_fit <- brm(data = d,
              family = gaussian,
              est_theta | mi(est_sem) ~ time + (1 | id),
              prior = c(prior(normal(0, 3), class = Intercept),
                        prior(normal(0, 2), class = b),
                        prior(exponential(1), class = sigma),
                        prior(exponential(1 / 0.463), class = sd)),
              iter = 2000, warmup = 1000, cores = 4, chains = 4,
              seed = 15)
```
References

Allaire, JJ, Yihui Xie, Christophe Dervieux, Jonathan McPherson, Javier Luraschi, Kevin Ushey, Aron Atkins, et al. 2023. rmarkdown: Dynamic Documents for R. https://github.com/rstudio/rmarkdown.

Allen, Micah, Davide Poggiali, Kirstie Whitaker, Tom Rhys Marshall, Jordy van Langen, and Rogier A. Kievit. 2021. “Raincloud Plots: A Multi-Platform Tool for Robust Data Visualization [Version 2; Peer Review: 2 Approved].” Wellcome Open Research 4 (63). https://doi.org/10.12688/wellcomeopenres.15191.2.

Arel-Bundock, Vincent. 2022. “modelsummary: Data and Model Summaries in R.” Journal of Statistical Software 103 (1): 1–23. https://doi.org/10.18637/jss.v103.i01.

Bates, Douglas, Martin Mächler, Ben Bolker, and Steve Walker. 2015. “Fitting Linear Mixed-Effects Models Using lme4.” Journal of Statistical Software 67 (1): 1–48. https://doi.org/10.18637/jss.v067.i01.

Bürkner, Paul-Christian. 2017. “brms: An R Package for Bayesian Multilevel Models Using Stan.” Journal of Statistical Software 80 (1): 1–28. https://doi.org/10.18637/jss.v080.i01.

———. 2021. “Bayesian Item Response Modeling in R with brms and Stan.” Journal of Statistical Software 100 (5): 1–54. https://doi.org/10.18637/jss.v100.i05.

Cortina, Jose M., Zitong Sheng, Sheila K. Keener, Kathleen R. Keeler, Leah K. Grubb, Neal Schmitt, Scott Tonidandel, Karoline M. Summerville, Eric D. Heggestad, and George C. Banks. 2020. “From Alpha to Omega and Beyond! A Look at the Past, Present, and (Possible) Future of Psychometric Soundness in the Journal of Applied Psychology.” Journal of Applied Psychology 105 (12): 1351–81. https://doi.org/10.1037/apl0000815.

Debelak, Rudolf, and Ingrid Koller. 2019. “Testing the Local Independence Assumption of the Rasch Model with Q3-Based Nonparametric Model Tests.” Applied Psychological Measurement 44. https://doi.org/10.1177/0146621619835501.

Flora, David B. 2020. “Your Coefficient Alpha Is Probably Wrong, but Which Coefficient Omega Is Right? A Tutorial on Using R to Obtain Better Reliability Estimates.” Advances in Methods and Practices in Psychological Science 3 (4): 484–501. https://doi.org/10.1177/2515245920951747.

Hatzinger, Reinhold, and Thomas Rusch. 2009. “IRT Models with Relaxed Assumptions in eRm: A Manual-Like Instruction.” Psychology Science Quarterly 51.

Jiao, Hong. 2023. “Comparison of Different Approaches to Dealing with Guessing in Rasch Modeling.” Psychological Test and Assessment Modeling 64 (1): 65–86.

Koller, Ingrid, Marco Maier, and Reinhold Hatzinger. 2015. “An Empirical Power Analysis of Quasi-Exact Tests for the Rasch Model: Measurement Invariance in Small Samples.” Methodology 11. https://doi.org/10.1027/1614-2241/a000090.

Kreiner, Svend, and Karl Bang Christensen. 2012. “Person Parameter Estimation and Measurement in Rasch Models.” In, 63–78. John Wiley & Sons, Ltd. https://doi.org/10.1002/9781118574454.ch4.

Kroc, Edward, and Bruno D. Zumbo. 2020. “A Transdisciplinary View of Measurement Error Models and the Variations of X = T + E.” Journal of Mathematical Psychology 98 (September): 102372. https://doi.org/10.1016/j.jmp.2020.102372.

Magis, David, and Juan Ramon Barrada. 2017. “Computerized Adaptive Testing with R: Recent Updates of the Package catR.” Journal of Statistical Software, Code Snippets 76 (1): 1–19. https://doi.org/10.18637/jss.v076.c01.

Magis, David, and Gilles Raîche. 2012. “Random Generation of Response Patterns Under Computerized Adaptive Testing with the R Package catR.” Journal of Statistical Software 48 (8): 1–31. https://doi.org/10.18637/jss.v048.i08.

Mair, Patrick, and Reinhold Hatzinger. 2007a. “CML Based Estimation of Extended Rasch Models with the eRm Package in R.” Psychology Science 49. https://doi.org/10.18637/jss.v020.i09.

———. 2007b. “Extended Rasch Modeling: The eRm Package for the Application of IRT Models in R.” Journal of Statistical Software 20. https://doi.org/10.18637/jss.v020.i09.

McNeish, Daniel. 2018. “Thanks Coefficient Alpha, We’ll Take It from Here.” Psychological Methods 23 (3): 412–33. https://doi.org/10.1037/met0000144.

R Core Team. 2024. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.

Rozental, Alexander, David Forsström, and Magnus Johansson. 2023. “A Psychometric Evaluation of the Swedish Translation of the Perceived Stress Scale: A Rasch Analysis.” BMC Psychiatry 23 (1): 690. https://doi.org/10.1186/s12888-023-05162-4.

Rusch, Thomas, Marco Maier, and Reinhold Hatzinger. 2013. “Linear Logistic Models with Relaxed Assumptions in R.” In Algorithms from and for Nature and Life, edited by Berthold Lausen, Dirk van den Poel, and Alfred Ultsch. Studies in Classification, Data Analysis, and Knowledge Organization. New York: Springer. https://doi.org/10.1007/978-3-319-00035-0_34.

Samejima, Fumiko. 1994. “Estimation of Reliability Coefficients Using the Test Information Function and Its Modifications.” Applied Psychological Measurement 18 (3): 229–44. https://doi.org/10.1177/014662169401800304.

Sijtsma, Klaas. 2008. “On the Use, the Misuse, and the Very Limited Usefulness of Cronbach’s Alpha.” Psychometrika 74 (1): 107. https://doi.org/10.1007/s11336-008-9101-0.

Stemler, Steven E., and Adam Naples. 2021. “Rasch Measurement v. Item Response Theory: Knowing When to Cross the Line.” Practical Assessment, Research, and Evaluation 26 (11). https://doi.org/10.7275/V2GD-4441.

Wang, Chun, Gongjun Xu, and Xue Zhang. 2019. “Correction for Item Response Theory Latent Trait Measurement Error in Linear Mixed Effects Models.” Psychometrika 84 (3): 673–700. https://doi.org/10.1007/s11336-019-09672-7.

Warm, Thomas A. 1989. “Weighted Likelihood Estimation of Ability in Item Response Theory.” Psychometrika 54 (3): 427–50. https://doi.org/10.1007/BF02294627.

Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain François, Garrett Grolemund, et al. 2019. “Welcome to the tidyverse.” Journal of Open Source Software 4 (43): 1686. https://doi.org/10.21105/joss.01686.

Xie, Yihui, J. J. Allaire, and Garrett Grolemund. 2018. R Markdown: The Definitive Guide. Boca Raton, Florida: Chapman and Hall/CRC. https://bookdown.org/yihui/rmarkdown.
Footnotes

[1] This is because the discrimination parameter (2PL, GRM, etc.) is also sample specific/dependent. See Stemler & Naples (2021), figure 8, for a nice explanation and case example. One exception exists for the 3PL model: if the pseudo-guessing parameter is constant across all items, sample independence is upheld (Jiao 2023).

[2] Except CME/BME methods (Kroc and Zumbo 2020), but I have never seen those applied or reported in psychometric CTT papers.