5  GAD-7

Published

2025-06-16

Code
library(tidyverse)
library(janitor)
library(skimr)
library(easyRasch)
library(haven)

source("ki.R")
theme_set(theme_ki())

### some commands exist in multiple packages, here we define preferred ones that are frequently used
select <- dplyr::select
count <- dplyr::count
recode <- car::recode
rename <- dplyr::rename

d.all <- read_spss("data/SUVRTW_LONG 240528 TOTAL.sav") 

d <- d.all %>% 
  select(starts_with("GAD7"),time) %>% 
  na.omit()

df <- d.all %>% 
  select(starts_with("GAD7"),time,PID) %>% 
  na.omit()

# subset and remove DIF variable(s)
dif.time <- d$time
d$time <- NULL

itemlabels <- read_csv("data/spss_labels.csv") %>% 
  filter(str_detect(itemnr,"GAD"))

5.1 Background

We have longitudinal data from 10 measurements.

We’ll select the datapoint with best targeting for our psychometric analysis.

Code
plots <- d %>% 
  add_column(time = dif.time) %>% 
  na.omit() %>% 
  split(.$time) %>% 
  map(~ RItargeting(.x %>% select(!time)) + plot_annotation(title = paste0("Timepoint ",.x$time)))
Code
plots[[1]]
plots[[2]]
plots[[3]]
plots[[4]]
plots[[5]]
plots[[6]]
plots[[7]]
plots[[8]]
plots[[9]]
plots[[10]]

Looks like timepoint 2 is best targeted, and has a sample size of 163 complete responses.

Code
d <- d %>% 
  add_column(time = dif.time) %>% 
  filter(time == 2) %>% 
  na.omit() %>% 
  select(!time)

5.1.1 Descriptives

5.2 Rasch analysis 1

The eRm package, which uses Conditional Maximum Likelihood (CML) estimation, will be used primarily. For this analysis, the Partial Credit Model will be used.

itemnr item
GAD7_1 Känt dig nervös, ängslig eller väldigt stressad
GAD7_2 Inte kunnat sluta oroa dig eller kontrollera din oro
GAD7_3 Oroat dig för mycket för olika saker
GAD7_4 Haft svårt att slappna av
GAD7_5 Varit så rastlös att du har haft svårt att sitta still
GAD7_6 Blivit lätt irriterad eller retlig
GAD7_7 Känt dig rädd för att något hemskt skulle hända
Code
simfit1 <- RIgetfit(d, iterations = 100, cpu = 8) 
RIitemfit(d, simfit1)
Item InfitMSQ Infit thresholds OutfitMSQ Outfit thresholds Infit diff Outfit diff Relative location
GAD7_1 0.693 [0.735, 1.213] 0.642 [0.704, 1.42] 0.042 0.062 -0.61
GAD7_2 0.601 [0.834, 1.242] 0.584 [0.832, 1.26] 0.233 0.248 0.42
GAD7_3 0.651 [0.791, 1.314] 0.608 [0.758, 1.302] 0.14 0.15 0.06
GAD7_4 0.799 [0.774, 1.256] 0.758 [0.751, 1.359] no misfit no misfit -0.71
GAD7_5 1.576 [0.773, 1.178] 1.94 [0.712, 1.475] 0.398 0.465 1.35
GAD7_6 1.281 [0.752, 1.396] 1.406 [0.74, 1.488] no misfit no misfit 0.52
GAD7_7 1.334 [0.811, 1.202] 1.328 [0.772, 1.534] 0.132 no misfit 1.02
Note:
MSQ values based on conditional calculations (n = 163 complete cases).
Simulation based thresholds from 100 simulated datasets.
Code
Item Observed value Model expected value Absolute difference Adjusted p-value (BH) Statistical significance level Location Relative location
GAD7_1 0.77 0.64 0.13 0.001 *** -0.91 -0.61
GAD7_2 0.81 0.62 0.19 0.000 *** 0.12 0.42
GAD7_3 0.81 0.65 0.16 0.000 *** -0.23 0.06
GAD7_4 0.73 0.64 0.09 0.072 . -1.00 -0.71
GAD7_5 0.42 0.62 0.20 0.020 * 1.06 1.35
GAD7_6 0.51 0.62 0.11 0.098 . 0.23 0.52
GAD7_7 0.51 0.62 0.11 0.096 . 0.73 1.02
Code

PCA of Rasch model residuals

Eigenvalues Proportion of variance
1.94 30.8%
1.48 23.4%
1.15 21.7%
0.90 10.2%
0.79 7.2%
Code
simcor1 <- RIgetResidCor(d, iterations = 500, cpu = 8)
RIresidcorr(d, cutoff = simcor1$p99)
GAD7_1 GAD7_2 GAD7_3 GAD7_4 GAD7_5 GAD7_6 GAD7_7
GAD7_1
GAD7_2 0.27
GAD7_3 0.14 0.21
GAD7_4 0.16 0.04 -0.11
GAD7_5 -0.36 -0.32 -0.44 -0.07
GAD7_6 -0.26 -0.32 -0.13 -0.23 -0.08
GAD7_7 -0.28 -0.16 -0.06 -0.37 -0.18 -0.18
Note:
Relative cut-off value is 0.144, which is 0.274 above the average correlation (-0.13).
Correlations above the cut-off are highlighted in red text.
Code

Code
mirt(d, model=1, itemtype='Rasch', verbose = FALSE) %>% 
  plot(type="trace", as.table = TRUE, 
       theta_lim = c(-6,6))

Code
# for fewer items or a more magnified figure, use:
#RIitemCats(df)
Code
# increase fig-height above as needed, if you have many items
RItargeting(d)

Code
iarm::score_groups(as.data.frame(d)) %>% 
  as.data.frame(nm = "score_group") %>% 
  dplyr::count(score_group)
  score_group  n
1           1 79
2           2 84
Code
dif_plots <- d %>% 
  add_column(dif = iarm::score_groups(.)) %>% 
  split(.$dif) %>% # split the data using the DIF variable
  map(~ RItileplot(.x %>% dplyr::select(!dif)) + labs(title = .x$dif))
dif_plots[[1]] + dif_plots[[2]]

Item 1 has residual correlations with item 2 and 4, while 2 and 3 are also locally dependent. Items 1-3 are all overfit, whileiItems 5-7 are underfit (high item infit, particularly item 5 and to a lesser degree 7). This indicates two dimensions in data. The loadings plot does not show this as clearly, though, with item 5 being the only item deviating strongly.

All items have issues with the second highest response category having very small distance from the adjancent item category threshold locations.

The sample is too small to create score groups for DIF analysis based on low/high scores.

5.3 Rasch analysis 2

While two dimensions seems likely, let’s see if we can find a working set of unidimensional items based on all 7. First, we remove item 5, leaving other issues for later.

Code
d.backup <- d

d$GAD7_5 <- NULL
itemnr item
GAD7_1 Känt dig nervös, ängslig eller väldigt stressad
GAD7_2 Inte kunnat sluta oroa dig eller kontrollera din oro
GAD7_3 Oroat dig för mycket för olika saker
GAD7_4 Haft svårt att slappna av
GAD7_6 Blivit lätt irriterad eller retlig
GAD7_7 Känt dig rädd för att något hemskt skulle hända
Code
simfit1 <- RIgetfit(d, iterations = 100, cpu = 8) 

RIitemfit(d, simfit1)
Item InfitMSQ Infit thresholds OutfitMSQ Outfit thresholds Infit diff Outfit diff Relative location
GAD7_1 0.718 [0.73, 1.312] 0.68 [0.732, 1.417] 0.012 0.052 -0.64
GAD7_2 0.612 [0.752, 1.275] 0.606 [0.741, 1.314] 0.14 0.135 0.50
GAD7_3 0.631 [0.74, 1.304] 0.574 [0.713, 1.365] 0.109 0.139 0.11
GAD7_4 0.975 [0.793, 1.256] 1.012 [0.725, 1.243] no misfit no misfit -0.76
GAD7_6 1.501 [0.793, 1.22] 1.633 [0.764, 1.369] 0.281 0.264 0.62
GAD7_7 1.514 [0.738, 1.32] 1.452 [0.694, 1.628] 0.194 no misfit 1.16
Note:
MSQ values based on conditional calculations (n = 163 complete cases).
Simulation based thresholds from 100 simulated datasets.
Code
Item Observed value Model expected value Absolute difference Adjusted p-value (BH) Statistical significance level Location Relative location
GAD7_1 0.81 0.69 0.12 0.001 *** -0.80 -0.64
GAD7_2 0.82 0.67 0.15 0.000 *** 0.33 0.50
GAD7_3 0.85 0.70 0.15 0.000 *** -0.05 0.11
GAD7_4 0.71 0.69 0.02 0.721 -0.93 -0.76
GAD7_6 0.49 0.67 0.18 0.022 * 0.45 0.62
GAD7_7 0.52 0.66 0.14 0.037 * 1.00 1.16
Code

PCA of Rasch model residuals

Eigenvalues Proportion of variance
1.81 33.7%
1.36 30.8%
1.10 16.1%
0.88 10.1%
0.84 9.1%
Code
simcor1 <- RIgetResidCor(d, iterations = 500, cpu = 8)
RIresidcorr(d, cutoff = simcor1$p99)
GAD7_1 GAD7_2 GAD7_3 GAD7_4 GAD7_6 GAD7_7
GAD7_1
GAD7_2 0.18
GAD7_3 -0.01 0.08
GAD7_4 0.1 -0.01 -0.21
GAD7_6 -0.33 -0.39 -0.2 -0.23
GAD7_7 -0.37 -0.25 -0.15 -0.4 -0.19
Note:
Relative cut-off value is 0.115, which is 0.274 above the average correlation (-0.158).
Correlations above the cut-off are highlighted in red text.
Code

Code
mirt(d, model=1, itemtype='Rasch', verbose = FALSE) %>% 
  plot(type="trace", as.table = TRUE, 
       theta_lim = c(-6,6))

Code
# increase fig-height above as needed, if you have many items
RItargeting(d)

Only items 1 and 2 have a residual correlation now. The pattern in underfit and overfit items is more clearly seen in the loadings/location plot, with items 6 and 7 deviating. This confirms the two-dimensional structure, and we’ll move to look at items 1-4 and 4-7 separately.

5.4 Two separate dimensions

The initial analysis with all 7 items indicated that items 1-3 have low item fit (overfit to the Rasch model) strong residual correlations amongst them. This could mean that these items work well together, but likely represent another dimension than items 5-7, which in turn all showed high item fit (underfit).

We’ll make an analysis of this setup, including item 4 with both sets.

Code
d1 <- d[,1:4]
d2 <- d.backup[,4:7]

5.5 Items 1-4, worry

Code
d <- d1
itemnr item
GAD7_1 Känt dig nervös, ängslig eller väldigt stressad
GAD7_2 Inte kunnat sluta oroa dig eller kontrollera din oro
GAD7_3 Oroat dig för mycket för olika saker
GAD7_4 Haft svårt att slappna av
Code
simfit1 <- RIgetfit(d, iterations = 100, cpu = 8) 

RIitemfit(d, simfit1)
Item InfitMSQ Infit thresholds OutfitMSQ Outfit thresholds Infit diff Outfit diff Relative location
GAD7_1 0.845 [0.755, 1.251] 0.725 [0.696, 1.523] no misfit no misfit -0.79
GAD7_2 0.781 [0.845, 1.25] 0.739 [0.742, 1.381] 0.064 0.003 0.88
GAD7_3 1.107 [0.686, 1.298] 0.891 [0.618, 1.735] no misfit no misfit 0.30
GAD7_4 1.264 [0.779, 1.255] 1.314 [0.609, 1.36] 0.009 no misfit -1.03
Note:
MSQ values based on conditional calculations (n = 163 complete cases).
Simulation based thresholds from 100 simulated datasets.
Code
Item Observed value Model expected value Absolute difference Adjusted p-value (BH) Statistical significance level Location Relative location
GAD7_1 0.88 0.83 0.05 0.177 -0.63 -0.79
GAD7_2 0.88 0.83 0.05 0.214 1.04 0.88
GAD7_3 0.84 0.85 0.01 0.824 0.46 0.30
GAD7_4 0.79 0.82 0.03 0.577 -0.87 -1.03
Code
simpca <- RIbootPCA(d,200, cpu = 8)
simpca$max
[1] 1.745
Code

PCA of Rasch model residuals

Eigenvalues Proportion of variance
1.49 44.8%
1.29 29.7%
1.20 25.1%
0.02 0.4%
Code
simcor1 <- RIgetResidCor(d, iterations = 500, cpu = 8)
RIresidcorr(d, cutoff = simcor1$p99)
GAD7_1 GAD7_2 GAD7_3 GAD7_4
GAD7_1
GAD7_2 -0.14
GAD7_3 -0.27 -0.14
GAD7_4 -0.24 -0.33 -0.45
Note:
Relative cut-off value is -0.031, which is 0.231 above the average correlation (-0.262).
Correlations above the cut-off are highlighted in red text.
Code

Code
mirt(d, model=1, itemtype='Rasch', verbose = FALSE) %>% 
  plot(type="trace", as.table = TRUE, 
       theta_lim = c(-8,8))

Code
# increase fig-height above as needed, if you have many items
RItargeting(d)

Item-restscore looks good. Item infit shows slight overfit item 2. No residual correlations.

5.5.1 Reliability

Code
RItif(d, samplePSI = T)

TIF is low while PSI is ok at 0.82.

5.5.2 DIF

Code
d <- d.all %>% 
  select(GAD7_1:GAD7_4,time) %>% 
  na.omit()

dif.time <- d$time
d$time <- NULL
RIpartgamDIF(d,dif.time)
[1] "No statistically significant DIF found."
Code
RIdifTable(d,dif.time)

Item 2 3 Mean location StDev MaxDiff
GAD7_1 -0.693 -0.388 -0.540 0.216 0.306
GAD7_2 0.851 1.129 0.990 0.196 0.278
GAD7_3 0.304 0.256 0.280 0.034 0.048
GAD7_4 -0.461 -0.997 -0.729 0.379 0.536

5.5.3 Item parameters

Code
RIitemparams(d1, output = "file", filename = "GAD4_itemparams.csv")
RIscoreSE(d1)
Ordinal sum score Logit score Logit std.error
0 -6.626 0.810
1 -5.094 1.073
2 -3.992 1.099
3 -2.879 1.152
4 -1.353 1.114
5 -0.036 1.011
6 0.757 0.896
7 1.401 0.818
8 1.983 0.795
9 2.560 0.818
10 3.202 0.882
11 4.040 0.948
12 5.506 0.773
Code
RIscoreSE(d1, output = "figure")

5.5.4 Latent scores items 1-4

Code
itemps <- read_csv("GAD4_itemparams.csv") %>% 
  as.matrix()

d <- df %>% 
  select(GAD7_1:GAD7_4,PID,time) %>% 
  na.omit()

PID <- d$PID
dif.time <- d$time

d <- d %>% 
  select(!c(PID,time))

scores <- RIestThetasCATr(d, itemps, theta_range = c(-8,8))
hist(scores)

Code
summary(scores)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
-6.6262 -2.8791 -0.0363 -0.2278  1.9828  5.5062 
Code
data.frame(PID = PID,
           time = dif.time,
           GAD4_score = scores) %>% 
  write_csv("GAD4_score.csv")

5.6 Items 4-7, anxiety

Code
d <- d2
itemnr item
GAD7_4 Haft svårt att slappna av
GAD7_5 Varit så rastlös att du har haft svårt att sitta still
GAD7_6 Blivit lätt irriterad eller retlig
GAD7_7 Känt dig rädd för att något hemskt skulle hända
Code
simfit1 <- RIgetfit(d, iterations = 100, cpu = 8) 
RIitemfit(d, simfit1)
Item InfitMSQ Infit thresholds OutfitMSQ Outfit thresholds Infit diff Outfit diff Relative location
GAD7_4 0.816 [0.833, 1.206] 0.788 [0.797, 1.382] 0.017 0.009 -0.59
GAD7_5 1.054 [0.817, 1.245] 1.034 [0.76, 1.487] no misfit no misfit 1.04
GAD7_6 0.982 [0.803, 1.166] 0.995 [0.819, 1.214] no misfit no misfit 0.36
GAD7_7 1.148 [0.793, 1.265] 1.055 [0.788, 1.253] no misfit no misfit 0.76
Note:
MSQ values based on conditional calculations (n = 163 complete cases).
Simulation based thresholds from 100 simulated datasets.
Code
Item Observed value Model expected value Absolute difference Adjusted p-value (BH) Statistical significance level Location Relative location
GAD7_4 0.57 0.47 0.10 0.366 -0.98 -0.59
GAD7_5 0.43 0.46 0.03 0.760 0.65 1.04
GAD7_6 0.49 0.46 0.03 0.760 -0.03 0.36
GAD7_7 0.42 0.47 0.05 0.760 0.37 0.76
Code
simpca <- RIbootPCA(d,200, cpu = 8)
simpca$max
[1] 1.735
Code

PCA of Rasch model residuals

Eigenvalues Proportion of variance
1.45 37.7%
1.32 34.4%
1.23 27.6%
0.01 0.3%
Code
simcor1 <- RIgetResidCor(d, iterations = 500, cpu = 8)
RIresidcorr(d, cutoff = simcor1$p99)
GAD7_4 GAD7_5 GAD7_6 GAD7_7
GAD7_4
GAD7_5 -0.11
GAD7_6 -0.15 -0.24
GAD7_7 -0.24 -0.34 -0.22
Note:
Relative cut-off value is 0.017, which is 0.234 above the average correlation (-0.217).
Correlations above the cut-off are highlighted in red text.
Code

Code
mirt(d, model=1, itemtype='Rasch', verbose = FALSE) %>% 
  plot(type="trace", as.table = TRUE, 
       theta_lim = c(-8,8))

Code
# increase fig-height above as needed, if you have many items
RItargeting(d)

Item 4 actually works quite well with items 5-7 too. However, it shows disordered thresholds in this item set, which it does not with items 1-3, so we will use it as a part of item set 1, “worry”.

Item thresholds have small distances or are disordered (2 items) in this item set, which is further indicated by the low reliability, even before merging response categories. We will not use this second set of items in any analysis.