
Note: The RISEkbmRasch R package is now known as easyRasch.

Table of contents

  • 1 Getting started
    • 1.1 Loading data
    • 1.2 Itemlabels
    • 1.3 Demographics
  • 2 Descriptives
    • 2.1 Missing data
    • 2.2 Overall responses
    • 2.3 Item level descriptives
  • 3 Rasch analysis 1
    • 3.1 Analysis 1 comments
  • 4 Rasch analysis 2
    • 4.1 Analysis 2 comments
  • 5 DIF - differential item functioning
    • 5.1 Sex
    • 5.2 Age
    • 5.3 Group
    • 5.4 Sex and age
    • 5.5 LRT-based DIF
    • 5.6 Conditional ICC
    • 5.7 Logistic Ordinal Regression DIF
    • 5.8 Partial gamma DIF
  • 6 Rasch analysis 3
    • 6.1 Analysis 3 comments
  • 7 Reliability
  • 8 Person fit
    • 8.1 PerFit sample code
    • 8.2 Item fit without aberrant responses
  • 9 Item parameters
  • 10 Ordinal sum score to interval score
    • 10.1 Ordinal/interval figure
    • 10.2 Estimating interval level person scores
  • 11 Figure design
  • 12 Software used
  • 13 Additional credits
  • 14 Session info
  • 15 References

easyRasch vignette

R package for Rasch analysis

Author: Magnus Johansson
Affiliation: RISE Research Institutes of Sweden

Published: March 28, 2025

This is an introduction to using the easyRasch R package. A changelog for package updates is available here.

Note

This package was previously known as RISEkbmRasch.

Details on package installation are available at the package GitHub page.

If you are new to Rasch Measurement Theory, you may find this intro presentation useful: https://pgmj.github.io/RaschIRTlecture/slides.html

This vignette will walk through a sample analysis using an open dataset with polytomous questionnaire data. This will include some data wrangling to structure the item data and itemlabels, then provide examples of the different functions. The full source code of this document can be found either in this repository or by clicking on </> CODE at the top of this page. You should be able to use the source code “as is” and reproduce this document locally, as long as you have the required packages installed. This page and this website are built using the open source publishing tool Quarto.

One of the aims of this package is to simplify reproducible psychometric analysis to shed light on the measurement properties of a scale, questionnaire, or test. In a paper recently made available as a preprint (Johansson et al., 2023), our research group proposes that the basic aspects of a psychometric analysis should include information about:

  • Unidimensionality
  • Response categories
  • Invariance
  • Targeting
  • Measurement uncertainties (reliability)

We’ll include several ways to investigate these measurement properties, using Rasch Measurement Theory. There are also functions in the package that are less directly related to the criteria above; these will also be demonstrated in this vignette.

Please note that this is just a sample analysis to showcase the R package. It is not intended as a “best practice” psychometric analysis example.

You can skip ahead to the Rasch analysis part in Section 3 if you are eager to look at the package output :)

There is a separate GitHub repository containing a template R-project to simplify using the easyRasch package when conducting a reproducible Rasch analysis in R: https://github.com/pgmj/RISEraschTemplate

1 Getting started

Since the package is intended for use with Quarto, this vignette has also been created with Quarto. A “template” .qmd file is available that can be useful to have handy for copy&paste when running a new analysis. You can also download a complete copy of the Quarto/R code to produce this document here.

Loading the easyRasch package will automatically load all the packages it depends on. However, it could be desirable to explicitly load all packages used, to simplify the automatic creation of citations for them, using the grateful package (see Section 12).

Code
library(easyRasch) # devtools::install_github("pgmj/easyRasch", dependencies = TRUE)
library(grateful)
library(ggrepel)
library(car)
library(kableExtra)
library(readxl)
library(tidyverse)
library(eRm)
library(iarm)
library(mirt)
library(psych)
library(ggplot2)
library(psychotree)
library(matrixStats)
library(reshape)
library(knitr)
library(patchwork)
library(formattable) 
library(glue)
library(foreach)

### some commands exist in multiple packages, here we define preferred ones that are frequently used
select <- dplyr::select
count <- dplyr::count
recode <- car::recode
rename <- dplyr::rename
Note

Quarto automatically adds links to R packages and functions throughout this document. However, this feature only works properly for packages available on CRAN. Since the easyRasch package is not on CRAN, the links related to functions starting with RI will not work.

1.1 Loading data

We will use data from a recent paper investigating the “initial elevation effect” (Anvari et al., 2022), and focus on the 10 negative items from the PANAS. The data is available at the OSF website.

Code
df.all <- read_csv("https://osf.io/download/6fbr5/")
# if you have issues with the link, please try downloading manually using the same URL as above
# and read the file from your local drive.

# subset items and demographic variables
df <- df.all %>% 
  select(starts_with("PANASD2_1"),
         starts_with("PANASD2_20"),
         age,Sex,Group) %>% 
  select(!PANASD2_10_Active) %>% 
  select(!PANASD2_1_Attentive)

The glimpse() function provides a quick overview of our dataframe.

Code
glimpse(df)
Rows: 1,856
Columns: 13
$ PANASD2_11_Distressed <dbl> 2, 2, 2, 1, 2, 2, 4, 1, 1, 3, 1, 4, 2, 4, 4, 1, …
$ PANASD2_12_Upset      <dbl> 1, 1, 4, 1, 1, 5, 2, 1, 2, 2, 2, 3, 1, 3, 5, 1, …
$ PANASD2_13_Hostile    <dbl> 1, 1, 2, 1, 1, 3, 1, 1, 1, 4, 1, 1, 1, 1, 1, 1, …
$ PANASD2_14_Irritable  <dbl> 1, 1, 3, 1, 2, 5, 3, 1, 2, 4, 2, 3, 1, 2, 3, 1, …
$ PANASD2_15_Scared     <dbl> 1, 1, 3, 1, 1, 4, 1, 1, 1, 2, 2, 2, 1, 4, 4, 1, …
$ PANASD2_16_Afraid     <dbl> 1, 1, 4, 1, 1, 3, 1, 1, 1, 3, 1, 2, 1, 4, 4, 1, …
$ PANASD2_17_Ashamed    <dbl> 1, 1, 2, 1, 1, 3, 1, 1, 1, 2, 1, 4, 1, 1, 3, 1, …
$ PANASD2_18_Guilty     <dbl> 2, 1, 2, 1, 1, 3, 3, 1, 1, 3, 1, 4, 1, 1, 3, 1, …
$ PANASD2_19_Nervous    <dbl> 1, 1, 2, 1, 2, 4, 4, 1, 1, 4, 2, 4, 2, 1, 5, 1, …
$ PANASD2_20_Jittery    <dbl> 1, 2, 3, 1, 1, 2, 3, 3, 2, 1, 2, 2, 1, 1, 4, 1, …
$ age                   <dbl> 27, 32, 21, 27, 20, 22, 23, 25, 21, 26, 38, 36, …
$ Sex                   <chr> "Male", "Male", "Female", "Male", "Male", "Male"…
$ Group                 <chr> "Later Start", "Later Start", "Later Start", "La…

We have 1856 rows, i.e. respondents. All variables except Sex and Group are of class dbl, which means they are numeric and can have decimals. Integer (numeric with no decimals) would also be fine for our purposes. The two demographic variables, currently of class chr (character), will need to be converted to factors (fct), and we will do that later on.

(If you import a dataset where item variables are of class character, you will need to recode them to numeric, as sketched below.)
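A minimal sketch of such a recoding, using the car::recode() alias defined in the setup chunk (the response labels below are hypothetical):

Code
df <- df %>%
  mutate(across(
    starts_with("PANASD2"),
    ~ recode(.x, "'Not at all'=1; 'A little'=2; 'Moderately'=3; 'Quite a bit'=4; 'Extremely'=5")
  ))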

1.2 Itemlabels

Then we set up the itemlabels dataframe. This could also be done using the free LibreOffice Calc or MS Excel. Just make sure the file has the same structure, with two variables named itemnr and item that contain the item variable names and item descriptions. The item variable names have to match the variable names in the item dataframe.
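If you prepare the file in a spreadsheet, it can be read into R like this (a sketch; the file name is hypothetical):

Code
itemlabels <- read_excel("data/itemlabels.xlsx") # must contain the variables `itemnr` and `item`

Below, we instead derive the itemlabels dataframe in code from the item variable names.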

Code
itemlabels <- df %>% 
  select(starts_with("PAN")) %>% 
  names() %>% 
  as_tibble() %>% 
  separate(value, c(NA, "item"), sep ="_[0-9][0-9]_") %>% 
  mutate(itemnr = paste0("PANAS_",c(11:20)), .before = "item")

The itemlabels dataframe looks like this.

Code
itemlabels
# A tibble: 10 × 2
   itemnr   item      
   <chr>    <chr>     
 1 PANAS_11 Distressed
 2 PANAS_12 Upset     
 3 PANAS_13 Hostile   
 4 PANAS_14 Irritable 
 5 PANAS_15 Scared    
 6 PANAS_16 Afraid    
 7 PANAS_17 Ashamed   
 8 PANAS_18 Guilty    
 9 PANAS_19 Nervous   
10 PANAS_20 Jittery   

1.3 Demographics

Variables for invariance tests such as Differential Item Functioning (DIF) need to be separated into vectors (ideally as factors with specified levels and labels) with the same length as the number of rows in the dataset. This means that any kind of removal of respondents/rows with missing data needs to be done before separating the DIF variables.

First, we’ll have a look at the class of our variables.

Code
df %>% 
  select(Sex,age,Group) %>% 
  glimpse()
Rows: 1,856
Columns: 3
$ Sex   <chr> "Male", "Male", "Female", "Male", "Male", "Male", "Female", "Fem…
$ age   <dbl> 27, 32, 21, 27, 20, 22, 23, 25, 21, 26, 38, 36, 24, 21, 19, 26, …
$ Group <chr> "Later Start", "Later Start", "Later Start", "Later Start", "Ear…

Sex and Group are class character, while age is numeric. Below are the response categories represented in the data.

Code
table(df$Sex)

  CONSENT REVOKED      DATA EXPIRED            Female              Male 
                2                 1               896               955 
Prefer not to say 
                2 

Since there are only 5 respondents using labels that are not Female or Male (too few for meaningful statistical analysis), we will remove them to have a complete dataset for all variables in this example.

Code
df <- df %>% 
  filter(Sex %in% c("Female","Male")) # filtering to only include those with Sex being either Female or Male.

Let’s make the variable a factor (instead of class “character”), since that will be helpful in our later analyses.

Code
df$Sex <- factor(df$Sex)
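If a grouping variable instead uses numeric codes, levels and labels can be set explicitly (a sketch; the codes below are hypothetical):

Code
# df$Sex <- factor(df$Sex, levels = c(1, 2), labels = c("Female", "Male"))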

Sometimes age is provided in categories, but here we have a numeric variable with age in years. Let’s have a quick look at the age distribution using a histogram, and get some summary statistics.

Code
summary(df$age)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  18.00   21.00   24.00   26.74   30.00   71.00 
Code
### simpler version of the ggplot below using base R function hist()
# hist(df$age, col = "#009ca6")
# abline(v = median(age, na.rm = TRUE))
# 
# df %>% 
#   summarise(Mean = round(mean(age, na.rm = T),1),
#             StDev = round(sd(age, na.rm = T),1)
#             )

ggplot(df) +
  geom_histogram(aes(x = age), 
                 fill = "#009ca6",
                 col = "black") +
  # add the average as a vertical line
  geom_vline(xintercept = median(df$age), 
             linewidth = 1.5,
             linetype = 2,
             col = "orange") +
  # add a light grey field indicating the standard deviation
  annotate("rect", ymin = 0, ymax = Inf, 
           xmin = summary(df$age)[2], 
           xmax = summary(df$age)[5], 
           alpha = .2) +
  labs(title = "",
       x = "Age in years",
       y = "Number of respondents",
       caption = str_wrap(glue("Note. Median age is {round(median(df$age, na.rm = T),1)}, shown with the dashed vertical line. Age range is {min(df$age)} to {max(df$age)}. Interquartile range ({summary(df$age)[2]} to {summary(df$age)[5]}) is indicated with the grey area.")
       )) +
  scale_x_continuous(limits = c(18,75)) +
  theme_bw() +
  theme(plot.caption = element_text(hjust = 0, face = "italic")) 

Since the distribution of the age variable clearly does not follow a Gaussian (normal) distribution, the median seems like a better option than the mean.

There is also a grouping variable which needs to be converted to a factor.

Code
df$Group <- factor(df$Group)

Now our demographic data has been checked, and we need to move it to a separate dataframe from the item data.

Code
dif <- df %>% 
  select(Sex,age,Group) %>% 
  rename(Age = age) # just for consistency

df <- df %>% 
  select(!c(Sex,age,Group))

To create a “Table 1” containing participant demographic information, the tbl_summary() function from the gtsummary package is really handy to get a nice table.

Code
gtsummary::tbl_summary(dif)
Characteristic N = 1,851¹
Sex
    Female 896 (48%)
    Male 955 (52%)
Age 24 (21, 30)
Group
    Earlier Start 901 (49%)
    Later Start 950 (51%)
¹ n (%); Median (Q1, Q3)

With only item data remaining in the df dataframe, we can easily rename the items in the item dataframe and make sure that they match the itemlabels variable itemnr. It will be useful to have short item labels to get less clutter in the tables and figures during analysis.

Code
names(df) <- itemlabels$itemnr

Now we are all set for the psychometric analysis!

2 Descriptives

Let’s familiarize ourselves with the data before diving into the analysis.

2.1 Missing data

First, we visualize the proportion of missing data on item level.

Code
RImissing(df)
[1] "No missing data."

No missing data in this dataset. If we had missing data, we could also use RImissingP() to look at which respondents have missing data and how much.
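For example:

Code
# RImissingP(df) # not needed here, since there is no missing data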

2.2 Overall responses

This provides us with an overall picture of the data distribution. As a bonus, any oddities/mistakes in recoding the item data from categories to numbers will be clearly visible.

Code
RIallresp(df)
Response category Number of responses Percent
1 9430 50.9
2 4136 22.3
3 2676 14.5
4 1722 9.3
5 546 2.9

Most R packages for Rasch analysis require the lowest response category to be zero, which makes it necessary for us to recode our data, from using the range of 1-5 to 0-4.

Code
df <- df %>% 
  mutate(across(everything(), ~ .x - 1))

# always check that your recoding worked as intended.
RIallresp(df)
Response category Number of responses Percent
0 9430 50.9
1 4136 22.3
2 2676 14.5
3 1722 9.3
4 546 2.9

2.2.1 Floor/ceiling effects

We can also look at the raw distribution of ordinal sum scores. The RIrawdist() function is a bit crude, since it requires responses in all response categories to accurately calculate max and min scores.

Code
RIrawdist(df)

There is a floor effect with 11.8% of participants responding in the lowest category for all items.

2.2.2 Guttman structure

While not really necessary, it can be interesting to see whether the response patterns follow a Guttman-like structure. Items and persons are sorted based on lower->higher responses, and we should see the color move from yellow in the lower left corner to blue in the upper right corner.

Code
RIheatmap(df) +
  theme(axis.text.x = element_blank())

In this data, we see the floor effect on the left, with 11.8% of respondents all yellow, and a rather weak Guttman structure. This could also be due to a low variation in item locations/difficulties. Since we have a very large sample I added a theme() option to remove the x-axis text, which would anyway just be a blur of the 1851 respondent row numbers. Each thin vertical slice in the figure is one respondent.

2.3 Item level descriptives

There are many ways to look at the item level data, and we’ll get them all together in the tab-panel below. The RItileplot() is probably most informative, since it provides the number of responses in each response category for each item. It is usually recommended to have at least ~10 responses in each category for psychometric analysis, no matter which methodology is used.

Kudos to Solomon Kurz for providing the idea and code on which the tile plot function is built!

Most people will be familiar with the barplot, and it is probably the most intuitive way to understand the response distribution within each item. However, if there are many items it will take a while to review, and it does not provide the same overview as a tileplot or stacked bars.

Code
```{r}
#| column: margin
#| code-fold: true

# This code chunk creates a small table in the margin beside the panel-tabset output below, showing all items currently in the df dataframe.
# The Quarto code chunk option "#| column: margin" is necessary for the layout to work as intended.
RIlistItemsMargin(df, fontsize = 13)
```
itemnr item
PANAS_11 Distressed
PANAS_12 Upset
PANAS_13 Hostile
PANAS_14 Irritable
PANAS_15 Scared
PANAS_16 Afraid
PANAS_17 Ashamed
PANAS_18 Guilty
PANAS_19 Nervous
PANAS_20 Jittery
  • Tile plot
  • Stacked bars
  • Barplots
Code
RItileplot(df)

While response patterns are skewed for all items, there are more than 10 responses in each category for all items, which is helpful for the analysis.

Code
RIbarstack(df) +
  theme_minimal() + # theming is optional, see section 11 for more on this
  theme_rise() 

Code
RIitemcols(df)

3 Rasch analysis 1

The eRm package and Conditional Maximum Likelihood (CML) estimation will be used primarily, with the Partial Credit Model since this is polytomous data.

This is also where the five basic psychometric aspects are good to recall.

  • Unidimensionality & local independence
  • Ordering of response categories
  • Invariance
  • Targeting
  • Measurement uncertainties (reliability)

We will begin by looking at unidimensionality, response categories, and targeting in parallel below. For unidimensionality, we are mostly interested in item fit and residual correlations, as well as PCA of residuals and loadings on the first residual contrast. At the same time, disordered response categories can influence item fit to some extent (and vice versa), and knowledge about targeting can be useful if it is necessary to remove items due to residual correlations.

When unidimensionality and response categories are found to work adequately, we will move on to invariance testing (Differential Item Functioning, DIF). It should be noted that DIF should be evaluated in parallel with all other psychometric aspects, but since it is a more complex issue it is kept in a separate section in this vignette (as is person fit). Finally, when/if invariance/DIF also looks acceptable, we can investigate reliability/measurement uncertainties.

Since our sample is quite large (n > 800), the sample size sensitive tests (item fit and item-restscore) that are likely to produce problematic false positive rates (Johansson, 2025a) will use a randomly selected smaller sample of n = 500 as examples. When analyzing large datasets, item-restscore bootstrap is recommended as the primary method of assessing item fit to the Rasch model. The use of one smaller subsample in this vignette is only for demonstration purposes; it is not a recommended method of analysis for large datasets.

Code
df500 <- df %>% 
  slice_sample(n = 500)
Note

In the tabset-panel below, each tab contains explanatory text, which is sometimes a bit lengthy. Remember to scroll back up and click on all tabs.

itemnr item
PANAS_11 Distressed
PANAS_12 Upset
PANAS_13 Hostile
PANAS_14 Irritable
PANAS_15 Scared
PANAS_16 Afraid
PANAS_17 Ashamed
PANAS_18 Guilty
PANAS_19 Nervous
PANAS_20 Jittery
  • Conditional item fit
  • Item-restscore
  • Item-restscore bootstrap
  • Conditional item characteristic curves
  • PCA of residuals
  • Conditional LRT
  • Residual correlations
  • Partial gamma LD
  • 1st contrast loadings
  • Analysis of response categories
  • Response categories MIRT
  • Targeting
  • Item hierarchy
Code
simfit1 <- RIgetfit(df500, iterations = 200, cpu = 8) # save simulation output to object `simfit1`
RIitemfit(df500, simfit1)
Item InfitMSQ Infit thresholds OutfitMSQ Outfit thresholds Infit diff Outfit diff Relative location
PANAS_11 1.166 [0.844, 1.123] 1.224 [0.839, 1.143] 0.043 0.081 1.14
PANAS_12 0.859 [0.876, 1.242] 0.847 [0.841, 1.274] 0.017 no misfit 1.39
PANAS_13 1.208 [0.856, 1.165] 1.202 [0.778, 1.344] 0.043 no misfit 1.94
PANAS_14 1.048 [0.887, 1.178] 1.064 [0.855, 1.205] no misfit no misfit 1.08
PANAS_15 0.736 [0.865, 1.189] 0.639 [0.773, 1.307] 0.129 0.134 1.59
PANAS_16 0.85 [0.836, 1.15] 0.842 [0.762, 1.274] no misfit no misfit 1.50
PANAS_17 0.96 [0.847, 1.233] 0.934 [0.752, 1.31] no misfit no misfit 1.98
PANAS_18 1.045 [0.788, 1.225] 1.026 [0.742, 1.309] no misfit no misfit 1.87
PANAS_19 1.045 [0.866, 1.156] 1.054 [0.847, 1.146] no misfit no misfit 0.90
PANAS_20 1.17 [0.846, 1.147] 1.243 [0.834, 1.194] 0.023 0.049 1.32
Note:
MSQ values based on conditional calculations (n = 500 complete cases).
Simulation based thresholds from 200 simulated datasets.

RIitemfit() and RIgetfit() work with both dichotomous and polytomous data (using the partial credit model) and automatically select the model based on the data structure.

It is important to note that the new (since version 0.2.2, released 2024-08-19) RIitemfit() function uses conditional outfit/infit, which is robust to different sample sizes and makes ZSTD unnecessary (Müller, 2020).

Since the distribution of item fit statistics is not known, we need to use simulation to determine appropriate cutoff threshold values for the current sample and items. RIitemfit() can take the simulation-based cutoff values and use them for conditional highlighting of misfitting items. See the blog post on simulation based cutoffs for some more details on this. RIitemfit() can also be used without cutoffs and conditional highlighting. For a possibly useful rule-of-thumb cutoff for infit MSQ only, use the option cutoff = "Smith98" (Müller, 2020; R. M. Smith et al., 1998). However, this cutoff is not applicable to all items, only to what can be expected for the average item fit. The simulation/bootstrap-based cutoff values will be more accurate for every item in your data.
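For example, a sketch using the rule-of-thumb cutoff mentioned above:

Code
RIitemfit(df500, cutoff = "Smith98")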

Briefly stated, the simulation uses the properties of the current sample and items, and simulates n iterations of data that fit the Rasch model to get an empirical distribution of item fit that we can use for comparison with the observed data. This is also known as “parametric bootstrapping”.

The simulation can take quite a bit of time to run if you have complex data/many items/many participants, and/or choose to use many iterations. Simulation experiments (Johansson, 2025a) indicate that 100-400 iterations should be a useful range, where smaller samples (n < 300) should use 100 iterations, and 200-400 is more appropriate when one has larger samples.

Note

Another important finding from simulation studies is that there is a risk of false positive indication of misfit in samples larger than n = 1000 when using item infit or item-restscore. The recommended primary method for determining item fit in large samples is the bootstrapped item-restscore (Johansson, 2025a), as illustrated in an adjacent tab labeled “Item-restscore bootstrap”.

For reference, the simulation above, using 10 items with 5 response categories each and 1851 respondents, takes about 24 seconds to run on 8 cpu cores (Macbook Pro M1 Max) for 400 iterations.

I’ll cite Ostini & Nering (2006) on the description of outfit and infit (pages 86-87):

Response residuals can be summed over respondents to obtain an item fit measure. Generally, the accumulation is done with squared standardized residuals, which are then divided by the total number of respondents to obtain a mean square statistic. In this form, the statistic is referred to as an unweighted mean square (Masters & Wright, 1997; Wright & Masters, 1982) and has also come to be known as “outfit” (Smith, Schumacker, & Bush, 1998; Wu, 1997), perhaps because it is highly sensitive to outlier responses (Adams & Khoo, 1996; Smith et al., 1998; Wright & Masters, 1982).

A weighted version of this statistic was developed to counteract its sensitivity to outliers (Smith, 2000). In its weighted form, the squared standardized residual is multiplied by the observed response variance and then divided by the sum of the item response variances. This is sometimes referred to as an information weighted mean square and has become known as “infit” (Smith et al., 1998; Wu, 1997).

A low item fit value (sometimes referred to as an item “overfitting” the Rasch model) indicates that responses are too predictable and provide little information. This is often the case for items that are very general/broad in scope in relation to the latent variable. You will often find overfitting items to also have residual correlations with other items.

A high item fit value (sometimes referred to as “underfitting” the Rasch model) can indicate several things, often multidimensionality or a question that is difficult to interpret and thus has noisy response data. The latter could for instance be caused by a question that asks about two things at the same time or is ambiguous for other reasons.

Outfit is not a useful method to detect item misfit; infit should be relied upon when sample sizes are appropriate (Johansson, 2025a).

Note

Remember to scroll back up and click on all tabs.

This is another useful function from the iarm package. It shows the expected and observed correlation between an item and a score based on the rest of the items (Kreiner, 2011). Similarly to item fit, but inverted: a lower observed correlation value than expected indicates underfit, i.e. that the item may not belong to the dimension, while a higher than expected observed value indicates an overfitting and possibly redundant item. Overfitting items will often also show issues with residual correlations. Both of these problems can often be (at least partially) resolved by removing underfitting items.

Code
RIrestscore(df500)
Item Observed value Model expected value Absolute difference Adjusted p-value (BH) Statistical significance level Location Relative location
PANAS_11 0.59 0.64 0.05 0.156 -0.33 1.14
PANAS_12 0.71 0.64 0.07 0.017 * -0.08 1.39
PANAS_13 0.59 0.62 0.03 0.342 0.47 1.94
PANAS_14 0.62 0.64 0.02 0.383 -0.39 1.08
PANAS_15 0.77 0.64 0.13 0.000 *** 0.12 1.59
PANAS_16 0.70 0.65 0.05 0.116 0.03 1.50
PANAS_17 0.69 0.62 0.07 0.084 . 0.51 1.98
PANAS_18 0.67 0.63 0.04 0.342 0.40 1.87
PANAS_19 0.62 0.64 0.02 0.378 -0.57 0.90
PANAS_20 0.57 0.63 0.06 0.084 . -0.15 1.32

Both item-restscore and conditional item infit/outfit will indicate “false misfit” when sample sizes are large (even when using simulation/bootstrap based cutoff values). This behavior can occur from about n = 500, and certainly will occur at samples of 800 and above (Johansson, 2025a). This “false misfit” is caused by truly misfitting items, which underlines the importance of removing one item at a time when one finds issues with misfit/multidimensionality. However, a useful way to get additional information about the probability of actual misfit is to use non-parametric bootstrapping. This function resamples with replacement from your response data and reports the percentage and type of misfit indicated by the item-restscore function. You will also get information about conditional MSQ infit (based on the full sample, using complete responders). Simulation studies indicate that a sample size of 800 results in 95+% detection rate of 1-3 misfitting items amongst 20 dichotomous items (Johansson, 2025a).

Code
RIbootRestscore(df, iterations = 250, samplesize = 800)
Item Item-restscore result % of iterations Conditional MSQ infit Relative average item location
PANAS_20 underfit 81.6 1.20 1.42
PANAS_11 underfit 52.4 1.20 1.14
PANAS_13 underfit 24.8 1.17 1.91
PANAS_14 underfit 9.6 1.06 1.15
PANAS_15 overfit 99.2 0.78 1.63
PANAS_16 overfit 98.8 0.80 1.48
PANAS_12 overfit 92.0 0.84 1.42
PANAS_17 overfit 48.8 0.95 1.83
PANAS_19 overfit 14.0 0.92 0.98
Note:
Results based on 250 bootstrap iterations with n = 800 and 10 items. Conditional mean-square infit based on complete responders only (n = 1851).

The iarm package (Mueller & Santiago, 2022) provides several interesting functions for assessing item fit, DIF and other things. Some of these functions have been implemented in simplified versions in easyRasch, and others may be included in a future version. Below are conditional item characteristic curves (ICC’s) using the estimated theta (factor score).

These curves indicate item fit on a group level, where respondents are split into “class intervals” based on their sum score/factor score.

Code
RIciccPlot(df) +
  guide_area() # this relocates the legend/guide

A similar visualization, even more informative and flexible, has been made available in the RASCHplot package (Buchardt et al., 2023). The package needs to be installed from GitHub (see commented code below). The linked paper is recommended reading, not least for descriptions of the useful options available. Below are some sample plots showing conditional ICC’s using the raw sum score.

Code
library(RASCHplot) # devtools::install_github("ERRTG/RASCHplot")

CICCplot(PCM(df),
         which.item = c(1:4),
         lower.groups = c(0,7,14,21,28),
         grid.items = TRUE)

Principal Component Analysis of Rasch model residuals (PCAR).

Code
RIpcmPCA(df)
Eigenvalues Proportion of variance
1.79 16.9%
1.47 15.1%
1.28 13.6%
1.14 13.3%
1.06 11.9%

Based on an old rule-of-thumb, the first eigenvalue should be below 1.5 to support unidimensionality (E. V. Smith, 2002). However, as with many other metrics the expected PCAR eigenvalue is also dependent on sample size and test length (Chou & Wang, 2010). To be clear, there is no single or easily calculated rule-of-thumb critical value for the largest PCA eigenvalue that will be generally applicable. A parametric bootstrap function has been implemented in easyRasch to determine a potentially appropriate cutoff value for the largest PCAR eigenvalue, but it has not been systematically evaluated yet. Below is an example, illustrated with a histogram of the simulated distribution of largest eigenvalues, the 99th percentile and the max value.

Code
pcasim <- RIbootPCA(df, iterations = 500, cpu = 8)
hist(pcasim$results, breaks = 50)

Code
pcasim$p99
  99% 
1.278 
Code
pcasim$max
[1] 1.307

If the bootstrap turns out to provide an appropriate cutoff value, it still needs to be used together with checking item fit (or item-restscore) and residual correlations (local dependence) to evaluate unidimensionality.

The PCA eigenvalues are mostly included here for those coming from Winsteps who may be looking for this metric. Speaking of Winsteps, the “explained variance” generated by RIpcmPCA() will not be comparable to Winsteps’ corresponding metric, since this function only shows the results from the analysis of residuals and not the variance explained by the Rasch model itself. Additionally, it should be noted that the PCA rotation is set to “oblimin”.

The Conditional Likelihood Ratio Test (CLRT, Andersen, 1973) is a global test of fit and can be a useful addition to more informative item-level metrics (Johansson, 2025a). CLRT compares the responses of those scoring below the median to those scoring above the median to assess the homogeneity of the test. Below, we run the test using the smaller sample with a function from the iarm package.

Code
clr_tests(df500, model = "PCM")

Conditional Likelihood Ratio Tests:
        clr df pvalue  sig 
overall 106 39 4.1e-08  ***

The test is clearly statistically significant, even with the small sample, which indicates that unidimensionality does not hold. However, CLRT is sensitive to large sample sizes and number of items, and produces false positives above the nominal 5% level. See https://pgmj.github.io/clrt.html for a very small simulation study. To minimize the sample size issue, a non-parametric bootstrap function has been implemented in easyRasch, where one can set the sample size manually. The function will then sample with replacement from the response data and conduct a separate CLRT for each sample drawn.

Code
RIbootLRT(df, iterations = 1000, samplesize = 400, cpu = 8)
Result n Percent
Not statistically significant 1000 100

In contrast to the standard CLRT above, the bootstrap method shows no statistically significant results.

Global tests do not provide any information about the reason for misfit. The iarm package has also implemented a function to get item specific information related to the CLRT, showing observed and expected values for each item.

Code
item_obsexp(PCM(df500))
Score group 1: 
         mean obs mean exp std.res sig
PANAS_11  0.735    0.661    1.582     
PANAS_12  0.301    0.368   -1.775     
PANAS_13  0.301    0.227    2.296  +  
PANAS_14  0.616    0.600    0.350     
PANAS_15  0.178    0.285   -2.867  -  
PANAS_16  0.329    0.315    0.355     
PANAS_17  0.151    0.190   -1.326     
PANAS_18  0.169    0.206   -1.167     
PANAS_19  0.895    0.813    1.668     
PANAS_20  0.621    0.588    0.726     

Score group 2: 
         mean obs mean exp std.res sig
PANAS_11  1.959    2.028   -1.072     
PANAS_12  1.775    1.683    1.365     
PANAS_13  1.126    1.200   -1.179     
PANAS_14  2.036    2.045   -0.141     
PANAS_15  1.667    1.560    1.560     
PANAS_16  1.653    1.665   -0.176     
PANAS_17  1.095    1.055    0.642     
PANAS_18  1.270    1.233    0.568     
PANAS_19  2.153    2.234   -1.281     
PANAS_20  1.811    1.841   -0.488     

In order to support unidimensionality, items should only be related to each other through the latent variable. This is called “local independence”. By investigating patterns in model residuals, we can determine whether items are independent or not.

Similarly to item fit, we need to run simulations to get a useful cutoff threshold value for when residual correlations amongst item pairs are larger than would be expected from items that fit a unidimensional Rasch model (Christensen et al., 2017).

The simulation/bootstrap procedure can take some time to run, depending on the complexity of your data, but it is necessary to set the appropriate cutoff value.

Code
simcor1 <- RIgetResidCor(df, iterations = 400, cpu = 8)
RIresidcorr(df, cutoff = simcor1$p99)
PANAS_11 PANAS_12 PANAS_13 PANAS_14 PANAS_15 PANAS_16 PANAS_17 PANAS_18 PANAS_19 PANAS_20
PANAS_11
PANAS_12 -0.1
PANAS_13 -0.05 -0.01
PANAS_14 -0.11 0.09 0.07
PANAS_15 -0.14 -0.13 -0.22 -0.29
PANAS_16 -0.17 -0.1 -0.25 -0.27 0.38
PANAS_17 -0.18 -0.09 -0.09 -0.19 -0.13 -0.08
PANAS_18 -0.19 -0.15 -0.16 -0.18 -0.15 -0.13 0.32
PANAS_19 -0.13 -0.13 -0.25 -0.14 0.1 0.08 -0.21 -0.12
PANAS_20 -0.06 -0.22 -0.07 -0.05 -0.13 -0.19 -0.16 -0.15 -0.07
Note:
Relative cut-off value is -0.003, which is 0.099 above the average correlation (-0.102).
Correlations above the cut-off are highlighted in red text.

The matrix above shows item-pair correlations of item residuals, with highlights in red showing correlations crossing the threshold compared to the average item-pair correlation (for all item-pairs) (Christensen et al., 2017). Rasch model residual correlations (Yen’s Q3) are calculated using the mirt package.

Another way to assess local (in)dependence is by partial gamma coefficients (Kreiner & Christensen, 2004). This is also a function from the iarm package. See ?iarm::partgam_LD for details.

Code
RIpartgamLD(df)
Item 1 Item 2 Partial gamma SE Lower CI Upper CI Adjusted p-value (BH)
PANAS_15 PANAS_16 0.634 0.030 0.576 0.693 0.000
PANAS_16 PANAS_15 0.634 0.030 0.576 0.692 0.000
PANAS_17 PANAS_18 0.577 0.033 0.512 0.642 0.000
PANAS_18 PANAS_17 0.551 0.034 0.484 0.619 0.000
PANAS_12 PANAS_14 0.341 0.037 0.269 0.413 0.000
PANAS_16 PANAS_19 0.297 0.040 0.219 0.375 0.000
PANAS_14 PANAS_12 0.293 0.038 0.217 0.368 0.000
PANAS_15 PANAS_19 0.287 0.040 0.208 0.366 0.000
PANAS_19 PANAS_16 0.274 0.040 0.197 0.352 0.000
PANAS_19 PANAS_15 0.261 0.041 0.181 0.340 0.000
PANAS_14 PANAS_13 0.209 0.042 0.127 0.291 0.000
PANAS_13 PANAS_14 0.202 0.041 0.121 0.283 0.000
PANAS_12 PANAS_13 0.191 0.044 0.105 0.278 0.001
Code
RIloadLoc(df)

Here we see item locations and their loadings on the first residual contrast. This figure can be helpful to identify clusters in data or multidimensionality. You can get a table with standardized factor loadings using RIloadLoc(df, output = "table").

The xlims setting changes the x-axis limits for the plots. The default values usually make sense, and we mostly add this option to point out the possibility of doing so. You can also choose to show plots for specific items only.

Code
RIitemCats(df, xlims = c(-5,5))

Each response category for each item should have a curve that indicates it to be the most probable response at some point on the latent variable (x axis in the figure).

For a more compact figure.

Code
mirt(df, model=1, itemtype='Rasch', verbose = FALSE) %>% 
  plot(type="trace", as.table = TRUE, 
       theta_lim = c(-5,5)) # changes x axis limits

Code
# increase fig-height in the chunk option above if you have many items
RItargeting(df, xlim = c(-5,4)) # xlim defaults to c(-4,4) if you omit this option

This figure shows how well the items fit the respondents/persons. It is a sort of Wright Map that shows person locations and item threshold locations on the same logit scale.

The top part shows a person location histogram, the middle part an inverted histogram of item threshold locations, and the bottom part individual item threshold locations. The histograms also show means and standard deviations.

Here the items are sorted on their average threshold location (black diamonds). 84% confidence intervals are shown around each item threshold location. For further details, see the caption text below the figure.

The numbers displayed in the plot can be disabled using the option numbers = FALSE.

Code
RIitemHierarchy(df)

3.1 Analysis 1 comments

Item fit shows a lot of issues.

Item 18 has issues with the second lowest category being disordered. Several other items have very short distances between thresholds 1 and 2, which is also clearly seen in the Item Hierarchy figure above.

Two item-pairs show residual correlations far above the cutoff value:

  • 15 and 16 (scared and afraid)
  • 17 and 18 (ashamed and guilty)

Since item 15 also has a residual correlation with item 19, we will remove it. In the second pair, item 18 will be removed since it also has problems with disordered response categories.

Note

We have multiple “diagnostics” to review when deciding which item to remove if there are strong residual correlations between two items. Here is a list of commonly used criteria:

  • item fit
  • item threshold locations compared to sample locations (targeting)
  • ordering of response categories
  • DIF
  • and whether there are residual correlations between one item and multiple other items
Code
removed.items <- c("PANAS_15","PANAS_18")

df_backup <- df
df500_backup <- df500

df <- df_backup %>% 
  select(!any_of(removed.items))

df500 <- df500_backup %>% 
  select(!any_of(removed.items))

As seen in the code above, I chose to create a backup copy of the original dataframes before removing the items. This can be useful if, at a later stage in the analysis, I want to be able to quickly “go back” and reinstate an item or undo any other change I have made.

4 Rasch analysis 2

With items 15 and 18 removed.

itemnr item
PANAS_11 Distressed
PANAS_12 Upset
PANAS_13 Hostile
PANAS_14 Irritable
PANAS_16 Afraid
PANAS_17 Ashamed
PANAS_19 Nervous
PANAS_20 Jittery
  • Item-restscore bootstrap
  • Conditional item fit
  • PCA of residuals
  • Residual correlations
  • 1st contrast loadings
  • Targeting
  • Item hierarchy
Code
RIbootRestscore(df, iterations = 250, samplesize = 800)
Item Item-restscore result % of iterations Conditional MSQ infit Relative average item location
PANAS_20 underfit 39.6 1.16 1.39
PANAS_11 underfit 13.6 1.14 1.11
PANAS_12 overfit 97.6 0.80 1.38
PANAS_16 overfit 58.0 0.88 1.44
PANAS_17 overfit 6.4 1.04 1.79
PANAS_19 overfit 5.2 0.94 0.95
Note:
Results based on 250 bootstrap iterations with n = 800 and 8 items. Conditional mean-square infit based on complete responders only (n = 1851).

Still using the smaller sample of n = 500.

Code
simfit2 <- RIgetfit(df500, iterations = 400, cpu = 8)
RIitemfit(df500, simcut = simfit2)
Item InfitMSQ Infit thresholds OutfitMSQ Outfit thresholds Infit diff Outfit diff Relative location
PANAS_11 1.095 [0.848, 1.146] 1.132 [0.85, 1.178] no misfit no misfit 1.09
PANAS_12 0.814 [0.863, 1.208] 0.802 [0.841, 1.243] 0.049 0.039 1.33
PANAS_13 1.105 [0.863, 1.189] 1.044 [0.741, 1.292] no misfit no misfit 1.89
PANAS_14 0.965 [0.857, 1.148] 1.007 [0.845, 1.175] no misfit no misfit 1.03
PANAS_16 0.914 [0.815, 1.165] 0.893 [0.788, 1.227] no misfit no misfit 1.44
PANAS_17 1.02 [0.836, 1.143] 1.001 [0.769, 1.332] no misfit no misfit 1.91
PANAS_19 1.029 [0.856, 1.166] 1.028 [0.856, 1.174] no misfit no misfit 0.86
PANAS_20 1.123 [0.84, 1.159] 1.165 [0.826, 1.155] no misfit 0.01 1.27
Note:
MSQ values based on conditional calculations (n = 500 complete cases).
Simulation based thresholds from 400 simulated datasets.
Code
RIpcmPCA(df)
Eigenvalues Proportion of variance
1.52 18.9%
1.33 17.1%
1.19 16.6%
1.15 14.6%
1.00 13.1%
Code
simcor2 <- RIgetResidCor(df, iterations = 400, cpu = 8)
RIresidcorr(df, cutoff = simcor2$p99)
PANAS_11 PANAS_12 PANAS_13 PANAS_14 PANAS_16 PANAS_17 PANAS_19 PANAS_20
PANAS_11
PANAS_12 -0.16
PANAS_13 -0.11 -0.06
PANAS_14 -0.19 0.03 0.01
PANAS_16 -0.16 -0.08 -0.25 -0.28
PANAS_17 -0.18 -0.06 -0.09 -0.19 0
PANAS_19 -0.15 -0.15 -0.28 -0.18 0.12 -0.16
PANAS_20 -0.1 -0.27 -0.13 -0.11 -0.18 -0.15 -0.09
Note:
Relative cut-off value is -0.039, which is 0.091 above the average correlation (-0.129).
Correlations above the cut-off are highlighted in red text.
Code
RIloadLoc(df)

Code
RItargeting(df, xlim = c(-4,4), bins = 45)

Code
RIitemHierarchy(df)

4.1 Analysis 2 comments

Items 16 & 19, and 12 & 14 show problematic residual correlations.

Let’s look at DIF before taking action upon this information. While we are keeping DIF as a separate section in this vignette, it is recommended to include DIF-analysis in the panel-tabset above (on item fit, PCA, residual correlations, etc).

5 DIF - differential item functioning

We’ll be looking at whether item (threshold) locations are stable between demographic subgroups.

There are several DIF analysis tools available. The first one uses the package psychotree, which relies on statistical significance at p < .05 as an indicator for DIF. This is a criterion that is highly sample size sensitive, and we are always interested in the size/magnitude of DIF as well, since that will inform us about the impact of DIF on the estimated latent variable.

The structure of DIF is also an important and complex aspect, particularly for polytomous data. Uniform DIF means that the DIF is similar across the latent continuum. We can test this in R using the lordif package, as demonstrated in Section 5.7. However, it should be noted that the lordif package does not provide an option to use Rasch models, and there may be results that are caused by also allowing the discrimination parameter to vary across items.

A recent preprint (Henninger et al., 2024) does a great job illustrating “differential step functioning” (DSF), which is when item threshold locations in polytomous data show varying levels of DIF. It also describes a forthcoming development of the psychotree package where one can use DIF effect size and purification functions to evaluate DIF/DSF. When the updated package is available, I will work to implement these new functions in the easyRasch package as well.

Note

It is important to ensure that no cells in the data are empty for subgroups when conducting a DIF analysis. Split the data using the DIF-variable and create separate tileplots to review the response distribution in the DIF-groups.

Code
difPlots <- df %>% # save the output into the `difPlots` object
  add_column(gender = dif$Sex) %>% # add the DIF variable to the dataframe
  split(.$gender) %>% # split the data using the DIF variable
  map(~ RItileplot(.x %>% dplyr::select(!gender)) + labs(title = .x$gender)) # create separate tileplots for each group

difPlots$Female + difPlots$Male # the actual name of the plots (in this case Male/Female) will be determined by the factor labels

5.1 Sex

itemnr item
PANAS_11 Distressed
PANAS_12 Upset
PANAS_13 Hostile
PANAS_14 Irritable
PANAS_16 Afraid
PANAS_17 Ashamed
PANAS_19 Nervous
PANAS_20 Jittery
  • Table
  • Figure items
  • Figure thresholds
Code
RIdifTable(df, dif$Sex)

Item 2 3 Mean location StDev MaxDiff
PANAS_11 -0.314 -0.196 -0.255 0.083 0.117
PANAS_12 0.028 -0.044 -0.008 0.051 0.073
PANAS_13 0.553 0.402 0.478 0.107 0.151
PANAS_14 -0.328 -0.183 -0.255 0.103 0.146
PANAS_16 0.004 0.114 0.059 0.078 0.111
PANAS_17 0.520 0.290 0.405 0.163 0.230
PANAS_19 -0.495 -0.355 -0.425 0.099 0.140
PANAS_20 0.032 -0.028 0.002 0.042 0.059
Code
RIdifFigure(df, dif$Sex)

Code
RIdifFigThresh(df, dif$Sex)

While no item shows problematic levels of DIF regarding item location, as shown by the table, there is an interesting pattern in the thresholds figure. The lowest threshold seems to be slightly lower for node 3 (Male) for all items. Also, item 11 shows a much wider spread of item locations for node 3 compared to node 2.

The results do not require any action since the difference is small.

5.2 Age

The psychotree package uses a model-based recursive partitioning that is particularly useful when you have a continuous variable such as age in years and a large enough sample. It will test different ways to partition the age variable to determine potential group differences (Strobl et al., 2015b, 2021).

Code
RIdifTable(df, dif$Age)
[1] "No statistically significant DIF found."

No DIF found for age.

5.3 Group

Code
RIdifTable(df, dif$Group)
[1] "No statistically significant DIF found."

And no DIF for group.

5.4 Sex and age

The psychotree package also allows for DIF interaction analysis with multiple DIF variables. We can use RIdifTable2() to input two DIF variables.

Code
RIdifTable2(df, dif$Sex, dif$Age)

Item 2 3 Mean location StDev MaxDiff
PANAS_11 -0.314 -0.196 -0.255 0.083 0.117
PANAS_12 0.028 -0.044 -0.008 0.051 0.073
PANAS_13 0.553 0.402 0.478 0.107 0.151
PANAS_14 -0.328 -0.183 -0.255 0.103 0.146
PANAS_16 0.004 0.114 0.059 0.078 0.111
PANAS_17 0.520 0.290 0.405 0.163 0.230
PANAS_19 -0.495 -0.355 -0.425 0.099 0.140
PANAS_20 0.032 -0.028 0.002 0.042 0.059

No interaction effect found for sex and age. The analysis only shows the previously identified DIF for sex.

5.5 LRT-based DIF

We’ll use the group variable as an example. First, we can simply run the test to get the overall result.

Code
erm.out <- PCM(df)
LRtest(erm.out, splitcr = dif$Group)

Andersen LR-test: 
LR-value: 46.864 
Chi-square df: 31 
p-value:  0.034 

Review the documentation for further details, using ?LRtest in your R console panel in RStudio. There is also a plotting function, plotGOF(), that may be of interest.
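A minimal sketch of the goodness-of-fit plot, which plots item parameters estimated in one subgroup against the other (conf = list() requests confidence ellipses):

Code
lr.group <- LRtest(erm.out, splitcr = dif$Group, se = TRUE)
plotGOF(lr.group, conf = list())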

itemnr item
PANAS_11 Distressed
PANAS_12 Upset
PANAS_13 Hostile
PANAS_14 Irritable
PANAS_16 Afraid
PANAS_17 Ashamed
PANAS_19 Nervous
PANAS_20 Jittery
  • Item location table
  • Item location figure
  • Item threshold table
  • Item threshold figure
Code
RIdifTableLR(df, dif$Group)
Item locations
Standard errors
Item Earlier Start Later Start MaxDiff All SE_Earlier Start SE_Later Start SE_All
PANAS_11 0.069 0.072 0.003 0.073 0.140 0.124 0.093
PANAS_12 0.33 0.346 0.016 0.342 0.155 0.133 0.101
PANAS_13 0.69 0.966 0.276 0.826 0.185 0.189 0.129
PANAS_14 0.118 0.054 0.064 0.084 0.142 0.121 0.092
PANAS_16 0.447 0.362 0.085 0.401 0.160 0.134 0.102
PANAS_17 0.66 0.822 0.162 0.751 0.184 0.172 0.125
PANAS_19 -0.028 -0.122 0.094 -0.083 0.136 0.117 0.088
PANAS_20 0.351 0.348 0.003 0.352 0.160 0.138 0.104
Note:
Values highlighted in red are above the chosen cutoff 0.5 logits. Background color brown and blue indicate the lowest and highest values among the DIF groups.
Code
RIdifFigureLR(df, dif$Group) + theme_rise()

Code
RIdifThreshTblLR(df, dif$Group)
Threshold locations
Standard errors
Item threshold Earlier Start Later Start MaxDiff All SE_Earlier Start SE_Later Start SE_All
PANAS_11
c1 -1.245 -1.241 0.004 -1.240 0.098 0.094 0.068
c2 -0.365 -0.221 0.144 -0.284 0.107 0.100 0.073
c3 0.281 0.096 0.185 0.180 0.131 0.114 0.086
c4 1.604 1.655 0.051 1.637 0.224 0.188 0.144
PANAS_12
c1 -0.484 -0.362 0.122 -0.418 0.092 0.091 0.065
c2 0.241 -0.198 0.439 -0.005 0.126 0.108 0.082
c3 0.479 0.456 0.023 0.467 0.169 0.129 0.103
c4 1.086 1.489 0.403 1.323 0.233 0.205 0.153
PANAS_13
c1 -0.067 0.23 0.297 0.093 0.092 0.089 0.064
c2 0.403 0.115 0.288 0.248 0.135 0.118 0.088
c3 0.889 1.042 0.153 0.983 0.197 0.164 0.126
c4 1.536 2.476 0.94 1.979 0.316 0.384 0.239
PANAS_14
c1 -1.017 -0.972 0.045 -0.990 0.095 0.094 0.066
c2 -0.134 -0.272 0.138 -0.205 0.111 0.103 0.076
c3 0.456 0.095 0.361 0.239 0.146 0.116 0.091
c4 1.168 1.366 0.198 1.294 0.216 0.173 0.135
PANAS_16
c1 -0.156 -0.25 0.094 -0.202 0.097 0.091 0.066
c2 0.009 -0.091 0.1 -0.046 0.130 0.112 0.085
c3 0.324 0.354 0.03 0.344 0.159 0.132 0.102
c4 1.611 1.435 0.176 1.508 0.253 0.201 0.157
PANAS_17
c1 0.264 0.368 0.104 0.324 0.097 0.089 0.066
c2 0.389 0.421 0.032 0.412 0.146 0.128 0.096
c3 0.804 0.955 0.151 0.894 0.205 0.181 0.136
c4 1.182 1.545 0.363 1.373 0.288 0.290 0.204
PANAS_19
c1 -1.339 -1.263 0.076 -1.297 0.100 0.098 0.070
c2 -0.388 -0.27 0.118 -0.323 0.108 0.105 0.075
c3 0.101 -0.333 0.434 -0.143 0.127 0.111 0.083
c4 1.512 1.378 0.134 1.430 0.207 0.156 0.125
PANAS_20
c1 -0.906 -0.877 0.029 -0.887 0.093 0.090 0.065
c2 -0.189 -0.257 0.068 -0.223 0.108 0.099 0.073
c3 1.037 0.585 0.452 0.760 0.162 0.123 0.098
c4 1.463 1.941 0.478 1.756 0.277 0.238 0.180
Note:
Values highlighted in red are above the chosen cutoff 0.5 logits. Background color brown and blue indicate the lowest and highest values among the DIF groups.
Code
RIdifThreshFigLR(df, dif$Group) + theme_rise()

The item threshold table shows that the top threshold for item 13 differs more than 0.5 logits between groups. In this set of 8 items with 4 thresholds each, it is unlikely to result in problematic differences in estimated person scores.

5.6 Conditional ICC

The iarm package function for CICC plots also includes the option to use a DIF variable.

Code
RIciccPlot(df, dif = "yes", dif_var = dif$Sex)

5.7 Logistic Ordinal Regression DIF

The lordif package (Choi et al., 2011) does not use a Rasch measurement model, it only offers a choice between the Graded Response Model (GRM) and the Generalized Partial Credit Model (GPCM). Both of these are 2PL models, meaning that they estimate a discrimination parameter for each item in addition to the item threshold parameters. lordif relies on the mirt package.

There are several nice features available in the lordif package. First, we get a χ2 test of uniform or non-uniform DIF. Second, there are three possible methods/criteria for flagging items with potential DIF. One of these uses a likelihood ratio (LR) χ2 test, while the other two are indicators of DIF size/magnitude, either using a pseudo R2 statistic (“McFadden”, “Nagelkerke”, or “CoxSnell”) or a Beta criterion. For further details, see ?lordif in your R console or the paper describing the package (Choi et al., 2011).

Below is some sample code to get you started with lordif.

Code
library(lordif)

g_dif <- lordif(as.data.frame(df), as.numeric(dif$Sex), # make sure that the data is in a dataframe-object and that the DIF variable is numeric
                criterion = c("Chisqr"), 
                alpha = 0.01, 
                beta.change = 0.1,
                model = "GPCM",
                R2.change = 0.02)

g_dif_sum <- summary(g_dif)
Code
# threshold values for colorizing the table below
alpha = 0.01
beta.change = 0.1
R2.change = 0.02

g_dif_sum$stats %>% 
  as.data.frame() %>% 
  select(!all_of(c("item","df12","df13","df23"))) %>% 
  round(3) %>% 
  add_column(itemnr = names(df), .before = "ncat") %>% 
  mutate(across(c(chi12,chi13,chi23), ~ cell_spec(.x,
                               color = case_when(
                                 .x < alpha ~ "red",
                                 TRUE ~ "black"
                               )))) %>%
  mutate(across(starts_with("pseudo"), ~ cell_spec(.x,
                               color = case_when(
                                 .x > R2.change ~ "red",
                                 TRUE ~ "black"
                               )))) %>%
  mutate(beta12 =  cell_spec(beta12,
                               color = case_when(
                                 beta12 > beta.change ~ "red",
                                 TRUE ~ "black"
                               ))) %>% 
  kbl_rise()
itemnr ncat chi12 chi13 chi23 beta12 pseudo12.McFadden pseudo13.McFadden pseudo23.McFadden pseudo12.Nagelkerke pseudo13.Nagelkerke pseudo23.Nagelkerke pseudo12.CoxSnell pseudo13.CoxSnell pseudo23.CoxSnell
PANAS_11 5 0.323 0 0 0.002 0 0.004 0.003 0 0.005 0.005 0 0.005 0.005
PANAS_12 5 0.013 0.019 0.192 0.008 0.001 0.002 0 0.001 0.002 0 0.001 0.002 0
PANAS_13 5 0.106 0.057 0.077 0.007 0.001 0.001 0.001 0.001 0.002 0.001 0.001 0.002 0.001
PANAS_14 5 0.182 0.401 0.83 0.002 0 0 0 0 0 0 0 0 0
PANAS_16 5 0.64 0.17 0.068 0.001 0 0.001 0.001 0 0.001 0.001 0 0.001 0.001
PANAS_17 5 0 0 0.32 0.032 0.008 0.008 0 0.011 0.011 0 0.01 0.01 0
PANAS_19 5 0.178 0.25 0.33 0.002 0 0 0 0 0.001 0 0 0.001 0
PANAS_20 5 0.68 0.349 0.164 0.001 0 0 0 0 0.001 0.001 0 0.001 0.001

We can review the results regarding uniform/non-uniform DIF by looking at the chi* columns. Uniform DIF is indicated by column chi12 and non-uniform DIF by chi23, while column chi13 represents “an overall test of total DIF effect” (Choi et al., 2011).

While the table indicates statistically significant χ2 tests for items 11 and 17, the magnitude estimates are low for these items.

There are some plots available as well, using the base R plot() function. For some reason the plots won’t render in this Quarto document, so I will try to sort that out at some point.

Code
plot(g_dif) # use option `graphics.off()` to get the plots rendered one by one
#plot(g_dif, graphics.off())

5.8 Partial gamma DIF

The iarm package provides a function to assess DIF by partial gamma (Bjorner et al., 1998). It should be noted that this function only shows a single partial gamma value per item, so if you have more than two groups in your comparison, you will want to also use other methods to understand your results better.

There are some recommended cutoff-values mentioned in the paper above:

No or negligible DIF:

  • Gamma within the interval -0.21 to 0.21, or
  • Gamma not significantly different from 0

Slight to moderate DIF:

  • Gamma within the interval -0.31 to 0.31 (and outside -0.21 to 0.21), or
  • not significantly outside the interval -0.21 to 0.21

Moderate to large DIF:

  • Gamma outside the interval -0.31 to 0.31, and
  • significantly outside the interval -0.21 to 0.21
Code
RIpartgamDIF(df, dif$Sex)
Item Partial gamma SE Lower CI Upper CI Adjusted p-value (BH)
PANAS_17 0.234 0.054 0.127 0.341 0

We can see “slight” DIF for item 17, with a statistically significant gamma of .23.

6 Rasch analysis 3

While there were no significant issues with DIF for any item/subgroup combination, we need to address the previously identified problem:

  • Items 16 and 19 have the largest residual correlation.

We’ll remove item 19 since item 16 has better targeting.

Code
removed.items <- c(removed.items,"PANAS_19")

df <- df_backup %>% 
  select(!any_of(removed.items))

df500 <- df500_backup %>% 
  select(!any_of(removed.items))
itemnr item
PANAS_11 Distressed
PANAS_12 Upset
PANAS_13 Hostile
PANAS_14 Irritable
PANAS_16 Afraid
PANAS_17 Ashamed
PANAS_20 Jittery
  • Item-restscore bootstrap
  • Item fit
  • CICC
  • Residual correlations
  • LRT
  • Targeting
  • Item hierarchy
Code
RIbootRestscore(df, iterations = 250, samplesize = 800)
Item Item-restscore result % of iterations Conditional MSQ infit Relative average item location
PANAS_20 underfit 50.0 1.16 1.35
PANAS_11 underfit 12.8 1.13 1.07
PANAS_12 overfit 99.2 0.78 1.34
PANAS_17 overfit 9.6 1.00 1.74
PANAS_16 overfit 8.8 0.94 1.40
Note:
Results based on 250 bootstrap iterations with n = 800 and 7 items. Conditional mean-square infit based on complete responders only (n = 1851).
Code
simfit3 <- RIgetfit(df500, iterations = 200, cpu = 8)
RIitemfit(df500, simfit3)
Item InfitMSQ Infit thresholds OutfitMSQ Outfit thresholds Infit diff Outfit diff Relative location
PANAS_11 1.112 [0.871, 1.158] 1.146 [0.868, 1.141] no misfit 0.005 1.06
PANAS_12 0.806 [0.854, 1.169] 0.787 [0.828, 1.172] 0.048 0.041 1.31
PANAS_13 1.044 [0.817, 1.211] 0.984 [0.722, 1.292] no misfit no misfit 1.85
PANAS_14 0.959 [0.824, 1.178] 0.98 [0.845, 1.154] no misfit no misfit 1.00
PANAS_16 1.007 [0.837, 1.162] 1.063 [0.794, 1.245] no misfit no misfit 1.42
PANAS_17 0.97 [0.794, 1.198] 0.958 [0.756, 1.268] no misfit no misfit 1.88
PANAS_20 1.136 [0.871, 1.139] 1.187 [0.846, 1.174] no misfit 0.013 1.24
Note:
MSQ values based on conditional calculations (n = 500 complete cases).
Simulation based thresholds from 200 simulated datasets.
Code
RIciccPlot(df)

Code
simcor3 <- RIgetResidCor(df, iterations = 400, cpu = 8)
RIresidcorr(df, cutoff = simcor3$p99)
PANAS_11 PANAS_12 PANAS_13 PANAS_14 PANAS_16 PANAS_17 PANAS_20
PANAS_11
PANAS_12 -0.18
PANAS_13 -0.15 -0.11
PANAS_14 -0.22 0.01 -0.04
PANAS_16 -0.14 -0.06 -0.26 -0.26
PANAS_17 -0.2 -0.09 -0.14 -0.22 0
PANAS_20 -0.12 -0.29 -0.16 -0.13 -0.15 -0.17
Note:
Relative cut-off value is -0.049, which is 0.098 above the average correlation (-0.147).
Correlations above the cut-off are highlighted in red text.
Code
RIbootLRT(df, iterations = 1000, samplesize = 400, cpu = 8)
Result n Percent
Not statistically significant 76 7.6
Statistically significant 924 92.4
Code
RItargeting(df, bins = 45)

Code
RIitemHierarchy(df)

6.1 Analysis 3 comments

There are still problematic residual correlations and some items show misfit (most notably item 12 being overfit), but we will end this sample analysis here and move on to show other functions.

There are several item thresholds that are very closely located, as shown in the item hierarchy figure. This is not ideal, since it will inflate reliability estimates. However, we will not modify the response categories for this analysis; we only note that this issue would need to be dealt with, by trying variations of merged response categories to achieve better separation of threshold locations without introducing disordered thresholds.
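As a hypothetical illustration (not run here), merging the two highest response categories for all items could look like the code below. Any such recoding should be followed by re-checking threshold ordering and item fit.

Code
# hypothetical example: merge response categories 3 and 4 (on the recoded
# 0-4 scale) into one category, resulting in a 0-3 scale
df_merged <- df %>%
  mutate(across(everything(), ~ car::recode(.x, "4=3")))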

7 Reliability

Code
RItif(df)

The figure above shows the Test Information Function (TIF), which indicates the reliability of all items making up the test/scale (not the reliability of the sample).

The default cutoff value used in RItif() is TIF = 3.33, which roughly corresponds to a person separation index (PSI) of 0.7 (although this is not always the case). PSI is similar to reliability coefficients such as omega and alpha, ranging from 0 to 1. You can change the TIF cutoff using the option cutoff, for instance cutoff = 2.5 (TIF values range from 1 and up).

While 11.8% of respondents had a floor effect based on the raw sum scored data, the figure above shows us that 41.8% are located below the point where the items produce a PSI of 0.7 or higher. Again, note that this figure shows the reliability of the test/scale, not the sample. If you want to add the sample reliability use option samplePSI = TRUE. More details are available in the documentation ?RItif.
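For example, combining the two options described above:

Code
RItif(df, cutoff = 2.5, samplePSI = TRUE)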

8 Person fit

We can also look at how the respondents fit the Rasch model with these items. By default, RIpfit() outputs a histogram and a hex heatmap with the person infit ZSTD statistic, using +/- 1.96 as cutoff values. This is currently the only person fit method implemented in the easyRasch package; the curious analyst is encouraged to look at the PerFit package for more tools.

Code
RIpfit(df)

You can export the person fit values to a new variable in the dataframe by specifying output = "dataframe", or if you just want the row numbers for respondents with deviant infit values, output = "rowid".
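For example:

Code
pfit_values <- RIpfit(df, output = "dataframe") # person fit values added as a new variable
misfit_rows <- RIpfit(df, output = "rowid")     # row numbers of respondents with deviant infit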

You can also specify a grouping variable to visualize the person fit for different groups.

Code
RIpfit(df, group = dif$Sex, output = "heatmap")

Person fit is a useful way to identify respondents with unexpected response patterns and investigate this further.

8.1 PerFit sample code

While none of the functions in the PerFit package have been implemented in easyRasch, here is some code to get you started if you are interested in using it. There are multiple methods/functions available for polytomous and dichotomous data; see the package documentation.

For this example, we’ll use the non-parametric U3 statistic generalized to polytomous items (Emons, 2008) and the smaller sample.

  • U3poly
  • Cutoff information
  • Flagged respondents
Code
library(PerFit)
pfit_u3poly <- U3poly(matrix = df500, 
                      Ncat = 5, # make sure to input number of response categories, not thresholds
                      IRT.PModel = "PCM")
Code
cutoff(pfit_u3poly)
$Cutoff
[1] 0.3762

$Cutoff.SE
[1] 0.0183

$Prop.flagged
[1] 0.086

$Tail
[1] "upper"

$Cutoff.CI
  2.5%  97.5% 
0.3463 0.4104 

attr(,"class")
[1] "PerFit.cutoff"
Code
flagged.resp(pfit_u3poly) %>% 
  pluck("Scores") %>% 
  as.data.frame() %>% 
  arrange(desc(PFscores))
   FlaggedID It1 It4 It7 It2 It5 It3 It6 PFscores
1        187   0   0   0   0   0   0   2   1.0000
2        392   0   0   0   0   0   0   1   1.0000
3        266   0   1   0   0   0   4   0   0.8695
4        124   4   0   4   0   0   4   0   0.7433
5        474   0   0   0   0   3   0   0   0.6635
6        351   0   4   2   4   4   1   4   0.6480
7        116   2   0   0   0   1   4   0   0.6454
8        384   0   0   0   0   2   0   0   0.6365
9         99   0   0   0   0   1   0   0   0.6192
10       385   0   0   0   0   1   0   0   0.6192
11       456   0   0   0   0   1   0   0   0.6192
12       329   0   0   0   2   0   0   0   0.5865
13       495   3   0   2   0   0   0   4   0.5838
14       396   0   2   0   4   0   0   0   0.5804
15       142   0   4   4   4   2   2   4   0.5738
16        73   4   2   0   4   4   0   0   0.5657
17        95   1   0   0   1   4   0   0   0.5447
18       172   0   0   0   0   1   0   1   0.5436
19        78   4   4   4   0   0   0   0   0.5364
20       285   1   0   0   2   0   0   3   0.5333
21        41   2   0   0   0   0   2   3   0.5205
22       320   0   1   0   0   0   3   0   0.5193
23       334   1   4   0   4   0   0   0   0.5188
24        34   0   0   0   1   0   0   1   0.4814
25       493   0   0   0   1   0   0   1   0.4814
26       184   3   3   0   2   0   4   2   0.4640
27       475   1   0   3   0   0   3   0   0.4530
28       425   4   4   0   3   3   2   4   0.4502
29       274   0   0   0   2   3   0   1   0.4464
30        36   3   0   0   0   0   0   0   0.4453
31       369   3   0   0   0   0   0   0   0.4453
32       311   0   0   1   0   0   2   0   0.4447
33        46   0   0   0   1   0   0   0   0.4446
34        98   0   0   0   1   0   0   0   0.4446
35       451   0   0   0   1   0   0   0   0.4446
36       407   0   0   4   2   3   1   1   0.4379
37       430   0   1   0   1   1   3   0   0.4215
38       446   3   0   3   4   4   0   2   0.4190
39       123   1   1   0   1   0   0   3   0.4143
40       346   0   0   2   0   3   0   0   0.4072
41       304   4   4   3   4   0   3   1   0.4016
42       318   2   4   0   4   3   3   0   0.4002
43       358   2   0   0   0   3   0   0   0.3863
44       299   4   4   2   0   0   2   0   0.3715
45        80   0   0   2   0   0   0   0   0.3709
46       110   0   0   2   0   0   0   0   0.3709
47       220   0   0   2   0   0   0   0   0.3709
48       440   0   0   2   0   0   0   0   0.3709
49       467   0   0   2   0   0   0   0   0.3709
50       238   1   2   4   3   0   0   0   0.3697
51       245   0   3   0   0   2   0   0   0.3694
52        17   1   1   3   1   2   1   4   0.3623
53       111   3   4   0   4   1   1   1   0.3596

The dataframe shown under the tab Flagged respondents above contains a variable named FlaggedID, which represents the row IDs. This variable is useful if one wants to filter out respondents with deviant response patterns (person misfit). There are indications that persons with misfit may affect the results of Andersen’s LR-test for DIF (Artner, 2016).

8.2 Item fit without aberrant responses

We can remove the misfitting persons to see how that affects item fit. Let’s also compare with the misfitting respondents identified by RIpfit(), again using the smaller sample.

Code
misfits <- flagged.resp(pfit_u3poly) %>% 
  pluck("Scores") %>% 
  as.data.frame() %>% 
  pull(FlaggedID)

misfits2 <- RIpfit(df500, output = "rowid")
  • All respondents
  • U3 misfit removed
  • ZSTD misfit removed
Code
RIitemfit(df500, simcut = simfit3)
Item InfitMSQ Infit thresholds OutfitMSQ Outfit thresholds Infit diff Outfit diff Relative location
PANAS_11 1.112 [0.871, 1.158] 1.146 [0.868, 1.141] no misfit 0.005 1.06
PANAS_12 0.806 [0.854, 1.169] 0.787 [0.828, 1.172] 0.048 0.041 1.31
PANAS_13 1.044 [0.817, 1.211] 0.984 [0.722, 1.292] no misfit no misfit 1.85
PANAS_14 0.959 [0.824, 1.178] 0.98 [0.845, 1.154] no misfit no misfit 1.00
PANAS_16 1.007 [0.837, 1.162] 1.063 [0.794, 1.245] no misfit no misfit 1.42
PANAS_17 0.97 [0.794, 1.198] 0.958 [0.756, 1.268] no misfit no misfit 1.88
PANAS_20 1.136 [0.871, 1.139] 1.187 [0.846, 1.174] no misfit 0.013 1.24
Note:
MSQ values based on conditional calculations (n = 500 complete cases).
Simulation based thresholds from 200 simulated datasets.
Code
RIitemfit(df500[-misfits,], simcut = simfit3)
Item InfitMSQ Infit thresholds OutfitMSQ Outfit thresholds Infit diff Outfit diff Relative location
PANAS_11 1.181 [0.871, 1.158] 1.193 [0.868, 1.141] 0.023 0.052 1.24
PANAS_12 0.812 [0.854, 1.169] 0.733 [0.828, 1.172] 0.042 0.095 1.63
PANAS_13 0.986 [0.817, 1.211] 0.874 [0.722, 1.292] no misfit no misfit 2.46
PANAS_14 1.039 [0.824, 1.178] 1.063 [0.845, 1.154] no misfit no misfit 1.16
PANAS_16 0.949 [0.837, 1.162] 0.892 [0.794, 1.245] no misfit no misfit 1.68
PANAS_17 0.936 [0.794, 1.198] 0.803 [0.756, 1.268] no misfit no misfit 2.57
PANAS_20 1.137 [0.871, 1.139] 1.17 [0.846, 1.174] no misfit no misfit 1.47
Note:
MSQ values based on conditional calculations (n = 448 complete cases).
Simulation based thresholds from 200 simulated datasets.
Code
RIitemfit(df500[-misfits2,], simcut = simfit3)
Item InfitMSQ Infit thresholds OutfitMSQ Outfit thresholds Infit diff Outfit diff Relative location
PANAS_11 1.095 [0.871, 1.158] 1.112 [0.868, 1.141] no misfit no misfit 1.14
PANAS_12 0.791 [0.854, 1.169] 0.766 [0.828, 1.172] 0.063 0.062 1.42
PANAS_13 1.007 [0.817, 1.211] 0.899 [0.722, 1.292] no misfit no misfit 2.09
PANAS_14 0.98 [0.824, 1.178] 0.995 [0.845, 1.154] no misfit no misfit 1.06
PANAS_16 1.005 [0.837, 1.162] 1.034 [0.794, 1.245] no misfit no misfit 1.51
PANAS_17 0.993 [0.794, 1.198] 0.963 [0.756, 1.268] no misfit no misfit 2.07
PANAS_20 1.162 [0.871, 1.139] 1.208 [0.846, 1.174] 0.023 0.034 1.32
Note:
MSQ values based on conditional calculations (n = 454 complete cases).
Simulation based thresholds from 200 simulated datasets.

9 Item parameters

To allow others (and oneself) to use the estimated item parameters for estimating person locations/thetas, we should make the item parameters available. The function will also write a csv-file with the item threshold locations. Person locations/thetas can be estimated with the thetaEst() function from the catR package, which is implemented in the function RIestThetasOLD(); see below for details.

First, we’ll output the parameters into a table.

Code
RIitemparams(df)
Threshold 1 Threshold 2 Threshold 3 Threshold 4 Item location
PANAS_11 -1.63 -0.67 -0.22 1.21 -0.33
PANAS_12 -0.80 -0.39 0.06 0.89 -0.06
PANAS_13 -0.29 -0.15 0.56 1.53 0.42
PANAS_14 -1.38 -0.59 -0.16 0.87 -0.32
PANAS_16 -0.59 -0.44 -0.06 1.08 0
PANAS_17 -0.06 0.01 0.48 0.93 0.34
PANAS_20 -1.27 -0.61 0.35 1.32 -0.05
Note:
Item location is the average of the thresholds for each item.

The parameters can also be output to a dataframe or a file, using the option output = "dataframe" or output = "file".
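For example, to keep the parameters available as an object in the current session:

Code
itemps <- RIitemparams(df, output = "dataframe")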

10 Ordinal sum score to interval score

This table shows the corresponding “raw” ordinal sum score values and logit scores, with standard errors for each logit value. Interval scores are estimated using weighted likelihood (WL) estimation, based on a simulated dataset using the item parameters estimated from the input dataset. WL is the default due to its lower bias compared to ML estimation (Warm, 1989).

(An option will hopefully be added at some point to create this table based on only item parameters.)

Code
RIscoreSE(df)
Ordinal sum score Logit score Logit std.error
0 -3.642 0.620
1 -2.543 0.716
2 -2.032 0.653
3 -1.693 0.578
4 -1.437 0.515
5 -1.228 0.468
6 -1.050 0.432
7 -0.893 0.406
8 -0.750 0.386
9 -0.618 0.371
10 -0.493 0.361
11 -0.373 0.353
12 -0.258 0.348
13 -0.144 0.346
14 -0.031 0.346
15 0.083 0.348
16 0.198 0.352
17 0.317 0.359
18 0.441 0.368
19 0.572 0.381
20 0.712 0.397
21 0.866 0.418
22 1.037 0.447
23 1.233 0.484
24 1.464 0.534
25 1.747 0.601
26 2.118 0.681
27 2.666 0.745
28 3.804 0.638

10.1 Ordinal/interval figure

The figure below can also be generated to illustrate the relationship between ordinal sum score and logit interval score. The errorbars default to show the standard error at each point, multiplied by 1.96.

Code
RIscoreSE(df, output = "figure")

10.2 Estimating interval level person scores

Based on the Rasch analysis output of item parameters, we can estimate each individual’s location or score (also known as “theta”). RIestThetas() by default uses WLE estimation based on item parameters from a partial credit model (PCM) and outputs a dataframe with person locations (WLE) and measurement error (SEM) on the logit scale.

Code
thetas <- RIestThetas(df)

head(thetas)
         WLE       SEM
1 -2.5430672 0.7160122
2 -2.0319918 0.6533338
3 -0.1439296 0.3460439
4 -3.6420132 0.6202853
5 -2.0319918 0.6533338
6  0.1980813 0.3520345

Each individual has a standard error of measurement (SEM) associated with their estimated location/score. This is included in the output of the RIestThetas() function as the SEM variable, as seen above. We can review the distribution of measurement error with a figure.
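A minimal base R sketch of such a figure, using the thetas dataframe created above:

Code
hist(thetas$SEM,
     col = "#009ca6",
     main = "Histogram of measurement error (SEM)",
     breaks = 20)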

We can take a look at the distribution of person locations (thetas) using a histogram.

Code
hist(thetas$WLE, 
     col = "#009ca6", 
     main = "Histogram of person locations (thetas)", 
     breaks = 20)

RIestThetasOLD() can be used with a pre-specified item (threshold) location matrix, and also defaults to WL estimation. Similarly to RIscoreSE(), you can (and may indeed need to) change the range of logit scores using the option theta_range. The default is c(-7,7), which should work in most circumstances.
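For example, widening the range:

Code
RIscoreSE(df, theta_range = c(-10, 10))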

If you would like to use an existing item threshold location matrix, this code may be helpful:

Code
itemParameters <- read_csv("itemParameters.csv") %>% 
  as.matrix()
itemParameters
     Threshold 1 Threshold 2 Threshold 3 Threshold 4
[1,]     -1.2382     -0.3288      0.0666      1.4649
[2,]      0.0623      0.1480      0.8241      1.7680
[3,]     -0.9915     -0.2566      0.1231      1.1222
[4,]     -0.2179     -0.1217      0.2099      1.3248
[5,]      0.2868      0.3061      0.7319      1.1655
[6,]     -0.8929     -0.2860      0.6297      1.5673

As you can see, this is a matrix object (not a dataframe), with each item as a row, and the threshold locations as columns.
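For reference, below is a rough, hypothetical sketch of estimating person locations directly with catR::thetaEst(), roughly what RIestThetasOLD() does internally. Here, responses is assumed to be a dataframe of item responses whose columns correspond to the rows of the parameter matrix; check ?thetaEst for the exact matrix layout it expects for the PCM.

Code
library(catR)
# hypothetical: estimate a theta for each row of `responses` using
# weighted likelihood (WL) and the item threshold matrix loaded above
thetas_wl <- apply(responses, 1, function(r) {
  thetaEst(itemParameters, as.numeric(r),
           model = "PCM", method = "WL", range = c(-7, 7))
})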

11 Figure design

Most of the figures created by the functions can be styled (colors, fonts, etc) by adding theme settings to them. You can use the standard ggplot function theme() and related theme-functions. As usual it is possible to “stack” theme functions, as seen in the example below.

You can also change coloring, axis limits/breaks, etc, just by adding ggplot options with a + sign.

A custom theme function, theme_rise(), is included in the easyRasch package. It might be easier to use if you are not familiar with theme().

For instance, you might like to change the font to “Lato” for the item hierarchy figure, and make the background transparent.

Code
RIitemHierarchy(df) +
  theme_minimal() + # first apply the minimal theme to make the background transparent
  theme_rise(fontfamily = "Lato") # then apply theme_rise, which simplifies making changes to all plot elements

As of package version 0.1.30.0, the RItargeting() function allows more flexibility in styling too, by having an option to return a list object with the three separate plots. See the NEWS file for more details. Since the RItargeting() function uses the patchwork library to combine plots, you can also make use of the many functions that patchwork includes. For instance, you can set a title with a specific theme:

Code
RItargeting(df) + plot_annotation(title = "Targeting", theme = theme_rise(fontfamily = "Arial"))

In order to change font for text inside plots (such as “t1” for thresholds) you will need to add an additional line of code.

update_geom_defaults("text", list(family = "Lato"))

Please note that the line of code above updates the default settings for geom_text() for the whole session. Also, some functions, such as RIloadLoc(), make use of geom_text_repel(), for which you would need to change the code above from “text” to “text_repel”.
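That is:

update_geom_defaults("text_repel", list(family = "Lato"))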

A simple way to only change font family and font size would be to use theme_minimal(base_family = "Calibri", base_size = 14). Please see the reference page for default ggplot themes for alternatives to theme_minimal().
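For example:

Code
RIitemHierarchy(df) +
  theme_minimal(base_family = "Calibri", base_size = 14)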

12 Software used

The grateful package is a nice way to give credit to the packages used in making the analysis. The package can create both a bibliography file and a table object, which is handy for automatically creating a reference list based on the packages used (or at least explicitly loaded).

Code
library(grateful)
pkgs <- cite_packages(cite.tidyverse = TRUE, 
                      output = "table",
                      bib.file = "grateful-refs.bib",
                      include.RStudio = TRUE,
                      out.dir = getwd())
# If kbl() is used to generate this table, the references will not be added to the Reference list.
formattable(pkgs, 
            table.attr = 'class=\"table table-striped\" style="font-size: 13px; font-family: Lato; width: 80%"')
Package Version Citation
base 4.4.3 R Core Team (2025)
car 3.1.2 Fox & Weisberg (2019)
easyRasch 0.3.6.1 Johansson (2025b)
eRm 1.0.6 Mair & Hatzinger (2007b); Mair & Hatzinger (2007a); Hatzinger & Rusch (2009); Rusch et al. (2013); Koller et al. (2015); Debelak & Koller (2019)
foreach 1.5.2 Microsoft & Weston (2022)
formattable 0.2.1 Ren & Russell (2021)
ggrepel 0.9.6 Slowikowski (2024)
glue 1.8.0 Hester & Bryan (2024)
gtsummary 2.0.4 Sjoberg et al. (2021)
iarm 0.4.3 Mueller (2022)
kableExtra 1.4.0 Zhu (2024)
knitr 1.48 Xie (2014); Xie (2015); Xie (2024)
lordif 0.3.3 Choi et al. (2016)
matrixStats 1.4.1 Bengtsson (2024)
mirt 1.44.0 Chalmers (2012)
patchwork 1.3.0 Pedersen (2024)
PerFit 1.4.6 Tendeiro et al. (2016)
psych 2.4.12 William Revelle (2024)
psychotree 0.16.1 Trepte & Verbeet (2010); Strobl et al. (2011); Strobl et al. (2015a); Komboz et al. (2018); Wickelmaier & Zeileis (2018)
RASCHplot 0.1.0 Buchardt et al. (2022)
reshape 0.8.9 Wickham (2007)
rmarkdown 2.28 Xie et al. (2018); Xie et al. (2020); Allaire et al. (2024)
tidyverse 2.0.0 Wickham et al. (2019)

13 Additional credits

Thanks to my colleagues at RISE for providing feedback and testing the package on Windows and macOS platforms. Also, thanks to Mike Linacre and Jeanette Melin for providing useful feedback to improve this vignette.

14 Session info

Code
sessionInfo()
R version 4.4.3 (2025-02-28)
Platform: aarch64-apple-darwin20
Running under: macOS Sequoia 15.3.2

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/Stockholm
tzcode source: internal

attached base packages:
 [1] parallel  grid      stats4    stats     graphics  grDevices utils    
 [8] datasets  methods   base     

other attached packages:
 [1] PerFit_1.4.6      ltm_1.2-0         polycor_0.8-1     msm_1.7.1        
 [5] MASS_7.3-64       lordif_0.3-3      rms_6.8-1         Hmisc_5.2-3      
 [9] RASCHplot_0.1.0   knitr_1.48        readxl_1.4.3      car_3.1-2        
[13] carData_3.0-5     grateful_0.2.4    easyRasch_0.3.6.1 doParallel_1.0.17
[17] iterators_1.0.14  furrr_0.3.1       future_1.34.0     foreach_1.5.2    
[21] ggdist_3.3.2      janitor_2.2.1     iarm_0.4.3        hexbin_1.28.4    
[25] catR_3.17         glue_1.8.0        ggrepel_0.9.6     patchwork_1.3.0  
[29] reshape_0.8.9     matrixStats_1.4.1 psychotree_0.16-1 psychotools_0.7-4
[33] partykit_1.2-23   mvtnorm_1.3-1     libcoin_1.0-10    psych_2.4.12     
[37] mirt_1.44.0       lattice_0.22-6    eRm_1.0-6         lubridate_1.9.4  
[41] forcats_1.0.0     stringr_1.5.1     dplyr_1.1.4       purrr_1.0.2      
[45] readr_2.1.5       tidyr_1.3.1       tibble_3.2.1      ggplot2_3.5.1    
[49] tidyverse_2.0.0   kableExtra_1.4.0  formattable_0.2.1

loaded via a namespace (and not attached):
  [1] bitops_1.0-8         RColorBrewer_1.1-3   tools_4.4.3         
  [4] backports_1.5.0      utf8_1.2.4           R6_2.5.1            
  [7] DT_0.33              vegan_2.6-8          sm_2.2-6.0          
 [10] mgcv_1.9-1           permute_0.9-7        withr_3.0.2         
 [13] gridExtra_2.3        progressr_0.14.0     archive_1.1.8       
 [16] quantreg_5.98        cli_3.6.3            gt_0.11.1           
 [19] sandwich_3.1-1       labeling_0.4.3       sass_0.4.9          
 [22] polspline_1.1.25     pbapply_1.7-2        systemfonts_1.1.0   
 [25] commonmark_1.9.2     foreign_0.8-88       relimp_1.0-5        
 [28] svglite_2.1.3        R.utils_2.12.3       parallelly_1.38.0   
 [31] sessioninfo_1.2.2    rstudioapi_0.17.1    generics_0.1.3      
 [34] vroom_1.6.5          distributional_0.4.0 qvcalc_1.0.3        
 [37] Matrix_1.7-2         abind_1.4-5          R.methodsS3_1.8.2   
 [40] lifecycle_1.0.4      multcomp_1.4-26      yaml_2.3.10         
 [43] snakecase_0.11.1     inum_1.0-5           gtsummary_2.0.4     
 [46] ggstance_0.3.7       irtoys_0.2.2         promises_1.3.2      
 [49] crayon_1.5.3         cowplot_1.1.3        pillar_1.10.1       
 [52] fda_6.1.8            vcdExtra_0.8-5       future.apply_1.11.2 
 [55] admisc_0.35          codetools_0.2-20     beepr_2.0           
 [58] data.table_1.16.0    vcd_1.4-12           vctrs_0.6.5         
 [61] testthat_3.2.1.1     cellranger_1.1.0     gtable_0.3.5        
 [64] cachem_1.1.0         ks_1.14.2            xfun_0.46           
 [67] mime_0.12            pracma_2.4.4         fds_1.8             
 [70] pcaPP_2.0-4          survival_3.8-3       audio_0.1-11        
 [73] RPushbullet_0.3.4    TH.data_1.1-2        nlme_3.1-167        
 [76] bit64_4.0.5          rprojroot_2.0.4      Deriv_4.1.3         
 [79] KernSmooth_2.23-26   rpart_4.1.24         colorspace_2.1-1    
 [82] gnm_1.1-5            nnet_7.3-20          mnormt_2.1.1        
 [85] tidyselect_1.2.1     bit_4.0.5            compiler_4.4.3      
 [88] curl_6.1.0           htmlTable_2.4.3      SparseM_1.84-2      
 [91] expm_0.999-9         xml2_1.3.6           checkmate_2.3.2     
 [94] scales_1.3.0         lmtest_0.9-40        digest_0.6.37       
 [97] rainbow_3.8          rmarkdown_2.28       ca_0.71.1           
[100] htmltools_0.5.8.1    pkgconfig_2.0.3      base64enc_0.1-3     
[103] SimDesign_2.17.1     fastmap_1.2.0        rlang_1.1.4         
[106] htmlwidgets_1.6.4    shiny_1.10.0         farver_2.1.2        
[109] zoo_1.8-12           jsonlite_1.8.9       mclust_6.1.1        
[112] dcurver_0.9.2        R.oo_1.26.0          RCurl_1.98-1.16     
[115] magrittr_2.0.3       Formula_1.2-5        munsell_0.5.1       
[118] Rcpp_1.0.14          stringi_1.8.4        brio_1.1.5          
[121] plyr_1.8.9           listenv_0.9.1        splines_4.4.3       
[124] hms_1.1.3            ggpubr_0.6.0         ggsignif_0.6.4      
[127] markdown_1.13        reshape2_1.4.4       GPArotation_2024.3-1
[130] evaluate_1.0.1       renv_1.0.7           deSolve_1.40        
[133] tzdb_0.4.0           httpuv_1.6.15        MatrixModels_0.5-3  
[136] cards_0.4.0          broom_1.0.7          xtable_1.8-4        
[139] rstatix_0.7.2        later_1.4.1          viridisLite_0.4.2   
[142] snow_0.4-4           memoise_2.0.1        cluster_2.1.8       
[145] corrplot_0.92        timechange_0.3.0     globals_0.16.3      
[148] hdrcde_3.4           here_1.0.1          

15 References

Allaire, J., Xie, Y., Dervieux, C., McPherson, J., Luraschi, J., Ushey, K., Atkins, A., Wickham, H., Cheng, J., Chang, W., & Iannone, R. (2024). rmarkdown: Dynamic documents for R. https://github.com/rstudio/rmarkdown
Andersen, E. B. (1973). A goodness of fit test for the Rasch model. Psychometrika, 38(1), 123–140. https://doi.org/10.1007/BF02291180
Anvari, F., Efendić, E., Olsen, J., Arslan, R. C., Elson, M., & Schneider, I. K. (2022). Bias in Self-Reports: An Initial Elevation Phenomenon. Social Psychological and Personality Science, 19485506221129160. https://doi.org/10.1177/19485506221129160
Artner, R. (2016). A simulation study of person-fit in the Rasch model. Psychological Test and Assessment Modeling, 58(3), 531–563.
Bengtsson, H. (2024). matrixStats: Functions that apply to rows and columns of matrices (and to vectors). https://CRAN.R-project.org/package=matrixStats
Bjorner, J. B., Kreiner, S., Ware, J. E., Damsgaard, M. T., & Bech, P. (1998). Differential Item Functioning in the Danish Translation of the SF-36. Journal of Clinical Epidemiology, 51(11), 1189–1202. https://doi.org/10.1016/S0895-4356(98)00111-5
Buchardt, A.-S., Christensen, K. B., & Jensen, N. (2023). Visualizing Rasch item fit using conditional item characteristic curves in R. Psychological Test and Assessment Modeling, 65(2), 206–219.
Buchardt, A.-S., Jensen, S. N., & Christensen, K. B. (2022). RASCHplot: Visualisation tool for validity of rasch models. https://github.com/ERRTG/RASCHplot
Chalmers, R. P. (2012). mirt: A multidimensional item response theory package for the R environment. Journal of Statistical Software, 48(6), 1–29. https://doi.org/10.18637/jss.v048.i06
Choi, S. W., Gibbons, L. E., & Crane, P. K. (2011). Lordif: An R Package for Detecting Differential Item Functioning Using Iterative Hybrid Ordinal Logistic Regression/Item Response Theory and Monte Carlo Simulations. Journal of Statistical Software, 39(1), 1–30. https://doi.org/10.18637/jss.v039.i08
Choi, S. W., Gibbons, L. E., & Crane, P. K. (2016). lordif: Logistic ordinal regression differential item functioning using IRT. https://CRAN.R-project.org/package=lordif
Chou, Y.-T., & Wang, W.-C. (2010). Checking Dimensionality in Item Response Models With Principal Component Analysis on Standardized Residuals. Educational and Psychological Measurement, 70(5), 717–731. https://doi.org/10.1177/0013164410379322
Christensen, K. B., Makransky, G., & Horton, M. (2017). Critical Values for Yen’s Q3: Identification of Local Dependence in the Rasch Model Using Residual Correlations. Applied Psychological Measurement, 41(3), 178–194. https://doi.org/10.1177/0146621616677520
Debelak, R., & Koller, I. (2019). Testing the local independence assumption of the Rasch model with Q3-based nonparametric model tests. Applied Psychological Measurement, 44. https://doi.org/10.1177/0146621619835501
Emons, W. H. M. (2008). Nonparametric Person-Fit Analysis of Polytomous Item Scores. Applied Psychological Measurement, 32(3), 224–247. https://doi.org/10.1177/0146621607302479
Fox, J., & Weisberg, S. (2019). An R companion to applied regression (Third). Sage. https://socialsciences.mcmaster.ca/jfox/Books/Companion/
Hatzinger, R., & Rusch, T. (2009). IRT models with relaxed assumptions in eRm: A manual-like instruction. Psychology Science Quarterly, 51.
Henninger, M., Radek, J., Sengewald, M.-A., & Strobl, C. (2024). Partial credit trees meet the partial gamma coefficient for quantifying DIF and DSF in polytomous items. OSF. https://doi.org/10.31234/osf.io/47sah
Hester, J., & Bryan, J. (2024). glue: Interpreted string literals. https://CRAN.R-project.org/package=glue
Johansson, M. (2025a). Detecting Item Misfit in Rasch Models. Educational Methods & Psychometrics, 3(18). https://doi.org/10.61186/emp.2025.5
Johansson, M. (2025b). easyRasch: Psychometric analysis in R with Rasch measurement theory. https://github.com/pgmj/easyRasch
Johansson, M., Preuter, M., Karlsson, S., Möllerberg, M.-L., Svensson, H., & Melin, J. (2023). Valid and reliable? Basic and expanded recommendations for psychometric reporting and quality assessment. https://doi.org/10.31219/osf.io/3htzc
Koller, I., Maier, M., & Hatzinger, R. (2015). An empirical power analysis of quasi-exact tests for the Rasch model: Measurement invariance in small samples. Methodology, 11. https://doi.org/10.1027/1614-2241/a000090
Komboz, B., Zeileis, A., & Strobl, C. (2018). Tree-based global model tests for polytomous Rasch models. Educational and Psychological Measurement, 78(1), 128–166. https://doi.org/10.1177/0013164416664394
Kreiner, S. (2011). A Note on Item–Restscore Association in Rasch Models. Applied Psychological Measurement, 35(7), 557–561. https://doi.org/10.1177/0146621611410227
Kreiner, S., & Christensen, K. B. (2004). Analysis of Local Dependence and Multidimensionality in Graphical Loglinear Rasch Models. Communications in Statistics - Theory and Methods, 33(6), 1239–1276. https://doi.org/10.1081/STA-120030148
Mair, P., & Hatzinger, R. (2007a). CML based estimation of extended Rasch models with the eRm package in R. Psychology Science, 49. https://doi.org/10.18637/jss.v020.i09
Mair, P., & Hatzinger, R. (2007b). Extended Rasch modeling: The eRm package for the application of IRT models in R. Journal of Statistical Software, 20. https://doi.org/10.18637/jss.v020.i09
Microsoft, & Weston, S. (2022). foreach: Provides foreach looping construct. https://CRAN.R-project.org/package=foreach
Mueller, M. (2022). iarm: Item analysis in Rasch models. https://CRAN.R-project.org/package=iarm
Mueller, M., & Santiago, P. H. R. (2022). iarm: Item analysis in Rasch models. https://cran.r-project.org/web/packages/iarm/index.html
Müller, M. (2020). Item fit statistics for Rasch analysis: Can we trust them? Journal of Statistical Distributions and Applications, 7(1), 5. https://doi.org/10.1186/s40488-020-00108-7
Ostini, R., & Nering, M. (2006). Polytomous Item Response Theory Models. SAGE Publications, Inc. https://doi.org/10.4135/9781412985413
Pedersen, T. L. (2024). patchwork: The composer of plots. https://CRAN.R-project.org/package=patchwork
R Core Team. (2025). R: A language and environment for statistical computing. R Foundation for Statistical Computing. https://www.R-project.org/
Ren, K., & Russell, K. (2021). formattable: Create “Formattable” data structures. https://CRAN.R-project.org/package=formattable
Rusch, T., Maier, M., & Hatzinger, R. (2013). Linear logistic models with relaxed assumptions in R. In B. Lausen, D. van den Poel, & A. Ultsch (Eds.), Algorithms from and for nature and life. Springer. https://doi.org/10.1007/978-3-319-00035-0_34
Sjoberg, D. D., Whiting, K., Curry, M., Lavery, J. A., & Larmarange, J. (2021). Reproducible summary tables with the gtsummary package. The R Journal, 13, 570–580. https://doi.org/10.32614/RJ-2021-053
Slowikowski, K. (2024). ggrepel: Automatically position non-overlapping text labels with “ggplot2”. https://CRAN.R-project.org/package=ggrepel
Smith, E. V. (2002). Detecting and evaluating the impact of multidimensionality using item fit statistics and principal component analysis of residuals. Journal of Applied Measurement, 3(2), 205–231.
Smith, R. M., Schumacker, R. E., & Bush, M. J. (1998). Using item mean squares to evaluate fit to the Rasch model. Journal of Outcome Measurement, 2(1), 66–78.
Strobl, C., Kopf, J., & Zeileis, A. (2015a). Rasch trees: A new method for detecting differential item functioning in the Rasch model. Psychometrika, 80(2), 289–316. https://doi.org/10.1007/s11336-013-9388-3
Strobl, C., Kopf, J., & Zeileis, A. (2015b). Rasch Trees: A New Method for Detecting Differential Item Functioning in the Rasch Model. Psychometrika, 80(2), 289–316. https://doi.org/10.1007/s11336-013-9388-3
Strobl, C., Schneider, L., Kopf, J., & Zeileis, A. (2021). Using the raschtree function for detecting differential item functioning in the Rasch model. 12.
Strobl, C., Wickelmaier, F., & Zeileis, A. (2011). Accounting for individual differences in Bradley-Terry models by means of recursive partitioning. Journal of Educational and Behavioral Statistics, 36(2), 135–153. https://doi.org/10.3102/1076998609359791
Tendeiro, J. N., Meijer, R. R., & Niessen, A. S. M. (2016). PerFit: An R package for person-fit analysis in IRT. Journal of Statistical Software, 74(5), 1–27. https://doi.org/10.18637/jss.v074.i05
Trepte, S., & Verbeet, M. (Eds.). (2010). Allgemeinbildung in Deutschland – erkenntnisse aus dem SPIEGEL Studentenpisa-Test. VS Verlag.
Warm, T. A. (1989). Weighted likelihood estimation of ability in item response theory. Psychometrika, 54(3), 427–450. https://doi.org/10.1007/BF02294627
Wickelmaier, F., & Zeileis, A. (2018). Using recursive partitioning to account for parameter heterogeneity in multinomial processing tree models. Behavior Research Methods, 50(3), 1217–1233. https://doi.org/10.3758/s13428-017-0937-z
Wickham, H. (2007). Reshaping data with the reshape package. Journal of Statistical Software, 21(12). https://www.jstatsoft.org/v21/i12/
Wickham, H., Averick, M., Bryan, J., Chang, W., McGowan, L. D., François, R., Grolemund, G., Hayes, A., Henry, L., Hester, J., Kuhn, M., Pedersen, T. L., Miller, E., Bache, S. M., Müller, K., Ooms, J., Robinson, D., Seidel, D. P., Spinu, V., … Yutani, H. (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686. https://doi.org/10.21105/joss.01686
William Revelle. (2024). psych: Procedures for psychological, psychometric, and personality research. Northwestern University. https://CRAN.R-project.org/package=psych
Xie, Y. (2014). knitr: A comprehensive tool for reproducible research in R. In V. Stodden, F. Leisch, & R. D. Peng (Eds.), Implementing reproducible computational research. Chapman; Hall/CRC.
Xie, Y. (2015). Dynamic documents with R and knitr (2nd ed.). Chapman; Hall/CRC. https://yihui.org/knitr/
Xie, Y. (2024). knitr: A general-purpose package for dynamic report generation in R. https://yihui.org/knitr/
Xie, Y., Allaire, J. J., & Grolemund, G. (2018). R markdown: The definitive guide. Chapman; Hall/CRC. https://bookdown.org/yihui/rmarkdown
Xie, Y., Dervieux, C., & Riederer, E. (2020). R markdown cookbook. Chapman; Hall/CRC. https://bookdown.org/yihui/rmarkdown-cookbook
Zhu, H. (2024). kableExtra: Construct complex table with “kable” and pipe syntax. https://CRAN.R-project.org/package=kableExtra

Reuse

CC BY 4.0

Citation

BibTeX citation:
@online{johansson2025,
  author = {Johansson, Magnus},
  title = {easyRasch Vignette},
  date = {2025-03-28},
  url = {https://pgmj.github.io/raschrvignette/RaschRvign.html},
  langid = {en}
}
For attribution, please cite this work as:
Johansson, M. (2025, March 28). easyRasch vignette. https://pgmj.github.io/raschrvignette/RaschRvign.html
Source Code
---
title: "easyRasch vignette"
subtitle: "R package for Rasch analysis"
author: 
  name: 'Magnus Johansson'
  affiliation: 'RISE Research Institutes of Sweden'
  affiliation-url: 'https://www.ri.se/en/person/magnus-p-johansson'
  orcid: '0000-0003-1669-592X'
date: last-modified
google-scholar: true
citation: true
website:
  open-graph:
    image: "/RaschRvign_files/figure-html/unnamed-chunk-34-1.png"
execute: 
  cache: true
  warning: false
  message: false
bibliography: 
  - references.bib
  - grateful-refs.bib
csl: apa.csl
editor_options: 
  chunk_output_type: console
---

This is an introduction to using the [`easyRasch` R package](https://pgmj.github.io/easyRasch/). A changelog for package updates is available [here](https://pgmj.github.io/easyRasch/news/index.html).

::: {.callout-note icon="true"}
## Note
This packages was previously known as `RISEkbmRasch`.
:::

Details on package installation are available at the [package GitHub page](https://pgmj.github.io/easyRasch).

If you are new to Rasch Measurement Theory, you may find this intro presentation useful:
<https://pgmj.github.io/RaschIRTlecture/slides.html>

This vignette will walk through a sample analysis using an open dataset with polytomous questionnaire data. This will include some data wrangling to structure the item data and itemlabels, then provide examples of the different functions. The full source code of this document can be found either [in this repository](https://github.com/pgmj/pgmj.github.io/blob/main/raschrvignette/RaschRvign.qmd) or by clicking on **\</\> CODE** at the top of this page. You should be able to use the source code "as is" and reproduce this document locally, as long as you have the required packages installed. This page and this website are built using the open source publishing tool [Quarto](https://www.quarto.org).

One of the aims with this package is to simplify reproducible psychometric analysis to shed light on the measurement properties of a scale, questionnaire or test. In a paper recently made available as a preprint [@johansson], our [research group](https://www.ri.se/en/what-we-do/projects/center-for-categorically-based-measurements) propose that the basic aspects of a psychometric analysis should include information about:

-   Unidimensionality
-   Response categories
-   Invariance
-   Targeting
-   Measurement uncertainties (reliability)

We'll include several ways to investigate these measurement properties, using Rasch Measurement Theory. There are also functions in the package less directly related to the criteria above, that will be demonstrated in this vignette.

Please note that this is just a sample analysis to showcase the R package. It is not intended as a "best practice" psychometric analysis example.

You can skip ahead to the Rasch analysis part in @sec-rasch if you are eager to look at the package output :)

There is a separate GitHub repository containing a template R-project to simplify using the `easyRasch` package when conducting a reproducible Rasch analysis in R: <https://github.com/pgmj/RISEraschTemplate>


## Getting started

Since the package is intended for use with Quarto, this vignette has also been created with Quarto. A "template" .qmd file [is available](https://github.com/pgmj/RISEraschTemplate/blob/main/analysis.qmd) that can be useful to have handy for copy&paste when running a new analysis.  You can also download a complete copy of the Quarto/R code to produce this document [here](https://github.com/pgmj/pgmj.github.io/blob/main/raschrvignette/RaschRvign.qmd).

Loading the `easyRasch` package will automatically load all the packages it depends on. However, it could be desirable to explicitly load all packages used, to simplify the automatic creation of citations for them, using the `grateful` package (see @sec-grateful).

```{r}
library(easyRasch) # devtools::install_github("pgmj/easyRasch", dependencies = TRUE)
library(grateful)
library(ggrepel)
library(car)
library(kableExtra)
library(readxl)
library(tidyverse)
library(eRm)
library(iarm)
library(mirt)
library(psych)
library(ggplot2)
library(psychotree)
library(matrixStats)
library(reshape)
library(knitr)
library(patchwork)
library(formattable) 
library(glue)
library(foreach)

### some commands exist in multiple packages, here we define preferred ones that are frequently used
select <- dplyr::select
count <- dplyr::count
recode <- car::recode
rename <- dplyr::rename
```

::: {.callout-note icon="true"}
## Note
Quarto automatically adds links to R packages and functions throughout this document. However, this feature only works properly for packages available on [CRAN](https://cran.r-project.org/). Since the `easyRasch` package is not on CRAN the links related to functions starting with **RI** will not work.
:::

### Loading data

We will use data from a recent paper investigating the "initial elevation effect" [@anvari2022], and focus on the 10 negative items from the PANAS. The data is available at the OSF website.

```{r}
df.all <- read_csv("https://osf.io/download/6fbr5/")
# if you have issues with the link, please try downloading manually using the same URL as above
# and read the file from your local drive.

# subset items and demographic variables
df <- df.all %>% 
  select(starts_with("PANASD2_1"),
         starts_with("PANASD2_20"),
         age,Sex,Group) %>% 
  select(!PANASD2_10_Active) %>% 
  select(!PANASD2_1_Attentive)
```

The `glimpse()` function provides a quick overview of our dataframe.

```{r}
glimpse(df)
```

We have `r nrow(df)` rows, ie. respondents. All variables except Sex and Group are of class `dbl`, which means they are numeric and can have decimals. Integer (numeric with no decimals) would also be fine for our purposes. The two demographic variables currently of class `chr` (character) will need to be converted to factors (`fct`), and we will do that later on.

(If you import a dataset where item variables are of class character, you will need to recode to numeric.)

### Itemlabels

Then we set up the itemlabels dataframe. This could also be done using the free [LibreOffice Calc](https://www.libreoffice.org/download/download-libreoffice/) or MS Excel. Just make sure the file has the same structure, with two variables named `itemnr` and `item` that contain the item variable names and item description. The item variable names have to match the variable names in the item dataframe.

```{r}
itemlabels <- df %>% 
  select(starts_with("PAN")) %>% 
  names() %>% 
  as_tibble() %>% 
  separate(value, c(NA, "item"), sep ="_[0-9][0-9]_") %>% 
  mutate(itemnr = paste0("PANAS_",c(11:20)), .before = "item")

```

The `itemlabels` dataframe looks like this.

```{r}
itemlabels
```

### Demographics

Variables for invariance tests such as Differential Item Functioning (DIF) need to be separated into vectors (ideally as factors with specified levels and labels) with the same length as the number of rows in the dataset. This means that any kind of removal of respondents/rows with missing data needs to be done before separating the DIF variables.

First, we'll have a look at the class of our variables.

```{r}
df %>% 
  select(Sex,age,Group) %>% 
  glimpse()
```

`Sex` and `Group` are class character, while `age` is numeric. Below are the response categories represented in data.

```{r}
table(df$Sex)
```

Since there are only 5 respondents using labels that are not Female or Male (too few for meaningful statistical analysis), we will remove them to have a complete dataset for all variables in this example.

```{r}
df <- df %>% 
  filter(Sex %in% c("Female","Male")) # filtering to only include those with Sex being either Female or Male.
```

Let's make the variable a factor (instead of class "character"), since that will be helpful in our later analyses.

```{r}
df$Sex <- factor(df$Sex)
```

Sometimes age is provided in categories, but here we have a numeric variable with age in years. Let's have a quick look at the age distribution using a histogram, and get some summary statistics.

```{r}
summary(df$age)
```

```{r}
### simpler version of the ggplot below using base R function hist()
# hist(df$age, col = "#009ca6")
# abline(v = median(age, na.rm = TRUE))
# 
# df %>% 
#   summarise(Mean = round(mean(age, na.rm = T),1),
#             StDev = round(sd(age, na.rm = T),1)
#             )

ggplot(df) +
  geom_histogram(aes(x = age), 
                 fill = "#009ca6",
                 col = "black") +
  # add the average as a vertical line
  geom_vline(xintercept = median(df$age), 
             linewidth = 1.5,
             linetype = 2,
             col = "orange") +
  # add a light grey field indicating the standard deviation
  annotate("rect", ymin = 0, ymax = Inf, 
           xmin = summary(df$age)[2], 
           xmax = summary(df$age)[5], 
           alpha = .2) +
  labs(title = "",
       x = "Age in years",
       y = "Number of respondents",
       caption = str_wrap(glue("Note. Median age is {round(median(df$age, na.rm = T),1)}, shown with the dashed vertical line. Age range is {min(df$age)} to {max(df$age)}. Interquartile range ({summary(df$age)[2]} to {summary(df$age)[5]}) is indicated with the grey area.")
       )) +
  scale_x_continuous(limits = c(18,75)) +
  theme_bw() +
  theme(plot.caption = element_text(hjust = 0, face = "italic")) 

```

Since the distribution of the age variable is clearly not following a gaussian (normal) distribution, the median seems like a better option than the mean.

There is also a grouping variable which needs to be converted to a factor.

```{r}
df$Group <- factor(df$Group)
```

Now our demographic data has been checked, and we need to move it to a separate dataframe from the item data.

```{r}
dif <- df %>% 
  select(Sex,age,Group) %>% 
  rename(Age = age) # just for consistency

df <- df %>% 
  select(!c(Sex,age,Group))
```

To create a "Table 1" containing participant demographic information, the `tbl_summary()` function from the `gtsummary` package is really handy to get a nice table.

```{r}
gtsummary::tbl_summary(dif)
```

With only item data remaining in the `df` dataframe, we can easily rename the items in the item dataframe and make sure that they match the `itemlabels` variable `itemnr`. It will be useful to have short item labels to get less clutter in the tables and figures during analysis.

```{r}
names(df) <- itemlabels$itemnr
```

Now we are all set for the psychometric analysis!

## Descriptives

Let's familiarize ourselves with the data before diving into the analysis.

### Missing data

First, we visualize the proportion of missing data on item level.

```{r}
RImissing(df)
```

No missing data in this dataset. If we had missing data, we could also use `RImissingP()` to look at which respondents have missing data and how much.

### Overall responses

This provides us with an overall picture of the data distribution. As a bonus, any oddities/mistakes in recoding the item data from categories to numbers will be clearly visible.

```{r}
RIallresp(df)
```

Most R packages for Rasch analysis require the lowest response category to be zero, which makes it necessary for us to recode our data, from using the range of 1-5 to 0-4.

```{r}
df <- df %>% 
  mutate(across(everything(), ~ .x - 1))

# always check that your recoding worked as intended.
RIallresp(df)

```

#### Floor/ceiling effects

We can also look at the raw distribution of ordinal sum scores. The `RIrawdist()` function is a bit crude, since it requires responses in all response categories to accurately calculate max and min scores.

```{r}
RIrawdist(df)
```

There is a floor effect with 11.8% of participants responding in the lowest category for all items.

#### Guttman structure

While not really necessary, it can be interesting to see whether the response patterns follow a Guttman-like structure. Items and persons are sorted based on lower-\>higher responses, and we should see the color move from yellow in the lower left corner to blue in the upper right corner.

```{r}
RIheatmap(df) +
  theme(axis.text.x = element_blank())
```

In this data, we see the floor effect on the left, with 11.8% of respondents all yellow, and a rather weak Guttman structure. This could also be due to a low variation in item locations/difficulties. Since we have a very large sample I added a `theme()` option to remove the x-axis text, which would anyway just be a blur of the `r nrow(df)` respondent row numbers. Each thin vertical slice in the figure is one respondent.

### Item level descriptives

There are many ways to look at the item level data, and we'll get them all together in the tab-panel below. The `RItileplot()` is probably most informative, since it provides the number of responses in each response category for each item. It is usually recommended to have at least \~10 responses in each category for psychometric analysis, no matter which methodology is used.

Kudos to [Solomon Kurz](https://solomonkurz.netlify.app/blog/2021-05-11-yes-you-can-fit-an-exploratory-factor-analysis-with-lavaan/) for providing the idea and code on which the tile plot function is built!

Most people will be familiar with the barplot, and this is probably most intuitive to understand the response distribution within each item. However, if there are many items it will take a while to review, and does not provide the same overview as a tileplot or stacked bars.

```{r}
#| column: margin
#| code-fold: true
#| echo: fenced

# This code chunk creates a small table in the margin beside the panel-tabset output below, showing all items currently in the df dataframe.
# The Quarto code chunk option "#| column: margin" is necessary for the layout to work as intended.
RIlistItemsMargin(df, fontsize = 13)
```

::: panel-tabset
#### Tile plot
```{r}
RItileplot(df)
```

While response patterns are skewed for all items, there are more than 10 responses in each category for all items which is helpful for the analysis.

#### Stacked bars
```{r}
RIbarstack(df) +
  theme_minimal() + # theming is optional, see section 11 for more on this
  theme_rise() 
```

#### Barplots
```{r}
#| fig-height: 9
RIitemcols(df)

```
:::



## Rasch analysis 1 {#sec-rasch}

The eRm package and Conditional Maximum Likelihood (CML) estimation will be used primarily, with the Partial Credit Model since this is polytomous data.

This is also where the [five basic psychometric aspects](https://doi.org/10.31219/osf.io/3htzc) are good to recall.

-   Unidimensionality & local independence
-   Ordering of response categories
-   Invariance
-   Targeting
-   Measurement uncertainties (reliability)

We will begin by looking at unidimensionality, response categories, and targeting in parallel below. For unidimensionality, we are mostly interested in item fit and residual correlations, as well as PCA of residuals and loadings on the first residual contrast. At the same time, disordered response categories can influence item fit to some extent (and vice versa), and knowledge about targeting can be useful if it is necessary to remove items due to residual correlations.

When unidimensionality and response categories are found to work adequately, we will move on to invariance testing (Differential Item Functioning, DIF). It should be noted that DIF should be evaluated in parallel with all other psychometric aspects, but since it is a more complex issue it is kept in a separate section in this vignette (as is person fit). Finally, when/if invariance/DIF also looks acceptable, we can investigate reliability/measurement uncertainties.

Since our sample is quite large (n > 800), the sample size sensitive tests (item fit and item-restscore) that are likely to produce problematic false positive rates [@johansson_detecting_2025] will use a randomly selected smaller sample of n = 500 as examples. When analyzing large datasets, item-restscore bootstrap is recommended as the primary method of assessing item fit to the Rasch model. The use of one smaller subsample in this vignette is only for demonstration purposes, it is not a recommended method of analysis for large datasets.

```{r}
df500 <- df %>% 
  slice_sample(n = 500)
```

::: {.callout-note}
In the tabset-panel below, each tab contains explanatory text, which is sometimes a bit lengthy. Remember to **scroll back up and click on all tabs**.
:::

```{r}
#| column: margin
#| echo: false
RIlistItemsMargin(df, fontsize = 13)
```

::: panel-tabset
### Conditional item fit

```{r}
simfit1 <- RIgetfit(df500, iterations = 200, cpu = 8) # save simulation output to object `simfit1`
RIitemfit(df500, simfit1)
```

`RIitemfit()` and `RIgetfit()` both work with both dichotomous and polytomous data (using the partial credit model) and automatically selects the model based on the data structure.

It is important to note that the new (since version 0.2.2, released 2024-08-19) `RIitemfit()` function uses **conditional** outfit/infit, which is both robust to different sample sizes and makes ZSTD unnecessary [@muller_item_2020].

Since the distribution of item fit statistics are not known, we need to use simulation to determine appropriate cutoff threshold values for the current sample and items. `RIitemfit()` can also use the simulation based cutoff values and use them for conditional highlighting of misfitting items. See the [blog post on simulation based cutoffs](https://pgmj.github.io/simcutoffs.html) for some more details on this. `RIitemfit()` can also be used without cutoffs and conditional highlighting. For a possibly useful rule-of-thumb cutoff for infit MSQ only, use the option `cutoff = "Smith98"` [@smith_using_1998;@muller_item_2020]. However, this cutoff is not applicable for all items, only for what can be expected for the *average* item fit. The simulation/bootstrap-based cutoff values will be more accurate for every item in your data.

Briefly stated, the simulation uses the properties of the current sample and items, and simulates n iterations of data that fit the Rasch model to get an empirical distribution of item fit that we can use for comparison with the observed data. This is also known as "parametric bootstrapping".

The simulation can take quite a bit of time to run if you have complex data/many items/many participants, and/or choose to use many iterations. Simulation experiments [@johansson_detecting_2025] indicate that 100-400 iterations should be a useful range, where smaller samples (n < 300) should use 100 iterations, and 200-400 is more appropriate when one has larger samples. 

::: {.callout-note}
Another important finding from simulation studies is that there is a risk of false positive indication of misfit in samples larger than n = 1000 when using item infit or item-restscore. The recommended primary method for determining item fit in large samples is the bootstrapped item-restscore [@johansson_detecting_2025], as illustrated in an adjacent tab labeled "Item-restscore bootstrap" .
:::

For reference, the simulation above, using 10 items with 5 response categories each and 1851 respondents, takes about 24 seconds to run on 8 cpu cores (Macbook Pro M1 Max) for 400 iterations.

I'll cite Ostini & Nering [-@ostini_polytomous_2006] on the description of outfit and infit (pages 86-87):

> Response residuals can be summed over respondents to obtain an item fit measure. Generally, the accumulation is done with squared standardized residuals, which are then divided by the total number of respondents to obtain a mean square statistic. In this form, the statistic is referred to as an **unweighted mean square** (Masters & Wright, 1997; Wright & Masters, 1982) and has also come to be known as **“outfit”** (Smith, Schumacker, & Bush, 1998; Wu, 1997), perhaps because it is highly sensitive to outlier responses (Adams & Khoo, 1996; Smith et al., 1998; Wright & Masters, 1982).

> A weighted version of this statistic was developed to counteract its sensitivity to outliers (Smith, 2000). In its weighted form, the squared standardized residual is multiplied by the observed response variance and then divided by the sum of the item response variances. This is sometimes referred to as an **information weighted mean square** and has become known as **“infit”** (Smith et al., 1998; Wu, 1997).

A low item fit value (sometimes referred to as an item "overfitting" the Rasch model) indicates that responses are too predictable and provide little information. This is often the case for items that are very general/broad in scope in relation to the latent variable. You will often find overfitting items to also have residual correlations with other items.

A high item fit value (sometimes referred to as "underfitting" the Rasch model) can indicate several things, often multidimensionality or a question that is difficult to interpret and thus has noisy response data. The latter could for instance be caused by a question that asks about two things at the same time or is ambiguous for other reasons.

Outfit is not a useful method to detect item misfit, infit should be relied upon when sample sizes are appropriate [@johansson_detecting_2025].

::: {.callout-note}
Remember to **scroll back up and click on all tabs**.
:::

### Item-restscore

This is another useful function from the `iarm` package. It shows the expected and observed correlation between an item and a score based on the rest of the items [@kreiner_note_2011]. Similarly, but inverted, to item fit, a lower observed correlation value than expected indicates underfit, that the item may not belong to the dimension. A higher than expected observed value indicates an overfitting and possibly redundant item. Overfitting items will often also show issues with residual correlations. Both of these problems can often be (at least partially) resolved by removing underfitting items.

```{r}
RIrestscore(df500)
```

### Item-restscore bootstrap

Both item-restscore and conditional item infit/outfit will indicate "false misfit" when sample sizes are large (even when using simulation/bootstrap based cutoff values). This behavior can occur from about n = 500, and certainly will occur at samples of 800 and above [@johansson_detecting_2025]. This "false misfit" is caused by truly misfitting items, which underlines the importance of removing one item at a time when one finds issues with misfit/multidimensionality. However, a useful way to get additional information about the probability of actual misfit is to use non-parametric bootstrapping. This function resamples with replacement from your response data and reports the percentage and type of misfit indicated by the item-restscore function. You will also get information about conditional MSQ infit (based on the full sample, using complete responders). Simulation studies indicate that a sample size of 800 results in 95+% detection rate of 1-3 misfitting items amongst 20 dichotomous items [@johansson_detecting_2025].

```{r}
RIbootRestscore(df, iterations = 250, samplesize = 800)
```


### Conditional item characteristic curves

The [`iarm`](https://cran.r-project.org/web/packages/iarm/index.html) package [@mueller_iarm_2022] provides several interesting functions for assessing item fit, DIF, and other things. Some of these functions have been implemented in simplified versions in `easyRasch`, and others may be included in a future version. Below are conditional item characteristic curves (ICCs) using the estimated theta (factor score).

These curves indicate item fit on a group level, where respondents are split into "class intervals" based on their sum score/factor score.

```{r}
RIciccPlot(df) +
  guide_area() # this relocates the legend/guide
```

A similar visualization, even more informative and flexible, is available in the [`RASCHplot`](https://github.com/ERRTG/RASCHplot/) package [@buchardt_visualizing_2023]. The package needs to be installed from GitHub (see the commented code below). The linked paper is recommended reading, not least for its descriptions of the useful options available. Below are some sample plots showing conditional ICCs using the raw sum score.

```{r}
library(RASCHplot) # devtools::install_github("ERRTG/RASCHplot")

CICCplot(PCM(df),
         which.item = c(1:4),
         lower.groups = c(0,7,14,21,28),
         grid.items = TRUE)
```

### PCA of residuals

Principal Component Analysis of Rasch model residuals (PCAR).

```{r}
RIpcmPCA(df)
```

Based on an old rule-of-thumb, the first eigenvalue should be below 1.5 to support unidimensionality [@smith_detecting_2002]. However, as with many other metrics, the expected PCAR eigenvalue also depends on sample size and test length [@chou_checking_2010]. To be clear, there is no single or easily calculated rule-of-thumb critical value for the largest PCA eigenvalue that is generally applicable. A parametric bootstrap function has been implemented in `easyRasch` to determine a potentially appropriate cutoff value for the largest PCAR eigenvalue, but it has not been systematically evaluated yet. Below is an example, illustrated with a histogram of the simulated distribution of largest eigenvalues, along with the 99th percentile and the maximum value.

```{r}
pcasim <- RIbootPCA(df, iterations = 500, cpu = 8) # parametric bootstrap of the largest PCAR eigenvalue
hist(pcasim$results, breaks = 50) # simulated distribution of largest eigenvalues
pcasim$p99 # 99th percentile of simulated eigenvalues
pcasim$max # maximum simulated eigenvalue
```

If the bootstrap turns out to provide an appropriate cutoff value, it still needs to be used together with checking item fit (or item-restscore) and residual correlations (local dependence) to evaluate unidimensionality.

The PCA eigenvalues are mostly included here for those coming from Winsteps who may be looking for this metric. Speaking of Winsteps, the "explained variance" generated by `RIpcmPCA()` will not be comparable to the corresponding Winsteps metric, since this function only shows the results from the analysis of residuals and not the variance explained by the Rasch model itself. Additionally, it should be noted that the PCA rotation is set to "oblimin".

### Conditional LRT

The Conditional Likelihood Ratio Test [CLRT, @andersen_goodness_1973] is a global test of fit and can be a useful addition to more informative item-level metrics [@johansson_detecting_2025]. CLRT compares the responses of those scoring below the median to those scoring above the median to assess the homogeneity of the test. Below, we run the test using the smaller sample with a function from the `iarm` package. 

```{r}
clr_tests(df500, model = "PCM")
```

The test is clearly statistically significant, even with the small sample, which indicates that unidimensionality does not hold. However, CLRT is sensitive to large sample sizes and number of items, and produces false positives above the nominal 5% level. See <https://pgmj.github.io/clrt.html> for a very small simulation study. To minimize the sample size issue, a non-parametric bootstrap function has been implemented in `easyRasch`, where one can set the sample size manually. The function will then sample with replacement from the response data and conduct a separate CLRT for each sample drawn. 

```{r}
RIbootLRT(df, iterations = 1000, samplesize = 400, cpu = 8)
```

Clearly statistically significant with the bootstrap method as well.

Global tests do not provide any information about the reason for misfit. The `iarm` package has also implemented a function to get item specific information related to the CLRT, showing observed and expected values for each item.

```{r}
item_obsexp(PCM(df500))
```


### Residual correlations

In order to support unidimensionality, items should only be related to each other through the latent variable. This is called "local independence". By investigating patterns in model residuals, we can determine whether items are independent or not.

Similarly to item fit, we need to run simulations to get a useful cutoff threshold value for when residual correlations amongst item pairs are larger than would be expected from items that fit a unidimensional Rasch model [@christensen2017].

The simulation/bootstrap procedure can take some time to run, depending on the complexity of your data, but it is necessary to set the appropriate cutoff value.

```{r}
simcor1 <- RIgetResidCor(df, iterations = 400, cpu = 8)
RIresidcorr(df, cutoff = simcor1$p99)
```

The matrix above shows item-pair correlations of item residuals, with red highlights for correlations that cross the cutoff threshold, which is defined relative to the average item-pair correlation across all item-pairs [@christensen2017]. Rasch model residual correlations (Yen's Q3) are calculated using the [mirt](https://cran.r-project.org/web/packages/mirt/index.html) package.
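If you want to inspect the raw Q3 matrix yourself, here is a sketch using `mirt` directly (the `residuals()` method with `type = "Q3"`):

``` r
library(mirt)
# Yen's Q3: correlations between item residuals from a unidimensional Rasch model
mirt(df, model = 1, itemtype = "Rasch", verbose = FALSE) %>% 
  residuals(type = "Q3")
```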

### Partial gamma LD

Another way to assess local (in)dependence is by partial gamma coefficients [@kreiner_analysis_2004]. This is also a function from the `iarm` package. See `?iarm::partgam_LD` for details.

```{r}
RIpartgamLD(df)
```

### 1st contrast loadings

```{r}
RIloadLoc(df)
```

Here we see item locations and their loadings on the first residual contrast. This figure can be helpful for identifying clusters in the data or multidimensionality. You can get a table with standardized factor loadings using `RIloadLoc(df, output = "table")`.


### Analysis of response categories

The `xlims` setting changes the x-axis limits for the plots. The default values usually make sense, and we mostly add this option to point out the possibility of doing so. You can also choose to show plots for specific items only.

```{r}
#| layout-ncol: 2
RIitemCats(df, xlims = c(-5,5))
```

Each response category for each item should have a curve indicating that it is the most probable response at some point on the latent variable (the x-axis in the figure).

### Response categories MIRT

For a more compact figure, we can use the `mirt` package directly.

```{r}
mirt(df, model=1, itemtype='Rasch', verbose = FALSE) %>% 
  plot(type="trace", as.table = TRUE, 
       theta_lim = c(-5,5)) # changes x axis limits
```

### Targeting

```{r}
#| fig-height: 7
# increase fig-height in the chunk option above if you have many items
RItargeting(df, xlim = c(-5,4)) # xlim defaults to c(-4,4) if you omit this option
```

This figure shows how well the items fit the respondents/persons. It is a sort of [Wright Map](https://www.rasch.org/rmt/rmt253b.htm) that shows person locations and item threshold locations on the same logit scale.

The top part shows a histogram of person locations, the middle part shows an inverted histogram of item threshold locations, and the bottom part shows individual item threshold locations. The histograms also show means and standard deviations.

### Item hierarchy

Here the items are sorted on their average threshold location (black diamonds). 84% confidence intervals are shown around each item threshold location. For further details, see the caption text below the figure.

The numbers displayed in the plot can be disabled using the option `numbers = FALSE`.

```{r}
#| fig-height: 6
RIitemHierarchy(df)
```
:::

### Analysis 1 comments

Item fit shows a lot of issues.

Item 18 has issues with the second lowest category being disordered. Several other items have very short distances between thresholds 1 and 2, which is also clearly seen in the Item Hierarchy figure above.

Two item-pairs show residual correlations far above the cutoff value:

-   15 and 16 (scared and afraid)
-   17 and 18 (ashamed and guilty)

Since item 15 also has a residual correlation with item 19, we will remove it. In the second pair, item 18 will be removed since it also has problems with disordered response categories.

::: {.callout-note icon="true"}
We have multiple "diagnostics" to review when deciding which item to remove if there are strong residual correlations between two items. Here is a list of commonly used criteria:

- item fit
- item threshold locations compared to sample locations (targeting)
- ordering of response categories
- DIF
- and whether there are residual correlations between one item and multiple other items
:::

```{r}
removed.items <- c("PANAS_15","PANAS_18")

df_backup <- df
df500_backup <- df500

df <- df_backup %>% 
  select(!any_of(removed.items))

df500 <- df500_backup %>% 
  select(!any_of(removed.items))
```

As seen in the code above, I chose to create backup copies of the original dataframes before removing the items. This can be useful if, at a later stage in the analysis, I want to be able to quickly "go back" and reinstate an item or undo any other change I have made.
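For example, reinstating item 18 at some later stage is just a matter of re-running the selection with a shorter removal list (hypothetical code):

``` r
# hypothetical: bring back PANAS_18 while keeping PANAS_15 removed
df <- df_backup %>% 
  select(!any_of("PANAS_15"))
```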

## Rasch analysis 2

With items 15 and 18 removed.

```{r}
#| column: margin
#| echo: false
RIlistItemsMargin(df, fontsize = 13)
```

::: panel-tabset
### Item-restscore bootstrap

```{r}
RIbootRestscore(df, iterations = 250, samplesize = 800)
```

### Conditional item fit

Still using the smaller sample of n = 500.

```{r}
simfit2 <- RIgetfit(df500, iterations = 400, cpu = 8)
RIitemfit(df500, simcut = simfit2)
```

### PCA of residuals

```{r}
RIpcmPCA(df)
```

### Residual correlations

```{r}
simcor2 <- RIgetResidCor(df, iterations = 400, cpu = 8)
RIresidcorr(df, cutoff = simcor2$p99)
```

### 1st contrast loadings

```{r}
RIloadLoc(df)
```

### Targeting

```{r}
#| fig-height: 5
RItargeting(df, xlim = c(-4,4), bins = 45)
```

### Item hierarchy

```{r}
#| fig-height: 5
RIitemHierarchy(df)
```
:::

### Analysis 2 comments

Items 16 & 19, and 12 & 14 show problematic residual correlations.

Let's look at DIF before acting on this information. While we are keeping DIF as a separate section in this vignette, it is recommended to include DIF analysis in the `panel-tabset` above (together with item fit, PCA, residual correlations, etc.).

## DIF - differential item functioning

We'll be looking at whether item (threshold) locations are stable between demographic subgroups.

There are several DIF analysis tools available. The first one uses the package `psychotree`, which relies on statistical significance at p < .05 as an indicator of DIF. This criterion is highly sample-size sensitive, and we are always interested in the size/magnitude of DIF as well, since that informs us about the impact of DIF on the estimated latent variable.

The structure of DIF is also an important and complex aspect, particularly for polytomous data. Uniform DIF means that the DIF is similar across the latent continuum. We can test this in R using the `lordif` package, as demonstrated in @sec-lordif. However, it should be noted that the `lordif` package does not provide an option to use Rasch models, and some results may be driven by the discrimination parameter being allowed to vary across items.

A recent preprint [@henninger_partial_2024] does a great job illustrating "differential step functioning" (DSF), which occurs when item threshold locations in polytomous data show varying levels of DIF. It also describes a forthcoming development of the `psychotree` package that adds DIF effect sizes and purification functions for evaluating DIF/DSF. When the updated package is available, I will work to implement these new functions into the `easyRasch` package as well.

::: {.callout-note icon="true"}
It is important to ensure that no cells in the data are empty for subgroups when conducting a DIF analysis. Split the data using the DIF-variable and create separate tileplots to review the response distribution in the DIF-groups.
:::

```{r}
#| fig-height: 5
difPlots <- df %>% # save the output into the `difPlots` object
  add_column(gender = dif$Sex) %>% # add the DIF variable to the dataframe
  split(.$gender) %>% # split the data using the DIF variable
  map(~ RItileplot(.x %>% dplyr::select(!gender)) + labs(title = .x$gender)) # create separate tileplots for each group

difPlots$Female + difPlots$Male # the actual name of the plots (in this case Male/Female) will be determined by the factor labels
```

### Sex

```{r}
#| column: margin
#| echo: false
RIlistItemsMargin(df, fontsize = 13)
```

::: panel-tabset
#### Table

```{r}
#| fig-height: 3
RIdifTable(df, dif$Sex)
```

#### Figure items

```{r}
RIdifFigure(df, dif$Sex)
```

#### Figure thresholds

```{r}
RIdifFigThresh(df, dif$Sex)
```
:::

While no item shows problematic levels of DIF regarding item location, as shown by the table, there is an interesting pattern in the thresholds figure. The lowest threshold seems to be slightly lower for node 3 (Male) for all items. Also, item 11 shows a much wider spread of item locations for node 3 compared to node 2.

The results do not require any action since the difference is small.

### Age

The `psychotree` package uses model-based recursive partitioning, which is particularly useful when you have a continuous variable such as age in years and a large enough sample. It will test different ways to partition the age variable to determine potential group differences [@strobl2015; @strobl2021].

```{r}
RIdifTable(df, dif$Age)
```

No DIF found for age.

### Group

```{r}
RIdifTable(df, dif$Group)
```

And no DIF for group.

### Sex and age

The `psychotree` package also allows for DIF interaction analysis with multiple DIF variables. We can use `RIdifTable2()` to input two DIF variables.

```{r}
RIdifTable2(df, dif$Sex, dif$Age)

```

No interaction effect found for sex and age. The analysis only shows the previously identified DIF for sex.

### LRT-based DIF {#sec-diflrt}

We'll use the group variable as an example. First, we can simply run the test to get the overall result.

```{r}
erm.out <- PCM(df)
LRtest(erm.out, splitcr = dif$Group)
```

Review the documentation for further details, using `?LRtest` in your R console panel in RStudio. There is also a plotting function, `plotGOF()`, that may be of interest.
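For instance, a minimal sketch (supplying `conf` adds confidence ellipses; see `?plotGOF` for the available options):

``` r
lr_group <- LRtest(erm.out, splitcr = dif$Group)
plotGOF(lr_group, conf = list())
```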

```{r}
#| column: margin
#| echo: false
RIlistItemsMargin(df, fontsize = 13)
```

::: panel-tabset
#### Item location table
```{r}
RIdifTableLR(df, dif$Group)
```
#### Item location figure
```{r}
#| fig-height: 7
RIdifFigureLR(df, dif$Group) + theme_rise()
```
#### Item threshold table
```{r}
RIdifThreshTblLR(df, dif$Group)
```
#### Item threshold figure
```{r}
#| fig-height: 7
RIdifThreshFigLR(df, dif$Group) + theme_rise()
```
:::

The item threshold table shows that the top threshold for item 13 differs by more than 0.5 logits between groups. In this set of 8 items with 4 thresholds each, a single divergent threshold is unlikely to result in problematic differences in estimated person scores.

### Conditional ICC

The `iarm` package function for CICC plots also includes the option to use a DIF variable.

```{r}
RIciccPlot(df, dif = "yes", dif_var = dif$Sex)

```


### Logistic Ordinal Regression DIF {#sec-lordif}

The `lordif` package [@choi_lordif_2011] does not use a Rasch measurement model; it only offers a choice between the Graded Response Model (GRM) and the Generalized Partial Credit Model (GPCM). Both of these are 2PL models, meaning that they estimate a discrimination parameter for each item in addition to the item threshold parameters. `lordif` relies on the `mirt` package.

There are several nice features available in the `lordif` package. First, we get a χ2 test of uniform or non-uniform DIF. Second, there are three possible methods/criteria for flagging items with potential DIF. One of these uses a likelihood ratio (LR) χ2 test, while the other two are indicators of DIF size/magnitude, using either a pseudo R2 statistic ("McFadden", "Nagelkerke", or "CoxSnell") or a beta change criterion. For further details, see `?lordif` in your R console or the paper describing the package [@choi_lordif_2011].

Below is some sample code to get you started with `lordif`.

```{r}
#| results: hide
library(lordif)

g_dif <- lordif(as.data.frame(df), as.numeric(dif$Sex), # make sure that the data is in a dataframe-object and that the DIF variable is numeric
                criterion = c("Chisqr"), 
                alpha = 0.01, 
                beta.change = 0.1,
                model = "GPCM",
                R2.change = 0.02)

g_dif_sum <- summary(g_dif)
```

```{r}
# threshold values for colorizing the table below
alpha = 0.01
beta.change = 0.1
R2.change = 0.02

g_dif_sum$stats %>% 
  as.data.frame() %>% 
  select(!all_of(c("item","df12","df13","df23"))) %>% 
  round(3) %>% 
  add_column(itemnr = names(df), .before = "ncat") %>% 
  mutate(across(c(chi12,chi13,chi23), ~ cell_spec(.x,
                               color = case_when(
                                 .x < alpha ~ "red",
                                 TRUE ~ "black"
                               )))) %>%
  mutate(across(starts_with("pseudo"), ~ cell_spec(.x,
                               color = case_when(
                                 .x > R2.change ~ "red",
                                 TRUE ~ "black"
                               )))) %>%
  mutate(beta12 =  cell_spec(beta12,
                               color = case_when(
                                 beta12 > beta.change ~ "red",
                                 TRUE ~ "black"
                               ))) %>% 
  kbl_rise()
```

We can review the results regarding uniform/non-uniform DIF by looking at the `chi*` columns. Uniform DIF is indicated by column `chi12` and non-uniform DIF by `chi23`, while column `chi13` represents an overall test of "total DIF effect" [@choi_lordif_2011].

While the table indicates significant chi2-tests for items 11 and 17, the magnitude estimates are low for these items.

There are some plots available as well, using the base R `plot()` function. For some reason the plots won't render in this Quarto document, so I will try to sort that out at some point.

```{r}
#| layout-ncol: 2
plot(g_dif) # use option `graphics.off()` to get the plots rendered one by one
#plot(g_dif, graphics.off())
```

### Partial gamma DIF

The `iarm` package provides a function to assess DIF by partial gamma [@bjorner_differential_1998]. It should be noted that this function only shows a single partial gamma value per item, so if you have more than two groups in your comparison, you will want to also use other methods to understand your results better.

There are some recommended cutoff-values mentioned in the paper above:

No or negligible DIF:

- Gamma within the interval -0.21 to 0.21, *or*
- Gamma not significantly different from 0

Slight to moderate DIF:

- Gamma within the interval -0.31 to 0.31 (and outside -0.21 to 0.21), *or*
- Gamma not significantly outside the interval -0.21 to 0.21

Moderate to large DIF:

- Gamma outside the interval -0.31 to 0.31, **and**
- significantly outside the interval -0.21 to 0.21

```{r}
RIpartgamDIF(df, dif$Sex)
```

We can see "slight" DIF for item 17, with a statistically significant gamma of .23. 

## Rasch analysis 3

While there were no significant issues with DIF for any item/subgroup combination, we need to address the previously identified problem:

- Items 16 and 19 have the largest residual correlation.

We'll remove item 19 since item 16 has better targeting.

```{r}
removed.items <- c(removed.items,"PANAS_19")

df <- df_backup %>% 
  select(!any_of(removed.items))

df500 <- df500_backup %>% 
  select(!any_of(removed.items))
```

```{r}
#| column: margin
#| echo: false
RIlistItemsMargin(df, fontsize = 13)
```

::: panel-tabset
### Item-restscore bootstrap

```{r}
RIbootRestscore(df, iterations = 250, samplesize = 800)
```

### Item fit

```{r}
simfit3 <- RIgetfit(df500, iterations = 200, cpu = 8)
RIitemfit(df500, simfit3)
```

### CICC

```{r}
RIciccPlot(df)

```

### Residual correlations

```{r}
simcor3 <- RIgetResidCor(df, iterations = 400, cpu = 8)
RIresidcorr(df, cutoff = simcor3$p99)
```

### LRT
```{r}
RIbootLRT(df, iterations = 1000, samplesize = 400, cpu = 8)
```

### Targeting

```{r}
#| fig-height: 5
RItargeting(df, bins = 45)
```

### Item hierarchy

```{r}
#| fig-height: 5
RIitemHierarchy(df)
```
:::

### Analysis 3 comments

There are still problematic residual correlations and some items show misfit (most notably item 12 being overfit), but we will end this sample analysis here and move on to show other functions.

There are several item thresholds that are located very close together, as shown in the item hierarchy figure. This is not ideal, since it will inflate reliability estimates. However, we will not modify the response categories for this analysis; we only note that the current categorization is problematic and should be dealt with by trying variations of merged response categories to achieve better separation of threshold locations without disordering.

## Reliability

```{r}
#| fig-height: 6
RItif(df)
```

The figure above shows the Test Information Function (TIF), which indicates the reliability of all items making up the test/scale (not the reliability of the sample).

The default cutoff value used in `RItif()` is TIF = 3.33, which roughly corresponds to person separation index (PSI) = 0.7 (although this is not always the case). PSI is similar to reliability coefficients such as omega and alpha, ranging from 0 to 1. You can change the TIF cutoff by using the option `cutoff`, for instance `cutoff = 2.5` (TIF values range from 1 and up).
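One way to see the rough correspondence: the test information function is the inverse of the squared standard error of measurement, so under the simplifying assumption of a person variance of 1, reliability is approximately 1 - 1/TIF. A quick check:

``` r
tif <- 3.33
1 - 1/tif # ~0.70, the approximate PSI at the default cutoff
```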

While 11.8% of respondents had a floor effect based on the raw sum-scored data, the figure above shows that 41.8% are located below the point where the items produce a PSI of 0.7 or higher. Again, note that this figure shows the reliability of the test/scale, not the sample. If you want to add the sample reliability, use the option `samplePSI = TRUE`. More details are available in the documentation: `?RItif`.

## Person fit

We can also look at how well the respondents fit the Rasch model with these items. By default, `RIpfit()` outputs a histogram and a hex heatmap of the person infit ZSTD statistic, using +/- 1.96 as cutoff values. This is currently the only person fit method implemented in the `easyRasch` package; the curious analyst is encouraged to look at the [PerFit](https://www.rdocumentation.org/packages/PerFit/versions/1.4.6/topics/PerFit-package) package for more tools.

```{r}
RIpfit(df)
```

You can export the person fit values to a new variable in the dataframe by specifying `output = "dataframe"`, or if you just want the row numbers for respondents with deviant infit values, `output = "rowid"`.
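For example, using the options just described:

``` r
df_pfit <- RIpfit(df, output = "dataframe") # person fit values in a new variable
misfit_rows <- RIpfit(df, output = "rowid") # row numbers of deviant respondents
```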

You can also specify a grouping variable to visualize the person fit for different groups.

```{r}
RIpfit(df, group = dif$Sex, output = "heatmap")
```

Person fit is a useful way to identify respondents with unexpected response patterns and investigate this further.

### `PerFit` sample code

While none of the functions in the `PerFit` package have been implemented in `easyRasch`, here is some code to get you started if you are interested in using it. There are multiple methods/functions available for polytomous and dichotomous data; see the package [documentation](https://www.rdocumentation.org/packages/PerFit/versions/1.4.6/topics/PerFit-package).

For this example, we'll use the non-parametric U3 statistic generalized to polytomous items [@emons_nonparametric_2008] and the smaller sample.

::: panel-tabset
#### U3poly
```{r}
library(PerFit)
pfit_u3poly <- U3poly(matrix = df500, 
                      Ncat = 5, # make sure to input number of response categories, not thresholds
                      IRT.PModel = "PCM")
```
#### Cutoff information
```{r}
cutoff(pfit_u3poly)
```
#### Flagged respondents
```{r}
flagged.resp(pfit_u3poly) %>% 
  pluck("Scores") %>% 
  as.data.frame() %>% 
  arrange(desc(PFscores))
```
:::

The dataframe shown under the `Flagged respondents` tab above contains a variable named `FlaggedID`, which represents the row IDs. This variable is useful if one wants to filter out respondents with deviant response patterns (person misfit). There are indications that persons with misfit may affect the results of Andersen's LR-test for DIF [@artner_simulation_2016].

### Item fit without aberrant responses

We can remove the misfitting persons to see how that affects item fit. Let's also compare with the misfitting respondents identified by `RIpfit()`, again using the smaller sample.

```{r}
misfits <- flagged.resp(pfit_u3poly) %>% 
  pluck("Scores") %>% 
  as.data.frame() %>% 
  pull(FlaggedID)

misfits2 <- RIpfit(df500, output = "rowid")
```

::: panel-tabset
#### All respondents
```{r}
RIitemfit(df500, simcut = simfit3)
```
#### U3 misfit removed
```{r}
RIitemfit(df500[-misfits,], simcut = simfit3)
```
#### ZSTD misfit removed
```{r}
RIitemfit(df500[-misfits2,], simcut = simfit3)
```
:::

## Item parameters

To allow others (and oneself) to use the estimated item parameters for estimating person locations/thetas, we should make the item parameters available. The function will also write a CSV file with the item threshold locations. Person locations/thetas can be estimated with the `thetaEst()` function from the `catR` package; this is implemented in the function `RIestThetasOLD()`, see below for details.

First, we'll output the parameters into a table.

```{r}
RIitemparams(df)
```

The parameters can also be output to a dataframe or a file, using the option `output = "dataframe"` or `output = "file"`.
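For instance, to keep the parameters available in the R session for later use:

``` r
# item threshold locations as a dataframe object
itemParams <- RIitemparams(df, output = "dataframe")
```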

## Ordinal sum score to interval score

This table shows the corresponding "raw" ordinal sum score values and logit scores, with standard errors for each logit value. Interval scores are estimated using weighted likelihood (WL) estimation, based on a simulated dataset generated from the item parameters estimated from the input dataset. WL is the default choice due to its lower bias compared to ML estimation [@warm1989].

(An option will hopefully be added at some point to create this table based on item parameters only.)

```{r}
RIscoreSE(df)
```

### Ordinal/interval figure

The figure below can also be generated to illustrate the relationship between the ordinal sum score and the logit interval score. The error bars default to showing the standard error at each point, multiplied by 1.96.

```{r}
RIscoreSE(df, output = "figure")
```

### Estimating interval level person scores

Based on the Rasch analysis output of item parameters, we can estimate each individual's location or score (also known as "theta"). `RIestThetas()` by default uses WLE estimation based on item parameters from a partial credit model (PCM) and outputs a dataframe with person locations (WLE) and measurement errors (SEM) on the logit scale.

```{r}
thetas <- RIestThetas(df)

head(thetas)
```

Each individual has a standard error of measurement (SEM) associated with their estimated location/score. This is included in the output of the `RIestThetas()` function as the `SEM` variable, as seen above. We can review the distribution of measurement error with a figure.
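For instance, a simple histogram of the `SEM` variable:

``` r
hist(thetas$SEM,
     col = "#009ca6",
     main = "Histogram of measurement error (SEM)",
     breaks = 20)
```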

We can also take a look at the distribution of person locations (thetas) using a histogram.

```{r}
hist(thetas$WLE, 
     col = "#009ca6", 
     main = "Histogram of person locations (thetas)", 
     breaks = 20)
```


`RIestThetasOLD()` can be used with a pre-specified item (threshold) location matrix. As with `RIscoreSE()`, WL estimation is the default due to its lower bias compared to ML estimation [@warm1989], and you can (and may indeed need to) change the range of logit scores using the option `theta_range`. The default is `c(-7,7)`, which should work in most circumstances.

If you would like to use an existing item threshold location matrix, this code may be helpful:

```{r}
itemParameters <- read_csv("itemParameters.csv") %>% 
  as.matrix()
itemParameters
```

As you can see, this is a matrix object (not a dataframe), with each item as a row, and the threshold locations as columns.
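Here is a sketch of how such a matrix could then be used. Note that the argument names below are my assumptions, so verify them against `?RIestThetasOLD` before use:

``` r
# note: `itemParams` is an assumed argument name; `theta_range` is documented above
newThetas <- RIestThetasOLD(df,
                            itemParams = itemParameters,
                            theta_range = c(-7, 7))
```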

## Figure design

Most of the figures created by the functions can be styled (colors, fonts, etc.) by adding theme settings to them. You can use the standard ggplot function `theme()` and related theme functions. As usual, it is possible to "stack" theme functions, as seen in the example below.

You can also change coloring, axis limits/breaks, etc, just by adding ggplot options with a `+` sign.

A custom theme function, `theme_rise()`, is included in the `easyRasch` package. It might be easier to use if you are not familiar with `theme()`.

For instance, you might like to change the font to "Lato" for the item hierarchy figure, and make the background transparent.

```{r}
RIitemHierarchy(df) +
  theme_minimal() + # first apply the minimal theme to make the background transparent
  theme_rise(fontfamily = "Lato") # then apply theme_rise, which simplifies making changes to all plot elements
```

As of package version 0.1.30.0, the `RItargeting()` function allows more flexibility in styling too, by having an option to return a list object with the three separate plots. See the [NEWS](https://github.com/pgmj/easyRasch/blob/main/NEWS.md#01300) file for more details. Since the `RItargeting()` function uses the `patchwork` library to combine plots, you can also make use of [the many functions that `patchwork` includes](https://patchwork.data-imaginist.com/articles/patchwork.html). For instance, you can set a title with a specific theme:

```{r}
RItargeting(df) + plot_annotation(title = "Targeting", theme = theme_rise(fontfamily = "Arial"))
```

In order to change the font for text *inside* plots (such as "t1" for thresholds), you will need to add an additional line of code.

``` r
update_geom_defaults("text", list(family = "Lato"))
```

Please note that the line of code above updates the default settings for `geom_text()` for the whole session. Also, some functions, such as `RIloadLoc()`, make use of `geom_text_repel()`, for which you would need to change the code above from "text" to "text_repel".

A simple way to only change font family and font size would be to use `theme_minimal(base_family = "Calibri", base_size = 14)`. Please see the [reference page](https://ggplot2.tidyverse.org/reference/ggtheme.html) for default ggplot themes for alternatives to `theme_minimal()`.

## Software used {#sec-grateful}

The `grateful` package is a nice way to give credit to the packages used in the analysis. It can create both a bibliography file and a table object, which is handy for automatically creating a reference list based on the packages used (or at least those explicitly loaded).

```{r}
library(grateful)
pkgs <- cite_packages(cite.tidyverse = TRUE, 
                      output = "table",
                      bib.file = "grateful-refs.bib",
                      include.RStudio = TRUE,
                      out.dir = getwd())
# If kbl() is used to generate this table, the references will not be added to the Reference list.
formattable(pkgs, 
            table.attr = 'class=\"table table-striped\" style="font-size: 13px; font-family: Lato; width: 80%"')
```

## Additional credits

Thanks to my [colleagues at RISE](https://www.ri.se/en/what-we-do/projects/center-for-category-based-measurements) for providing feedback and testing the package on Windows and MacOS platforms. Also, thanks to [Mike Linacre](https://www.winsteps.com/linacre.htm) and [Jeanette Melin](https://www.ri.se/en/person/jeanette-melin) for providing useful feedback to improve this vignette.

## Session info

```{r}
sessionInfo()
```


## References