---
title: "easyRasch vignette"
subtitle: "R package for Rasch analysis"
author:
  name: 'Magnus Johansson'
  affiliation: 'RISE Research Institutes of Sweden'
  affiliation-url: 'https://www.ri.se/en/person/magnus-p-johansson'
  orcid: '0000-0003-1669-592X'
date: last-modified
google-scholar: true
citation: true
website:
  open-graph:
    image: "/RaschRvign_files/figure-html/unnamed-chunk-34-1.png"
execute:
  cache: true
  warning: false
  message: false
bibliography:
  - references.bib
  - grateful-refs.bib
csl: apa.csl
editor_options:
  chunk_output_type: console
---
This is an introduction to using the [`easyRasch` R package](https://pgmj.github.io/easyRasch/). A changelog for package updates is available [here](https://pgmj.github.io/easyRasch/news/index.html).
::: {.callout-note icon="true"}
## Note
This package was previously known as `RISEkbmRasch`.
:::
Details on package installation are available at the [package GitHub page](https://pgmj.github.io/easyRasch).
If you are new to Rasch Measurement Theory, you may find this intro presentation useful:
<https://pgmj.github.io/RaschIRTlecture/slides.html>
This vignette will walk through a sample analysis using an open dataset with polytomous questionnaire data, including some data wrangling to structure the item data and itemlabels, followed by examples of the different functions. The full source code of this document can be found either [in this repository](https://github.com/pgmj/pgmj.github.io/blob/main/raschrvignette/RaschRvign.qmd) or by clicking on **\</\> CODE** at the top of this page. You should be able to use the source code "as is" and reproduce this document locally, as long as you have the required packages installed. This page and this website are built using the open source publishing tool [Quarto](https://www.quarto.org).
One of the aims of this package is to simplify reproducible psychometric analysis that sheds light on the measurement properties of a scale, questionnaire, or test. In a paper recently made available as a preprint [@johansson], our [research group](https://www.ri.se/en/what-we-do/projects/center-for-categorically-based-measurements) proposes that the basic aspects of a psychometric analysis should include information about:
- Unidimensionality
- Response categories
- Invariance
- Targeting
- Measurement uncertainties (reliability)
We'll include several ways to investigate these measurement properties, using Rasch Measurement Theory. The package also contains functions less directly related to the criteria above, which will be demonstrated in this vignette as well.
Please note that this is just a sample analysis to showcase the R package. It is not intended as a "best practice" psychometric analysis example.
You can skip ahead to the Rasch analysis part in @sec-rasch if you are eager to look at the package output :)
There is a separate GitHub repository containing a template R-project to simplify using the `easyRasch` package when conducting a reproducible Rasch analysis in R: <https://github.com/pgmj/RISEraschTemplate>
## Getting started
Since the package is intended for use with Quarto, this vignette has also been created with Quarto. A "template" .qmd file [is available](https://github.com/pgmj/RISEraschTemplate/blob/main/analysis.qmd) that can be useful to have handy for copy&paste when running a new analysis. You can also download a complete copy of the Quarto/R code to produce this document [here](https://github.com/pgmj/pgmj.github.io/blob/main/raschrvignette/RaschRvign.qmd).
Loading the `easyRasch` package will automatically load all the packages it depends on. However, it can be desirable to explicitly load all packages used, which simplifies the automatic creation of citations for them with the `grateful` package (see @sec-grateful).
```{r}
library(easyRasch) # devtools::install_github("pgmj/easyRasch", dependencies = TRUE)
library(grateful)
library(ggrepel)
library(car)
library(kableExtra)
library(readxl)
library(tidyverse)
library(eRm)
library(iarm)
library(mirt)
library(psych)
library(ggplot2)
library(psychotree)
library(matrixStats)
library(reshape)
library(knitr)
library(patchwork)
library(formattable)
library(glue)
library(foreach)
### some commands exist in multiple packages, here we define preferred ones that are frequently used
select <- dplyr::select
count <- dplyr::count
recode <- car::recode
rename <- dplyr::rename
```
::: {.callout-note icon="true"}
## Note
Quarto automatically adds links to R packages and functions throughout this document. However, this feature only works properly for packages available on [CRAN](https://cran.r-project.org/). Since the `easyRasch` package is not on CRAN, the links related to functions starting with **RI** will not work.
:::
### Loading data
We will use data from a recent paper investigating the "initial elevation effect" [@anvari2022], and focus on the 10 negative items from the PANAS. The data is available at the OSF website.
```{r}
df.all <- read_csv("https://osf.io/download/6fbr5/")
# if you have issues with the link, please try downloading manually using the same URL as above
# and read the file from your local drive.
# subset items and demographic variables
df <- df.all %>%
select(starts_with("PANASD2_1"),
starts_with("PANASD2_20"),
age,Sex,Group) %>%
select(!PANASD2_10_Active) %>%
select(!PANASD2_1_Attentive)
```
The `glimpse()` function provides a quick overview of our dataframe.
```{r}
glimpse(df)
```
We have `r nrow(df)` rows, i.e. respondents. All variables except Sex and Group are of class `dbl`, which means they are numeric and can have decimals. Integer (numeric with no decimals) would also be fine for our purposes. The two demographic variables, currently of class `chr` (character), will need to be converted to factors (`fct`), and we will do that later on.
(If you import a dataset where item variables are of class character, you will need to recode to numeric.)
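As a hypothetical example (the response labels below are made up for illustration), such a recoding could be done with `car::recode()`, which is loaded above:
```{r}
#| eval: false
# hypothetical response labels - adapt to your data
df <- df %>%
  mutate(across(starts_with("PANASD2"),
                ~ car::recode(.x,
                              "'Very slightly or not at all' = 1;
                               'A little' = 2;
                               'Moderately' = 3;
                               'Quite a bit' = 4;
                               'Extremely' = 5")))
```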
### Itemlabels
Then we set up the itemlabels dataframe. This could also be done using the free [LibreOffice Calc](https://www.libreoffice.org/download/download-libreoffice/) or MS Excel. Just make sure the file has the same structure, with two variables named `itemnr` and `item` that contain the item variable names and item descriptions. The item variable names have to match the variable names in the item dataframe.
```{r}
itemlabels <- df %>%
select(starts_with("PAN")) %>%
names() %>%
as_tibble() %>%
separate(value, c(NA, "item"), sep ="_[0-9][0-9]_") %>%
mutate(itemnr = paste0("PANAS_",c(11:20)), .before = "item")
```
The `itemlabels` dataframe looks like this.
```{r}
itemlabels
```
### Demographics
Variables for invariance tests such as Differential Item Functioning (DIF) need to be separated into vectors (ideally as factors with specified levels and labels) with the same length as the number of rows in the dataset. This means that any kind of removal of respondents/rows with missing data needs to be done before separating the DIF variables.
First, we'll have a look at the class of our variables.
```{r}
df %>%
select(Sex,age,Group) %>%
glimpse()
```
`Sex` and `Group` are class character, while `age` is numeric. Below are the response categories represented in data.
```{r}
table(df$Sex)
```
Since there are only 5 respondents using labels that are not Female or Male (too few for meaningful statistical analysis), we will remove them to have a complete dataset for all variables in this example.
```{r}
df <- df %>%
filter(Sex %in% c("Female","Male")) # filtering to only include those with Sex being either Female or Male.
```
Let's make the variable a factor (instead of class "character"), since that will be helpful in our later analyses.
```{r}
df$Sex <- factor(df$Sex)
```
Sometimes age is provided in categories, but here we have a numeric variable with age in years. Let's have a quick look at the age distribution using a histogram, and get some summary statistics.
```{r}
summary(df$age)
```
```{r}
### simpler version of the ggplot below using base R function hist()
# hist(df$age, col = "#009ca6")
# abline(v = median(df$age, na.rm = TRUE))
#
# df %>%
# summarise(Mean = round(mean(age, na.rm = T),1),
# StDev = round(sd(age, na.rm = T),1)
# )
ggplot(df) +
geom_histogram(aes(x = age),
fill = "#009ca6",
col = "black") +
# add the median as a vertical line
geom_vline(xintercept = median(df$age),
linewidth = 1.5,
linetype = 2,
col = "orange") +
# add a light grey field indicating the interquartile range
annotate("rect", ymin = 0, ymax = Inf,
xmin = summary(df$age)[2],
xmax = summary(df$age)[5],
alpha = .2) +
labs(title = "",
x = "Age in years",
y = "Number of respondents",
caption = str_wrap(glue("Note. Median age is {round(median(df$age, na.rm = T),1)}, shown with the dashed vertical line. Age range is {min(df$age)} to {max(df$age)}. Interquartile range ({summary(df$age)[2]} to {summary(df$age)[5]}) is indicated with the grey area.")
)) +
scale_x_continuous(limits = c(18,75)) +
theme_bw() +
theme(plot.caption = element_text(hjust = 0, face = "italic"))
```
Since the age variable clearly does not follow a Gaussian (normal) distribution, the median seems like a better option than the mean.
There is also a grouping variable which needs to be converted to a factor.
```{r}
df$Group <- factor(df$Group)
```
Now our demographic data has been checked, and we need to move it to a separate dataframe from the item data.
```{r}
dif <- df %>%
select(Sex,age,Group) %>%
rename(Age = age) # just for consistency
df <- df %>%
select(!c(Sex,age,Group))
```
To create a "Table 1" containing participant demographic information, the `tbl_summary()` function from the `gtsummary` package is really handy to get a nice table.
```{r}
gtsummary::tbl_summary(dif)
```
With only item data remaining in the `df` dataframe, we can easily rename the items in the item dataframe and make sure that they match the `itemlabels` variable `itemnr`. It will be useful to have short item labels to get less clutter in the tables and figures during analysis.
```{r}
names(df) <- itemlabels$itemnr
```
Now we are all set for the psychometric analysis!
## Descriptives
Let's familiarize ourselves with the data before diving into the analysis.
### Missing data
First, we visualize the proportion of missing data on item level.
```{r}
RImissing(df)
```
No missing data in this dataset. If we had missing data, we could also use `RImissingP()` to look at which respondents have missing data and how much. Just to show what the output of these functions looks like, we'll use `mice::ampute()` to temporarily introduce missingness in 10% of cases.
```{r}
RImissing(mice::ampute(df, prop = 0.1)$amp)
RImissingP(mice::ampute(df, prop = 0.1)$amp, N = 20)
```
### Overall responses
This provides us with an overall picture of the data distribution. As a bonus, any oddities/mistakes in recoding the item data from categories to numbers will be clearly visible.
```{r}
RIallresp(df)
```
Most R packages for Rasch analysis require the lowest response category to be zero, which makes it necessary for us to recode our data, from using the range of 1-5 to 0-4.
```{r}
df <- df %>%
mutate(across(everything(), ~ .x - 1))
# always check that your recoding worked as intended.
RIallresp(df)
```
#### Floor/ceiling effects
We can also look at the raw distribution of ordinal sum scores. The `RIrawdist()` function is a bit crude, since it requires responses in all response categories to accurately calculate max and min scores.
```{r}
RIrawdist(df)
```
There is a floor effect with 11.8% of participants responding in the lowest category for all items.
#### Guttman structure
While not really necessary, it can be interesting to see whether the response patterns follow a Guttman-like structure. Items and persons are sorted based on lower-\>higher responses, and we should see the color move from yellow in the lower left corner to blue in the upper right corner.
```{r}
RIheatmap(df) +
theme(axis.text.x = element_blank())
```
In this data, we see the floor effect on the left, with 11.8% of respondents all yellow, and a rather weak Guttman structure. This could also be due to low variation in item locations/difficulties. Since we have a very large sample, I added a `theme()` option to remove the x-axis text, which would otherwise just be a blur of the `r nrow(df)` respondent row numbers. Each thin vertical slice in the figure is one respondent.
### Item level descriptives
There are many ways to look at the item level data, and we'll collect them all in the tab-panel below. The `RItileplot()` is probably the most informative, since it provides the number of responses in each response category for each item. It is usually recommended to have at least \~10 responses in each category for psychometric analysis, no matter which methodology is used.
Kudos to [Solomon Kurz](https://solomonkurz.netlify.app/blog/2021-05-11-yes-you-can-fit-an-exploratory-factor-analysis-with-lavaan/) for providing the idea and code on which the tile plot function is built!
Most people will be familiar with the barplot, which is probably the most intuitive way to understand the response distribution within each item. However, if there are many items it will take a while to review, and it does not provide the same overview as a tileplot or stacked bars.
```{r}
#| column: margin
#| code-fold: true
#| echo: fenced
# This code chunk creates a small table in the margin beside the panel-tabset output below, showing all items currently in the df dataframe.
# The Quarto code chunk option "#| column: margin" is necessary for the layout to work as intended.
RIlistItemsMargin(df, fontsize = 13)
```
::: panel-tabset
#### Tile plot
```{r}
RItileplot(df)
```
While response patterns are skewed for all items, there are more than 10 responses in each category for all items, which is helpful for the analysis.
#### Stacked bars
```{r}
RIbarstack(df) +
theme_minimal() + # theming is optional, see the Figure design section for more on this
theme_rise()
```
#### Barplots
```{r}
#| fig-height: 9
RIitemcols(df)
```
:::
## Rasch analysis 1 {#sec-rasch}
The eRm package and Conditional Maximum Likelihood (CML) estimation will be used primarily, with the Partial Credit Model since this is polytomous data.
This is also where the [five basic psychometric aspects](https://doi.org/10.31219/osf.io/3htzc) are good to recall:
- Unidimensionality & local independence
- Ordering of response categories
- Invariance
- Targeting
- Measurement uncertainties (reliability)
We will begin by looking at unidimensionality, response categories, and targeting in parallel below. For unidimensionality, we are mostly interested in item fit and residual correlations, as well as PCA of residuals and loadings on the first residual contrast. At the same time, disordered response categories can influence item fit to some extent (and vice versa), and knowledge about targeting can be useful if it is necessary to remove items due to residual correlations.
When unidimensionality and response categories are found to work adequately, we will move on to invariance testing (Differential Item Functioning, DIF). It should be noted that DIF should be evaluated in parallel with all other psychometric aspects, but since it is a more complex issue it is kept in a separate section in this vignette (as is person fit). Finally, when/if invariance/DIF also looks acceptable, we can investigate reliability/measurement uncertainties.
Since our sample is quite large (n > 800), the sample size sensitive tests (item fit and item-restscore) that are likely to produce problematic false positive rates [@johansson_detecting_2025] will use a randomly selected smaller sample of n = 500 as examples. When analyzing large datasets, item-restscore bootstrap is recommended as the primary method of assessing item fit to the Rasch model. **The use of one smaller subsample in this vignette is only for demonstration purposes, it is not a recommended method of analysis for large datasets.**
```{r}
df500 <- df %>%
slice_sample(n = 500)
```
::: {.callout-note}
In the tabset-panel below, each tab contains explanatory text, which is sometimes a bit lengthy. Remember to **scroll back up and click on all tabs**.
:::
```{r}
#| column: margin
#| echo: false
RIlistItemsMargin(df, fontsize = 13)
```
::: panel-tabset
### Conditional item fit
```{r}
simfit1 <- RIgetfit(df500, iterations = 200, cpu = 8) # save simulation output to object `simfit1`
RIitemfit(df500, simfit1)
```
Both `RIitemfit()` and `RIgetfit()` work with dichotomous and polytomous data (using the partial credit model for the latter) and automatically select the model based on the data structure.
It is important to note that the `RIitemfit()` function uses **conditional** outfit/infit, which is both robust to different sample sizes and makes ZSTD unnecessary [@muller_item_2020].
Since the distribution of item fit statistics is not known, we need to use simulation to determine appropriate cutoff threshold values for the current sample and items. `RIitemfit()` can use these simulation-based cutoff values for conditional highlighting of misfitting items. See the [blog post on simulation based cutoffs](https://pgmj.github.io/simcutoffs.html) for some more details on this.
Briefly stated, the simulation uses the estimated properties of the current sample and items (your dataset), and simulates n iterations of data that fit the Rasch model to get an empirical distribution of item fit that we can use for comparison with the observed data. This is also known as "parametric bootstrapping".
The simulations can take a bit of time to run if you have complex data/many items/many participants, and/or choose to use many iterations. Simulation experiments [@johansson_detecting_2025] indicate that 100-400 iterations should be a useful range, where smaller samples (n < 300) should use 100 iterations, and 200-400 is more appropriate when one has larger samples. To improve speed, make good use of your CPU cores.
::: {.callout-note}
Another important finding from simulation studies is that there is a risk of false positive indication of misfit in samples larger than n = 1000 when using item infit or item-restscore. The recommended primary method for determining item fit in large samples is the bootstrapped item-restscore [@johansson_detecting_2025], as illustrated in the adjacent tab labeled "Item-restscore bootstrap".
:::
For reference, a simulation using 10 items with 5 response categories each and 1851 respondents takes about 24 seconds to run for 400 iterations on 8 CPU cores (Macbook Pro M1 Max).
I'll cite Ostini & Nering [-@ostini_polytomous_2006] on the description of outfit and infit (pages 86-87):
> Response residuals can be summed over respondents to obtain an item fit measure. Generally, the accumulation is done with squared standardized residuals, which are then divided by the total number of respondents to obtain a mean square statistic. In this form, the statistic is referred to as an **unweighted mean square** (Masters & Wright, 1997; Wright & Masters, 1982) and has also come to be known as **“outfit”** (Smith, Schumacker, & Bush, 1998; Wu, 1997) [...]
> A weighted version of this statistic was developed to counteract its sensitivity to outliers (Smith, 2000). In its weighted form, the squared standardized residual is multiplied by the observed response variance and then divided by the sum of the item response variances. This is sometimes referred to as an **information weighted mean square** and has become known as **“infit”** (Smith et al., 1998; Wu, 1997).
A low item fit value (sometimes referred to as an item "overfitting" the Rasch model) indicates that responses may be too predictable. This is often the case for items that are very general/broad in scope in relation to the latent variable, for instance asking about feeling depressed in a depression questionnaire. You will often find overfitting items to also have residual correlations with other items. Overfit may be likened to having a much stronger factor loading than other items in a confirmatory factor analysis.
A high item fit value (sometimes referred to as "underfitting" the Rasch model) can indicate several things, often multidimensionality or a question that is difficult to interpret and thus has noisy response data. The latter could for instance be caused by a question that asks about two things at the same time or is ambiguous for other reasons.
Outfit has been found to be of limited use for detecting item misfit; infit should be relied upon when sample sizes are appropriate [@johansson_detecting_2025]. Outfit was removed from the output in version 0.3.9.
::: {.callout-note}
Remember to **scroll back up and click on all tabs**.
:::
### Item-restscore
This is another useful function from the `iarm` package. It shows the expected and observed correlation between an item and a score based on the rest of the items [@kreiner_note_2011]. Similar to item fit, but inverted: an observed correlation lower than expected indicates underfit, meaning that the item may not belong to the dimension, while a higher than expected observed value indicates an overfitting and possibly redundant item. Overfitting items will often also show issues with residual correlations. Both of these problems can often be (at least partially) resolved by removing underfitting items.
```{r}
RIrestscore(df500)
```
### Item-restscore bootstrap
Both item-restscore and conditional item infit/outfit will indicate "false misfit" when sample sizes are large (even when using simulation/bootstrap based cutoff values). This behavior can occur from about n = 500, and will almost certainly occur in samples of 800 and above [@johansson_detecting_2025]. This "false misfit" is caused by truly misfitting items, which can make other items appear misfitting as well; this underlines the importance of removing one item at a time when one finds issues with misfit/multidimensionality. A useful way to get additional information about the probability of actual misfit is to use non-parametric bootstrapping. This function resamples with replacement from your response data and reports the percentage and type of misfit indicated by the item-restscore function. You will also get information about conditional MSQ infit (based on the full sample, using complete responders). Simulation studies indicate that a sample size of 800 results in a 95+% detection rate for 1-3 misfitting items amongst 20 dichotomous items [@johansson_detecting_2025].
```{r}
RIbootRestscore(df, iterations = 250, samplesize = 800)
```
### Conditional item characteristic curves
The [`iarm`](https://cran.r-project.org/web/packages/iarm/index.html) package [@mueller_iarm_2022] provides several interesting functions for assessing item fit, DIF, and other things. Some of these functions have been implemented in simplified versions in `easyRasch`, and others may be included in a future version. Below are conditional item characteristic curves (ICCs) using the estimated theta (factor score).
These curves indicate item fit on a group level, where respondents are split into "class intervals" based on their sum score/factor score.
```{r}
RIciccPlot(df) +
guide_area() # this relocates the legend/guide, which may produce a nicer output
```
A similar visualization, even more informative and flexible, has been made available in the [`RASCHplot`](https://github.com/ERRTG/RASCHplot/) package [@buchardt_visualizing_2023]. The package needs to be installed from GitHub (see the commented code below). The linked paper is recommended reading, not least for its descriptions of the useful options available. Below are some sample plots showing conditional ICCs using the raw sum score.
```{r}
library(RASCHplot) # devtools::install_github("ERRTG/RASCHplot")
CICCplot(PCM(df),
which.item = c(1:4),
lower.groups = c(0,7,14,21,28),
grid.items = TRUE)
```
### PCA of residuals
Principal Component Analysis of Rasch model residuals (PCAR).
```{r}
RIpcmPCA(df)
```
Based on an old rule-of-thumb, the first eigenvalue should be below 1.5 to support unidimensionality [@smith_detecting_2002]. However, as with many other metrics the expected PCAR eigenvalue is also dependent on sample size and test length [@chou_checking_2010]. To be clear, there is no single or easily calculated rule-of-thumb critical value for the largest PCA eigenvalue that will be generally applicable. A parametric bootstrap function has been implemented in `easyRasch` to determine a potentially appropriate cutoff value for the largest PCAR eigenvalue, but it has not been systematically evaluated yet. Below is an example, illustrated with a histogram of the simulated distribution of largest eigenvalues, the 99th percentile and the max value.
```{r}
pcasim <- RIbootPCA(df, iterations = 500, cpu = 8)
hist(pcasim$results, breaks = 50)
pcasim$p99
pcasim$max
```
If the bootstrap turns out to provide an appropriate cutoff value, it still needs to be used together with checking item fit (or item-restscore) and residual correlations (local dependence) to evaluate unidimensionality.
The PCA eigenvalues are mostly included here for those coming from Winsteps who may be looking for this metric. Speaking of Winsteps, the "explained variance" generated by `RIpcmPCA()` will not be comparable to the corresponding Winsteps metric, since this function only shows the results from the analysis of residuals, not the variance explained by the Rasch model itself. Additionally, it should be noted that the PCA rotation is set to "oblimin".
### Conditional LRT
The Conditional Likelihood Ratio Test [CLRT, @andersen_goodness_1973] is a global test of fit and can be a useful addition to more informative item-level metrics [@johansson_detecting_2025]. CLRT compares the responses of those scoring below the median to those scoring above the median to assess the homogeneity of the test. Below, we run the test using the smaller sample with a function from the `iarm` package.
```{r}
clr_tests(df500, model = "PCM")
```
The test is clearly statistically significant, even with the smaller sample, which indicates that unidimensionality does not hold. However, CLRT is sensitive to large sample sizes and number of items, and produces false positives above the nominal 5% level. See <https://pgmj.github.io/clrt.html> for a very small simulation study. To minimize the sample size issue, a non-parametric bootstrap function has been implemented in `easyRasch`, where one can set the sample size manually. The function will then sample with replacement from the response data and conduct a separate CLRT for each sample drawn.
```{r}
RIbootLRT(df, iterations = 1000, samplesize = 400, cpu = 8)
```
The bootstrap method also shows a strong majority of statistically significant results.
Global tests do not provide any information about the reason for misfit. The `iarm` package has also implemented a function to get item specific information related to the CLRT, showing observed and expected values for each item.
```{r}
item_obsexp(PCM(df500))
```
### Local dependence (Q^3^)
In order to support unidimensionality, items should only be related to each other through the latent variable, and otherwise be independent of each other. This is called "local independence" when fulfilled and "local dependence" (LD) when there are issues. The extreme case of dependence is when all respondents give the same response for two items. Two aspects of LD are often described [@chen_local_1997]:
1. Underlying local dependence (ULD), which is when the locally dependent items are measuring a different latent variable than the other items.
2. Surface local dependence (SLD), when item content is very similar.
By investigating patterns in model residuals, we can determine whether items are independent or not. There are several tests that can be used to assess LD, of which Yen's Q^3^ generally has good properties. Please see [@edwards_diagnostic_2018] for an overview of different methods. It is recommended to use multiple methods, since they may perform differently with different types of LD. For instance, Chen & Thissen [-@chen_local_1997] report that G^2^ is slightly more powerful than Yen's Q^3^ in detecting SLD, while the reverse is true for ULD.
Similarly to item fit, we need to run simulations based on the observed data to get a useful cutoff threshold value for when residual correlations amongst item pairs are larger than would be expected from items that fit a unidimensional Rasch model [@christensen2017].
The simulation/bootstrap procedure can take some time to run, depending on the complexity of your data, but it is necessary in order to set an appropriate cutoff value. The number of iterations to use has yet to be systematically evaluated, but I recommend no less than 500, and ideally at least 1000 as a starting point.
```{r}
simcor1 <- RIgetResidCor(df, iterations = 500, cpu = 8)
RIresidcorr(df, cutoff = simcor1$p995)
```
The matrix above shows item-pair correlations of item residuals, with highlights in red showing correlations crossing the threshold compared to the average item-pair correlation (for all item-pairs) [@christensen2017]. The threshold used in the example above has selected the 99.5th percentile from the simulations. The `simcor1` object also pre-calculates values for the 95th, 99th and 99.9th percentiles as well as the maximum. If you desire another percentile, you can manually calculate it from the results in the `simcor1` object using `quantile()`.
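For instance, to get the 97.5th percentile (assuming the simulated values are stored in a `results` element, as with `RIbootPCA()`; check `str(simcor1)` to confirm):
```{r}
#| eval: false
# assumption: simulated values are available in simcor1$results
quantile(simcor1$results, probs = 0.975, na.rm = TRUE)
```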
Rasch model residual correlations using Yen's Q^3^ are calculated using the [mirt](https://cran.r-project.org/web/packages/mirt/index.html) package.
### Local dependence (G^2^)
Rasch model residual correlations using Chen & Thissen's G^2^ are calculated using the [mirt](https://cran.r-project.org/web/packages/mirt/index.html) package.
```{r}
simcor_g2 <- RIgetResidCorG2(df, iterations = 500, cpu = 8)
RIresidcorrG2(df, cutoff = simcor_g2$p995)
```
### LD (partial gamma)
Another way to assess local (in)dependence is by partial gamma coefficients [@kreiner_analysis_2004]. This is also a function from the `iarm` package. See `?iarm::partgam_LD` for details.
```{r}
RIpartgamLD(df)
```
### 1st contrast loadings
```{r}
RIloadLoc(df)
```
Here we see item locations and their loadings on the first residual contrast. This figure can be helpful to identify clusters in data or multidimensionality. You can get a table with standardized factor loadings using `RIloadLoc(df, output = "table")`.
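For convenience, here is that call as a runnable chunk:
```{r}
RIloadLoc(df, output = "table")
```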
### Analysis of response categories
The `xlims` setting changes the x-axis limits for the plots. The default values usually make sense, and we mostly add this option to point out that it can be changed. You can also choose to show plots for specific items only.
```{r}
#| layout-ncol: 2
RIitemCats(df, xlims = c(-5,5))
```
Each response category for each item should have a curve that indicates it to be the most probable response at some point on the latent variable (the x-axis in the figure).
### Response categories MIRT
This produces a single figure with all items. Note that `mirt()` uses marginal maximum likelihood (MML) estimation, which may produce slightly different results compared to eRm/CML when the distribution of person locations/scores is non-normal.
```{r}
mirt(df, model=1, itemtype='Rasch', verbose = FALSE) %>%
plot(type="trace", as.table = TRUE,
theta_lim = c(-5,5)) # changes x axis limits
```
### Targeting
```{r}
#| fig-height: 7
# increase fig-height in the chunk option above if you have many items
RItargeting(df, xlim = c(-5,4)) # xlim defaults to c(-4,4) if you omit this option
```
This figure shows how well the items fit the respondents/persons. It is a sort of [Wright Map](https://www.rasch.org/rmt/rmt253b.htm) that shows person locations and item threshold locations on the same logit scale.
The top part shows the person location histogram, the middle part shows an inverted histogram of item threshold locations, and the bottom part shows individual item threshold locations. The histograms also show means and standard deviations.
### Item hierarchy
Here the items are sorted on their average threshold location (black diamonds). 84% confidence intervals are shown around each item threshold location. For further details, see the caption text below the figure.
The numbers displayed in the plot can be disabled using the option `numbers = FALSE`.
```{r}
#| fig-height: 6
RIitemHierarchy(df)
```
:::
### Analysis 1 comments
Item fit shows a lot of issues.
Item 18 has issues with the second lowest category being disordered. Several other items have very short distances between thresholds 1 and 2, which is also clearly seen in the Item Hierarchy figure above.
Two item-pairs show residual correlations far above the cutoff value:
- 15 and 16 (scared and afraid)
- 17 and 18 (ashamed and guilty)
Since item 15 also has a residual correlation with item 19, we will remove it. In the second pair, item 18 will be removed since it also has problems with disordered response categories.
::: {.callout-note icon="true"}
We have multiple "diagnostics" to review when deciding which item to remove if there are strong residual correlations between two items. Here is a list of commonly used criteria:
- item fit
- item threshold locations compared to sample locations (targeting)
- ordering of response categories
- DIF
- and whether there are residual correlations between one item and multiple other items
:::
```{r}
removed.items <- c("PANAS_15","PANAS_18")
df_backup <- df
df500_backup <- df500
df <- df_backup %>%
select(!any_of(removed.items))
df500 <- df500_backup %>%
select(!any_of(removed.items))
```
As seen in the code above, I chose to create backup copies of the original dataframes before removing the items. This can be useful if, at a later stage in the analysis, I want to quickly "go back" and reinstate an item or undo any other change I have made.
## Rasch analysis 2
With items 15 and 18 removed.
```{r}
#| column: margin
#| echo: false
RIlistItemsMargin(df, fontsize = 13)
```
::: panel-tabset
### Item-restscore bootstrap
```{r}
RIbootRestscore(df, iterations = 250, samplesize = 800)
```
### Conditional item fit
Still using the smaller sample of n = 500.
```{r}
simfit2 <- RIgetfit(df500, iterations = 400, cpu = 8)
RIitemfit(df500, simcut = simfit2)
```
### PCA of residuals
```{r}
RIpcmPCA(df)
```
### Residual correlations
```{r}
simcor2 <- RIgetResidCor(df, iterations = 400, cpu = 8)
RIresidcorr(df, cutoff = simcor2$p99)
```
### 1st contrast loadings
```{r}
RIloadLoc(df)
```
### Targeting
```{r}
#| fig-height: 5
RItargeting(df, xlim = c(-4,4), bins = 45)
```
### Item hierarchy
```{r}
#| fig-height: 5
RIitemHierarchy(df)
```
:::
### Analysis 2 comments
Items 16 & 19, and 12 & 14 show problematic residual correlations.
Let's look at DIF before taking action upon this information. **While we are keeping DIF as a separate section in this vignette, it is recommended to include DIF-analysis in the `panel-tabset` above (on item fit, PCA, residual correlations, etc).**
## DIF - differential item functioning
::: {.callout-note icon="true"}
While we are keeping DIF as a separate section in this vignette, it is recommended to include DIF-analysis in the parallel analysis above (on item fit, targeting, residual correlations, etc) since DIF tests will help you understand or evaluate item functioning together with the other tests.
:::
We'll be looking at whether item (threshold) locations are stable between demographic subgroups.
There are several DIF analysis tools available. The first one uses the package `psychotree`, which relies on statistical significance at p < .05 as an indicator for DIF. This is a criterion that is highly sample size sensitive, and we are always interested in the size/magnitude of DIF as well, since that will inform us about the impact of DIF on the estimated latent variable.
The structure of DIF is also an important and complex aspect, particularly for polytomous data. Uniform DIF means that the DIF is similar across the latent continuum. We can test this in R using the `lordif` package, as demonstrated in @sec-lordif. However, it should be noted that the `lordif` package does not provide an option to use Rasch models, and there may be results that are caused by also allowing the discrimination parameter to vary across items.
A recent preprint [@henninger_partial_2024] does a great job of illustrating "differential step functioning" (DSF), which is when item threshold locations in polytomous data show varying levels of DIF. It also describes a forthcoming development of the `psychotree` package where one can use DIF effect sizes and purification functions to evaluate DIF/DSF. When the updated package is available, I will work to implement these new functions in the `easyRasch` package as well.
::: {.callout-note icon="true"}
It is important to ensure that no cells in the data are empty for subgroups when conducting a DIF analysis. Split the data using the DIF-variable and create separate tileplots to review the response distribution in the DIF-groups.
:::
```{r}
RIdifTileplot(df, dif$Sex)
```
### Sex
```{r}
#| column: margin
#| echo: false
RIlistItemsMargin(df, fontsize = 13)
```
::: panel-tabset
#### Table
```{r}
#| fig-height: 3
RIdifTable(df, dif$Sex)
```
#### Figure items
```{r}
RIdifFigure(df, dif$Sex)
```
#### Figure thresholds
```{r}
RIdifFigThresh(df, dif$Sex)
```
:::
While no item shows problematic levels of DIF regarding item location, as shown by the table, there is an interesting pattern in the thresholds figure. The lowest threshold seems to be slightly lower for node 3 (Male) for all items. Also, item 11 shows a much wider spread of item locations for node 3 compared to node 2.
The results do not require any action since the difference is small.
### Age
The `psychotree` package uses a model-based recursive partitioning that is particularly useful when you have a continuous variable such as age in years and a large enough sample. It will test different ways to partition the age variable to determine potential group differences [@strobl2015; @strobl2021].
```{r}
RIdifTable(df, dif$Age)
```
No DIF found for age.
### Group
```{r}
RIdifTable(df, dif$Group)
```
And no DIF for group.
### Sex and age
The `psychotree` package also allows for DIF interaction analysis with multiple DIF variables. We can use `RIdifTable2()` to input two DIF variables.
```{r}
RIdifTable2(df, dif$Sex, dif$Age)
```
No interaction effect found for sex and age. The analysis only shows the previously identified DIF for sex.
### LRT-based DIF {#sec-diflrt}
We'll use the group variable as an example. It's always important to review the response distribution across DIF groups prior to running the DIF analysis. We need to make sure that there are responses in all cells in all subgroups.
```{r}
RIdifTileplot(df, dif$Group)
```
First, we can simply run the test to get the overall result.
```{r}
erm.out <- PCM(df)
LRtest(erm.out, splitcr = dif$Group)
```
Review the documentation for further details, using `?LRtest` in your R console in RStudio. There is also a plotting function, `plotGOF()`, that may be of interest.
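As a minimal sketch, assuming default settings:
```{r}
#| eval: false
lr_group <- LRtest(erm.out, splitcr = dif$Group)
plotGOF(lr_group) # plots subgroup item parameter estimates against each other
```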
```{r}
#| column: margin
#| echo: false
RIlistItemsMargin(df, fontsize = 13)
```
::: panel-tabset
#### Item location table
```{r}
RIdifTableLR(df, dif$Group)
```
#### Item location figure
```{r}
#| fig-height: 7
RIdifFigureLR(df, dif$Group) + theme_rise()
```
#### Item threshold table
```{r}
RIdifThreshTblLR(df, dif$Group)
```
#### Item threshold figure
```{r}
#| fig-height: 7
RIdifThreshFigLR(df, dif$Group) + theme_rise()
```
:::
The item threshold table shows that the top threshold for item 13 differs by more than 0.5 logits between groups. In this set of 8 items with 4 thresholds each, this single threshold difference is unlikely to result in problematic differences in estimated person scores.
### Conditional ICC
The `iarm` package function for CICC plots also includes the option to use a DIF variable. This gives us information about DIF across the latent continuum when class intervals are created based on latent score groups. If your analysis contains two groups and the lines for the groups cross each other, DIF is non-uniform.
```{r}
RIciccPlot(df, dif = "yes", dif_var = dif$Sex)
```
### Logistic Ordinal Regression DIF {#sec-lordif}
The `lordif` package [@choi_lordif_2011] does not use a Rasch measurement model, it only offers a choice between the Graded Response Model (GRM) and the Generalized Partial Credit Model (GPCM). Both of these are 2PL models, meaning that they estimate a discrimination parameter for each item in addition to the item threshold parameters. `lordif` relies on the `mirt` package.
There are several nice features available in the `lordif` package. First, we get a χ2 test of uniform or non-uniform DIF. Second, there are three possible methods/criteria for flagging items with potential DIF. One of these uses a likelihood ratio (LR) χ2 test, while the other two are indicators of DIF size/magnitude, either using a pseudo R2 statistic ("McFadden", "Nagelkerke", or "CoxSnell") or a Beta criterion. For further details, see `?lordif` in your R console or the paper describing the package [@choi_lordif_2011].
Below is some sample code to get you started with `lordif`.
```{r}
#| results: hide
library(lordif)
g_dif <- lordif(as.data.frame(df), as.numeric(dif$Sex), # make sure that the data is in a dataframe-object and that the DIF variable is numeric
criterion = c("Chisqr"),
alpha = 0.01,
beta.change = 0.1,
model = "GPCM",
R2.change = 0.02)
g_dif_sum <- summary(g_dif)
```
```{r}
# threshold values for colorizing the table below
alpha = 0.01
beta.change = 0.1
R2.change = 0.02
g_dif_sum$stats %>%
as.data.frame() %>%
select(!all_of(c("item","df12","df13","df23"))) %>%
round(3) %>%
add_column(itemnr = names(df), .before = "ncat") %>%
mutate(across(c(chi12,chi13,chi23), ~ cell_spec(.x,
color = case_when(
.x < alpha ~ "red",
TRUE ~ "black"
)))) %>%
mutate(across(starts_with("pseudo"), ~ cell_spec(.x,
color = case_when(
.x > R2.change ~ "red",
TRUE ~ "black"
)))) %>%
mutate(beta12 = cell_spec(beta12,
color = case_when(
beta12 > beta.change ~ "red",
TRUE ~ "black"
))) %>%
kbl_rise()
```
We can review the results regarding uniform/non-uniform DIF by looking at the `chi*` columns. Uniform DIF is indicated by column `chi12` and non-uniform DIF by `chi23`, while column `chi13` represents an overall test of "total DIF effect" [@choi_lordif_2011].
While the table indicates significant chi2-tests for items 11 and 17, the magnitude estimates are low for these items.
There are some plots available as well, using the base R `plot()` function. For some reason the plots won't render in this Quarto document, so I will try to sort that out at some point.
```{r}
#| layout-ncol: 2
plot(g_dif) # use option `graphics.off()` to get the plots rendered one by one
#plot(g_dif, graphics.off())
```
### Partial gamma DIF
The `iarm` package provides a function to assess DIF by partial gamma [@bjorner_differential_1998]. It should be noted that this function only shows a single partial gamma value per item, so if you have more than two groups in your comparison, you will want to also use other methods to understand your results better.
There are some recommended cutoff-values mentioned in the paper above:
No or negligible DIF:
- Gamma within the interval -0.21 to 0.21, *or*
- Gamma not significantly different from 0
Slight to moderate DIF:
- Gamma within the interval -0.31 to 0.31 (and outside -0.21 to 0.21), *or*
- not significantly outside the interval -0.21 to 0.21
Moderate to large DIF:
- Gamma outside the interval -0.31 to 0.31, **and**
- significantly outside the interval -0.21 to 0.21
```{r}
RIpartgamDIF(df, dif$Sex)
```
We can see "slight" DIF for item 17, with a statistically significant gamma of .23.
## Rasch analysis 3
While there were no significant issues with DIF for any item/subgroup combination, we need to address the previously identified problem:
- Items 16 and 19 have the largest residual correlation.
We'll remove item 19 since item 16 has better targeting.
```{r}
removed.items <- c(removed.items,"PANAS_19")
df <- df_backup %>%
select(!any_of(removed.items))
df500 <- df500_backup %>%
select(!any_of(removed.items))
```
```{r}
#| column: margin
#| echo: false
RIlistItemsMargin(df, fontsize = 13)
```
::: panel-tabset
### Item-restscore bootstrap
```{r}
RIbootRestscore(df, iterations = 250, samplesize = 800)
```
### Item fit
```{r}
simfit3 <- RIgetfit(df500, iterations = 200, cpu = 8)
RIitemfit(df500, simfit3)
```
### CICC
```{r}
RIciccPlot(df)
```
### Residual correlations
```{r}
simcor3 <- RIgetResidCor(df, iterations = 400, cpu = 8)
RIresidcorr(df, cutoff = simcor3$p99)
```
### LRT
```{r}
RIbootLRT(df, iterations = 1000, samplesize = 400, cpu = 8)
```
### Targeting
```{r}
#| fig-height: 5
RItargeting(df, bins = 45)
```
### Item hierarchy
```{r}
#| fig-height: 5
RIitemHierarchy(df)
```
:::
### Analysis 3 comments
There are still problematic residual correlations and some items show misfit (most notably item 12 being overfit), but we will end this sample analysis here and move on to show other functions.
There are several item thresholds that are located very close together, as shown in the item hierarchy figure. This is not ideal, since it will inflate reliability estimates. However, we will not modify the response categories for this analysis; we only note the issue, which should be dealt with by trying variations of merged response categories to achieve better separation of threshold locations without disordering.
## Reliability
```{r}
#| fig-height: 6
RItif(df)
```
The figure above shows the Test Information Function (TIF), which indicates the reliability of all items making up the test/scale (not the reliability of the sample).
The default cutoff value used in `RItif()` is TIF = 3.33, which roughly corresponds to person separation index (PSI) = 0.7 (although this is not always the case). PSI is similar to reliability coefficients such as omega and alpha, ranging from 0 to 1. You can change the TIF cutoff by using the option `cutoff`, for instance `cutoff = 2.5` (TIF values range from 1 and up).
While 11.8% of respondents had a floor effect based on the raw sum scored data, the figure above shows us that 41.8% are located below the point where the items produce a PSI of 0.7 or higher. Again, note that this figure shows the reliability of the test/scale, not the sample. If you want to add the sample reliability use option `samplePSI = TRUE`. More details are available in the documentation `?RItif`.
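For instance, both options mentioned above can be combined (a minimal sketch):
```{r}
#| eval: false
RItif(df, cutoff = 2.5, samplePSI = TRUE)
```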
## Person fit
We can also look at how the respondents fit the Rasch model with these items. By default, `RIpfit()` outputs a histogram and a hex heatmap with the person infit ZSTD statistic, using +/- 1.96 as cutoff values. This is currently the only person fit method implemented in the `easyRasch` package, and the curious analyst is suggested to look at the package [PerFit](https://www.rdocumentation.org/packages/PerFit/versions/1.4.6/topics/PerFit-package) for more tools.
```{r}
RIpfit(df)
```
You can export the person fit values to a new variable in the dataframe by specifying `output = "dataframe"`, or if you just want the row numbers for respondents with deviant infit values, `output = "rowid"`.
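For example (a sketch based on the options named above; see `?RIpfit` for the exact output structure):
```{r}
#| eval: false
pfit_df <- RIpfit(df, output = "dataframe") # person fit values added as a variable
misfit_rows <- RIpfit(df, output = "rowid") # row numbers of respondents with deviant infit ZSTD
```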
You can also specify a grouping variable to visualize the person fit for different groups.
```{r}
RIpfit(df, group = dif$Sex, output = "heatmap")
```
Person fit is a useful way to identify respondents with unexpected response patterns and investigate this further.
### `PerFit` sample code
While none of the functions in the `PerFit` package has been implemented in `easyRasch`, this is some code to get you started if you are interested in using it. There are multiple methods/functions available for polytomous and dichotomous data, see the package [documentation](https://www.rdocumentation.org/packages/PerFit/versions/1.4.6/topics/PerFit-package).
For this example, we'll use the non-parametric U3 statistic generalized to polytomous items [@emons_nonparametric_2008] and the smaller sample.
::: panel-tabset
#### U3poly
```{r}
library(PerFit)
pfit_u3poly <- U3poly(matrix = df500,
Ncat = 5, # make sure to input number of response categories, not thresholds
IRT.PModel = "PCM")
```
#### Cutoff information
```{r}
cutoff(pfit_u3poly)
```
#### Flagged respondents
```{r}
flagged.resp(pfit_u3poly) %>%
pluck("Scores") %>%
as.data.frame() %>%
arrange(desc(PFscores))
```
:::
The dataframe shown under the tab `Flagged respondents` above contains a variable named `FlaggedID`, which represents the row IDs. This variable is useful if one wants to filter out respondents with deviant response patterns (person misfit). There are indications that persons with misfit may affect the results of Andersen's LR-test for DIF [@artner_simulation_2016].
### Item fit without aberrant responses
We can remove the misfitting persons to see how that affects item fit. Let's also compare with the misfitting respondents identified by `RIpfit()`, also using the smaller sample.
```{r}
misfits <- flagged.resp(pfit_u3poly) %>%
pluck("Scores") %>%
as.data.frame() %>%
pull(FlaggedID)
misfits2 <- RIpfit(df500, output = "rowid")
```
::: panel-tabset
#### All respondents
```{r}
RIitemfit(df500, simcut = simfit3)
```
#### U3 misfit removed
```{r}
RIitemfit(df500[-misfits,], simcut = simfit3)
```
#### ZSTD misfit removed
```{r}
RIitemfit(df500[-misfits2,], simcut = simfit3)
```
:::
## Item parameters
To allow others (and oneself) to use the estimated item parameters for estimating person locations/thetas, we should make the item parameters available. The function will also write a csv-file with the item threshold locations. Estimation of person locations/thetas can be done with the `thetaEst()` function from the `catR` package; this is implemented in the function `RIestThetasOLD()`, see below for details.
First, we'll output the parameters into a table.
```{r}
RIitemparams(df)
```
The parameters can also be output to a dataframe or a file, using the option `output = "dataframe"` or `output = "file"`.
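For example (a minimal sketch; the csv file name is an assumption based on the import code further down):
```{r}
#| eval: false
itemparams <- RIitemparams(df, output = "dataframe") # item threshold locations as a dataframe
RIitemparams(df, output = "file") # writes a csv-file, read back in below as "itemParameters.csv"
```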
## Ordinal sum score to interval score
This table shows the corresponding "raw" ordinal sum score values and logit scores, with standard errors for each logit value. Interval scores are estimated using WL (weighted likelihood) based on a simulated dataset using the item parameters estimated from the input dataset. The choice of WL as default is due to the lower bias compared to ML estimation [@warm1989].
(An option will hopefully be added at some point to create this table based on only item parameters.)
```{r}
RIscoreSE(df)
```
### Ordinal/interval figure
The figure below can also be generated to illustrate the relationship between ordinal sum score and logit interval score. The errorbars default to show the standard error at each point, multiplied by 1.96.
```{r}
RIscoreSE(df, output = "figure")
```
### Estimating interval level person scores
Based on the Rasch analysis output of item parameters, we can estimate each individual's location or score (also known as "theta"). `RIestThetas()` by default uses WLE estimation based on item parameters from a partial credit model (PCM) and outputs a dataframe with person locations (WLE) and measurement error (SEM) on the logit scale.
```{r}
thetas <- RIestThetas(df)
head(thetas)
```
Each individual has a standard error of measurement (SEM) associated with their estimated location/score. This is included in the output of the `RIestThetas()` function as the `SEM` variable, as seen above. We can review the distribution of measurement error with a figure.
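A simple base R histogram works well for this:
```{r}
hist(thetas$SEM,
     col = "#009ca6",
     main = "Histogram of measurement error (SEM)",
     breaks = 20)
```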
We can take a look at the distribution of person locations (thetas) using a histogram.
```{r}
hist(thetas$WLE,
col = "#009ca6",
main = "Histogram of person locations (thetas)",
breaks = 20)
```
`RIestThetasOLD()` can be used with a pre-specified item (threshold) location matrix. The choice of WL as default is due to the lower bias compared to ML estimation [@warm1989]. Similarly to `RIscoreSE()` you can (and may indeed need to) change the range of logit scores, using the option `theta_range`. The default is `c(-7,7)`, which should hopefully work in most circumstances.
If you would like to use an existing item threshold location matrix, this code may be helpful:
```{r}
itemParameters <- read_csv("itemParameters.csv") %>%
as.matrix()
itemParameters
```
As you can see, this is a matrix object (not a dataframe), with each item as a row, and the threshold locations as columns.
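A sketch of what such a call could look like; note that the name of the item parameter argument is an assumption here, so check `?RIestThetasOLD` for the actual argument names:
```{r}
#| eval: false
# assumption: the item parameter matrix argument is named `itemParams`
thetasOLD <- RIestThetasOLD(df,
                            itemParams = itemParameters,
                            theta_range = c(-7, 7))
```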
## Figure design
Most of the figures created by the functions can be styled (colors, fonts, etc) by adding theme settings to them. You can use the standard ggplot function `theme()` and related theme-functions. As usual it is possible to "stack" theme functions, as seen in the example below.
You can also change coloring, axis limits/breaks, etc, just by adding ggplot options with a `+` sign.
A custom theme function, `theme_rise()`, is included in the `easyRasch` package. It might be easier to use if you are not familiar with `theme()`.
For instance, you might like to change the font to "Lato" for the item hierarchy figure, and make the background transparent.
```{r}
RIitemHierarchy(df) +
theme_minimal() + # first apply the minimal theme to make the background transparent
theme_rise(fontfamily = "Lato") # then apply theme_rise, which simplifies making changes to all plot elements
```
As of package version 0.1.30.0, the `RItargeting()` function allows more flexibility in styling too, by having an option to return a list object with the three separate plots. See the [NEWS](https://github.com/pgmj/easyRasch/blob/main/NEWS.md#01300) file for more details. Since the `RItargeting()` function uses the `patchwork` library to combine plots, you can also make use of [the many functions that `patchwork` includes](https://patchwork.data-imaginist.com/articles/patchwork.html). For instance, you can set a title with a specific theme:
```{r}
RItargeting(df) + plot_annotation(title = "Targeting", theme = theme_rise(fontfamily = "Arial"))
```
In order to change font for text *inside* plots (such as "t1" for thresholds) you will need to add an additional line of code.
``` r
update_geom_defaults("text", list(family = "Lato"))
```
Please note that the line of code above updates the default settings for `geom_text()` for the whole session. Also, some functions, such as `RIloadLoc()`, make use of `geom_text_repel()`, for which you would need to change the code above from "text" to "text_repel".
A simple way to only change font family and font size would be to use `theme_minimal(base_family = "Calibri", base_size = 14)`. Please see the [reference page](https://ggplot2.tidyverse.org/reference/ggtheme.html) for default ggplot themes for alternatives to `theme_minimal()`.
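For example (assuming the "Calibri" font is available on your system):
```{r}
#| eval: false
RIitemHierarchy(df) +
  theme_minimal(base_family = "Calibri", base_size = 14)
```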
## Software used {#sec-grateful}
The `grateful` package is a nice way to give credit to the packages used in making the analysis. The package can create both a bibliography file and a table object, which is handy for automatically creating a reference list based on the packages used (or at least explicitly loaded).
```{r}
library(grateful)
pkgs <- cite_packages(cite.tidyverse = TRUE,
output = "table",
bib.file = "grateful-refs.bib",
include.RStudio = TRUE,
out.dir = getwd())
# If kbl() is used to generate this table, the references will not be added to the Reference list.
formattable(pkgs,
table.attr = 'class=\"table table-striped\" style="font-size: 13px; font-family: Lato; width: 80%"')
```
## Additional credits
Thanks to my [colleagues at RISE](https://www.ri.se/en/what-we-do/projects/center-for-category-based-measurements) for providing feedback and testing the package on Windows and MacOS platforms. Also, thanks to [Mike Linacre](https://www.winsteps.com/linacre.htm) and [Jeanette Melin](https://www.ri.se/en/person/jeanette-melin) for providing useful feedback to improve this vignette.
## Session info
```{r}
sessionInfo()
```
## References