This is an introduction to using the RISEkbmRasch R package. A changelog for package updates is available here.
Details on installation are available at the package GitHub page. This vignette will walk through a sample analysis using an open dataset with polytomous questionnaire data. This will include some data wrangling to structure the item data and itemlabels, then provide examples of the different functions. The full source code of this document can be found either in this repository or by clicking on </> CODE at the top beside the table of contents. You should be able to use the source code “as is” and reproduce this document locally, as long as you have the required packages installed. This page and this website are built using the open source publishing tool Quarto.
One of the aims with this package is to simplify psychometric analysis to shed light on the measurement properties of a scale, questionnaire, or test. In a paper recently made available as a preprint (Johansson et al., 2023), our research group proposes that the basic aspects of a psychometric analysis should include information about:
Unidimensionality
Response categories
Invariance
Targeting
Measurement uncertainties (reliability)
We’ll include several ways to investigate these measurement properties using Rasch Measurement Theory. The package also contains functions less directly related to the criteria above; these will be shown in this vignette as well.
Please note that this is a sample analysis to showcase the R package. It is not intended as a “best practice” psychometric analysis example.
You can skip ahead to the Rasch analysis part in Section 3 if you are eager to look at the package output :)
Since the package is intended for use with Quarto, this vignette has also been created with Quarto. A “template” .qmd file is available that can be useful to have handy for copy&paste when running a new analysis. You can also download a complete copy of the Quarto/R code to produce this document here.
Loading the RISEkbmRasch package should automatically load all the packages it depends on. However, it could be desirable to explicitly load all packages used, to simplify the automatic creation of citations for them, using the grateful package (see Section 13).
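For reference, the setup chunk in this vignette's source loads the following packages explicitly:

```r
library(RISEkbmRasch) # devtools::install_github("pgmj/RISEkbmRasch")
library(grateful)
library(ggrepel)
library(car)
library(kableExtra)
library(readxl)
library(tidyverse)
library(eRm)
library(mirt)
library(psych)
library(ggplot2)
library(psychotree)
library(matrixStats)
library(reshape)
library(knitr)
library(patchwork)
library(formattable)
library(glue)
library(foreach)
```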
Quarto automatically adds links to R packages and functions throughout this document. However, this feature only works properly for packages available on CRAN. Since the RISEkbmRasch package is not on CRAN the links related to functions starting with RI will not work.
1.1 Loading data
We will use data from a recent paper investigating the “initial elevation effect” (Anvari et al., 2022), and focus on the 10 negative items from the PANAS. The data is available at the OSF website.
Code
df.all <- read_csv("https://osf.io/download/6fbr5/")
# if you have issues with the link, please try downloading manually using the same URL as above
# and read the file from your local drive.

# subset items and demographic variables
df <- df.all %>% 
  select(starts_with("PANASD2_1"),
         starts_with("PANASD2_20"),
         age, Sex, Group) %>% 
  select(!PANASD2_10_Active) %>% 
  select(!PANASD2_1_Attentive)
The glimpse() function provides a quick overview of our dataframe.
We have 1856 rows, i.e. respondents. All variables except Sex and Group are of class dbl, which means they are numeric and can have decimals. Integer (numeric with no decimals) would also be fine for our purposes. The two demographic variables, currently of class chr (character), will need to be converted to factors (fct), and we will do that later on.
(If you import a dataset where item variables are of class character, you will need to recode to numeric.)
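A minimal sketch of such a recoding, using made-up response labels (adapt the levels to your own data; `match()` returns each response's position in the ordered vector of labels):

```r
# hypothetical ordered response labels, lowest first
resp <- c("Never", "Sometimes", "Often", "Always")
# hypothetical character item responses
x <- c("Sometimes", "Never", "Always")
# map each label to its position, then shift so the lowest category is 0
x_num <- as.numeric(match(x, resp)) - 1
x_num
```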
1.2 Itemlabels
Then we set up the itemlabels dataframe. This could also be done using the free LibreOffice Calc or MS Excel. Just make sure the file has the same structure, with two variables named itemnr and item that contain the item variable names and item description. The item variable names have to match the variable names in the item dataframe.
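A minimal sketch of the expected structure, with made-up item names and descriptions:

```r
# two variables: itemnr (must match the item dataframe's variable names)
# and item (the item description text)
itemlabels <- data.frame(
  itemnr = c("PANAS_11", "PANAS_12"),
  item   = c("Distressed", "Upset")
)
```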
Variables for invariance tests such as Differential Item Functioning (DIF) need to be separated into vectors (ideally as factors with specified levels and labels) with the same length as the number of rows in the dataset. This means that any kind of removal of respondents/rows with missing data needs to be done before separating the DIF variables.
We need to check how the Sex variable has been coded and which responses are present in the data.
  CONSENT REVOKED      DATA EXPIRED            Female              Male 
                2                 1               896               955 
Prefer not to say 
                2 
Since there are only 5 respondents using labels outside of Female/Male (too few for meaningful statistical analysis), we will remove them to have a complete dataset for all variables in this example.
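One way to do the filtering and set up the DIF variable as its own factor vector, sketched as self-contained toy code (the vignette's actual code may differ, and the name `dif.sex` is my assumption):

```r
# toy data standing in for the full dataframe (df in the vignette)
df <- data.frame(Sex   = c("Female", "Male", "Prefer not to say"),
                 item1 = c(0, 2, 1))
# keep only respondents with complete, analyzable data
df <- subset(df, Sex %in% c("Female", "Male"))
# separate the DIF variable as a factor vector...
dif.sex <- factor(df$Sex)
# ...and remove it from the item dataframe
df$Sex <- NULL
```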
Sometimes age is provided in categories, but here we have a numeric variable with age in years. Let’s have a quick look at the age distribution using a histogram, and calculate mean, sd and range.
Code
### simpler version of the ggplot below
# hist(df$age, col = "#009ca6")
# 
# df %>% 
#   summarise(Mean = round(mean(age, na.rm = T),1),
#             StDev = round(sd(age, na.rm = T),1)
#   )

ggplot(df) +
  geom_histogram(aes(x = age), fill = "#009ca6", col = "black") +
  # add the average as a vertical line
  geom_vline(xintercept = mean(df$age), linewidth = 1.5, linetype = 2, col = "orange") +
  # add a light grey field indicating the standard deviation
  annotate("rect", ymin = 0, ymax = Inf, 
           xmin = (mean(df$age) - sd(df$age)), xmax = (mean(df$age) + sd(df$age)), 
           alpha = .2) +
  labs(title = "", x = "Age in years", y = "Number of respondents", 
       caption = glue("Note. Mean age is {round(mean(df$age, na.rm = T),1)} years with a standard deviation of {round(sd(df$age, na.rm = T),1)}. Age range is {min(df$age)} to {max(df$age)}.")) +
  theme(plot.caption = element_text(hjust = 0, face = "italic"))
Age also needs to be a separate vector, and removed from the item dataframe.
Code
dif.age <- df$age
df$age <- NULL
There is also a grouping variable which needs to be converted to a factor.
With only item data remaining in the dataframe, we can easily rename the items in the item dataframe. These names match the itemlabels variable itemnr.
No missing data in this dataset. If we had missing data, we could also use RImissingP() to look at which respondents have missing data and how much.
2.2 Overall responses
This provides us with an overall picture of the data distribution. As a bonus, any oddities/mistakes in recoding the item data from categories to numbers will be clearly visible.
Most R packages for Rasch analysis require the lowest response category to be zero, which makes it necessary for us to recode our data, from using the range of 1-5 to 0-4.
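Since all items share the same 1-5 response scale, subtracting 1 from every item column accomplishes the recoding. A self-contained toy sketch (at this point in the vignette, the dataframe contains only item data):

```r
# toy item data on a 1-5 scale standing in for the PANAS items
df <- data.frame(item1 = c(1, 5, 3),
                 item2 = c(2, 4, 1))
# shift all items so the lowest response category is 0
df <- df - 1
```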
Now, we can also look at the raw distribution of sum scores. The RIrawdist() function is a bit crude, since it requires responses in all response categories to accurately calculate max and min scores.
We can see a floor effect with 11.8% of participants responding in the lowest category for all items.
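The floor-effect percentage can be cross-checked directly from the recoded data, as in this toy sketch (assuming 0 is now the lowest category for every item):

```r
# toy recoded item data (0 = lowest category)
df <- data.frame(item1 = c(0, 0, 2),
                 item2 = c(0, 1, 4))
# proportion of respondents answering the lowest category on every item
floor_pct <- mean(rowSums(df) == 0) * 100
round(floor_pct, 1)
```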
2.2.2 Guttman structure
While not strictly necessary, it can be interesting to see whether the response patterns follow a Guttman-like structure. Items and persons are sorted from lower to higher responses, and we should see the color move from yellow in the lower left corner to blue in the upper right corner.
In this data, we see the floor effect on the left, with 11.8% of respondents all yellow, and a rather weak Guttman structure. This could also be due to a low variation in item locations/difficulties. Since we have a very large sample I added a theme() option to remove the x-axis text, which would anyway just be a blur of the 1851 respondent row numbers. Each thin vertical slice in the figure is one respondent.
2.3 Item level descriptives
There are many ways to look at the item level data, and we’ll get them all together in the tab-panel below. The RItileplot() is probably most informative, since it provides the number of responses in each response category for each item. It is usually recommended to have at least ~10 responses in each category for psychometric analysis, no matter which methodology is used.
Kudos to Solomon Kurz for providing the idea and code on which the tile plot function is built!
Most people will be familiar with the barplot, which is probably the most intuitive way to view the response distribution within each item. However, if there are many items it takes a while to review, and it does not provide the same overview as a tileplot or stacked bars.
Code
```{r}
#| column: margin
#| code-fold: true

# This code chunk creates a small table in the margin beside the panel-tabset
# output below, showing all items currently in the df dataframe.
# The Quarto code chunk option "#| column: margin" is necessary for the layout
# to work as intended.

RIlistItemsMargin(df, fontsize = 13)
```
library(TAM)
# run TAM Rasch Partial Credit Model on our data, which uses 
# Marginal Maximum Likelihood estimation
tam1 <- tam(as.matrix(df), irtmodel = "PCM", verbose = FALSE)
plot(tam1) # create ICC plots
Iteration in WLE/MLE estimation 1 | Maximal change 2.949
Iteration in WLE/MLE estimation 2 | Maximal change 2.0344
Iteration in WLE/MLE estimation 3 | Maximal change 0.7294
Iteration in WLE/MLE estimation 4 | Maximal change 0.1271
Iteration in WLE/MLE estimation 5 | Maximal change 0.0034
Iteration in WLE/MLE estimation 6 | Maximal change 1e-04
Iteration in WLE/MLE estimation 7 | Maximal change 0
----
WLE Reliability= 0.761
The expected value curves are made using the TAM package, which uses Marginal Maximum Likelihood (MML) estimation. It is a good way to check if any of your items may need reversed response categories, amongst other things.
3 Rasch analysis 1
The eRm package and Conditional Maximum Likelihood (CML) estimation will be used primarily, with the Partial Credit Model since this is polytomous data.
We will begin by looking at unidimensionality, response categories, and targeting in parallel below. For unidimensionality, we are mostly interested in item fit and residual correlations, as well as PCA of residuals and loadings on the first residual contrast. At the same time, disordered response categories can influence item fit, and targeting can be useful if it is necessary to remove items due to residual correlations.
When unidimensionality and response categories are found to work adequately, we will move on to invariance testing. And when/if invariance looks good, we can investigate reliability/measurement uncertainties.
In the tabset-panel below, each tab will have some explanatory text.
Since we have a sample size over 500, ZSTD item fit values would be inflated if we use the whole sample. To better estimate accurate ZSTD values the RIitemfitPCM() function allows for multiple subsampling. It is recommended to use a sample size between 250 and 500 (Hagell & Westergren, 2016). We will set the sample size to 300 and run 32 subsamples. If you just want to test things out, I highly recommend lowering the number 32 to 4 to enable faster rendering.
For faster processing, RIitemfitPCM2() enables parallel processing with multiple CPUs/cores. You can check how many cores are available by running parallel::detectCores(). It is recommended not to use all of them (leave 1 or 2 free). There may be issues with multicore parallel processing, especially when there are few responses in some response categories. If you run into errors, try increasing the sample size, or just use the single-core function RIitemfitPCM() instead.
“Outfit” refers to item fit when person location is relatively far away from the item location, while “infit” provides estimates for when person and item locations are close together. MSQ should be close to 1, with lower and upper cutoffs set to 0.7 and 1.3 as default values, while ZSTD should be around 0, with default cutoffs set to +/- 2.0. Infit is usually more important. You can change the cutoff values by using options in the function, see ?RIitemfitPCM for details.
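As a toy illustration of applying the default MSQ cutoffs (the item names and values here are made up):

```r
# hypothetical infit MSQ values for four items
infit <- c(item12 = 0.65, item13 = 0.98, item15 = 1.42, item17 = 1.10)
# flag items outside the default 0.7-1.3 range
flagged <- names(infit)[infit < 0.7 | infit > 1.3]
flagged
```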
A low item fit indicates that responses are too predictable and provide little information. A high item fit can indicate several things, most often multidimensionality or, for questionnaires, a question that is difficult to interpret. This could for instance be a question that asks about two things at the same time.
Relative cut-off value (highlighted in red) is 0.098, which is 0.2 above the average correlation.
The matrix above shows item-pair correlations of item residuals, with highlights in red showing correlations 0.2 or more above the average item-pair correlation (for all item-pairs) (Christensen et al., 2017). Rasch model residual correlations are calculated using the mirt package. Again, you can set the cutoff value you desire in the function call, which will affect the values highlighted in the correlation matrix table and the caption text.
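The relative cutoff logic can be sketched with a made-up matrix of residual correlations:

```r
# hypothetical item residual correlation matrix (symmetric, unit diagonal)
rcor <- matrix(c( 1.00, -0.15,  0.25,
                 -0.15,  1.00, -0.10,
                  0.25, -0.10,  1.00), nrow = 3)
# average off-diagonal (item-pair) correlation
avg <- mean(rcor[lower.tri(rcor)])
# flag correlations 0.2 or more above the average
cutoff <- avg + 0.2
flagged <- which(rcor > cutoff & lower.tri(rcor), arr.ind = TRUE)
flagged
```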
Here we see item locations and their loadings on the first residual contrast. This figure can be helpful to identify clusters in data or multidimensionality.
The xlims setting changes the x-axis limits for the ICC plots. The default values usually make sense, and we mostly add this option to point out the possibility of doing so. You can also choose to only show the ICC plots for specific items.
Each response category for each item should have a curve that indicates it to be the most probable response at some point on the latent variable (the x-axis in the figure).
# increase fig-height in the chunk option above if you have many items
RItargeting(df, xlim = c(-5, 4)) # xlim defaults to c(-5,6) if you omit this option
This figure shows how well the items fit the respondents/persons. It is a sort of Wright Map that shows person locations and item threshold locations on the same logit scale.
The top part shows person location histogram, the middle part an inverted histogram of item threshold locations, and the bottom part shows individual item threshold locations. The histograms also show means and standard deviations.
Here the items are sorted on their average threshold location (black diamonds). 95% confidence intervals are shown around each item threshold location. For further details, see the caption text below the figure.
The numbers displayed in the plot can be disabled using the option numbers = FALSE.
Item 18 has issues with the second lowest category being disordered.
Item 15 shows low item fit.
Two item-pairs show residual correlations above the cutoff value:
15 and 16 (scared and afraid)
17 and 18 (ashamed and guilty)
Since item 15 also had low item fit, we will remove it. In the second pair, item 18 will be removed since it also had problems with disordered response categories.
As seen in the code above, I chose to create a copy of the dataframe with the removed items omitted. This can be useful if, at a later stage in the analysis, I want to be able to quickly “go back” and reinstate an item.
While no item shows problematic levels of DIF regarding item location, as shown by the table, there is an interesting pattern in the thresholds figure. The lowest threshold seems to be slightly lower for node 3 (Male) for all items.
The results do not require any action since the difference is small.
5.2 Age
The psychotree package uses a model-based recursive partitioning that is particularly useful when you have a continuous variable such as age in years and a large enough sample. It will test different ways to partition the age variable to determine potential group differences (Strobl et al., 2015b, 2021).
No interaction effect found for sex and age. The analysis only shows the previously identified DIF for sex.
5.5 LRT-based DIF example
Note
As of package version 0.1.16 there are four new functions for analyzing item location DIF. These all make use of the LRtest() function from the eRm package and, since version 0.1.31.0, they also correctly extract item locations/threshold locations. Results will not be identical to the results from the previous functions that use the psychotree package, since they make some different choices in estimation. I refer the curious to the respective package's documentation.
We’ll use the group variable as an example. First, we can simply run the test to get the overall result.
Values highlighted in red are above the chosen cutoff 0.5 logits. Background color brown and blue indicate the lowest and highest values among the DIF groups.
The item threshold table shows that the highest thresholds for items 13 and 17 differ more than 0.5 logits between groups. In this set of 8 items with 4 thresholds each, it is unlikely to result in problematic differences in estimated person scores.
6 Rasch analysis 3
While there were no significant issues with DIF for any item/subgroup combination, we need to address the previously identified problems:
Items 12 and 16 are a bit low in item fit ZSTD.
Items 16 and 19 have a residual correlation at about 0.25 above the average level.
We’ll remove item 19 since item 16 has better targeting.
There are several item thresholds that are very closely located, as shown in the item hierarchy figure. This is not ideal, since it will inflate reliability estimates.
However, we will not modify the response categories for this sample/simple analysis, we only note that this is not ideal.
The figure above shows the Test Information Function (TIF), which indicates the reliability of all items making up the test/scale (not the reliability of the sample).
The default cutoff value used in RItif() is TIF = 3.33, which corresponds to person separation index (PSI) = 0.7. PSI is similar to reliability coefficients such as omega and alpha, ranging from 0 to 1. You can change the TIF cutoff by using the option cutoff, for instance cutoff = 2.5 (TIF values range from 1 and up).
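The stated correspondence follows if we read TIF as the inverse of the squared standard error with person variance scaled to 1, giving PSI = 1 - 1/TIF. This is my inference from the numbers above, not a formula quoted from the package documentation:

```r
# convert between Test Information Function value and person separation index,
# assuming person variance scaled to 1 (my assumption, see lead-in)
tif_to_psi <- function(tif) 1 - 1 / tif
psi_to_tif <- function(psi) 1 / (1 - psi)

tif_to_psi(3.33) # ~0.70
psi_to_tif(0.7)  # ~3.33
```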
While 11.8% of respondents had a floor effect based on the raw sum scored data, the figure above shows us that 41.8% are located below the point where the items produce a PSI of 0.7 or higher. Again, note that this figure shows the reliability of the test/scale, not the sample. If you want to add the sample reliability use option samplePSI = TRUE. More details are available in the documentation ?RItif.
9 Person fit
We can also look at how the respondents fit the Rasch model with these items.
To allow others (and oneself) to use the item parameters estimated for estimation of person locations/thetas, we should make the item parameters available. The function will also write a csv-file with the item threshold locations. Estimations of person locations/thetas can be done with the thetaEst() function from the catR package.
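A sketch of estimating a single person's location from saved item parameters. Here `itemParameters` (the threshold matrix written by RIitemparams()) and `responses` (one person's vector of item responses) are assumed to exist already:

```r
library(catR)
# theta for one response vector, using the PCM and Warm's WL estimator;
# itemParameters and responses are assumed inputs, not defined here
theta <- thetaEst(it = itemParameters, x = responses,
                  model = "PCM", method = "WL", range = c(-4, 4))
```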
It can also be done by using the new (as of 2023-02-04) RIestThetas() function in this package (it does not yet work with dichotomous data), which applies thetaEst() across all the participants in your dataframe.
Item location is the average of the thresholds for each item.
We can get more detailed information, such as the relative item locations and highest/lowest thresholds, by using the RIitemparams() function with the option detail = all.
Item location is the average of the thresholds for each item. Relative item location is the difference between the item location and the average of the item locations for all items. Relative lowest threshold is the difference between the lowest threshold and the average of all item locations. Relative highest threshold is the difference between the highest threshold and the average of all item locations.
The parameters can also be output to a dataframe or a file, using the option output = "dataframe" or output = "file".
11 Ordinal sum score to interval score
This table shows the corresponding “raw” ordinal sum score values and logit scores, with standard errors for each logit value. Interval scores are estimated using WL based on a simulated dataset using the item parameters estimated from the input dataset. The choice of WL as default is due to the lower bias compared to ML estimation (Warm, 1989).
(An option will be added later to create this table based on only item parameters.)
Note that if your transformation table does not show the full range of ordinal sum scores, you can try to increase the option sdx from the default setting of 5. Also, if you find that the default range of logit scores is insufficient, it can be adjusted by changing the option score_range (default is c(-4,4)).
11.1 Ordinal/interval figure
The figure below can also be generated to illustrate the relationship between ordinal sum score and logit interval score. The errorbars default to show the standard error at each point, multiplied by 1.96.
Based on the Rasch analysis output of item parameters, we can estimate each individual's location or score (also known as "theta"). Similarly to the RIitemfitPCM() function, there is also a parallel processing version available, which makes use of 4 cores by default.
RIestThetas() by default uses WL estimation of a partial credit model and outputs a vector of person locations on the logit scale. If you do not supply a matrix of item (threshold) locations, the function will use eRm’s CML PCM to automatically calculate the item parameters based on the dataframe input.
Code
library(furrr) # for a parallel processing version of purrr::map_dbl
df2$personScores <- RIestThetas2(df2, cpu = 8)
RIestThetas() can also be used with a pre-specified item (threshold) location matrix. The choice of WL as default is due to the lower bias compared to ML estimation (Warm, 1989). Similarly to RIscoreSE() you can change the range of logit scores, using the option theta_range (default is c(-4,4)).
If you would like to use an existing item matrix, this code may be helpful:
Each individual has a standard error of measurement associated with their estimated location/score. This has not yet been implemented as a function in this package, but can be estimated using the following code with the semTheta() function from library(catR):
Code
df2$personSEM <- map_vec(df2$personScores, 
                         ~ semTheta(thEst = .x, 
                                    it = itemParameters, 
                                    model = "PCM", 
                                    method = "WL", 
                                    range = c(-4, 4)))
The map_vec() function allows us to apply a function to each element of a vector. As you can see, the first argument is the vector of estimated person locations; we then use the semTheta() function to calculate the standard error of measurement for each individual. The it argument is the item parameter matrix, and the range argument is the range of logit scores. The range should match the range set when estimating the person locations with RIestThetas(), and -4 to 4 is the default setting for both functions.
12 Figure design
Most of the figures created by the functions can be styled (colors, fonts, etc) by adding theme settings to them. You can use the standard ggplot function theme() and related theme-functions. As usual it is possible to “stack” theme functions, as seen in the example below.
You can also change coloring, axis limits/breaks, etc, just by adding ggplot options with a + sign.
A custom theme function, theme_rise(), is included in the RISEkbmRasch package. It might be easier to use if you are not familiar with theme().
For instance, you might like to change the font to “Lato” for the item hierarchy figure, and make the background transparent.
Code
# first we need to remove the `personScores` and `personSEM` variables from the
# `df2` dataframe, to ensure that `df2` contains only item data before using it
# with the item hierarchy function.
df2$personScores <- NULL
df2$personSEM <- NULL

RIitemHierarchy(df2) +
  theme_minimal() + # first apply the minimal theme to make the background transparent
  theme_rise(fontfamily = "Lato") # then apply theme_rise, which simplifies making changes to all plot elements
As of package version 0.1.30.0, the RItargeting() function allows more flexibility in styling too, by having an option to return a list object with the three separate plots. See the NEWS file for more details.
In order to change font for text inside plots you will need to add an additional line of code.
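The additional line referred to is ggplot2's update_geom_defaults(); a sketch assuming the font "Lato" is installed on your system:

```r
library(ggplot2)
# change the default font for geom_text() for the rest of the session
update_geom_defaults("text", list(family = "Lato"))
# for functions that use ggrepel, such as RIloadLoc(), use instead:
# update_geom_defaults("text_repel", list(family = "Lato"))
```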
Please note that this updates the default settings for geom_text() for the whole session. Also, some functions, such as RIloadLoc() make use of geom_text_repel(), for which you would need to change the function above from “text” to “text_repel”.
A simple way to only change font family and font size would be to use theme_minimal(base_family = "Calibri", base_size = 14). Please see the reference page for default ggplot themes for alternatives to theme_minimal().
13 Software used
The grateful package is a nice way to give credit to the packages used in making the analysis. The package can create both a bibliography file and a table object, which is handy for automatically creating a reference list based on the packages used (or at least explicitly loaded).
Code
library(grateful)
pkgs <- cite_packages(cite.tidyverse = TRUE, 
                      output = "table", 
                      bib.file = "grateful-refs.bib", 
                      include.RStudio = TRUE, 
                      out.dir = getwd())
# If kbl() is used to generate this table, the references will not be added to the Reference list.
formattable(pkgs, 
            table.attr = 'class=\"table table-striped\" style="font-size: 13px; font-family: Lato; width: 80%"')
Thanks to my colleagues at RISE for providing feedback and testing the package on Windows and MacOS platforms. Also, thanks to Mike Linacre and Jeanette Melin for providing useful feedback to improve this vignette.
15 References
Allaire, J., Xie, Y., Dervieux, C., McPherson, J., Luraschi, J., Ushey, K., Atkins, A., Wickham, H., Cheng, J., Chang, W., & Iannone, R. (2023). rmarkdown: Dynamic documents for R. https://github.com/rstudio/rmarkdown
Anvari, F., Efendić, E., Olsen, J., Arslan, R. C., Elson, M., & Schneider, I. K. (2022). Bias in Self-Reports: An Initial Elevation Phenomenon. Social Psychological and Personality Science, 19485506221129160. https://doi.org/10.1177/19485506221129160
Chalmers, R. P. (2012). mirt: A multidimensional item response theory package for the R environment. Journal of Statistical Software, 48(6), 1–29. https://doi.org/10.18637/jss.v048.i06
Christensen, K. B., Makransky, G., & Horton, M. (2017). Critical Values for Yen’s Q3: Identification of Local Dependence in the Rasch Model Using Residual Correlations. Applied Psychological Measurement, 41(3), 178–194. https://doi.org/10.1177/0146621616677520
Debelak, R., & Koller, I. (2019). Testing the local independence assumption of the Rasch model with Q3-based nonparametric model tests. Applied Psychological Measurement, 44. https://doi.org/10.1177/0146621619835501
Hagell, P., & Westergren, A. (2016). Sample size and statistical conclusions from tests of fit to the Rasch model according to the Rasch Unidimensional Measurement Model (RUMM) program in health outcome measurement. Journal of Applied Measurement, 17(4), 416–431.
Hatzinger, R., & Rusch, T. (2009). IRT models with relaxed assumptions in eRm: A manual-like instruction. Psychology Science Quarterly, 51.
Johansson, M., Preuter, M., Karlsson, S., Möllerberg, M.-L., Svensson, H., & Melin, J. (2023). Valid and reliable? Basic and expanded recommendations for psychometric reporting and quality assessment. https://doi.org/10.31219/osf.io/3htzc
Koller, I., Maier, M., & Hatzinger, R. (2015). An empirical power analysis of quasi-exact tests for the Rasch model: Measurement invariance in small samples. Methodology, 11. https://doi.org/10.1027/1614-2241/a000090
Komboz, B., Zeileis, A., & Strobl, C. (2018). Tree-based global model tests for polytomous Rasch models. Educational and Psychological Measurement, 78(1), 128–166. https://doi.org/10.1177/0013164416664394
Mair, P., & Hatzinger, R. (2007a). CML based estimation of extended Rasch models with the eRm package in R. Psychology Science, 49. https://doi.org/10.18637/jss.v020.i09
Mair, P., & Hatzinger, R. (2007b). Extended Rasch modeling: The eRm package for the application of IRT models in R. Journal of Statistical Software, 20. https://doi.org/10.18637/jss.v020.i09
Rusch, T., Maier, M., & Hatzinger, R. (2013). Linear logistic models with relaxed assumptions in R. In B. Lausen, D. van den Poel, & A. Ultsch (Eds.), Algorithms from and for nature and life. Springer. https://doi.org/10.1007/978-3-319-00035-0_34
Strobl, C., Kopf, J., & Zeileis, A. (2015a). Rasch trees: A new method for detecting differential item functioning in the Rasch model. Psychometrika, 80(2), 289–316. https://doi.org/10.1007/s11336-013-9388-3
Strobl, C., Kopf, J., & Zeileis, A. (2015b). Rasch Trees: A New Method for Detecting Differential Item Functioning in the Rasch Model. Psychometrika, 80(2), 289–316. https://doi.org/10.1007/s11336-013-9388-3
Strobl, C., Schneider, L., Kopf, J., & Zeileis, A. (2021). Using the raschtree function for detecting differential item functioning in the Rasch model. 12.
Strobl, C., Wickelmaier, F., & Zeileis, A. (2011). Accounting for individual differences in Bradley-Terry models by means of recursive partitioning. Journal of Educational and Behavioral Statistics, 36(2), 135–153. https://doi.org/10.3102/1076998609359791
Trepte, S., & Verbeet, M. (Eds.). (2010). Allgemeinbildung in Deutschland – erkenntnisse aus dem SPIEGELStudentenpisa-Test. VS Verlag.
Warm, T. A. (1989). Weighted likelihood estimation of ability in item response theory. Psychometrika, 54(3), 427–450. https://doi.org/10.1007/BF02294627
Wickelmaier, F., & Zeileis, A. (2018). Using recursive partitioning to account for parameter heterogeneity in multinomial processing tree models. Behavior Research Methods, 50(3), 1217–1233. https://doi.org/10.3758/s13428-017-0937-z
Wickham, H. (2007). Reshaping data with the reshape package. Journal of Statistical Software, 21(12). https://www.jstatsoft.org/v21/i12/
Wickham, H., Averick, M., Bryan, J., Chang, W., McGowan, L. D., François, R., Grolemund, G., Hayes, A., Henry, L., Hester, J., Kuhn, M., Pedersen, T. L., Miller, E., Bache, S. M., Müller, K., Ooms, J., Robinson, D., Seidel, D. P., Spinu, V., … Yutani, H. (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686. https://doi.org/10.21105/joss.01686
William Revelle. (2023). psych: Procedures for psychological, psychometric, and personality research. Northwestern University. https://CRAN.R-project.org/package=psych
Xie, Y. (2014). knitr: A comprehensive tool for reproducible research in R. In V. Stodden, F. Leisch, & R. D. Peng (Eds.), Implementing reproducible computational research. Chapman; Hall/CRC.
Xie, Y. (2015). Dynamic documents with R and knitr (2nd ed.). Chapman; Hall/CRC. https://yihui.org/knitr/
Xie, Y. (2023). knitr: A general-purpose package for dynamic report generation in R. https://yihui.org/knitr/
---title: "RISEkbmRasch vignette"subtitle: "An R package for Rasch analysis"author: name: 'Magnus Johansson' affiliation: 'RISE Research Institutes of Sweden' affiliation-url: 'https://www.ri.se/en/what-we-do/expertises/category-based-measurements' orcid: '0000-0003-1669-592X'date: last-modifiedgoogle-scholar: truecitation: trueexecute: cache: true warning: false message: falsebibliography:- references.bib- grateful-refs.bibcsl: apa.csleditor_options: chunk_output_type: console---This is an introduction to using the [RISEkbmRasch R package](https://github.com/pgmj/RISEkbmRasch). A changelog for package updates is available [here](https://github.com/pgmj/RISEkbmRasch/blob/main/NEWS.md).Details on installation are available at the [package GitHub page](https://github.com/pgmj/RISEkbmRasch). This vignette will walk through a sample analysis using an open dataset with polytomous questionnaire data. This will include some data wrangling to structure the item data and itemlabels, then provide examples of the different functions. The full source code of this document can be found either [in this repository](https://github.com/pgmj/pgmj.github.io/blob/main/raschrvignette/RaschRvign.qmd) or by clicking on **\</\> CODE** at the top beside the table of contents. You should be able to use the source code "as is" and reproduce this document locally, as long as you have the required packages installed. This page and this website are built using the open source publishing tool [Quarto](https://www.quarto.org).One of the aims with this package is to simplify psychometric analysis to shed light on the measurement properties of a scale, questionnaire or test. 
In a paper recently made available as a preprint [@johansson], our [research group](https://www.ri.se/en/what-we-do/projects/center-for-categorically-based-measurements) proposes that the basic aspects of a psychometric analysis should include information about:

- Unidimensionality
- Response categories
- Invariance
- Targeting
- Measurement uncertainties (reliability)

We'll look at several ways to investigate these measurement properties, using Rasch Measurement Theory. There are also functions in the package less directly related to the criteria above, which will be shown in this vignette as well.

Please note that this is a sample analysis to showcase the R package. It is not intended as a "best practice" psychometric analysis example.

You can skip ahead to the Rasch analysis part in @sec-rasch if you are eager to look at the package output :)

If you are new to Rasch Measurement Theory, you may find this intro presentation useful: <https://pgmj.github.io/RaschIRTlecture/slides.html>

## Getting started

Since the package is intended for use with Quarto, this vignette has also been created with Quarto. A "template" .qmd file [is available](https://github.com/pgmj/RISEraschTemplate/blob/main/analysis.qmd) that can be useful to have handy for copy & paste when running a new analysis. You can also download a complete copy of the Quarto/R code used to produce this document [here](https://github.com/pgmj/pgmj.github.io/blob/main/raschrvignette/RaschRvign.qmd).

Loading the `RISEkbmRasch` package should automatically load all the packages it depends on.
However, it could be desirable to explicitly load all packages used, to simplify the automatic creation of citations for them using the `grateful` package (see @sec-grateful).

```{r}
library(RISEkbmRasch) # devtools::install_github("pgmj/RISEkbmRasch")
library(grateful)
library(ggrepel)
library(car)
library(kableExtra)
library(readxl)
library(tidyverse)
library(eRm)
library(mirt)
library(psych)
library(ggplot2)
library(psychotree)
library(matrixStats)
library(reshape)
library(knitr)
library(patchwork)
library(formattable) 
library(glue)
library(foreach)
```

::: {.callout-note icon="true"}
## Note
Quarto automatically adds links to R packages and functions throughout this document. However, this feature only works properly for packages available on [CRAN](https://cran.r-project.org/). Since the `RISEkbmRasch` package is not on CRAN, the links related to functions starting with **RI** will not work.
:::

### Loading data

We will use data from a recent paper investigating the "initial elevation effect" [@anvari2022], and focus on the 10 negative items from the PANAS. The data is available at the OSF website.

```{r}
df.all <- read_csv("https://osf.io/download/6fbr5/")
# if you have issues with the link, please try downloading manually using the same URL as above
# and read the file from your local drive.

# subset items and demographic variables
df <- df.all %>% 
  select(starts_with("PANASD2_1"),
         starts_with("PANASD2_20"), 
         age, Sex, Group) %>% 
  select(!PANASD2_10_Active) %>% 
  select(!PANASD2_1_Attentive)
```

The `glimpse()` function provides a quick overview of our dataframe.

```{r}
glimpse(df)
```

We have `r nrow(df)` rows, i.e., respondents. All variables except Sex and Group are of class `dbl`, which means they are numeric and can have decimals. Integer (numeric with no decimals) would also be fine for our purposes.
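If item variables are instead imported as class character, they need to be recoded to numeric before analysis. A minimal base R sketch, using a small made-up dataframe (`df_chr` is hypothetical and not part of this analysis):

``` r
# Hypothetical example: item responses imported as character strings
df_chr <- data.frame(q1 = c("1", "3", "5"),
                     q2 = c("2", "2", "4"),
                     stringsAsFactors = FALSE)

# convert every column to numeric
df_num <- as.data.frame(lapply(df_chr, as.numeric))
```

Always inspect the result (e.g. with `glimpse()`) afterwards, since any non-numeric strings become `NA` with a warning.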
The two demographic variables currently of class `chr` (character) will need to be converted to factors (`fct`), and we will do that later on. (If you import a dataset where item variables are of class character, you will need to recode them to numeric.)

### Itemlabels

Then we set up the itemlabels dataframe. This could also be done using the free [LibreOffice Calc](https://www.libreoffice.org/download/download-libreoffice/) or MS Excel. Just make sure the file has the same structure, with two variables named `itemnr` and `item` that contain the item variable names and item descriptions. The item variable names have to match the variable names in the item dataframe.

```{r}
itemlabels <- df %>% 
  select(starts_with("PAN")) %>% 
  names() %>% 
  as_tibble() %>% 
  separate(value, c(NA, "item"), sep = "_[0-9][0-9]_") %>% 
  mutate(itemnr = paste0("PANAS_", c(11:20)), .before = "item")
```

The `itemlabels` dataframe looks like this.

```{r}
itemlabels
```

### Demographics

Variables for invariance tests such as Differential Item Functioning (DIF) need to be separated into vectors (ideally as factors with specified levels and labels) with the same length as the number of rows in the dataset.
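As a minimal, hypothetical base R sketch of a DIF factor with explicit levels and labels (the coding and variable names here are made up, not part of this dataset):

``` r
# Hypothetical numeric coding: 1 = Female, 2 = Male
sex_coded <- c(1, 2, 2, 1, 1)

dif_example <- factor(sex_coded,
                      levels = c(1, 2),
                      labels = c("Female", "Male"))
```

Specifying `levels` and `labels` explicitly guards against R silently ordering levels alphabetically.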
This means that any kind of removal of respondents/rows with missing data needs to be done before separating the DIF variables.

We need to check how the `Sex` variable has been coded and which responses are present in the data.

```{r}
table(df$Sex)
```

Since there are only 5 respondents using labels outside of Female/Male (too few for meaningful statistical analysis), we will remove them to have a complete dataset for all variables in this example.

```{r}
df <- df %>% 
  filter(Sex %in% c("Female", "Male"))
```

Let's make the variable a factor (instead of class "character") and put it in a vector separate from the item dataframe.

```{r}
dif.sex <- factor(df$Sex)
```

And remove our DIF demographic variable from the item dataset.

```{r}
df$Sex <- NULL
```

We can now make use of a very simple function included in this package!

```{r}
RIdemographics(dif.sex, "Sex")
```

Let's move on to the age variable.

```{r}
glimpse(df$age)
```

Sometimes age is provided in categories, but here we have a numeric variable with age in years. Let's have a quick look at the age distribution using a histogram, and calculate the mean, SD, and range.

```{r}
### simpler version of the ggplot below
# hist(df$age, col = "#009ca6")
# 
# df %>% 
#   summarise(Mean = round(mean(age, na.rm = T),1),
#             StDev = round(sd(age, na.rm = T),1))

ggplot(df) +
  geom_histogram(aes(x = age), 
                 fill = "#009ca6",
                 col = "black") +
  # add the average as a vertical line
  geom_vline(xintercept = mean(df$age), 
             linewidth = 1.5,
             linetype = 2,
             col = "orange") +
  # add a light grey field indicating the standard deviation
  annotate("rect", ymin = 0, ymax = Inf, 
           xmin = (mean(df$age) - sd(df$age)), 
           xmax = (mean(df$age) + sd(df$age)), 
           alpha = .2) +
  labs(title = "",
       x = "Age in years",
       y = "Number of respondents",
       caption = glue("Note. Mean age is {round(mean(df$age, na.rm = T),1)} years with a standard deviation of {round(sd(df$age, na.rm = T),1)}.
                      Age range is {min(df$age)} to {max(df$age)}.")
       ) +
  theme(plot.caption = element_text(hjust = 0, face = "italic"))
```

Age also needs to be a separate vector, and removed from the item dataframe.

```{r}
dif.age <- df$age
df$age <- NULL
```

There is also a grouping variable, which needs to be converted to a factor.

```{r}
dif.group <- factor(df$Group)
df$Group <- NULL
RIdemographics(dif.group, "Group")
```

With only item data remaining in the dataframe, we can easily rename the items in the item dataframe. These names match the `itemlabels` variable `itemnr`.

```{r}
names(df) <- itemlabels$itemnr
```

Now we are all set for the psychometric analysis!

## Descriptives

Let's familiarize ourselves with the data before diving into the analysis.

### Missing data

First, we visualize the proportion of missing data on item level.

```{r}
RImissing(df)
```

No missing data in this dataset. If we had missing data, we could also use `RImissingP()` to look at which respondents have missing data and how much.

### Overall responses

This provides us with an overall picture of the data distribution. As a bonus, any oddities/mistakes in recoding the item data from categories to numbers will be clearly visible.

```{r}
RIallresp(df)
```

Most R packages for Rasch analysis require the lowest response category to be zero, which makes it necessary for us to recode our data, from the range of 1-5 to 0-4.

```{r}
df <- df %>% 
  mutate(across(everything(), ~ car::recode(.x, "1=0;2=1;3=2;4=3;5=4", as.factor = F)))

# always check that your recoding worked as intended.
RIallresp(df)
```

#### Floor/ceiling effects

Now, we can also look at the raw distribution of sum scores.
The `RIrawdist()` function is a bit crude, since it requires responses in all response categories to accurately calculate max and min scores.

```{r}
RIrawdist(df)
```

We can see a floor effect, with 11.8% of participants responding in the lowest category for all items.

#### Guttman structure

While not really necessary, it could be interesting to see whether the response patterns follow a Guttman-like structure. Items and persons are sorted based on lower-\>higher responses, and we should see the color move from yellow in the lower left corner to blue in the upper right corner.

```{r}
RIheatmap(df) +
  theme(axis.text.x = element_blank())
```

In this data, we see the floor effect on the left, with 11.8% of respondents all yellow, and a rather weak Guttman structure. This could also be due to a low variation in item locations/difficulties. Since we have a very large sample, I added a `theme()` option to remove the x-axis text, which would otherwise just be a blur of the `r nrow(df)` respondent row numbers. Each thin vertical slice in the figure is one respondent.

### Item level descriptives

There are many ways to look at the item level data, and we'll get them all together in the tab-panel below. The `RItileplot()` is probably most informative, since it provides the number of responses in each response category for each item. It is usually recommended to have at least \~10 responses in each category for psychometric analysis, no matter which methodology is used.

Kudos to [Solomon Kurz](https://solomonkurz.netlify.app/blog/2021-05-11-yes-you-can-fit-an-exploratory-factor-analysis-with-lavaan/) for providing the idea and code on which the tile plot function is built!

Most people will be familiar with the barplot, and this is probably the most intuitive way to understand the response distribution within each item.
However, if there are many items it will take a while to review, and it does not provide the same overview as a tileplot or stacked bars.

```{r}
#| column: margin
#| code-fold: true
#| echo: fenced

# This code chunk creates a small table in the margin beside the panel-tabset output below,
# showing all items currently in the df dataframe.
# The Quarto code chunk option "#| column: margin" is necessary for the layout to work as intended.
RIlistItemsMargin(df, fontsize = 13)
```

::: column-page-left
::: panel-tabset
#### Tile plot

```{r}
RItileplot(df)
```

While response patterns are skewed for all items, there are more than 10 responses in each category for all items, which is helpful for the analysis.

#### Stacked bars

```{r}
RIbarstack(df) +
  theme_minimal() + # theming is optional, see section 11 for more on this
  theme_rise() 
```

#### Barplots

```{r}
#| layout-ncol: 2
RIbarplot(df)
```

#### Expected Value Curves

```{r}
#| layout-ncol: 2
library(TAM)
# run TAM Rasch Partial Credit Model on our data, which uses Marginal Maximum Likelihood estimation
tam1 <- tam(as.matrix(df), irtmodel = "PCM", verbose = FALSE)
plot(tam1) # create ICC plots
```

The expected value curves are made using the [TAM](https://cran.r-project.org/web/packages/TAM/index.html) package, which uses Marginal Maximum Likelihood (MML) estimation. It is a good way to check whether any of your items may need reversed response categories, amongst other things.
:::
:::

## Rasch analysis 1 {#sec-rasch}

The eRm package and Conditional Maximum Likelihood (CML) estimation will be used primarily, with the Partial Credit Model since this is polytomous data.

This is also where the [five basic psychometric aspects](https://doi.org/10.31219/osf.io/3htzc) are good to recall:

- Unidimensionality
- Response categories
- Invariance
- Targeting
- Measurement uncertainties (reliability)

We will begin by looking at unidimensionality, response categories, and targeting in parallel below.
For unidimensionality, we are mostly interested in item fit and residual correlations, as well as PCA of residuals and loadings on the first residual contrast. At the same time, disordered response categories can influence item fit, and targeting can be useful to consider if it becomes necessary to remove items due to residual correlations.

When unidimensionality and response categories are found to work adequately, we will move on to invariance testing. And when/if invariance looks good, we can investigate reliability/measurement uncertainties.

In the tabset-panel below, each tab will have some explanatory text.

```{r}
#| column: margin
#| echo: false
RIlistItemsMargin(df, fontsize = 13)
```

::: panel-tabset
### Item fit

```{r}
RIitemfitPCM2(df, 300, 32, cpu = 8)
```

Since we have a sample size over 500, ZSTD item fit values would be inflated if we used the whole sample. To better estimate accurate ZSTD values, the `RIitemfitPCM()` function allows for multiple subsampling. It is recommended to use a sample size between 250 and 500 [@hagell2016]. We will set the sample size to 300 and run 32 subsamples. If you just want to test things out, I highly recommend lowering the number 32 to 4 to enable faster rendering.

For faster processing, `RIitemfitPCM2()` enables parallel processing with multiple CPUs/cores. You can check how many available cores you have by running `parallel::detectCores()`. It is recommended not to use all of them (leave 1 or 2 free). There may be issues with multicore parallel processing, especially when there are few responses in some response categories. If you run into errors, try increasing the sample size, or just use the single cpu/core function `RIitemfitPCM()` instead.

"Outfit" refers to item fit when person location is relatively far away from the item location, while "infit" provides estimates for when person and item locations are close together.
MSQ should be close to 1, with lower and upper cutoffs set to 0.7 and 1.3 as default values, while ZSTD should be around 0, with default cutoffs set to +/- 2.0. Infit is usually more important. You can change the cutoff values by using options in the function; see `?RIitemfitPCM` for details.

A low item fit indicates that responses are too predictable and provide little information. A high item fit can indicate several things, most often multidimensionality or, for questionnaires, a question that is difficult to interpret. This could for instance be a question that asks about two things at the same time.

### PCA

```{r}
#| tbl-cap: "PCA of Rasch model residuals"
RIpcmPCA(df)
```

The first eigenvalue should be below 2.0 to support unidimensionality.

### Residual correlations

```{r}
RIresidcorr(df, cutoff = 0.2)
```

The matrix above shows item-pair correlations of item residuals, with highlights in red showing correlations 0.2 or more above the average item-pair correlation (for all item-pairs) [@christensen2017]. Rasch model residual correlations are calculated using the [mirt](https://cran.r-project.org/web/packages/mirt/index.html) package. Again, you can set the cutoff value you desire in the function call, which will affect the values highlighted in the correlation matrix table and the caption text.

### 1st contrast loadings

```{r}
RIloadLoc(df)
```

Here we see item locations and their loadings on the first residual contrast. This figure can be helpful to identify clusters in data or multidimensionality.

### Analysis of response categories

The `xlims` setting changes the x-axis limits for the ICC plots. The default values usually make sense, and we mostly add this option to point out the possibility of doing so.
You can also choose to only show the ICC plots for specific items.

```{r}
#| layout-ncol: 2
RIitemCats(df, xlims = c(-5,5))
```

Each response category for each item should have a curve that indicates it to be the most probable response at some point on the latent variable (the x axis in the figure).

### Response categories MIRT

For a more compact figure.

```{r}
mirt(df, model = 1, itemtype = 'Rasch', verbose = FALSE) %>% 
  plot(type = "trace", as.table = TRUE, theta_lim = c(-5,5))
```

### Targeting

```{r}
#| fig-height: 7
# increase fig-height in the chunk option above if you have many items
RItargeting(df, xlim = c(-5,4)) # xlim defaults to c(-5,6) if you omit this option
```

This figure shows how well the items fit the respondents/persons. It is a sort of [Wright Map](https://www.rasch.org/rmt/rmt253b.htm) that shows person locations and item threshold locations on the same logit scale.

The top part shows a person location histogram, the middle part an inverted histogram of item threshold locations, and the bottom part shows individual item threshold locations. The histograms also show means and standard deviations.

### Item hierarchy

Here the items are sorted on their average threshold location (black diamonds). 95% confidence intervals are shown around each item threshold location. For further details, see the caption text below the figure.

The numbers displayed in the plot can be disabled using the option `numbers = FALSE`.

```{r}
#| fig-height: 6
RIitemHierarchy(df)
```
:::

### Analysis 1 comments

Item 18 has issues with the second lowest category being disordered.

Item 15 shows low item fit.

Two item-pairs show residual correlations above the cutoff value:

- 15 and 16 (scared and afraid)
- 17 and 18 (ashamed and guilty)

Since item 15 also had low item fit, we will remove it.
In the second pair, item 18 will be removed, since it also had problems with disordered response categories.

```{r}
removed.items <- c("PANAS_15", "PANAS_18")

df2 <- df %>% 
  select(!any_of(removed.items))
```

As seen in the code above, I chose to create a copy of the dataframe with the removed items omitted. This can be useful if, at a later stage in the analysis, I want to be able to quickly "go back" and reinstate an item.

## Rasch analysis 2

With items 15 and 18 removed.

```{r}
#| column: margin
#| echo: false
RIlistItemsMargin(df2, fontsize = 13)
```

::: panel-tabset
### Item fit {.smaller}

```{r}
RIitemfitPCM2(df2, 300, 32, cpu = 8)
```

### PCA

```{r}
#| tbl-cap: "PCA of Rasch model residuals"
RIpcmPCA(df2)
```

### Residual correlations

```{r}
RIresidcorr(df2, cutoff = 0.2)
```

### 1st contrast loadings

```{r}
RIloadLoc(df2)
```

### Targeting

```{r}
#| fig-height: 5
RItargeting(df2, xlim = c(-4,4), bins = 45)
```

### Item hierarchy

```{r}
#| fig-height: 5
RIitemHierarchy(df2)
```
:::

### Analysis 2 comments

Items 12 and 16 are a bit low in item fit ZSTD.

Items 16 and 19 have a residual correlation at about 0.25 above the average level.

Let's look at DIF before taking action upon this information.

## DIF - differential item functioning

We'll be looking at how/if item (threshold) locations are stable between demographic subgroups.

### Sex

```{r}
#| column: margin
#| echo: false
RIlistItemsMargin(df2, fontsize = 13)
```

::: panel-tabset
#### Table

```{r}
#| fig-height: 3
RIdifTable(df2, dif.sex)
```

#### Figure items

```{r}
RIdifFigure(df2, dif.sex)
```

#### Figure thresholds

```{r}
RIdifFigThresh(df2, dif.sex)
```
:::

While no item shows problematic levels of DIF regarding item location, as shown by the table, there is an interesting pattern in the thresholds figure.
The lowest threshold seems to be slightly lower for node 3 (Male) for all items.

The results do not require any action, since the difference is small.

### Age

The `psychotree` package uses model-based recursive partitioning, which is particularly useful when you have a continuous variable such as age in years and a large enough sample. It will test different ways to partition the age variable to determine potential group differences [@strobl2015; @strobl2021].

```{r}
RIdifTable(df2, dif.age)
```

No DIF found for age.

### Group

```{r}
RIdifTable(df2, dif.group)
```

And no DIF for group.

### Sex and age

The `psychotree` package also allows for DIF interaction analysis with multiple DIF variables. We can use `RIdifTable2()` to input two DIF variables.

```{r}
RIdifTable2(df2, dif.sex, dif.age)
```

No interaction effect found for sex and age. The analysis only shows the previously identified DIF for sex.

### LRT-based DIF example {#sec-diflrt}

::: {.callout-note icon="true"}
As of package version 0.1.16, there are four new functions for analyzing item location DIF. These all make use of the function `LRtest()` from the `eRm` package. And, since version 0.1.31.0, they also correctly extract item locations/threshold locations. Results will not be identical to the results from the previous functions that use the `psychotree` package, since the packages make some different choices in estimation. I refer the curious to the respective package's documentation.
:::

We'll use the group variable as an example.
First, we can simply run the test to get the overall result.

```{r}
erm.out <- PCM(df2)
LRtest(erm.out, splitcr = dif.group)
```

```{r}
#| column: margin
#| echo: false
RIlistItemsMargin(df2, fontsize = 13)
```

::: panel-tabset
#### Item location table

```{r}
RIdifTableLR(df2, dif.group)
```

#### Item location figure

```{r}
#| fig-height: 7
RIdifFigureLR(df2, dif.group) + theme_rise()
```

#### Item threshold table

```{r}
RIdifThreshTblLR(df2, dif.group)
```

#### Item threshold figure

```{r}
#| fig-height: 7
RIdifThreshFigLR(df2, dif.group) + theme_rise()
```
:::

The item threshold table shows that the highest thresholds for items 13 and 17 differ more than 0.5 logits between groups. In this set of 8 items with 4 thresholds each, this is unlikely to result in problematic differences in estimated person scores.

## Rasch analysis 3

While there were no significant issues with DIF for any item/subgroup combination, we need to address the previously identified problems:

Items 12 and 16 are a bit low in item fit ZSTD.

Items 16 and 19 have a residual correlation at about 0.25 above the average level.

We'll remove item 19, since item 16 has better targeting.

```{r}
removed.items <- c(removed.items, "PANAS_19")

df2 <- df2 %>% 
  select(!any_of(removed.items))
```

```{r}
#| column: margin
#| echo: false
RIlistItemsMargin(df2, fontsize = 13)
```

::: panel-tabset
### Item fit

```{r}
RIitemfitPCM2(df2, 350, 32, 8)
```

### Residual correlations

```{r}
RIresidcorr(df2, cutoff = 0.2)
```

### Targeting

```{r}
#| fig-height: 5
RItargeting(df2, bins = 45)
```

### Item hierarchy

```{r}
#| fig-height: 5
RIitemHierarchy(df2)
```
:::

### Analysis 3 comments

No problematic residual correlations, but item 12 is a bit low in item fit.

## Rasch analysis 4

```{r}
removed.items <- c(removed.items, "PANAS_12")

df2 <- df2 %>% 
  select(!any_of(removed.items))
```

```{r}
#| column: margin
#| echo: false
RIlistItemsMargin(df2, fontsize = 13)
```

::: panel-tabset
### Item fit

```{r}
RIitemfitPCM2(df2, 350, 32, 8)
```

### Residual correlations

```{r}
RIresidcorr(df2, cutoff = 0.2)
```

### Targeting

```{r}
#| fig-height: 5
RItargeting(df2)
```

### Item hierarchy

```{r}
#| fig-height: 5
RIitemHierarchy(df2)
```
:::

### Analysis 4 comments

There are several item thresholds that are very closely located, as shown in the item hierarchy figure. This is not ideal, since it will inflate reliability estimates. However, we will not modify the response categories for this sample analysis; we only note that this is not ideal.

## Reliability

```{r}
#| fig-height: 6
RItif(df2)
```

The figure above shows the Test Information Function (TIF), which indicates the reliability of all items making up the test/scale (not the reliability of the sample).

The default cutoff value used in `RItif()` is TIF = 3.33, which corresponds to a person separation index (PSI) of 0.7. PSI is similar to reliability coefficients such as omega and alpha, ranging from 0 to 1. You can change the TIF cutoff by using the option `cutoff`, for instance `cutoff = 2.5` (TIF values range from 1 and up).

While 11.8% of respondents had a floor effect based on the raw sum scored data, the figure above shows us that 41.8% are located below the point where the items produce a PSI of 0.7 or higher. Again, note that this figure shows the reliability of the test/scale, not the sample. If you want to add the sample reliability, use the option `samplePSI = TRUE`. More details are available in the documentation: `?RItif`.

## Person fit

We can also look at how the respondents fit the Rasch model with these items.

```{r}
RIpfit(df2)
```

## Item parameters

To allow others (and oneself) to use the item parameters estimated here for estimation of person locations/thetas, we should make the item parameters available. The function will also write a csv file with the item threshold locations.
Estimation of person locations/thetas can be done with the `thetaEst()` function from the `catR` package. It can also be done by using the new (as of 2023-02-04) `RIestThetas()` function in this package (which does not yet work with dichotomous data), which uses `thetaEst()` across all the participants in your dataframe.

First, we'll output the parameters into a table.

```{r}
RIitemparams(df2)
```

We can get more detailed information, such as the relative item locations and highest/lowest thresholds, by using the `RIitemparams()` function with the option `detail = "all"`.

```{r}
RIitemparams(df2, detail = "all")
```

The parameters can also be output to a dataframe or a file, using the option `output = "dataframe"` or `output = "file"`.

## Ordinal sum score to interval score

This table shows the corresponding "raw" ordinal sum score values and logit scores, with standard errors for each logit value. Interval scores are estimated using WL, based on a simulated dataset using the item parameters estimated from the input dataset. The choice of WL as the default is due to its lower bias compared to ML estimation [@warm1989]. (An option will be added *later* to create this table based on only item parameters.)

```{r}
RIscoreSE(df2)
```

Note that if your transformation table does not show the full range of ordinal sum scores, you can try to increase the option `sdx` from its default setting of `5`. Also, if you find that the default range of logit scores is insufficient, it can be adjusted by changing the option `score_range` (default is `c(-4,4)`).

### Ordinal/interval figure

The figure below can also be generated to illustrate the relationship between ordinal sum score and logit interval score. The errorbars default to showing the standard error at each point, multiplied by 1.96.

```{r}
RIscoreSE(df2, output = "figure")
```

### Estimating interval level person scores

Based on the Rasch analysis output of item parameters, we can estimate each individual's location or score (also known as "theta").
Similarly to the `RIitemfitPCM()` function, there is also a parallel processing version available, which makes use of 4 cores by default.

`RIestThetas()` by default uses WL estimation of a partial credit model and outputs a vector of person locations on the logit scale. If you do not supply a matrix of item (threshold) locations, the function will use eRm's CML PCM to automatically calculate the item parameters based on the dataframe input.

```{r}
library(furrr) # for a parallel processing version of purrr::map_dbl
df2$personScores <- RIestThetas2(df2, cpu = 8)
```

`RIestThetas()` can also be used with a pre-specified item (threshold) location matrix. The choice of WL as the default is due to its lower bias compared to ML estimation [@warm1989]. Similarly to `RIscoreSE()`, you can change the range of logit scores using the option `theta_range` (default is `c(-4,4)`).

If you would like to use an existing item matrix, this code may be helpful:

```{r}
itemParameters <- read_csv("itemParameters.csv") %>% 
  as.matrix()
itemParameters
```

As you can see, this is a matrix object (not a dataframe), with each item as a row and the threshold locations as columns.

Finally, we'll look at the distribution of person scores using a simple histogram.

```{r}
hist(df2$personScores, col = "#009ca6")
```

#### Estimating individual measurement error

Each individual has a standard error of measurement associated with their estimated location/score. This has not yet been implemented as a function in this package, but it can be estimated using the following code with the `semTheta()` function from `library(catR)`:

```{r}
df2$personSEM <- map_vec(df2$personScores, 
                         ~ semTheta(thEst = .x,
                                    it = itemParameters,
                                    model = "PCM",
                                    method = "WL",
                                    range = c(-4, 4)))
```

The function `map_vec()` allows us to apply a function to all elements of a vector.
As you can see, the first argument is the vector of estimated person locations; we then use the `semTheta()` function to calculate the standard error of measurement for each individual. The `it` argument is the item parameter matrix, and the `range` argument is the range of logit scores. The range should match the range set when estimating the person locations with `RIestThetas()`, and -4 to 4 is the default setting for both functions.

## Figure design

Most of the figures created by the functions can be styled (colors, fonts, etc.) by adding theme settings to them. You can use the standard ggplot function `theme()` and related theme functions. As usual, it is possible to "stack" theme functions, as seen in the example below.

You can also change coloring, axis limits/breaks, etc., just by adding ggplot options with a `+` sign.

A custom theme function, `theme_rise()`, is included in the RISEkbmRasch package. It might be easier to use if you are not familiar with `theme()`.

For instance, you might like to change the font to "Lato" for the item hierarchy figure, and make the background transparent.

```{r}
# first we need to remove the `personScores` and `personSEM` variables from the `df2` dataframe,
# to ensure that `df2` contains only item data before using it with the item hierarchy function.
df2$personScores <- NULL
df2$personSEM <- NULL

RIitemHierarchy(df2) +
  theme_minimal() + # first apply the minimal theme to make the background transparent
  theme_rise(fontfamily = "Lato") # then apply theme_rise, which simplifies making changes to all plot elements
```

As of package version 0.1.30.0, the `RItargeting()` function allows more flexibility in styling too, by having an option to return a list object with the three separate plots.
See the [NEWS](https://github.com/pgmj/RISEkbmRasch/blob/main/NEWS.md#01300) file for more details.

In order to change the font for text inside plots, you will need to add an additional line of code:

``` r
update_geom_defaults("text", list(family = "Lato"))
```

Please note that this updates the default settings for `geom_text()` for the whole session. Also, some functions, such as `RIloadLoc()`, make use of `geom_text_repel()`, for which you would need to change "text" in the function above to "text_repel".

A simple way to change only the font family and font size would be to use `theme_minimal(base_family = "Calibri", base_size = 14)`. Please see the [reference page](https://ggplot2.tidyverse.org/reference/ggtheme.html) for default ggplot themes for alternatives to `theme_minimal()`.

## Software used {#sec-grateful}

The `grateful` package is a nice way to give credit to the packages used in making the analysis. The package can create both a bibliography file and a table object, which is handy for automatically creating a reference list based on the packages used (or at least those explicitly loaded).

```{r}
library(grateful)
pkgs <- cite_packages(cite.tidyverse = TRUE, 
                      output = "table",
                      bib.file = "grateful-refs.bib",
                      include.RStudio = TRUE,
                      out.dir = getwd())
# If kbl() is used to generate this table, the references will not be added to the Reference list.
formattable(pkgs, 
            table.attr = 'class=\"table table-striped\" style="font-size: 13px; font-family: Lato; width: 80%"')
```

## Additional credits

Thanks to my [colleagues at RISE](https://www.ri.se/en/what-we-do/projects/center-for-category-based-measurements) for providing feedback and testing the package on Windows and MacOS platforms. Also, thanks to [Mike Linacre](https://www.winsteps.com/linacre.htm) and [Jeanette Melin](https://www.ri.se/en/person/jeanette-melin) for providing useful feedback to improve this vignette.

## References