This vignette will walk through a sample analysis using an open dataset with polytomous questionnaire data. This will include some data wrangling to structure the item data and itemlabels, then provide examples of the different functions. The full source code of this document can be found either in this repository or by clicking on </> CODE at the top of this page. You should be able to use the source code “as is” and reproduce this document locally, as long as you have the required packages installed. This page and this website are built using the open source publishing tool Quarto.
One of the aims of this package is to simplify reproducible psychometric analysis to shed light on the measurement properties of a scale, questionnaire or test. In a paper recently made available as a preprint (Johansson et al., 2023), our research group proposes that the basic aspects of a psychometric analysis should include information about:
Unidimensionality
Response categories
Invariance
Targeting
Measurement uncertainties (reliability)
We’ll include several ways to investigate these measurement properties, using Rasch Measurement Theory. There are also functions in the package less directly related to the criteria above, that will be demonstrated in this vignette.
Please note that this is just a sample analysis to showcase the R package. It is not intended as a “best practice” psychometric analysis example.
You can skip ahead to the Rasch analysis part in Section 3 if you are eager to look at the package output :)
There is a separate GitHub repository containing a template R-project to simplify using the easyRasch package when conducting a reproducible Rasch analysis in R: https://github.com/pgmj/RISEraschTemplate
1 Getting started
Since the package is intended for use with Quarto, this vignette has also been created with Quarto. A “template” .qmd file is available that can be useful to have handy for copy&paste when running a new analysis. You can also download a complete copy of the Quarto/R code to produce this document here.
Loading the easyRasch package will automatically load all the packages it depends on. However, it could be desirable to explicitly load all packages used, to simplify the automatic creation of citations for them, using the grateful package (see Section 12).
Quarto automatically adds links to R packages and functions throughout this document. However, this feature only works properly for packages available on CRAN. Since the easyRasch package is not on CRAN the links related to functions starting with RI will not work.
1.1 Loading data
We will use data from a recent paper investigating the “initial elevation effect” (Anvari et al., 2022), and focus on the 10 negative items from the PANAS. The data is available at the OSF website.
Code
df.all <- read_csv("https://osf.io/download/6fbr5/")
# if you have issues with the link, please try downloading manually using the same URL as above
# and read the file from your local drive.

# subset items and demographic variables
df <- df.all %>%
  select(starts_with("PANASD2_1"),
         starts_with("PANASD2_20"),
         age, Sex, Group) %>%
  select(!PANASD2_10_Active) %>%
  select(!PANASD2_1_Attentive)
The glimpse() function provides a quick overview of our dataframe.
We have 1856 rows, i.e. respondents. All variables except Sex and Group are of class dbl, which means they are numeric and can have decimals. Integer (numeric with no decimals) would also be fine for our purposes. The two demographic variables currently of class chr (character) will need to be converted to factors (fct), and we will do that later on.
(If you import a dataset where item variables are of class character, you will need to recode to numeric.)
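If item responses arrive as text labels, one possible way to recode them is to go through a factor with explicitly ordered levels. The sketch below uses base R only. The data frame `items_chr` and its response labels are hypothetical and not taken from the PANAS data.

```r
# Hypothetical item data stored as character labels
items_chr <- data.frame(
  q1 = c("Not at all", "A little", "Extremely", "Moderately"),
  q2 = c("A little", "A little", "Quite a bit", "Not at all"),
  stringsAsFactors = FALSE
)

# Response labels in their intended order, lowest category first
resp_levels <- c("Not at all", "A little", "Moderately", "Quite a bit", "Extremely")

# Convert each column to a factor with ordered levels, then to integers starting at 0
items_num <- as.data.frame(
  lapply(items_chr, function(x) as.integer(factor(x, levels = resp_levels)) - 1L)
)

str(items_num)
```

Spelling out the levels explicitly avoids the alphabetical ordering that `factor()` would otherwise apply to the labels.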
1.2 Itemlabels
Then we set up the itemlabels dataframe. This could also be done using the free LibreOffice Calc or MS Excel. Just make sure the file has the same structure, with two variables named itemnr and item that contain the item variable names and item description. The item variable names have to match the variable names in the item dataframe.
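As a minimal sketch of the structure described above, the itemlabels dataframe can be created directly in base R. The item descriptions here are placeholders and the object names `itemlabels_demo` and `items_demo` are hypothetical.

```r
# Two columns: itemnr (item variable names) and item (item descriptions)
itemlabels_demo <- data.frame(
  itemnr = c("PANAS_11", "PANAS_12", "PANAS_13"),
  item   = c("Description of item 11", "Description of item 12", "Description of item 13"),
  stringsAsFactors = FALSE
)

# Toy item dataframe with matching column names
items_demo <- data.frame(PANAS_11 = c(0, 1), PANAS_12 = c(2, 3), PANAS_13 = c(4, 0))

# The item variable names have to match the itemnr column exactly
stopifnot(identical(names(items_demo), itemlabels_demo$itemnr))

# Optionally save to a file for reuse in other analyses
# write.csv(itemlabels_demo, "itemlabels.csv", row.names = FALSE)
```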
Variables for invariance tests such as Differential Item Functioning (DIF) need to be separated into vectors (ideally as factors with specified levels and labels) with the same length as the number of rows in the dataset. This means that any kind of removal of respondents/rows with missing data needs to be done before separating the DIF variables.
We need to check how the Sex variable has been coded and which responses are present in the data.
CONSENT REVOKED   DATA EXPIRED   Female   Male   Prefer not to say
              2              1      896    955                   2
Since there are only 5 respondents using labels outside of Female/Male (too few for meaningful statistical analysis), we will remove them to have a complete dataset for all variables in this example.
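The order of operations matters here: rows are removed first, and only then is the DIF variable separated from the item data. A minimal base R sketch with toy data (the object names `demo` and `dif_sex_demo` are hypothetical) could look like this:

```r
# Toy data: two items plus a Sex variable with some unusable labels
demo <- data.frame(
  q1  = c(0, 1, 2, 3, 4, 1),
  q2  = c(1, 1, 0, 4, 2, 3),
  Sex = c("Female", "Male", "Prefer not to say", "Male", "CONSENT REVOKED", "Female"),
  stringsAsFactors = FALSE
)

# Keep only respondents with a Female/Male label
demo <- demo[demo$Sex %in% c("Female", "Male"), ]

# Create the DIF vector as a factor, then drop the variable from the item data
dif_sex_demo <- factor(demo$Sex, levels = c("Female", "Male"))
demo$Sex <- NULL

table(dif_sex_demo)
```

Because the filtering happens before the split, the DIF vector and the item dataframe keep the same number of rows.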
Sometimes age is provided in categories, but here we have a numeric variable with age in years. Let’s have a quick look at the age distribution using a histogram, and calculate mean, sd and range.
Code
### simpler version of the ggplot below using base R function hist()
# hist(df$age, col = "#009ca6")
# abline(v = mean(df$age, na.rm = TRUE))
#
# df %>%
#   summarise(Mean = round(mean(age, na.rm = TRUE), 1),
#             StDev = round(sd(age, na.rm = TRUE), 1)
#   )

ggplot(df) +
  geom_histogram(aes(x = age),
                 fill = "#009ca6",
                 col = "black") +
  # add the average as a vertical line
  geom_vline(xintercept = mean(df$age, na.rm = TRUE),
             linewidth = 1.5,
             linetype = 2,
             col = "orange") +
  # add a light grey field indicating the standard deviation
  annotate("rect", ymin = 0, ymax = Inf,
           xmin = (mean(df$age, na.rm = TRUE) - sd(df$age, na.rm = TRUE)),
           xmax = (mean(df$age, na.rm = TRUE) + sd(df$age, na.rm = TRUE)),
           alpha = .2) +
  labs(title = "",
       x = "Age in years",
       y = "Number of respondents",
       caption = glue("Note. Mean age is {round(mean(df$age, na.rm = TRUE), 1)} years with a standard deviation of {round(sd(df$age, na.rm = TRUE), 1)}. Age range is {min(df$age)} to {max(df$age)}.")) +
  theme(plot.caption = element_text(hjust = 0, face = "italic"))
Age also needs to be a separate vector, and removed from the item dataframe.
Code
dif.age <- df$age
df$age <- NULL
There is also a grouping variable which needs to be converted to a factor.
With only item data remaining in the dataframe, we can easily rename the items in the item dataframe. These names match the itemlabels variable itemnr.
No missing data in this dataset. If we had missing data, we could also use RImissingP() to look at which respondents have missing data and how much.
2.2 Overall responses
This provides us with an overall picture of the data distribution. As a bonus, any oddities/mistakes in recoding the item data from categories to numbers will be clearly visible.
Most R packages for Rasch analysis require the lowest response category to be zero, which makes it necessary for us to recode our data, from using the range of 1-5 to 0-4.
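When all items share the same 1-5 coding, the recode amounts to subtracting 1 from every cell. A small base R sketch with toy data (`items_1to5` is a hypothetical object, not the PANAS dataframe):

```r
# Toy item data coded 1-5
items_1to5 <- data.frame(q1 = c(1, 3, 5, 2), q2 = c(2, 2, 4, 1))

# Subtracting 1 from a numeric dataframe shifts every item to 0-4
items_0to4 <- items_1to5 - 1

# Verify that the lowest category is now zero
range(as.matrix(items_0to4))
```

This only works when the dataframe contains nothing but item variables, which is one reason to move demographic variables into separate vectors first.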
Now, we can also look at the raw distribution of sum scores. The RIrawdist() function is a bit crude, since it requires responses in all response categories to accurately calculate max and min scores.
We can see a floor effect with 11.8% of participants responding in the lowest category for all items.
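The floor effect percentage can be checked by hand from the raw sum scores. The sketch below uses toy data and assumes the lowest response category is coded 0; it is not the code used inside RIrawdist().

```r
# Toy item data, lowest category coded 0
items_demo2 <- data.frame(
  q1 = c(0, 0, 2, 4, 0),
  q2 = c(0, 1, 3, 4, 0),
  q3 = c(0, 0, 1, 3, 0)
)

# Raw sum score per respondent
sum_scores <- rowSums(items_demo2)

# Minimum possible sum score when every item starts at 0
min_possible <- 0

# Percentage of respondents at the floor
floor_pct <- 100 * mean(sum_scores == min_possible)
floor_pct
```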
2.2.2 Guttman structure
While not really necessary, it could be interesting to see whether the response patterns follow a Guttman-like structure. Items and persons are sorted based on lower->higher responses, and we should see the color move from yellow in the lower left corner to blue in the upper right corner.
In this data, we see the floor effect on the left, with 11.8% of respondents all yellow, and a rather weak Guttman structure. This could also be due to a low variation in item locations/difficulties. Since we have a very large sample I added a theme() option to remove the x-axis text, which would anyway just be a blur of the 1851 respondent row numbers. Each thin vertical slice in the figure is one respondent.
2.3 Item level descriptives
There are many ways to look at the item level data, and we’ll get them all together in the tab-panel below. The RItileplot() is probably most informative, since it provides the number of responses in each response category for each item. It is usually recommended to have at least ~10 responses in each category for psychometric analysis, no matter which methodology is used.
Kudos to Solomon Kurz for providing the idea and code on which the tile plot function is built!
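The numbers shown in a tile plot are simply response counts per item and category, so the "at least ~10 responses" guideline can also be checked with a small table. This is a base R sketch on toy data; `items_demo3` and the category range `0:3` are assumptions for illustration.

```r
# Toy item data where nobody used category 3
items_demo3 <- data.frame(
  q1 = c(0, 1, 1, 2, 2, 2),
  q2 = c(0, 0, 1, 1, 2, 2)
)

# Full set of possible categories, so unused ones show up as zero counts
categories <- 0:3

# Rows are response categories, columns are items
counts <- sapply(items_demo3, function(x) table(factor(x, levels = categories)))
counts

# Flag item/category cells with fewer than 10 responses
min_n <- 10
any(counts < min_n)
```

Passing the complete set of categories to `factor()` matters, since a plain `table()` would silently drop categories without responses.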
Most people will be familiar with the barplot, and this is probably most intuitive to understand the response distribution within each item. However, if there are many items it will take a while to review, and does not provide the same overview as a tileplot or stacked bars.
Code
```{r}
#| column: margin
#| code-fold: true

# This code chunk creates a small table in the margin beside the panel-tabset output below, showing all items currently in the df dataframe.
# The Quarto code chunk option "#| column: margin" is necessary for the layout to work as intended.
RIlistItemsMargin(df, fontsize = 13)
```
The eRm package and Conditional Maximum Likelihood (CML) estimation will be used primarily, with the Partial Credit Model since this is polytomous data.
We will begin by looking at unidimensionality, response categories, and targeting in parallel below. For unidimensionality, we are mostly interested in item fit and residual correlations, as well as PCA of residuals and loadings on the first residual contrast. At the same time, disordered response categories can influence item fit to some extent (and vice versa), and knowledge about targeting can be useful if it is necessary to remove items due to residual correlations.
When unidimensionality and response categories are found to work adequately, we will move on to invariance testing (Differential Item Functioning, DIF). It should be noted that DIF should be evaluated in parallel with all other psychometric aspects, but since it is a more complex issue it is kept in a separate section in this vignette (as is person fit). Finally, when/if invariance/DIF also looks acceptable, we can investigate reliability/measurement uncertainties.
Note
In the tabset-panel below, each tab contains explanatory text, which is sometimes a bit lengthy. Remember to scroll back up and click on all tabs.
simfit1 <- RIgetfit(df, iterations = 400, cpu = 8) # save simulation output to object `simfit1`
RIitemfit(df, simfit1)
| Item | InfitMSQ | Infit thresholds | OutfitMSQ | Outfit thresholds | Infit diff | Outfit diff | Relative location |
|---|---|---|---|---|---|---|---|
| PANAS_11 | 1.198 | [0.931, 1.079] | 1.255 | [0.923, 1.095] | 0.119 | 0.16 | 1.14 |
| PANAS_12 | 0.842 | [0.921, 1.081] | 0.852 | [0.884, 1.109] | 0.079 | 0.032 | 1.42 |
| PANAS_13 | 1.174 | [0.93, 1.089] | 1.346 | [0.879, 1.149] | 0.085 | 0.197 | 1.91 |
| PANAS_14 | 1.064 | [0.913, 1.093] | 1.11 | [0.898, 1.112] | no misfit | no misfit | 1.15 |
| PANAS_15 | 0.783 | [0.91, 1.093] | 0.725 | [0.858, 1.159] | 0.127 | 0.133 | 1.63 |
| PANAS_16 | 0.798 | [0.908, 1.082] | 0.781 | [0.892, 1.133] | 0.11 | 0.111 | 1.48 |
| PANAS_17 | 0.953 | [0.897, 1.081] | 0.92 | [0.839, 1.177] | no misfit | no misfit | 1.83 |
| PANAS_18 | 1.088 | [0.914, 1.094] | 1.172 | [0.868, 1.17] | no misfit | 0.002 | 1.77 |
| PANAS_19 | 0.921 | [0.921, 1.069] | 0.933 | [0.918, 1.096] | 0 | no misfit | 0.98 |
| PANAS_20 | 1.205 | [0.903, 1.092] | 1.257 | [0.892, 1.103] | 0.113 | 0.154 | 1.42 |

Note: MSQ values based on conditional calculations (n = 1851 complete cases). Simulation based thresholds from 400 simulated datasets.
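Judging from the values in the table, the "diff" columns appear to report how far an observed MSQ value falls outside its simulation-based interval. The sketch below reproduces that arithmetic for three of the infit rows. It is inferred from the table output and is not taken from the RIitemfit() source code.

```r
# Infit values and thresholds copied from the table above
fit <- data.frame(
  item  = c("PANAS_11", "PANAS_12", "PANAS_14"),
  infit = c(1.198, 0.842, 1.064),
  low   = c(0.931, 0.921, 0.913),
  high  = c(1.079, 1.081, 1.093)
)

# Distance outside the interval; NA when the value lies within it ("no misfit")
fit$infit_diff <- with(
  fit,
  ifelse(infit > high, infit - high,
         ifelse(infit < low, low - infit, NA))
)
fit$infit_diff <- round(fit$infit_diff, 3)

fit
```

PANAS_11 lies above its upper threshold (underfit), PANAS_12 below its lower threshold (overfit), and PANAS_14 falls inside the interval.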
RIitemfit() and RIgetfit() both work with dichotomous as well as polytomous data (using the partial credit model) and automatically select the model based on the data structure.
It is important to note that the new (since version 0.2.2, released 2024-08-19) RIitemfit() function uses conditional outfit/infit, which is both robust to different sample sizes and makes ZSTD unnecessary (Müller, 2020).
Since the distribution of item fit statistics is not known, we need to use simulation to determine appropriate cutoff threshold values for the current sample and items. RIitemfit() can use the simulation based cutoff values for conditional highlighting of misfitting items. See the blog post on simulation based cutoffs for some more details on this. RIitemfit() can also be used without cutoffs and conditional highlighting. For a possibly useful rule-of-thumb cutoff for infit MSQ only, use the option cutoff = "Smith98" (Müller, 2020; R. M. Smith et al., 1998). However, this cutoff is not applicable for all items, only for what can be expected for the average item fit. The simulation/bootstrap-based cutoff values will be more accurate for every item in your data.
Briefly stated, the simulation uses the properties of the current sample and items, and simulates n iterations of data that fit the Rasch model to get an empirical distribution of item fit that we can use for comparison with the observed data. This is also known as “parametric bootstrapping”.
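The general idea can be sketched in base R for dichotomous data. This is a deliberately simplified illustration and not what RIgetfit() does internally: it uses the known generating parameters instead of re-estimating the model with CML, it computes unconditional outfit rather than conditional fit, and the 0.5%/99.5% quantiles are an assumed choice of cutoff.

```r
set.seed(123)
n_persons <- 300
n_items   <- 5
n_iter    <- 100

# Assumed "true" person and item parameters for the simulation
theta <- rnorm(n_persons)
beta  <- seq(-1, 1, length.out = n_items)

# Rasch model probabilities of a positive response (persons x items)
p <- plogis(outer(theta, beta, "-"))

# Simulate datasets that fit the model and compute outfit MSQ per item
sim_outfit <- replicate(n_iter, {
  x  <- matrix(rbinom(length(p), size = 1, prob = p), nrow = n_persons)
  z2 <- (x - p)^2 / (p * (1 - p))   # squared standardized residuals
  colMeans(z2)                      # unweighted mean square per item
})

# Item-specific thresholds from the empirical distribution (items x 2)
sim_thresholds <- t(apply(sim_outfit, 1, quantile, probs = c(0.005, 0.995)))
rownames(sim_thresholds) <- paste0("item", seq_len(n_items))
sim_thresholds
```

Each item gets its own interval, which is why simulation-based thresholds can differ between items in the same dataset.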
The simulation can take quite a bit of time to run if you have complex data/many items/many participants, and/or choose to use many iterations. Simulation experiments (Johansson, 2025b) indicate that 100-400 iterations should be a useful range, where smaller samples (n < 300) should use 100 iterations, and 200-400 is more appropriate when one has larger samples. Another important finding from simulation studies is that there is a large risk of false positive indication of misfit in samples larger than n = 1000 when using item infit or item-restscore. The recommended primary method for large samples is the bootstrapped item-restscore (Johansson, 2025b), as illustrated in an adjacent tab labeled “Item-restscore bootstrap”.
For reference, the simulation above, using 10 items with 5 response categories each and 1851 respondents, takes about 24 seconds to run on 8 cpu cores (Macbook Pro M1 Max) for 400 iterations.
I’ll cite Ostini & Nering (2006) on the description of outfit and infit (pages 86-87):
Response residuals can be summed over respondents to obtain an item fit measure. Generally, the accumulation is done with squared standardized residuals, which are then divided by the total number of respondents to obtain a mean square statistic. In this form, the statistic is referred to as an unweighted mean square (Masters & Wright, 1997; Wright & Masters, 1982) and has also come to be known as “outfit” (Smith, Schumacker, & Bush, 1998; Wu, 1997), perhaps because it is highly sensitive to outlier responses (Adams & Khoo, 1996; Smith et al., 1998; Wright & Masters, 1982).
A weighted version of this statistic was developed to counteract its sensitivity to outliers (Smith, 2000). In its weighted form, the squared standardized residual is multiplied by the observed response variance and then divided by the sum of the item response variances. This is sometimes referred to as an information weighted mean square and has become known as “infit” (Smith et al., 1998; Wu, 1997).
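To make the two definitions in the quote concrete, here is a small base R sketch for one dichotomous item. The responses and model-expected probabilities are made up, and this is the unconditional version of the statistics, not the conditional version reported by RIitemfit().

```r
# Hypothetical responses to one item and model-expected probabilities
x <- c(1, 0, 1, 1, 0)
p <- c(0.8, 0.3, 0.6, 0.9, 0.5)

# Response variance for each person on this item
w <- p * (1 - p)

# Squared standardized residuals
z2 <- (x - p)^2 / w

# Outfit: unweighted mean of squared standardized residuals
outfit <- mean(z2)

# Infit: squared standardized residuals weighted by the response variance
infit <- sum(w * z2) / sum(w)

c(outfit = outfit, infit = infit)
```

Since the weights cancel against the denominator of the standardized residual, infit is equivalent to the sum of squared raw residuals divided by the sum of variances, which is why off-target responses with small variance influence it less.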
A low item fit value (sometimes referred to as an item “overfitting” the Rasch model) indicates that responses are too predictable and provide little information. This is often the case for items that are very general/broad in scope in relation to the latent variable. You will often find overfitting items to also have residual correlations with other items.
A high item fit value (sometimes referred to as “underfitting” the Rasch model) can indicate several things, often multidimensionality or a question that is difficult to interpret and thus has noisy response data. The latter could for instance be caused by a question that asks about two things at the same time or is ambiguous for other reasons.
Note
Remember to scroll back up and click on all tabs.
This is another useful function from the iarm package. It shows the expected and observed correlation between an item and a score based on the rest of the items (Kreiner, 2011). As with item fit, but in the opposite direction, an observed correlation lower than expected indicates underfit, meaning that the item may not belong to the dimension. A higher than expected observed value indicates an overfitting and possibly redundant item. Overfitting items will often also show issues with residual correlations. Both of these problems can often be (at least partially) resolved by removing underfitting items.
Both item-restscore and conditional item infit/outfit will indicate “false misfit” when sample sizes are large (even when using simulation/bootstrap based cutoff values). This behavior can occur from about n = 500, and certainly will occur at samples of 800 and above (Johansson, 2025b). This “false misfit” is caused by truly misfitting items, which underlines the importance of removing one item at a time when one finds issues with misfit/multidimensionality. However, a useful way to get additional information about the probability of actual misfit is to use non-parametric bootstrapping. This function resamples with replacement from your response data and reports the percentage and type of misfit indicated by the item-restscore function. You will also get information about conditional MSQ infit (based on the full sample, using complete responders). Simulation studies indicate that a sample size of 800 results in 95+% detection rate of 1-3 misfitting items amongst 20 dichotomous items (Johansson, 2025b).
Results based on 250 bootstrap iterations with a sample size of 800. Conditional mean-square infit based on complete responders only, n = 1851.
The iarm package (Mueller & Santiago, 2022) provides several interesting functions for assessing item fit, DIF and other things. Some of these functions may be included in a future version of the easyRasch package. Below are conditional item characteristic curves (ICC’s) using the estimated theta (factor score).
These curves indicate item fit on a group level, where respondents are split into “class intervals” based on their sum score/factor score.
Code
ICCplot(as.data.frame(df),
        itemnumber = 1:4,
        method = "cut",
        cinumber = 6, # number of class intervals to split respondents into
        itemdescrip = c("PANAS_11", "PANAS_12", "PANAS_13", "PANAS_14"))
[1] "Please press Zoom on the Plots window to see the plot"
A similar, but even more informative and flexible, visualization has been made available in the RASCHplot package (Buchardt et al., 2023), which needs to be installed from GitHub (see code below). The linked paper is recommended reading, not least for descriptions of the useful options available. Below are some sample plots showing conditional ICC’s using the raw sum score.
Based on an old rule-of-thumb, the first eigenvalue should be below 1.5 to support unidimensionality (E. V. Smith, 2002). However, as with many other metrics this is dependent on sample size and test length (Chou & Wang, 2010) and needs to be used together with checking item fit (or item-restscore) and residual correlations (local dependence). I personally find this metric less useful and prefer to use the Conditional Likelihood Ratio Test as a global test of model fit. The PCA eigenvalues are only included here for those coming from Winsteps who might be looking for it. Speaking of Winsteps, the “explained variance” will not be comparable to the corresponding Winsteps metric, since this one only shows the results from the analysis of residuals and not the explained variance of the Rasch model itself.
The Conditional Likelihood Ratio Test (LRT, Andersen, 1973) is a global test of fit and can be a useful addition to more informative item-level metrics (Johansson, 2025b).
In order to support unidimensionality, items should only be related to each other through the latent variable. This is called “local independence”. By investigating patterns in model residuals, we can determine whether items are independent or not.
Similarly to item fit, we need to run simulations to get a useful cutoff threshold value for when residual correlations amongst item pairs are larger than would be expected from items that fit a unidimensional Rasch model (Christensen et al., 2017).
The simulation/bootstrap procedure can take some time to run, depending on the complexity of your data, but it is necessary to set the appropriate cutoff value.
Relative cut-off value is 0.001, which is 0.103 above the average correlation (-0.102). Correlations above the cut-off are highlighted in red text.
The matrix above shows item-pair correlations of item residuals, with highlights in red showing correlations crossing the threshold compared to the average item-pair correlation (for all item-pairs) (Christensen et al., 2017). Rasch model residual correlations (Yen’s Q3) are calculated using the mirt package.
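The relative cutoff is the average of all item-pair correlations plus a simulation-based value, here -0.102 + 0.103 = 0.001. The base R sketch below illustrates the flagging logic on made-up residuals. In a real analysis the residuals come from the fitted Rasch model and the added value comes from the simulation, so both `resid_demo` and `dyn_cutoff` are stand-ins.

```r
set.seed(42)

# Made-up "residuals" for 200 respondents and 4 items
resid_demo <- matrix(rnorm(200 * 4), ncol = 4,
                     dimnames = list(NULL, paste0("item", 1:4)))

# Induce local dependence between item1 and item2
resid_demo[, 2] <- 0.7 * resid_demo[, 1] + 0.3 * resid_demo[, 2]

# Correlation matrix of residuals and the average over all item pairs
q3 <- cor(resid_demo)
avg_cor <- mean(q3[upper.tri(q3)])

# Stand-in for the simulation-based value added to the average
dyn_cutoff <- 0.103
rel_cutoff <- avg_cor + dyn_cutoff

# Item pairs with a residual correlation above the relative cutoff
flagged <- which(q3 > rel_cutoff & upper.tri(q3), arr.ind = TRUE)
flagged
```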
Another way to assess local (in)dependence is by partial gamma coefficients (Kreiner & Christensen, 2004). This is also a function from the iarm package. See ?iarm::partgam_LD for details.
Here we see item locations and their loadings on the first residual contrast. This figure can be helpful to identify clusters in data or multidimensionality.
The xlims setting changes the x-axis limits for the plots. The default values usually make sense, and we mostly add this option to point out the possibility of doing so. You can also choose to show plots for specific items only.
Each response category for each item should have a curve that indicates it to be the most probable response at some point on the latent variable (x axis in the figure).
# increase fig-height in the chunk option above if you have many items
RItargeting(df, xlim = c(-5, 4)) # xlim defaults to c(-4,4) if you omit this option
This figure shows how well the items fit the respondents/persons. It is a sort of Wright Map that shows person locations and item threshold locations on the same logit scale.
The top part shows person location histogram, the middle part an inverted histogram of item threshold locations, and the bottom part shows individual item threshold locations. The histograms also show means and standard deviations.
Here the items are sorted on their average threshold location (black diamonds). 84% confidence intervals are shown around each item threshold location. For further details, see the caption text below the figure.
The numbers displayed in the plot can be disabled using the option numbers = FALSE.
Item 18 has issues with the second lowest category being disordered. Several other items have very short distances between thresholds 1 and 2, which is also clearly seen in the Item Hierarchy figure above.
Two item-pairs show residual correlations far above the cutoff value:
15 and 16 (scared and afraid)
17 and 18 (ashamed and guilty)
Since item 15 also has a residual correlation with item 19, we will remove it. In the second pair, item 18 will be removed since it also has problems with disordered response categories.
Note
We have multiple “diagnostics” to review when deciding which item to remove if there are strong residual correlations between two items. Here is a list of commonly used criteria:
item fit
item threshold locations compared to sample locations (targeting)
ordering of response categories
DIF
and whether there are residual correlations between one item and multiple other items
As seen in the code above, I chose to create a copy of the dataframe with the removed items omitted. This can be useful if, at a later stage in the analysis, I want to be able to quickly “go back” and reinstate an item or undo any other change I have made.
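The backup-then-remove step can be sketched in base R as follows. The object names `df_demo`, `df_backup` and `df_reduced` are hypothetical and the toy data only mimics the item names.

```r
# Toy item dataframe
df_demo <- data.frame(
  PANAS_15 = c(0, 1),
  PANAS_16 = c(1, 2),
  PANAS_17 = c(0, 0),
  PANAS_18 = c(3, 4)
)

# Keep an untouched copy to be able to reinstate items later
df_backup <- df_demo

# Remove the two items chosen above
removed_items <- c("PANAS_15", "PANAS_18")
df_reduced <- df_demo[, !names(df_demo) %in% removed_items, drop = FALSE]

names(df_reduced)
```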
Items 16 & 19, and 12 & 14 show problematic residual correlations.
Let’s look at DIF before taking action upon this information. While we are keeping DIF as a separate section in this vignette, it is recommended to include DIF-analysis in the panel-tabset above (on item fit, PCA, residual correlations, etc).
5 DIF - differential item functioning
We’ll be looking at whether item (threshold) locations are stable between demographic subgroups.
There are several DIF analysis tools available. The first one uses the package psychotree, which relies on statistical significance at p < .05 as an indicator for DIF. This is a criterion that is highly sample size sensitive, and we are always interested in the size/magnitude of DIF as well, since that will inform us about the impact of DIF on the estimated latent variable.
The structure of DIF is also an important and complex aspect, particularly for polytomous data. Uniform DIF means that the DIF is similar across the latent continuum. We can test this in R using the lordif package, as demonstrated in Section 5.6. However, it should be noted that the lordif package does not provide an option to use Rasch models, and there may be results that are caused by also allowing the discrimination parameter to vary across items.
A recent preprint (Henninger et al., 2024) does a great job illustrating “differential step functioning” (DSF), which is when item threshold locations in polytomous data show varying levels of DIF. It also describes a forthcoming development of the psychotree package where one can use DIF effect size and purification functions to evaluate DIF/DSF. When the updated package is available, I will work to implement these new functions into the easyRasch package as well.
Note
It is important to ensure that no cells in the data are empty for subgroups when conducting a DIF analysis. Split the data using the DIF-variable and create separate tileplots to review the response distribution in the DIF-groups.
Code
difPlots <- df %>% # save the output into the `difPlots` object
  add_column(gender = dif.sex) %>% # add the DIF variable to the dataframe
  split(.$gender) %>% # split the data using the DIF variable
  map(~ RItileplot(.x %>% dplyr::select(!gender)) + labs(title = .x$gender)) # create separate tileplots for each group

difPlots$Female + difPlots$Male # the actual name of the plots (in this case Male/Female) will be determined by the factor labels
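As a complement to reviewing the tile plots visually, empty cells can also be detected with a cross-tabulation of item responses by DIF group. This is a base R sketch on toy data; `item_resp`, `group` and the category range `0:3` are assumptions for illustration.

```r
# Toy responses to one item and a two-level DIF variable
item_resp <- c(0, 1, 2, 0, 1, 1, 2, 0)
group <- factor(c("Female", "Female", "Female", "Female",
                  "Male", "Male", "Male", "Male"))

# Cross-tabulate categories by group, keeping unused categories as zero counts
cells <- table(factor(item_resp, levels = 0:3), group)
cells

# TRUE if any category is empty in any group
any(cells == 0)
```

For a full item set, the same tabulation would be repeated for each item column.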
While no item shows problematic levels of DIF regarding item location, as shown by the table, there is an interesting pattern in the thresholds figure. The lowest threshold seems to be slightly lower for node 3 (Male) for all items. Also, item 11 shows a much wider spread of item locations for node 3 compared to node 2.
The results do not require any action since the difference is small.
5.2 Age
The psychotree package uses a model-based recursive partitioning that is particularly useful when you have a continuous variable such as age in years and a large enough sample. It will test different ways to partition the age variable to determine potential group differences (Strobl et al., 2015b, 2021).
Review the documentation for further details, using ?LRtest in your R console panel in RStudio. There is also a plotting function, plotGOF(), that may be of interest.
Values highlighted in red are above the chosen cutoff 0.5 logits. Background color brown and blue indicate the lowest and highest values among the DIF groups.
The item threshold table shows that the top threshold for item 13 differs more than 0.5 logits between groups. In this set of 8 items with 4 thresholds each, it is unlikely to result in problematic differences in estimated person scores.
5.6 Logistic Ordinal Regression DIF
The lordif package (Choi et al., 2011) does not use a Rasch measurement model; it only offers a choice between the Graded Response Model (GRM) and the Generalized Partial Credit Model (GPCM). Both of these are 2PL models, meaning that they estimate a discrimination parameter for each item in addition to the item threshold parameters. lordif relies on the mirt package.
There are several nice features available in the lordif package. First, we get a χ2 test of uniform or non-uniform DIF. Second, there are three possible methods/criteria for flagging items with potential DIF. One of these uses a likelihood ratio (LR) χ2 test, while the other two are indicators of DIF size/magnitude, either using a pseudo R2 statistic (“McFadden”, “Nagelkerke”, or “CoxSnell”) or a Beta criterion. For further details, see ?lordif in your R console or the paper describing the package (Choi et al., 2011).
Below is some sample code to get you started with lordif.
Code
library(lordif)

g_dif <- lordif(as.data.frame(df), as.numeric(dif.sex), # make sure that the data is in a dataframe-object and that the DIF variable is numeric
                criterion = c("Chisqr"),
                alpha = 0.01,
                beta.change = 0.1,
                model = "GPCM",
                R2.change = 0.02)

g_dif_sum <- summary(g_dif)
We can review the results regarding uniform/non-uniform DIF by looking at the chi* columns. Uniform DIF is indicated by column chi12 and non-uniform DIF by chi23, while column chi13 represents “an overall test of ‘total DIF effect’” (Choi et al., 2011).
While the table indicates significant chi2-tests for items 11 and 17, the magnitude estimates are low for these items.
There are some plots available as well, using the base R plot() function. For some reason the plots won’t render in this Quarto document, so I will try to sort that out at some point.
Code
plot(g_dif) # use option `graphics.off()` to get the plots rendered one by one
#plot(g_dif, graphics.off())
5.7 Partial gamma DIF
The iarm package provides a function to assess DIF by partial gamma (Bjorner et al., 1998). It should be noted that this function only shows a single partial gamma value per item, so if you have more than two groups in your comparison, you will want to also use other methods to understand your results better.
There are some recommended cutoff-values mentioned in the paper above:
No or negligible DIF:
Gamma within the interval -0.21 to 0.21, or
Gamma not significantly different from 0
Slight to moderate DIF:
Gamma within the interval -0.31 to 0.31 (and outside -0.21 to 0.21), or
not significantly outside the interval -0.21 to 0.21
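The interval part of these cutoffs can be expressed as a small helper function. This sketch is hypothetical: `classify_gamma()` is not part of iarm or easyRasch, it ignores the significance-based criteria, and it treats the interval bounds as inclusive, which is an assumption.

```r
# Classify partial gamma values using only the intervals listed above
classify_gamma <- function(gamma) {
  a <- abs(gamma)
  ifelse(a <= 0.21, "no or negligible DIF",
         ifelse(a <= 0.31, "slight to moderate DIF",
                "outside the intervals listed above"))
}

classify_gamma(c(-0.05, 0.25, -0.40))
```

In practice the significance criteria should be considered as well, since a gamma value outside an interval may still not differ significantly from it.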
No problematic residual correlations remaining. Several items show misfit but we will end this sample analysis here and move on to show other functions.
There are several item thresholds that are very closely located, as shown in the item hierarchy figure. This is not ideal, since it will inflate reliability estimates. However, we will not modify the response categories for this analysis, we only note that this is not workable and should be dealt with by trying variations of merged response categories to achieve better separation of threshold locations without disordering.
The figure above shows the Test Information Function (TIF), which indicates the reliability of all items making up the test/scale (not the reliability of the sample).
The default cutoff value used in RItif() is TIF = 3.33, which corresponds to person separation index (PSI) = 0.7. PSI is similar to reliability coefficients such as omega and alpha, ranging from 0 to 1. You can change the TIF cutoff by using the option cutoff, for instance cutoff = 2.5 (TIF values range from 1 and up).
While 11.8% of respondents had a floor effect based on the raw sum scored data, the figure above shows us that 41.8% are located below the point where the items produce a PSI of 0.7 or higher. Again, note that this figure shows the reliability of the test/scale, not the sample. If you want to add the sample reliability use option samplePSI = TRUE. More details are available in the documentation ?RItif.
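The stated correspondence between TIF = 3.33 and PSI = 0.7 is consistent with the common relation reliability = 1 - 1/TIF. The sketch below assumes that relation; it is not confirmed here to be the formula used inside RItif(), and the helper names are hypothetical.

```r
# Assumed relation between test information and reliability
tif_to_rel <- function(tif) 1 - 1 / tif
rel_to_tif <- function(rel) 1 / (1 - rel)

# Reliability implied by the default cutoff TIF = 3.33
round(tif_to_rel(3.33), 2)

# TIF needed for a reliability of 0.7
rel_to_tif(0.7)

# Reliability implied by the alternative cutoff TIF = 2.5
tif_to_rel(2.5)
```

Under this assumption, lowering the cutoff to 2.5 corresponds to accepting a reliability of 0.6.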
8 Person fit
We can also look at how the respondents fit the Rasch model with these items. By default, RIpfit() outputs a histogram and a hex heatmap with the person infit ZSTD statistic, using +/- 1.96 as cutoff values. This is currently the only person fit method implemented in the easyRasch package, and the curious analyst is encouraged to look at the package PerFit for more tools.
You can export the person fit values to a new variable in the dataframe by specifying output = "dataframe", or if you just want the row numbers for respondents with deviant infit values, output = "rowid".
You can also specify a grouping variable to visualize the person fit for different groups.
Person fit is a useful way to identify respondents with unexpected response patterns and investigate this further.
8.1 PerFit sample code
While none of the functions in the PerFit package has been implemented in easyRasch, this is some code to get you started if you are interested in using it. There are multiple methods/functions available for polytomous and dichotomous data, see the package documentation.
For this example, we’ll use the non-parametric U3 statistic generalized to polytomous items (Emons, 2008).
The dataframe shown under the tab Flagged respondents above contains a variable named FlaggedID, which represents the row IDs. This variable is useful if one wants to filter out respondents with deviant response patterns (person misfit). There are indications that persons with misfit may affect the results of Andersen’s LR-test for DIF (Artner, 2016).
8.2 Item fit without aberrant responses
We can remove the misfitting persons to see how that affects item fit. Let’s also compare with the misfitting respondents identified by RIpfit().
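The comparison and removal described here can be sketched in base R. The row IDs and the toy dataframe are made up for illustration, and the object names (`flagged_u3`, `flagged_infit`, `toy_df`) are hypothetical.

```r
# Hypothetical row IDs flagged by two different person fit methods
flagged_u3    <- c(3, 17, 42, 58, 101)   # e.g. FlaggedID values from PerFit
flagged_infit <- c(17, 42, 77, 101, 150) # e.g. row IDs based on infit ZSTD

both       <- intersect(flagged_u3, flagged_infit) # flagged by both methods
only_u3    <- setdiff(flagged_u3, flagged_infit)   # flagged by U3 only
only_infit <- setdiff(flagged_infit, flagged_u3)   # flagged by infit only

# Toy item dataframe: 200 respondents, 3 items scored 0-4
set.seed(1)
toy_df <- as.data.frame(matrix(sample(0:4, 600, replace = TRUE), ncol = 3))

# Remove respondents flagged by the first method before re-running item fit
toy_clean <- toy_df[-flagged_u3, ]
nrow(toy_clean)
```

Negative indexing only works when the vector of flagged rows is non-empty, so it is worth checking `length(flagged_u3) > 0` before filtering.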
MSQ values based on conditional calculations (n = 1695 complete cases). Simulation based thresholds from 400 simulated datasets.
9 Item parameters
To allow others (and oneself) to use the estimated item parameters for estimating person locations/thetas, we should make the item parameters available. The function will also write a csv-file with the item threshold locations. Estimation of person locations/thetas can be done with the thetaEst() function from the catR package. This is implemented in the function RIestThetasOLD(), see below for details.
Item location is the average of the thresholds for each item.
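As a small worked example of this definition, the sketch below computes item locations as row means of a threshold matrix. The threshold values, item names and column names are made up for illustration.

```r
# Hypothetical threshold locations (logits): items as rows, thresholds as columns
thresholds <- matrix(
  c(-1.2, -0.3, 0.4, 1.5,
    -0.8,  0.1, 0.9, 2.2,
    -0.5,  0.2, 1.1, 1.8),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("item1", "item2", "item3"), paste0("t", 1:4))
)

# Item location = mean of each item's thresholds
# na.rm = TRUE allows for items with fewer thresholds (padded with NA)
item_location <- rowMeans(thresholds, na.rm = TRUE)
round(item_location, 2)
```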
The parameters can also be output to a dataframe or a file, using the option output = "dataframe" or output = "file".
10 Ordinal sum score to interval score
This table shows the corresponding “raw” ordinal sum score values and logit scores, with standard errors for each logit value. Interval scores are estimated using WL based on a simulated dataset using the item parameters estimated from the input dataset. The choice of WL as default is due to the lower bias compared to ML estimation (Warm, 1989).
(An option will hopefully be added at some point to create this table based on only item parameters.)
The figure below can also be generated to illustrate the relationship between ordinal sum score and logit interval score. By default, the error bars show the standard error at each point, multiplied by 1.96.
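The error bar limits are simple to compute by hand. The sketch below uses a made-up excerpt of a score table (the column names and values are hypothetical) and adds the lower and upper limits as logit score -/+ 1.96 times the standard error.

```r
# Hypothetical excerpt of an ordinal sum score to logit score table
score_table <- data.frame(
  sum_score = 0:4,
  logit     = c(-3.9, -2.8, -2.1, -1.6, -1.2),
  se        = c(1.45, 0.95, 0.75, 0.65, 0.58)
)

# Error bar limits: logit score -/+ 1.96 * standard error
score_table$lower <- score_table$logit - 1.96 * score_table$se
score_table$upper <- score_table$logit + 1.96 * score_table$se
score_table
```

The widest intervals appear at the extreme scores, where the standard errors are largest.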
Based on the Rasch analysis output of item parameters, we can estimate each individual's location or score (also known as “theta”). RIestThetas() by default uses weighted likelihood estimation (WLE) based on item parameters from a partial credit model (PCM) and outputs a dataframe with person locations (WLE) and measurement error (SEM) on the logit scale.
Each individual has a standard error of measurement (SEM) associated with their estimated location/score. This is included in the output of the RIestThetas() function as the SEM variable, as seen above. We can review the distribution of measurement error with a figure.
We can take a look at the distribution of person locations (thetas) using a histogram.
Code
hist(thetas$WLE, col = "#009ca6", main = "Histogram of person locations (thetas)", breaks = 20)
RIestThetasOLD() can be used with a pre-specified item (threshold) location matrix. The choice of WL as default is due to the lower bias compared to ML estimation (Warm, 1989). Similarly to RIscoreSE() you can (and may indeed need to) change the range of logit scores, using the option theta_range. The default is c(-7,7), which should hopefully work in most circumstances.
If you would like to use an existing item threshold location matrix, this code may be helpful:
As you can see, this is a matrix object (not a dataframe), with each item as a row, and the threshold locations as columns.
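The matrix structure described above can be sketched in base R as follows. To keep the example self-contained it first writes a small made-up CSV file to a temporary location. The column names and layout are assumptions, so check the file written by the package before relying on them.

```r
# Write a small hypothetical threshold file (2 items, 3 thresholds each)
tmp <- tempfile(fileext = ".csv")
write.csv(
  data.frame(
    Threshold1 = c(-1.2, -0.8),
    Threshold2 = c(0.4, 0.9),
    Threshold3 = c(1.5, 2.2)
  ),
  tmp,
  row.names = FALSE
)

# Read the file back and convert the dataframe to a numeric matrix
item_params <- as.matrix(read.csv(tmp))
class(item_params)
dim(item_params) # rows = items, columns = thresholds
```

How the matrix is passed on to RIestThetasOLD() is not shown here, see the function documentation for the relevant option.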
11 Figure design
Most of the figures created by the functions can be styled (colors, fonts, etc) by adding theme settings to them. You can use the standard ggplot function theme() and related theme-functions. As usual it is possible to “stack” theme functions, as seen in the example below.
You can also change coloring, axis limits/breaks, etc, just by adding ggplot options with a + sign.
A custom theme function, theme_rise(), is included in the easyRasch package. It might be easier to use if you are not familiar with theme().
For instance, you might like to change the font to “Lato” for the item hierarchy figure, and make the background transparent.
Code
RIitemHierarchy(df) +
  theme_minimal() + # first apply the minimal theme to make the background transparent
  theme_rise(fontfamily = "Lato") # then apply theme_rise, which simplifies making changes to all plot elements
As of package version 0.1.30.0, the RItargeting() function allows more flexibility in styling too, by having an option to return a list object with the three separate plots. See the NEWS file for more details. Since the RItargeting() function uses the patchwork library to combine plots, you can also make use of the many functions that patchwork includes. For instance, you can set a title with a specific theme:
Please note that the line of code above updates the default settings for geom_text() for the whole session. Also, some functions, such as RIloadLoc(), make use of geom_text_repel(), for which you would need to change the code above from “text” to “text_repel”.
A simple way to only change font family and font size would be to use theme_minimal(base_family = "Calibri", base_size = 14). Please see the reference page for default ggplot themes for alternatives to theme_minimal().
12 Software used
The grateful package is a nice way to give credit to the packages used in making the analysis. The package can create both a bibliography file and a table object, which is handy for automatically creating a reference list based on the packages used (or at least explicitly loaded).
Code
library(grateful)
pkgs <- cite_packages(cite.tidyverse = TRUE,
                      output = "table",
                      bib.file = "grateful-refs.bib",
                      include.RStudio = TRUE,
                      out.dir = getwd())
# If kbl() is used to generate this table, the references will not be added to the Reference list.
formattable(pkgs,
            table.attr = 'class=\"table table-striped\" style="font-size: 13px; font-family: Lato; width: 80%"')
Thanks to my colleagues at RISE for providing feedback and testing the package on Windows and macOS platforms. Also, thanks to Mike Linacre and Jeanette Melin for providing useful feedback to improve this vignette.
Allaire, J., Xie, Y., Dervieux, C., McPherson, J., Luraschi, J., Ushey, K., Atkins, A., Wickham, H., Cheng, J., Chang, W., & Iannone, R. (2024). rmarkdown: Dynamic documents for R. https://github.com/rstudio/rmarkdown
Anvari, F., Efendić, E., Olsen, J., Arslan, R. C., Elson, M., & Schneider, I. K. (2022). Bias in Self-Reports: An Initial Elevation Phenomenon. Social Psychological and Personality Science, 19485506221129160. https://doi.org/10.1177/19485506221129160
Artner, R. (2016). A simulation study of person-fit in the Rasch model. Psychological Test and Assessment Modeling, 58(3), 531–563.
Bjorner, J. B., Kreiner, S., Ware, J. E., Damsgaard, M. T., & Bech, P. (1998). Differential Item Functioning in the Danish Translation of the SF-36. Journal of Clinical Epidemiology, 51(11), 1189–1202. https://doi.org/10.1016/S0895-4356(98)00111-5
Buchardt, A.-S., Christensen, K. B., & Jensen, N. (2023). Visualizing Rasch item fit using conditional item characteristic curves in R. Psychological Test and Assessment Modeling, 65(2), 206–219.
Buchardt, A.-S., Jensen, S. N., & Christensen, K. B. (2022). RASCHplot: Visualisation tool for validity of Rasch models. https://github.com/ERRTG/RASCHplot
Chalmers, R. P. (2012). mirt: A multidimensional item response theory package for the R environment. Journal of Statistical Software, 48(6), 1–29. https://doi.org/10.18637/jss.v048.i06
Choi, S. W., Gibbons, L. E., & Crane, P. K. (2011). lordif: An R Package for Detecting Differential Item Functioning Using Iterative Hybrid Ordinal Logistic Regression/Item Response Theory and Monte Carlo Simulations. Journal of Statistical Software, 39(8), 1–30. https://doi.org/10.18637/jss.v039.i08
Choi, S. W., Gibbons, L. E., & Crane, P. K. (2016). lordif: Logistic ordinal regression differential item functioning using IRT. https://CRAN.R-project.org/package=lordif
Chou, Y.-T., & Wang, W.-C. (2010). Checking Dimensionality in Item Response Models With Principal Component Analysis on Standardized Residuals. Educational and Psychological Measurement, 70(5), 717–731. https://doi.org/10.1177/0013164410379322
Christensen, K. B., Makransky, G., & Horton, M. (2017). Critical Values for Yen’s Q3: Identification of Local Dependence in the Rasch Model Using Residual Correlations. Applied Psychological Measurement, 41(3), 178–194. https://doi.org/10.1177/0146621616677520
Debelak, R., & Koller, I. (2019). Testing the local independence assumption of the Rasch model with Q3-based nonparametric model tests. Applied Psychological Measurement, 44. https://doi.org/10.1177/0146621619835501
Emons, W. H. M. (2008). Nonparametric Person-Fit Analysis of Polytomous Item Scores. Applied Psychological Measurement, 32(3), 224–247. https://doi.org/10.1177/0146621607302479
Hatzinger, R., & Rusch, T. (2009). IRT models with relaxed assumptions in eRm: A manual-like instruction. Psychology Science Quarterly, 51.
Henninger, M., Radek, J., Sengewald, M.-A., & Strobl, C. (2024). Partial credit trees meet the partial gamma coefficient for quantifying DIF and DSF in polytomous items. OSF. https://doi.org/10.31234/osf.io/47sah
Johansson, M., Preuter, M., Karlsson, S., Möllerberg, M.-L., Svensson, H., & Melin, J. (2023). Valid and reliable? Basic and expanded recommendations for psychometric reporting and quality assessment. https://doi.org/10.31219/osf.io/3htzc
Koller, I., Maier, M., & Hatzinger, R. (2015). An empirical power analysis of quasi-exact tests for the Rasch model: Measurement invariance in small samples. Methodology, 11. https://doi.org/10.1027/1614-2241/a000090
Komboz, B., Zeileis, A., & Strobl, C. (2018). Tree-based global model tests for polytomous Rasch models. Educational and Psychological Measurement, 78(1), 128–166. https://doi.org/10.1177/0013164416664394
Kreiner, S. (2011). A Note on Item–Restscore Association in Rasch Models. Applied Psychological Measurement, 35(7), 557–561. https://doi.org/10.1177/0146621611410227
Kreiner, S., & Christensen, K. B. (2004). Analysis of Local Dependence and Multidimensionality in Graphical Loglinear Rasch Models. Communications in Statistics - Theory and Methods, 33(6), 1239–1276. https://doi.org/10.1081/STA-120030148
Mair, P., & Hatzinger, R. (2007a). CML based estimation of extended Rasch models with the eRm package in R. Psychology Science, 49. https://doi.org/10.18637/jss.v020.i09
Mair, P., & Hatzinger, R. (2007b). Extended Rasch modeling: The eRm package for the application of IRT models in R. Journal of Statistical Software, 20. https://doi.org/10.18637/jss.v020.i09
Müller, M. (2020). Item fit statistics for Rasch analysis: Can we trust them? Journal of Statistical Distributions and Applications, 7(1), 5. https://doi.org/10.1186/s40488-020-00108-7
Rusch, T., Maier, M., & Hatzinger, R. (2013). Linear logistic models with relaxed assumptions in R. In B. Lausen, D. van den Poel, & A. Ultsch (Eds.), Algorithms from and for nature and life. Springer. https://doi.org/10.1007/978-3-319-00035-0_34
Strobl, C., Kopf, J., & Zeileis, A. (2015a). Rasch trees: A new method for detecting differential item functioning in the Rasch model. Psychometrika, 80(2), 289–316. https://doi.org/10.1007/s11336-013-9388-3
Strobl, C., Kopf, J., & Zeileis, A. (2015b). Rasch Trees: A New Method for Detecting Differential Item Functioning in the Rasch Model. Psychometrika, 80(2), 289–316. https://doi.org/10.1007/s11336-013-9388-3
Strobl, C., Schneider, L., Kopf, J., & Zeileis, A. (2021). Using the raschtree function for detecting differential item functioning in the Rasch model. 12.
Strobl, C., Wickelmaier, F., & Zeileis, A. (2011). Accounting for individual differences in Bradley-Terry models by means of recursive partitioning. Journal of Educational and Behavioral Statistics, 36(2), 135–153. https://doi.org/10.3102/1076998609359791
Tendeiro, J. N., Meijer, R. R., & Niessen, A. S. M. (2016). PerFit: An R package for person-fit analysis in IRT. Journal of Statistical Software, 74(5), 1–27. https://doi.org/10.18637/jss.v074.i05
Trepte, S., & Verbeet, M. (Eds.). (2010). Allgemeinbildung in Deutschland – Erkenntnisse aus dem SPIEGEL Studentenpisa-Test. VS Verlag.
Warm, T. A. (1989). Weighted likelihood estimation of ability in item response theory. Psychometrika, 54(3), 427–450. https://doi.org/10.1007/BF02294627
Wickelmaier, F., & Zeileis, A. (2018). Using recursive partitioning to account for parameter heterogeneity in multinomial processing tree models. Behavior Research Methods, 50(3), 1217–1233. https://doi.org/10.3758/s13428-017-0937-z
Wickham, H. (2007). Reshaping data with the reshape package. Journal of Statistical Software, 21(12). https://www.jstatsoft.org/v21/i12/
Wickham, H., Averick, M., Bryan, J., Chang, W., McGowan, L. D., François, R., Grolemund, G., Hayes, A., Henry, L., Hester, J., Kuhn, M., Pedersen, T. L., Miller, E., Bache, S. M., Müller, K., Ooms, J., Robinson, D., Seidel, D. P., Spinu, V., … Yutani, H. (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686. https://doi.org/10.21105/joss.01686
William Revelle. (2024). psych: Procedures for psychological, psychometric, and personality research. Northwestern University. https://CRAN.R-project.org/package=psych
Xie, Y. (2014). knitr: A comprehensive tool for reproducible research in R. In V. Stodden, F. Leisch, & R. D. Peng (Eds.), Implementing reproducible computational research. Chapman; Hall/CRC.
Xie, Y. (2015). Dynamic documents with R and knitr (2nd ed.). Chapman; Hall/CRC. https://yihui.org/knitr/
Xie, Y. (2024). knitr: A general-purpose package for dynamic report generation in R. https://yihui.org/knitr/
---title: "easyRasch vignette"subtitle: "R package for Rasch analysis"author: name: 'Magnus Johansson' affiliation: 'RISE Research Institutes of Sweden' affiliation-url: 'https://www.ri.se/en/what-we-do/expertises/category-based-measurements' orcid: '0000-0003-1669-592X'date: last-modifiedgoogle-scholar: truecitation: truewebsite: open-graph: image: "/RaschRvign_files/figure-html/unnamed-chunk-34-1.png"execute: cache: true warning: false message: falsebibliography: - references.bib - grateful-refs.bibcsl: apa.csleditor_options: chunk_output_type: console---This is an introduction to using the [`easyRasch` R package](https://pgmj.github.io/easyRasch/). A changelog for package updates is available [here](https://pgmj.github.io/easyRasch/news/index.html).::: {.callout-note icon="true"}## NoteThis packages was previously known as `RISEkbmRasch`.:::Details on package installation are available at the [package GitHub page](https://pgmj.github.io/easyRasch).If you are new to Rasch Measurement Theory, you may find this intro presentation useful:<https://pgmj.github.io/RaschIRTlecture/slides.html>This vignette will walk through a sample analysis using an open dataset with polytomous questionnaire data. This will include some data wrangling to structure the item data and itemlabels, then provide examples of the different functions. The full source code of this document can be found either [in this repository](https://github.com/pgmj/pgmj.github.io/blob/main/raschrvignette/RaschRvign.qmd) or by clicking on **\</\> CODE** at the top of this page. You should be able to use the source code "as is" and reproduce this document locally, as long as you have the required packages installed. This page and this website are built using the open source publishing tool [Quarto](https://www.quarto.org).One of the aims with this package is to simplify reproducible psychometric analysis to shed light on the measurement properties of a scale, questionnaire or test. 
In a paper recently made available as a preprint [@johansson], our [research group](https://www.ri.se/en/what-we-do/projects/center-for-categorically-based-measurements) propose that the basic aspects of a psychometric analysis should include information about:- Unidimensionality- Response categories- Invariance- Targeting- Measurement uncertainties (reliability)We'll include several ways to investigate these measurement properties, using Rasch Measurement Theory. There are also functions in the package less directly related to the criteria above, that will be demonstrated in this vignette.Please note that this is just a sample analysis to showcase the R package. It is not intended as a "best practice" psychometric analysis example.You can skip ahead to the Rasch analysis part in @sec-rasch if you are eager to look at the package output :)There is a separate GitHub repository containing a template R-project to simplify using the `easyRasch` package when conducting a reproducible Rasch analysis in R: <https://github.com/pgmj/RISEraschTemplate>## Getting startedSince the package is intended for use with Quarto, this vignette has also been created with Quarto. A "template" .qmd file [is available](https://github.com/pgmj/RISEraschTemplate/blob/main/analysis.qmd) that can be useful to have handy for copy&paste when running a new analysis. You can also download a complete copy of the Quarto/R code to produce this document [here](https://github.com/pgmj/pgmj.github.io/blob/main/raschrvignette/RaschRvign.qmd).Loading the `easyRasch` package will automatically load all the packages it depends on. 
However, it could be desirable to explicitly load all packages used, to simplify the automatic creation of citations for them, using the `grateful` package (see @sec-grateful).```{r}library(easyRasch) # devtools::install_github("pgmj/easyRasch", dependencies = TRUE)library(grateful)library(ggrepel)library(car)library(kableExtra)library(readxl)library(tidyverse)library(eRm)library(iarm)library(mirt)library(psych)library(ggplot2)library(psychotree)library(matrixStats)library(reshape)library(knitr)library(patchwork)library(formattable) library(glue)library(foreach)### some commands exist in multiple packages, here we define preferred ones that are frequently usedselect <- dplyr::selectcount <- dplyr::countrecode <- car::recoderename <- dplyr::rename```::: {.callout-note icon="true"}## NoteQuarto automatically adds links to R packages and functions throughout this document. However, this feature only works properly for packages available on [CRAN](https://cran.r-project.org/). Since the `easyRasch` package is not on CRAN the links related to functions starting with **RI** will not work.:::### Loading dataWe will use data from a recent paper investigating the "initial elevation effect" [@anvari2022], and focus on the 10 negative items from the PANAS. The data is available at the OSF website.```{r}df.all <-read_csv("https://osf.io/download/6fbr5/")# if you have issues with the link, please try downloading manually using the same URL as above# and read the file from your local drive.# subset items and demographic variablesdf <- df.all %>%select(starts_with("PANASD2_1"),starts_with("PANASD2_20"), age,Sex,Group) %>%select(!PANASD2_10_Active) %>%select(!PANASD2_1_Attentive)```The `glimpse()` function provides a quick overview of our dataframe.```{r}glimpse(df)```We have `r nrow(df)` rows, ie. respondents. All variables except Sex and Group are of class `dbl`, which means they are numeric and can have decimals. 
Integer (numeric with no decimals) would also be fine for our purposes. The two demographic variables currently of class `chr` (character) will need to be converted to factors (`fct`), and we will do that later on.(If you import a dataset where item variables are of class character, you will need to recode to numeric.)### ItemlabelsThen we set up the itemlabels dataframe. This could also be done using the free [LibreOffice Calc](https://www.libreoffice.org/download/download-libreoffice/) or MS Excel. Just make sure the file has the same structure, with two variables named `itemnr` and `item` that contain the item variable names and item description. The item variable names have to match the variable names in the item dataframe.```{r}itemlabels <- df %>%select(starts_with("PAN")) %>%names() %>%as_tibble() %>%separate(value, c(NA, "item"), sep ="_[0-9][0-9]_") %>%mutate(itemnr =paste0("PANAS_",c(11:20)), .before ="item")```The `itemlabels` dataframe looks like this.```{r}itemlabels```### DemographicsVariables for invariance tests such as Differential Item Functioning (DIF) need to be separated into vectors (ideally as factors with specified levels and labels) with the same length as the number of rows in the dataset. 
This means that any kind of removal of respondents/rows with missing data needs to be done before separating the DIF variables.We need to check how the `Sex` variable has been coded and which responses are present in the data.```{r}table(df$Sex)```Since there are only 5 respondents using labels outside of Female/Male (too few for meaningful statistical analysis), we will remove them to have a complete dataset for all variables in this example.```{r}df <- df %>%filter(Sex %in%c("Female","Male"))```Let's make the variable a factor (instead of class "character") and put in in a vector separate from the item dataframe.```{r}dif.sex <-factor(df$Sex)```And remove our DIF demographic variable from the item dataset.```{r}df$Sex <-NULL```We can now make use of a very simple function included in this package!```{r}RIdemographics(dif.sex, "Sex")```Let's move on to the age variable.```{r}glimpse(df$age)```Sometimes age is provided in categories, but here we have a numeric variable with age in years. Let's have a quick look at the age distribution using a histogram, and calculate mean, sd and range.```{r}### simpler version of the ggplot below using base R function hist()# hist(df$age, col = "#009ca6")# abline(v = mean(age, na.rm = TRUE))# # df %>% # summarise(Mean = round(mean(age, na.rm = T),1),# StDev = round(sd(age, na.rm = T),1)# )ggplot(df) +geom_histogram(aes(x = age), fill ="#009ca6",col ="black") +# add the average as a vertical linegeom_vline(xintercept =mean(df$age), linewidth =1.5,linetype =2,col ="orange") +# add a light grey field indicating the standard deviationannotate("rect", ymin =0, ymax =Inf, xmin = (mean(df$age, na.rm =TRUE) -sd(df$age, na.rm =TRUE)), xmax = (mean(df$age, na.rm =TRUE) +sd(df$age, na.rm =TRUE)), alpha = .2) +labs(title ="",x ="Age in years",y ="Number of respondents",caption =glue("Note. Mean age is {round(mean(df$age, na.rm = T),1)} years with a standard deviation of {round(sd(df$age, na.rm = T),1)}. 
Age range is {min(df$age)} to {max(df$age)}.") ) +theme(plot.caption =element_text(hjust =0, face ="italic"))```Age also needs to be a separate vector, and removed from the item dataframe.```{r}dif.age <- df$agedf$age <-NULL```There is also a grouping variable which needs to be converted to a factor.```{r}dif.group <-factor(df$Group)df$Group <-NULLRIdemographics(dif.group, "Group")```With only item data remaining in the dataframe, we can easily rename the items in the item dataframe. These names match the `itemlabels` variable `itemnr`.```{r}names(df) <- itemlabels$itemnr```Now we are all set for the psychometric analysis!## DescriptivesLet's familiarize ourselves with the data before diving into the analysis.### Missing dataFirst, we visualize the proportion of missing data on item level.```{r}RImissing(df)```No missing data in this dataset. If we had missing data, we could also use `RImissingP()` to look at which respondents have missing data and how much.### Overall responsesThis provides us with an overall picture of the data distribution. As a bonus, any oddities/mistakes in recoding the item data from categories to numbers will be clearly visible.```{r}RIallresp(df)```Most R packages for Rasch analysis require the lowest response category to be zero, which makes it necessary for us to recode our data, from using the range of 1-5 to 0-4.```{r}df <- df %>%mutate(across(everything(), ~ car::recode(.x, "1=0;2=1;3=2;4=3;5=4", as.factor = F)))# always check that your recoding worked as intended.RIallresp(df)```#### Floor/ceiling effectsNow, we can also look at the raw distribution of sum scores. 
The `RIrawdist()` function is a bit crude, since it requires responses in all response categories to accurately calculate max and min scores.```{r}RIrawdist(df)```We can see a floor effect with 11.8% of participants responding in the lowest category for all items.#### Guttman structureWhile not really necessary, it could be interesting to see whether the response patterns follow a Guttman-like structure. Items and persons are sorted based on lower-\>higher responses, and we should see the color move from yellow in the lower left corner to blue in the upper right corner.```{r}RIheatmap(df) +theme(axis.text.x =element_blank())```In this data, we see the floor effect on the left, with 11.8% of respondents all yellow, and a rather weak Guttman structure. This could also be due to a low variation in item locations/difficulties. Since we have a very large sample I added a `theme()` option to remove the x-axis text, which would anyway just be a blur of the `r nrow(df)` respondent row numbers. Each thin vertical slice in the figure is one respondent.### Item level descriptivesThere are many ways to look at the item level data, and we'll get them all together in the tab-panel below. The `RItileplot()` is probably most informative, since it provides the number of responses in each response category for each item. It is usually recommended to have at least \~10 responses in each category for psychometric analysis, no matter which methodology is used.Kudos to [Solomon Kurz](https://solomonkurz.netlify.app/blog/2021-05-11-yes-you-can-fit-an-exploratory-factor-analysis-with-lavaan/) for providing the idea and code on which the tile plot function is built!Most people will be familiar with the barplot, and this is probably most intuitive to understand the response distribution within each item. 
However, if there are many items it will take a while to review, and does not provide the same overview as a tileplot or stacked bars.```{r}#| column: margin#| code-fold: true#| echo: fenced# This code chunk creates a small table in the margin beside the panel-tabset output below, showing all items currently in the df dataframe.# The Quarto code chunk option "#| column: margin" is necessary for the layout to work as intended.RIlistItemsMargin(df, fontsize =13)```::: column-page-left::: panel-tabset#### Tile plot```{r}RItileplot(df)```While response patterns are skewed for all items, there are more than 10 responses in each category for all items which is helpful for the analysis.#### Stacked bars```{r}RIbarstack(df) +theme_minimal() +# theming is optional, see section 11 for more on thistheme_rise() ```#### Barplots```{r}#| layout-ncol: 2RIbarplot(df)```::::::## Rasch analysis 1 {#sec-rasch}The eRm package and Conditional Maximum Likelihood (CML) estimation will be used primarily, with the Partial Credit Model since this is polytomous data.This is also where the [five basic psychometric aspects](https://doi.org/10.31219/osf.io/3htzc) are good to recall.- Unidimensionality & local independence- Response categories- Invariance- Targeting- Measurement uncertainties (reliability)We will begin by looking at unidimensionality, response categories, and targeting in parallel below. For unidimensionality, we are mostly interested in item fit and residual correlations, as well as PCA of residuals and loadings on the first residual contrast. At the same time, disordered response categories can influence item fit to some extent (and vice versa), and knowledge about targeting can be useful if it is necessary to remove items due to residual correlations.When unidimensionality and response categories are found to work adequately, we will move on to invariance testing (Differential Item Functioning, DIF). 
It should be noted that DIF should be evaluated in parallel with all other psychometric aspects, but since it is a more complex issue it is kept in a separate section in this vignette (as is person fit). Finally, when/if invariance/DIF also looks acceptable, we can investigate reliability/measurement uncertainties.::: {.callout-note}In the tabset-panel below, each tab contains explanatory text, which is sometimes a bit lengthy. Remember to **scroll back up and click on all tabs**.:::```{r}#| column: margin#| echo: falseRIlistItemsMargin(df, fontsize =13)```::: panel-tabset### Conditional item fit```{r}simfit1 <-RIgetfit(df, iterations =400, cpu =8) # save simulation output to object `simfit1`RIitemfit(df, simfit1)````RIitemfit()` and `RIgetfit()` both work with both dichotomous and polytomous data (using the partial credit model) and automatically selects the model based on the data structure.It is important to note that the new (since version 0.2.2, released 2024-08-19) `RIitemfit()` function uses **conditional** outfit/infit, which is both robust to different sample sizes and makes ZSTD unnecessary [@muller_item_2020].Since the distribution of item fit statistics are not known, we need to use simulation to determine appropriate cutoff threshold values for the current sample and items. `RIitemfit()` can also use the simulation based cutoff values and use them for conditional highlighting of misfitting items. See the [blog post on simulation based cutoffs](https://pgmj.github.io/simcutoffs.html) for some more details on this. `RIitemfit()` can also be used without cutoffs and conditional highlighting. For a possibly useful rule-of-thumb cutoff for infit MSQ only, use the option `cutoff = "Smith98"`[@smith_using_1998;@muller_item_2020]. However, this cutoff is not applicable for all items, only for what can be expected for the *average* item fit. 
The simulation/bootstrap-based cutoff values will be more accurate for every item in your data.Briefly stated, the simulation uses the properties of the current sample and items, and simulates n iterations of data that fit the Rasch model to get an empirical distribution of item fit that we can use for comparison with the observed data. This is also known as "parametric bootstrapping".The simulation can take quite a bit of time to run if you have complex data/many items/many participants, and/or choose to use many iterations. Simulation experiments [@johansson_detecting_2025] indicate that 100-400 iterations should be a useful range, where smaller samples (n < 300) should use 100 iterations, and 200-400 is more appropriate when one has larger samples. Another important finding from simulation studies is that there is a large risk of false positive indication of misfit in sample larger than n = 1000 when using item infit or item-restscore. The recommended primary method for large samples is the bootstrapped item-restscore [@johansson_detecting_2025], as illustrated in an adjacent tab labeled "Item-restscore bootstrap" .For reference, the simulation above, using 10 items with 5 response categories each and 1851 respondents, takes about 24 seconds to run on 8 cpu cores (Macbook Pro M1 Max) for 400 iterations.I'll cite Ostini & Nering [-@ostini_polytomous_2006] on the description of outfit and infit (pages 86-87):> Response residuals can be summed over respondents to obtain an item fit measure. Generally, the accumulation is done with squared standardized residuals, which are then divided by the total number of respondents to obtain a mean square statistic. 
In this form, the statistic is referred to as an **unweighted mean square** (Masters & Wright, 1997; Wright & Masters, 1982) and has also come to be known as **“outfit”** (Smith, Schumacker, & Bush, 1998; Wu, 1997), perhaps because it is highly sensitive to outlier responses (Adams & Khoo, 1996; Smith et al., 1998; Wright & Masters, 1982).> A weighted version of this statistic was developed to counteract its sensitivity to outliers (Smith, 2000). In its weighted form, the squared standardized residual is multiplied by the observed response variance and then divided by the sum of the item response variances. This is sometimes referred to as an **information weighted mean square** and has become known as **“infit”** (Smith et al., 1998; Wu, 1997).A low item fit value (sometimes referred to as an item "overfitting" the Rasch model) indicates that responses are too predictable and provide little information. This is often the case for items that are very general/broad in scope in relation to the latent variable. You will often find overfitting items to also have residual correlations with other items.A high item fit value (sometimes referred to as "underfitting" the Rasch model) can indicate several things, often multidimensionality or a question that is difficult to interpret and thus has noisy response data. The latter could for instance be caused by a question that asks about two things at the same time or is ambiguous for other reasons.::: {.callout-note}Remember to **scroll back up and click on all tabs**.:::### Item-restscoreThis is another useful function from the `iarm` package. It shows the expected and observed correlation between an item and a score based on the rest of the items [@kreiner_note_2011]. Similarly, but inverted, to item fit, a lower observed correlation value than expected indicates underfit, that the item may not belong to the dimension. A higher than expected observed value indicates an overfitting and possibly redundant item. 
Overfitting items will often also show issues with residual correlations. Both of these problems can often be (at least partially) resolved by removing underfitting items.

```{r}
RIrestscore(df)
```

### Item-restscore bootstrap

Both item-restscore and conditional item infit/outfit will indicate "false misfit" when sample sizes are large (even when using simulation/bootstrap based cutoff values). This behavior can occur from about n = 500, and certainly will occur at samples of 800 and above [@johansson_detecting_2025]. This "false misfit" is caused by truly misfitting items, which underlines the importance of removing one item at a time when one finds issues with misfit/multidimensionality. However, a useful way to get additional information about the probability of actual misfit is to use non-parametric bootstrapping. This function resamples with replacement from your response data and reports the percentage and type of misfit indicated by the item-restscore function. You will also get information about conditional MSQ infit (based on the full sample, using complete responders). Simulation studies indicate that a sample size of 800 results in 95+% detection rate of 1-3 misfitting items amongst 20 dichotomous items [@johansson_detecting_2025].

```{r}
RIbootRestscore(df, iterations = 250, samplesize = 800)
```

### Conditional item characteristic curves

The [`iarm`](https://cran.r-project.org/web/packages/iarm/index.html) package [@mueller_iarm_2022] provides several interesting functions for assessing item fit, DIF and other things. Some of these functions may be included in a future version of the `easyRasch` package.
Below are conditional item characteristic curves (ICCs) using the estimated theta (factor score).

These curves indicate item fit on a group level, where respondents are split into "class intervals" based on their sum score/factor score.

```{r}
ICCplot(as.data.frame(df), 
        itemnumber = 1:4, 
        method = "cut", 
        cinumber = 6, # number of class intervals to split respondents into
        itemdescrip = c("PANAS_11", "PANAS_12", "PANAS_13", "PANAS_14"))
```

A similar, but even more informative and flexible, visualization has been made available in the [`RASCHplot`](https://github.com/ERRTG/RASCHplot/) package [@buchardt_visualizing_2023], which needs to be installed from GitHub (see code below). The linked paper is recommended reading, not least for descriptions of the useful options available. Below are some sample plots showing conditional ICCs using the raw sum score.

```{r}
library(RASCHplot) # devtools::install_github("ERRTG/RASCHplot")

CICCplot(PCM(df),
         which.item = c(1:4),
         lower.groups = c(0, 7, 14, 21, 28),
         grid.items = TRUE)
```

### PCA of residuals

Principal Component Analysis of Rasch model residuals.

```{r}
RIpcmPCA(df)
```

Based on an old rule-of-thumb, the first eigenvalue should be below 1.5 to support unidimensionality [@smith_detecting_2002]. However, as with many other metrics, this is dependent on sample size and test length [@chou_checking_2010] and needs to be used together with checking item fit (or item-restscore) and residual correlations (local dependence). I personally find this metric less useful and prefer to use the Conditional Likelihood Ratio Test as a global test of model fit. The PCA eigenvalues are only included here for those coming from Winsteps who might be looking for them.
Speaking of Winsteps, the "explained variance" will not be comparable to the corresponding Winsteps metric, since this one only shows the results from the analysis of residuals and not the explained variance of the Rasch model itself.

### Conditional LRT

The Conditional Likelihood Ratio Test [LRT, @andersen_goodness_1973] is a global test of fit and can be a useful addition to more informative item-level metrics [@johansson_detecting_2025].

### Residual correlations

In order to support unidimensionality, items should only be related to each other through the latent variable. This is called "local independence". By investigating patterns in model residuals, we can determine whether items are independent or not.

Similarly to item fit, we need to run simulations to get a useful cutoff threshold value for when residual correlations amongst item pairs are larger than would be expected from items that fit a unidimensional Rasch model [@christensen2017].

The simulation/bootstrap procedure can take some time to run, depending on the complexity of your data, but it is necessary to set the appropriate cutoff value.

```{r}
simcor1 <- RIgetResidCor(df, iterations = 400, cpu = 8)
RIresidcorr(df, cutoff = simcor1$p99)
```

The matrix above shows item-pair correlations of item residuals, with highlights in red showing correlations crossing the threshold compared to the average item-pair correlation (for all item-pairs) [@christensen2017]. Rasch model residual correlations (Yen's Q3) are calculated using the [mirt](https://cran.r-project.org/web/packages/mirt/index.html) package.

### Partial gamma LD

Another way to assess local (in)dependence is by partial gamma coefficients [@kreiner_analysis_2004]. This is also a function from the `iarm` package. See `?iarm::partgam_LD` for details.

```{r}
RIpartgamLD(df)
```

### 1st contrast loadings

```{r}
RIloadLoc(df)
```

Here we see item locations and their loadings on the first residual contrast.
This figure can be helpful to identify clusters in data or multidimensionality.

### Analysis of response categories

The `xlims` setting changes the x-axis limits for the plots. The default values usually make sense, and we mostly add this option to point out the possibility of doing so. You can also choose to show plots for specific items only.

```{r}
#| layout-ncol: 2
RIitemCats(df, xlims = c(-5, 5))
```

Each response category for each item should have a curve that indicates it to be the most probable response at some point on the latent variable (x axis in the figure).

### Response categories MIRT

For a more compact figure.

```{r}
mirt(df, model = 1, itemtype = 'Rasch', verbose = FALSE) %>% 
  plot(type = "trace", as.table = TRUE, 
       theta_lim = c(-5, 5)) # changes x axis limits
```

### Targeting

```{r}
#| fig-height: 7
# increase fig-height in the chunk option above if you have many items
RItargeting(df, xlim = c(-5, 4)) # xlim defaults to c(-4,4) if you omit this option
```

This figure shows how well the items fit the respondents/persons. It is a sort of [Wright Map](https://www.rasch.org/rmt/rmt253b.htm) that shows person locations and item threshold locations on the same logit scale.

The top part shows a histogram of person locations, the middle part an inverted histogram of item threshold locations, and the bottom part shows individual item threshold locations. The histograms also show means and standard deviations.

### Item hierarchy

Here the items are sorted on their average threshold location (black diamonds). 84% confidence intervals are shown around each item threshold location. For further details, see the caption text below the figure.

The numbers displayed in the plot can be disabled using the option `numbers = FALSE`.

```{r}
#| fig-height: 6
RIitemHierarchy(df)
```
:::

### Analysis 1 comments

Item fit shows a lot of issues.

Item 18 has issues with the second lowest category being disordered.
Several other items have very short distances between thresholds 1 and 2, which is also clearly seen in the Item Hierarchy figure above.

Two item-pairs show residual correlations far above the cutoff value:

- 15 and 16 (scared and afraid)
- 17 and 18 (ashamed and guilty)

Since item 15 also has a residual correlation with item 19, we will remove it. In the second pair, item 18 will be removed since it also has problems with disordered response categories.

::: {.callout-note icon="true"}
We have multiple "diagnostics" to review when deciding which item to remove if there are strong residual correlations between two items. Here is a list of commonly used criteria:

- item fit
- item threshold locations compared to sample locations (targeting)
- ordering of response categories
- DIF
- and whether there are residual correlations between one item and multiple other items
:::

```{r}
removed.items <- c("PANAS_15", "PANAS_18")

df_backup <- df

df <- df_backup %>% 
  select(!any_of(removed.items))
```

As seen in the code above, I chose to keep a backup copy of the dataframe (`df_backup`) before omitting the removed items.
This can be useful if, at a later stage in the analysis, I want to be able to quickly "go back" and reinstate an item or undo any other change I have made.

## Rasch analysis 2

With items 15 and 18 removed.

```{r}
#| column: margin
#| echo: false
RIlistItemsMargin(df, fontsize = 13)
```

::: panel-tabset
### Item-restscore bootstrap
```{r}
RIbootRestscore(df, iterations = 250, samplesize = 800)
```
### Conditional item fit
```{r}
simfit2 <- RIgetfit(df, iterations = 400, cpu = 8)
RIitemfit(df, simcut = simfit2)
```
### PCA of residuals
```{r}
RIpcmPCA(df)
```
### Residual correlations
```{r}
simcor2 <- RIgetResidCor(df, iterations = 400, cpu = 8)
RIresidcorr(df, cutoff = simcor2$p99)
```
### 1st contrast loadings
```{r}
RIloadLoc(df)
```
### Targeting
```{r}
#| fig-height: 5
RItargeting(df, xlim = c(-4, 4), bins = 45)
```
### Item hierarchy
```{r}
#| fig-height: 5
RIitemHierarchy(df)
```
:::

### Analysis 2 comments

Items 16 & 19, and 12 & 14 show problematic residual correlations.

Let's look at DIF before taking action upon this information. While we are keeping DIF as a separate section in this vignette, it is recommended to include DIF analysis in the `panel-tabset` above (on item fit, PCA, residual correlations, etc).

## DIF - differential item functioning

We'll be looking at whether item (threshold) locations are stable between demographic subgroups.

There are several DIF analysis tools available. The first one uses the package `psychotree`, which relies on statistical significance at p < .05 as an indicator for DIF. This is a criterion that is highly sample size sensitive, and we are always interested in the size/magnitude of DIF as well, since that will inform us about the impact of DIF on the estimated latent variable. The structure of DIF is also an important and complex aspect, particularly for polytomous data. Uniform DIF means that the DIF is similar across the latent continuum. We can test this in R using the `lordif` package, as demonstrated in @sec-lordif.
However, it should be noted that the `lordif` package does not provide an option to use Rasch models, and there may be results that are caused by also allowing the discrimination parameter to vary across items.

A recent preprint [@henninger_partial_2024] does a great job illustrating "differential step functioning" (DSF), which is when item threshold locations in polytomous data show varying levels of DIF. It also describes a forthcoming development of the `psychotree` package where one can use DIF effect size and purification functions to evaluate DIF/DSF. When the updated package is available, I will work to implement these new functions into the `easyRasch` package as well.

::: {.callout-note icon="true"}
It is important to ensure that no cells in the data are empty for subgroups when conducting a DIF analysis. Split the data using the DIF-variable and create separate tileplots to review the response distribution in the DIF-groups.
:::

```{r}
#| fig-height: 5
difPlots <- df %>% # save the output into the `difPlots` object
  add_column(gender = dif.sex) %>% # add the DIF variable to the dataframe
  split(.$gender) %>% # split the data using the DIF variable
  map(~ RItileplot(.x %>% dplyr::select(!gender)) + labs(title = .x$gender)) # create separate tileplots for each group

difPlots$Female + difPlots$Male # the actual name of the plots (in this case Male/Female) will be determined by the factor labels
```

### Sex

```{r}
#| column: margin
#| echo: false
RIlistItemsMargin(df, fontsize = 13)
```

::: panel-tabset
#### Table
```{r}
#| fig-height: 3
RIdifTable(df, dif.sex)
```
#### Figure items
```{r}
RIdifFigure(df, dif.sex)
```
#### Figure thresholds
```{r}
RIdifFigThresh(df, dif.sex)
```
:::

While no item shows problematic levels of DIF regarding item location, as shown by the table, there is an interesting pattern in the thresholds figure. The lowest threshold seems to be slightly lower for node 3 (Male) for all items.
Also, item 11 shows a much wider spread of item locations for node 3 compared to node 2.

The results do not require any action since the difference is small.

### Age

The `psychotree` package uses a model-based recursive partitioning that is particularly useful when you have a continuous variable such as age in years and a large enough sample. It will test different ways to partition the age variable to determine potential group differences [@strobl2015; @strobl2021].

```{r}
RIdifTable(df, dif.age)
```

No DIF found for age.

### Group

```{r}
RIdifTable(df, dif.group)
```

And no DIF for group.

### Sex and age

The `psychotree` package also allows for DIF interaction analysis with multiple DIF variables. We can use `RIdifTable2()` to input two DIF variables.

```{r}
RIdifTable2(df, dif.sex, dif.age)
```

No interaction effect found for sex and age. The analysis only shows the previously identified DIF for sex.

### LRT-based DIF {#sec-diflrt}

We'll use the group variable as an example. First, we can simply run the test to get the overall result.

```{r}
erm.out <- PCM(df)
LRtest(erm.out, splitcr = dif.group)
```

Review the documentation for further details, using `?LRtest` in your R console panel in RStudio. There is also a plotting function, `plotGOF()`, that may be of interest.

```{r}
#| column: margin
#| echo: false
RIlistItemsMargin(df, fontsize = 13)
```

::: panel-tabset
#### Item location table
```{r}
RIdifTableLR(df, dif.group)
```
#### Item location figure
```{r}
#| fig-height: 7
RIdifFigureLR(df, dif.group) + theme_rise()
```
#### Item threshold table
```{r}
RIdifThreshTblLR(df, dif.group)
```
#### Item threshold figure
```{r}
#| fig-height: 7
RIdifThreshFigLR(df, dif.group) + theme_rise()
```
:::

The item threshold table shows that the top threshold for item 13 differs more than 0.5 logits between groups.
In this set of 8 items with 4 thresholds each, this is unlikely to result in problematic differences in estimated person scores.

### Logistic Ordinal Regression DIF {#sec-lordif}

The `lordif` package [@choi_lordif_2011] does not use a Rasch measurement model; it only offers a choice between the Graded Response Model (GRM) and the Generalized Partial Credit Model (GPCM). Both of these are 2PL models, meaning that they estimate a discrimination parameter for each item in addition to the item threshold parameters. `lordif` relies on the `mirt` package.

There are several nice features available in the `lordif` package. First, we get a χ² test of uniform or non-uniform DIF. Second, there are three possible methods/criteria for flagging items with potential DIF. One of these uses a likelihood ratio (LR) χ² test, while the other two are indicators of DIF size/magnitude, either using a pseudo R² statistic ("McFadden", "Nagelkerke", or "CoxSnell") or a Beta criterion. For further details, see `?lordif` in your R console or the paper describing the package [@choi_lordif_2011].

Below is some sample code to get you started with `lordif`.

```{r}
#| results: hide
library(lordif)

g_dif <- lordif(as.data.frame(df), as.numeric(dif.sex), # make sure that the data is in a dataframe-object and that the DIF variable is numeric
                criterion = c("Chisqr"), 
                alpha = 0.01, 
                beta.change = 0.1,
                model = "GPCM",
                R2.change = 0.02)

g_dif_sum <- summary(g_dif)
```

```{r}
# threshold values for colorizing the table below
alpha <- 0.01
beta.change <- 0.1
R2.change <- 0.02

g_dif_sum$stats %>% 
  as.data.frame() %>% 
  select(!all_of(c("item", "df12", "df13", "df23"))) %>% 
  round(3) %>% 
  add_column(itemnr = names(df), .before = "ncat") %>% 
  mutate(across(c(chi12, chi13, chi23), 
                ~ cell_spec(.x, color = case_when(.x < alpha ~ "red",
                                                  TRUE ~ "black")))) %>% 
  mutate(across(starts_with("pseudo"), 
                ~ cell_spec(.x, color = case_when(.x > R2.change ~ "red",
                                                  TRUE ~ "black")))) %>% 
  mutate(beta12 = cell_spec(beta12, 
                            color = case_when(beta12 > beta.change ~ "red",
                                              TRUE ~ "black"))) %>% 
  kbl_rise()
```

We can review the results regarding uniform/non-uniform DIF by looking at the `chi*` columns. Uniform DIF is indicated by column `chi12` and non-uniform DIF by `chi23`, while column `chi13` represents an overall test of "total DIF effect" [@choi_lordif_2011].

While the table indicates significant χ² tests for items 11 and 17, the magnitude estimates are low for these items.

There are some plots available as well, using the base R `plot()` function. For some reason the plots won't render in this Quarto document, so I will try to sort that out at some point.

```{r}
#| layout-ncol: 2
plot(g_dif) # use option `graphics.off()` to get the plots rendered one by one
#plot(g_dif, graphics.off())
```

### Partial gamma DIF

The `iarm` package provides a function to assess DIF by partial gamma [@bjorner_differential_1998]. It should be noted that this function only shows a single partial gamma value per item, so if you have more than two groups in your comparison, you will want to also use other methods to understand your results better.

There are some recommended cutoff values mentioned in the paper above:

No or negligible DIF:

- Gamma within the interval -0.21 to 0.21, *or*
- Gamma not significantly different from 0

Slight to moderate DIF:

- Gamma within the interval -0.31 to 0.31 (and outside -0.21 to 0.21), *or*
- not significantly outside the interval -0.21 to 0.21

Moderate to large DIF:

- Gamma outside the interval -0.31 to 0.31, **and**
- significantly outside the interval -0.21 to 0.21

```{r}
RIpartgamDIF(df, dif.sex)
```

We can see "slight" DIF for item 17, with a statistically significant gamma of .23.

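
The cutoff rules listed above can be written out as a small helper, which may be handy when there are many items to review. This is only a sketch: `classify_gamma_dif()` is a hypothetical function (not part of `iarm` or `easyRasch`), and it assumes that "significantly" is judged by whether a confidence interval for gamma, supplied as `lower` and `upper`, excludes the value in question. The confidence limits in the example calls are made-up numbers.

```r
# Hypothetical helper: classify DIF magnitude from a partial gamma estimate
# and the lower/upper limits of its confidence interval.
classify_gamma_dif <- function(gamma, lower, upper) {
  # CI covers 0: gamma is not significantly different from 0
  nonsig_zero <- lower <= 0 & upper >= 0
  # whole CI lies outside the interval -0.21 to 0.21
  sig_outside_21 <- lower > 0.21 | upper < -0.21

  ifelse(abs(gamma) <= 0.21 | nonsig_zero,
         "no or negligible",
         ifelse(abs(gamma) > 0.31 & sig_outside_21,
                "moderate to large",
                "slight to moderate"))
}

# Made-up confidence limits, for illustration only
classify_gamma_dif(gamma = 0.23, lower = 0.05, upper = 0.41)
classify_gamma_dif(gamma = c(0.10, 0.45, -0.50),
                   lower = c(0.02, 0.30, -0.70),
                   upper = c(0.18, 0.60, -0.30))
```

The function is vectorized, so it can be applied to columns of gamma estimates and confidence limits in one call.
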
## Rasch analysis 3

While there were no significant issues with DIF for any item/subgroup combination, we need to address the previously identified problem:

- Items 16 and 19 have the largest residual correlation.

We'll remove item 19 since item 16 has better targeting.

```{r}
removed.items <- c(removed.items, "PANAS_19")

df_backup2 <- df

df <- df_backup2 %>% 
  select(!any_of(removed.items))
```

```{r}
#| column: margin
#| echo: false
RIlistItemsMargin(df, fontsize = 13)
```

::: panel-tabset
### Item-restscore bootstrap
```{r}
RIbootRestscore(df, iterations = 250, samplesize = 800)
```
### Item fit
```{r}
simfit3 <- RIgetfit(df, iterations = 400, cpu = 8)
RIitemfit(df, simfit3)
```
### CICC
```{r}
CICCplot(PCM(df),
         which.item = c(1:3, 7),
         lower.groups = c(0, 7, 14, 21, 28),
         grid.items = TRUE)
```
### Residual correlations
```{r}
simcor3 <- RIgetResidCor(df, iterations = 400, cpu = 8)
RIresidcorr(df, cutoff = simcor3$p99)
```
### Targeting
```{r}
#| fig-height: 5
RItargeting(df, bins = 45)
```
### Item hierarchy
```{r}
#| fig-height: 5
RIitemHierarchy(df)
```
:::

### Analysis 3 comments

No problematic residual correlations remaining. Several items show misfit, but we will end this sample analysis here and move on to show other functions.

There are several item thresholds that are very closely located, as shown in the item hierarchy figure. This is not ideal, since it will inflate reliability estimates. However, we will not modify the response categories for this analysis; we only note that this is not workable and should be dealt with by trying variations of merged response categories to achieve better separation of threshold locations without disordering.

## Reliability

```{r}
#| fig-height: 6
RItif(df)
```

The figure above shows the Test Information Function (TIF), which indicates the reliability of all items making up the test/scale (not the reliability of the sample).

The default cutoff value used in `RItif()` is TIF = 3.33, which corresponds to person separation index (PSI) = 0.7.
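
The correspondence between TIF = 3.33 and PSI = 0.7 can be checked with a couple of lines of R. The sketch below assumes the common conversion where the standard error is `1 / sqrt(TIF)` and the reliability-type coefficient is `1 - SE^2`, with the latent variable taken to have unit variance; I have not verified that `RItif()` uses exactly this formula internally, and the function names are made up for illustration.

```r
# Hypothetical helpers, assuming PSI = 1 - 1/TIF (unit latent variance)
tif_to_sem <- function(tif) 1 / sqrt(tif) # standard error at a given TIF
tif_to_psi <- function(tif) 1 - 1 / tif   # same as 1 - tif_to_sem(tif)^2
psi_to_tif <- function(psi) 1 / (1 - psi) # inverse conversion

round(tif_to_psi(3.33), 2) # the default cutoff
psi_to_tif(0.8)            # TIF needed for a stricter reliability target
tif_to_psi(1)              # lower end of the TIF range
```

Under this assumption, the inverse conversion gives a value to pass to the `cutoff` option when a reliability target other than 0.7 is wanted.
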
PSI is similar to reliability coefficients such as omega and alpha, ranging from 0 to 1. You can change the TIF cutoff by using the option `cutoff`, for instance `cutoff = 2.5` (TIF values range from 1 and up).

While 11.8% of respondents had a floor effect based on the raw sum scored data, the figure above shows us that 41.8% are located below the point where the items produce a PSI of 0.7 or higher. Again, note that this figure shows the reliability of the test/scale, not the sample. If you want to add the sample reliability, use option `samplePSI = TRUE`. More details are available in the documentation `?RItif`.

## Person fit

We can also look at how the respondents fit the Rasch model with these items. By default, `RIpfit()` outputs a histogram and a hex heatmap with the person infit ZSTD statistic, using +/- 1.96 as cutoff values. This is currently the only person fit method implemented in the `easyRasch` package, and the curious analyst is encouraged to look at the package [PerFit](https://www.rdocumentation.org/packages/PerFit/versions/1.4.6/topics/PerFit-package) for more tools.

```{r}
RIpfit(df)
```

You can export the person fit values to a new variable in the dataframe by specifying `output = "dataframe"`, or if you just want the row numbers for respondents with deviant infit values, `output = "rowid"`.

You can also specify a grouping variable to visualize the person fit for different groups.

```{r}
RIpfit(df, group = dif.sex, output = "heatmap")
```

Person fit is a useful way to identify respondents with unexpected response patterns and investigate this further.

### `PerFit` sample code

While none of the functions in the `PerFit` package have been implemented in `easyRasch`, this is some code to get you started if you are interested in using it.
There are multiple methods/functions available for polytomous and dichotomous data, see the package [documentation](https://www.rdocumentation.org/packages/PerFit/versions/1.4.6/topics/PerFit-package).

For this example, we'll use the non-parametric U3 statistic generalized to polytomous items [@emons_nonparametric_2008].

::: panel-tabset
#### U3poly
```{r}
library(PerFit)
pfit_u3poly <- U3poly(matrix = df, 
                      Ncat = 5, # make sure to input number of response categories, not thresholds
                      IRT.PModel = "PCM")
```
#### Cutoff information
```{r}
cutoff(pfit_u3poly)
```
#### Flagged respondents
```{r}
flagged.resp(pfit_u3poly) %>% 
  pluck("Scores") %>% 
  as.data.frame() %>% 
  arrange(desc(PFscores))
```
:::

The dataframe shown under the tab `Flagged respondents` above contains a variable named `FlaggedID` which represents the row IDs. This variable is useful if one wants to filter out respondents with deviant response patterns (person misfit). There are indications that persons with misfit may affect results of Andersen's LR-test for DIF [@artner_simulation_2016].

### Item fit without aberrant responses

We can remove the misfitting persons to see how that affects item fit. Let's also compare with the misfitting respondents identified by `RIpfit()`.

```{r}
misfits <- flagged.resp(pfit_u3poly) %>% 
  pluck("Scores") %>% 
  as.data.frame() %>% 
  pull(FlaggedID)

misfits2 <- RIpfit(df, output = "rowid")
```

::: panel-tabset
#### All respondents
```{r}
RIitemfit(df, simcut = simfit3)
```
#### U3 misfit removed
```{r}
RIitemfit(df[-misfits,], simcut = simfit3)
```
#### ZSTD misfit removed
```{r}
RIitemfit(df[-misfits2,], simcut = simfit3)
```
:::

## Item parameters

To allow others (and oneself) to use the estimated item parameters for estimation of person locations/thetas, we should make the item parameters available. The function will also write a csv-file with the item threshold locations. Estimations of person locations/thetas can be done with the `thetaEst()` function from the `catR` package.
This is implemented in the function `RIestThetasOLD()`, see below for details.

First, we'll output the parameters into a table.

```{r}
RIitemparams(df)
```

The parameters can also be output to a dataframe or a file, using the option `output = "dataframe"` or `output = "file"`.

## Ordinal sum score to interval score

This table shows the corresponding "raw" ordinal sum score values and logit scores, with standard errors for each logit value. Interval scores are estimated using weighted likelihood (WL) based on a simulated dataset using the item parameters estimated from the input dataset. The choice of WL as default is due to the lower bias compared to ML estimation [@warm1989].

(An option will hopefully be added at some point to create this table based on only item parameters.)

```{r}
RIscoreSE(df)
```

### Ordinal/interval figure

The figure below can also be generated to illustrate the relationship between ordinal sum score and logit interval score. The error bars by default show the standard error at each point, multiplied by 1.96.

```{r}
RIscoreSE(df, output = "figure")
```

### Estimating interval level person scores

Based on the Rasch analysis output of item parameters, we can estimate each individual's location or score (also known as "theta"). `RIestThetas()` by default uses WLE estimation based on item parameters from a partial credit model (PCM) and outputs a dataframe with person locations (WLE) and measurement error (SEM) on the logit scale.

```{r}
thetas <- RIestThetas(df)

head(thetas)
```

Each individual has a standard error of measurement (SEM) associated with their estimated location/score. This is included in the output of the `RIestThetas()` function as the `SEM` variable, as seen above.
We can review the distribution of measurement error with a figure.

We can take a look at the distribution of person locations (thetas) using a histogram.

```{r}
hist(thetas$WLE, 
     col = "#009ca6", 
     main = "Histogram of person locations (thetas)", 
     breaks = 20)
```

`RIestThetasOLD()` can be used with a pre-specified item (threshold) location matrix. The choice of WL as default is due to the lower bias compared to ML estimation [@warm1989]. Similarly to `RIscoreSE()` you can (and may indeed need to) change the range of logit scores, using the option `theta_range`. The default is `c(-7,7)`, which should hopefully work in most circumstances.

If you would like to use an existing item threshold location matrix, this code may be helpful:

```{r}
itemParameters <- read_csv("itemParameters.csv") %>% 
  as.matrix()
itemParameters
```

As you can see, this is a matrix object (not a dataframe), with each item as a row, and the threshold locations as columns.

## Figure design

Most of the figures created by the functions can be styled (colors, fonts, etc) by adding theme settings to them. You can use the standard ggplot function `theme()` and related theme-functions. As usual it is possible to "stack" theme functions, as seen in the example below.

You can also change coloring, axis limits/breaks, etc, just by adding ggplot options with a `+` sign.

A custom theme function, `theme_rise()`, is included in the `easyRasch` package. It might be easier to use if you are not familiar with `theme()`.

For instance, you might like to change the font to "Lato" for the item hierarchy figure, and make the background transparent.

```{r}
RIitemHierarchy(df) +
  theme_minimal() + # first apply the minimal theme to make the background transparent
  theme_rise(fontfamily = "Lato") # then apply theme_rise, which simplifies making changes to all plot elements
```

As of package version 0.1.30.0, the `RItargeting()` function allows more flexibility in styling too, by having an option to return a list object with the three separate plots.
See the [NEWS](https://github.com/pgmj/easyRasch/blob/main/NEWS.md#01300) file for more details. Since the `RItargeting()` function uses the `patchwork` library to combine plots, you can also make use of [the many functions that `patchwork` includes](https://patchwork.data-imaginist.com/articles/patchwork.html). For instance, you can set a title with a specific theme:

``` {r}
RItargeting(df) + plot_annotation(title = "Targeting", theme = theme_rise(fontfamily = "Arial"))
```

In order to change font for text *inside* plots (such as "t1" for thresholds) you will need to add an additional line of code.

``` r
update_geom_defaults("text", list(family = "Lato"))
```

Please note that the line of code above updates the default settings for `geom_text()` for the whole session. Also, some functions, such as `RIloadLoc()`, make use of `geom_text_repel()`, for which you would need to change the code above from "text" to "text_repel".

A simple way to only change font family and font size would be to use `theme_minimal(base_family = "Calibri", base_size = 14)`. Please see the [reference page](https://ggplot2.tidyverse.org/reference/ggtheme.html) for default ggplot themes for alternatives to `theme_minimal()`.

## Software used {#sec-grateful}

The `grateful` package is a nice way to give credit to the packages used in making the analysis.
The package can create both a bibliography file and a table object, which is handy for automatically creating a reference list based on the packages used (or at least explicitly loaded).

```{r}
library(grateful)
pkgs <- cite_packages(cite.tidyverse = TRUE, 
                      output = "table",
                      bib.file = "grateful-refs.bib",
                      include.RStudio = TRUE,
                      out.dir = getwd())
# If kbl() is used to generate this table, the references will not be added to the Reference list.
formattable(pkgs, 
            table.attr = 'class=\"table table-striped\" style="font-size: 13px; font-family: Lato; width: 80%"')
```

## Additional credits

Thanks to my [colleagues at RISE](https://www.ri.se/en/what-we-do/projects/center-for-category-based-measurements) for providing feedback and testing the package on Windows and macOS platforms. Also, thanks to [Mike Linacre](https://www.winsteps.com/linacre.htm) and [Jeanette Melin](https://www.ri.se/en/person/jeanette-melin) for providing useful feedback to improve this vignette.

## Session info

```{r}
sessionInfo()
```

## References