Note: The RISEkbmRasch R package is now known as easyRasch.

Conditional Likelihood Ratio Test and sample size

False positive rate simulation tests

Author: Magnus Johansson
Affiliation: RISE Research Institutes of Sweden
Published: 2025-02-07

1 Introduction

In my recent preprint on detection of item misfit in Rasch models (Johansson 2025a), the conditional likelihood ratio test (LRT, Andersen 1973) was part of one of the simulation studies. In that study, only the detection rate of misfitting items was assessed. In this brief note, the false positive rate of the LRT across varying sample sizes is investigated.

2 Method

For simplicity, the simulated dataset (https://github.com/pgmj/rasch_itemfit/blob/main/data/simdata10000.rds) from the previously mentioned preprint is re-used, with the misfitting item removed. This leaves 19 dichotomous items, all simulated to fit the Rasch model. The easyRasch package (Johansson 2025b) contains a function, RIbootLRT(), that uses non-parametric bootstrap with the LRT. The code to do so is presented below. As a comparison, a subset of 10 items from the same data is used to evaluate the impact of the number of items on the false positive rate of the LRT.

RIbootLRT(simdata[[1]][,-9], iterations = 5000, samplesize = 300, cpu = 8)

The sample size was varied across conditions (250, 500, 1000, 1500, and 2000, see Figure 1), and each condition used 5000 bootstrap iterations.
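To make the procedure more concrete, here is a minimal sketch of the general idea: simulate data that fit the Rasch model, draw bootstrap samples of a given size with replacement, run Andersen's LRT on each sample, and record the share of significant results. This is not the code behind RIbootLRT(). The helper names `sim_rasch()` and `boot_lrt_rate()`, the item locations, the median split, and the small number of iterations are all assumptions made for illustration, and the LRT step relies on the eRm package being installed.

```r
# Illustrative sketch only; all parameter values are assumptions,
# not those used in the preprint or in easyRasch.
set.seed(1)

# Simulate dichotomous responses that fit the Rasch model (base R only).
sim_rasch <- function(n, item_locations) {
  theta <- rnorm(n)                             # person locations
  eta <- outer(theta, item_locations, "-")      # theta - b for each person/item pair
  p <- plogis(eta)                              # Rasch probability of a correct response
  x <- matrix(rbinom(length(p), 1, p), nrow = n)
  colnames(x) <- paste0("q", seq_along(item_locations))
  as.data.frame(x)
}

population <- sim_rasch(n = 10000, item_locations = seq(-2, 2, length.out = 19))

# Share of bootstrap samples in which the LRT is significant at level alpha.
boot_lrt_rate <- function(data, samplesize, iterations, alpha = 0.05) {
  pvals <- replicate(iterations, {
    d <- data[sample(nrow(data), samplesize, replace = TRUE), ]
    eRm::LRtest(eRm::RM(d), splitcr = "median")$pvalue
  })
  mean(pvals < alpha)
}

# Only run the LRT part when eRm is available; 20 iterations keeps it fast.
if (requireNamespace("eRm", quietly = TRUE)) {
  rate <- boot_lrt_rate(population, samplesize = 250, iterations = 20)
  print(rate)
}
```

With only 20 iterations the resulting rate is far too noisy to interpret. The sketch is meant to show the structure of the procedure, not to reproduce the numbers reported in this note.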

3 Results

Results are presented in Figure 1.

library(dplyr)
library(ggplot2)
# False positive rates (%) from the bootstrap runs,
# by sample size (n) and number of items (k)
data.frame(Percent = c(6.6,6.8,8.2,10.4,12.2, 5.3, 5.4, 5.3, 5.9, 6.7), 
           n = c(250,500,1000,1500,2000,250,500,1000,1500,2000),
           k = factor(c(19,19,19,19,19,10,10,10,10,10))) %>% 
  ggplot(aes(x = n, y = Percent, color = k)) + 
  # dashed reference line at the nominal 5% level
  geom_hline(yintercept = 5, linetype = "dashed") +
  geom_point(position = "dodge") +
  geom_line() +
  # label each point with its percentage
  geom_text(aes(label = paste0(Percent,"%")),
            position = position_dodge(width = 9),
            hjust = 0.3, vjust = -1,
            color = "black"
            ) +
  scale_x_continuous('Sample size', limits = c(0,2200), breaks = c(0,250,500,1000,1500,2000)) +
  scale_y_continuous('% false positives', limits = c(0,20)) +
  scale_color_brewer('Items',palette = "Dark2") +
  theme_bw()
Figure 1: Percent of false positives indicated by LRT bootstrap procedure across sample sizes
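
The values plotted in Figure 1 can also be read as a table. The following base R sketch re-uses the numbers from the plotting code and arranges them by sample size and number of items. The object names `res`, `tab`, and `excess` are only illustrative.

```r
# Same values as in the plotting code, arranged as a table
res <- data.frame(
  Percent = c(6.6, 6.8, 8.2, 10.4, 12.2, 5.3, 5.4, 5.3, 5.9, 6.7),
  n = rep(c(250, 500, 1000, 1500, 2000), times = 2),
  k = rep(c(19, 10), each = 5)
)

# Rows: sample size, columns: number of items, cells: % false positives
tab <- xtabs(Percent ~ n + k, data = res)
tab

# Percentage points above the nominal 5% level
excess <- tab - 5
excess
```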

4 Discussion

This is a brief note, not a full-scale simulation study. Many variables could be manipulated to better understand the expected behavior of the LRT when all items fit a Rasch model. Nevertheless, this small study provides some useful information about the relationship between sample size, number of items, and false positive rate for the LRT. Even at the smaller sample sizes of 250 and 500, the false positive rate is above the expected 5% (6.6% and 6.8% with 19 items, 5.3% and 5.4% with 10 items). The effect is stronger for the condition with more items: at a sample size of 2000, the rate reaches 12.2% with 19 items, compared to 6.7% with 10 items. It seems clear that one should not rely too heavily on the LRT in determining model fit, especially when sample size is above 1000 and the number of items is high.
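
When comparing the observed rates to the nominal 5%, it can help to have a rough idea of the Monte Carlo error that comes with 5000 iterations. The sketch below uses a simple normal approximation for a binomial proportion. It treats the iterations as independent draws, which is an assumption: the bootstrap samples all come from the same simulated dataset, so the band should be read as a rough guide rather than an exact interval.

```r
iterations <- 5000
nominal <- 0.05

# Standard error of a proportion of 0.05 estimated from 5000 iterations
se <- sqrt(nominal * (1 - nominal) / iterations)

# Approximate 95% band around the nominal level, in percent
band <- 100 * (nominal + c(-1, 1) * qnorm(0.975) * se)
round(band, 1)  # 4.4 5.6

# Observed rates from Figure 1, by number of items
observed <- data.frame(
  n = rep(c(250, 500, 1000, 1500, 2000), times = 2),
  k = rep(c(19, 10), each = 5),
  Percent = c(6.6, 6.8, 8.2, 10.4, 12.2, 5.3, 5.4, 5.3, 5.9, 6.7)
)

# Flag rates above the upper end of the band
observed$above_band <- observed$Percent > band[2]
observed
```

Under this approximation the band runs from roughly 4.4% to 5.6%.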

I will add a condition using polytomous items later on.

5 References

Andersen, Erling B. 1973. “A Goodness of Fit Test for the Rasch Model.” Psychometrika 38 (1): 123–40. https://doi.org/10.1007/BF02291180.
Johansson, Magnus. 2025a. “Detecting Item Misfit in Rasch Models.” OSF Preprints. https://doi.org/10.31219/osf.io/j8fg2.
———. 2025b. easyRasch: Psychometric Analysis in R with Rasch Measurement Theory. https://github.com/pgmj/easyRasch.

Reuse

CC BY 4.0

Citation

BibTeX citation:
@online{johansson2025,
  author = {Johansson, Magnus},
  title = {Conditional {Likelihood} {Ratio} {Test} and Sample Size},
  date = {2025-02-07},
  url = {https://pgmj.github.io/clrt.html},
  langid = {en}
}
For attribution, please cite this work as:
Johansson, Magnus. 2025. “Conditional Likelihood Ratio Test and Sample Size.” February 7, 2025. https://pgmj.github.io/clrt.html.