Example: Synthetic Aperture Personality Assessment

The Synthetic Aperture Personality Assessment (SAPA) is a web-based personality assessment project (https://www.sapa-project.org/). The purpose of SAPA is to find patterns among the vast number of ways that people differ from one another in terms of their thoughts, feelings, interests, abilities, desires, values, and preferences (Condon & Revelle, 2014; Revelle et al., 2010). In this example, we will use a subset of SAPA (16 items) sampled from the full instrument (80 items) to develop online measures of ability. These 16 items measure four subskills (i.e., verbal reasoning, letter series, matrix reasoning, and spatial rotations) of general intelligence, also known as the g factor. The “sapa_data.csv” dataset is a data frame with 1525 individuals who responded to 16 multiple-choice items in SAPA. The original dataset is included in the psych package (Revelle, 2022). The dataset can be downloaded from here. In addition, the R code for the item response theory (IRT) analyses presented on this page is available here.


Setting up R

In our example, we will conduct IRT and other relevant analyses using the following packages:

Package               URL

Exploratory Data Analysis
  DataExplorer        http://CRAN.R-project.org/package=DataExplorer
  ggcorrplot          http://CRAN.R-project.org/package=ggcorrplot

Psychometric Analysis
  psych               http://CRAN.R-project.org/package=psych
  lavaan              http://CRAN.R-project.org/package=lavaan
  mirt                http://CRAN.R-project.org/package=mirt
  ShinyItemAnalysis   http://CRAN.R-project.org/package=ShinyItemAnalysis


We have already installed and used some of the above packages in the CTT section. Therefore, we will only install the new R packages (lavaan and mirt) at this time and then activate all the required packages one by one using the library() command:

# Install the missing packages
install.packages(c("lavaan", "mirt"))

# Activate the required packages
library("DataExplorer")
library("ggcorrplot")
library("psych")
library("lavaan")
library("mirt")
library("ShinyItemAnalysis")

We will use lavaan (Rosseel et al., 2023) to conduct confirmatory factor analysis and mirt (Chalmers, 2023) to estimate dichotomous and polytomous IRT models.


🔔 INFORMATION: There are many other packages for estimating IRT models in R, such as ltm (Rizopoulos, 2006), eRm (Mair et al., 2020), TAM (Robitzsch et al., 2021), and irtoys (Partchev & Maris, 2017). I prefer the mirt package because it includes functions to estimate various IRT models (e.g., unidimensional, multidimensional, and explanatory IRT models), additional functions to check model assumptions (e.g., local independence), and various tools to visualize IRT-related objects (e.g., item characteristic curve, item information function, and test information function). You can check out Choi & Asilkalkan (2019) for a detailed review of IRT packages available in R.


Exploratory data analysis

We will begin our analysis by conducting exploratory data analysis (EDA). As you may remember from the CTT section, we use EDA to check the quality of our data and identify potential problems (e.g., missing values). In this section, we will import sapa_data.csv into R and review the variables in the dataset using descriptive statistics and visualizations.

First, we need to set up our working directory. I have created a new folder called “IRT Analysis” on my desktop and put our data (sapa_data.csv) into this folder. Now, we will change our working directory to this new folder (use the path where you keep these files on your own computer):

setwd("C:/Users/Okan/Desktop/IRT Analysis")

Next, we will import the data into R using the read.csv() function and save it as “sapa”.

sapa <- read.csv("sapa_data.csv", header = TRUE)

Using the head() function, we can now view the first 6 rows of the sapa dataset:

head(sapa)
  reason.4 reason.16 reason.17 reason.19 letter.7 letter.33 letter.34 letter.58 matrix.45 matrix.46 matrix.47
1        0         0         0         0        0         1         0         0         0         0         0
2        0         0         1         0        1         0         1         0         0         0         0
3        0         1         1         0        1         0         0         0         1         1         0
4        1         0         0         0        0         0         1         0         0         0         0
5        0         1         1         0        0         1         0         0         1         1         0
6        1         1         1         1        1         1         1         1         1         1         1
  matrix.55 rotate.3 rotate.4 rotate.6 rotate.8
1         1        0        0        0        0
2         0        0        0        1        0
3         0        0        0        0        0
4         0        0        0        0        0
5         0        0        0        0        0
6         0        1        1        1        0

We can also see the names and types of the variables in our dataset using the str() function:

str(sapa)
'data.frame':   1525 obs. of  16 variables:
 $ reason.4 : int  0 0 0 1 0 1 1 0 1 1 ...
 $ reason.16: int  0 0 1 0 1 1 1 1 1 1 ...
 $ reason.17: int  0 1 1 0 1 1 1 0 0 1 ...
 $ reason.19: int  0 0 0 0 0 1 1 0 1 1 ...
 $ letter.7 : int  0 1 1 0 0 1 1 0 0 0 ...
 $ letter.33: int  1 0 0 0 1 1 1 0 1 0 ...
 $ letter.34: int  0 1 0 1 0 1 1 0 1 1 ...
 $ letter.58: int  0 0 0 0 0 1 1 0 1 0 ...
 $ matrix.45: int  0 0 1 0 1 1 1 0 1 1 ...
 $ matrix.46: int  0 0 1 0 1 1 1 1 0 1 ...
 $ matrix.47: int  0 0 0 0 0 1 1 1 0 0 ...
 $ matrix.55: int  1 0 0 0 0 0 0 0 0 0 ...
 $ rotate.3 : int  0 0 0 0 0 1 1 0 0 0 ...
 $ rotate.4 : int  0 0 0 0 0 1 1 1 0 0 ...
 $ rotate.6 : int  0 1 0 0 0 1 1 0 0 0 ...
 $ rotate.8 : int  0 0 0 0 0 0 1 0 0 0 ...

The dataset consists of 1525 rows (i.e., examinees) and 16 variables (i.e., SAPA items). We can get more information on the dataset using the introduce() and plot_intro() functions from the DataExplorer package (Cui, 2020):

DataExplorer::introduce(sapa)

DataExplorer::plot_intro(sapa)
rows                    1,525
columns                    16
discrete_columns            0
continuous_columns         16
all_missing_columns         0
total_missing_values       25
complete_rows           1,523
total_observations     24,400
memory_usage          101,832

The plot above shows that all of the variables in the dataset are treated as continuous (the SAPA items are binary, but because they are stored as integers, DataExplorer classifies them as continuous). We also see that some of the variables have missing values, but the proportion of missing data is very small (only about 0.10%). To have a closer look at the missing values, we can visualize the proportion of missingness for each variable using plot_missing() from DataExplorer.

DataExplorer::plot_missing(sapa)
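
If you prefer a numeric summary over the plot, a minimal base-R alternative is to compute the proportion of missing responses per item directly:

# Proportion of missing responses for each SAPA item
colMeans(is.na(sapa))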

To obtain a detailed summary of the sapa dataset, we will use the describe() function from the psych package (Revelle, 2022).

psych::describe(x = sapa)
          vars    n mean   sd median trimmed mad min max range  skew kurtosis   se
reason.4     1 1523 0.64 0.48      1    0.68   0   0   1     1 -0.58    -1.66 0.01
reason.16    2 1524 0.70 0.46      1    0.75   0   0   1     1 -0.86    -1.26 0.01
reason.17    3 1523 0.70 0.46      1    0.75   0   0   1     1 -0.86    -1.26 0.01
reason.19    4 1523 0.62 0.49      1    0.64   0   0   1     1 -0.47    -1.78 0.01
letter.7     5 1524 0.60 0.49      1    0.62   0   0   1     1 -0.41    -1.84 0.01
letter.33    6 1523 0.57 0.50      1    0.59   0   0   1     1 -0.29    -1.92 0.01
letter.34    7 1523 0.61 0.49      1    0.64   0   0   1     1 -0.46    -1.79 0.01
letter.58    8 1525 0.44 0.50      0    0.43   0   0   1     1  0.23    -1.95 0.01
matrix.45    9 1523 0.53 0.50      1    0.53   0   0   1     1 -0.10    -1.99 0.01
matrix.46   10 1524 0.55 0.50      1    0.56   0   0   1     1 -0.20    -1.96 0.01
matrix.47   11 1523 0.61 0.49      1    0.64   0   0   1     1 -0.47    -1.78 0.01
matrix.55   12 1524 0.37 0.48      0    0.34   0   0   1     1  0.52    -1.73 0.01
rotate.3    13 1523 0.19 0.40      0    0.12   0   0   1     1  1.55     0.40 0.01
rotate.4    14 1523 0.21 0.41      0    0.14   0   0   1     1  1.40    -0.03 0.01
rotate.6    15 1523 0.30 0.46      0    0.25   0   0   1     1  0.88    -1.24 0.01
rotate.8    16 1524 0.19 0.39      0    0.11   0   0   1     1  1.62     0.63 0.01

From the output above, we can see the number of individuals who responded to each SAPA item, the mean response value (i.e., proportion-correct or item difficulty), and other descriptive statistics. We see that most SAPA items have moderate difficulty values although the rotation items (i.e., rotate.3, rotate.4, rotate.6, and rotate.8) are more difficult than the remaining items in the dataset.

In the CTT section, we checked the correlations among the nfc items to gauge how strongly the items were associated with each other. We expected the items to be associated with each other because they were designed to measure the same latent trait (i.e., need for cognition). For the sapa dataset, we will have to make a similar assumption: all SAPA items measure the same latent trait (general intelligence or g). However, given that the items come from different content areas (i.e., verbal reasoning, letter series, matrix reasoning, and spatial rotations), we must ensure that these items are sufficiently correlated with each other.

To compute the correlations among the SAPA items, we will use the tetrachoric() function from psych. Since the SAPA items are dichotomously scored (i.e., 0: incorrect and 1: correct), Pearson correlations (which could be obtained using the cor() function in R) are not appropriate; instead, we use tetrachoric correlations, which assume a normally distributed latent variable underlying each binary item. We will compute the correlations and then extract “rho” (i.e., the correlation matrix of the items).

# Save the correlation matrix
cormat <- psych::tetrachoric(x = sapa)$rho

# Print the correlation matrix
print(cormat)
reason.4 reason.16 reason.17 reason.19 letter.7 letter.33 letter.34 letter.58 matrix.45 matrix.46 matrix.47 matrix.55 rotate.3 rotate.4 rotate.6 rotate.8
reason.4 1.0000 0.4742 0.6000 0.4839 0.4623 0.3803 0.4792 0.4549 0.4313 0.3994 0.4010 0.2988 0.4562 0.4909 0.4468 0.4360
reason.16 0.4742 1.0000 0.5358 0.4578 0.4708 0.3737 0.4494 0.3805 0.3513 0.3375 0.4210 0.3135 0.3266 0.4425 0.4079 0.3642
reason.17 0.6000 0.5358 1.0000 0.5496 0.4729 0.4405 0.4726 0.4797 0.3578 0.3924 0.4656 0.3156 0.3856 0.4274 0.5061 0.4007
reason.19 0.4839 0.4578 0.5496 1.0000 0.4341 0.4351 0.4648 0.4231 0.3830 0.3227 0.4075 0.3058 0.3636 0.4192 0.3691 0.3329
letter.7 0.4623 0.4708 0.4729 0.4341 1.0000 0.5380 0.6026 0.5228 0.3416 0.4119 0.4423 0.2794 0.3276 0.4432 0.3641 0.2751
letter.33 0.3803 0.3737 0.4405 0.4351 0.5380 1.0000 0.5744 0.4483 0.3351 0.3966 0.3921 0.3178 0.3400 0.3942 0.3694 0.2679
letter.34 0.4792 0.4494 0.4726 0.4648 0.6026 0.5744 1.0000 0.5154 0.3647 0.4362 0.4807 0.2563 0.3736 0.4158 0.3476 0.3079
letter.58 0.4549 0.3805 0.4797 0.4231 0.5228 0.4483 0.5154 1.0000 0.3304 0.3415 0.3853 0.3708 0.4159 0.4574 0.4387 0.3975
matrix.45 0.4313 0.3513 0.3578 0.3830 0.3416 0.3351 0.3647 0.3304 1.0000 0.5106 0.4026 0.3482 0.3071 0.3280 0.2657 0.3089
matrix.46 0.3994 0.3375 0.3924 0.3227 0.4119 0.3966 0.4362 0.3415 0.5106 1.0000 0.3832 0.2516 0.2868 0.3230 0.3495 0.2899
matrix.47 0.4010 0.4210 0.4656 0.4075 0.4423 0.3921 0.4807 0.3853 0.4026 0.3832 1.0000 0.3624 0.4074 0.4017 0.3413 0.3420
matrix.55 0.2988 0.3135 0.3156 0.3058 0.2794 0.3178 0.2563 0.3708 0.3482 0.2516 0.3624 1.0000 0.3274 0.3226 0.2998 0.3413
rotate.3 0.4562 0.3266 0.3856 0.3636 0.3276 0.3400 0.3736 0.4159 0.3071 0.2868 0.4074 0.3274 1.0000 0.7706 0.6655 0.6748
rotate.4 0.4909 0.4425 0.4274 0.4192 0.4432 0.3942 0.4158 0.4574 0.3280 0.3230 0.4017 0.3226 0.7706 1.0000 0.6908 0.6822
rotate.6 0.4468 0.4079 0.5061 0.3691 0.3641 0.3694 0.3476 0.4387 0.2657 0.3495 0.3413 0.2998 0.6655 0.6908 1.0000 0.6650
rotate.8 0.4360 0.3642 0.4007 0.3329 0.2751 0.2679 0.3079 0.3975 0.3089 0.2899 0.3420 0.3413 0.6748 0.6822 0.6650 1.0000

The correlation matrix does not show any negative or low correlations (which is a very good sign! 👍). To check the associations among the items more carefully, we will also create a correlation matrix plot using the ggcorrplot() function from the ggcorrplot package (Kassambara, 2022). We will include the hc.order = TRUE argument to perform hierarchical clustering. This will look for groups (i.e., clusters) of items that are strongly associated with each other. If all SAPA items measure the same latent trait, we should see a single cluster of items.

ggcorrplot::ggcorrplot(corr = cormat, # correlation matrix
                       type = "lower", # print only the lower part of the correlation matrix
                       hc.order = TRUE, # hierarchical clustering
                       show.diag = TRUE, # show the diagonal values of 1
                       lab = TRUE, # add correlation values as labels
                       lab_size = 3) # Size of the labels

The figure above shows that the four rotation items form one cluster (see the top-right corner), while the remaining SAPA items form another cluster (see the bottom-left corner). The rotation items are strongly correlated with each other (not surprising, given that they all target spatial rotation skills); however, these same items have relatively lower correlations with the other items in the dataset. Also, matrix.55 seems to have relatively low correlations with the items from both clusters.

Findings of hierarchical clustering suggest that the SAPA items may not be measuring a single latent trait. However, hierarchical clustering is not a test of dimensionality. To ensure that there is a single factor (i.e., latent trait) underlying the SAPA items, we need to perform factor analysis and evaluate the factor structure of the SAPA items (i.e., dimensionality).
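
Before fitting any factor models, we can also run a parallel analysis as an additional check on dimensionality. The sketch below uses fa.parallel() from psych, which compares the observed eigenvalues with eigenvalues obtained from random data and suggests a plausible number of factors (here with principal axis factoring and tetrachoric correlations, matching the settings we use later):

# Parallel analysis to suggest the number of factors
psych::fa.parallel(sapa, fm = "pa", cor = "tet")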


Exploratory factor analysis

Factor analysis is a statistical modeling technique that aims to explain the common variability among a set of observed variables and transform the variables into a reduced set of variables known as factors (or dimensions). At the core of factor analysis is the desire to reduce the dimensionality of the data from \(p\) observed variables to \(q\) factors, such that \(q < p\). During instrument development or when there are no prior beliefs about the dimensionality or structure of an existing instrument, exploratory factor analysis (EFA) should be considered to investigate the factorial structure of the instrument. To perform EFA, we need to specify the number of factors to extract, how to rotate factors (if the number of factors > 1), and which type of estimation method should be used depending on the nature of the data (e.g., categorical vs. continuous variables).
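
Formally, the common factor model expresses each observed variable as a linear combination of the factors plus a unique (item-specific) term:

\[x = \Lambda f + \epsilon, \qquad \Sigma = \Lambda \Phi \Lambda' + \Theta\]

where \(x\) is the \(p \times 1\) vector of observed variables, \(\Lambda\) is the \(p \times q\) matrix of factor loadings, \(f\) is the \(q \times 1\) vector of factors with correlation matrix \(\Phi\), \(\epsilon\) is the vector of unique errors, and \(\Theta\) is its (diagonal) covariance matrix.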

The psych package includes several functions to perform factor analysis with different estimation methods. We will use the fa() function in the psych package to perform EFA. To use the function, we need to specify the following arguments:

  • r = Response data (either raw data or a correlation matrix)
  • n.obs = Number of observations in the data (necessary only when using a correlation matrix)
  • nfactors = Number of factors that we expect to find in the data
  • rotate = Type of rotation if nfactors > 1. We can use “varimax” for an orthogonal rotation that assumes no correlation between factors or “oblimin” for an oblique rotation that allows the factors to correlate
  • fm = Factor analysis method. We will use “pa” (i.e., principal axis) for EFA
  • cor = How to find the correlations when using raw data. For continuous variables, use cor = "cor" (Pearson correlation); for dichotomous variables, use cor = "tet" (tetrachoric correlation); for polytomous variables (e.g., Likert scales), use cor = "poly" (polychoric correlation).

First, we will try a one-factor model, evaluate model fit, and determine whether a one-factor (i.e., unidimensional) structure is acceptable for the SAPA items:

# Try one-factor EFA model --> nfactors = 1
efa.model1 <- psych::fa(r = sapa, nfactors = 1, fm = "pa", cor = "tet")
# Print the results 
print(efa.model1, sort = TRUE) # Show the factor loadings sorted by absolute value
Factor Analysis using method =  pa
Call: psych::fa(r = sapa, nfactors = 1, fm = "pa", cor = "tet")
Standardized loadings (pattern matrix) based upon correlation matrix
           V  PA1   h2   u2 com
rotate.4  14 0.74 0.55 0.45   1
reason.17  3 0.71 0.51 0.49   1
reason.4   1 0.70 0.49 0.51   1
rotate.6  15 0.69 0.47 0.53   1
letter.34  7 0.68 0.46 0.54   1
rotate.3  13 0.68 0.46 0.54   1
letter.7   5 0.67 0.44 0.56   1
letter.58  8 0.66 0.44 0.56   1
reason.19  4 0.64 0.41 0.59   1
rotate.8  16 0.64 0.41 0.59   1
reason.16  2 0.63 0.40 0.60   1
matrix.47 11 0.62 0.39 0.61   1
letter.33  6 0.62 0.39 0.61   1
matrix.46 10 0.56 0.31 0.69   1
matrix.45  9 0.54 0.30 0.70   1
matrix.55 12 0.48 0.23 0.77   1

                PA1
SS loadings    6.65
Proportion Var 0.42

Mean item complexity =  1
Test of the hypothesis that 1 factor is sufficient.

The degrees of freedom for the null model are  120  and the objective function was  8.04 with Chi Square of  12198
The degrees of freedom for the model are 104  and the objective function was  1.85 

The root mean square of the residuals (RMSR) is  0.08 
The df corrected root mean square of the residuals is  0.09 

The harmonic number of observations is  1523 with the empirical chi square  2304  with prob <  0 
The total number of observations was  1525  with Likelihood Chi Square =  2806  with prob <  0 

Tucker Lewis Index of factoring reliability =  0.742
RMSEA index =  0.131  and the 90 % confidence intervals are  0.126 0.135
BIC =  2044
Fit based upon off diagonal values = 0.96
Measures of factor score adequacy             
                                                   PA1
Correlation of (regression) scores with factors   0.96
Multiple R square of scores with factors          0.92
Minimum correlation of possible factor scores     0.84

The output shows the factor loadings for each item (see the PA1 column) and the proportion of explained variance (42%; see Proportion Var). The factor loadings are all above 0.3, the typical cut-off for flagging salient loadings. To evaluate model fit, we can examine the root mean square of the residuals (RMSR), the root mean square error of approximation (RMSEA), and the Tucker-Lewis Index (TLI), using Hu and Bentler’s (1999) guidelines: TLI > .95, RMSEA < .06, and RMSR near zero indicate good model fit. The fit measures in the output (TLI = .742, RMSEA = .131) show that the one-factor model does not fit the sapa dataset well. Therefore, we will try a two-factor model in the next run:

# Try two-factor EFA model --> nfactors=2
efa.model2 <- psych::fa(sapa, nfactors = 2, rotate = "oblimin", fm = "pa", cor = "tet")
# Print the results 
print(efa.model2, sort = TRUE) # Show the factor loadings sorted by absolute value
Factor Analysis using method =  pa
Call: psych::fa(r = sapa, nfactors = 2, rotate = "oblimin", fm = "pa", 
    cor = "tet")
Standardized loadings (pattern matrix) based upon correlation matrix
          item   PA1   PA2   h2   u2 com
letter.34    7  0.80 -0.09 0.56 0.44 1.0
letter.7     5  0.78 -0.09 0.53 0.47 1.0
letter.33    6  0.70 -0.05 0.44 0.56 1.0
reason.17    3  0.65  0.11 0.52 0.48 1.1
reason.19    4  0.62  0.05 0.43 0.57 1.0
matrix.46   10  0.59 -0.01 0.34 0.66 1.0
matrix.47   11  0.59  0.07 0.40 0.60 1.0
reason.16    2  0.58  0.09 0.41 0.59 1.0
reason.4     1  0.56  0.19 0.49 0.51 1.2
letter.58    8  0.56  0.15 0.44 0.56 1.1
matrix.45    9  0.56  0.02 0.32 0.68 1.0
matrix.55   12  0.35  0.17 0.23 0.77 1.5
rotate.3    13 -0.03  0.87 0.72 0.28 1.0
rotate.8    16 -0.05  0.85 0.66 0.34 1.0
rotate.4    14  0.09  0.81 0.75 0.25 1.0
rotate.6    15  0.07  0.75 0.64 0.36 1.0

                       PA1  PA2
SS loadings           4.87 3.03
Proportion Var        0.30 0.19
Cumulative Var        0.30 0.49
Proportion Explained  0.62 0.38
Cumulative Proportion 0.62 1.00

 With factor correlations of 
     PA1  PA2
PA1 1.00 0.63
PA2 0.63 1.00

Mean item complexity =  1.1
Test of the hypothesis that 2 factors are sufficient.

The degrees of freedom for the null model are  120  and the objective function was  8.04 with Chi Square of  12198
The degrees of freedom for the model are 89  and the objective function was  0.64 

The root mean square of the residuals (RMSR) is  0.04 
The df corrected root mean square of the residuals is  0.05 

The harmonic number of observations is  1523 with the empirical chi square  551.4  with prob <  7.8e-68 
The total number of observations was  1525  with Likelihood Chi Square =  976.6  with prob <  2.1e-149 

Tucker Lewis Index of factoring reliability =  0.901
RMSEA index =  0.081  and the 90 % confidence intervals are  0.076 0.086
BIC =  324.3
Fit based upon off diagonal values = 0.99
Measures of factor score adequacy             
                                                   PA1  PA2
Correlation of (regression) scores with factors   0.95 0.95
Multiple R square of scores with factors          0.90 0.91
Minimum correlation of possible factor scores     0.81 0.81

Based on the factor loadings listed under the PA1 and PA2 columns, we see that the first 12 items load strongly on the first factor whereas the last four items (i.e., the rotation items) load strongly on the second factor. This finding aligns with what we observed in the correlation matrix plot earlier. Another important finding is that one of the items (matrix.55) does not load sufficiently on either of the two factors.

The rest of the output shows that the first factor explains 30% of the total variance while the second factor explains 19% of the total variance (see Proportion Var). Compared to the one-factor model, the two-factor model explains an additional 7% of variance in the data. The two factors seem to be moderately correlated (\(r = .63\)). The model fit indices show that the two-factor model fits the data better (though the model fit indices do not entirely meet the guidelines).

At this point, we need to make a theoretical decision informed by the statistical output: Can we still assume that all the items in the sapa dataset measure the same latent trait? Or, should we exclude the items that do not seem to correlate well with the rest of the items in the dataset? The evidence we obtained from the EFA models suggests that the rotation items may not be part of the construct measured by the rest of the SAPA items. Also, matrix.55 appears to be somewhat problematic. Therefore, we can choose to exclude these five items from the dataset.

In the following section, we will first use the subset() function (from base R) to drop the rotation items and matrix.55 and save the new dataset as “sapa_clean”. Next, we will run the one-factor EFA model again using the items in sapa_clean.

# Drop the problematic items
sapa_clean <- subset(sapa, select = -c(rotate.3, rotate.4, rotate.6, rotate.8, matrix.55))

# Try one-factor EFA model with the clean dataset
efa.model3 <- psych::fa(sapa_clean, nfactors = 1, fm = "pa", cor = "tet")
print(efa.model3, sort=TRUE)
Factor Analysis using method =  pa
Call: psych::fa(r = sapa_clean, nfactors = 1, fm = "pa", cor = "tet")
Standardized loadings (pattern matrix) based upon correlation matrix
           V  PA1   h2   u2 com
letter.34  7 0.74 0.55 0.45   1
reason.17  3 0.73 0.53 0.47   1
letter.7   5 0.72 0.52 0.48   1
reason.4   1 0.69 0.48 0.52   1
reason.19  4 0.66 0.44 0.56   1
letter.33  6 0.65 0.43 0.57   1
letter.58  8 0.65 0.42 0.58   1
reason.16  2 0.64 0.41 0.59   1
matrix.47 11 0.63 0.39 0.61   1
matrix.46 10 0.58 0.34 0.66   1
matrix.45  9 0.56 0.32 0.68   1

                PA1
SS loadings    4.84
Proportion Var 0.44

Mean item complexity =  1
Test of the hypothesis that 1 factor is sufficient.

The degrees of freedom for the null model are  55  and the objective function was  4.54 with Chi Square of  6898
The degrees of freedom for the model are 44  and the objective function was  0.37 

The root mean square of the residuals (RMSR) is  0.05 
The df corrected root mean square of the residuals is  0.05 

The harmonic number of observations is  1523 with the empirical chi square  385.8  with prob <  3.7e-56 
The total number of observations was  1525  with Likelihood Chi Square =  569.1  with prob <  1.9e-92 

Tucker Lewis Index of factoring reliability =  0.904
RMSEA index =  0.088  and the 90 % confidence intervals are  0.082 0.095
BIC =  246.6
Fit based upon off diagonal values = 0.99
Measures of factor score adequacy             
                                                   PA1
Correlation of (regression) scores with factors   0.95
Multiple R square of scores with factors          0.90
Minimum correlation of possible factor scores     0.80

The output above shows that the model fit has improved substantially after removing the rotation items and matrix.55 from the dataset. Thus, we will use the sapa_clean dataset for subsequent analyses. Please note that, for the sake of brevity, we followed a data-driven approach to determine whether the problematic items should be removed from the data. A more suitable approach would be to carefully review the content of these items alongside the EFA output, and make a decision that considers both the theoretical assumptions regarding the items and the statistical findings.


Confirmatory factor analysis

After an instrument has been developed and validated, we have a sense of its dimensionality (i.e., factor structure) and which observed variables (i.e., items) should load onto which factor(s). In this setting, it is more appropriate to consider confirmatory factor analysis (CFA) for examining the factor structure. Unlike EFA, CFA requires the researcher to specify a theoretically justified factor model, indicating the factor(s) and which items are associated with each factor, and then evaluate its fit to the data.

Following the results of EFA from the previous section, we will go ahead and fit a one-factor CFA model to the sapa_clean dataset. To perform CFA in R, as well as path analysis and structural equation modeling, we can use the lavaan package (Rosseel et al., 2023), whose name stands for latent variable analysis. The lavaan package uses its own special model syntax. To conduct a CFA, we need to define a model and then estimate it using the cfa() function. The model definition below begins and ends with a single quote. We name our factor “f” (we could also name it “intelligence”) and list the items associated with this factor (i.e., the SAPA items).

# Define a single factor
sapa_model <- 'f =~ reason.4 + reason.16 + reason.17 + reason.19 + letter.7 + 
               letter.33 + letter.34 + letter.58 + matrix.45 + matrix.46 + matrix.47'

Next, we will run the CFA model defined above. When the items are dichotomous or polytomous, the estimator should be either “MLR” or “WLSMV” because these estimators are more robust to the non-normality that is typical of categorical data. In this example, we will use “MLR” (i.e., robust maximum likelihood) to estimate our CFA model.

# Estimate the model
cfa_sapa <- lavaan::cfa(sapa_model, data = sapa_clean, estimator = "MLR")

# Print the output
summary(cfa_sapa, fit.measures=TRUE, standardized = TRUE)
lavaan 0.6.14 ended normally after 39 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        22

                                                  Used       Total
  Number of observations                          1523        1525

Model Test User Model:
                                              Standard      Scaled
  Test Statistic                               182.266     160.148
  Degrees of freedom                                44          44
  P-value (Chi-square)                           0.000       0.000
  Scaling correction factor                                  1.138
    Yuan-Bentler correction (Mplus variant)                       

Model Test Baseline Model:

  Test statistic                              3225.104    2771.330
  Degrees of freedom                                55          55
  P-value                                        0.000       0.000
  Scaling correction factor                                  1.164

User Model versus Baseline Model:

  Comparative Fit Index (CFI)                    0.956       0.957
  Tucker-Lewis Index (TLI)                       0.945       0.947
                                                                  
  Robust Comparative Fit Index (CFI)                         0.958
  Robust Tucker-Lewis Index (TLI)                            0.948

Loglikelihood and Information Criteria:

  Loglikelihood user model (H0)             -10128.806  -10128.806
  Scaling correction factor                                  0.697
      for the MLR correction                                      
  Loglikelihood unrestricted model (H1)     -10037.673  -10037.673
  Scaling correction factor                                  0.991
      for the MLR correction                                      
                                                                  
  Akaike (AIC)                               20301.613   20301.613
  Bayesian (BIC)                             20418.838   20418.838
  Sample-size adjusted Bayesian (SABIC)      20348.950   20348.950

Root Mean Square Error of Approximation:

  RMSEA                                          0.045       0.042
  90 Percent confidence interval - lower         0.039       0.035
  90 Percent confidence interval - upper         0.052       0.048
  P-value H_0: RMSEA <= 0.050                    0.858       0.982
  P-value H_0: RMSEA >= 0.080                    0.000       0.000
                                                                  
  Robust RMSEA                                               0.044
  90 Percent confidence interval - lower                     0.037
  90 Percent confidence interval - upper                     0.052
  P-value H_0: Robust RMSEA <= 0.050                         0.887
  P-value H_0: Robust RMSEA >= 0.080                         0.000

Standardized Root Mean Square Residual:

  SRMR                                           0.032       0.032

Parameter Estimates:

  Standard errors                             Sandwich
  Information bread                           Observed
  Observed information based on                Hessian

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  f =~                                                                  
    reason.4          1.000                               0.268    0.558
    reason.16         0.868    0.055   15.901    0.000    0.232    0.506
    reason.17         0.990    0.051   19.454    0.000    0.265    0.577
    reason.19         0.967    0.055   17.589    0.000    0.259    0.532
    letter.7          1.076    0.061   17.547    0.000    0.288    0.588
    letter.33         0.987    0.062   15.926    0.000    0.264    0.534
    letter.34         1.104    0.062   17.859    0.000    0.296    0.607
    letter.58         0.958    0.056   17.178    0.000    0.256    0.516
    matrix.45         0.830    0.055   15.231    0.000    0.222    0.445
    matrix.46         0.865    0.058   14.907    0.000    0.232    0.465
    matrix.47         0.914    0.059   15.419    0.000    0.245    0.502

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .reason.4          0.159    0.006   25.914    0.000    0.159    0.689
   .reason.16         0.157    0.006   27.641    0.000    0.157    0.744
   .reason.17         0.141    0.006   24.298    0.000    0.141    0.668
   .reason.19         0.170    0.006   28.392    0.000    0.170    0.717
   .letter.7          0.157    0.006   25.381    0.000    0.157    0.655
   .letter.33         0.175    0.006   28.471    0.000    0.175    0.715
   .letter.34         0.150    0.006   23.743    0.000    0.150    0.632
   .letter.58         0.181    0.006   32.745    0.000    0.181    0.734
   .matrix.45         0.200    0.005   36.477    0.000    0.200    0.802
   .matrix.46         0.194    0.006   34.265    0.000    0.194    0.783
   .matrix.47         0.177    0.006   30.143    0.000    0.177    0.748
    f                 0.072    0.006   11.727    0.000    1.000    1.000

The cfa() function returns a long output with model fit statistics, model parameters, and additional information, but we will focus only on the Comparative Fit Index (CFI), the Tucker-Lewis Index (TLI), and the Root Mean Square Error of Approximation (RMSEA) to interpret the fit of the one-factor model to the sapa_clean dataset. (Please refer to the UCLA Statistical Consulting Group’s website for a more detailed coverage of CFA model estimation using lavaan; the website also includes annotated examples of a variety of statistical analyses using different software programs such as SPSS, SAS, R, and Mplus.)

In the output, the CFI (.958) and TLI (.948) values under the “Robust” column are at or just below the .95 cut-off, indicating good model fit. Similarly, the RMSEA value of .042 for the one-factor model suggests good model fit (since it is less than the suggested cut-off value of .06). Also, the “Std.all” column in the “Latent Variables: f =~” section shows standardized factor loadings for the items. We see that all the items in sapa_clean have adequate factor loadings (> 0.3), indicating a reasonable relationship with the estimated factor.
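
As noted earlier, WLSMV is another estimator suitable for categorical data. The sketch below shows how we could re-estimate the same model while explicitly declaring the items as ordered-categorical; the resulting fit indices and loadings would differ somewhat from the MLR solution reported above:

# Alternative estimation: declare the items as ordered-categorical
cfa_sapa_cat <- lavaan::cfa(sapa_model,
                            data = sapa_clean,
                            ordered = names(sapa_clean), # treat all items as ordinal
                            estimator = "WLSMV")

# Print the output
summary(cfa_sapa_cat, fit.measures = TRUE, standardized = TRUE)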


Item analysis

Before we start the IRT analysis, let’s take a look at the items by running item analysis based on Classical Test Theory (CTT). This will allow us to have a final look at the items that we identified based on EFA and CFA and ensure that the response dataset is ready for IRT analysis. We will use the alpha() function from the psych package for running CTT-based item analysis.

# Run the item analysis and save it as itemanalysis_psych
itemanalysis_psych <- psych::alpha(x = sapa_clean)

# Print the results
itemanalysis_psych

Reliability analysis   
Call: psych::alpha(x = sapa_clean)

  raw_alpha std.alpha G6(smc) average_r S/N    ase mean   sd median_r
      0.81      0.81     0.8      0.28 4.3 0.0072  0.6 0.29     0.28

    95% confidence boundaries 
         lower alpha upper
Feldt      0.8  0.81  0.82
Duhachek   0.8  0.81  0.82

 Reliability if an item is dropped:
          raw_alpha std.alpha G6(smc) average_r S/N alpha se  var.r med.r
reason.4       0.79      0.79    0.78      0.28 3.8   0.0079 0.0024  0.28
reason.16      0.80      0.80    0.79      0.28 4.0   0.0077 0.0024  0.28
reason.17      0.79      0.79    0.78      0.28 3.8   0.0079 0.0021  0.27
reason.19      0.80      0.80    0.78      0.28 3.9   0.0078 0.0025  0.28
letter.7       0.79      0.79    0.78      0.28 3.8   0.0080 0.0021  0.27
letter.33      0.80      0.80    0.78      0.28 3.9   0.0078 0.0022  0.28
letter.34      0.79      0.79    0.78      0.27 3.8   0.0081 0.0020  0.27
letter.58      0.80      0.80    0.79      0.28 3.9   0.0077 0.0024  0.28
matrix.45      0.80      0.80    0.79      0.29 4.1   0.0076 0.0021  0.29
matrix.46      0.80      0.80    0.79      0.29 4.0   0.0076 0.0023  0.29
matrix.47      0.80      0.80    0.79      0.28 4.0   0.0077 0.0027  0.28

 Item statistics 
             n raw.r std.r r.cor r.drop mean   sd
reason.4  1523  0.61  0.61  0.55   0.50 0.64 0.48
reason.16 1524  0.56  0.57  0.50   0.45 0.70 0.46
reason.17 1523  0.62  0.62  0.57   0.51 0.70 0.46
reason.19 1523  0.59  0.59  0.53   0.47 0.62 0.49
letter.7  1524  0.63  0.63  0.58   0.52 0.60 0.49
letter.33 1523  0.59  0.59  0.53   0.47 0.57 0.50
letter.34 1523  0.64  0.64  0.60   0.54 0.61 0.49
letter.58 1525  0.58  0.57  0.51   0.46 0.44 0.50
matrix.45 1523  0.54  0.53  0.46   0.41 0.53 0.50
matrix.46 1524  0.55  0.55  0.47   0.42 0.55 0.50
matrix.47 1523  0.57  0.57  0.50   0.45 0.61 0.49

Non missing response frequency for each item
             0    1 miss
reason.4  0.36 0.64    0
reason.16 0.30 0.70    0
reason.17 0.30 0.70    0
reason.19 0.38 0.62    0
letter.7  0.40 0.60    0
letter.33 0.43 0.57    0
letter.34 0.39 0.61    0
letter.58 0.56 0.44    0
matrix.45 0.47 0.53    0
matrix.46 0.45 0.55    0
matrix.47 0.39 0.61    0

The output shows that the reliability is \(\alpha = .81\), suggesting that the SAPA items have high internal consistency. In the “Reliability if an item is dropped” section, we see that removing any single SAPA item would not improve coefficient alpha, suggesting that all the items contribute to the instrument. Lastly, in the “Item statistics” section, we can see that the corrected item-total correlations under the r.drop and r.cor columns are all above the typical .20 cut-off, indicating that all the SAPA items can adequately distinguish low- and high-achieving examinees.


Item calibration

In this section, we will see how to estimate different types of dichotomous IRT models using the sapa_clean dataset. This process is known as “item calibration”. We will calibrate the SAPA items using the following IRT models:

Model   Description                      Parameters
Rasch   Rasch Model                      b
1PL     One-Parameter Logistic Model     b, a (same for all items)
2PL     Two-Parameter Logistic Model     b, a (unique to each item)
3PL     Three-Parameter Logistic Model   b, a (unique to each item), c


The following sections show the estimation of item parameters for each of the four IRT models listed above.


Rasch Model

In the Rasch model, the probability of answering item \(i\) correctly for examinee \(j\) with ability \(\theta_j\) (i.e., \(P(X_{ij} = 1)\)) can be written as follows:

\[P(X_{ij} = 1) = \frac{e^{(\theta_j - b_i)}}{1 + e^{(\theta_j - b_i)}}\]

where \(b_i\) is the item difficulty level of item \(i\). Using the Rasch model, we can estimate a unique difficulty parameter for each item and an ability parameter for each examinee.
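
To make the formula concrete, here is a minimal sketch of the Rasch response function in R (the theta and b values below are hypothetical, chosen only for illustration):

# Rasch probability of a correct response
p_rasch <- function(theta, b) {
  exp(theta - b) / (1 + exp(theta - b))
}

# An examinee with theta = 0 answering an item with b = -0.80
p_rasch(theta = 0, b = -0.80) # approximately 0.69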

Now, let’s calibrate the items in the sapa_clean dataset using the Rasch model.

# Estimate the item parameters
model_rasch <- mirt::mirt(data = sapa_clean, # data with only item responses
                          model = 1, # 1 refers to the unidimensional IRT model
                          itemtype = "Rasch") # IRT model we want to use for item calibration

Next, we will extract the estimated item parameters using the coef() function and print them. Normally, we would apply the following code to extract the parameters easily:

# Extract the item parameters
param_rasch <- coef(model_rasch, IRTpars = TRUE, simplify = TRUE)

# Store only the item parameters and discard other info
param_rasch <- as.data.frame(param_rasch$items)

# Print the item parameters
param_rasch

If the code above does not work for you (which sometimes happens when the package is updated), you can try the following code to extract the parameters using the default parameterization and then transform them to the regular IRT parameterization (otherwise, the default option provides a slope-intercept parameterization where the difficulty parameter represents easiness instead of difficulty). For the remaining IRT models, I will use the following method but you are welcome to adapt the code shown above to extract the parameters more easily via IRTpars = TRUE.

# Extract the item parameters
param_rasch <- as.data.frame(coef(model_rasch, simplify = TRUE)$items)

# By default, mirt produces a different IRT parameterization where
# difficulty is represented as easiness. To see the regular IRT
# parameterization, we will apply a small transformation.
param_rasch$d <- (param_rasch$d*-1)/param_rasch$a1

# Rename the columns
colnames(param_rasch) <- c("a", "b", "g", "u")

# Print the item parameters
param_rasch
          a       b g u
reason.4  1 -0.8037 0 1
reason.16 1 -1.1663 0 1
reason.17 1 -1.1607 0 1
reason.19 1 -0.6549 0 1
letter.7  1 -0.5632 0 1
letter.33 1 -0.3996 0 1
letter.34 1 -0.6433 0 1
letter.58 1  0.3202 0 1
matrix.45 1 -0.1426 0 1
matrix.46 1 -0.2769 0 1
matrix.47 1 -0.6472 0 1

In the output, we see that the a parameter is fixed to “1” for all items, the b parameters are uniquely estimated for each item, and the c parameter (shown as “g” for guessing) is fixed to zero for all items. The “u” column indicates the upper asymptote, representing the maximum probability of success (this parameter only matters for the 4-parameter logistic (4PL) model and yes, there is indeed a 4PL model… 😩).


1PL Model

In the one-parameter logistic (1PL) model, the probability of answering item \(i\) correctly for examinee \(j\) with ability \(\theta_j\) can be written as follows:

\[P(X_{ij} = 1) = \frac{e^{a(\theta_j - b_i)}}{1 + e^{a(\theta_j - b_i)}}\]

where \(b_i\) is the item difficulty level of item \(i\) and \(a\) is the item discrimination parameter of item \(i\) (though it is not unique to item \(i\)). Using the 1PL model, we can estimate a unique difficulty parameter for each item, a discrimination parameter fixed across all the items, and an ability parameter for each examinee.

Now, we will calibrate the items in the sapa_clean dataset using the 1PL model. Estimating the 1PL model requires a special setup in the mirt package. We will use the two-parameter logistic (2PL) model but constrain the discrimination parameters to be fixed across all the items, which will give us the 1PL model (i.e., unique difficulty parameters but a fixed discrimination). Instead of using model = 1 in the mirt() function, we will define a model where we indicate the latent variable (called “F” in the following example), how many items are defining this model (i.e., F = 1-11), and which parameters are to be constrained. We use CONSTRAIN = (1-11, a1) to estimate a single discrimination parameter (called “a1” in mirt) for all of the 11 items in the dataset.

# Define the 1PL model explicitly
model <- "F = 1-11
          CONSTRAIN = (1-11, a1)"

# Estimate the item parameters
model_1PL <- mirt::mirt(data = sapa_clean, # data with only item responses
                        model = model, # our special model for 1PL
                        itemtype = "2PL") # IRT model we want to use for item calibration

Next, we will extract the estimated item parameters using the coef() function and print them.

# Extract the item parameters
param_1PL <- as.data.frame(coef(model_1PL, simplify = TRUE)$items)

# Transform the item parameters to regular IRT parameterization
param_1PL$d <- (param_1PL$d*-1)/param_1PL$a1

# Rename the columns
colnames(param_1PL) <- c("a", "b", "g", "u")

# Print the item parameters
param_1PL
              a        b g u
reason.4  1.478 -0.54366 0 1
reason.16 1.478 -0.78898 0 1
reason.17 1.478 -0.78518 0 1
reason.19 1.478 -0.44302 0 1
letter.7  1.478 -0.38098 0 1
letter.33 1.478 -0.27032 0 1
letter.34 1.478 -0.43517 0 1
letter.58 1.478  0.21671 0 1
matrix.45 1.478 -0.09643 0 1
matrix.46 1.478 -0.18731 0 1
matrix.47 1.478 -0.43778 0 1

In the output, we see that the a parameter is estimated but constrained to the same value (1.478) across all items, the b parameters are uniquely estimated for each item, the c parameter (i.e., g) is fixed to zero, and the upper asymptote (i.e., “u”) is fixed to 1 for all items. If we compare the difficulty parameters from the Rasch and 1PL models, we can see that they are not the same. Estimating the discrimination parameter (instead of assuming \(a = 1\)) has changed the difficulty estimates in the 1PL model. However, they are perfectly correlated.

# Combine the difficulty parameters
b_pars <- data.frame(rasch = param_rasch$b,
                     onePL = param_1PL$b)

# Print the difficulty parameters
b_pars
     rasch    onePL
1  -0.8037 -0.54366
2  -1.1663 -0.78898
3  -1.1607 -0.78518
4  -0.6549 -0.44302
5  -0.5632 -0.38098
6  -0.3996 -0.27032
7  -0.6433 -0.43517
8   0.3202  0.21671
9  -0.1426 -0.09643
10 -0.2769 -0.18731
11 -0.6472 -0.43778
# Are they correlated?
cor(b_pars$rasch, b_pars$onePL)
[1] 1
# Let's also plot them -- see that they are perfectly aligned on a diagonal line
plot(b_pars$rasch, b_pars$onePL, 
     xlab = "Difficulty (Rasch)", 
     ylab = "Difficulty (1PL)",
     main = "Rasch vs. 1PL Difficulty Parameters")


2PL Model

In the two-parameter logistic (2PL) model, the probability of answering item \(i\) correctly for examinee \(j\) with ability \(\theta_j\) can be written as follows:

\[P(X_{ij} = 1) = \frac{e^{a_i(\theta_j - b_i)}}{1 + e^{a_i(\theta_j - b_i)}}\]

where \(b_i\) is the item difficulty level of item \(i\) and \(a_i\) is the item discrimination parameter of item \(i\). Using the 2PL model, we can estimate a unique difficulty parameter and a unique discrimination parameter for each item, and an ability parameter for each examinee.

Let’s calibrate the items in the sapa_clean dataset using the 2PL model. Fortunately, estimating the 2PL model does not require any special setup. We will simply define itemtype as “2PL” and estimate the parameters as we did for the Rasch model.

# Estimate the item parameters
model_2PL <- mirt::mirt(data = sapa_clean, # data with only item responses
                        model = 1, # 1 refers to the unidimensional IRT model
                        itemtype = "2PL") # IRT model we want to use for item calibration

Next, we will extract the estimated item parameters using the coef() function and print them.

# Extract the item parameters
param_2PL <- as.data.frame(coef(model_2PL, simplify = TRUE)$items)

# Transform the item parameters to regular IRT parameterization
param_2PL$d <- (param_2PL$d*-1)/param_2PL$a1

# Rename the columns
colnames(param_2PL) <- c("a", "b", "g", "u")

# Print the item parameters
param_2PL
              a       b g u
reason.4  1.617 -0.5207 0 1
reason.16 1.439 -0.8024 0 1
reason.17 1.816 -0.7110 0 1
reason.19 1.476 -0.4453 0 1
letter.7  1.769 -0.3502 0 1
letter.33 1.458 -0.2744 0 1
letter.34 1.874 -0.3897 0 1
letter.58 1.457  0.2164 0 1
matrix.45 1.112 -0.1171 0 1
matrix.46 1.199 -0.2142 0 1
matrix.47 1.344 -0.4638 0 1

In the output, we see that the a and b parameters are uniquely estimated for each item, the c parameter (i.e., g) is fixed to zero, and the upper asymptote (i.e., “u”) is fixed to 1 for all items. The discrimination parameters seem to vary across the items (e.g., \(a = 1.816\) for reason.17 and \(a = 1.112\) for matrix.45). Let’s see whether the difficulty parameters from the 1PL and 2PL models are similar:

# Combine the difficulty parameters
b_pars <- data.frame(onePL = param_1PL$b,
                     twoPL = param_2PL$b)

# Print the difficulty parameters
b_pars
      onePL   twoPL
1  -0.54366 -0.5207
2  -0.78898 -0.8024
3  -0.78518 -0.7110
4  -0.44302 -0.4453
5  -0.38098 -0.3502
6  -0.27032 -0.2744
7  -0.43517 -0.3897
8   0.21671  0.2164
9  -0.09643 -0.1171
10 -0.18731 -0.2142
11 -0.43778 -0.4638
# Are they correlated? Yes, they are!
cor(b_pars$onePL, b_pars$twoPL)
[1] 0.9945
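
Because the 1PL model is nested within the 2PL model (the 1PL is simply a 2PL with equality-constrained slopes), we can formally compare the two models using mirt’s anova() method; a significant likelihood ratio test and lower AIC/BIC values would favor the less constrained 2PL model. The same comparison can be repeated for the 3PL model once it is estimated in the next section.

# Likelihood ratio test comparing the nested 1PL and 2PL models
anova(model_1PL, model_2PL)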


3PL Model

In the three-parameter logistic (3PL) model, the probability of answering item \(i\) correctly for examinee \(j\) with ability \(\theta_j\) can be written as follows:

\[P(X_{ij} = 1) = c_i + (1 - c_i)\frac{e^{a_i(\theta_j - b_i)}}{1 + e^{a_i(\theta_j - b_i)}}\]

where \(b_i\) is the item difficulty level of item \(i\), \(a_i\) is the item discrimination parameter of item \(i\), and \(c_i\) is the guessing parameter (i.e., lower asymptote) of item \(i\). Using the 3PL model, we can estimate unique difficulty, discrimination, and guessing parameters for each item, and an ability parameter for each examinee.

Let’s calibrate the items in the sapa_clean dataset using the 3PL model. This time, we will define itemtype as “3PL” and then estimate the parameters.

# Estimate the item parameters
model_3PL <- mirt::mirt(data = sapa_clean, # data with only item responses
                        model = 1, # 1 refers to the unidimensional IRT model
                        itemtype = "3PL") # IRT model we want to use for item calibration

Next, we will extract the estimated item parameters using the coef() function and print them.

# Extract the item parameters
param_3PL <- as.data.frame(coef(model_3PL, simplify = TRUE)$items)

# Transform the item parameters to regular IRT parameterization
param_3PL$d <- (param_3PL$d*-1)/param_3PL$a1

# Rename the columns
colnames(param_3PL) <- c("a", "b", "g", "u")

# Print the item parameters
param_3PL
              a       b         g u
reason.4  1.885 -0.3414 0.0980052 1
reason.16 1.448 -0.7941 0.0018842 1
reason.17 1.822 -0.7033 0.0022765 1
reason.19 1.475 -0.4415 0.0004529 1
letter.7  1.883 -0.2785 0.0397055 1
letter.33 1.466 -0.2618 0.0050298 1
letter.34 1.867 -0.3841 0.0013213 1
letter.58 1.455  0.2218 0.0011979 1
matrix.45 1.123 -0.1108 0.0015351 1
matrix.46 1.582  0.1072 0.1475559 1
matrix.47 1.357 -0.4521 0.0037671 1

In the output, we see that the a, b, and c parameters are uniquely estimated for each item, and the upper asymptote (i.e., “u”) is fixed to 1 for all items. Overall, most SAPA items have very low guessing parameters. The largest guessing parameter belongs to matrix.46 (\(c = 0.147\)).


Visualizing IRT models

Once we calibrate the items using a particular IRT model, we can also check the response functions for each item, as well as for the entire instrument, visually. In the following section, we will create item- and test-level visualizations for the SAPA items using the item parameters we have obtained from the IRT models.

Let’s begin with the item characteristic curves (ICC). We will use itemplot() to see the ICC for a specific item (e.g., item = 10 creates the ICC for item 10). We use type = "trace" because ICCs are also known as tracelines. In the ICC, the x-axis shows the latent trait (i.e., \(\theta\)) and the y-axis shows the probability of answering the item correctly (ranging from 0 to 1).

# Item 10 - 1PL model
mirt::itemplot(model_1PL, 
               item = 10, # which item to plot
               type = "trace") # traceline (i.e., ICC)

If we want to plot ICCs for more than one item, then we can use the plot() function. We need to specify which items we want ICCs for using the which.items argument:

# ICCs for items 1, 3, and 5 in the 2PL model
plot(model_2PL, 
     type = "trace", 
     which.items = c(1, 3, 5)) # items to be plotted (i.e., their positions in the data)

Also, we can plot the ICCs for all the items together using the same plot() function. Deleting the which.items argument will return the ICCs for all the items in a single plot. Now, let’s see the ICCs for all SAPA items using the 3PL model:

# All ICCs in 3PL model
plot(model_3PL, type = "trace")

Next, we will use the plot() function and type = "score" to obtain test characteristic curves or TCC (i.e., sum of the individual ICCs). In the TCC, the x-axis shows the latent trait (i.e., \(\theta\)) and the y-axis shows the expected (raw) score based on the entire instrument (ranging from 0 to the maximum raw score possible on the instrument).

plot(model_2PL, type = "score")

One of the most important concepts in IRT is “item information”, which refers to the amount of information a particular item provides at each level of the latent trait. The higher the information, the better the measurement accuracy. We can visualize the information level of an item using the item information function (IIF). A related concept is the standard error (also known as the conditional standard error of measurement; cSEM), which is the reciprocal of the square root of the item information:

\[SE_i(\theta) = \frac{1}{\sqrt{I_i(\theta)}},\] where \(I_i(\theta)\) is the information of item \(i\) at the latent trait level \(\theta\) and \(SE_i(\theta)\) is the standard error of item \(i\) at the latent trait level \(\theta\).
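
For the 2PL model, item information has a simple closed form, \(I_i(\theta) = a_i^2 P_i(\theta)\left(1 - P_i(\theta)\right)\). As a minimal sketch (using the 2PL parameters we estimated earlier and stored in param_2PL), we can compute the information and standard error for item 1 (reason.4) by hand:

# Item information and SE for reason.4 under the 2PL model
theta <- seq(-4, 4, by = 0.1)
a <- param_2PL["reason.4", "a"]
b <- param_2PL["reason.4", "b"]
P <- 1 / (1 + exp(-a * (theta - b))) # 2PL probability of success
info <- a^2 * P * (1 - P)            # item information
se <- 1 / sqrt(info)                 # conditional standard error

plot(theta, info, type = "l", xlab = "Theta", ylab = "Information")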

In the following example, we will plot the IIF and SE together for item 1 based on the 2PL model. We will use the itemplot() function with the type = "infoSE" argument. Alternatively, we could use type = "info" or type = "SE" to plot the IIF and SE separately. In the following plot, the x-axis shows the latent trait (i.e., \(\theta\)), the y-axis on the left shows item information, and the y-axis on the right shows the standard error.

mirt::itemplot(model_2PL, # IRT model that stores the item information
               item = 1, # which item to plot
               type = "infoSE") # plot type

We can also view the IIFs for all the items in a single plot using the plot() function with the type = "infotrace" argument:

plot(model_rasch, type = "infotrace")

In addition to IIF and SE at the item level, we can also plot the information and SE values at the test level. That is, we can see the test information function (TIF) and the SE across all the items. In the following example, we will use the plot() function with the type = "infoSE" argument to produce a single plot of TIF and SE (we can also use either “info” or “SE” to plot them separately). In the plot, the x-axis shows the latent trait (i.e., \(\theta\)), the y-axis on the left shows the test information, and the y-axis on the right shows the total standard error for the SAPA items.

# TIF and SE for the 3PL model
plot(model_3PL, type = "infoSE")

Another useful visualization for IRT models is the item fit plot. This visualization allows us to inspect the fit of each item to the selected IRT model by plotting the empirical values (i.e., observed responses) against the predicted values (i.e., responses expected under the selected IRT model). Using this plot, we can find poorly fitting items. We will use the itemfit() function with the empirical.plot argument to indicate which item we want to plot. The following plots show that the observed values are more aligned with the ICC produced by the 3PL model than with the ICC produced by the 1PL model, suggesting that item 1 fits the 3PL model better.

# Item fit plot for item 1 in the 1PL model
mirt::itemfit(model_1PL, empirical.plot = 1)

# Item fit plot for item 1 in the 3PL model
mirt::itemfit(model_3PL, empirical.plot = 1)
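
The same itemfit() function, when called without the empirical.plot argument, returns numerical item fit statistics (the S-X2 statistic by default); items with large statistics and small p-values would be flagged as misfitting:

# Numerical item fit statistics for the 3PL model
mirt::itemfit(model_3PL)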

The final visualization tool that we will use is called item-person map (also known as the Wright map in the IRT literature). An item-person map displays the location of item difficulty parameters and ability parameters (i.e., latent trait estimates) along the same logit scale. Item-person maps are very useful for comparing the spread of the item difficulty parameters against the distribution of the ability parameters. The item difficulty parameters are ideally expected to cover the whole logit scale to measure all the ability parameters accurately.

In the following example, we will use the ggWrightMap() function from the ShinyItemAnalysis package (Martinkova et al., 2022) to draw an item-person map for the Rasch model. This function requires the ability parameters (i.e., latent trait estimates) and item difficulty parameters from the Rasch model. We will use fscores() from mirt to estimate the ability parameters (see the next section for more details on ability estimation). The resulting item-person map shows that the SAPA items mostly cover the middle part of the ability (i.e., latent trait) distribution. There are not enough items to cover high-achieving examinees (i.e., \(\theta > 1\)) or low-achieving examinees (i.e., \(\theta < -2\)). The SAPA instrument needs more items (some difficult and some easy) to increase measurement accuracy for those examinee groups.

# Estimate the ability parameters for Rasch and save as a vector
theta_rasch <- as.vector(mirt::fscores(model_rasch))

# Use the difficulty and ability parameters to create an item-person map
ShinyItemAnalysis::ggWrightMap(theta = theta_rasch,
                               b = param_rasch$b,
                               item.names = colnames(sapa_clean), # item names (optional)
                               color = "lightblue", # color of ability distribution (optional)
                               ylab.theta = "Latent Trait", # label for ability distribution (optional)
                               ylab.b = "Item Difficulty") # label for item difficulty (optional)


Ability estimation

Since we have already calibrated the SAPA items using the IRT models, we can go ahead and estimate the ability (i.e., latent trait) parameters for the examinees using the fscores() function. We will use the expected a posteriori (EAP) method to estimate the ability parameters. Alternatively, we can use method = "ML" (i.e., maximum likelihood estimation), but this may not return a valid ability estimate for examinees who answered all of the items correctly or incorrectly. In the following example, we will estimate the ability parameters based on the 3PL model.

# Estimate ability parameters
theta_3PL <- mirt::fscores(model_3PL, # estimated IRT model
                           method = "EAP", # estimation method
                           full.scores.SE = TRUE) # return the standard errors

# See the estimated ability parameters
head(theta_3PL)
          F1  SE_F1
[1,] -1.4855 0.4943
[2,] -0.7608 0.4020
[3,] -0.4840 0.3973
[4,] -1.1385 0.4685
[5,] -0.5290 0.3922
[6,]  1.4418 0.6210
# See the distribution of the estimated ability parameters
# (i.e., first column) in theta_3PL
hist(theta_3PL[, 1], 
     xlab = "Theta", # label for the x axis
     main = "Ability Distribution") # title for the plot

Reliability

For IRT models, item and test information are the main criteria for determining the accuracy of the measurement process. However, it is also possible to calculate a CTT-like reliability index to evaluate the overall measurement accuracy for IRT models. The IRT test reliability coefficient (\(\rho_{xx'}\)) can be defined as the ratio of the true score variance (i.e., variance of ability estimates) to the observed score variance (the sum of the variance of ability estimates and the average of squared standard errors):

\[\rho_{xx'} = \frac{Var(\hat{\theta})}{Var(\hat{\theta}) + \overline{SE(\hat{\theta})^2}}\]

This is known as “empirical reliability”. In the following example, we will use the empirical_rxx() function to calculate the empirical reliability coefficient for the 3PL model.

# Empirical reliability
mirt::empirical_rxx(theta_3PL)
    F1 
0.7825 
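
As a sanity check, we can reproduce this value by applying the empirical reliability formula directly to the ability estimates and standard errors stored in theta_3PL:

# Manual check of the empirical reliability formula
v_theta <- var(theta_3PL[, "F1"])    # variance of the ability estimates
mse <- mean(theta_3PL[, "SE_F1"]^2)  # average squared standard error
v_theta / (v_theta + mse)            # should match empirical_rxx()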


What if SAPA items were polytomous?

In this example, we used SAPA items that were scored dichotomously (i.e., 1 for correct and 0 for incorrect). If the SAPA items were polytomous, how could we estimate the IRT parameters? The following R code shows how the SAPA items could be calibrated with the Partial Credit Model and the Graded Response Model via the mirt() function (the code below uses sapa_clean purely for illustration; with dichotomous data, these models reduce to their dichotomous counterparts). As we did for the Rasch model, we can use itemtype = "Rasch" to calibrate the items based on the Partial Credit Model because this model is the polytomous extension of the Rasch model. For the Graded Response Model, we need itemtype = "graded". The R code that we used above for extracting item parameters, visualizing the response functions, and estimating ability parameters for the dichotomous IRT models can also be used for the Partial Credit and Graded Response models.

# Partial Credit Model (PCM)
model_pcm <- mirt::mirt(data = sapa_clean,
                        model = 1, 
                        itemtype = "Rasch")


# Graded Response Model (GRM)
model_grm <- mirt::mirt(data = sapa_clean,
                        model = 1, 
                        itemtype = "graded")


References

Chalmers, P. (2023). mirt: Multidimensional item response theory. https://CRAN.R-project.org/package=mirt
Choi, Y.-J., & Asilkalkan, A. (2019). R packages for item response theory analysis: Descriptions and features. Measurement: Interdisciplinary Research and Perspectives, 17(3), 168–175. https://doi.org/10.1080/15366367.2019.1586404
Condon, D. M., & Revelle, W. (2014). The international cognitive ability resource: Development and initial validation of a public-domain measure. Intelligence, 43, 52–64.
Cui, B. (2020). DataExplorer: Automate data exploration and treatment. http://boxuancui.github.io/DataExplorer/
Hu, L., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling: A Multidisciplinary Journal, 6(1), 1–55.
Kassambara, A. (2022). ggcorrplot: Visualization of a correlation matrix using ggplot2. http://www.sthda.com/english/wiki/ggcorrplot-visualization-of-a-correlation-matrix-using-ggplot2
Mair, P., Hatzinger, R., & Maier, M. J. (2020). eRm: Extended Rasch modeling. https://cran.r-project.org/package=eRm
Martinkova, P., Hladka, A., & Netik, J. (2022). ShinyItemAnalysis: Test and item analysis via shiny. https://CRAN.R-project.org/package=ShinyItemAnalysis
Partchev, I., & Maris, G. (2017). irtoys: A collection of functions related to item response theory (IRT). https://CRAN.R-project.org/package=irtoys
Revelle, W. (2022). psych: Procedures for psychological, psychometric, and personality research. https://personality-project.org/r/psych/
Revelle, W., Wilt, J., & Rosenthal, A. (2010). Individual differences in cognition: New methods for examining the personality-cognition link. In Handbook of individual differences in cognition (pp. 27–49). Springer.
Rizopoulos, D. (2006). ltm: An R package for latent variable modelling and item response theory analyses. Journal of Statistical Software, 17(5), 1–25. http://www.jstatsoft.org/v17/i05/
Robitzsch, A., Kiefer, T., & Wu, M. (2021). TAM: Test analysis modules. https://CRAN.R-project.org/package=TAM
Rosseel, Y., Jorgensen, T. D., & Rockwood, N. (2023). lavaan: Latent variable analysis. https://lavaan.ugent.be