2 Designing Simulations

To better explain the steps of conducting a Monte Carlo simulation study, let’s assume a hypothetical research scenario in which a researcher wants to examine item parameter estimation in item response theory (IRT). Here the researcher aims to determine how robust parameter estimation is for the three-parameter logistic (3PL) model, especially when the sample size is small.

2.1 Simulation Factors

The researcher can investigate the impact of several factors (i.e., conditions) within the same study, but the goal is to keep the study feasible by including only the factors that are essential to the research question. Therefore, the following factors are selected:

Test length (10, 15, 20, or 25 items)
Sample size (250, 500, 750, or 1000 examinees)
Treatment of the guessing (c) parameter (fixed at the same value for all items, e.g., \(c = 0.16\), or freely estimated)

Let’s see the total number of conditions from these factors:

4 (test length) × 4 (sample size) × 2 (fixed guessing or free estimation) = 32 conditions

This calculation assumes that these factors are fully crossed. Two factors are fully crossed when each level of one factor occurs in combination with each level of the other factor. However, some factors may not be crossed with the other factors. These are called nested factors. Two factors are nested when each level of one factor occurs in combination with only one level of the other factor (see Schielzeth and Nakagawa’s paper for more information about crossed and nested factors).
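The fully crossed design can be enumerated with a few lines of code. The sketch below uses Python's standard library; the dictionary keys and the labels "fixed" and "free" are illustrative names, not part of the study description.

```python
from itertools import product

# Factor levels taken from the list of simulation factors above.
test_lengths = [10, 15, 20, 25]
sample_sizes = [250, 500, 750, 1000]
guessing_options = ["fixed", "free"]  # illustrative labels

# Fully crossed design: every level of each factor is combined
# with every level of the other two factors.
conditions = [
    {"test_length": n_items, "sample_size": n_persons, "guessing": guessing}
    for n_items, n_persons, guessing in product(
        test_lengths, sample_sizes, guessing_options
    )
]

print(len(conditions))  # 32
print(conditions[0])
```

Storing each condition as a dictionary makes it easy to pass the same record to the data-generation and estimation steps later on.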

2.2 Evaluation Criteria

The researcher is interested in the accuracy of the estimated item parameters. Therefore, we evaluate the simulation results with several measures of accuracy. Some of these measures (i.e., indices) include:

Bias: \(bias = \frac{1}{R} \sum_{r=1}^{R} (\hat{x}_r - x)\), where \(R\) is the number of replications, \(\hat{x}_r\) is the estimated value of the parameter in replication \(r\), and \(x\) is the true value of the parameter. Bias can be either positive or negative; values closer to zero indicate higher accuracy.
Root-mean-square error (RMSE): \(RMSE = \sqrt{\frac{1}{R} \sum_{r=1}^{R} (\hat{x}_r - x)^2}\), where \(R\) is the number of replications, \(\hat{x}_r\) is the estimated value of the parameter in replication \(r\), and \(x\) is the true value of the parameter. RMSE is either zero or positive; values closer to zero indicate higher accuracy.
(Pearson) Correlation: \(\rho(x,\hat{x}) = \frac{\text{cov}(x,\hat{x})}{\sigma_x \sigma_{\hat{x}}}\), where \(\hat{x}\) is the estimated value of the parameter, \(x\) is the true value of the parameter, the numerator is the covariance between the true and estimated values, and the denominator is the product of their standard deviations. Values closer to 1 indicate higher accuracy.
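The three indices above translate directly into code. The following is a minimal sketch using only the Python standard library; the function names and the small example values are made up for illustration and are not simulation results.

```python
import math

def bias(estimates, true_value):
    """Mean signed difference between the estimates and the true value."""
    return sum(e - true_value for e in estimates) / len(estimates)

def rmse(estimates, true_value):
    """Square root of the mean squared difference from the true value."""
    squared = [(e - true_value) ** 2 for e in estimates]
    return math.sqrt(sum(squared) / len(squared))

def pearson(true_values, estimates):
    """Pearson correlation between true and estimated values."""
    n = len(true_values)
    mean_t = sum(true_values) / n
    mean_e = sum(estimates) / n
    cov = sum((t - mean_t) * (e - mean_e)
              for t, e in zip(true_values, estimates))
    var_t = sum((t - mean_t) ** 2 for t in true_values)
    var_e = sum((e - mean_e) ** 2 for e in estimates)
    return cov / math.sqrt(var_t * var_e)

# Made-up example: four replications of one parameter with true value 0.2.
example_estimates = [0.10, 0.30, 0.25, 0.15]
print(round(bias(example_estimates, 0.2), 4))
print(round(rmse(example_estimates, 0.2), 4))

# Made-up example: true and estimated values for three items.
print(round(pearson([-1.0, 0.0, 1.0], [-0.9, 0.1, 1.1]), 4))
```

In the first example the errors cancel out, so the bias is essentially zero while the RMSE is not. This is why the two indices are usually reported together.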

Note that there are other types of evaluation criteria, such as power, type I error, relative efficiency, precision, and recall. In this study, the accuracy measures listed above should be adequate for evaluating the accuracy of estimated item parameters. We will discuss how the simulation results with these evaluation criteria can be summarized and presented in Part 4 of this book.

2.3 Other Design Elements

Item and ability parameters: The researcher must use either a fixed set of item parameters from an existing instrument or simulated item parameters similar to those from an existing instrument. In this study, the researcher simulates item parameters from the distributions reported in Casabianca and Lewis’s (2015) article, which are based on a 2008 National Mathematics Assessment:

\(a \sim N(1.13, 0.25)\)
\(b \sim N(0.21, 0.51)\)
\(c \sim N(0.16, 0.05)\)

Ability values are drawn from a standard normal distribution, \(\theta \sim N(0, 1)\).
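The data-generation step can be sketched as follows, using only the Python standard library. Several points are assumptions rather than details given in the text: the second value in each \(N(\cdot, \cdot)\) is treated as a standard deviation, the 3PL response function is written in its usual logistic form without the scaling constant 1.7, and negative draws of \(c\) are truncated at zero. The function names are illustrative.

```python
import math
import random

def draw_item_parameters(n_items, rng):
    """Draw (a, b, c) for each item from the distributions above.

    Assumption: the second argument of each N(., .) is a standard deviation.
    """
    items = []
    for _ in range(n_items):
        a = rng.gauss(1.13, 0.25)
        b = rng.gauss(0.21, 0.51)
        c = max(0.0, rng.gauss(0.16, 0.05))  # assumption: truncate c at zero
        items.append((a, b, c))
    return items

def prob_correct(theta, a, b, c):
    """3PL probability of a correct response (logistic form, no 1.7 constant)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def simulate_responses(thetas, items, rng):
    """Return one row of 0/1 responses per examinee."""
    return [
        [1 if rng.random() < prob_correct(theta, a, b, c) else 0
         for (a, b, c) in items]
        for theta in thetas
    ]

rng = random.Random(2024)  # arbitrary seed for reproducibility
items = draw_item_parameters(10, rng)
thetas = [rng.gauss(0.0, 1.0) for _ in range(250)]
responses = simulate_responses(thetas, items, rng)
print(len(responses), len(responses[0]))  # 250 10
```

Passing a single random number generator through every function keeps the whole data set reproducible from one seed.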

Number of replications: The number of replications should be large enough to yield stable estimates of the evaluation criteria. However, the higher the number of replications, the longer the simulation study will take, so it is important to choose a suitable number. Typically, 100 replications are enough for this type of simulation study. We can test this by increasing the number of replications to 150 and checking whether the overall results change substantially.
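One simple way to run such a check is to compute the same index from the first 100 replications and from all 150, and then compare the two values. The sketch below does this for bias. The estimates are synthetic stand-ins (true value plus random noise) rather than output from an actual 3PL calibration, and the noise level of 0.3 is an arbitrary assumption.

```python
import random

def bias_after(estimates, true_value, n_reps):
    """Bias computed from the first n_reps replications only."""
    subset = estimates[:n_reps]
    return sum(e - true_value for e in subset) / len(subset)

rng = random.Random(1)  # arbitrary seed
true_b = 0.21

# Synthetic stand-in for estimation output: true value plus random noise.
estimates = [true_b + rng.gauss(0.0, 0.3) for _ in range(150)]

bias_100 = bias_after(estimates, true_b, 100)
bias_150 = bias_after(estimates, true_b, 150)

print(round(bias_100, 3), round(bias_150, 3))
print(round(abs(bias_150 - bias_100), 3))  # change due to 50 extra replications
```

If the difference is small relative to the size of the effects being studied, 100 replications are probably sufficient.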

Replication mechanism: In a typical Monte Carlo simulation study, the levels of simulation factors remain fixed, while the data, parameters, and other parts vary. In other words, these are the parts that we actually simulate. For the sake of simplicity, some of these parts can be generated once while the other parts change from one replication to another. In this study, the researcher does not have a specific set of item parameters to test. Instead, the goal of the study is more general: investigating the impact of the identified simulation factors on item parameter estimation. Therefore, a unique set of item parameters, ability values, and response data is generated in each replication.
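The replication mechanism described above can be sketched as a loop in which every replication draws its own item parameters and ability values. In this sketch the replication index is used as the random seed, which is one common way to keep replications distinct yet reproducible and is an assumption rather than a requirement of the study. The function name is illustrative, the second value of each normal distribution is treated as a standard deviation, and negative \(c\) values are truncated at zero.

```python
import random

def generate_replication(n_items, n_persons, seed):
    """Draw a fresh set of item parameters and abilities for one replication."""
    rng = random.Random(seed)
    items = [
        (rng.gauss(1.13, 0.25),              # a
         rng.gauss(0.21, 0.51),              # b
         max(0.0, rng.gauss(0.16, 0.05)))    # c, truncated at zero (assumption)
        for _ in range(n_items)
    ]
    thetas = [rng.gauss(0.0, 1.0) for _ in range(n_persons)]
    return items, thetas

n_replications = 100
first_a_values = []
for rep in range(n_replications):
    # The factor levels (10 items, 250 examinees) stay fixed;
    # the parameters are redrawn in every replication.
    items, thetas = generate_replication(n_items=10, n_persons=250, seed=rep)
    first_a_values.append(items[0][0])
    # Response generation and model estimation would follow here.

print(len(set(first_a_values)))  # number of distinct draws for the first item's a
```

Because each replication is tied to its own seed, any single replication can be regenerated later without rerunning the whole study.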

References

Casabianca, Jodi M., and Charles Lewis. 2015. “IRT Item Parameter Recovery with Marginal Maximum Likelihood Estimation Using Loglinear Smoothing Models.” Journal of Educational and Behavioral Statistics 40 (6): 547–78. https://doi.org/10.3102/1076998615606112.