Conducting Monte Carlo Simulations in R
University of Alberta, bulut@ualberta.ca
1 Introduction
1.1 Overview
Researchers and practitioners often use Monte Carlo simulations to answer a variety of research questions. Over the past decade, R (R Core Team 2019) has been one of the most popular programming languages for conducting Monte Carlo simulation studies. R (https://www.r-project.org/) is a free, open-source programming language for statistical computing and data visualization. Built-in functions and many user-created packages in R allow researchers and practitioners to design and implement simulation studies that range from very simple to very comprehensive.
This short book will explain the major steps in conducting Monte Carlo simulations using R. Here is the outline of the book:
| Part | Description |
|---|---|
| 1 | Introduction |
| | Why Simulations? |
| | Typical Simulation Scenarios |
| | Additional Resources |
| 2 | Designing Simulations |
| | Simulation Factors |
| | Evaluation Criteria |
| | Other Design Elements |
| 3 | Running Simulation |
| | Custom Functions |
| | Debugging the Code |
| | Putting the Functions Together |
| | Benchmarking |
| 4 | Summarizing Simulation Results |
| | Tables and Figures |
| | Exporting the Results |
1.2 Why Simulations?
There are many reasons to conduct Monte Carlo simulations. Researchers and practitioners often choose to simulate data instead of collecting empirical data because:
- it is impractical and costly to collect empirical data while manipulating several conditions;
- it is not possible to investigate the real impact of the study conditions without knowing the characteristics of the target population and the variables of interest; and
- empirical data are more difficult to work with because they typically include missing values, which may be extensive and nonrandom.
1.3 Typical Simulation Scenarios
We can use Monte Carlo simulations to answer various research questions. Typical research questions for which Monte Carlo simulations can be useful include:
- Does a particular type of estimation (e.g., maximum likelihood) yield accurate results?
- What is the level of bias?
- What is the standard error of estimates?
- What conditions would affect the accuracy of the estimation?
- Does the estimation remain robust when assumptions are violated?
- Which estimation method (e.g., maximum likelihood, expected a posteriori [EAP], or maximum a posteriori [MAP]) is most accurate?
- Does the performance of these methods vary across conditions?
- Which estimator, method, or model is the most robust?
- Can a statistical method or model (e.g., logistic regression) successfully detect a value of interest (e.g., differential item functioning)?
- How accurate is the method when the null hypothesis is false?
- How accurate is the method when the null hypothesis is true?
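Several of these questions can be explored with only a few lines of base R. The following is a minimal sketch, not a full simulation design: it estimates the bias and empirical standard error of two variance estimators, and then the rejection rate of a one-sample t-test when the null hypothesis is true and when it is false. All object names, the sample size, the number of replications, and the true parameter values are illustrative assumptions and do not come from a particular study.

```r
# Minimal Monte Carlo sketch in base R.
# All names and settings below are illustrative assumptions.
set.seed(2024)   # fix the random seed so the run is reproducible

n_reps <- 5000   # number of Monte Carlo replications
n      <- 20     # sample size within each replication
sigma2 <- 4      # true population variance (known because we simulate the data)

# Part 1: bias and standard error of two variance estimators -----------------
one_rep <- function(n, sigma2) {
  x <- rnorm(n, mean = 0, sd = sqrt(sigma2))
  c(ml       = mean((x - mean(x))^2),  # maximum likelihood estimator (divides by n)
    unbiased = var(x))                 # usual sample variance (divides by n - 1)
}

# replicate() returns a 2 x n_reps matrix with one row per estimator
estimates <- replicate(n_reps, one_rep(n, sigma2))

bias   <- rowMeans(estimates) - sigma2            # average estimate minus true value
emp_se <- apply(estimates, 1, sd)                 # empirical standard error
rmse   <- sqrt(rowMeans((estimates - sigma2)^2))  # root mean squared error

# Part 2: rejection rate of a one-sample t-test ------------------------------
reject_rate <- function(true_mean, n, n_reps, alpha = 0.05) {
  p_values <- replicate(
    n_reps,
    t.test(rnorm(n, mean = true_mean, sd = 1), mu = 0)$p.value
  )
  mean(p_values < alpha)  # proportion of replications that reject H0: mu = 0
}

type1_error <- reject_rate(true_mean = 0,   n = n, n_reps = n_reps)  # null is true
power       <- reject_rate(true_mean = 0.5, n = n, n_reps = n_reps)  # null is false

print(round(rbind(bias = bias, emp_se = emp_se, rmse = rmse), 3))
print(round(c(type1_error = type1_error, power = power), 3))
```

Under these settings, the maximum likelihood estimator has a theoretical bias of -sigma2 / n = -0.2, so its simulated bias should be close to that value, while the simulated bias of the unbiased estimator should be close to zero. The rejection rate when the null hypothesis is true should be close to the nominal level of 0.05. Changing n, sigma2, or true_mean shows how these quantities vary across conditions.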
1.4 Additional Resources
If you are interested in learning more about Monte Carlo simulations, there are many online resources available. Some of these resources include:
- Bulut and Sunbul's (2017) article: Monte Carlo Simulation Studies in Item Response Theory with the R Programming Language
- Hallgren's (2013) article: Conducting Simulation Studies in the R Programming Environment
- Roger Peng’s online book: R Programming for Data Science. In the book, Chapter 20 specifically focuses on simulations in R.
- The SimDesign (Chalmers 2020) package in R. For examples of SimDesign, you can check out its Wiki page: https://github.com/philchalmers/SimDesign/wiki
- There is also another R package called MonteCarlo (Leschinski 2019) for more general simulation studies.
References
Bulut, O., and O. Sunbul. 2017. “Monte Carlo Simulation Studies in Item Response Theory with the R Programming Language.” Journal of Measurement and Evaluation in Education and Psychology 8 (3): 266–87. https://doi.org/10.21031/epod.30582.
Chalmers, Phil. 2020. SimDesign: Structure for Organizing Monte Carlo Simulation Designs. https://CRAN.R-project.org/package=SimDesign.
Hallgren, K. A. 2013. “Conducting Simulation Studies in the R Programming Environment.” Tutorials in Quantitative Methods for Psychology 9 (2): 43–60. https://doi.org/10.20982/tqmp.09.2.p043.
Leschinski, Christian Hendrik. 2019. MonteCarlo: Automatic Parallelized Monte Carlo Simulations. https://CRAN.R-project.org/package=MonteCarlo.
R Core Team. 2019. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.