Summary
Topics covered
- Exploratory data analysis
  - Data wrangling
    - Subsetting, creating variables, reshaping, and summarizing: data.table, dplyr, sparklyr, and Apache Spark
  - Data visualization
    - Static visualizations: ggplot2; the ggplot2 add-ons GGally, ggExtra, and ggalluvial; cowplot as an additional ggplot2 theme
    - Interactive visualizations: plotly
- Supervised learning
  - Decision trees
    - Classification and regression trees: rpart, rpart.plot
  - Random forests
    - Random forests for classification and regression: randomForest
    - Model building and evaluation
  - Support vector machines
    - Maximal margin classifiers
    - Support vector classifiers
    - Support vector machines with polynomial and radial kernels
    - Logistic regression
    - Tuning and evaluating the models: e1071, sparklyr
Methods we didn’t cover
- Regression
  - Penalized regression (ridge, lasso, elastic net): glmnet
  - Principal components regression (PCR) and partial least squares (PLS), a supervised alternative to PCR
  - Non-linear regression (polynomials, splines including smoothing splines, and generalized additive models): splines, gam
- K-nearest neighbors (KNN)
- Unsupervised learning
  - K-means clustering
  - Hierarchical clustering: cluster, factoextra