10 Summary

10.1 Topics covered

10.1.1 Exploratory data analysis

  1. Data wrangling
    • Subsetting, creating variables, reshaping, and summarizing
    • data.table
    • dplyr
    • sparklyr and Apache Spark
  2. Data visualization
    • Static visualizations
    • ggplot2
    • ggplot2 add-ons GGally, ggExtra, and ggalluvial
    • cowplot as an additional ggplot2 theme
    • Interactive visualizations
    • plotly

10.1.2 Supervised learning

  1. Decision trees
    • Classification and regression trees
    • rpart
    • rpart.plot
  2. Random forests
    • Random forests for classification and regression
    • randomForest
  3. Model building and evaluation
    • modelr
    • caret
  4. Support vector machines
    • Maximal margin classifiers
    • Support vector classifiers
    • Support vector machines with polynomial and radial kernels
    • Logistic regression
    • Tuning and evaluating the models
    • e1071
    • sparklyr

10.2 Methods we didn’t cover

  1. Regression
    • Penalized regression
      • ridge, lasso, elastic net
      • glmnet
    • Principal components regression and partial least squares regression (a supervised counterpart to principal components regression)
      • pls
    • Non-linear regression
      • Polynomials, splines (smoothing splines), generalized additive models
      • splines
      • gam
  2. K-nearest neighbors (KNN)
    • caret
    • class
  3. Unsupervised learning
    • K-means clustering
    • Hierarchical clustering
    • cluster
    • factoextra