Exploring, Visualizing, and Modeling Big Data in R
1 Preface
1.1 Summary
1.2 Who we are
2 Introduction
2.1 What is big data?
2.2 Why is big data important?
2.3 How do we analyze big data?
2.4 Additional resources
2.5 PISA dataset
3 Exploratory data analysis
3.1 What is exploratory data analysis?
3.2 Confirmatory data analysis
3.3 A framework for EDA
3.4 EDA tools
4 Wrangling big data
4.1 What is data.table?
4.1.1 Why use data.table over tidyverse?
4.2 Reading/writing data with data.table
4.2.1 Exercises
4.3 Using the i in data.table
4.3.1 Exercises
4.4 Using the j in data.table
4.4.1 Exercises
4.5 Summarizing using the by in data.table
4.5.1 Exercises
4.6 Reshaping data
4.7 The sparklyr package
4.8 Lab
5 Visualizing big data
5.1 Introduction to ggplot2
5.2 Marginal plots
5.2.1 Exercise
5.3 Conditional plots
5.3.1 Exercise
5.4 Plots for examining correlations
5.5 Plots for examining means by group
5.6 Plots for ordinal/categorical variables
5.6.1 Exercise
5.7 Interactive plots with plotly
5.7.1 Exercise
5.8 Customizing visualizations
5.9 Lab
6 Modeling big data
6.1 Introduction to machine learning
6.1.1 Focus of machine learning
6.1.2 Some concepts underlying machine learning
6.1.3 Model development
6.1.4 Model evaluation
6.1.5 Key issues
6.2 Types of machine learning
7 Supervised Machine Learning - Part I
7.1 Decision Trees
7.1.1 Regression trees
7.1.2 Classification trees
7.1.3 Pruning decision trees
7.2 Decision trees in R
7.2.1 Cross-validation
7.3 Random Forests
7.4 Random forests in R
8 Supervised Machine Learning - Part II
8.1 Support Vector Machines
8.1.1 Maximal Margin Classifier
8.1.2 Support Vector Classifier
8.1.3 Support Vector Machine
8.1.4 Lab
9 Unsupervised machine learning
9.1 Clustering
9.2 Distance Measures
9.3 K-means clustering
9.4 K-means clustering in R
10 Summary
10.1 Topics covered
10.1.1 Exploratory data analysis
10.1.2 Supervised learning
10.2 Methods we didn’t cover
References
All rights reserved
References