
Machine Learning Random Forests and Boosting

Key Ideas

In developing a machine learning prediction model, we need to ask questions of each variable: what measurable, quantitative threshold qualifies an observation for one group rather than another? For example, applying this idea to the Obama-Clinton primary data, we look at the variable that captures a county's demographics. Our decision rule for splitting on that variable could be: if a county is more than 20% African-American, we subdivide the counties into two groups. We then advance through the remaining variables, asking, for instance, whether a county's high-school graduation rate was higher than 78%, and divide on that variable into two subgroups as well. This process continues until we exhaust all of our predictive variables.

Algorithm to Build Decision Tree

  1. Start with one root node containing all the observations
  2. Identify the variable (and cut-point) that best splits the outcome
  3. Create the next level of the binary tree with two leaf nodes
  4. Within each of the two new nodes, recursively split on the variables as in step 2
  5. Continue until we reach the base case of sufficiently small and pure (read: homogeneous) groups to predict the outcome (a minimal sketch in R follows this list)
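
To make the recursion concrete, here is a minimal sketch, assuming the rpart package is available, that grows a classification tree on the iris data used in the example further down:

library(rpart)
data(iris)

# rpart recursively picks the variable and cut-point that most improve
# node purity (Gini by default) and stops when nodes are small or pure
treeFit <- rpart(Species ~ ., data = iris, method = "class")

print(treeFit)                 # text view of the split at each node
plot(treeFit); text(treeFit)   # quick plot of the fitted tree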

Measures of Impurity

  • Misclassification Error
  • Gini Index
  • Deviance and Information Gain
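
As a rough sketch of how these measures behave, assuming a vector p of class proportions within a node (the function names below are illustrative, not from any package):

# p: vector of class proportions within a node (sums to 1)
misclassification_error <- function(p) 1 - max(p)
gini_index <- function(p) sum(p * (1 - p))
deviance_impurity <- function(p) -sum(p[p > 0] * log(p[p > 0]))

# Example: a node that is 80% one species and 20% another
p <- c(0.8, 0.2)
misclassification_error(p)   # 0.20
gini_index(p)                # 0.32
deviance_impurity(p)         # ~0.50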

Example

In RStudio:


data(iris)
library(ggplot2)
library(caret)   # createDataPartition() comes from caret
names(iris)
table(iris$Species)

# Trying to predict Species
# Separate data into training and test sets
inTrain <- createDataPartition(y = iris$Species, p = 0.7, list = FALSE)
training <- iris[inTrain, ]
testing <- iris[-inTrain, ]
dim(training)
dim(testing)

# Assert: should split 105/45 into training/testing respectively
qplot(Petal.Width, Sepal.Width, colour = Species, data = training)
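
The example stops at the exploratory plot. As a minimal continuation sketch, assuming the caret and rpart packages, a classification tree could be trained on the training set and checked against the held-out test set:

# Fit a classification tree via caret and inspect the splits
modFit <- train(Species ~ ., method = "rpart", data = training)
print(modFit$finalModel)

# Predict on the held-out test set and summarize accuracy
confusionMatrix(predict(modFit, newdata = testing), testing$Species)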

 

Any programming problem can be solved by adding a level of indirection.

– David J. Wheeler

Machine Learning Predictive Modeling Methods

Bagging

Start with bagging (bootstrap aggregating): resample the training data with replacement, fit a tree to each bootstrap sample, and combine the resulting predictions by averaging or majority vote.
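
A minimal sketch, assuming the ipred package that backs caret's "treebag" method, using the same training/testing split as above:

# Bagging: many trees on bootstrap resamples, aggregated by majority vote
bagFit <- train(Species ~ ., method = "treebag", data = training)
confusionMatrix(predict(bagFit, newdata = testing), testing$Species)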

Random Forest

Then, build a prediction model using a random forest: grow many trees on bootstrap samples, but at each split consider only a random subset of the predictors, and aggregate the trees' votes.
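
A minimal sketch, assuming the randomForest package behind caret's "rf" method:

# Random forest: bootstrap samples plus a random subset of predictors at each split
rfFit <- train(Species ~ ., method = "rf", data = training)
print(rfFit)
confusionMatrix(predict(rfFit, newdata = testing), testing$Species)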

Boosting

Contrast the random forest with boosting: rather than growing independent trees in parallel, boosting fits weak learners sequentially, tweaking the weights so that observations misclassified by earlier learners count more for later ones, and then combines the learners into a weighted vote.
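
A minimal sketch, assuming the gbm package behind caret's "gbm" method:

# Boosting: trees fit sequentially, each focusing on cases the previous
# trees got wrong, combined into a weighted ensemble
boostFit <- train(Species ~ ., method = "gbm", data = training, verbose = FALSE)
print(boostFit)
confusionMatrix(predict(boostFit, newdata = testing), testing$Species)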
