Out-of-bag evaluation

Random forests do not require a validation dataset. Most random forests use a technique called out-of-bag evaluation (OOB evaluation) to evaluate the quality of the model. OOB evaluation treats the training set as if it were the test set of a cross-validation.

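As explained earlier, each decision tree in a random forest is typically trained on a sample drawn with replacement from the training set. Such a sample contains only about 63% of the distinct training examples (the fraction approaches 1 - 1/e as the dataset grows). Therefore, each decision tree does not see about 37% of the training examples. The core idea of OOB evaluation is as follows: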

  • Evaluate the random forest on the training set.
  • For each example, use only the decision trees that did not see that example during training.
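The two steps above can be sketched in plain Python. This is only an illustration under stated assumptions, not how any particular library implements it: `train_stump` is a hypothetical one-split stand-in for a real decision tree, the synthetic dataset is invented for the demo, and each tree is assumed to draw as many examples as the training set contains, with replacement.

```python
import random
from collections import Counter


def train_stump(rows):
    """Hypothetical stand-in for a decision tree: the best single-threshold split."""
    best = None
    for f in range(len(rows[0][0])):
        for x, _ in rows:
            for hi in (0, 1):  # label predicted when the feature exceeds the threshold
                hits = sum((hi if r[f] > x[f] else 1 - hi) == y for r, y in rows)
                if best is None or hits > best[0]:
                    best = (hits, f, x[f], hi)
    _, f, t, hi = best
    return lambda x: hi if x[f] > t else 1 - hi


def oob_accuracy(rows, num_trees, rng):
    """Score each training example using only the trees that never saw it."""
    n = len(rows)
    votes = [Counter() for _ in range(n)]
    for _ in range(num_trees):
        in_bag = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        tree = train_stump([rows[i] for i in in_bag])
        seen = set(in_bag)
        for i, (x, _) in enumerate(rows):
            if i not in seen:  # example i is out-of-bag for this tree
                votes[i][tree(x)] += 1
    # Examples that were in-bag for every tree have no OOB prediction; skip them.
    scored = [(v.most_common(1)[0][0], rows[i][1]) for i, v in enumerate(votes) if v]
    accuracy = sum(pred == y for pred, y in scored) / len(scored)
    return accuracy, len(scored)


# Synthetic data (an assumption for the demo): the label depends on feature 0 only.
rng = random.Random(0)
rows = []
for _ in range(200):
    x = (rng.random(), rng.random())
    rows.append((x, int(x[0] > 0.5)))

acc, covered = oob_accuracy(rows, num_trees=30, rng=rng)
print(f"OOB accuracy: {acc:.3f} on {covered} of {len(rows)} examples")
```

No example is ever scored by a tree that trained on it, which is why the resulting accuracy behaves like a held-out estimate even though it is computed on the training set.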

The following table illustrates OOB evaluation of a random forest with 3 decision trees trained on 6 examples. (Yes, this is the same table as in the Bagging section.) The last column shows which examples each decision tree evaluates during OOB evaluation.

Table 7. OOB evaluation. The numbers represent the number of times a given training example is used during training of the given decision tree.

                   Training examples                Examples for
                   #1   #2   #3   #4   #5   #6      OOB evaluation
original dataset    1    1    1    1    1    1
decision tree 1     1    1    0    2    1    1      #3
decision tree 2     3    0    1    0    2    0      #2, #4, and #6
decision tree 3     0    1    3    1    0    1      #1 and #5

In the example shown in Table 7, the OOB prediction for training example #1 is computed with decision tree #3 only, since decision trees #1 and #2 used this example for training. In practice, on a reasonably sized dataset and with a few decision trees, all the examples have an OOB prediction.
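The assignment in Table 7 can be reproduced mechanically: a decision tree is out-of-bag for an example exactly when its count for that example is 0. The sketch below copies the counts from the table; the variable and function names are illustrative only.

```python
# Number of times each training example (#1..#6) appears in each tree's
# training sample, copied from Table 7.
in_bag_counts = {
    "decision tree 1": [1, 1, 0, 2, 1, 1],
    "decision tree 2": [3, 0, 1, 0, 2, 0],
    "decision tree 3": [0, 1, 3, 1, 0, 1],
}


def oob_trees_per_example(counts):
    """Map each example number to the trees that never trained on it."""
    num_examples = len(next(iter(counts.values())))
    return {
        example: [tree for tree, row in counts.items() if row[example - 1] == 0]
        for example in range(1, num_examples + 1)
    }


oob_trees = oob_trees_per_example(in_bag_counts)
for example, trees in oob_trees.items():
    print(f"example #{example}: {', '.join(trees)}")
```

In this small table every example happens to have exactly one out-of-bag tree. With more trees, each example is typically scored by several of them, and their predictions are aggregated.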

YDF Code
In YDF, the OOB evaluation is available in the training logs if the model is trained with compute_oob_performances=True.

OOB evaluation is also useful for computing permutation variable importance for random forest models. Remember from Variable importances that permutation variable importance measures the importance of a variable by measuring the drop in model quality when this variable is shuffled. The random forest "OOB permutation variable importance" is a permutation variable importance computed using the OOB evaluation.
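A minimal sketch of the idea follows, under several assumptions: `train_stump` is a hypothetical one-split stand-in for a real decision tree, the dataset is synthetic, and the feature column is shuffled across the whole training set before OOB accuracy is recomputed. Real implementations, YDF included, may shuffle and aggregate differently.

```python
import random
from collections import Counter


def train_stump(rows):
    """Hypothetical stand-in for a decision tree: the best single-threshold split."""
    best = None
    for f in range(len(rows[0][0])):
        for x, _ in rows:
            for hi in (0, 1):  # label predicted when the feature exceeds the threshold
                hits = sum((hi if r[f] > x[f] else 1 - hi) == y for r, y in rows)
                if best is None or hits > best[0]:
                    best = (hits, f, x[f], hi)
    _, f, t, hi = best
    return lambda x: hi if x[f] > t else 1 - hi


def train_forest(features, labels, num_trees, rng):
    """Train each tree on a bootstrap sample and remember which examples it saw."""
    rows = list(zip(features, labels))
    n = len(rows)
    forest = []
    for _ in range(num_trees):
        in_bag = [rng.randrange(n) for _ in range(n)]
        forest.append((train_stump([rows[i] for i in in_bag]), set(in_bag)))
    return forest


def oob_accuracy(forest, features, labels):
    """Accuracy where example i is scored only by trees that did not train on it."""
    correct = scored = 0
    for i, (x, y) in enumerate(zip(features, labels)):
        votes = Counter(tree(x) for tree, seen in forest if i not in seen)
        if votes:
            scored += 1
            correct += votes.most_common(1)[0][0] == y
    return correct / scored


def oob_permutation_importance(forest, features, labels, feature, rng):
    """Drop in OOB accuracy after shuffling one feature column."""
    column = [x[feature] for x in features]
    rng.shuffle(column)
    shuffled = [x[:feature] + (v,) + x[feature + 1:] for x, v in zip(features, column)]
    return oob_accuracy(forest, features, labels) - oob_accuracy(forest, shuffled, labels)


# Synthetic data (an assumption for the demo): only feature 0 determines the label.
rng = random.Random(0)
features = [(rng.random(), rng.random()) for _ in range(200)]
labels = [int(x[0] > 0.5) for x in features]
forest = train_forest(features, labels, num_trees=30, rng=rng)

importances = [
    oob_permutation_importance(forest, features, labels, f, rng) for f in range(2)
]
print("OOB permutation importances:", importances)
```

Shuffling the informative feature should cause a large drop in OOB accuracy, while shuffling the noise feature should leave it essentially unchanged.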

YDF Code
In YDF, the OOB permutation variable importances are available in the training logs if the model is trained with compute_oob_variable_importances=True.