Pruning Regression Trees is one of the most important ways we can prevent them from overfitting the Training Data. This video walks you through Cost Complexity Pruning, aka Weakest Link Pruning, step-by-step so that you can learn how it works and see it in action.
NOTE: This StatQuest assumes you already know about...
Regression Trees:
https://youtu.be/g9c66TUylZ4
ALSO NOTE: This StatQuest is based on the Cost Complexity Pruning algorithm found on pages 307 to 309 of An Introduction to Statistical Learning with Applications in R: http://faculty.marshall.usc.edu/gareth-james/ISL/
For a complete index of all the StatQuest videos, check out:
https://statquest.org/video-index/
If you'd like to support StatQuest, please consider...
Patreon: https://www.patreon.com/statquest
...or...
YouTube Membership: https://www.youtube.com/channel/UCtYLUTtgS3k1Fg4y5tAhLbw/join
...buying one of my books, a study guide, a t-shirt or hoodie, or a song from the StatQuest store...
https://statquest.org/statquest-store/
...or just donating to StatQuest!
https://www.paypal.me/statquest
Lastly, if you want to keep up with me as I research and create new StatQuests, follow me on Twitter:
https://twitter.com/joshuastarmer
0:00 Awesome song and introduction
0:59 Motivation for pruning a tree
3:58 Calculating the sum of squared residuals for pruned trees
7:50 Comparing pruned trees with alpha
11:17 Step 1: Use all of the data to build trees with different alphas
13:05 Step 2: Use cross validation to compare alphas
15:02 Step 3: Select the alpha that, on average, gives the best results
15:27 Step 4: Select the original tree that corresponds to that alpha
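
If you'd like to try the four steps above in code, here is a minimal sketch in Python. It is not the code from the video: it assumes scikit-learn is installed, the data are made up purely for illustration, and the variable names are my own. In the video, each tree gets a Tree Score = Sum of Squared Residuals + alpha * (number of leaves). scikit-learn applies the same kind of penalty through its ccp_alpha parameter, but it measures impurity on its own scale, so the alpha values it reports will not match numbers worked out by hand with the sum of squared residuals.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Made-up data for illustration only (not the data from the video).
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=100)

# Step 1: use all of the data to find the alphas at which the full
# tree loses another set of leaves (weakest link pruning).
full_tree = DecisionTreeRegressor(random_state=0)
path = full_tree.cost_complexity_pruning_path(X, y)
# Guard against tiny negative values caused by floating point error.
alphas = np.clip(path.ccp_alphas, 0.0, None)

# Step 2: use cross validation to compare the alphas.
mean_mse = []
for alpha in alphas:
    tree = DecisionTreeRegressor(random_state=0, ccp_alpha=alpha)
    scores = cross_val_score(
        tree, X, y, cv=5, scoring="neg_mean_squared_error"
    )
    mean_mse.append(-scores.mean())

# Step 3: select the alpha that, on average, gives the best results.
best_alpha = float(alphas[int(np.argmin(mean_mse))])

# Step 4: go back to the tree built from all of the data that
# corresponds to that alpha.
full_tree.fit(X, y)
pruned_tree = DecisionTreeRegressor(random_state=0, ccp_alpha=best_alpha)
pruned_tree.fit(X, y)

print("best alpha:", best_alpha)
print("leaves in the full tree:", full_tree.get_n_leaves())
print("leaves in the pruned tree:", pruned_tree.get_n_leaves())
```

Setting ccp_alpha to 0 gives the full, unpruned tree, and larger values prune away more leaves, so the pruned tree should never have more leaves than the full one.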
#statquest #regression #tree