Gradient boosting is a powerful machine-learning technique that achieves state-of-the-art results on a variety of practical tasks. This tutorial explains the details of using gradient boosting in practice: we solve a classification problem using the popular GBDT library CatBoost.
www.pydata.org
PyData is an educational program of NumFOCUS, a 501(c)3 non-profit organization in the United States. PyData provides a forum for the international community of users and developers of data analysis tools to share ideas and learn from each other. The global PyData network promotes discussion of best practices, new approaches, and emerging technologies for data management, processing, analytics, and visualization. PyData communities approach data science using many languages, including (but not limited to) Python, Julia, and R.
PyData conferences aim to be accessible and community-driven, with novice to advanced level presentations. PyData tutorials and talks bring attendees the latest project features along with cutting-edge use cases.
0:00 - Introduction
1:49 - Intro to CatBoost
2:08 - Overview of the Presentation
2:39 - Intro to Gradient Boosting
6:08 - Numerical and Categorical Data with CatBoost
7:26 - Advantages of CatBoost
9:00 - Library Comparison (Quality)
9:45 - Speed
10:11 - Benchmarking (CPU & GPU)
11:55 - CPU vs GPU
12:50 - Prediction Time
13:24 - Tutorial
15:15 - Problem Statement
15:38 - CatBoost Library (Imports and related issues)
16:22 - Reading and Intro to the Data
18:17 - Exploring the data
19:36 - Training the Model with default parameters
22:16 - Creating the Pool Object
23:12 - Splitting the data (Train & Validation)
24:16 - Selecting the objective function
25:11 - STDOUT of training
28:32 - Plotting metrics while training
30:33 - Model Comparison (plotting after training)
32:39 - Finding the best model
35:05 - Cross-Validation
41:30 - Grid Search
44:40 - Overfitting Detector
49:18 - Overfitting Detector with eval metric
51:31 - Model Predictions
57:10 - Select Decision Boundary
1:01:04 - Model Evaluation (new dataset)
1:03:06 - Feature Importance
1:03:37 - Prediction Values Change
1:04:50 - Loss Function Change
1:07:49 - Shap Values
1:16:05 - Snapshotting
1:17:45 - Saving the Model
1:18:36 - Hyperparameter Tuning
1:23:07 - Speeding up Training and Reducing Model Size
1:23:35 - Additional Details about CatBoost Community
1:25:50 - Future Scope of CatBoost
1:26:22 - Questions and Suggestions
Shout-out to https://github.com/theProcrastinatr for the video timestamps!
Want to help add timestamps to our YouTube videos to help with discoverability? Find out more here: https://github.com/numfocus/YouTubeVideoTimestamps