Handling Categorical Data in Machine Learning: Easy Explanation for Data Science Interviews

8,320 listens
Handling categorical data in machine learning projects is a very common topic in data science interviews. In this video, I’ll cover the difference between treating a variable as a dummy variable vs. a non-dummy variable, how you can deal with categorical features when the number of levels is very large, and the pros and cons of various strategies.

Feature hashing: https://en.wikipedia.org/wiki/Feature_hashing

🟢 Get all my free data science interview resources: https://www.emmading.com/resources
🟡 Product Case Interview Cheatsheet: https://www.emmading.com/product-case-cheat-sheet
🟠 Statistics Interview Cheatsheet: https://www.emmading.com/statistics-interview-cheat-sheet
🟣 Behavioral Interview Cheatsheet: https://www.emmading.com/behavioral-interview-cheat-sheet
🔵 Data Science Resume Checklist: https://www.emmading.com/data-science-resume-checklist

✅ We work with experienced data scientists to help them land their next dream jobs. Apply now: https://www.emmading.com/coaching

// Comment
Got any questions? Something to add? Write a comment below to chat.

// Let's connect on LinkedIn: https://www.linkedin.com/in/emmading001/

====================
Contents of this video:
====================
00:00 Introduction
00:48 Categorical Data
02:22 Ordinal Features & Class Labels
03:38 One-Hot Encoding
05:32 Dummy Encoding
06:30 Problems of One-Hot & Dummy Encoding
07:26 Feature Hashing
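
The three encodings named in the video contents (one-hot, dummy, and feature hashing) can be illustrated with a small plain-Python sketch. This is not code from the video: the function names `one_hot`, `dummy`, and `hash_encode`, the sample `colors` list, and the bucket count are all assumptions chosen for illustration. The hashing version is also simplified, since it omits the signed hash that some implementations use to reduce collision bias.

```python
import hashlib

def one_hot(values):
    """One indicator column per level: k levels -> k columns."""
    levels = sorted(set(values))
    rows = [[1 if v == lvl else 0 for lvl in levels] for v in values]
    return levels, rows

def dummy(values):
    """Drop the first level as the reference: k levels -> k-1 columns."""
    levels, rows = one_hot(values)
    return levels[1:], [row[1:] for row in rows]

def hash_encode(values, n_buckets=8):
    """Hashing trick: map each level to one of n_buckets columns.

    The width stays fixed no matter how many levels appear, at the cost
    of possible collisions (two levels sharing a column).
    """
    rows = []
    for v in values:
        # md5 is used only for a stable hash across runs; Python's
        # built-in hash() is salted per process for strings.
        digest = hashlib.md5(v.encode("utf-8")).hexdigest()
        idx = int(digest, 16) % n_buckets
        row = [0] * n_buckets
        row[idx] = 1
        rows.append(row)
    return rows

# Hypothetical sample feature with three levels.
colors = ["red", "green", "blue", "green"]

oh_levels, oh_rows = one_hot(colors)   # 3 columns: blue, green, red
dm_levels, dm_rows = dummy(colors)     # 2 columns: green, red ("blue" is the reference)
hashed = hash_encode(colors, n_buckets=4)
```

With dummy encoding, the reference level ("blue" here) is represented by a row of all zeros, which avoids the redundant column that one-hot encoding carries. The hashed output always has `n_buckets` columns, which is why the technique is attractive when the number of levels is very large.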