Must Know Data Interview Question | HANDLING DUPLICATES | INTERVIEW TIPS | SQL | PART 4

Welcome to the series on data interview questions and answers! In this video, we cover the most commonly asked data questions to help next-gen data professionals prepare for data engineer, data analyst, and BIE positions. We explore technical and real-life, case-based questions covering SQL, DAX, Power BI, Python, ETL, data warehousing, machine learning, Microsoft Fabric, Azure and Snowflake cloud platforms, and big data frameworks (Databricks, Spark).

Want more videos like this? Hit like, comment, share, and subscribe ❤️
Like aim: 1000 likes!
➖➖➖➖➖➖➖➖➖➖➖➖➖
Please like & share the video.
➖➖➖➖➖➖➖➖➖➖➖➖➖
Sample data script

/* Sample Data Script */

-- Drop if exists
DROP TABLE IF EXISTS customers;

-- Create table
CREATE TABLE customers (
    customer_id INT IDENTITY(1,1) PRIMARY KEY,
    customer_name VARCHAR(100),
    email VARCHAR(100)
);

-- Insert sample data with multi-column duplicates
INSERT INTO customers (customer_name, email) VALUES
('Alice Smith', '[email protected]'),
('Bob Jones', '[email protected]'),
('Alice Smith', '[email protected]'), -- duplicate
('Charlie Ray', '[email protected]'),
('Bob Jones', '[email protected]'), -- duplicate
('Bob Jones', '[email protected]'); -- not a duplicate (different email)

➖➖➖➖➖➖➖➖➖➖➖➖➖
What we covered in this video:
IDENTIFYING & DELETING DUPLICATES IN DATA
SQL problem statement, solved using ROW_NUMBER, the WHERE clause, and a CTE (MEDIUM & ADVANCED SQL DATA INTERVIEW CONCEPTS)
We also looked at some tips and strategies for standing out in your next data interview!

Summary:
MOST ASKED AND MOST IMPORTANT MUST-KNOW INTERVIEW PROBLEM STATEMENT - How would you identify and delete duplicates from the data?

DUPLICATES QUESTIONS CATEGORY
TIP 1 - Use the ROW_NUMBER window function or GROUP BY
TIP 2 - Ask FOLLOW-UP questions to the interviewer about the data and the database they are working with, to identify the fields to use in the PARTITION BY clause
TIP 3 - STAND OUT!!
- Walk through the code and explain why you used ROW_NUMBER and not other window functions!

Comparison of window functions (for the duplicates use case)
ROW_NUMBER() -- Assigns a unique sequential number to each row in a partition ✅ Best for identifying & removing exact duplicates
RANK() -- Assigns the same rank to ties but skips numbers (1, 1, 3, ...) ❌ Not ideal: causes gaps and may leave multiple rows marked the same
DENSE_RANK() -- Like RANK, but doesn't skip values (1, 1, 2, ...) ❌ Also not suitable for precise de-duplication, since tied rows still share a number
➖➖➖➖➖➖➖➖➖➖➖➖➖
Hope this video was useful and you learned something new :) See you in the next video. Until then, bye!
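
For reference, here is a minimal sketch of the problem statement (identify and delete duplicates) against the customers table from the sample data script. It is one possible answer, not necessarily the exact code from the video. Assumptions: SQL Server (T-SQL), since the script uses IDENTITY; two rows count as duplicates when both customer_name and email match; the row with the lowest customer_id in each group is the one to keep. Confirm these with the interviewer, as TIP 2 suggests.

```sql
-- Assumption: SQL Server (T-SQL); duplicates = same customer_name AND same email.

-- Quick check with GROUP BY: which (customer_name, email) pairs occur more than once?
SELECT customer_name, email, COUNT(*) AS copies
FROM customers
GROUP BY customer_name, email
HAVING COUNT(*) > 1;

-- Identify the extra copies with ROW_NUMBER inside a CTE.
-- rn = 1 is the row to keep; rn > 1 marks the duplicates.
WITH numbered AS (
    SELECT
        customer_id,
        customer_name,
        email,
        ROW_NUMBER() OVER (
            PARTITION BY customer_name, email  -- the fields that define a duplicate
            ORDER BY customer_id               -- assumption: keep the earliest row
        ) AS rn
    FROM customers
)
SELECT customer_id, customer_name, email, rn
FROM numbered
WHERE rn > 1;

-- Delete the extra copies. In T-SQL, deleting through a single-table CTE
-- removes the matching rows from the underlying customers table.
WITH numbered AS (
    SELECT
        ROW_NUMBER() OVER (
            PARTITION BY customer_name, email
            ORDER BY customer_id
        ) AS rn
    FROM customers
)
DELETE FROM numbered
WHERE rn > 1;
```

GROUP BY only tells you which values are duplicated. ROW_NUMBER also tells you which individual rows to remove, which is why it pairs naturally with the WHERE clause in the delete step. Run the SELECT first and review the rows before running the DELETE.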
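
To back up the window function comparison in an interview, a small side-by-side query can help. The sketch below uses three hypothetical inline rows (the name and the example.com address are made up for illustration) that are exact copies of each other, and orders by a column that is identical across the copies so that every row ties.

```sql
-- Hypothetical inline data: three identical copies of one customer row.
SELECT
    customer_name,
    email,
    ROW_NUMBER() OVER (PARTITION BY customer_name, email ORDER BY email) AS row_num,
    RANK()       OVER (PARTITION BY customer_name, email ORDER BY email) AS rnk,
    DENSE_RANK() OVER (PARTITION BY customer_name, email ORDER BY email) AS dense_rnk
FROM (VALUES
    ('Bob Jones', 'bob@example.com'),
    ('Bob Jones', 'bob@example.com'),
    ('Bob Jones', 'bob@example.com')
) AS t (customer_name, email);
```

Here row_num comes out as 1, 2, 3, while rnk and dense_rnk are 1 for all three rows because they tie on the ORDER BY column. A filter such as "greater than 1" on RANK or DENSE_RANK would therefore match no rows and remove nothing, whereas the same filter on ROW_NUMBER leaves exactly one copy.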