Welcome to our series on Data Interview questions and answers! In this video, we cover the most commonly asked data questions to help next-gen data professionals prepare for data engineer, data analyst, and BIE positions. We explore technical and real-life case-based questions covering SQL, DAX, Power BI, Python, ETL, data warehousing, machine learning, Microsoft Fabric, the Azure and Snowflake cloud platforms, and Big Data frameworks (Databricks, Spark).
Want more videos like this? Hit like, comment, share, and subscribe!
➖➖➖➖➖➖➖➖➖➖➖➖➖
Sample Data script
/*Sample Data Script */
-- Drop if exists
DROP TABLE IF EXISTS customers;
-- Create table
CREATE TABLE customers (
    customer_id INT IDENTITY(1,1) PRIMARY KEY,
    customer_name VARCHAR(100),
    email VARCHAR(100)
);
-- Insert sample data with multi-column duplicates
INSERT INTO customers (customer_name, email) VALUES
    ('Alice Smith', '[email protected]'),
    ('Bob Jones', '[email protected]'),
    ('Alice Smith', '[email protected]'), -- duplicate
    ('Charlie Ray', '[email protected]'),
    ('Bob Jones', '[email protected]'), -- duplicate
    ('Bob Jones', '[email protected]'); -- not a duplicate (different email)
➖➖➖➖➖➖➖➖➖➖➖➖➖
What we covered in this video:
IDENTIFYING & DELETING DUPLICATES IN DATA - SQL Problem Statement - Solved using ROW_NUMBER, WHERE CLAUSE, CTE
(MEDIUM & ADVANCED SQL DATA INTERVIEW CONCEPTS)
We also looked at some tips and strategies for standing out in your next data interview!
Summary:
MOST ASKED AND MOST IMPORTANT MUST-KNOW INTERVIEW PROBLEM STATEMENT -
How would you identify and delete duplicates from the data?
DUPLICATES QUESTIONS CATEGORY
TIP 1 - Use the ROW_NUMBER window function or GROUP BY
TIP 2 - Ask FOLLOW-UP questions to the interviewer
about the data & database they are working with,
to identify the fields to use in the PARTITION BY clause
TIP 3 - STAND OUT!! - Walk through the code
and explain why you used ROW_NUMBER & not other window functions!
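Here is a minimal sketch of Tip 1. It assumes SQL Server (the sample script uses IDENTITY) and that a "duplicate" means the same customer_name and email. The temp table #customers_demo and its example.com addresses are hypothetical stand-ins, not the data from the video. Confirm the duplicate definition with the interviewer first (Tip 2).

```sql
-- Hypothetical demo table (assumption: SQL Server / T-SQL, 2016 or later)
DROP TABLE IF EXISTS #customers_demo;
CREATE TABLE #customers_demo (
    customer_id   INT IDENTITY(1,1) PRIMARY KEY,
    customer_name VARCHAR(100),
    email         VARCHAR(100)
);

-- Made-up rows that mirror the shape of the sample data
INSERT INTO #customers_demo (customer_name, email) VALUES
    ('Alice Smith', 'alice@example.com'),
    ('Bob Jones',   'bob@example.com'),
    ('Alice Smith', 'alice@example.com'),      -- duplicate
    ('Charlie Ray', 'charlie@example.com'),
    ('Bob Jones',   'bob@example.com'),        -- duplicate
    ('Bob Jones',   'bob.jones@example.com');  -- different email, not a duplicate

-- Option A: identify duplicate groups with GROUP BY / HAVING
SELECT customer_name, email, COUNT(*) AS copies
FROM #customers_demo
GROUP BY customer_name, email
HAVING COUNT(*) > 1;

-- Option B: number the rows inside each (customer_name, email) group.
-- rn = 1 is the row to keep (lowest customer_id); rn > 1 are the duplicates.
WITH ranked AS (
    SELECT customer_id,
           ROW_NUMBER() OVER (
               PARTITION BY customer_name, email
               ORDER BY customer_id
           ) AS rn
    FROM #customers_demo
)
DELETE FROM ranked
WHERE rn > 1;

-- Check what is left
SELECT customer_id, customer_name, email
FROM #customers_demo
ORDER BY customer_id;
```

With this demo data, four rows remain, one per distinct (customer_name, email) pair. Deleting through the CTE is a SQL Server feature. Other databases usually need a DELETE ... WHERE customer_id IN (...) form instead, so it is worth asking which database is in use.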
Comparison of Window Functions (for the duplicates use case)
ROW_NUMBER() -- Assigns a unique sequential number to each row in a partition
✅ Best for identifying & removing exact duplicates: exactly one row per partition gets 1
RANK() -- Assigns the same rank to ties and skips numbers (1, 1, 3, ...)
❌ Not ideal: leaves gaps, and tied rows get the same rank, so several duplicates can all be marked 1
DENSE_RANK() -- Like RANK, but doesn't skip values (1, 1, 2, ...)
❌ Also not suitable for precise de-duplication: ties still share a rank
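To see the difference for Tip 3, the sketch below runs all three functions side by side over exact duplicates. It assumes SQL Server syntax, and the three rows are hypothetical. The ORDER BY deliberately uses a column that ties (email), because ties are what separate the three functions.

```sql
-- Hypothetical rows: three exact copies of the same customer
WITH demo AS (
    SELECT *
    FROM (VALUES
        (1, 'Alice Smith', 'alice@example.com'),
        (2, 'Alice Smith', 'alice@example.com'),
        (3, 'Alice Smith', 'alice@example.com')
    ) AS v (customer_id, customer_name, email)
)
SELECT customer_id,
       -- always unique within the partition, even when the ORDER BY values tie
       ROW_NUMBER() OVER (PARTITION BY customer_name, email ORDER BY email) AS row_num,
       -- tied rows share a rank
       RANK()       OVER (PARTITION BY customer_name, email ORDER BY email) AS rnk,
       DENSE_RANK() OVER (PARTITION BY customer_name, email ORDER BY email) AS dense_rnk
FROM demo;
```

Here rnk and dense_rnk are 1 on all three rows, so a filter like rnk > 1 would delete nothing, while row_num runs 1, 2, 3. Which copy gets row_num = 1 is not deterministic when the ORDER BY values tie. Ordering by a unique column such as customer_id makes the kept row predictable, and in that case all three functions return the same numbers.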
➖➖➖➖➖➖➖➖➖➖➖➖➖
Hope this video was useful and you learned something new :) See you in the next video. Until then, bye!