Never miss a tutorial! Subscribe to the Project Data Science channel: https://bit.ly/3yTMdQV
Go from zero to hero with our Data Science Specialization: https://courses.projectdatascience.com/courses/becoming-a-data-science-practitioner
Or learn about all of data science in one blog post! https://www.projectdatascience.com/what-is-data-science/
pandas (yes, the name is officially lowercase!) is one of the most popular and useful Python libraries for data science. If you're doing data science or data analysis work in Python, you're almost certainly using pandas.
This pandas mega-tutorial covers most of the pandas functionality you'll need for day-to-day work.
In this tutorial, you're going to learn pandas by using a real dataset to answer questions like:
1. What is the happiest country in the world?
2. Which region has the happiest countries?
3. Which countries have made the most improvement over time?
Here's some of the pandas functionality you'll learn:
- Loading the data
- Inspecting the data
- Calculating descriptive statistics (mean, min, max, etc.)
- Looking at the data types
- Using indexing, masking, and filtering to see only the data you want to see
- Sorting and argmax
- Histograms and plotting
- Creating new columns from existing ones
- Combining multiple datasets
- Cleaning mismatched/dirty data
- Checking for nulls
- Using SQL-like joins to bring in other data and enrich your dataset
- Writing data out to a CSV file
- Using value_counts to count rows per unique value
- Using group by operations (like SQL "GROUP BY")
- Using pivot tables in pandas like in Excel
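To give a feel for the first few items on this list, here is a minimal sketch. It is not the code from the video: the country names, column names, and scores are made-up toy values, and the CSV is read from an in-memory string so the snippet runs without downloading anything.

```python
import io

import pandas as pd

# Toy stand-in data: made-up values, NOT the dataset used in the video.
csv_text = """country,region,year,score
Aland,North,2020,7.5
Borduria,South,2020,5.0
Cascadia,North,2020,6.5
Dorne,South,2020,4.0
"""

# Loading the data (pd.read_csv also accepts a file path or URL).
df = pd.read_csv(io.StringIO(csv_text))

# Inspecting the data: size, column types, descriptive statistics.
n_rows, n_cols = df.shape
print(df.dtypes)
print(df.describe())

# Descriptive statistics for a single column.
mean_score = df["score"].mean()

# Masking / filtering: keep only the rows where the condition is True.
high = df[df["score"] > 6.0]

# Sorting, and finding the row with the largest value.
ranked = df.sort_values("score", ascending=False)
top_country = df.loc[df["score"].idxmax(), "country"]

# Creating a new column from an existing one.
df["score_pct"] = df["score"] * 10
```

With real data you would pass a file path to `pd.read_csv` instead of the in-memory string; the rest of the calls stay the same.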
You learn best by doing, so grab your computer and get ready to learn pandas!
Happy learning!
---
00:00 Introduction
02:40 Download the data
06:47 Create a project directory and conda virtual environment
15:06 Launch Jupyter notebooks and set up our notebook
19:03 How do you load data using pandas?
25:08 Inspect our data: df.shape, df.dtypes, df.describe
35:48 What exactly is a pandas DataFrame?
44:23 How do you calculate the mean, min and max of a column in pandas?
46:30 How do you filter a pandas DataFrame?
01:04:25 How do you sort a pandas DataFrame?
01:10:50 How do you plot a histogram in pandas?
01:16:25 How do you make a scatterplot in pandas?
01:19:55 How do you multiply a column by a number?
01:23:39 How do you combine DataFrames in pandas?
02:02:07 How do you join two DataFrames in pandas?
02:17:24 How do you check for null / NaN values in a pandas DataFrame?
02:21:20 How do you write a pandas DataFrame to CSV?
02:24:16 How many times does each unique value show up in a column?
02:32:00 How do you do a group by in pandas?
02:40:27 Answering a complex question in pandas
02:52:54 How do you use unstack in pandas?
03:01:30 How do you do a pivot table in pandas?
03:09:24 Future learning, and thank you! Happy learning!
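The later chapters (combining, joining, null checks, value counts, group by, unstack, pivot tables, writing CSV) can be sketched roughly as follows. This is an illustration under stated assumptions, not the notebook from the video: the DataFrames, column names, and scores are invented toy values.

```python
import pandas as pd

# Toy per-year tables: made-up values, NOT the dataset used in the video.
df_2019 = pd.DataFrame(
    {"country": ["Aland", "Borduria", "Cascadia"], "score": [7.0, 5.0, 6.0]}
)
df_2019["year"] = 2019
df_2020 = pd.DataFrame(
    {"country": ["Aland", "Borduria", "Cascadia"], "score": [7.5, 4.5, 6.25]}
)
df_2020["year"] = 2020

# Combining DataFrames: stack the yearly tables on top of each other.
scores = pd.concat([df_2019, df_2020], ignore_index=True)

# A hypothetical lookup table that is deliberately missing one country.
regions = pd.DataFrame(
    {"country": ["Aland", "Borduria"], "region": ["North", "South"]}
)

# SQL-like left join: keep every score row, add region where it matches.
merged = scores.merge(regions, on="country", how="left")

# Checking for nulls: unmatched rows get NaN in the joined column.
null_count = merged["region"].isnull().sum()

# Counting rows per unique value (NaN is excluded by default).
rows_per_region = merged["region"].value_counts()

# Group by, like SQL "GROUP BY region" with AVG(score).
mean_by_region = merged.groupby("region")["score"].mean()

# Unstack: move the "year" index level into columns.
by_year = merged.groupby(["country", "year"])["score"].mean().unstack("year")

# Pivot table: the same reshaping in a single call.
pivot = merged.pivot_table(
    index="country", columns="year", values="score", aggfunc="mean"
)

# A more complex question: which country improved the most?
improvement = (by_year[2020] - by_year[2019]).sort_values(ascending=False)

# Writing out CSV: with no path, to_csv returns the CSV text as a string.
csv_text = merged.to_csv(index=False)
```

Passing a file name such as `merged.to_csv("combined.csv", index=False)` (a hypothetical name) writes the same text to disk instead of returning it.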
---
Additional Project Data Science Resources:
- https://projectdatascience.com/ (The official site!)
- https://www.youtube.com/watch?v=rdaG53khzv0&list=PLMAyPTgGwv2DUV6DZib9eMetsTTX87JNr (Beginner's introduction to machine learning with Python.)
- https://youtu.be/GNKt8TAIAVc (Beginner's introduction to neural networks in Python.)
---
Additional Resources:
- https://pandas.pydata.org/ (The official pandas site.)
- https://pandas.pydata.org/pandas-docs/stable/getting_started/10min.html (A great pandas reference from the official pandas creators.)
- https://docs.conda.io/en/latest/miniconda.html (Miniconda, my preferred way to install conda.)
- https://medium.com/dunder-data/minimally-sufficient-pandas-a8e67f2a2428 (If you find yourself struggling to know which method in pandas to use out of several possible options, this is a good reference.)