This is the first #tutorial video in a data science project mini-series, starting with understanding the data through exploratory data analysis using the essential #python libraries: pandas, scikit-learn and seaborn.
I am walking you through a unique #machinelearning project, using REAL data collected with smart meter devices installed on buildings in Portugal.
Summary of the video:
00:00 - Hello!
01:20 - Load a .csv file as a pandas data frame
02:55 - Cast a string column to a datetime type in a pandas data frame
04:35 - Print summary statistics in Python
05:16 - Generate full date range every 15 minutes
09:20 - Plot a histogram from a summary statistics table in Python
12:14 - Count duplicate rows in a pandas data frame
13:20 - Missing Values Analysis
21:20 - Identify downtime of smart meters
24:57 - Identify correlated time series
27:10 - Create heatmap from correlation matrix with seaborn
29:40 - Keep only the upper triangle of the correlation matrix by masking the lower triangle with np.tril_indices
33:22 - Extract values from correlation matrix by condition in python
34:36 - Plot two time series on a seaborn lineplot
35:06 - Scaling values in a pandas data frame
37:26 - Create date-related columns from a datetime column in pandas
43:05 - Subset a pandas data frame by column names
44:36 - Reshape columns into rows in a pandas data frame and aggregate values
49:53 - Discover monthly and weekly periodicity
58:02 - Conclusions and see you next time!
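If you want a quick taste before watching, here is a minimal sketch of three of the steps listed above (datetime casting, the full 15-minute date range, and the upper-triangle correlation trick). It runs on made-up toy data, not the real smart meter dataset: the column names `timestamp`, `meter_a` and `meter_b` and the 0.9 correlation threshold are assumptions for illustration only, and the code in the video and repo may differ.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two meters sampled every 15 minutes, one reading missing.
raw = pd.DataFrame(
    {
        "timestamp": [
            "2023-01-01 00:00",
            "2023-01-01 00:15",
            "2023-01-01 00:45",  # the 00:30 reading is absent
            "2023-01-01 01:00",
        ],
        "meter_a": [1.0, 2.0, 4.0, 5.0],
        "meter_b": [2.1, 3.9, 8.2, 9.8],
    }
)

# Cast the string column to datetime and use it as the index.
raw["timestamp"] = pd.to_datetime(raw["timestamp"])
readings = raw.set_index("timestamp")

# Build the full 15-minute range and reindex so gaps become NaN rows.
full_range = pd.date_range(readings.index.min(), readings.index.max(), freq="15min")
complete = readings.reindex(full_range)
missing_per_meter = complete.isna().sum()

# Correlation matrix; blank the lower triangle and the diagonal
# so that every pair of meters appears exactly once.
corr = complete.corr()
values = corr.to_numpy(copy=True)
values[np.tril_indices(len(corr))] = np.nan
upper = pd.DataFrame(values, index=corr.index, columns=corr.columns)

# Extract the pairs whose absolute correlation exceeds the assumed threshold.
pairs = upper.stack().dropna()
strong_pairs = pairs[pairs.abs() > 0.9]

print(missing_per_meter)
print(strong_pairs)
```

Reindexing against the full date range is what makes missing timestamps visible: rows that were simply absent from the file turn into explicit NaN rows that can be counted per meter.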
Datasets: https://data.mendeley.com/datasets/vryvyfz2tj/1
Article: https://www.sciencedirect.com/science/article/pii/S2352340924003421
GIVE the GIT REPO a ⭐ to let me know it's worth sharing my code.
⭐ Git Repo: https://github.com/giraffa-analytics/energy_consumption_yt/tree/master