If you know you have outliers in your dataset, how would you go about removing them in R? In this episode, Pat will show you how to identify outliers graphically with ggplot2, using geom_line and geom_histogram, and how to find extreme values with dplyr's slice_max. If the values are truly anomalous, we remove them using dplyr functions like filter, mutate, and if_else, along with drop_na from tidyr. We'll do all of this in RStudio using local weather data from the NOAA website.
You can find my blog post for this episode at https://www.riffomonas.org/code_club/2022-07-21-fixing-anomalies.
#ggplot2 #dplyr #R #Rstudio #Rstats
Want more practice on the concepts covered in Code Club? You can sign up for my weekly newsletter at https://shop.riffomonas.org/youtube to get practice problems, tips, and insights.
If you're interested in taking an upcoming 3-day R workshop, be sure to check out our schedule at https://riffomonas.org/workshops/
You can also find complete tutorials for learning R with the tidyverse using...
Microbial ecology data: https://www.riffomonas.org/minimalR/
General data: https://www.riffomonas.org/generalR/
0:00 Introduction
2:40 Identifying problematic data with line plots
6:16 Identifying problematic data with histograms
7:25 Identifying problematic data with slice_max
8:52 Rinse, repeat
11:36 Removing anomalous data
15:50 How to remove categorical data
18:49 Removing rows with NA values