Removing outliers in R with tools from dplyr and ggplot2 (CC232)

7,525 plays
If you know you have outliers in your dataset, how would you go about removing them in R? In this episode, Pat will show you how to identify outliers graphically using ggplot2 with geom_histogram and geom_line. If they are truly anomalous, we remove them using tidyverse functions like filter, mutate, and if_else from dplyr and drop_na from tidyr. We'll do all this in RStudio using local weather data from the NOAA website.

You can find my blog post for this episode at https://www.riffomonas.org/code_club/2022-07-21-fixing-anomalies.

#ggplot2 #dplyr #R #Rstudio #Rstats

Want more practice on the concepts covered in Code Club? You can sign up for my weekly newsletter at https://shop.riffomonas.org/youtube to get practice problems, tips, and insights.

If you're interested in taking an upcoming 3-day R workshop, be sure to check out our schedule at https://riffomonas.org/workshops/

You can also find complete tutorials for learning R with the tidyverse using...
Microbial ecology data: https://www.riffomonas.org/minimalR/
General data: https://www.riffomonas.org/generalR/

0:00 Introduction
2:40 Identifying problematic data with line plots
6:16 Identifying problematic data with histograms
7:25 Identifying problematic data with slice_max
8:52 Rinse, repeat
11:36 Removing anomalous data
15:50 How you would remove categorical data
18:49 Removing rows with NA values
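
The workflow described above might look roughly like the sketch below. This is not the code from the episode: the `weather` tibble, its column names (`date`, `tmax`, `prcp`), its values, and the cutoff of 50 are all made up for illustration, so adapt them to your own data.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical daily weather data. The column names and values are
# illustrative only, not the NOAA data used in the episode.
weather <- tibble(
  date = seq(as.Date("2022-01-01"), by = "day", length.out = 10),
  tmax = c(1.5, 3.0, 2.2, 150.0, 4.1, 0.8, -1.0, 2.7, 3.3, 1.9),
  prcp = c(0, 2.5, 0, 1.0, NA, 0, 12.0, 0, 0.5, 0)
)

# 1. Look for suspicious values graphically.
line_plot <- ggplot(weather, aes(x = date, y = tmax)) +
  geom_line()
hist_plot <- ggplot(weather, aes(x = tmax)) +
  geom_histogram(bins = 30)

# 2. Pull out the most extreme rows to inspect them directly.
extremes <- weather %>%
  slice_max(tmax, n = 3)

# 3. Replace anomalous values with NA (50 is an assumed cutoff),
#    then drop every row that contains an NA in any column.
cleaned <- weather %>%
  mutate(tmax = if_else(tmax > 50, NA_real_, tmax)) %>%
  drop_na()

# Alternative: remove the anomalous rows outright with filter().
filtered <- weather %>%
  filter(tmax <= 50)
```

Print `line_plot` or `hist_plot` to view the figures. Converting a bad value to `NA` with `mutate()` and `if_else()` keeps the rest of that row available until you decide to call `drop_na()`, whereas `filter()` discards the whole row immediately.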