Data wrangling is too often the most time-consuming part of data science and applied statistics. Two tidyverse packages, tidyr and dplyr, make data manipulation tasks easier. These videos introduce both tools and show how they keep your R code clean and clear while reducing the cognitive load of common but often complex data science tasks.
Pt. 1: What is data wrangling? Intro, Motivation, Outline, Setup
https://youtu.be/jOd65mR1zfw
- 01:44 Intro and what’s covered
Ground Rules
- 02:40 What’s a tibble
- 04:50 Use `View`
- 05:25 The pipe operator
- 07:20 What do I mean by data wrangling?
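As a rough illustration of the tibble and pipe topics listed above, here is a minimal sketch. It assumes the pipe discussed is magrittr's `%>%` (which dplyr re-exports); the `scores` data and its column names are made up for this example and do not come from the videos.

```r
library(tibble)
library(dplyr)  # makes the %>% pipe available

# Hypothetical toy data, invented for illustration only
scores <- tibble(
  student = c("a", "b", "c"),
  score   = c(90, 72, 85)
)

# Nested calls: the steps must be read from the inside out
nested <- head(arrange(scores, desc(score)), 2)

# The same steps with the pipe: read top to bottom
piped <- scores %>%
  arrange(desc(score)) %>%
  head(2)

# Both forms should produce the same two-row tibble
print(piped)
```

In an interactive RStudio session, `View(scores)` opens the same data in a spreadsheet-style viewer.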
Pt. 2: Tidy Data and tidyr
https://youtu.be/1ELALQlO-yM
- 00:48 Goal 1: Making your data suitable for R
- 01:40 `tidyr`: “Tidy” data introduced and motivated
- 08:15 `tidyr::gather`
- 12:38 `tidyr::spread`
- 15:30 `tidyr::unite`
- 15:30 `tidyr::separate`
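The four `tidyr` functions listed above can be sketched on a small table. This is only an illustrative example under stated assumptions: the `wide` table and its values are hypothetical and are not the data used in the video.

```r
library(tibble)
library(tidyr)

# Hypothetical wide table: one column per year (made-up values)
wide <- tibble(
  country = c("A", "B"),
  `2019`  = c(10, 20),
  `2020`  = c(11, 21)
)

# gather(): wide -> long, giving one row per country-year pair
long <- gather(wide, key = "year", value = "cases", `2019`, `2020`)

# spread(): long -> wide, undoing the reshaping above
wide_again <- spread(long, key = year, value = cases)

# unite(): paste two columns into one, joined by "_"
united <- unite(long, col = "country_year", country, year, sep = "_")

# separate(): split the combined column back into its two parts
separated <- separate(united, col = country_year,
                      into = c("country", "year"), sep = "_")
```

Note that in newer tidyr releases `gather` and `spread` are superseded by `pivot_longer` and `pivot_wider`; the older functions still work and match what the video covers.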
Pt. 3: Data manipulation tools: `dplyr`
https://youtu.be/Zc_ufg4uW4U
- 00:40 Setup
- 02:00 `dplyr::select`
- 03:40 `dplyr::filter`
- 05:05 `dplyr::mutate`
- 07:05 `dplyr::summarise`
- 08:30 `dplyr::arrange`
- 09:55 Combining these tools with the pipe (Setup for the Grammar of Data Manipulation)
- 11:45 `dplyr::group_by`
- 15:00 `dplyr::group_by`
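To show how the verbs listed above chain together with the pipe, here is a minimal sketch. The `sales` table, its columns, and the filter threshold are all hypothetical and chosen only for illustration; the video may use different data and steps.

```r
library(tibble)
library(dplyr)

# Hypothetical toy data, invented for illustration only
sales <- tibble(
  region = c("north", "north", "south", "south"),
  units  = c(3, 5, 2, 8),
  price  = c(10, 10, 12, 12)
)

result <- sales %>%
  select(region, units, price) %>%         # keep only the columns we need
  filter(units > 2) %>%                    # keep rows with more than 2 units
  mutate(revenue = units * price) %>%      # add a derived column
  group_by(region) %>%                     # summarise per region
  summarise(total_revenue = sum(revenue)) %>%
  arrange(desc(total_revenue))             # largest total first

print(result)
```

Each verb takes a data frame as its first argument and returns a data frame, which is what lets the steps be piped one after another.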
Pt. 4: Working with Two Datasets: Binds, Set Operations, and Joins
https://youtu.be/AuBgYDCg1Cg
Combining two datasets together
- 00:42 `dplyr::bind_cols`
- 01:27 `dplyr::bind_rows`
- 01:42 Set operations
`dplyr::union`, `dplyr::intersect`, `dplyr::setdiff`
- 02:15 Joining data
`dplyr::left_join`, `dplyr::inner_join`, `dplyr::right_join`, `dplyr::full_join`
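The binds, set operations, and joins listed above can be compared side by side on two small tables. This is a sketch under stated assumptions: the tables `a`, `b`, `first`, and `second` are hypothetical and exist only to make the row counts easy to follow.

```r
library(tibble)
library(dplyr)

# Hypothetical tables that share an `id` key (ids 2 and 3 overlap)
a <- tibble(id = c(1, 2, 3), x = c("a", "b", "c"))
b <- tibble(id = c(2, 3, 4), y = c("B", "C", "D"))

# Binds: stack rows, or glue columns side by side (no key matching)
stacked <- bind_rows(a, a)
side    <- bind_cols(a, tibble(z = c(TRUE, FALSE, TRUE)))

# Set operations need tables with the same columns
first  <- tibble(id = c(1, 2, 3))
second <- tibble(id = c(2, 3, 4))
in_either  <- dplyr::union(first, second)      # rows in either table
in_both    <- dplyr::intersect(first, second)  # rows in both tables
only_first <- dplyr::setdiff(first, second)    # rows only in `first`

# Joins match rows on the key column
left  <- left_join(a, b, by = "id")   # all rows of a
inner <- inner_join(a, b, by = "id")  # only ids present in both
right <- right_join(a, b, by = "id")  # all rows of b
full  <- full_join(a, b, by = "id")   # all ids from either table
```

Unmatched rows in `left`, `right`, and `full` get `NA` in the columns that come from the other table.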
______________________________________________________________
Cheatsheets: https://www.rstudio.com/resources/cheatsheets/
Documentation:
`tidyr` docs: https://tidyr.tidyverse.org/reference/
- `tidyr` vignette: https://cran.r-project.org/web/packages/tidyr/vignettes/tidy-data.html
`dplyr` docs: http://dplyr.tidyverse.org/reference/
- `dplyr` one-table vignette: https://cran.r-project.org/web/packages/dplyr/vignettes/dplyr.html
- `dplyr` two-table (join operations) vignette: https://cran.r-project.org/web/packages/dplyr/vignettes/two-table.html
______________________________________________________________
New York Times, “For Big-Data Scientists, ‘Janitor Work’ Is Key Hurdle to Insights”, by Steve Lohr, Aug. 17, 2014: https://www.nytimes.com/2014/08/18/technology/for-big-data-scientists-hurdle-to-insights-is-janitor-work.html
______________________________________________________________