Data cleaning is one of the most undervalued steps in a data analysis. In this episode we'll use a variety of functions from the tidyverse to get three data frames into the right format and then join them all together. This will get us ready for downstream analyses looking for microbiome-based biomarkers associated with colorectal cancer.
In this episode, Pat will use the #tidyverse in #RStudio. The accompanying blog post can be found at https://www.riffomonas.org/code_club/2021-06-30-data-cleaning.
If you're interested in taking an upcoming 3-day R workshop, email me at [email protected]!
R: https://r-project.org
RStudio: https://rstudio.com
Raw data: https://github.com/riffomonas/raw_data/releases/latest
Workshops: https://www.mothur.org/wiki/workshops
You can also find complete tutorials for learning R with the tidyverse using:
Microbial ecology data: https://www.riffomonas.org/minimalR/
General data: https://www.riffomonas.org/generalR/
0:00 Introduction
2:29 Tidying a mothur shared file
6:21 Formatting a taxonomy file
15:14 Calculating genus relative abundances
17:39 Formatting metadata and joining to relative abundances
23:31 Committing changes
24:26 Recap