class: left, middle, inverse, title-slide

# Practice Solutions to Intro to R and RStudio for EDA - Part 1

### Jessica Minnier, PhD & Meike Niederhausen, PhD

### OCTRI Biostatistics, Epidemiology, Research & Design (BERD) Workshop

2020/09/16

---
layout: true

<!-- <div class="my-footer"><span>bit.ly/berd_tidy</span></div> -->

---

# Practice 1 (pg. 1)

1. Create a new Rmd file in which to type the code and answers for the tasks below.
1. Remove the template text starting with line 12 (keep the YAML header and setup code chunk), and save the file as `Practice1.Rmd`.
1. Create a new code chunk.
1. Create a vector of all integers from 4 to 10, and save it as `a1`.
1. What does the command `sum(a1)` do?
1. What does the command `length(a1)` do?
1. Use the `sum` and `length` commands to calculate the average of the values in `a1`.
1. Knit the Rmd file.

---

# Answers to Practice 1 questions

__#4__ Create a vector of all integers from 4 to 10, and save it as `a1`.

```r
a1 <- 4:10
```

__#5__ What does the command `sum(a1)` do?

```r
sum(a1)
```

```
[1] 49
```

`sum` adds up the values in the vector.

---

__#6__ What does the command `length(a1)` do?

```r
length(a1)
```

```
[1] 7
```

`length` returns the number of values in the vector.

__#7__ Use the `sum` and `length` commands to calculate the average of the values in `a1`.

```r
sum(a1) / length(a1)
```

```
[1] 7
```

```r
# this is equivalent
mean(a1)
```

```
[1] 7
```

---

# Practice 1 (pg. 2)

* Run the code below to install the `tidyverse` and `janitor` packages in R, which we will be using in upcoming slides.
    + If you get a message about restarting R, click Yes.
    + If you get an error message (warnings are ok), ask a helper.

```r
# install.packages("tidyverse")
# install.packages("janitor")
```

* After running the code, comment it out by putting `#` in front of the commands so that they do not run when knitting the file.
    + *We only need to install packages once* and thus do not need to run this code again.

* __Take a break!__

---

# Practice 2

Create a new Rmd for Practice 2 or continue in your current Rmd.

1. Find the median bill length. Is the median bill length similar to the mean?
1. What is the distance between the smallest and largest bill *depths*?
1. What does the `range()` command do? Try it out on the bill depths.
1. Make a scatterplot with bill length on the x-axis and bill depth on the y-axis. What is the relationship between bill length and depth?
1. Knit your Rmd file.
1. If you have time,
    * install the package `skimr`
    * load the package
    * run the command `skim(penguins)`
    * what does the `skim` command do?

---

# Practice 2 Answers

__#1__ Find the median bill length. Is the median bill length similar to the mean?

```r
median(penguins$bill_length_mm, na.rm = TRUE)
```

```
[1] 44.7
```

```r
mean(penguins$bill_length_mm, na.rm = TRUE)
```

```
[1] 44.00387
```

The mean and median bill lengths are similar to each other.

---

__#2__ What is the distance between the smallest and largest bill *depths*?

```r
max(penguins$bill_depth_mm) - min(penguins$bill_depth_mm)
```

```
[1] 8.4
```

The distance between the smallest and largest bill depths is 8.4 mm.

*Note that we do not need to use `na.rm = TRUE` for bill depths since there are no missing values.*

__#3__ What does the `range()` command do? Try it out on the bill depths.

```r
range(penguins$bill_depth_mm)
```

```
[1] 13.1 21.5
```

The `range()` command gives the minimum and maximum values.

---

__#4__ Make a scatterplot with bill length on the x-axis and bill depth on the y-axis. What is the relationship between bill length and depth?

```r
plot(penguins$bill_length_mm, penguins$bill_depth_mm,
     xlab = "Length", ylab = "Depth",
     main = "Bill depth vs. length")
```

<img src="01_intro_r_eda_Practice_Answers_part1_files/figure-html/unnamed-chunk-11-1.png" style="display: block; margin: auto;" />

---

__#6__ If you have time,

* install the package `skimr`
* load the package
* run the command `skim(penguins)`
* what does the `skim` command do?

.pull-left-40[
```r
# install.packages("skimr")
library(skimr)
skim(penguins)
```

The `skim()` command gives summaries of each of the variables in the dataset.
]

.pull-right-60[
<center>
<img src="img/skim_penguins.png" width="100%" height="100%">
</center>
]
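
---

The bill depth calculations in Practice 2 work without `na.rm = TRUE` only because that variable has no missing values. As a minimal sketch of what happens otherwise, the chunk below uses a small made-up vector `depths` (hypothetical values, not taken from the `penguins` data) that contains one `NA`. It also shows `diff(range())` as one possible base R alternative to `max() - min()`.

```r
# Hypothetical bill depths (mm) with one missing value; not the real penguins data
depths <- c(13.1, 17.3, NA, 21.5)

# Without na.rm = TRUE, a single NA makes the result NA
spread_with_na <- max(depths) - min(depths)
spread_with_na

# Dropping the NA first gives the distance between the smallest and largest value
spread <- max(depths, na.rm = TRUE) - min(depths, na.rm = TRUE)
spread

# range() returns the minimum and maximum, so diff() of it gives the same distance
spread_alt <- diff(range(depths, na.rm = TRUE))
spread_alt
```

If a summary command unexpectedly returns `NA`, checking for missing values with `sum(is.na(depths))` is a reasonable first step.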