Practice Solutions to Getting Started with R and RStudio

class: left, middle, inverse, title-slide

# Practice Solutions to <br> Getting Started with R and RStudio
### Jessica Minnier, PhD & Meike Niederhausen, PhD
### OCTRI Biostatistics, Epidemiology, Research & Design (BERD) Workshop <br><br> 2019/09/24

---

layout: true

---

# Practice questions 1

1. Open a new R script and type your code and answers for the following tasks in it. Save it as `Practice1.R`.

1. Create a vector of all integers from 4 to 10, and save it as `a1`.

1. Create a vector of _even_ integers from 4 to 10, and save it as `a2`.

1. What is the sum of `a1` and `a2`?

1. What does the command `sum(a1)` do?

1. What does the command `length(a1)` do?

1. Use the commands `sum` and `length` to calculate the average of the values in `a1`.

1. The formula for the sum of the first `$n$` integers is `$n(n+1)/2$`. Compute the sum of all integers from 1 to 100 to verify that this formula holds for `$n=100$`.

1. Compute the sum of the squares of all integers from 1 to 100.

1. Take a break!

---

# Answers to practice questions 1

__#2__ Create a vector of all integers from 4 to 10, and save it as `a1`.

__#3__ Create a vector of _even_ integers from 4 to 10, and save it as `a2`.

```r
a1 <- 4:10
a2 <- c(4, 6, 8, 10)
# the following works as well:
a2 <- 2*(2:5)
# or
a2 <- seq(4, 10, by=2)
```

---

__#4__ What is the sum of `a1` and `a2`?

```r
a1+a2
```

```
Warning in a1 + a2: longer object length is not a multiple of shorter
object length
```

```
[1]  8 11 14 17 12 15 18
```

Note that instead of giving an error, R gives a warning and repeats (recycles) the terms of `a2` as needed, since `a2` is shorter than `a1`.
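
The recycling rule is easier to see with two short vectors. The sketch below uses made-up vectors `x` and `y` (they are not part of the workshop material) and then spells out what R does for `a1 + a2`.

```r
# Hypothetical vectors chosen for illustration
x <- 1:6          # length 6
y <- c(10, 20)    # length 2

# 6 is a multiple of 2, so R recycles y silently (no warning)
x + y

# Writing the recycling out by hand gives the same result
x + rep(y, times = 3)

# For a1 + a2, the shorter vector a2 (length 4) is extended to length 7.
# 7 is not a multiple of 4, which is what triggers the warning.
a1 <- 4:10
a2 <- seq(4, 10, by = 2)
a1 + rep(a2, length.out = length(a1))
```

The last line reproduces the values of `a1 + a2` without the warning, because the recycling is done explicitly.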

---

__#5__ What does the command `sum(a1)` do?

```r
sum(a1)
```

```
[1] 49
```

`sum` adds up the values in the vector

<br>

__#6__ What does the command `length(a1)` do?

```r
length(a1)
```

```
[1] 7
```

`length` returns the number of values in the vector

---

__#7__ Use the commands `sum` and `length` to calculate the average of the values in `a1`.

```r
sum(a1) / length(a1)
```

```
[1] 7
```

```r
# this is equivalent
mean(a1)
```

```
[1] 7
```

---

__#8__ The formula for the sum of the first `$n$` integers is `$n(n+1)/2$`. Compute the sum of all integers from 1 to 100 to verify that this formula holds for `$n=100$`.

```r
sum(1:100)
```

```
[1] 5050
```

```r
# verify formula for n=100:
n <- 100
n * (n+1) / 2
```

```
[1] 5050
```

---

__#9__ Compute the sum of the squares of all integers from 1 to 100.

```r
# The following code creates a vector of the squares of all integers from 1 to 100
(1:100)^2
```

```
  [1]     1     4     9    16    25    36    49    64    81   100   121
 [12]   144   169   196   225   256   289   324   361   400   441   484
 [23]   529   576   625   676   729   784   841   900   961  1024  1089
 [34]  1156  1225  1296  1369  1444  1521  1600  1681  1764  1849  1936
 [45]  2025  2116  2209  2304  2401  2500  2601  2704  2809  2916  3025
 [56]  3136  3249  3364  3481  3600  3721  3844  3969  4096  4225  4356
 [67]  4489  4624  4761  4900  5041  5184  5329  5476  5625  5776  5929
 [78]  6084  6241  6400  6561  6724  6889  7056  7225  7396  7569  7744
 [89]  7921  8100  8281  8464  8649  8836  9025  9216  9409  9604  9801
[100] 10000
```

```r
# Now add the squares:
sum((1:100)^2)
```

```
[1] 338350
```
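
As an optional check, the sum of squares also has a well-known closed form, `$n(n+1)(2n+1)/6$`. This formula is not part of the original exercise. The sketch below compares it with the direct computation for `$n=100$`.

```r
n <- 100

# Direct computation, as in the answer above
direct <- sum((1:n)^2)

# Closed form n(n+1)(2n+1)/6 (an addition, not from the exercise)
closed <- n * (n + 1) * (2 * n + 1) / 6

direct == closed
```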

---

# Practice 2

1. Create a new script and save it as `Practice2.R`

1. Create data frames for males and females separately.

1. Do males and females have similar BMIs? Weights? Compare means, standard deviations, ranges, and boxplots.

1. Plot BMI vs. weight for each gender separately. Do they have similar relationships?

1. Are males or females more likely to be bullied in the past 12 months? Calculate the percentage bullied for each gender.

---

# Practice 2 Answers

__#2__ Create data frames for males and females separately.

```r
boys <- mydata[mydata$sex == "Male", ]
dim(boys)
```

```
[1]  8 11
```

```r
girls <- mydata[mydata$sex == "Female", ]
dim(girls)
```

```
[1] 12 11
```

Check number of boys & girls:

```r
summary(mydata$sex)
```

```
Female   Male 
    12      8 
```

---

__#3__ Do males and females have similar BMIs? Weights? Compare means, standard deviations, ranges, and boxplots.

.pull-left-60[

```r
summary(boys$bmi); sd(boys$bmi)
```

```
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  18.18   19.57   20.90   20.63   21.58   22.46 
```

```
[1] 1.466896
```

```r
summary(girls$bmi); sd(girls$bmi)
```

```
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  17.48   21.95   25.80   24.59   27.47   29.35 
```

```
[1] 3.70739
```

]
.pull-right-40[

```r
boxplot(mydata$bmi ~ mydata$sex)
```

<img src="01_getting_started_Practice_Answers_files/figure-html/unnamed-chunk-12-1.png" style="display: block; margin: auto;" />
]
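
The same summaries can also be computed for both groups at once, without first splitting the data. This is a minimal sketch using base R's `tapply()` on a made-up data frame `toy`; the column names mirror `mydata` but the values are invented.

```r
# Toy data standing in for mydata (values are made up)
toy <- data.frame(
  sex = c("Female", "Female", "Female", "Male", "Male", "Male"),
  bmi = c(20, 24, 28, 19, 21, 23)
)

# Apply mean() and sd() to bmi within each level of sex
group_means <- tapply(toy$bmi, toy$sex, mean)
group_sds   <- tapply(toy$bmi, toy$sex, sd)

group_means
group_sds
```

The same pattern should work for `weight` by swapping the first argument.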

---

__#4__ Plot BMI vs. weight for each gender separately. Do they have similar relationships?

.pull-left[

```r
plot(boys$bmi, boys$weight)
```

]

.pull-right[

```r
plot(girls$bmi, girls$weight)
```

]

---

__#5__ Are males or females more likely to be bullied in the past 12 months? Calculate the percentage bullied for each gender.

.pull-left[

```r
bullied_boys <- 
  boys[boys$bullied_past_12mo == TRUE,]
nrow(bullied_boys)
```

```
[1] 3
```

```r
bullied_boys_prct <- 
  nrow(bullied_boys) / nrow(boys) * 100
bullied_boys_prct
```

```
[1] 37.5
```

```r
# alternative: mean() of a logical vector gives the proportion of TRUE values
mean(boys$bullied_past_12mo, na.rm=TRUE)
```

```
[1] 0.375
```
]
.pull-right[

```r
# Apply the same method for girls:
bullied_girls <- 
  girls[girls$bullied_past_12mo == TRUE,]
nrow(bullied_girls)
```

```
[1] 6
```

```r
bullied_girls_prct <- 
  nrow(bullied_girls) / nrow(girls) * 100
bullied_girls_prct
```

```
[1] 50
```

```r
# alternative. Answers don't match. Why???
mean(girls$bullied_past_12mo, na.rm=TRUE)
```

```
[1] 0.4
```

]

---

__#5__ cont'd

On the previous slide we saw that our two methods for calculating the percentage of girls that were bullied in the past 12 months did not match. What went wrong?

```r
nrow(bullied_girls)
```

```
[1] 6
```

```r
girls$bullied_past_12mo
```

```
 [1]    NA    NA  TRUE FALSE FALSE  TRUE  TRUE FALSE  TRUE FALSE FALSE
[12] FALSE
```

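When a logical vector used for subsetting contains `NA`, R returns a row of `NA`s for each missing value. `bullied_girls` therefore has 6 rows: the 4 girls who were bullied plus 2 rows for the missing values. To get the number of girls that were bullied, we need to make sure the missing values (`NA`) are not included.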
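
This behavior can be reproduced on a small example. The sketch below uses a made-up data frame `toy` with two missing values, and shows `which()` as one possible way to drop the `NA` positions.

```r
# Made-up data with two missing values
toy <- data.frame(
  id      = 1:5,
  bullied = c(NA, NA, TRUE, FALSE, TRUE)
)

# An NA in the row index yields a row of NAs, so those rows are counted too
n_with_na <- nrow(toy[toy$bullied == TRUE, ])
n_with_na

# which() returns only the positions that are TRUE and drops NA
n_which <- nrow(toy[which(toy$bullied), ])
n_which

# sum() with na.rm = TRUE counts the TRUE values directly
n_sum <- sum(toy$bullied, na.rm = TRUE)
n_sum
```

Here the first count is 4 (two `TRUE` rows plus two `NA` rows), while the other two counts are 2.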

---

__#5__ cont'd - working with NA's

```r
# values of bullied_past_12mo
girls$bullied_past_12mo
```

```
 [1]    NA    NA  TRUE FALSE FALSE  TRUE  TRUE FALSE  TRUE FALSE FALSE
[12] FALSE
```

```r
# which are missing (logical)
is.na(girls$bullied_past_12mo)
```

```
 [1]  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[12] FALSE
```

```r
# which are NOT missing (logical)
!is.na(girls$bullied_past_12mo)
```

```
 [1] FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
[12]  TRUE
```

---

__#5__ cont'd - fix girls' code

Exclude the rows with missing values from `girls` before selecting the bullied girls:

```r
girls2 <- girls[!is.na(girls$bullied_past_12mo),]
nrow(girls2)
```

```
[1] 10
```

```r
bullied_girls2 <- girls2[girls2$bullied_past_12mo == TRUE,]
nrow(bullied_girls2)
```

```
[1] 4
```

```r
# from girls dataset, total number bullied
sum(girls$bullied_past_12mo, na.rm = TRUE)
```

```
[1] 4
```

---

__#5__ cont'd - Calculate percentage of girls bullied

```r
bullied_girls_prct2 <- nrow(bullied_girls2) / nrow(girls2) * 100
bullied_girls_prct2
```

```
[1] 40
```

```r
# Compare to alternative
mean(girls$bullied_past_12mo, na.rm=TRUE)
```

```
[1] 0.4
```
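
Note that `mean()` returns a proportion (0.4), which corresponds to 40%. As a sketch of how both genders could be handled in one step, the code below applies `mean()` with `na.rm = TRUE` within each group using `tapply()`. The data frame `toy` is made up for illustration.

```r
# Made-up data; one missing value among the females
toy <- data.frame(
  sex     = c("Female", "Female", "Female", "Female",
              "Male", "Male", "Male", "Male"),
  bullied = c(NA, TRUE, FALSE, FALSE,
              TRUE, TRUE, FALSE, FALSE)
)

# Proportion of TRUE per group, ignoring NA, converted to a percentage
pct_bullied <- tapply(toy$bullied, toy$sex, mean, na.rm = TRUE) * 100
round(pct_bullied, 1)
```

Because `na.rm = TRUE` is passed through to `mean()`, the denominator for each group counts only the non-missing responses.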