I'm trying out the skills from module 05 on some birth data that I imported from a Stata file. Maternal age is a four-level categorical variable (mom_age_cat: aged 15-19, 20-29, 30-39, or 40+) that was imported as a "numeric" type, so I tried to convert it to a "factor" instead. The conversion looks like it worked, but when I try to group infant birth weight (tgrams) by mom_age_cat, I get a single group and all the mom_age_cat values appear to be missing (NA). The dataset I imported had no missing data on this variable...
df <- rio::import("data/NC_birth_data_PGME_edited.dta")
str(df)
'data.frame': 800 obs. of 15 variables:
$ plural : num 1 1 1 1 1 1 1 1 1 1 ...
..- attr(*, "label")= chr "plural"
..- attr(*, "format.stata")= chr "%10.0g"
$ sex : num 1 2 1 1 1 1 2 2 2 2 ...
..- attr(*, "label")= chr "sex"
..- attr(*, "format.stata")= chr "%10.0g"
$ mage : num 32 32 27 27 25 28 25 15 37 21 ...
..- attr(*, "label")= chr "mage"
..- attr(*, "format.stata")= chr "%10.0g"
$ weeks : num 40 37 39 39 39 43 39 42 41 39 ...
..- attr(*, "label")= chr "weeks"
..- attr(*, "format.stata")= chr "%10.0g"
$ marital : num 1 1 1 1 1 1 1 2 1 1 ...
..- attr(*, "label")= chr "marital"
..- attr(*, "format.stata")= chr "%10.0g"
$ racemom : num 1 1 1 1 1 1 1 1 8 1 ...
..- attr(*, "label")= chr "racemom"
..- attr(*, "format.stata")= chr "%10.0g"
$ hispmom : chr "N" "N" "N" "N" ...
..- attr(*, "label")= chr "hispmom"
..- attr(*, "format.stata")= chr "%9s"
$ gained : num 38 34 12 15 32 32 75 25 31 28 ...
..- attr(*, "label")= chr "gained"
..- attr(*, "format.stata")= chr "%10.0g"
$ smoke : num 0 0 0 0 0 0 0 0 0 0 ...
..- attr(*, "label")= chr "smoke"
..- attr(*, "format.stata")= chr "%10.0g"
$ drink : num 0 0 0 0 0 0 0 0 0 0 ...
..- attr(*, "label")= chr "drink"
..- attr(*, "format.stata")= chr "%10.0g"
$ tounces : num 111 116 138 136 121 117 143 113 139 120 ...
..- attr(*, "label")= chr "tounces"
..- attr(*, "format.stata")= chr "%10.0g"
$ tgrams : num 3147 3289 3912 3856 3430 ...
..- attr(*, "label")= chr "tgrams"
..- attr(*, "format.stata")= chr "%10.0g"
$ low : num 0 0 0 0 0 0 0 0 0 0 ...
..- attr(*, "label")= chr "low"
..- attr(*, "format.stata")= chr "%10.0g"
$ premie : num 0 0 0 0 0 0 0 0 0 0 ...
..- attr(*, "label")= chr "premie"
..- attr(*, "format.stata")= chr "%10.0g"
$ mom_age_cat: num 2 2 1 1 1 1 1 0 2 1 ...
..- attr(*, "format.stata")= chr "%10.0g"
..- attr(*, "labels")= Named num [1:4] 0 1 2 3
.. ..- attr(*, "names")= chr [1:4] "Aged 15-19" "Aged 20-29" "Aged 30-39" "Aged 40+"
# mom_age_cat was imported as type "numeric"; we need to change it to a factor before running statistics.
df$mom_age_cat <- factor(df$mom_age_cat, levels=c("15-19", "20-29", "30-39", "40+"), ordered=TRUE)
levels(df$mom_age_cat) # Confirm that our data is ordered properly.
[1] "15-19" "20-29" "30-39" "40+"
df %>%
+ group_by(mom_age_cat) %>%
+ get_summary_stats(tgrams, type = "mean_sd")
# A tibble: 1 x 5
mom_age_cat variable n mean sd
<ord> <chr> <dbl> <dbl> <dbl>
1 NA tgrams 800 3299. 639.
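A likely explanation, sketched on toy data rather than the actual file: in `factor()`, the `levels` argument lists the values to look for in the data, while `labels` supplies the names to display. The `str()` output shows mom_age_cat stored as the codes 0, 1, 2, 3, so none of the values match "15-19", "20-29", and so on, and every observation becomes NA even though `levels()` prints the expected categories. The vector `codes` below is a made-up stand-in for the column, and the second option assumes the column still carries the "labels" attribute shown in the `str()` output.

```r
# Toy stand-in for df$mom_age_cat (hypothetical values): numeric codes 0-3
# with a Stata-style "labels" attribute like the one in the str() output.
codes <- c(2, 2, 1, 1, 1, 1, 1, 0, 2, 1, 3)
attr(codes, "labels") <- c("Aged 15-19" = 0, "Aged 20-29" = 1,
                           "Aged 30-39" = 2, "Aged 40+" = 3)

# What the post does: `levels` is matched against the data values, and none
# of 0, 1, 2, 3 equals "15-19" etc., so every element turns into NA.
bad <- factor(codes, levels = c("15-19", "20-29", "30-39", "40+"), ordered = TRUE)
sum(is.na(bad))  # counts the NAs created by the conversion

# Option 1: match on the stored codes with `levels`, rename with `labels`.
good <- factor(codes,
               levels  = c(0, 1, 2, 3),
               labels  = c("15-19", "20-29", "30-39", "40+"),
               ordered = TRUE)
table(good, useNA = "ifany")

# Option 2: reuse the value labels that came with the Stata file.
labs  <- attr(codes, "labels")
good2 <- factor(codes, levels = unname(labs), labels = names(labs), ordered = TRUE)
levels(good2)
```

Applied to the data in the post, option 1 would replace the `levels=` call with `levels = 0:3` plus a `labels =` argument, after re-importing the file so the original codes are back. This is untested against the actual dataset. Checking `sum(is.na(df$mom_age_cat))` right after the conversion is a quick way to catch this kind of silent mismatch.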