This package contains selected totals from the Community Profiles time series data, released by the Australian Bureau of Statistics as part of the Census 2016 release.
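library(knitr)
pkgs_suggested <- c("magrittr",
                    "ggplot2",
                    "scales",
                    "ggrepel")
suggested_packages_usable <-
  all(vapply(pkgs_suggested, requireNamespace, logical(1), quietly = TRUE))
knitr::opts_chunk$set(eval = suggested_packages_usable,
                      # dev = "png",
                      fig.width = 8,
                      fig.height = 6)
# Attach the packages used by the code below
library(data.table)
if (suggested_packages_usable) {
  library(magrittr)
  library(ggplot2)
  library(scales)
  library(ggrepel)
}
library(Census2016)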
## For additional data packages for the 2016 Census, visit https://github.com/HughParsonage/Census2016.DataPack
data.kable <- function(DT) {
  # Print NA as '...' in tables; restore the previous option on exit
  old_opts <- options(knitr.kable.NA = '...')
  on.exit(options(old_opts))
  if (nrow(DT) > 50) {
    # For long tables, show only the first and last six rows,
    # separated by a row of NAs (printed as '...')
    middle_row <- as.data.table(matrix(nrow = 1, ncol = ncol(DT)))
    setnames(middle_row, seq_along(middle_row), names(DT))
    DT_topn <- rbind(head(DT),
                     middle_row,
                     tail(DT))
    kable(DT_topn, format.args = list(big.mark = ","))
  } else {
    kable(DT, format.args = list(big.mark = ","))
  }
}
There is one function, see_question(), and six data sets.
Census2016_wide_by_SA2_year
This is a simple data.table of multiple variables for each combination of Statistical Area Level 2 (SA2) and census year. The columns are ordered roughly by the order of the questions on the Census form. Not all variables are included, so that the package stays within CRAN's limits on package size.
Both the sa2_code and sa2_name are provided for convenience.
Census2016_wide_by_SA2_year %>%
  .[year == 2016] %>%
  .[, .(sa2_name, persons, median_household_income, median_annual_mortgage)] %>%
  .[order(median_annual_mortgage)] %>%
  data.kable
sa2_name | persons | median_household_income | median_annual_mortgage |
---|---|---|---|
Deua - Wadbilliga | 25 | 44,148 | 0 |
Port Kembla Industrial | 10 | 0 | 0 |
Illawarra Catchment Reserve | 8 | 0 | 0 |
Prospect Reservoir | 40 | 0 | 0 |
Banksmeadow | 18 | 0 | 0 |
Port Botany Industrial | 10 | 0 | 0 |
… | … | … | … |
Cottesloe | 7,375 | 138,788 | 39,000 |
Lilli Pilli - Port Hacking - Dolans Bay | 3,148 | 155,220 | 39,300 |
Nedlands - Dalkeith - Crawley | 18,534 | 118,976 | 39,600 |
Rose Bay - Vaucluse - Watsons Bay | 11,840 | 150,228 | 41,604 |
Balgowlah - Clontarf - Seaforth | 20,186 | 145,496 | 41,604 |
Hunters Hill - Woolwich | 10,345 | 142,064 | 42,000 |
Census2016_wide_by_SA2_year %>%
  .[year == 2016] %>%
  .[, .(sa2_name, persons, median_household_income, median_annual_mortgage)] %>%
  .[median_annual_mortgage > 0] %>%
  .[, mortgage_less_income := median_annual_mortgage - median_household_income] %>%
  .[, text := NA_character_] %>%
  .[, color := "black"] %>%
  .[order(mortgage_less_income)] %>%
  .[.N:1 <= 5, text := sa2_name] %>%
  .[.N:1 <= 5, color := "red"] %>%
  .[1:.N <= 5, text := sa2_name] %>%
  .[1:.N <= 5, color := "blue"] %>%
  ggplot(aes(x = median_household_income,
             y = median_annual_mortgage,
             size = persons,
             alpha = persons,
             color = color)) +
  geom_point() +
  scale_color_identity() +
  scale_size(labels = comma) +
  scale_alpha_continuous(labels = comma,
                         range = c(0, 0.5)) +
  scale_x_continuous("Median annual household income", labels = dollar) +
  scale_y_continuous("Median annual mortgage", labels = dollar) +
  geom_label_repel(aes(label = text),
                   alpha = 1,
                   na.rm = TRUE) +
  ggtitle("High-income households live alongside high-mortgage households",
          subtitle = paste0("SA2s, 2016 with 5 highest (red) or lowest (blue)",
                            " nonzero mortgage relative to income"))
In addition to the 2016 data, the package includes 2006 and 2011 census data as part of the time series. The ABS has released these series so that they are comparable: even though SA2 boundaries have changed, you may assume that each SA2 refers to the same geographic area in every census year.
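Because the series are comparable across years, changes between censuses can be computed directly within each SA2. The following is a minimal sketch on made-up numbers: the table toy and all of its values are hypothetical, not Census figures, and only the column names sa2_name, year and persons follow the wide table described above.

```r
library(data.table)

# Hypothetical toy values for illustration only, NOT real Census data.
toy <- data.table(
  sa2_name = rep(c("Area A", "Area B"), each = 3),
  year     = rep(c(2006L, 2011L, 2016L), times = 2),
  persons  = c(100L, 110L, 121L, 200L, 190L, 209L)
)

# Sort by SA2 then year, so that shift() within each SA2
# refers to the previous census.
setkey(toy, sa2_name, year)
toy[, change_since_last_census := persons - shift(persons), by = sa2_name]

print(toy)
```

The first census year of each SA2 has no earlier value to compare against, so its change is NA. The same pattern should carry over to Census2016_wide_by_SA2_year by replacing toy with that table.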
We can see that Mandarin has become much more common:
languages_spoken_by_year <-
  Census2016_languages %>%
  .[, .(persons = sum(persons)), keyby = .(language, year)] %>%
  setorder(-year, -persons) %>%
  .[]
languages_spoken_by_year %>%
  # Examine the top six languages,
  # leave the others unlabelled and grey
  .[language %in% languages_spoken_by_year$language[1:6],
    Language := language] %>%
  .[year == 2016, text := Language] %>%
  .[, Language := reorder(Language, -persons)] %>%
  ggplot(aes(x = year,
             y = persons,
             group = language,
             color = Language,
             label = text)) +
  geom_line() +
  scale_y_continuous(labels = comma) +
  geom_text_repel(na.rm = TRUE,
                  fontface = "bold",
                  force = 1.5,
                  nudge_x = 0.5)
see_question
Although Census2016 is intended as a data-only package, there is one function, see_question.
It is frequently useful to view the actual question that was asked when looking at survey data. see_question provides a convenient way to do this without leaving RStudio (or even your keyboard). There are two methods: see_question.numeric takes a question number and prints the corresponding question.
The other method is dispatched when one of the two-dimensional tables is supplied. This method shows the question relevant to that data set and returns the data input invisibly.
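A hedged usage sketch of both methods follows. It assumes the Census2016 package is installed and that Census2016_languages (used earlier) is one of the tables the second method recognises. The question number 12 is an arbitrary choice for illustration, not a value taken from this document.

```r
# Only run the calls if the package is available.
have_pkg <- requireNamespace("Census2016", quietly = TRUE)

if (have_pkg) {
  library(Census2016)

  # Numeric method: look up a question by its number
  # (12 is an arbitrary, illustrative question number).
  see_question(12)

  # Table method (assumed to accept this table): shows the question
  # behind the data set; the input is returned invisibly, so it can
  # still be captured or piped onwards.
  languages <- see_question(Census2016_languages)
}
```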