| Title: | Decode and Validate HEIMS Data from Department of Education, Australia |
|---|---|
| Description: | Decode elements of the Australian Higher Education Information Management System (HEIMS) data for clarity and performance. HEIMS is the record system of the Department of Education, Australia to record enrolments and completions in Australia's higher education system, as well as a range of relevant information. For more information, including the source of the data dictionary, see <http://heimshelp.education.gov.au/sites/heimshelp/dictionary/pages/data-element-dictionary>. |
| Authors: | Hugh Parsonage [aut, cre] |
| Maintainer: | Hugh Parsonage <[email protected]> |
| License: | GPL-3 |
| Version: | 0.4.1 |
| Built: | 2026-05-09 08:50:45 UTC |
| Source: | https://github.com/hughparsonage/heims |
Browse elements for description
browse_elements(pattern)
pattern |
A case-insensitive perl expression or expressions to match in the long name of |
A data.table of all element-long name combinations matching the perl regular expression.
browse_elements(c("ProViDer", "Maj"))
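The matching described above can be sketched in base R. This is an illustration only, not the package's implementation: the lookup table, the element numbers `E000` and `E001`, and the helper `browse_sketch` are hypothetical, and combining several patterns with OR is an assumption the source does not state.

```r
# Hypothetical stand-in for the element / long-name lookup.
# Only the E306 -> HE_Provider_cd pairing comes from this manual;
# the other rows are made up for illustration.
elements <- data.frame(
  orig_name = c("E306", "E000", "E001"),
  long_name = c("HE_Provider_cd", "Example_Maj_cd", "Example_amt"),
  stringsAsFactors = FALSE
)

# Keep rows whose long name matches ANY pattern, ignoring case
# (OR-combination of patterns is an assumption).
browse_sketch <- function(patterns, tbl = elements) {
  hits <- Reduce(`|`, lapply(patterns, function(p) {
    grepl(p, tbl$long_name, perl = TRUE, ignore.case = TRUE)
  }))
  tbl[hits, , drop = FALSE]
}

res <- browse_sketch(c("ProViDer", "Maj"))
print(res)
```

Here `"ProViDer"` matches `HE_Provider_cd` despite the mixed case, and `"Maj"` matches the hypothetical `Example_Maj_cd` row.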
Decode HEIMS elements
decode_heims(DT, show_progress = FALSE, check_valid = TRUE, selector)
DT |
A |
show_progress |
Display the progress of the function (which is likely to be slow on real data). |
check_valid |
Check the variable is valid before decoding. Setting to |
selector |
Original HEIMS names to restrict the decoding to. Other names will be preserved. |
Each variable in DT is validated according to heims_data_dict before being decoded. Any failure stops the validation.
If DT has a key, the output will have a key, but set on the decoded columns and
the ordering will most likely change (to reflect the decoded values).
On the full HEIMS data, this function takes a long time to finish: typically on the order of 10 minutes for the enrol file.
DT with the values decoded and the names renamed.
## Not run:
# (E488 is made up so won't work if validation is attempted.)
decode_heims(dummy_enrol)
## End(Not run)
decode_heims(dummy_enrol, show_progress = TRUE, check_valid = FALSE)
Decoders
E089_decoder E095_decoder E306_decoder E310_decoder E312_decoder E316_decoder E329_decoder E327_decoder E330_decoder E331_decoder E337_decoder E346_decoder E348_decoder E355_decoder E358_decoder E386_decoder E392_decoder E461_decoder E463_decoder E464_decoder E490_decoder U490_decoder E551_decoder E562_decoder E919_decoder E920_decoder E922_decoder FOE_uniter HE_Provider_decoder
An object of class data.table (inherits from data.frame) with 2 rows and 2 columns.
A data.table of five fictitious enrolments.
dummy_enrol
An object of class data.table (inherits from data.frame) with 5 rows and 56 columns.
Make HEIMS element nos human-readable
rename_heims(DT)
element2name(v)
DT |
The data table with original names |
v |
A vector of element names. |
See heims_data_dict. Note that decode_heims is generally better,
as it decodes the variable if a decoder is present in the dictionary.
element2name is the inverse of browse_elements:
given an element like E306, it returns
the name (HE_Provider_cd).
DT with the new names or the vector with the names translated.
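As a rough illustration of translating element numbers to long names, the sketch below uses a named character vector as the lookup. The helper `element2name_sketch` and the one-entry `lookup` are hypothetical, and leaving unknown names unchanged is an assumption rather than documented behaviour of `element2name`.

```r
# Hypothetical lookup; only E306 -> HE_Provider_cd is taken from this manual.
lookup <- c(E306 = "HE_Provider_cd")

# Translate element numbers to long names.
# Names absent from the lookup are left as they are (an assumption).
element2name_sketch <- function(v) {
  out <- unname(lookup[v])
  unknown <- is.na(out)
  out[unknown] <- v[unknown]
  out
}

# Renaming the columns of a table is then a matter of translating its names.
df <- data.frame(E306 = c(1011L, 999L), CHESSN = c("a", "b"))
names(df) <- element2name_sketch(names(df))
print(names(df))
```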
Return TRUE or FALSE on whether or not each variable in a data.table complies with the HEIMS code limits
validate_elements(DT, .progress_cat = FALSE)
prop_elements_valid(DT, char = FALSE)
count_elements_invalid(DT, char = FALSE)
DT |
The data.table whose variables are to be validated. |
.progress_cat |
Should the progress of the function be displayed on the console? If |
char |
Return as character vector, in particular marking – any complete or completely absent values. |
For early detection of invalid results, the type of the variable (in particular integer vs double) is considered first,
vetoing a TRUE result if different.
A named logical vector, whether or not the variable complies with the style requirements. A value of NA indicates the variable
was not checked (perhaps because it is absent from heims_data_dict).
X <- data.frame(E306 = c(0, 1011, 999, 9998))
validate_elements(X) # FALSE
prop_elements_valid(X)
X <- data.frame(E306 = as.integer(c(0, 1011, 999, 9998)))
validate_elements(X) # TRUE
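The difference between the two calls comes down to storage type: `c(0, 1011, 999, 9998)` is a double vector in R, so a type-first check rejects it before any values are inspected. A minimal base R sketch of that idea follows; `validate_sketch` and its 0 to 9999 range are hypothetical and are not the rule stored in `heims_data_dict`.

```r
x_double  <- c(0, 1011, 999, 9998)   # numeric literals are doubles in R
x_integer <- as.integer(x_double)

# Hypothetical validator: the type check vetoes first, then a range check.
# The 0..9999 range is an assumption made purely for illustration.
validate_sketch <- function(v) {
  if (!is.integer(v)) {
    return(FALSE)  # wrong storage type: no need to look at the values
  }
  all(v >= 0L & v <= 9999L)
}

validate_sketch(x_double)   # FALSE: rejected on type alone
validate_sketch(x_integer)  # TRUE
```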
See relevel_heims.
first_levels
An object of class data.table (inherits from data.frame) with 8 rows and 2 columns.
Read raw HEIMS file
fread_heims(filename)
filename |
A text-delimited file, passed to |
The strings "" "NA" "?" "." "*" "**" are treated as missing, as well as ZZZZZZZZZZ
(so students without a CHESSN will be marked with the integer64 missing value).
A data.table with column names in ascending (lexicographical) order and
any columns starting with e will be uppercase.
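The missing-value and column-name handling described above can be approximated in base R. This sketch uses `read.csv` rather than whatever reader the package delegates to, reads everything as character (so it does not reproduce the integer64 handling of CHESSN), and uses made-up data; it is an analogy, not the package's code.

```r
# Strings treated as missing, as listed above.
na_markers <- c("", "NA", "?", ".", "*", "**", "ZZZZZZZZZZ")

# Made-up two-row file with a lowercase element column.
raw <- read.csv(
  text = "e306,CHESSN\n1011,1234567890\n?,ZZZZZZZZZZ",
  na.strings = na_markers,
  colClasses = "character"
)

# Uppercase any column name starting with "e" ...
starts_e <- grepl("^e", names(raw))
names(raw)[starts_e] <- toupper(names(raw)[starts_e])

# ... then put columns in ascending lexicographical order.
raw <- raw[, sort(names(raw), method = "radix"), drop = FALSE]
print(raw)
```

After this, the second row has `NA` in both columns and the columns appear as `CHESSN`, `E306`.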
HEIMS data dictionary
heims_data_dict
A named list each containing 5 elements:
long_name: a human-readable version of the variable;
orig_name: the element number;
mark_missing: a vectorized function returning TRUE on values of the variable which should be coded as NA;
ad_hoc_prepare: a function to apply before validation;
validate: a single-value function returning TRUE or FALSE on vectors which comply with the variable's coding rules.
ad_hoc_validation_note: If the data dictionary did not cover elements in the file, how the validate function was altered to suffer them.
valid: a vectorized function returning TRUE or FALSE on vectors which do not comply with the variable's coding rules.
decoder: A function of the data.table decoding the variable decoded.
post_fst: A function of the data.table returned by fst to be used (for example to reset attributes).
Abbreviations in long_name:
amt: Amount
cd: Code
det: Detail(s)
FOE: Field of education
Maj: Major
http://heimshelp.education.gov.au/sites/heimshelp/dictionary/pages/data-element-dictionary
Read HEIMS data from decoded fst files
read_heims_fst(filename)
filename |
File path to |
A data.table with appropriate attributes.
Changes categorical variables in a data.table to levels with a sensible reference level
relevel_heims(DT)
DT |
A |
The same data.table with character vectors changed to factors whose first level is the level intended.
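The idea of a "sensible reference level" can be illustrated with base R factors. The values and the choice of reference below are hypothetical and are not taken from `first_levels`; the sketch only shows the general mechanism of moving an intended level to the front.

```r
# Hypothetical decoded character column.
attendance <- c("Part-time", "Full-time", "Full-time")

# factor() orders levels alphabetically by default.
f <- factor(attendance)
levels(f)            # "Full-time" "Part-time"

# Move the intended reference level to the front
# (choosing "Part-time" here is purely illustrative).
f_releveled <- relevel(f, ref = "Part-time")
levels(f_releveled)  # "Part-time" "Full-time"
```

The first level matters because modelling functions such as `lm` treat it as the baseline category.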
Only included here because of the unusual nature of heims_data_dict.
AND()
OR()
never(v)
every(v)
always(v)
is.Date(v)
is.YearMonth(v)
nth_digit_of(x, n)
between(...)
or(...)
and(...)
if_else(...)
coalesce(...)
a %fin% tbl
rm_leading_0s(v)
as.integer64(v)
is.integer64(v)
force_integer(v)
ymd(...)
v |
A vector. |
x, n |
vectors |
... |
Passed to other functions |
a |
Element suspected to be in |
tbl |
A lookup table. |
nth_digit_of returns the nth digit of the number starting from the units and going up in magnitude.
nth_digit_of(503, 1) == 3
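Counting digits from the units upward can be done with integer division and a remainder. The function below is a hypothetical base R sketch of that arithmetic, named `nth_digit_sketch` to make clear it is not the package's own `nth_digit_of`; taking the absolute value for negative input is an assumption.

```r
# n = 1 is the units digit, n = 2 the tens digit, and so on.
nth_digit_sketch <- function(x, n) {
  (abs(x) %/% 10^(n - 1)) %% 10
}

nth_digit_sketch(503, 1)  # 3 (units)
nth_digit_sketch(503, 2)  # 0 (tens)
nth_digit_sketch(503, 3)  # 5 (hundreds)

# Vectorized over both arguments.
nth_digit_sketch(c(503, 1011), c(1, 4))  # 3 1
```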