Title: | Decode and Validate HEIMS Data from Department of Education, Australia |
---|---|
Description: | Decode elements of the Australian Higher Education Information Management System (HEIMS) data for clarity and performance. HEIMS is the record system of the Department of Education, Australia to record enrolments and completions in Australia's higher education system, as well as a range of relevant information. For more information, including the source of the data dictionary, see <http://heimshelp.education.gov.au/sites/heimshelp/dictionary/pages/data-element-dictionary>. |
Authors: | Hugh Parsonage [aut, cre] |
Maintainer: | Hugh Parsonage <[email protected]> |
License: | GPL-3 |
Version: | 0.4.1 |
Built: | 2024-10-29 03:00:44 UTC |
Source: | https://github.com/hughparsonage/heims |
Browse elements for description
browse_elements(pattern)
pattern |
A case-insensitive perl expression or expressions to match in the long name of |
A data.table
of all element-long name combinations matching the perl regular expression.
browse_elements(c("ProViDer", "Maj"))
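As a rough illustration of matching several case-insensitive Perl patterns against long names, here is a minimal base-R sketch. The table `toy_dict`, its rows other than E306, and the helper `browse_toy()` are hypothetical and not part of heims; the real function searches the package's own dictionary and returns a data.table.

```r
# Hypothetical lookup table of element numbers and long names (illustration only).
toy_dict <- data.frame(
  orig_name = c("E306", "E997", "E998"),
  long_name = c("HE_Provider_cd", "Toy_course_cd", "Toy_FOE_Maj_cd"),
  stringsAsFactors = FALSE
)

# Keep rows whose long name matches ANY of the patterns, ignoring case.
browse_toy <- function(pattern, dict = toy_dict) {
  hits <- vapply(
    pattern,
    function(p) grepl(p, dict$long_name, perl = TRUE, ignore.case = TRUE),
    logical(nrow(dict))
  )
  # Force a matrix: rows = dictionary rows, columns = patterns.
  hits <- matrix(hits, nrow = nrow(dict))
  dict[rowSums(hits) > 0, , drop = FALSE]
}

browse_toy(c("ProViDer", "Maj"))
```

Passing a vector of patterns here acts as a logical OR across them, which is one plausible reading of "expression or expressions" in the argument description.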
Decode HEIMS elements
decode_heims(DT, show_progress = FALSE, check_valid = TRUE, selector)
DT |
A |
show_progress |
Display the progress of the function (which is likely to be slow on real data). |
check_valid |
Check the variable is valid before decoding. Setting to |
selector |
Original HEIMS names to restrict the decoding to. Other names will be preserved. |
Each variable in DT
is validated according to heims_data_dict
before being decoded. Any failure stops the validation.
If DT
has a key, the output will have a key, but set on the decoded columns and
the ordering will most likely change (to reflect the decoded values).
On the full HEIMS data, this function takes a long time to finish, typically on the order of 10 minutes for the enrol file.
DT
with the values decoded and the names renamed.
## Not run:
# (E488 is made up so won't work if validation is attempted.)
decode_heims(dummy_enrol)
## End(Not run)
decode_heims(dummy_enrol, show_progress = TRUE, check_valid = FALSE)
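The validate-then-decode flow described above can be sketched in base R as follows. Everything here is an assumption for illustration: the element `E999`, the `entry` list, and `decode_toy()` are made up and do not reproduce the package's internals, which operate on data.tables via heims_data_dict.

```r
# Hypothetical raw data: one coded column using a made-up element number.
raw <- data.frame(E999 = c(1L, 2L, 1L))

# Hypothetical dictionary entry: a long name, a validity rule, and a lookup table.
entry <- list(
  long_name = "Toy_status",
  validate  = function(v) is.integer(v) && all(v %in% c(1L, 2L)),
  decoder   = data.frame(E999 = c(1L, 2L),
                         Toy_status = c("Active", "Withdrawn"),
                         stringsAsFactors = FALSE)
)

decode_toy <- function(df, entry, check_valid = TRUE) {
  v <- df[["E999"]]
  # Stop at the first validation failure.
  if (check_valid && !entry$validate(v)) {
    stop("E999 failed validation")
  }
  # Look up each code, then swap the coded column for the decoded, renamed one.
  decoded <- entry$decoder[[entry$long_name]][match(v, entry$decoder$E999)]
  out <- df
  out[["E999"]] <- NULL
  out[[entry$long_name]] <- decoded
  out
}

decode_toy(raw, entry)
```

With `check_valid = FALSE`, codes missing from the lookup table would simply decode to `NA` in this sketch rather than raising an error.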
Decoders
E089_decoder E095_decoder E306_decoder E310_decoder E312_decoder E316_decoder E329_decoder E327_decoder E330_decoder E331_decoder E337_decoder E346_decoder E348_decoder E355_decoder E358_decoder E386_decoder E392_decoder E461_decoder E463_decoder E464_decoder E490_decoder U490_decoder E551_decoder E562_decoder E919_decoder E920_decoder E922_decoder FOE_uniter HE_Provider_decoder
An object of class data.table
(inherits from data.frame
) with 2 rows and 2 columns.
A data.table
of five fictitious enrolments.
dummy_enrol
An object of class data.table
(inherits from data.frame
) with 5 rows and 56 columns.
Make HEIMS element numbers human-readable
rename_heims(DT)
element2name(v)
DT |
The data table with original names |
v |
A vector of element names. |
See heims_data_dict
. Note that decode_heims
is generally better,
as it decodes the variable if a decoder is present in the dictionary.
element2name
is the inverse of browse_elements
:
given an element like E306
, it returns
the name (HE_Provider_cd
).
DT
with the new names or the vector with the names translated.
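A minimal sketch of the renaming idea, using a named character vector as the lookup. The mapping `long_names`, the element `E999`, and `element2name_toy()` are hypothetical; leaving unknown names unchanged is an assumption here, not documented behaviour of the package.

```r
# Hypothetical mapping from element numbers to long names.
long_names <- c(E306 = "HE_Provider_cd", E999 = "Toy_amt")

element2name_toy <- function(v) {
  out <- unname(long_names[v])
  # Assumption: names without a dictionary entry are left unchanged.
  out[is.na(out)] <- v[is.na(out)]
  out
}

df <- data.frame(E306 = 1011L, E999 = 5L, other = "x")
names(df) <- element2name_toy(names(df))
names(df)
```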
Return TRUE or FALSE on whether or not each variable in a data.table complies with the HEIMS code limits
validate_elements(DT, .progress_cat = FALSE)
prop_elements_valid(DT, char = FALSE)
count_elements_invalid(DT, char = FALSE)
DT |
The data.table whose variables are to be validated. |
.progress_cat |
Should the progress of the function be displayed on the console? If |
char |
Return as character vector, in particular marking – any complete or completely absent values. |
For early detection of invalid results, the type of the variable (in particular integer vs double) is considered first,
vetoing a TRUE
result if different.
A named logical vector indicating whether or not each variable complies with the style requirements. A value of NA
indicates the variable
was not checked (perhaps because it is absent from heims_data_dict
).
X <- data.frame(E306 = c(0, 1011, 999, 9998))
validate_elements(X) # FALSE
prop_elements_valid(X)
X <- data.frame(E306 = as.integer(c(0, 1011, 999, 9998)))
validate_elements(X) # TRUE
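The type-first check can be illustrated without the package. In the sketch below, `validate_toy()` and its 0 to 9999 range are hypothetical stand-ins for a rule on an integer-coded element; the point is only that a double vector is rejected before any values are inspected.

```r
x_double  <- c(0, 1011, 999, 9998)
x_integer <- as.integer(x_double)

# Hypothetical rule: check the storage type first, then the values.
validate_toy <- function(v) {
  if (!is.integer(v)) {
    return(FALSE)  # a type mismatch vetoes TRUE without scanning the values
  }
  all(v >= 0L & v <= 9999L, na.rm = TRUE)
}

validate_toy(x_double)   # same values, stored as double
validate_toy(x_integer)  # same values, stored as integer
```

Checking the type first is cheap, since it does not depend on the length of the vector.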
See relevel_heims
.
first_levels
An object of class data.table
(inherits from data.frame
) with 8 rows and 2 columns.
Read raw HEIMS file
fread_heims(filename)
filename |
A text-delimited file, passed to |
The strings "" "NA" "?" "." "*" "**"
are treated as missing, as well as ZZZZZZZZZZ
(so students without a CHESSN will be marked with the integer64
missing value).
A data.table
with column names in ascending (lexicographical) order;
any column names starting with e
are converted to uppercase.
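The missing-value handling and column-name clean-up can be approximated in base R as below. This is only a sketch: the inline text, the column `e999`, and the order of the steps are assumptions, and base R has no integer64 type, so missing identifiers become ordinary `NA` here.

```r
# Hypothetical two-row extract; "ZZZZZZZZZZ" stands in for a missing identifier.
txt <- "e999,E306,name\nZZZZZZZZZZ,1011,?\n1234567890,*,Bo"

missing_markers <- c("", "NA", "?", ".", "*", "**", "ZZZZZZZZZZ")

df <- read.csv(text = txt, na.strings = missing_markers,
               stringsAsFactors = FALSE)

# Uppercase names that start with "e", then order columns lexicographically.
starts_e <- startsWith(names(df), "e")
names(df)[starts_e] <- toupper(names(df)[starts_e])
df <- df[, sort(names(df), method = "radix"), drop = FALSE]

df
```

`method = "radix"` is used so that the ordering does not depend on the session's locale.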
HEIMS data dictionary
heims_data_dict
A named list, each entry of which contains the following elements:
long_name
a human-readable version of the variable; orig_name
the element number;
mark_missing
a vectorized function returning TRUE
on values of the variable which should be coded as NA
;
ad_hoc_prepare
a function to apply before validation;
validate
a single-value function returning TRUE
or FALSE
according to whether the vector complies with the variable's coding rules.
ad_hoc_validation_note
If the data dictionary did not cover elements in the file, how the validate
function was altered to accommodate them.
valid
a vectorized function returning TRUE
or FALSE
on vectors which do not comply with the variable's coding rules.
decoder
A function of the data.table
that decodes the variable.
post_fst
A function of the data.table
returned by fst to be used (for example to reset attributes).
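To make the structure above concrete, here is a hypothetical entry for a made-up element `E999`, showing only some of the components. The rules are illustrative and not taken from the real dictionary; treating `valid` as an element-wise check is one reading of its description, not a confirmed fact.

```r
# Hypothetical entry for a made-up element; all rules are illustrative.
toy_dict <- list(
  E999 = list(
    long_name    = "Toy_amt",
    orig_name    = "E999",
    # Element-wise: which values should be coded as NA?
    mark_missing = function(v) v == 9999L,
    # Whole-vector check returning a single TRUE or FALSE.
    validate     = function(v) {
      is.integer(v) && all(v >= 0L & v <= 9999L, na.rm = TRUE)
    },
    # Assumed element-wise check, useful for locating offending values.
    valid        = function(v) v >= 0L & v <= 9999L
  )
)

v <- c(10L, 9999L, 250L)
entry <- toy_dict[["E999"]]
ok <- entry$validate(v)          # validate first
v[entry$mark_missing(v)] <- NA   # then mark the missing-value code
v
```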
Abbreviations in long_name
:
amt
Amount
cd
Code
det
Detail(s)
FOE
Field of education
Maj
Major
http://heimshelp.education.gov.au/sites/heimshelp/dictionary/pages/data-element-dictionary
Read HEIMS data from decoded fst files
read_heims_fst(filename)
filename |
File path to |
A data.table
with appropriate attributes.
Changes categorical variables in a data.table to levels with a sensible reference level
relevel_heims(DT)
DT |
A |
The same data.table with character vectors changed to factors whose first level is the level intended.
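The effect on a single column can be sketched with base R's `relevel()`. The values and the chosen reference level below are hypothetical; the package presumably takes its intended first levels from first_levels rather than from a hard-coded string.

```r
# Hypothetical character column and an intended reference level.
mode <- c("Internal", "External", "Multi-modal", "Internal")

# factor() orders levels alphabetically; relevel() moves the reference level first.
mode_f <- relevel(factor(mode), ref = "Internal")

levels(mode_f)
```

Putting the reference level first matters mainly for modelling functions, which use the first level of a factor as the baseline.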
Only included here because of the unusual nature of heims_data_dict
.
AND()
OR()
never(v)
every(v)
always(v)
is.Date(v)
is.YearMonth(v)
nth_digit_of(x, n)
between(...)
or(...)
and(...)
if_else(...)
coalesce(...)
a %fin% tbl
rm_leading_0s(v)
as.integer64(v)
is.integer64(v)
force_integer(v)
ymd(...)
v |
A vector. |
x, n |
Vectors. |
... |
Passed to other functions |
a |
Element suspected to be in |
tbl |
A lookup table. |
nth_digit_of
returns the nth digit of the number starting from the units and going up in magnitude.
nth_digit_of(503, 1) == 3
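Counting digits from the units upward can be done with integer division and a remainder. The function `nth_digit_toy()` below is a hypothetical re-implementation for illustration, not the package's source.

```r
# Drop the n - 1 lowest digits, then keep the last remaining digit.
nth_digit_toy <- function(x, n) {
  (x %/% 10^(n - 1)) %% 10
}

nth_digit_toy(503, 1)  # units digit
nth_digit_toy(503, 2)  # tens digit
nth_digit_toy(503, 3)  # hundreds digit
```

Because the arithmetic operators are vectorized, both `x` and `n` may be vectors in this sketch.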