| Title: | Use Raw Vectors to Minimize Memory Consumption of Factors |
|---|---|
| Description: | Uses raw vectors to minimize memory consumption of categorical variables with fewer than 256 unique values. Useful for analysis of large datasets involving variables such as age, years, states, countries, or education levels. |
| Authors: | Hugh Parsonage [aut, cre] |
| Maintainer: | Hugh Parsonage <[email protected]> |
| License: | GPL-2 |
| Version: | 0.1.0 |
| Built: | 2026-05-19 06:57:44 UTC |
| Source: | https://github.com/hughparsonage/factor256 |
Aggregating helpers
count_by256(DT, by = NULL, count_col = "N")
| Argument | Description |
|---|---|
| DT | A |
| by | (string) A column of |
| count_col | (string) The name of the column in the result containing the counts. |
For:
count_by256: A tally of by.
Whereas base R's factors are based on 32-bit integer vectors,
factor256 uses 8-bit raw vectors to minimize its memory footprint.
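The size difference can be checked in base R without the package. The sketch below only compares an integer vector with a raw vector holding the same codes; the length n and the variable names are arbitrary choices for illustration.

```r
# Base R only: compare the footprint of integer codes (the storage used by
# factor) with raw codes of the same length.
n <- 1000000L
int_codes <- rep_len(1:100, n)   # 4 bytes per element
raw_codes <- as.raw(int_codes)   # 1 byte per element

size_int <- as.numeric(object.size(int_codes))
size_raw <- as.numeric(object.size(raw_codes))

# Apart from a small fixed header, the integer vector is about 4 times larger.
size_int / size_raw
```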
factor256(x, levels = NULL)
recompose256(f)
relevel256(x, levels)
## S3 method for class 'factor256'
levels(x)
is.factor256(x)
isntSorted256(x, strictly = FALSE)
as_factor(x)
factor256_in(x, tbl)
factor256_notin(x, tbl)
factor256_ein(x, tbl)
factor256_enotin(x, tbl)
tabulate256(f)
rank256(x)
order256(x)
unique256(x)
tabulate256_levels(x, nmax = NULL, dotInterval = 65535L)
| Argument | Description |
|---|---|
| x | An atomic vector with fewer than 256 unique elements. |
| levels | An optional character vector of or representing the unique values of |
| f | A raw vector of class |
| strictly | If |
| tbl | The table of values to lookup in |
| nmax, dotInterval | ( |
factor256 is a class based on raw vectors.
Values in x absent from levels are mapped to 00.
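The mapping of absent values to 00 can be illustrated in base R. This is a minimal sketch and not the package's implementation: encode_raw_sketch is a hypothetical helper, and the assumption that each level is stored as its 1-based position in levels is made for illustration only.

```r
# Hypothetical helper, not part of factor256: encode x against levels,
# sending values absent from levels to the raw byte 00.
encode_raw_sketch <- function(x, levels) {
  stopifnot(length(levels) < 256L)
  # match() gives the 1-based position in levels, or 0 when there is no match
  as.raw(match(x, levels, nomatch = 0L))
}

codes <- encode_raw_sketch(c("a", "z", "b", "a"), levels = c("a", "b"))
codes  # 01 00 02 01 ("z" is not a level, so it becomes 00)
```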
In the following list, o is the result.
factor256: A raw vector of class factor256.
recompose256: The inverse operation of factor256.
factor256_e?(not)?in: A logical vector the same length as f; o[i] = TRUE if f[i] is among the values of tbl when converted to factor256. _notin is the negation. The factor256_e variants will error if none of the values of tbl are present in f.
tabulate256: Takes a raw vector and counts the number of times each element occurs within it. The result always has length 256; an element that is absent has a count of zero.
tabulate256_levels: Similar to tabulate256 but with optional arguments nmax, dotInterval.
as_factor: Converts from factor256 to factor.
order256: Same as order but supports raw vectors.
rank256: Same as rank with ties.method = "first" but supports raw vectors.
unique256: Unique elements of.
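Two of the operations above, the fixed-length tally and the membership test, have simple base R analogues. The sketch below works on a plain raw vector rather than a factor256 object; the choice that bin 1 corresponds to byte 00 is an assumption for illustration, not a statement about how tabulate256 orders its output.

```r
r <- as.raw(c(1, 1, 2, 255, 0))

# Count each possible byte value; adding 1 moves codes 0..255 to bins 1..256,
# so bytes that never occur still get a zero entry.
counts <- tabulate(as.integer(r) + 1L, nbins = 256L)
length(counts)  # 256
counts[2]       # 2, the number of 01 bytes

# Membership as a logical vector the same length as r
is_in <- r %in% as.raw(c(1, 2))
is_in           # TRUE TRUE TRUE FALSE FALSE
```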
f10 <- factor256(1:10)
fletters <- factor256(rep(letters, 1:26))
head(factor256_in(fletters, "g"))
head(tabulate256(fletters))
head(recompose256(fletters))
gletters <- factor256(rep(letters, 1:26), levels = letters[1:25])
tail(tabulate256(gletters))
tabulate256_levels(gletters, nmax = 5L, dotInterval = 1L)
Some processes do not accept raw vectors so it can be necessary to convert our vectors to integers.
interlace256(w, x, y = NULL, z = NULL)
deinterlace256(u)
interlace256_columns(DT, new_colnames = 1L)
deinterlace256_columns(DT, new_colnames = 1L)
| Argument | Description |
|---|---|
| w, x, y, z | Raw vectors. A vector may be |
| u | An integer vector. |
| DT | A |
| new_colnames | A mechanism for producing the new columns. Currently only |
interlace256 returns an integer vector, compressing the raw vectors.
deinterlace256 is the inverse operation, returning a list of four raw vectors.
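One way to picture this packing is to treat four raw vectors as the four bytes of each 32-bit integer. The base R sketch below is an illustration under stated assumptions, not the package's implementation: interlace_sketch and deinterlace_sketch are hypothetical helpers, all four vectors are required, and the little-endian byte order is an arbitrary choice.

```r
# Hypothetical helpers, not the package's functions.
interlace_sketch <- function(w, x, y, z) {
  stopifnot(is.raw(w), is.raw(x), is.raw(y), is.raw(z),
            length(x) == length(w),
            length(y) == length(w),
            length(z) == length(w))
  # rbind() gives a 4 x n raw matrix; as.vector() reads it column by column,
  # so the bytes come out as w1, x1, y1, z1, w2, x2, ...
  bytes <- as.vector(rbind(w, x, y, z))
  # Reinterpret each group of four bytes as one 32-bit integer
  readBin(bytes, what = "integer", n = length(w), size = 4L, endian = "little")
}

deinterlace_sketch <- function(u) {
  stopifnot(is.integer(u))
  # Write each integer back out as four bytes, then split by byte position
  bytes <- writeBin(u, raw(), size = 4L, endian = "little")
  m <- matrix(bytes, nrow = 4L)
  list(m[1L, ], m[2L, ], m[3L, ], m[4L, ])
}

w <- as.raw(c(1, 255))
x <- as.raw(c(2, 0))
y <- as.raw(c(3, 7))
z <- as.raw(c(4, 9))
u <- interlace_sketch(w, x, y, z)
back <- deinterlace_sketch(u)
identical(back, list(w, x, y, z))  # TRUE
```

Four bytes per integer means the packed vector takes the same space as the four raw vectors combined; the benefit is that functions expecting integers can operate on it.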
setkey for raw columns
setkeyv256(DT, cols)
| Argument | Description |
|---|---|
| DT | A |
| cols | Column names as in |
Same as data.table::setkeyv except that raw cols will be
converted to factors (as data.table does not allow raw keys).