Title: | Use Raw Vectors to Minimize Memory Consumption of Factors |
---|---|
Description: | Uses raw vectors to minimize memory consumption of categorical variables with fewer than 256 unique values. Useful for analysis of large datasets involving variables such as age, years, states, countries, or education levels. |
Authors: | Hugh Parsonage [aut, cre] |
Maintainer: | Hugh Parsonage <[email protected]> |
License: | GPL-2 |
Version: | 0.1.0 |
Built: | 2024-11-03 06:21:42 UTC |
Source: | https://github.com/hughparsonage/factor256 |
Aggregating helpers
count_by256(DT, by = NULL, count_col = "N")
DT |
A |
by |
(string) A column of |
count_col |
(string) The name of the column in the result containing the counts. |
For count_by256: a tally of by.
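To show the shape of such a tally, here is a minimal base R sketch that does not use the package. The data frame `df` and its `state` column are invented for illustration, and `table()` stands in for whatever `count_by256` does internally.

```r
# Base R only; `df` and its column are made-up illustration data.
df <- data.frame(state = c("NSW", "VIC", "NSW", "QLD", "VIC", "NSW"))

# table() counts each distinct value; responseName names the count column,
# playing the role that count_col = "N" plays in the signature above.
tally <- as.data.frame(table(state = df$state), responseName = "N")
print(tally)
```

The result has one row per distinct value and a count column named `N`.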
Whereas base R's factors are based on 32-bit integer vectors, factor256 uses 8-bit raw vectors to minimize its memory footprint.
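The size difference can be checked in base R alone. The sketch below compares an integer vector with a raw vector of the same length. It does not call the package, and the vector length and code range are arbitrary choices.

```r
# Base R only: compare storage of one million codes held as integers vs. raw bytes.
n <- 1e6L
codes_int <- rep_len(1:50, n)     # integer vector: 4 bytes per element
codes_raw <- as.raw(codes_int)    # raw vector: 1 byte per element

size_int <- as.numeric(object.size(codes_int))
size_raw <- as.numeric(object.size(codes_raw))

# Both sizes include a small fixed header, so the ratio is slightly under 4.
print(size_int / size_raw)
```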
factor256(x, levels = NULL)
recompose256(f)
relevel256(x, levels)
## S3 method for class 'factor256'
levels(x)
is.factor256(x)
isntSorted256(x, strictly = FALSE)
as_factor(x)
factor256_in(x, tbl)
factor256_notin(x, tbl)
factor256_ein(x, tbl)
factor256_enotin(x, tbl)
tabulate256(f)
rank256(x)
order256(x)
unique256(x)
tabulate256_levels(x, nmax = NULL, dotInterval = 65535L)
x |
An atomic vector with fewer than 256 unique elements. |
levels |
An optional character vector of or representing the unique values of |
f |
A raw vector of class |
strictly |
If |
tbl |
The table of values to lookup in |
nmax, dotInterval |
( |
factor256 is a class based on raw vectors.
Values in x absent from levels are mapped to 00.
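One way to picture this mapping is a `match()` against the levels with unmatched values set to zero. The helper `encode_raw` below is hypothetical and is not the package's implementation. It assumes that codes are the 1-based positions in `levels`, which the text above does not state.

```r
# Hypothetical helper for illustration; not part of factor256.
encode_raw <- function(x, levels) {
  stopifnot(length(levels) < 256L)
  # match() returns the position of each value in `levels`;
  # nomatch = 0L sends values that are not levels to byte 00.
  as.raw(match(x, levels, nomatch = 0L))
}

enc <- encode_raw(c("b", "a", "z"), levels = c("a", "b", "c"))
print(enc)  # "z" is not a level, so its code is 00
```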
In the following list, o is the result.
factor256: A raw vector of class factor256.
recompose256 is the inverse operation.
factor256_e?(not)?in: A logical vector the same length as f, where o[i] = TRUE if f[i] is among the values of tbl when converted to factor256. _notin is the negation. The factor256_e variants will error if none of the values of tbl are present in f.
tabulate256: Takes a raw vector and counts the number of times each element occurs within it. The result is always length-256; if an element is absent it will have value zero in the output.
tabulate256_levels: Similar to tabulate256 but with optional arguments nmax, dotInterval.
as_factor: Converts from factor256 to factor.
order256: Same as order but supports raw vectors.
rank256: Same as rank with ties.method = "first" but supports raw vectors.
unique256: Unique elements of.
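The fixed-length output described for tabulate256 can be mimicked in base R. The helper `tabulate_raw` below is a hypothetical stand-in written for illustration. It assumes one bin per possible byte value, with byte 00 in the first position.

```r
# Hypothetical base R stand-in for illustration; not the package's code.
tabulate_raw <- function(f) {
  stopifnot(is.raw(f))
  # Byte values run from 0 to 255; add one so that byte 00 lands in bin 1.
  tabulate(as.integer(f) + 1L, nbins = 256L)
}

counts <- tabulate_raw(as.raw(c(1, 1, 3)))
print(length(counts))  # one bin per possible byte value
print(counts[1:4])     # bins for bytes 00, 01, 02, 03
```

Bytes that never occur simply keep a count of zero, so the output length does not depend on the data.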
library(factor256)

f10 <- factor256(1:10)
fletters <- factor256(rep(letters, 1:26))
head(factor256_in(fletters, "g"))
head(tabulate256(fletters))
head(recompose256(fletters))

# "z" is not among the supplied levels
gletters <- factor256(rep(letters, 1:26), levels = letters[1:25])
tail(tabulate256(gletters))
tabulate256_levels(gletters, nmax = 5L, dotInterval = 1L)
Some processes do not accept raw vectors so it can be necessary to convert our vectors to integers.
interlace256(w, x, y = NULL, z = NULL)
deinterlace256(u)
interlace256_columns(DT, new_colnames = 1L)
deinterlace256_columns(DT, new_colnames = 1L)
w, x, y, z |
Raw vectors. A vector may be |
u |
An integer vector. |
DT |
A |
new_colnames |
A mechanism for producing the new columns. Currently only
|
interlace256: Returns an integer vector, compressing raw vectors.
deinterlace256 is the inverse operation, returning a list of four raw vectors.
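The idea of packing four bytes into one 32-bit integer can be sketched with base R's `readBin` and `writeBin`. The helpers `pack4` and `unpack4` are hypothetical. The byte order used here (little-endian, with `w` in the lowest byte) is an assumption, and the layout actually used by `interlace256` may differ.

```r
# Hypothetical helpers for illustration; not the package's implementation.
pack4 <- function(w, x, y, z) {
  n <- length(w)
  stopifnot(is.raw(w), is.raw(x), is.raw(y), is.raw(z),
            length(x) == n, length(y) == n, length(z) == n)
  bytes <- raw(4L * n)
  first <- seq.int(1L, by = 4L, length.out = n)  # slot of each w byte
  bytes[first]      <- w
  bytes[first + 1L] <- x
  bytes[first + 2L] <- y
  bytes[first + 3L] <- z
  # Reinterpret every group of four bytes as one 32-bit integer.
  readBin(bytes, what = "integer", n = n, size = 4L, endian = "little")
}

unpack4 <- function(u) {
  stopifnot(is.integer(u))
  bytes <- writeBin(u, raw(), size = 4L, endian = "little")
  dim(bytes) <- c(4L, length(u))  # one column per integer
  list(w = bytes[1L, ], x = bytes[2L, ], y = bytes[3L, ], z = bytes[4L, ])
}

w <- as.raw(c(1, 2)); x <- as.raw(c(3, 4))
y <- as.raw(c(5, 6)); z <- as.raw(c(7, 8))
u <- pack4(w, x, y, z)
back <- unpack4(u)
print(u)
print(back)
```

In this sketch, a group whose four bytes form the bit pattern of `NA_integer_` (00 00 00 80 here) is read back as `NA`.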
setkey for raw columns
setkeyv256(DT, cols)
DT |
A |
cols |
Column names as in |
Same as data.table::setkeyv except that raw cols will be converted to factors (as data.table does not allow raw keys).