energy-I-Score for imputation of mixed data (categorical and numerical)
Usage
energy_Iscore_cat(
X,
imputation_func,
X_imp = imputation_func(X),
multiple = TRUE,
N = 50,
max_length = NULL,
skip_if_needed = TRUE,
scale = FALSE,
n_cores = 1,
silent = TRUE
)
Arguments
- X
data containing missing values, denoted by NA.
- imputation_func
a function that imputes data.
- X_imp
imputed dataset of the same size as
X. By default, it is obtained by imputing X with imputation_func.
- multiple
a logical indicating whether the provided imputation method is a multiple imputation approach (i.e. it generates different imputed values on each call). Defaults to TRUE. Note that if multiple is FALSE, N is automatically set to 1.
- N
a numeric value. Number of samples drawn from the imputation distribution H. Defaults to 50.
- max_length
Maximum number of variables \(X_j\) to consider; lowering it can speed up the code. Defaults to
NULL, meaning that all columns will be taken into consideration.
- skip_if_needed
logical, indicating whether some observations should be skipped to obtain complete columns for scoring. If FALSE, NA will be returned for columns with no observed variable for training.
- scale
a logical value. If TRUE, each variable is scaled in the score.
- n_cores
number of cores to use for parallelization.
- silent
logical indicating whether warnings and messages should be printed.
Value
a numeric value, the weighted Imputation Score obtained for the provided imputation function, together with a table of the scores and weights calculated for the individual columns (printed as the "dat" attribute in the example output).
Details
Categorical variables should be stored as factors. If the imputation requires
additional conversion of the data (for example, one-hot encoding), please
implement it inside the function passed as imputation_func. You can use the
miceDRF:::onehot_to_factor and miceDRF:::factor_to_onehot
functions for this.
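The requirement above can be illustrated with a minimal base-R sketch of an imputation function that accepts a data frame with factor columns and returns a completed data frame of the same shape. The name sample_observed_imputation is hypothetical and not part of the package. It fills each missing entry with a random draw from the observed values of the same column, so repeated calls give different imputations, which is the behaviour that multiple = TRUE assumes.

```r
# Hypothetical helper (not part of the package): impute every column by
# drawing at random, with replacement, from that column's observed values.
sample_observed_imputation <- function(X) {
  X_imp <- X
  for (j in seq_along(X_imp)) {
    miss <- is.na(X_imp[[j]])
    # Nothing to impute, or nothing to draw from: leave the column as is.
    if (!any(miss) || all(miss)) next
    observed <- X_imp[[j]][!miss]
    # Index with sample.int() so a single observed value is handled safely.
    draws <- observed[sample.int(length(observed), sum(miss), replace = TRUE)]
    X_imp[[j]][miss] <- draws  # factor columns keep their levels
  }
  X_imp
}

# Toy mixed data: one numeric column and one factor column.
set.seed(1)
X <- data.frame(
  num = c(1.5, NA, 3.2, 4.8, NA),
  cat = factor(c("a", "b", NA, "a", "b"))
)
X_imp <- sample_observed_imputation(X)

# Following the Usage section, such a function could then be passed as:
# energy_Iscore_cat(X = X, imputation_func = sample_observed_imputation)
```

Because the function works on factors directly, no one-hot conversion is needed in this case. A method that expects numeric input only would do the conversion and back-conversion inside the function body instead.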
References
This method is described in detail in:
Näf, J., Grzesiak, K., and Scornet, E. (2025). How to rank imputation methods? arXiv preprint. doi:10.48550/arXiv.2507.11297.
Examples
set.seed(123)
X <- Iscores:::random_mcar_mixed_data(n = 100, p = 3, n_fac = 1, ratio = 0.2)
imp_fun <- Iscores:::median_mode_imputation
sc <- Iscores:::energy_Iscore_cat(X = X, imputation_func = imp_fun, N = 5)
sc
#> [1] 0.7699801
#> attr(,"dat")
#>      column_id weight     score n_columns_used
#> col1         1 0.1824 0.7096268              1
#> col2         2 0.1659 0.7301908              1
#> col4         4 0.1600 0.9355567              1
#> col3         3 0.1539 0.7122622              1
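The printed values are consistent with the overall score being the weight-normalized mean of the per-column scores. The sketch below checks this using the numbers copied from the output above. It is an observation about this particular output, not a statement taken from the package source, and the printed weights are rounded, so the agreement is only approximate.

```r
# Per-column values copied from the printed "dat" table above.
# With the package loaded, the same table could be read via attr(sc, "dat").
dat <- data.frame(
  weight = c(0.1824, 0.1659, 0.1600, 0.1539),
  score  = c(0.7096268, 0.7301908, 0.9355567, 0.7122622),
  row.names = c("col1", "col2", "col4", "col3")
)

# Weighted mean of the column scores, normalized by the sum of the weights.
overall <- weighted.mean(dat$score, dat$weight)
round(overall, 5)  # approximately 0.76998, matching the printed score
```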