Data preparation before detection of strangers

crazyfy preprocesses data for the anomaly detection routines run with strange: missing value treatment, variable standardisation, optional log recoding, and treatment of character/factor variables.

crazyfy(
  data,
  do = c("factor", "log", "impute", "range"),
  id = NULL,
  skewness.cutpoint = 2,
  NA.method = "mean",
  NA.value = 0,
  verbose = FALSE
)

Arguments

data: Source data (data.frame or data.table).
do: character vector - List of processing steps to apply -- see details.
id: (optional) character - name of a preexisting variable to be used as ID.
skewness.cutpoint: numeric - skewness threshold above which log recoding is applied.
NA.method: character - method to be used for missing value imputation; one of "mean" or "value" (the latter uses the NA.value parameter below).
NA.value: numeric - value used to impute missing values when NA.method is "value".
verbose: logical - should the function display some details about processing?

Value

Pre-processed data of class data.table, with the additional class crazy.data.table.

Details

Possible pre-treatment operations:
* factor: factors/characters are transformed into numeric values using a term frequency–inverse document frequency (tf-idf) approach. Note that we use the smooth IDF weight, i.e. we take the log of 1+N/nt, where N is the number of observations and nt the frequency of the specific term t.
* log: compute log(x-min(x)). Done for all numeric variables whose distribution has a skewness greater than skewness.cutpoint.
* impute: impute missing values. The method, chosen with NA.method, is either the variable average or a specific value provided by NA.value.
* range: standardize variable: (x-min(x))/max(x).
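
The operations listed above can be sketched in base R as follows. This is only an illustration of the formulas as written on this page, not the package's actual code: every helper name (encode_factor, moment_skewness, log_recode, impute_na, rescale_range) is hypothetical, the skewness estimator is assumed to be the simple moment-based one, and the factor step shows only the smooth IDF weight since the page does not spell out how the term-frequency part enters.

```r
# Hypothetical helpers illustrating the documented formulas;
# none of these names exist in the stranger package.

# factor: map each level to the smooth IDF weight log(1 + N / n_t)
encode_factor <- function(x) {
  x <- as.character(x)
  counts <- table(x)            # n_t: frequency of each term
  n <- length(x)                # N: number of observations
  as.numeric(log(1 + n / counts[x]))
}

# Assumed skewness estimator: third central moment / (second moment)^(3/2)
moment_skewness <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^(3 / 2)
}

# log: apply log(x - min(x)) only when skewness exceeds the cutpoint.
# Taken literally, the minimum itself maps to -Inf.
log_recode <- function(x, cutpoint = 2) {
  if (moment_skewness(x) > cutpoint) log(x - min(x)) else x
}

# impute: fill NAs with the variable mean or with a fixed value
impute_na <- function(x, method = c("mean", "value"), value = 0) {
  method <- match.arg(method)
  fill <- if (method == "mean") mean(x, na.rm = TRUE) else value
  x[is.na(x)] <- fill
  x
}

# range: (x - min(x)) / max(x), exactly as written in the list above
rescale_range <- function(x) (x - min(x)) / max(x)

# Each step shown on its own small input
encode_factor(c("a", "a", "b", "c"))
impute_na(c(1, NA, 3))
impute_na(c(1, NA, 3), method = "value", value = 0)
rescale_range(c(2, 4, 6))
log_recode(c(rep(1, 9), 100))
```

The steps are shown independently rather than chained, because the literal log formula yields -Inf at the minimum; this page does not say whether the package applies an offset there.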

Examples

library(stranger)
data(iris)
crazy <- crazyfy(iris[,1:4])