Visualizations of the anomaly score in the neighbourhood of a specific data point

# S3 method for stranger
plot(x, type = "cluster", id = ".id", score = NULL, anomaly_id = NULL, ...)

# S3 method for fortifiedanomaly
plot(
  x,
  type = "feature_importance",
  id = ".id",
  anomaly_id = NULL,
  score = NULL,
  ...
)

# S3 method for anomalies
plot(x, type = "feature_importance", id = ".id", anomaly_id = NULL, ...)

# S3 method for singular
plot(x, type = "cluster", id = ".id", score = NULL, anomaly_id = NULL, ...)

Arguments

type: name of the visualization. (1) A hierarchical clustering, named "cluster", showing which of the top n anomalies belong to the same cluster as a specific record. Finding the common pattern within the cluster may point to the origin of that record's score. (2) A dot plot, named "neighbours", showing the relationship between the anomaly score and each feature for the k nearest neighbours of a specific record. (3) A bar chart, named "feature_importance", showing how sensitive the anomaly score of a specific record is to each feature. This may help to identify the features behind the score. (4) A dot plot, named "score_decline", showing the decrease in anomaly score among the k nearest neighbours of a specific record. The shape indicates how extreme and how frequent the anomaly score of the record is among its neighbours. (5) A regression tree, named "regression_tree", showing the decision paths that lead to high scores around a specific record.
id: name of the column containing the record IDs.
score: name of the column containing the anomaly score.
anomaly_id: ID of the record to investigate.
data: an object of class dataframe, stranger or anomaly. It contains the observations: each row represents an observation and each column holds one variable. It must have at least one column with IDs and one column with the anomaly score for each ID.
check: logical indicating whether data should be checked for validity. The default is TRUE; the check is unnecessary when data is known to be valid, for example when it is the direct result of stranger().
keep: character vector of column names to keep (filter).
drop: character vector of column names to drop (filter).
n.cluster: number of cluster groups to emphasize. Specify this parameter only with type = "cluster".
n.anom: number of top anomalies to consider. Specify this parameter only with type = "cluster".
k: number of neighbours to consider. This parameter must always be specified, except with type = "cluster".
n_label: number of data points to label in the plot. Specify this parameter only with type = "scores_decline".
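
To make the roles of k and n_label more concrete, here is a minimal base R sketch of a score-decline view. It is not the package's implementation: the score (mean distance to the 10 nearest neighbours), the use of row numbers as IDs, and the variable names are all assumptions made for illustration.

```r
# Sketch only: a base R stand-in, not the package's code.
features <- scale(iris[, 1:4])          # standardised numeric features
d <- as.matrix(dist(features))          # pairwise Euclidean distances

# Assumed stand-in score: mean distance to the 10 nearest neighbours
# (position 1 of each sorted row is the zero self-distance).
score <- apply(d, 1, function(row) mean(sort(row)[2:11]))

anomaly_id <- 42   # record to investigate (row number used as ID)
k <- 50            # number of neighbours to consider
n_label <- 10      # number of points to label

# The record (distance 0 to itself) together with its k nearest neighbours.
neighbours <- order(d[anomaly_id, ])[seq_len(k + 1)]

# Rank that neighbourhood by decreasing anomaly score.
decline <- neighbours[order(score[neighbours], decreasing = TRUE)]

plot(score[decline], type = "b",
     xlab = "Rank within neighbourhood", ylab = "Anomaly score")

# Label only the n_label highest-scoring points with their IDs.
top_rank <- seq_len(n_label)
text(top_rank, score[decline][top_rank], labels = decline[top_rank], pos = 4)
```

A larger k widens the neighbourhood whose scores are ranked, while n_label only controls how many of the highest-ranked points carry a label.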

Value

A plot

Details

This function produces visualizations that explain the anomaly score locally, around a specific data point. The aim is to help people trust the scores produced by models even when they do not fully understand those models. Five visualizations are currently implemented:

(1) A hierarchical clustering, named "cluster", showing which of the top n anomalies belong to the same cluster as a specific record. Finding the common pattern within the cluster may point to the origin of that record's score.
(2) A dot plot, named "neighbours", showing the relationship between the anomaly score and each feature for the k nearest neighbours of a specific record.
(3) A bar chart, named "feature_importance", showing how sensitive the anomaly score of a specific record is to each feature. This may help to identify the features behind the score.
(4) A dot plot, named "score_decline", showing the decrease in anomaly score among the k nearest neighbours of a specific record. The shape indicates how extreme and how frequent the anomaly score of the record is among its neighbours.
(5) A regression tree, named "regression_tree", showing the decision paths that lead to high scores around a specific record.
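
The idea behind the "cluster" view can be sketched in base R as follows. This is an illustration under stated assumptions rather than the package's code: the anomaly score is assumed to be the mean distance to the k nearest neighbours, row numbers stand in for record IDs, and n_anom and n_cluster are illustrative names mirroring n.anom and n.cluster.

```r
# Sketch only: a base R stand-in, not the package's code.
features <- scale(iris[, 1:4])          # standardised numeric features
d <- as.matrix(dist(features))          # pairwise Euclidean distances

k <- 10
# Assumed score: sort each row of distances, skip the zero
# self-distance in position 1, and average the next k values.
knn_score <- apply(d, 1, function(row) mean(sort(row)[2:(k + 1)]))

scored <- data.frame(.id = seq_len(nrow(iris)), score = knn_score)

n_anom <- 50      # number of top anomalies to keep
n_cluster <- 4    # number of cluster groups to emphasize
top <- scored[order(-scored$score), ][seq_len(n_anom), ]

# Hierarchical clustering restricted to the top anomalies.
tree <- hclust(dist(features[top$.id, ]), method = "ward.D2")
top$cluster <- cutree(tree, k = n_cluster)

# Record to investigate: here simply the highest-scoring one.
anomaly_id <- top$.id[1]
same_cluster <- top$.id[top$cluster == top$cluster[top$.id == anomaly_id]]

# Dendrogram of the top anomalies with the n_cluster groups outlined.
plot(tree, main = "Top anomalies", xlab = "", sub = "")
rect.hclust(tree, k = n_cluster)
```

Here same_cluster holds the IDs that share a cluster with the investigated record; comparing their feature values is one way to look for the common pattern mentioned above.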

Examples

# \dontrun{
library(dplyr)
data(iris)
# Prepare the numeric features and compute anomaly scores
data <- iris %>% select(-Species) %>% crazyfy()
anom1 <- data %>% strange()
result <- fortify(anom1)
# One call per visualization type, each investigating the record with ID 10
plot(result, type = "cluster", id = ".id", score = "knn_k_10_mean", anomaly_id = 10, n.cluster = 4, n.anom = 50)
plot(result, type = "neighbours", id = ".id", score = "knn_k_10_mean", anomaly_id = 10, k = 200)
plot(result, type = "feature_importance", id = ".id", score = "knn_k_10_mean", anomaly_id = 10, k = 100)
plot(result, type = "scores_decline", id = ".id", score = "knn_k_10_mean", anomaly_id = 10, k = 50, n_label = 10)
plot(result, type = "regression_tree", id = ".id", score = "knn_k_10_mean", anomaly_id = 10, k = 1000)
# }