Calculates performance measures for a classifier. Assumes there are two classes and that the first of levels(true_cls) is the one to be predicted (the "positive").

assess_clsfyr(score, true_cls, measure = "ACC",
  threshold = seq(0, 1, by = 0.1))

Arguments

score

probabilities or scores for the target class 1 ("positive"); scores are assumed to be in \([0, 1]\) and high scores correspond to high probability.

true_cls

vector of indicators for the target class: TRUE or 1 if true class is the target class, FALSE or 0 else.

measure

a character vector of performance measures to be calculated, see Details.

threshold

numeric vector of thresholds for prediction; the measures are calculated for each value, see predict.jdify().

Value

A data.frame in long format with the columns threshold, measure, and value: one row for each combination of threshold and measure, as shown in Examples.

Details

Valid options for measure are

  • "TP": number of true positives,

  • "FP": number of false positives,

  • "TN": number of true negatives,

  • "FN": number of false negatives,

  • "TPR", "sensitivity", "recall": true positive rate (\(TP / P\)),

  • "FPR", "fall-out": false positive rate (\(FP / N\)),

  • "TNR", "specificity": true negative rate (\(TN / N\)),

  • "FNR": false negative rate (\(FN / P\)),

  • "PRC", "PPV": precision/positive predictive value (\(TP / (TP + FP)\)),

  • "NPV": negative predictive value (\(TN / (TN + FN)\)),

  • "FDR": false discovery rate (\(FP / (TP + FP)\)),

  • "ACC", "accuracy": accuracy (\((TP + TN) / (P + N)\)),

  • "F1": F1 score (\(2 * TP / (2 * TP + FP + FN)\)),

  • "MCC": Matthews correlation coefficient $$\frac{TP * TN - FP * FN}{[(TP + FP) * (TP + FN) * (TN + FP) * (TN + FN)]^{1/2}},$$

  • "informedness": informedness (\(TP / P + TN / N - 1\)),

  • "markedness": markedness (\(TP / (TP + FP) + TN / (TN + FN) - 1\)),

  • "AUC": area under the curve (must be in first position)

where P and N are the number of positives and negatives, respectively.
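
The definitions above can be illustrated with a minimal base R sketch that computes the confusion counts and a few of the listed measures at a single threshold. The helper confusion_measures() is hypothetical and not part of the package, and the decision rule score > threshold is an assumption (see predict.jdify() for the rule actually used).

```r
# Hypothetical helper (not part of the package): confusion counts and a few
# derived measures at a single threshold.
confusion_measures <- function(score, true_cls, threshold = 0.5) {
  true_cls <- as.logical(true_cls)
  pred <- score > threshold  # assumed decision rule

  # counts are converted to numeric to avoid integer overflow in the MCC
  TP <- as.numeric(sum(pred & true_cls))
  FP <- as.numeric(sum(pred & !true_cls))
  TN <- as.numeric(sum(!pred & !true_cls))
  FN <- as.numeric(sum(!pred & true_cls))
  P <- TP + FN  # number of positives
  N <- FP + TN  # number of negatives

  c(
    TP = TP, FP = FP, TN = TN, FN = FN,
    TPR = TP / P,
    FPR = FP / N,
    ACC = (TP + TN) / (P + N),
    F1  = 2 * TP / (2 * TP + FP + FN),
    MCC = (TP * TN - FP * FN) /
      sqrt((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))
  )
}

# toy data: three positives followed by three negatives
score    <- c(0.9, 0.8, 0.4, 0.7, 0.2, 0.1)
true_cls <- c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE)
res <- confusion_measures(score, true_cls, threshold = 0.5)
print(res)
```

Rates whose denominator is zero (for example, precision when nothing is predicted positive) evaluate to NaN in this sketch; how assess_clsfyr() treats such cases is not stated here.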

See also

get_auc()

Examples

# simulate training and test data
dat <- data.frame(
  cl = as.factor(rbinom(10, 1, 0.5)),
  x1 = rnorm(10),
  x2 = rbinom(10, 1, 0.3)
)

model <- jdify(cl ~ x1 + x2, data = dat)     # joint density fit
probs <- predict(model, dat, what = "probs") # conditional probabilities

# calculate performance measures
assess_clsfyr(probs[, 1], dat[, 1] == 0, measure = c("ACC", "F1"))
#>    threshold measure     value
#> 1        0.0     ACC 0.6000000
#> 2        0.1     ACC 0.6000000
#> 3        0.2     ACC 0.6000000
#> 4        0.3     ACC 0.6000000
#> 5        0.4     ACC 0.6000000
#> 6        0.5     ACC 0.8000000
#> 7        0.6     ACC 0.6000000
#> 8        0.7     ACC 0.5000000
#> 9        0.8     ACC 0.4000000
#> 10       0.9     ACC 0.4000000
#> 11       1.0     ACC 0.4000000
#> 12       0.0      F1 0.7500000
#> 13       0.1      F1 0.7500000
#> 14       0.2      F1 0.7500000
#> 15       0.3      F1 0.7500000
#> 16       0.4      F1 0.7500000
#> 17       0.5      F1 0.8571429
#> 18       0.6      F1 0.6000000
#> 19       0.7      F1 0.2857143
#> 20       0.8      F1 0.0000000
#> 21       0.9      F1 0.0000000
#> 22       1.0      F1 0.0000000

# calculate area under the curve
FPR <- assess_clsfyr(probs[, 1], dat[, 1] == 0, measure = c("FPR"))$value
TPR <- assess_clsfyr(probs[, 1], dat[, 1] == 0, measure = c("TPR"))$value
get_auc(data.frame(FPR = FPR, TPR = TPR))
#>      [,1]
#> [1,] 0.75
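
For intuition about how an area under the curve can be obtained from FPR/TPR pairs, the following base R sketch applies the trapezoidal rule. The function trapezoid_auc() is hypothetical, and get_auc() may use a different method, so the two are not guaranteed to return the same value.

```r
# Hypothetical helper (not part of the package): area under the ROC curve by
# the trapezoidal rule. This is an assumption about one common approach, not
# a description of how get_auc() works.
trapezoid_auc <- function(FPR, TPR) {
  # sort the points from left to right along the FPR axis
  ord <- order(FPR, TPR)
  # anchor the curve at (0, 0) and (1, 1)
  x <- c(0, FPR[ord], 1)
  y <- c(0, TPR[ord], 1)
  # sum of trapezoid areas: width times average height
  sum(diff(x) * (y[-length(y)] + y[-1]) / 2)
}

# three ROC points, e.g. from three thresholds
auc <- trapezoid_auc(FPR = c(1, 1 / 3, 0), TPR = c(1, 2 / 3, 0))
print(auc)
```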