Calculates performance measures for a classifier. Assumes there are two classes, and the first of level(true_cl) is to be predicted (the "positive").
assess_clsfyr(score, true_cls, measure = "ACC", threshold = seq(0, 1, by = 0.1))
Argument | Description |
---|---|
score | probabilities or scores for the target class 1 ("positive"); scores are assumed to be in \([0, 1]\) and high scores correspond to high probability. |
true_cls | vector of indicators for the target class: |
measure | a character vector of performance measures to be calculated, see Details. |
threshold | threshold for prediction, see |
A data.frame where each column corresponds to one value of threshold. The corresponding values can be found with attr(result, "threshold").
Valid options for measure are

"TP": number of true positives,
"FP": number of false positives,
"TN": number of true negatives,
"FN": number of false negatives,
"TPR", "sensitivity", "recall": true positive rate (\(TP / P\)),
"FPR", "fall-out": false positive rate (\(FP / N\)),
"TNR", "specificity": true negative rate (\(TN / N\)),
"FNR": false negative rate (\(FN / P\)),
"PRC", "PPV": precision/positive predictive value (\(TP / (TP + FP)\)),
"NPV": negative predictive value (\(TN / (TN + FN)\)),
"FDR": false discovery rate (\(FP / (TP + FP)\)),
"ACC", "accuracy": accuracy (\((TP + TN) / (P + N)\)),
"F1": F1 score (\(2 * TP / (2 * TP + FP + FN)\)),
"MCC": Matthews correlation coefficient
$$\frac{TP * TN - FP * FN}{[(TP + FP) * (TP + FN) * (TN + FP) * (TN + FN)]^{1/2}},$$
"informedness": informedness (\(TP / P + TN / N - 1\)),
"markedness": markedness (\(TP / (TP + FP) + TN / (TN + FN) - 1\)),
"AUC": area under the curve (must be in first position),

where P and N are the numbers of positives and negatives, respectively.
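The definitions above can be checked by hand on a small example. The following is a minimal base-R sketch, not the package's implementation: the helper `confusion_measures()` and the toy data are hypothetical, and it assumes an observation is predicted positive when its score is at least the threshold (the package may handle ties at the threshold differently).

```r
# Hypothetical helper (not part of the package): compute a few of the
# measures listed above for a single threshold.
# Assumption: an observation is predicted positive when score >= threshold.
confusion_measures <- function(score, true_cls, threshold = 0.5) {
  count <- function(x) as.numeric(sum(x))  # doubles avoid integer overflow
  pred <- score >= threshold               # predicted positives
  TP <- count(pred & true_cls)             # true positives
  FP <- count(pred & !true_cls)            # false positives
  TN <- count(!pred & !true_cls)           # true negatives
  FN <- count(!pred & true_cls)            # false negatives
  P <- TP + FN                             # number of positives
  N <- TN + FP                             # number of negatives
  c(
    TPR = TP / P,
    FPR = FP / N,
    PPV = TP / (TP + FP),
    ACC = (TP + TN) / (P + N),
    F1  = 2 * TP / (2 * TP + FP + FN),
    MCC = (TP * TN - FP * FN) /
      sqrt((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN)),
    informedness = TP / P + TN / N - 1
  )
}

# Toy data: six observations, three of them positive.
score    <- c(0.9, 0.8, 0.4, 0.6, 0.3, 0.1)
true_cls <- c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE)
res <- confusion_measures(score, true_cls, threshold = 0.5)
print(round(res, 3))
```

At threshold 0.5 the toy data give TP = 2, FP = 1, TN = 2 and FN = 1, so accuracy and F1 are both 2/3, while MCC and informedness are both 1/3.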
library(jdify)

# simulate training and test data
dat <- data.frame(
  cl = as.factor(rbinom(10, 1, 0.5)),
  x1 = rnorm(10),
  x2 = rbinom(10, 1, 0.3)
)

model <- jdify(cl ~ x1 + x2, data = dat)        # joint density fit
probs <- predict(model, dat, what = "probs")    # conditional probabilities

# calculate performance measures
assess_clsfyr(probs[, 1], dat[, 1] == 0, measure = c("ACC", "F1"))
#>    threshold measure     value
#> 1        0.0     ACC 0.6000000
#> 2        0.1     ACC 0.6000000
#> 3        0.2     ACC 0.6000000
#> 4        0.3     ACC 0.6000000
#> 5        0.4     ACC 0.6000000
#> 6        0.5     ACC 0.8000000
#> 7        0.6     ACC 0.6000000
#> 8        0.7     ACC 0.5000000
#> 9        0.8     ACC 0.4000000
#> 10       0.9     ACC 0.4000000
#> 11       1.0     ACC 0.4000000
#> 12       0.0      F1 0.7500000
#> 13       0.1      F1 0.7500000
#> 14       0.2      F1 0.7500000
#> 15       0.3      F1 0.7500000
#> 16       0.4      F1 0.7500000
#> 17       0.5      F1 0.8571429
#> 18       0.6      F1 0.6000000
#> 19       0.7      F1 0.2857143
#> 20       0.8      F1 0.0000000
#> 21       0.9      F1 0.0000000
#> 22       1.0      F1 0.0000000

# calculate area under the curve
FPR <- assess_clsfyr(probs[, 1], dat[, 1] == 0, measure = c("FPR"))$value
TPR <- assess_clsfyr(probs[, 1], dat[, 1] == 0, measure = c("TPR"))$value
get_auc(data.frame(FPR = FPR, TPR = TPR))
#>      [,1]
#> [1,] 0.75
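To make the last step of the example less opaque, here is a hedged base-R sketch of how an area under the ROC curve can be obtained from FPR/TPR pairs on a threshold grid. It uses the trapezoidal rule, which is an assumption about the general technique rather than a description of what `get_auc()` does internally. The helper `trapezoid_auc()` and the toy data are hypothetical, and predictions are again assumed positive when the score is at least the threshold.

```r
# Hypothetical helper (not the package's get_auc()): area under the ROC
# curve by the trapezoidal rule.
trapezoid_auc <- function(FPR, TPR) {
  ord <- order(FPR, TPR)              # sort ROC points from left to right
  x <- c(0, FPR[ord], 1)              # anchor the curve at (0, 0) and (1, 1)
  y <- c(0, TPR[ord], 1)
  sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)
}

# Toy data: scores are chosen off the threshold grid to avoid ties.
score    <- c(0.95, 0.85, 0.45, 0.65, 0.35, 0.15)
true_cls <- c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE)

# ROC points on the default threshold grid.
# Assumption: predict positive when score >= threshold.
threshold <- seq(0, 1, by = 0.1)
TPR <- sapply(threshold, function(t) mean(score[true_cls] >= t))
FPR <- sapply(threshold, function(t) mean(score[!true_cls] >= t))

auc <- trapezoid_auc(FPR, TPR)
print(auc)
```

For this toy data the result is 8/9: eight of the nine positive/negative pairs are ranked correctly, and the grid is fine enough to separate every score. On a coarser grid, or with many tied scores, the trapezoidal value only approximates the exact AUC.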