Measures of classification performance

Calculates performance measures for a classifier. Assumes there are two classes, and that the first of levels(true_cls) is the one to be predicted (the "positive").

assess_clsfyr(score, true_cls, measure = "ACC",
  threshold = seq(0, 1, by = 0.1))

Arguments

score	probabilities or scores for the target class 1 ("positive"); scores are assumed to be in $[0, 1]$ and high scores correspond to high probability.
true_cls	vector of indicators for the target class: `TRUE` or `1` if true class is the target class, `FALSE` or `0` else.
measure	a character vector of performance measures to be calculated, see Details.
threshold	threshold for prediction, see `predict.jdify()`.
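
As an illustration of the expected inputs, the following base R sketch builds a `true_cls` indicator from a factor of labels. The labels and scores are made up for this example, and treating the first factor level as the target class is an assumption taken from the description above.

```r
# Hypothetical labels and scores, made up for illustration only
labels <- factor(c("a", "b", "a", "a", "b"))
score  <- c(0.9, 0.2, 0.6, 0.4, 0.7)   # assumed P(class == "a"), in [0, 1]

# Indicator for the target class: TRUE where the label is the first level
target   <- levels(labels)[1]
true_cls <- labels == target

# Basic sanity checks on the inputs
stopifnot(all(score >= 0 & score <= 1), length(score) == length(true_cls))
true_cls
#> [1]  TRUE FALSE  TRUE  TRUE FALSE
```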

Value

A data.frame with one row per combination of threshold and measure, with columns threshold, measure, and value (see the output in Examples).

Details

Valid options for measure are

"TP": number of true positives,
"FP": number of false positives,
"TN": number of true negatives,
"FN": number of false negatives,
"TPR", "sensitivity", "recall": true positive rate ($TP / P$),
"FPR", "fall-out": false positive rate ($FP / N$),
"TNR", "specificity": true negative rate ($TN / N$),
"FNR": false negative rate ($FN / P$),
"PRC", "PPV": precision/positive predictive value ($TP / (TP + FP)$),
"NPV": negative predictive value ($TN / (TN + FN)$),
"FDR": false discovery rate ($FP / (TP + FP)$),
"ACC", "accuracy": accuracy ($(TP + TN) / (P + N)$),
"F1": F1 score ($2 * TP / (2 * TP + FP + FN)$),
"MCC": Matthews correlation coefficient $$\frac{TP * TN - FP * FN}{[(TP + FP) * (TP + FN) * (TN + FP) * (TN + FN)]^{1/2}},$$
"informedness": informedness ($TP / P + TN / N - 1$),
"markedness": markedness ($TP / (TP + FP) + TN / (TN + FN) - 1$),
"AUC": area under the curve (must be in first position)

where P and N are the number of positives and negatives, respectively.
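
To make the definitions above concrete, here is a minimal base R sketch that computes a few of the measures at a single threshold. It is not the package's implementation: the helper name `confusion_measures` is hypothetical, the data are made up, and the rule "predict positive when score > threshold" is an assumption (the package may treat ties differently).

```r
# Illustrative sketch only; not the implementation used by assess_clsfyr().
# Assumption: an observation is predicted positive when score > threshold.
confusion_measures <- function(score, true_cls, threshold = 0.5) {
  true_cls <- as.logical(true_cls)
  pred <- score > threshold

  # Confusion counts (as doubles, to avoid integer overflow in the products)
  TP <- as.numeric(sum(pred & true_cls))
  FP <- as.numeric(sum(pred & !true_cls))
  TN <- as.numeric(sum(!pred & !true_cls))
  FN <- as.numeric(sum(!pred & true_cls))
  P <- TP + FN   # number of positives
  N <- TN + FP   # number of negatives

  c(
    TPR = TP / P,
    FPR = FP / N,
    ACC = (TP + TN) / (P + N),
    F1  = 2 * TP / (2 * TP + FP + FN),
    MCC = (TP * TN - FP * FN) /
      sqrt((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))
  )
}

# Made-up scores and labels for illustration
score    <- c(0.9, 0.8, 0.7, 0.4, 0.3, 0.2)
true_cls <- c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE)

# Here TP = 2, FP = 1, TN = 2, FN = 1, so ACC = 4/6 and MCC = 3/9
m <- confusion_measures(score, true_cls, threshold = 0.5)
```

Note that the MCC denominator is a square root, so it is undefined (0/0) whenever one of the four marginal sums is zero, for example when no observation is predicted positive.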

Examples

# simulate training and test data
dat <- data.frame(
    cl = as.factor(rbinom(10, 1, 0.5)),
    x1 = rnorm(10),
    x2 = rbinom(10, 1, 0.3)
)

model <- jdify(cl ~ x1 + x2, data = dat)      # joint density fit
probs <- predict(model, dat, what = "probs")  # conditional probabilities

# calculate performance measures
assess_clsfyr(probs[, 1], dat[, 1] == 0, measure = c("ACC", "F1"))
#>    threshold measure     value
#> 1        0.0     ACC 0.6000000
#> 2        0.1     ACC 0.6000000
#> 3        0.2     ACC 0.6000000
#> 4        0.3     ACC 0.6000000
#> 5        0.4     ACC 0.6000000
#> 6        0.5     ACC 0.8000000
#> 7        0.6     ACC 0.6000000
#> 8        0.7     ACC 0.5000000
#> 9        0.8     ACC 0.4000000
#> 10       0.9     ACC 0.4000000
#> 11       1.0     ACC 0.4000000
#> 12       0.0      F1 0.7500000
#> 13       0.1      F1 0.7500000
#> 14       0.2      F1 0.7500000
#> 15       0.3      F1 0.7500000
#> 16       0.4      F1 0.7500000
#> 17       0.5      F1 0.8571429
#> 18       0.6      F1 0.6000000
#> 19       0.7      F1 0.2857143
#> 20       0.8      F1 0.0000000
#> 21       0.9      F1 0.0000000
#> 22       1.0      F1 0.0000000

# calculate area under the curve
FPR <- assess_clsfyr(probs[, 1], dat[, 1] == 0, measure = c("FPR"))$value
TPR <- assess_clsfyr(probs[, 1], dat[, 1] == 0, measure = c("TPR"))$value
get_auc(data.frame(FPR = FPR, TPR = TPR))
#>      [,1]
#> [1,] 0.75
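
The last example passes pairs of FPR and TPR values to get_auc(). How get_auc() integrates the curve is not shown here; as a rough sketch of the idea, the area under a piecewise-linear ROC curve can be computed with the trapezoidal rule. The helper `trapezoid_auc` and the ROC points below are hypothetical and serve only as illustration.

```r
# Generic trapezoidal rule; an assumption, not necessarily what get_auc() does.
trapezoid_auc <- function(fpr, tpr) {
  ord <- order(fpr, tpr)   # sort points from left to right
  x <- fpr[ord]
  y <- tpr[ord]
  n <- length(x)
  # Sum of trapezoid areas: width times the mean of the two adjacent heights
  sum(diff(x) * (y[-n] + y[-1]) / 2)
}

# Made-up ROC points, including the corners (0, 0) and (1, 1)
fpr <- c(0, 0, 0.5, 1)
tpr <- c(0, 0.5, 1, 1)
auc <- trapezoid_auc(fpr, tpr)
auc
#> [1] 0.875
```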
