
Metrics to measure machine learning model performance

How to read: It depends on what you want to measure. For accuracy, a value closer to 1 (or 100%) is better.

Gains charts

This metric measures how a model performs in comparison to a situation where no model is used. It considers three different parts: a model baseline (the results you obtain without a predictive model), a model result (the results obtained after you apply your model), and the perfect CAP (a hypothetical situation where you obtain perfect results).

To be considered useful, your model has to perform better than the baseline. We measure this by checking the distance between the baseline and the model curve: the bigger the area between the lift curve and the baseline, the better the model.

How to read: A greater distance from the baseline is better than a smaller one.
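As an illustration, here is a minimal NumPy sketch of a cumulative gains calculation; the variable names (y_true, y_score) and the choice of ten deciles are assumptions made for this example, not part of any particular library.

```python
import numpy as np

def cumulative_gains(y_true, y_score, n_bins=10):
    """Fraction of all positives captured in the top k bins when
    observations are ranked by predicted score (descending)."""
    order = np.argsort(y_score)[::-1]            # rank by predicted probability
    y_sorted = np.asarray(y_true)[order]
    total_pos = y_sorted.sum()
    gains = []
    for k in range(1, n_bins + 1):
        top = y_sorted[: int(len(y_sorted) * k / n_bins)]
        gains.append(top.sum() / total_pos)      # share of positives captured so far
    return gains

# The baseline (no model) captures k/n_bins of the positives in the top k bins;
# a useful model's gains curve should sit above that line.
y_true  = np.array([1, 0, 1, 1, 0, 0, 1, 0, 0, 0])
y_score = np.array([0.9, 0.2, 0.8, 0.7, 0.3, 0.1, 0.6, 0.4, 0.2, 0.05])
print(cumulative_gains(y_true, y_score))
```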

Kolmogorov Smirnov chart

The K-S measures how well a model separates different classes. Supposing we have two classes, one positive and one negative, the K-S is perfect if the model can differentiate all the positive samples from all the negative samples.

The K-S metric measures the distance between the plotted cumulative distributions of the two classes; the statistic is the biggest vertical distance between them.

How to read: In most cases, K-S will be a number between 0 and 100, with 100 being the best possible value.
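A small sketch of how the K-S statistic could be computed from predicted probabilities using scipy.stats.ks_2samp; the score arrays below are made up for illustration.

```python
import numpy as np
from scipy.stats import ks_2samp

# Predicted probabilities split by actual class (illustrative values).
pos_scores = np.array([0.9, 0.8, 0.7, 0.6, 0.55])
neg_scores = np.array([0.4, 0.35, 0.3, 0.2, 0.1])

# ks_2samp returns the maximum vertical distance between the two empirical
# cumulative distributions (0 to 1; multiply by 100 for the 0-100 scale above).
ks_stat, p_value = ks_2samp(pos_scores, neg_scores)
print(f"K-S statistic: {ks_stat:.2f}")
```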

AUC-ROC and Gini Coefficient

To find the AUC-ROC, we plot the Sensitivity (true positive rate) against 1 - Specificity (false positive rate). Then, we calculate the area under the curve (AUC), i.e. the ratio between the area under the ROC curve and the total possible area. We obtain a number between 0 and 1, where 1 is our best possible value (watch out for overfitting here).

The Gini coefficient is derived from the AUC-ROC as follows:

Gini = 2*AUC - 1

How to read: To be considered a good model, we should have a Gini above 60%.
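For example, with scikit-learn the AUC can be computed directly and the Gini derived from it; the labels and scores below are hypothetical.

```python
from sklearn.metrics import roc_auc_score

# Hypothetical labels and predicted probabilities for illustration.
y_true  = [0, 0, 1, 1, 0, 1, 1, 0]
y_score = [0.1, 0.35, 0.8, 0.65, 0.2, 0.9, 0.55, 0.4]

auc = roc_auc_score(y_true, y_score)
gini = 2 * auc - 1          # Gini coefficient derived from the AUC
print(f"AUC: {auc:.3f}, Gini: {gini:.3f}")
```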

Logarithmic loss

To use log loss, your model has to be able to output a probability between 0 and 1. The metric then compares the probability output by the model with the actual label. So, the log loss takes into account the uncertainty of a prediction based on how much it deviates from the actual label.

A good log loss value is one close to 0, and the best possible model has a log loss of 0. The log loss increases as the prediction diverges from the actual observation. So, a prediction of 0.42 for a label of 1 is better than a prediction of 0.12 for the same label.

How to read: Values closer to 0 are better than higher values.
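A minimal scikit-learn example, assuming binary labels and predicted probabilities of the positive class (the values below are illustrative):

```python
from sklearn.metrics import log_loss

# Hypothetical labels and predicted probabilities of the positive class.
y_true = [1, 0, 1, 1, 0]
y_prob = [0.9, 0.1, 0.42, 0.12, 0.3]

print(f"Log loss: {log_loss(y_true, y_prob):.3f}")

# A single prediction's contribution is -log(p) for a true label of 1,
# so predicting 0.42 costs less than predicting 0.12 for the same positive label.
```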

Hamming Loss

The Hamming loss is a metric used to evaluate the fraction of wrong labels relative to the total number of labels. Since this is also a loss function, the best possible value is 0. For standard single-label classification it is equivalent to 1 - Accuracy, and it can also be used for multi-class and multi-label problems.

How to read: Values closer to 0 are better than higher values.
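A short scikit-learn sketch with made-up multi-class labels, showing that in this setting the Hamming loss matches 1 - Accuracy:

```python
from sklearn.metrics import hamming_loss, accuracy_score

# Hypothetical multi-class labels and predictions for illustration.
y_true = [0, 1, 2, 2, 1, 0]
y_pred = [0, 2, 2, 2, 1, 1]

print(hamming_loss(y_true, y_pred))          # fraction of wrong labels
print(1 - accuracy_score(y_true, y_pred))    # same value in this setting
```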

Concordance and Discordance

The Concordance and Discordance metrics compare the probabilities predicted by the model with the observed outcomes. All possible pairs consisting of one observation with target 1 and one observation with target 0 are formed from the data set, and each pair is then compared.

We say a pair is concordant when the predicted probability for the observation with target 1 is higher than that for the observation with target 0. A pair is discordant when the predicted probability for the observation with target 0 is higher than that for the observation with target 1. We can also have ties, which happen when the two predicted probabilities are equal.

To find the concordance, we divide the number of concordant pairs by the total number of pairs. To find the discordance, we divide the number of discordant pairs by the total number of pairs.

How to read: Best possible concordance is 100% (or 1). So, higher values are better than lower values.
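A minimal sketch of the pairwise computation described above; the function name and the example arrays are assumptions for illustration.

```python
import numpy as np
from itertools import product

def concordance_discordance(y_true, y_prob):
    """Compare every (target=1, target=0) pair of observations."""
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob)
    pos = y_prob[y_true == 1]
    neg = y_prob[y_true == 0]
    concordant = discordant = ties = 0
    for p, n in product(pos, neg):
        if p > n:
            concordant += 1      # positive scored higher than negative
        elif p < n:
            discordant += 1      # negative scored higher than positive
        else:
            ties += 1            # equal predicted probabilities
    total = concordant + discordant + ties
    return concordant / total, discordant / total, ties / total

y_true = [1, 1, 0, 0, 1, 0]
y_prob = [0.8, 0.6, 0.3, 0.6, 0.9, 0.1]
print(concordance_discordance(y_true, y_prob))
```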

Matthews correlation coefficient

Used in binary classification problems, the MCC takes into account true and false positives and negatives. It is a measure of quality that can be used even when the classes are imbalanced.

The MCC returns a value between -1 and +1, where +1 is the best possible performance for a model. A score of 0 means the model is no better than random prediction, and a negative score indicates poor performance.

How to read: Values closer to +1 are better than values closer to -1.
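A small scikit-learn example with a hypothetical, imbalanced label set:

```python
from sklearn.metrics import matthews_corrcoef

# Hypothetical binary labels and predictions; note the class imbalance.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 1, 0]

print(f"MCC: {matthews_corrcoef(y_true, y_pred):.3f}")
```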