In this article, we review some common metrics and their uses for two main ML problems, i.e. regression and classification.
Most blogs focus on classification metrics such as precision, recall and AUC. For a change, I wanted to explore all kinds of metrics, including those used in regression. MAE and RMSE are the two most popular metrics for continuous variables. Let’s start with the more popular one.
RMSE (Root Mean Square Error)
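It represents the sample standard deviation of the differences between predicted values and observed values (called residuals). Mathematically, it is calculated using this formula:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{j=1}^{n}\left(y_j - \hat{y}_j\right)^2}$$
where $y_j$ is the observed value, $\hat{y}_j$ is the predicted value and $n$ is the number of observations.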
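MAE is the average of the absolute differences between the predicted values and the observed values. The MAE is a linear score, which means that all the individual differences are weighted equally in the average. Mathematically, it is calculated using this formula:
$$\mathrm{MAE} = \frac{1}{n}\sum_{j=1}^{n}\left|y_j - \hat{y}_j\right|$$
where $y_j$ is the observed value, $\hat{y}_j$ is the predicted value and $n$ is the number of observations.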
So which one should you choose and why?
RMSE is always greater than or equal to MAE. The two are equal only when all the absolute differences have the same magnitude, which includes the case where they are all zero (true for case 1 below, where the difference between actual and predicted is 2 for all observations).
Let’s understand the above statement with two examples:
Case 1: Actual Values = [2,4,6,8] , Predicted Values = [4,6,8,10]
Case 2: Actual Values = [2,4,6,8] , Predicted Values = [4,6,8,12]
MAE for case 1 = 2.0, RMSE for case 1 = 2.0
MAE for case 2 = 2.5, RMSE for case 2 = 2.65
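The numbers for the two cases can be reproduced with a few lines of plain Python. This is only a minimal sketch: the helper names `mae` and `rmse` are chosen here for illustration and are not taken from any library.

```python
import math

def mae(actual, predicted):
    """Mean absolute error: the average of |actual - predicted|."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean square error: square root of the average squared difference."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(actual))

actual = [2, 4, 6, 8]
case_1 = [4, 6, 8, 10]  # every prediction is off by 2
case_2 = [4, 6, 8, 12]  # the last prediction is off by 4

print(mae(actual, case_1), round(rmse(actual, case_1), 2))  # 2.0 2.0
print(mae(actual, case_2), round(rmse(actual, case_2), 2))  # 2.5 2.65
```

In case 2 a single larger error moves RMSE further than MAE, because each difference is squared before averaging.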
RMSE is the default metric of many models because a loss function defined in terms of squared error is smoothly differentiable, which makes it easier to optimize mathematically. MAE is more robust to outliers, whereas RMSE squares each difference and is therefore affected much more by a few large errors.
Metrics for classification
Accuracy in classification problems is the number of correct predictions made by the model divided by the total number of predictions made.
>>> from sklearn.metrics import accuracy_score
>>> y_pred = [0, 2, 1, 3]
>>> y_true = [0, 1, 2, 3]
>>> accuracy_score(y_true, y_pred)
0.5
>>> accuracy_score(y_true, y_pred, normalize=False)
2
When to use Accuracy:
Accuracy is a good measure when the target variable classes in the data are nearly balanced.
Ex: 55% of the images in our fruit dataset are pineapples and 45% are plums. If a model predicts whether a new image is a pineapple or a plum correctly 97% of the time, then accuracy is a very good measure in this example.
When NOT to use Accuracy:
Accuracy should NEVER be used as a measure when one class makes up the large majority of the target variable.
For example, in a sample of 100 people, only 5 have cancer. Suppose a model classifies the 95 non-cancer patients correctly but classifies the 5 cancer patients as non-cancerous. The accuracy of such a bad model is still 95%.
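The cancer example above can be checked with a short sketch. The label encoding (1 for cancer, 0 for no cancer) and the variable names are assumptions made for illustration.

```python
# 100 people: 5 have cancer (label 1) and 95 do not (label 0).
y_true = [1] * 5 + [0] * 95
# A model that predicts "no cancer" for every single person.
y_pred = [0] * 100

# Accuracy: share of predictions that match the true label.
correct = sum(t == p for t, p in zip(y_true, y_pred))
accuracy = correct / len(y_true)

# Number of actual cancer cases the model managed to detect.
detected = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))

print(accuracy)  # 0.95
print(detected)  # 0
```

The model scores 95% accuracy while detecting none of the cancer cases, which is why accuracy alone is misleading on heavily imbalanced data.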
Let’s use the same confusion matrix as the one we used before for our cancer detection example.
>>> from sklearn.metrics import precision_score
>>> y_true = [0, 1, 2, 0, 1, 2]
>>> y_pred = [0, 2, 1, 0, 0, 1]
>>> precision_score(y_true, y_pred, average='macro')
0.22...
>>> precision_score(y_true, y_pred, average='micro')
0.33...
>>> precision_score(y_true, y_pred, average='weighted')
0.22...
Precision tells us what proportion of the patients we diagnosed as having cancer actually had cancer. The predicted positives (people predicted as cancerous) are TP and FP, and among them the people who actually have cancer are TP, so Precision = TP / (TP + FP).
Ex: In our cancer example with 100 people, only 5 people have cancer. Let’s say our model is very bad and predicts every case as cancer. Since we are predicting everyone as having cancer, our denominator (true positives plus false positives) is 100, and the numerator (people who have cancer and whom the model predicts as having cancer) is 5. So in this example, the precision of such a model is 5%.
Recall or Sensitivity:
Recall measures what proportion of the patients who actually had cancer were diagnosed by the algorithm as having cancer. The actual positives (people having cancer) are TP and FN, and among them the people diagnosed by the model as having cancer are TP, so Recall = TP / (TP + FN).
Note: FN is included because the person actually had cancer even though the model predicted otherwise.
Ex: In our cancer example with 100 people, 5 people actually have cancer. Let’s say that the model predicts every case as cancer. Our denominator (true positives plus false negatives) is 5, and the numerator (people who have cancer and whom the model predicts as having cancer) is also 5, since we predicted all 5 cancer cases correctly. So in this example, the recall of such a model is 100%, while its precision (as we saw above) is 5%.
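The precision and recall of the "everyone has cancer" model can be worked out directly from the TP, FP and FN counts. This is a minimal sketch: the helper `confusion_counts` and the 1/0 label encoding are assumptions made for illustration, not part of any library.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, FN) for the class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return tp, fp, fn

# 100 people: 5 have cancer (label 1) and 95 do not (label 0).
y_true = [1] * 5 + [0] * 95
# A very bad model that predicts cancer for everyone.
y_pred = [1] * 100

tp, fp, fn = confusion_counts(y_true, y_pred)
precision = tp / (tp + fp)  # 5 / 100
recall = tp / (tp + fn)     # 5 / 5

print(tp, fp, fn)  # 5 95 0
print(precision)   # 0.05
print(recall)      # 1.0
```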
Precision or Recall?:
It is clear that recall gives us information about a classifier’s performance with respect to false negatives (how many cases did we miss), while precision gives us information about its performance with respect to false positives (how many cases did we flag wrongly).
Precision is about being precise. So even if we managed to capture only one cancer case, and we captured it correctly, then we are 100% precise.
Recall is not so much about capturing cases correctly but more about capturing all cases that have “cancer” with the answer “cancer”. So if we simply label every case as “cancer”, we have 100% recall.
So, if we want to focus on minimising false negatives, we want recall to be as close to 100% as possible without precision being too bad. If we want to focus on minimising false positives, we should aim to make precision as close to 100% as possible.
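The opposite extreme, a very cautious model that flags only one patient and gets that one right, can be sketched in the same way. The data reuse the 100-person cancer example; the helper `precision_recall` and the 1/0 label encoding are assumptions made for illustration.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for the class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    # Guard against empty denominators (no predicted or no actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# 100 people: the first 5 have cancer (label 1), the other 95 do not (label 0).
y_true = [1] * 5 + [0] * 95
# A cautious model that flags only the first patient, who does have cancer.
y_cautious = [1] + [0] * 99

precision, recall = precision_recall(y_true, y_cautious)
print(precision)  # 1.0
print(recall)     # 0.2
```

This model is 100% precise but misses 4 of the 5 cancer cases, so neither number should be read on its own.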