squared hinge loss

The squared hinge loss is a loss function used for “maximum margin” binary classification problems. Mathematically it is defined as:

$L(y, \hat{y}) = \sum_{i=0}^{N} \max(0, 1 - y_i \cdot \hat{y}_i)^2$

where ŷ is the predicted value and y is either 1 or -1. Thus, the squared hinge loss is:

0: when the true and predicted labels are the same and y·ŷ ≥ 1 (which indicates that the classifier is sure that it's the correct label)
quadratically increasing with the error: when the true and predicted labels are not the same, or when y·ŷ < 1 even though the true and predicted labels are the same (which indicates that the classifier is not sure that it's the correct label)

Note: ŷ should be the actual numerical output of the classifier and not the predicted label.
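The three cases above can be checked with a few lines of plain Python. This is only a minimal sketch of the formula, not any particular library's implementation; the function name `squared_hinge_loss` and the sample values are assumptions made for illustration.

```python
def squared_hinge_loss(y_true, y_pred):
    """Sum of max(0, 1 - y * y_hat)^2 over all samples.

    y_true: labels, each either -1 or 1.
    y_pred: raw classifier outputs (scores), not predicted labels.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(max(0.0, 1.0 - y * y_hat) ** 2 for y, y_hat in zip(y_true, y_pred))


# Correct and confident: y * y_hat >= 1 for both samples, so the loss is 0.
confident = squared_hinge_loss([1, -1], [2.0, -1.5])

# Correct but unsure: 0 < y * y_hat < 1, so a small penalty remains.
unsure = squared_hinge_loss([1], [0.5])   # (1 - 0.5)^2 = 0.25

# Wrong sign: y * y_hat < 0, so the penalty grows quadratically.
wrong = squared_hinge_loss([1], [-2.0])   # (1 + 2)^2 = 9.0
```

Passing predicted labels instead of raw scores would hide the "correct but unsure" case, which is why the loss is computed on the numerical output.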

The hinge loss pushes the classifier, during training, toward the classification boundary that lies as far as possible from the data points of each class. In other words, it favors the boundary with the maximum margin between the data points of the different classes.

When to use squared hinge

Use the squared hinge loss function on problems involving yes/no (binary) decisions, especially when you're not interested in knowing how certain the classifier is about the classification, that is, when you don't care about the classification probabilities. In a neural network, use it in combination with the tanh() activation function in the last layer, so that the output lies in the same range as the -1/1 labels.

A typical application is classifying email as 'spam' or 'not spam' when you're only interested in the classification accuracy.
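As a rough illustration of pairing the loss with tanh(), the sketch below maps 0/1 labels to the -1/1 encoding the loss expects and applies tanh to the last-layer values before computing the loss. The helper names (`to_signed_labels`, `squared_hinge_with_tanh`), the 0/1 label convention, and the sample numbers are all assumptions made for this example, not part of any specific framework.

```python
import math


def to_signed_labels(labels01):
    """Map 0/1 labels to the -1/+1 encoding the squared hinge loss expects."""
    return [2 * label - 1 for label in labels01]


def squared_hinge_with_tanh(labels01, logits):
    """Squared hinge loss on tanh-squashed outputs (hypothetical helper)."""
    y_true = to_signed_labels(labels01)
    y_pred = [math.tanh(z) for z in logits]  # each output lies in (-1, 1)
    return sum(max(0.0, 1.0 - y * p) ** 2 for y, p in zip(y_true, y_pred))


labels = [1, 0, 1]          # 1 = spam, 0 = not spam (illustrative data)
logits = [3.0, -3.0, -0.5]  # made-up pre-activation outputs of the last layer
loss = squared_hinge_with_tanh(labels, logits)
```

Because tanh never reaches exactly 1 or -1, the loss in this setup approaches zero for confident correct predictions but does not hit it exactly; here almost all of the loss comes from the third sample, which has the wrong sign.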
