
Squared hinge loss


The squared hinge loss is a loss function used for “maximum margin” binary classification problems. Mathematically it is defined as:

L(y, \hat{y}) = \sum_{i=1}^{N} \big(\max(0,\, 1 - y_i \cdot \hat{y}_i)\big)^2

where ŷ is the predicted value and y is either 1 or −1. Thus, the squared hinge loss is:

* 0 when the true and predicted labels are the same and ŷ ≥ 1 (an indication that the classifier is sure that it is the correct label);
* quadratically increasing with the error when the true and predicted labels are not the same, or when ŷ < 1 even though the true and predicted labels are the same (an indication that the classifier is not sure that it is the correct label).
Note: ŷ should be the actual numerical output of the classifier, not the predicted label.
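As a quick illustration, here is a minimal NumPy sketch of the formula above, using the sum form and labels in {-1, +1}; the function name and the example values are made up for illustration.

```python
import numpy as np

def squared_hinge_loss(y_true, y_pred):
    """Squared hinge loss: sum of max(0, 1 - y * y_hat)^2 over the samples.

    y_true -- labels in {-1, +1}
    y_pred -- raw (numerical) classifier outputs, not hard labels
    """
    margins = np.maximum(0.0, 1.0 - y_true * y_pred)
    return np.sum(margins ** 2)

# Tiny made-up example: one confident correct, one wrong, one unsure prediction
y_true = np.array([ 1.0, -1.0,  1.0])
y_pred = np.array([ 1.3,  0.8,  0.4])
print(squared_hinge_loss(y_true, y_pred))  # only the last two terms contribute
```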

During training, the hinge loss pushes the classifier towards a classification boundary that is as far as possible from the data points of each class. In other words, the boundary maximizes the margin between the data points of the different classes.

When to use squared hinge

Use the squared hinge loss on problems involving yes/no (binary) decisions, especially when you are not interested in how certain the classifier is about the classification, that is, when you do not care about classification probabilities. Use it in combination with the tanh() activation function in the last layer of the neural network, as sketched below.

A typical application is classifying email as ‘spam’ or ‘not spam’ when you are only interested in the classification accuracy.
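As a sketch of that setup, the snippet below builds a tiny Keras binary classifier with a tanh output layer and trains it with the built-in squared hinge loss. The network shape and the random "spam" data are placeholders, not part of the original post.

```python
import numpy as np
import tensorflow as tf

# Placeholder 'spam / not spam' data: 200 emails, 20 numeric features each,
# with labels encoded as -1 (not spam) and +1 (spam).
X = np.random.rand(200, 20).astype("float32")
y = np.random.choice([-1.0, 1.0], size=(200, 1)).astype("float32")

model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),
    tf.keras.layers.Dense(16, activation="relu"),
    # tanh keeps the raw output in [-1, 1], matching the {-1, +1} label encoding
    tf.keras.layers.Dense(1, activation="tanh"),
])

# Keras ships squared hinge as a built-in loss
model.compile(optimizer="adam", loss="squared_hinge")
model.fit(X, y, epochs=5, batch_size=32, verbose=0)

# Raw outputs near +1 or -1 mean the model is confident; the sign gives the label.
print(model.predict(X[:3], verbose=0))
```

Note that Keras averages the squared hinge over the batch rather than summing it as in the formula above; up to that constant factor the optimization behaves the same way.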
