A.I, Data and Software Engineering

# squared hinge loss


The squared hinge loss is a loss function used for “maximum margin” binary classification problems. Mathematically it is defined as:

$$L(y, \hat{y}) = \sum_{i=0}^{N} \max\big(0, 1 - y_i \cdot \hat{y}_i\big)^2$$

where ŷᵢ is the predicted value and yᵢ is either 1 or -1.
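As a minimal sketch of the formula above, the following NumPy function (a hypothetical helper, not from any particular library) computes the squared hinge loss over a batch of predictions, assuming the labels are already encoded as -1/+1:

```python
import numpy as np

def squared_hinge_loss(y_true, y_pred):
    """Squared hinge loss, assuming labels in {-1, +1}.

    Sums the squared hinge terms max(0, 1 - y_i * yhat_i)^2
    over all samples, matching the formula above.
    """
    margins = np.maximum(0.0, 1.0 - y_true * y_pred)
    return np.sum(margins ** 2)

y_true = np.array([1.0, -1.0, 1.0])
y_pred = np.array([0.8, -0.5, -0.3])  # raw scores, e.g. from a tanh output
loss = squared_hinge_loss(y_true, y_pred)
# margins are 0.2, 0.5, 1.3 -> loss = 0.04 + 0.25 + 1.69 = 1.98
```

Note how the third sample, which is misclassified (wrong sign), contributes far more to the loss than the two correctly classified samples: squaring amplifies large margin violations.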

The hinge loss encourages the classifier, during training, to find a decision boundary that lies as far as possible from the data points of each class. In other words, it maximizes the margin between the classes.

## When to use squared hinge

Use the squared hinge loss on problems involving yes/no (binary) decisions, especially when you are not interested in how certain the classifier is about the classification, i.e. when you don't need classification probabilities. In a neural network, use it in combination with the tanh() activation function in the last layer, since tanh outputs values in (-1, 1), matching the -1/+1 target encoding the loss expects.
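A small sketch of the two practical details this implies, using NumPy: 0/1 labels must be remapped to -1/+1 before training, and a tanh output naturally produces scores in the matching range. The pre-activation values below are arbitrary illustrative numbers:

```python
import numpy as np

# Squared hinge expects targets in {-1, +1}, so conventional 0/1
# labels must be remapped before training.
labels_01 = np.array([0, 1, 1, 0])
labels_pm1 = 2 * labels_01 - 1  # 0 -> -1, 1 -> +1

# tanh squashes the last layer's pre-activations into (-1, 1),
# which lines up with the -1/+1 label encoding.
z = np.array([-2.0, 0.5, 3.0, -0.1])  # example pre-activations
predictions = np.tanh(z)
```

In Keras, for example, this corresponds to compiling the model with `loss='squared_hinge'` and a `tanh`-activated output neuron, after remapping the labels as above.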

A typical application is classifying email as 'spam' or 'not spam' when you are only interested in classification accuracy.