Advanced Keras – Custom loss functions

When working on machine learning problems, you sometimes want to construct your own custom loss function(s). This article introduces the Keras backend API, which we will use for that purpose.

Keras loss functions

The Keras loss documentation lists several built-in loss functions, e.g. mean_absolute_percentage_error, cosine_proximity, kullback_leibler_divergence, etc. When compiling a Keras model, we often pass two parameters, i.e. the optimizer and the loss, as strings:

model.compile(optimizer='adam', loss='cosine_proximity')

loss: String (name of objective function) or objective function or Loss instance. Note that if the model has multiple outputs, you can use a different loss on each output by passing a dictionary or a list of losses. The loss value that will be minimized by the model will then be the sum of all individual losses.

Next, we will discover, step by step, how to create and use custom loss functions. Later, we apply one of them to predict fuel efficiency (Miles Per Gallon, MPG) from the Auto MPG dataset.

A simple custom loss function

To keep our very first custom loss function simple, I will start from the original mean squared error (MSE); later we will modify it.

\mathrm{MSE}=\frac{1}{n}\sum_{i=1}^{n}\left(Y_i-\hat{Y}_i\right)^2

Now for the tricky part: a Keras loss function must take only (y_true, y_pred) as parameters. So we need a separate function that returns another function, i.e. a Python closure (factory pattern). The code below shows that my_mse_loss() returns an inner function mse(y_true, y_pred):

from tensorflow.keras import backend as K  # backend ops keep everything symbolic

def my_mse_loss():
    def mse(y_true, y_pred):
        # mean of the squared differences, computed with backend ops
        return K.mean(K.square(y_pred - y_true))
    return mse

That is it! Now we can use it while compiling our model.

  model.compile(loss=my_mse_loss(),
                optimizer=optimizer,
                metrics=['mae', 'mse'])

A custom loss function with parameters

If you want the loss function to take extra parameters, you can pass them to the factory function. For example, here we add a constant b to the loss:

def my_mse_loss_b(b):
    def mseb(y_true, y_pred):
        # ordinary MSE shifted by the constant b captured from the factory
        return K.mean(K.square(y_pred - y_true)) + b
    return mseb
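
As before, we call the factory when compiling, so the inner function keeps b in its closure. A minimal usage sketch (the value 0.1 is just an illustrative constant):

  model.compile(loss=my_mse_loss_b(0.1),
                optimizer=optimizer,
                metrics=['mae', 'mse'])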

Important note: even though Keras and TensorFlow accept NumPy arrays, it is highly recommended to keep everything within the same framework. Specifically, we should use the equivalent tensor operations provided by the library we are working with. Try not to mix types!

The following code is NOT recommended!

def my_mse_loss_b(b):
    def mseb(y_true, y_pred):
        ...
        a = np.ones_like(y_true)  # a NumPy array here is not recommended
        return K.mean(K.square(y_pred - y_true)) + a
    return mseb

Instead, you should try this:

def my_mse_loss_b(b):
    def mseb(y_true, y_pred):
        ...
        a = K.ones_like(y_true)  # use the Keras backend instead
        return K.mean(K.square(y_pred - y_true)) + a
    return mseb

More than one loss function in one model

Sometimes, we may need to handle more than one output of our model. Consider the following example:

                        +----+
                        > C  |-->loss1
                       /+----+
                      /
                     /
    +----+    +----+/
 -->| A  |--->| B  |\
    +----+    +----+ \
                      \
                       \+----+
                        > D  |-->loss2
                        +----+      

In the graph, layers A and B are shared by both branches, so their weights are updated by both losses. Some models may have only one input layer as the root of the two branches.

  • loss1 will affect A, B, and C.
  • loss2 will affect A, B, and D.

You can read this paper, in which two loss functions are used for graph embedding, or this article on multi-label classification. We can generalize the implementation into the following steps:

  1. Create a model with n outputs
  2. Create n loss functions
  3. Pass n loss functions while compiling the model as a list or a dictionary.

Example code:

model = Model(inputs=inputs,
              outputs=[branch1, branch2],
              name="fashionnet")

def my_loss1(args):
    def loss1(y_true, y_pred):
        return ...
    return loss1

def my_loss2(args):
    def loss2(y_true, y_pred):
        return ...
    return loss2

# one loss per output, in the same order as the outputs list
model.compile(optimizer=opt,
              loss=[my_loss1(args), my_loss2(args)],
              loss_weights=lossWeights,
              metrics=["accuracy"])
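
For concreteness, below is a minimal runnable sketch of the A -> B -> {C, D} topology from the diagram above; the input size, layer widths, loss choices, and loss weights are made up purely for illustration:

from tensorflow.keras import Input, Model
from tensorflow.keras import layers
from tensorflow.keras import backend as K

def my_loss1():
    def loss1(y_true, y_pred):
        return K.mean(K.square(y_pred - y_true))  # plain MSE for head C
    return loss1

def my_loss2():
    def loss2(y_true, y_pred):
        return K.mean(K.abs(y_pred - y_true))     # plain MAE for head D
    return loss2

inputs = Input(shape=(16,))                       # hypothetical input size
a = layers.Dense(32, activation='relu')(inputs)   # layer A (shared trunk)
b = layers.Dense(32, activation='relu')(a)        # layer B (shared trunk)
c = layers.Dense(1, name='c_output')(b)           # head C
d = layers.Dense(1, name='d_output')(b)           # head D

model = Model(inputs=inputs, outputs=[c, d], name="two_head_demo")
model.compile(optimizer='adam',
              loss=[my_loss1(), my_loss2()],      # one loss per output
              loss_weights=[1.0, 0.5])            # hypothetical weighting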

You can also pass a dictionary of losses, as long as you give a name to each output layer that you want to attach a loss to and use those names as the dictionary keys. For example, we name the output of branch one b1_output and use it as the key:

def branch1():
    ...
    x = Activation(finalAct, name="b1_output")(x)
    return x

model.compile(optimizer=opt,
              loss={'b1_output': my_loss1(args), 'b2_output': my_loss2(args)},
              loss_weights=lossWeights,
              metrics=["accuracy"])
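
The training targets can then be passed the same way, keyed by the output names. A small sketch, assuming x_train plus two hypothetical target arrays y1 and y2 that correspond to b1_output and b2_output:

# targets keyed by the output layer names used in the loss dictionary
model.fit(x_train,
          {'b1_output': y1, 'b2_output': y2},
          epochs=10, batch_size=32)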

Let's try it on the Auto MPG dataset.

Enable TF2.0 and load data

The notebook was created with Google Colab:

try:
  # %tensorflow_version only exists in Colab.
  %tensorflow_version 2.x
except Exception:
  pass
import tensorflow as tf
print(tf.__version__)
#TensorFlow 2.x selected.
#2.0.0

Import libraries

from __future__ import absolute_import, division, print_function, unicode_literals
import pathlib
import matplotlib.pyplot as plt
import pandas as pd
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

Download the dataset with keras.utils.get_file and load it into a Pandas DataFrame:

dataset_path = keras.utils.get_file("auto-mpg.data", "http://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data")
column_names = ['MPG','Cylinders','Displacement','Horsepower','Weight',
                'Acceleration', 'Model Year', 'Origin']
raw_dataset = pd.read_csv(dataset_path, names=column_names,
                      na_values = "?", comment='\t',
                      sep=" ", skipinitialspace=True)
dataset = raw_dataset.copy()
dataset.tail()
      MPG  Cylinders  Displacement  Horsepower  Weight  Acceleration  Model Year  Origin
393  27.0          4         140.0        86.0  2790.0          15.6          82       1
394  44.0          4          97.0        52.0  2130.0          24.6          82       2
395  32.0          4         135.0        84.0  2295.0          11.6          82       1
396  28.0          4         120.0        79.0  2625.0          18.6          82       1
397  31.0          4         119.0        82.0  2720.0          19.4          82       1

Clean, split, and normalize data

#Drop rows with unknown values
dataset = dataset.dropna()
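
If you want to see how many rows are affected before dropping them, a quick optional check with pandas:

# count missing values per column (the "?" entries were read as NaN)
dataset.isna().sum()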

The column "Origin" is really categorical, not numeric: the values 1, 2, and 3 are just country codes. To avoid implying a false linear relation between them, we convert the column to one-hot columns:

origin = dataset.pop('Origin')
dataset['USA'] = (origin == 1)*1.0
dataset['Europe'] = (origin == 2)*1.0
dataset['Japan'] = (origin == 3)*1.0
dataset.tail()
      MPG  Cylinders  Displacement  Horsepower  Weight  Acceleration  Model Year  USA  Europe  Japan
393  27.0          4         140.0        86.0  2790.0          15.6          82  1.0     0.0    0.0
394  44.0          4          97.0        52.0  2130.0          24.6          82  0.0     1.0    0.0
395  32.0          4         135.0        84.0  2295.0          11.6          82  1.0     0.0    0.0
396  28.0          4         120.0        79.0  2625.0          18.6          82  1.0     0.0    0.0
397  31.0          4         119.0        82.0  2720.0          19.4          82  1.0     0.0    0.0

Now split the dataset into a training set (80%) and a test set (20%) by setting frac=0.8. We will use the test set in the final evaluation of our model.

train_dataset = dataset.sample(frac=0.8,random_state=0)
test_dataset = dataset.drop(train_dataset.index)

Let's visualize the data:

import seaborn as sns
sns.pairplot(train_dataset[["MPG", "Cylinders", "Displacement", "Weight"]], diag_kind="kde")
Figure: pair plot showing the relations between four variables: MPG, Cylinders, Displacement, and Weight.

We separate the target value, or “label”, from the features. This label is the value that you will train the model to predict. It is good practice to normalize features that use different scales and ranges to make training easier.

train_stats = train_dataset.describe()
train_stats.pop("MPG")
train_stats = train_stats.transpose()
#Split features from labels
train_labels = train_dataset.pop('MPG')
test_labels = test_dataset.pop('MPG')
def norm(x):
  return (x - train_stats['mean']) / train_stats['std']
normed_train_data = norm(train_dataset)
normed_test_data = norm(test_dataset)
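
As a quick optional sanity check, the normalized training features should now have roughly zero mean and unit standard deviation:

# means should be close to 0 and standard deviations close to 1
normed_train_data.describe().loc[['mean', 'std']].round(2)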

Build a model with custom loss

We use the cost function that we created earlier, i.e. my_mse_loss().

def build_model():
  model = keras.Sequential([
    layers.Dense(64, activation='relu', input_shape=[len(train_dataset.keys())]),
    layers.Dense(64, activation='relu'),
    layers.Dense(1)
  ])
  optimizer = tf.keras.optimizers.RMSprop(0.001)
  model.compile(loss=my_mse_loss(),
                optimizer=optimizer,
                metrics=['mae', 'mse'])
  return model
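
Before training, it can help to inspect the resulting architecture (an optional check):

build_model().summary()  # two Dense(64) hidden layers and a single regression output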

Build and train the model

model = build_model()
class PrintDot(keras.callbacks.Callback):
  def on_epoch_end(self, epoch, logs):
    if epoch % 100 == 0: print('')
    print('.', end='')
EPOCHS = 1000
history = model.fit(
  normed_train_data, train_labels,
  epochs=EPOCHS, validation_split = 0.2, verbose=0,
  callbacks=[PrintDot()])

Visualise the result

def plot_history(history):
  hist = pd.DataFrame(history.history)
  hist['epoch'] = history.epoch
  plt.figure()
  plt.xlabel('Epoch')
  plt.ylabel('My Mean Square Error [MPG^2]')
  plt.plot(hist['epoch'], hist['mse'],
           label='Train Error')
  plt.plot(hist['epoch'], hist['val_mse'],
           label = 'Val Error')
  plt.ylim([0,20])
  plt.legend()
  plt.show()
plot_history(history)
Figure: training and validation error over epochs with the custom MSE loss.
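
Finally, we can evaluate the model on the held-out test set, as planned when we split the data. A short sketch using the metrics configured at compile time:

# evaluate on the 20% test split the model has never seen
loss, mae, mse = model.evaluate(normed_test_data, test_labels, verbose=0)
print("Testing set Mean Abs Error: {:5.2f} MPG".format(mae))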

Conclusion

A loss function (or objective function, or optimization score function) is one of the two parameters required to compile a model. You can create custom loss functions for specific purposes alongside the built-in ones. In part 2, we will continue with multiple metric functions.

5 comments

  • Good job! Learned a lot from your sharing.
    Well, I am wondering how to pass different parameters from an array to the custom loss function.
    Let's say my toy model looks like this:

    def my_mse_loss_b(b):
        def mseb(y_true, y_pred):
            return K.mean(K.square(y_pred - y_true)) + b
        return mseb

    inputs = Input(shape=(200,))
    x = Dense(128, activation='relu')(inputs)
    x = Dense(200, activation='relu')(x)
    outputs = Dense(200, activation='linear')(x)
    model = Model(inputs=inputs, outputs=outputs)
    model.compile(loss=my_mse_loss_b(B), optimizer=Adam(lr=0.0005))
    history = model.fit(x_train, y_train, batch_size=64, epochs=2)

    The shape of my x_train is (1000, 200, 1), y_train is (1000, 200), and B is (1000,).
    When x_train[0] and y_train[0] are passed to the model during training, B[0] is passed as b; then when x_train[1] and y_train[1] are passed, B[1] is passed as b.
    Is that possible?
    Would you please give me a hint?
    Thanks, have a nice day!

    • When the loss is calculated, the parameters are tensors with the same dimensions, e.g. y_pred, y_true, and b; the calculation happens on whole batches during training, not on individual samples.

      It means they are not passed one by one like [x_train1, y_train1, b1]. x_train and y_train are first passed to train the model, and then y_pred and y_true are combined with b inside the custom loss.
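
      If you do need a per-sample b, one possible workaround (a rough sketch, not from the article, and the exact shape handling may depend on your Keras version) is to pack b into the target array and slice it off inside the loss:

      import numpy as np
      from tensorflow.keras import backend as K

      def my_mse_loss_packed():
          def mseb(y_true_packed, y_pred):
              # the last column of the packed targets carries the per-sample b
              y_true = y_true_packed[:, :-1]
              b = y_true_packed[:, -1:]
              return K.mean(K.square(y_pred - y_true)) + K.mean(b)
          return mseb

      # pack targets and b together: (1000, 200) + (1000, 1) -> (1000, 201)
      y_train_packed = np.concatenate([y_train, B[:, None]], axis=1)
      model.compile(loss=my_mse_loss_packed(), optimizer='adam')
      history = model.fit(x_train, y_train_packed, batch_size=64, epochs=2)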
