A.I, Data and Software Engineering

Advanced Keras – Custom loss functions


When working on machine learning problems, you sometimes want to construct your own custom loss function(s). This article introduces the abstract Keras backend for that purpose.

Keras loss functions

According to the Keras loss documentation, there are several built-in loss functions, e.g. mean_absolute_percentage_error, cosine_proximity, kullback_leibler_divergence, etc. When compiling a Keras model, we often pass two parameters, optimizer and loss, as strings:

model.compile(optimizer='adam', loss='cosine_proximity')

loss: String (name of objective function) or objective function or Loss instance. Note that if the model has multiple outputs, you can use a different loss on each output by passing a dictionary or a list of losses. The loss value that will be minimized by the model will then be the sum of all individual losses.

Next, we will discover step by step how to create and use custom loss functions. Later, we apply one of them to predict fuel efficiency (Miles Per Gallon – MPG) from the Auto MPG dataset.

A Simple custom loss function

To keep our very first custom loss function simple, I will use the original “mean square error”. Later, we will modify it.

Now for the tricky part: Keras loss functions must only take (y_true, y_pred) as parameters. So we need a separate function that returns another function, in the style of a Python decorator factory. The code below shows that the function my_mse_loss() returns another inner function mse(y_true, y_pred):

from keras import backend as K
def my_mse_loss():
    def mse(y_true, y_pred):
        return K.mean(K.square(y_pred - y_true))
    return mse

That is it! Now we can use it when compiling our model.

model.compile(loss=my_mse_loss(),
                optimizer=optimizer,
                metrics=['mae', 'mse'])
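To see why the factory pattern satisfies the two-parameter rule, here is a minimal plain-Python sketch. It deliberately uses lists instead of Keras tensors (an assumption made only so it runs without any framework), and it inspects the returned function's signature.

```python
import inspect


def my_mse_loss():
    """Factory: returns a loss with the (y_true, y_pred) signature."""
    def mse(y_true, y_pred):
        # Mean of squared differences over plain Python lists
        return sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return mse


loss_fn = my_mse_loss()

# The returned function exposes exactly the two parameters Keras expects
param_names = list(inspect.signature(loss_fn).parameters)
print(param_names)

# Squared errors are 0, 0 and 4, so the mean is 4 / 3
value = loss_fn([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
print(value)
```

The outer function can take any arguments it likes, while the inner one keeps the fixed signature.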

A custom loss function with parameters

If you want the loss function to take other parameters, you can pass them to the factory.

def my_mse_loss_b(b):
     def mseb(y_true, y_pred):
         return K.mean(K.square(y_pred - y_true)) + b
     return mseb
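The inner function remembers the value of b from the call that created it. The sketch below shows this with plain Python lists standing in for tensors (an assumption for illustration only); two calls to the factory give two independent losses.

```python
def my_mse_loss_b(b):
    def mseb(y_true, y_pred):
        mse = sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
        return mse + b  # b is captured from the enclosing factory call
    return mseb


loss_zero = my_mse_loss_b(0.0)
loss_half = my_mse_loss_b(0.5)

y_true, y_pred = [1.0, 2.0], [2.0, 4.0]
base = loss_zero(y_true, y_pred)     # (1 + 4) / 2 = 2.5
shifted = loss_half(y_true, y_pred)  # 2.5 + 0.5 = 3.0
print(base, shifted)
```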

Important note: Even though Keras and TensorFlow accept NumPy arrays, it is highly recommended to keep everything in its own kingdom. Specifically, we should try to use the equivalent data type provided by the current library. Try not to mix types!

The following code is NOT recommended!

def my_mse_loss_b(b):
     def mseb(y_true, y_pred):
         a = np.ones_like(y_true) #numpy array here is not recommended
         return K.mean(K.square(y_pred - y_true)) + a
     return mseb

Instead, you should try this:

def my_mse_loss_b(b):
     def mseb(y_true, y_pred):
         a = K.ones_like(y_true) #use Keras instead
         return K.mean(K.square(y_pred - y_true)) + a
     return mseb

More than one loss function in one model

Sometimes, we may need to handle more than one output of our model. Consider the following example:

                        > C  |-->loss1
    +----+    +----+/
 -->| A  |--->| B  |\
    +----+    +----+ \
                        > D  |-->loss2

In the graph, layers A and B are shared by both branches, so both losses update their weights. Some models may have only one input layer as the root of the two branches.

  • loss1 will affect A, B, and C.
  • loss2 will affect A, B, and D.

You can read this paper, in which two loss functions are used for graph embedding, or this article on multi-label classification. We can generalize the implementation into a few steps:

  1. Create a model with n outputs
  2. Create n loss functions
  3. Pass n loss functions while compiling the model as a list or a dictionary.

Example code:

model = Model(inputs=inputs,
              outputs=[branch1, branch2])

def my_loss1(args):
    def loss1(y_true, y_pred):
        return ...
    return loss1

def my_loss2(args):
    def loss2(y_true, y_pred):
        return ...
    return loss2

# One loss per output, in the same order as the outputs list
model.compile(optimizer=opt, loss=[my_loss1(args), my_loss2(args)], loss_weights=lossWeights)

You can also pass a dictionary of losses, as long as you first give a name to each output layer that a loss applies to. For example, we name the output of branch one b1_output and use it as the key in the dictionary.

def branch1(x):
    # The layer name becomes the key in the loss dictionary
    x = Activation(finalAct, name="b1_output")(x)
    return x

model.compile(optimizer=opt, loss={'b1_output': my_loss1(args), 'b2_output': my_loss2(args)}, loss_weights=lossWeights)
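To make the role of loss_weights concrete, here is a plain-Python sketch of how per-output losses combine into the single value being minimized, namely a weighted sum. The helper total_loss and the toy numbers are hypothetical, and lists stand in for tensors; Keras does this combination internally.

```python
def mse(y_true, y_pred):
    return sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)


def total_loss(losses, loss_weights, y_true, y_pred):
    """Weighted sum of per-output losses; all dicts are keyed by output name."""
    return sum(
        loss_weights[name] * losses[name](y_true[name], y_pred[name])
        for name in losses
    )


losses = {"b1_output": mse, "b2_output": mae}
loss_weights = {"b1_output": 1.0, "b2_output": 0.5}
y_true = {"b1_output": [1.0, 2.0], "b2_output": [0.0, 1.0]}
y_pred = {"b1_output": [2.0, 4.0], "b2_output": [1.0, 1.0]}

# 1.0 * 2.5 (MSE on branch 1) + 0.5 * 0.5 (MAE on branch 2)
total = total_loss(losses, loss_weights, y_true, y_pred)
print(total)
```

Keying everything by output name is what makes the dictionary form less error-prone than relying on list order.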

Let's try it on the Auto MPG dataset.

Enable TF2.0 and load data

The IPython notebook was created with Google Colab:

try:
  # %tensorflow_version only exists in Colab.
  %tensorflow_version 2.x
except Exception:
  pass
# Output: TensorFlow 2.x selected.

Import libraries

from __future__ import absolute_import, division, print_function, unicode_literals
import pathlib
import matplotlib.pyplot as plt
import pandas as pd
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

Download the dataset using keras.utils and load it into a Pandas data frame.

dataset_path = keras.utils.get_file("auto-mpg.data", "http://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data")
column_names = ['MPG','Cylinders','Displacement','Horsepower','Weight',
                'Acceleration', 'Model Year', 'Origin']
raw_dataset = pd.read_csv(dataset_path, names=column_names,
                      na_values = "?", comment='\t',
                      sep=" ", skipinitialspace=True)
dataset = raw_dataset.copy()

Clean, split, and normalize data

#Drop rows with unknown values
dataset = dataset.dropna()

The "Origin" column is really categorical, not numeric. So that the model does not treat the codes 1, 2, and 3 as an ordered numeric scale, we convert the column to a one-hot encoding:

origin = dataset.pop('Origin')
dataset['USA'] = (origin == 1)*1.0
dataset['Europe'] = (origin == 2)*1.0
dataset['Japan'] = (origin == 3)*1.0
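For readers new to one-hot encoding, the same idea can be sketched without Pandas. The sample origin codes below are made up for illustration; the mapping of codes to country names follows the code above.

```python
# Hypothetical sample of origin codes (1 = USA, 2 = Europe, 3 = Japan)
origin = [1, 3, 2, 1]
names = {1: "USA", 2: "Europe", 3: "Japan"}

# One column per category: 1.0 where the row has that code, else 0.0
one_hot = {
    name: [1.0 if code == key else 0.0 for code in origin]
    for key, name in names.items()
}

for name, column in one_hot.items():
    print(name, column)
```

Each row ends up with exactly one 1.0 across the three columns.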

Now split the dataset into a training set (80%) and a test set (20%) by setting frac=0.8. We will use the test set in the final evaluation of our model.

train_dataset = dataset.sample(frac=0.8,random_state=0)
test_dataset = dataset.drop(train_dataset.index)
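The sample-then-drop idiom can be illustrated with the standard library alone. In this sketch, rows is a stand-in list of 100 row indices (an assumption), and random.Random(0) plays the role of random_state=0, although it will not pick the same rows as Pandas.

```python
import random

rows = list(range(100))  # stand-in for the data frame's row index
rng = random.Random(0)   # fixed seed so the split is reproducible

# Sample 80% of the indices for training, without replacement
train_idx = sorted(rng.sample(rows, k=round(0.8 * len(rows))))
train_set = set(train_idx)

# Everything not sampled becomes the test set (like dataset.drop(...))
test_idx = [i for i in rows if i not in train_set]

print(len(train_idx), len(test_idx))
```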

Let's visualize the data:

import seaborn as sns
sns.pairplot(train_dataset[["MPG", "Cylinders", "Displacement", "Weight"]], diag_kind="kde")
[Figure: pair plot showing relations between four variables: MPG, Cylinders, Displacement, and Weight]

We separate the target value, or “label”, from the features. This label is the value that you will train the model to predict. It is good practice to normalize features that use different scales and ranges to make training easier.

train_stats = train_dataset.describe()
# Drop the label so the statistics cover the feature columns only
train_stats.pop("MPG")
train_stats = train_stats.transpose()
#Split features from labels
train_labels = train_dataset.pop('MPG')
test_labels = test_dataset.pop('MPG')
def norm(x):
  # Use training-set statistics for both the training and test data
  return (x - train_stats['mean']) / train_stats['std']
normed_train_data = norm(train_dataset)
normed_test_data = norm(test_dataset)
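The key point of the normalization is that the test data is scaled with the training statistics. Here is a small sketch with made-up numbers and plain dictionaries in place of data frames (both assumptions); statistics.stdev is the sample standard deviation, which matches what describe() reports.

```python
import statistics

# Hypothetical toy features
train = {"Weight": [2000.0, 3000.0, 4000.0], "Cylinders": [4.0, 6.0, 8.0]}
test = {"Weight": [3500.0], "Cylinders": [4.0]}

# Mean and sample standard deviation come from the training data only
stats = {
    col: (statistics.mean(values), statistics.stdev(values))
    for col, values in train.items()
}


def norm(data):
    return {
        col: [(x - stats[col][0]) / stats[col][1] for x in values]
        for col, values in data.items()
    }


normed_train = norm(train)
normed_test = norm(test)
print(normed_train)
print(normed_test)
```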

Build a model with custom loss

We use one cost function that we created earlier, i.e. my_mse_loss.

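def build_model():
  model = keras.Sequential([
    layers.Dense(64, activation='relu', input_shape=[len(train_dataset.keys())]),
    layers.Dense(64, activation='relu'),
    layers.Dense(1)  # single output: the predicted MPG
  ])
  optimizer = tf.keras.optimizers.RMSprop(0.001)
  # Use the custom loss factory defined earlier
  model.compile(loss=my_mse_loss(),
                optimizer=optimizer,
                metrics=['mae', 'mse'])
  return model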

Build and train the model

model = build_model()
# Display training progress by printing a single dot for each completed epoch
class PrintDot(keras.callbacks.Callback):
  def on_epoch_end(self, epoch, logs):
    if epoch % 100 == 0: print('')
    print('.', end='')
EPOCHS = 1000
history = model.fit(
  normed_train_data, train_labels,
  epochs=EPOCHS, validation_split = 0.2, verbose=0,
  callbacks=[PrintDot()])

Visualise the result

def plot_history(history):
  hist = pd.DataFrame(history.history)
  hist['epoch'] = history.epoch
  plt.figure()
  plt.xlabel('Epoch')
  plt.ylabel('My Mean Square Error [MPG^2]')
  plt.plot(hist['epoch'], hist['mse'],
           label='Train Error')
  plt.plot(hist['epoch'], hist['val_mse'],
           label = 'Val Error')
  plt.legend()
  plt.show()
plot_history(history)
[Figure: custom loss MSE]


A loss function (also called an objective function or optimization score function) is one of the two parameters required to compile a model. You can create custom loss functions for specific purposes alongside the built-in ones. In part 2, we will continue with multiple metric functions.



  • Good job! Learned a lot from your sharing.
    Well, I am wondering how to pass different parameters from an array to the custom loss function.
    Let’s say my toy model looks like this:

    def my_mse_loss_b(b):
        def mseb(y_true, y_pred):
            return K.mean(K.square(y_pred - y_true)) + b
        return mseb

    inputs = Input(shape=(200,))
    x = Dense(128, activation='relu')(inputs)
    x = Dense(200, activation='relu')(x)
    outputs = Dense(200, activation='linear')(x)
    model = Model(inputs=inputs, outputs=outputs)
    model.compile(loss=my_mse_loss_b(B), optimizer=Adam(lr=0.0005))
    history = model.fit(x_train, y_train, batch_size=64, epochs=2)

    The shape of my x_train is (1000,200,1), y_train is (1000,200), and B is (1000,).
    When x_train[0] and y_train[0] are passed to the training model, B[0] is passed as b; then when x_train[1] and y_train[1] are passed to the training model, B[1] is passed as b.
    Is that possible?
    Would you please give me a hint?
    Thanks, have a nice day!

    • When the loss is calculated, the parameters (y_pred, y_true, and b) are tensors that must have compatible dimensions. The calculation happens on each batch, not on each individual sample.

      This means the values are not passed one by one like [x_train1, y_train1, b1]. Keras first passes batches of x_train and y_train through the model, then computes the custom loss from y_pred and y_true together with b.
