A.I, Data and Software Engineering

Generate data on the fly – Keras data generator


Previously, we trained our models on pre-generated datasets, for example in the recommender system and recurrent neural network posts. In this article, we demonstrate how to use a generator to produce data on the fly while training a model.

Keras Data Generator with Sequence

There are a couple of ways to create a data generator. TensorFlow Keras provides a base class, Sequence, for feeding a dataset to a model as a sequence of batches.

To create our own data generator, we subclass tf.keras.utils.Sequence and implement the __getitem__ and __len__ methods.

A generator should return one batch of (input, output) for training, which is done by implementing the __getitem__ method. The __len__ method returns the number of batches per epoch.

If you want to modify your dataset between epochs, you can also implement on_epoch_end.
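A scaffold for such a generator could look like the following. It is a minimal sketch, not a definitive implementation: the class name DataGenerator, the in-memory arrays x and y, and the shuffle and seed parameters are assumptions made for illustration. The fallback to object is only there so the sketch can be run without TensorFlow installed.

```python
import math

import numpy as np

try:
    from tensorflow.keras.utils import Sequence
except ImportError:  # assumption: lets the sketch run without TensorFlow
    Sequence = object


class DataGenerator(Sequence):
    """Serves (inputs, targets) batches from two in-memory arrays."""

    def __init__(self, x, y, batch_size=32, shuffle=True, seed=0):
        super().__init__()
        self.x = np.asarray(x)
        self.y = np.asarray(y)
        self.batch_size = batch_size
        self.shuffle = shuffle
        self.rng = np.random.default_rng(seed)
        self.indices = np.arange(len(self.x))
        self.on_epoch_end()

    def __len__(self):
        # Number of batches per epoch; the last batch may be smaller.
        return math.ceil(len(self.x) / self.batch_size)

    def __getitem__(self, idx):
        # Pick the sample indices that belong to batch number idx.
        start = idx * self.batch_size
        batch = self.indices[start:start + self.batch_size]
        return self.x[batch], self.y[batch]

    def on_epoch_end(self):
        # Reshuffle so that batches differ from one epoch to the next.
        if self.shuffle:
            self.rng.shuffle(self.indices)
```

Indexing the object, as in DataGenerator(x, y)[0], returns the first batch as an (inputs, targets) pair.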

Train with a generator

After creating a generator, you have two options. One is the fit_generator method of the Keras model. That method is deprecated, however, so the second option is preferred: pass the generator directly to model.fit.

Remember that when x is a generator, we leave y unset, because the targets are already included in each batch the generator produces. The docstring of model.fit points this out as well.
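The call itself can be sketched as below. The helper name fit_with_generator and its parameters are hypothetical, and model is assumed to be an already compiled Keras model whose inputs and outputs match the batches of train_gen.

```python
def fit_with_generator(model, train_gen, epochs=5):
    """Train a compiled Keras model on batches served by train_gen."""
    # y is deliberately not passed: every batch from the generator is
    # already an (inputs, targets) pair.
    return model.fit(train_gen, epochs=epochs)
```

Only the generator and the number of epochs are passed here; batching is controlled by the generator itself rather than by a batch_size argument to fit.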

Python demonstration

Suppose that we have a recommender model from this post. We now create a data generator to train it. We use the MovieLens dataset; you can refer to this post for downloading and parsing the data into a pandas DataFrame.

Below is the complete generator class.
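A minimal sketch of such a class follows. It makes several assumptions: the class name RatingGenerator is hypothetical, the three constructor arrays are taken from the userId, movieId and rating columns of the DataFrame (column names assumed), and the two model inputs are returned as a tuple. The fallback to object is only there so the sketch can be run without TensorFlow installed.

```python
import math

import numpy as np

try:
    from tensorflow.keras.utils import Sequence
except ImportError:  # assumption: lets the sketch run without TensorFlow
    Sequence = object


class RatingGenerator(Sequence):
    """Serves ((user_ids, movie_ids), ratings) batches for a two-input model."""

    def __init__(self, user_ids, movie_ids, ratings,
                 batch_size=256, shuffle=True, seed=42):
        super().__init__()
        self.users = np.asarray(user_ids, dtype="int64")
        self.movies = np.asarray(movie_ids, dtype="int64")
        self.ratings = np.asarray(ratings, dtype="float32")
        self.batch_size = batch_size
        self.shuffle = shuffle
        self.rng = np.random.default_rng(seed)
        self.order = np.arange(len(self.ratings))
        self.on_epoch_end()

    def __len__(self):
        # math.ceil returns a plain Python int, as len() requires.
        return math.ceil(len(self.ratings) / self.batch_size)

    def __getitem__(self, idx):
        start = idx * self.batch_size
        rows = self.order[start:start + self.batch_size]
        # Two inputs (users, movies) and one output (ratings) per batch.
        return (self.users[rows], self.movies[rows]), self.ratings[rows]

    def on_epoch_end(self):
        # Visit the ratings in a new order on every epoch.
        if self.shuffle:
            self.rng.shuffle(self.order)
```

With a DataFrame df, the arrays could be obtained with df["userId"].to_numpy() and likewise for the other two columns, again assuming those column names.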

Some important notes:

  • The recommender model takes 2 inputs and produces 1 output. Therefore, when coding the __getitem__ method, you should return a batch that also contains 2 inputs and 1 output.
  • Do not mistakenly use tf.math.ceil in the __len__ method. Unlike math.ceil, which returns a plain Python integer, it returns a tensor.

Finally, training is straightforward: pass the generator to model.fit, without a y argument.


While Keras provides built-in data generators, they have limitations, mainly because every task needs a different data loader. Sometimes every image has one mask and sometimes several; sometimes the mask is saved as an image and sometimes it is encoded, and so on. For every task, we will probably need to tweak our data generator, but the structure will stay the same.
