
MLP for implicit binary collaborative filtering


In this post, we demonstrate a Keras implementation of implicit collaborative filtering. We also introduce some techniques to improve the performance of the model, including weight initialization, a dynamic learning rate, and an early-stopping callback.

The implicit data

For demonstration purposes, we use the dataset generated from negative samples using the technique mentioned in this post. The data contain user_id, item_id, and an interaction label (0 = no interaction, 1 = interaction). The transformed dataset looks like this:

user_id | item_id | rating
(a random sample of rows; each row holds a user id, an item id, and the binary interaction label)

2,000,000 rows × 3 columns

The MLP collaborative filtering model

The model we are going to build has two inputs: users and items. The output is a value in (0, 1) indicating the probability of interaction. The model structure is shown below.

Multi-layer perceptron for collaborative filtering

To implement this, we first import the relevant libraries.
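
A minimal import set for the sketches in this post, assuming TensorFlow 2.x with the bundled tf.keras:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Input, Embedding, Flatten, Concatenate, Dense
from tensorflow.keras.regularizers import l2
from tensorflow.keras.constraints import NonNeg
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.models import Model
```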

Inputs and Embedding

Next, we create the inputs and embedding layers for users and items using Keras' Input and Embedding layers (a sketch follows the note on the L2 penalty below).

In both embedding layers, we use an l2 regularizer and a non-negativity constraint to reduce overfitting and to avoid negative values in the embeddings.

The L2 regularizer adds the squared magnitude of the coefficients as a penalty term to the loss function:

$$l2_{penalty} = l2\sum_{i=0}^{n}x_i^2$$
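
A minimal sketch of the input and embedding layers. The names latent_dim, num_users, num_items, and the l2 factor are assumptions for illustration, not values from the original post:

```python
latent_dim = 8          # assumed embedding size
num_users = 200_000     # assumed; replace with the actual number of users
num_items = 140_000     # assumed; replace with the actual number of items

# One integer id per sample for each input branch.
user_input = Input(shape=(1,), dtype='int32', name='user_input')
item_input = Input(shape=(1,), dtype='int32', name='item_input')

# Embeddings with an l2 penalty and a non-negativity constraint.
user_embedding = Embedding(input_dim=num_users, output_dim=latent_dim,
                           embeddings_regularizer=l2(1e-6),
                           embeddings_constraint=NonNeg(),
                           name='user_embedding')(user_input)
item_embedding = Embedding(input_dim=num_items, output_dim=latent_dim,
                           embeddings_regularizer=l2(1e-6),
                           embeddings_constraint=NonNeg(),
                           name='item_embedding')(item_input)

# Flatten (batch, 1, latent_dim) -> (batch, latent_dim).
user_vec = Flatten()(user_embedding)
item_vec = Flatten()(item_embedding)
```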

Concatenate and MLP

The idea of the MLP is to let a deep neural network learn the interaction between users and items.

We build the MLP part with several layers. By default, Keras' Dense layer is initialized with glorot_uniform; nevertheless, you can set a more suitable initializer to improve training. The last layer uses a sigmoid activation for binary classification, while the other layers use relu and are initialized with he_normal.
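
A sketch of the MLP tower, reusing user_vec and item_vec from the embedding sketch above; the layer sizes are illustrative assumptions:

```python
# Concatenate the two latent vectors and pass them through the MLP tower.
mlp_vector = Concatenate()([user_vec, item_vec])

# Hidden layers: relu activations, he_normal initialization (illustrative sizes).
for units in (64, 32, 16, 8):
    mlp_vector = Dense(units, activation='relu',
                       kernel_initializer='he_normal')(mlp_vector)

# Output layer: sigmoid for the binary interaction probability.
prediction = Dense(1, activation='sigmoid',
                   kernel_initializer='glorot_uniform',
                   name='prediction')(mlp_vector)

model = Model(inputs=[user_input, item_input], outputs=prediction)
```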

Finally, we compile the model with the Adadelta optimizer, a stochastic gradient descent method based on an adaptive learning rate per dimension, which addresses two drawbacks (see the sketch after this list):

  • the continual decay of learning rates throughout training
  • the need for a manually selected global learning rate
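
The compile step then looks like this; binary cross-entropy is assumed as the loss since the target is a 0/1 interaction label:

```python
model.compile(optimizer=tf.keras.optimizers.Adadelta(),
              loss='binary_crossentropy',
              metrics=['accuracy'])
```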

Data generator

To generate batches for training, we use a data generator. You can follow this post for more details.
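
As a rough illustration, a generator based on tf.keras.utils.Sequence might look like the following; the class name, batch size, and the assumption that the dataframe has user_id, item_id, and rating columns are all hypothetical choices for this sketch:

```python
class InteractionGenerator(tf.keras.utils.Sequence):
    """Yields ([user_ids, item_ids], labels) batches from a dataframe."""

    def __init__(self, df, batch_size=256, shuffle=True):
        self.df = df.reset_index(drop=True)
        self.batch_size = batch_size
        self.shuffle = shuffle
        self.indices = np.arange(len(self.df))

    def __len__(self):
        # Number of batches per epoch.
        return int(np.ceil(len(self.df) / self.batch_size))

    def __getitem__(self, idx):
        batch = self.indices[idx * self.batch_size:(idx + 1) * self.batch_size]
        rows = self.df.iloc[batch]
        return ([rows['user_id'].values, rows['item_id'].values],
                rows['rating'].values)

    def on_epoch_end(self):
        # Reshuffle between epochs so batches differ each time.
        if self.shuffle:
            np.random.shuffle(self.indices)
```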

Train with early stopping callback

Good, it is time to train the model. Another method to reduce overfitting is early stopping, so we create the callback and use it when fitting the model.
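
A minimal sketch of the callback and the training call, assuming the generator above and hypothetical train_df / val_df dataframes; the patience, batch size, and epoch count are illustrative:

```python
# Stop when the validation loss has not improved for a few epochs.
early_stopping = EarlyStopping(monitor='val_loss', patience=3,
                               restore_best_weights=True)

train_gen = InteractionGenerator(train_df, batch_size=256)
val_gen = InteractionGenerator(val_df, batch_size=256, shuffle=False)

history = model.fit(train_gen,
                    validation_data=val_gen,
                    epochs=20,
                    callbacks=[early_stopping])
```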

And finally, we have the result.

Wrapping up

The MLP version of collaborative filtering shows a very promising result compared to classical matrix factorization. In a future post, we will fuse the two models, MF and MLP, into a hybrid one, also known as Neural Collaborative Filtering.
