In this post, we demonstrate a Keras implementation of implicit collaborative filtering. We also introduce some techniques to improve the performance of the model, including weight initialization, a dynamic learning rate, and an early stopping callback.

The implicit data
For demonstration purposes, we use the dataset generated from negative samples using the technique mentioned in this post. The data contain `user_id`, `item_id`, and an interaction label stored in the `rating` column (0 = no interaction, 1 = interaction). The transformed dataset looks like this:
|        | user_id | item_id | rating |
|--------|---------|---------|--------|
| 100813 | 6       | 992     | 0      |
| 21704  | 396     | 173     | 1      |
| 147013 | 416     | 788     | 0      |
| 138923 | 353     | 1483    | 0      |
| 87201  | 415     | 726     | 1      |
| …      | …       | …       | …      |
| 147669 | 424     | 1573    | 0      |
| 49499  | 302     | 245     | 1      |
| 160115 | 534     | 1403    | 0      |
| 132388 | 304     | 826     | 0      |
| 77969  | 838     | 92      | 1      |

2,000,000 rows × 3 columns
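
As a reference point, here is a minimal sketch of how such a dataframe might be loaded and inspected; the file name `interactions.csv` is an assumption, and the actual negative-sampling step is covered in the earlier post:

```python
import pandas as pd

# Hypothetical file holding the negative-sampled data from the earlier post
dataset = pd.read_csv('interactions.csv')   # columns: user_id, item_id, rating

print(dataset.shape)                    # e.g. (2000000, 3)
print(dataset.rating.value_counts())    # interactions vs. negative samples
dataset.head()
```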
The MLP collaborative filtering model
The model we are going to build has two inputs: users and items. The output is a value in (0, 1) indicating non-interaction/interaction. The model structure is shown below.

To implement this, we first import the relevant libraries.
```python
%tensorflow_version 2.x
import math

import numpy as np          # used later by the data generator
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import Model
from tensorflow.keras.layers import Input, Dense, Concatenate, Embedding, Dropout, BatchNormalization
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping   # used in the training section
from tensorflow.keras.regularizers import l2, l1, l1_l2
from tensorflow.keras.initializers import RandomUniform, he_normal, he_uniform
from tensorflow.keras.utils import Sequence
```
Inputs and Embedding
Next, we create inputs and embedding layers for users and items by using the `Input` and `Embedding` layers.
```python
n_users, n_movies = len(dataset.user_id.unique()), len(dataset.item_id.unique())

n_latent_factors = 64            # embedding dimension (example value, not from the original post)
embedding_init = he_uniform()    # embedding initializer (example choice)

# movie input and embedding
movie_input = keras.layers.Input(shape=[1], name='Item')
movie_embedding = keras.layers.Embedding(n_movies, n_latent_factors,
                                         embeddings_initializer=embedding_init,
                                         embeddings_regularizer=l2(1e-6),
                                         embeddings_constraint='NonNeg',
                                         name='Movie-Embedding')(movie_input)
movie_vec = keras.layers.Flatten(name='FlattenMovies')(movie_embedding)

# user input and embedding
user_input = keras.layers.Input(shape=[1], name='User')
user_embedding = keras.layers.Embedding(n_users, n_latent_factors,
                                        embeddings_initializer=embedding_init,
                                        embeddings_regularizer=l2(1e-6),
                                        embeddings_constraint='NonNeg',
                                        name='User-Embedding')(user_input)
user_vec = keras.layers.Flatten(name='FlattenUsers')(user_embedding)
```
In both embedding layers, we use an `l2` regularizer and a non-negative constraint to reduce overfitting and to avoid negative values in the embeddings. The `L2` regularizer adds the "squared magnitude" of the weights as a penalty term to the loss function:

$$L2_{penalty} = \lambda\sum_{i=1}^{n}w_i^2$$

where $\lambda$ is the regularization factor (1e-6 here) and $w_i$ are the embedding weights.
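To make the penalty concrete, here is a tiny numerical sketch (the weight values are made up) of what `l2(1e-6)` adds to the loss for a single embedding vector:

```python
import numpy as np

l2_factor = 1e-6                        # same factor as l2(1e-6) above
w = np.array([0.3, -1.2, 0.7])          # a made-up embedding vector
penalty = l2_factor * np.sum(w ** 2)    # "squared magnitude" times the factor
print(penalty)                          # 2.02e-06
```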
Concatenate and MLP
The idea of the MLP part is to let a deep neural network learn the interaction between users and items.
```python
concat = keras.layers.concatenate([movie_vec, user_vec])

mlp = concat
for i in range(3, -1, -1):
    if i == 0:
        # final layer: single sigmoid unit for interaction probability
        mlp = Dense(8**i, activation='sigmoid',
                    kernel_initializer='glorot_normal', name="output")(mlp)
    else:
        # hidden layers of decreasing width: 64, 32, 16
        mlp = Dense(8 * 2**i, activation='relu',
                    kernel_initializer='he_uniform')(mlp)
        if i > 2:
            mlp = BatchNormalization()(mlp)
        mlp = Dropout(0.2)(mlp)

model = Model(inputs=[user_input, movie_input], outputs=[mlp])
model.compile(optimizer='adadelta', loss='binary_crossentropy',
              metrics=['binary_accuracy'])
```
We build the `mlp` part with several layers. By default, Keras' `Dense` layer is initialized with `glorot_uniform`. Nevertheless, you can set a suitable initializer to improve training. The last layer uses a `sigmoid` activation for binary classification; the hidden layers use `relu` and are initialized with `he_uniform`.
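If you want to override the defaults explicitly, you can pass initializer objects instead of strings. A minimal sketch (the layer width and seed are arbitrary, not values taken from the model above):

```python
from tensorflow.keras.layers import Dense
from tensorflow.keras.initializers import RandomUniform, he_uniform

# Pass an initializer object instead of the default 'glorot_uniform' string
relu_layer = Dense(64, activation='relu',
                   kernel_initializer=he_uniform(seed=42))
uniform_layer = Dense(64, activation='relu',
                      kernel_initializer=RandomUniform(minval=-0.05, maxval=0.05))
```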
Finally, we compile the model with the `adadelta` optimizer, a stochastic gradient descent method based on an adaptive learning rate per dimension, which addresses two drawbacks (see the sketch after this list):
- the continual decay of learning rates throughout training
- the need for a manually selected global learning rate 😀
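
If you prefer to configure the optimizer explicitly rather than passing the `'adadelta'` string, here is a minimal sketch (the values shown are the Keras defaults, not tuned for this model):

```python
from tensorflow.keras.optimizers import Adadelta

adadelta = Adadelta(learning_rate=0.001, rho=0.95, epsilon=1e-7)   # Keras defaults
model.compile(optimizer=adadelta,
              loss='binary_crossentropy',
              metrics=['binary_accuracy'])
```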
Data generator
To generate the data for training, we use a data generator. You can follow this post for more details.
```python
class DataGenerator(Sequence):
    def __init__(self, dataframe, batch_size=16, shuffle=True):
        'Initialization'
        self.batch_size = batch_size
        self.dataframe = dataframe
        self.shuffle = shuffle
        self.indices = dataframe.index
        print(len(self.indices))
        self.on_epoch_end()

    def __len__(self):
        'Denotes the number of batches per epoch'
        return math.floor(len(self.dataframe) / self.batch_size)

    def __getitem__(self, index):
        'Generate one batch of data'
        # Generate indexes of the batch
        idxs = [i for i in range(index*self.batch_size, (index+1)*self.batch_size)]

        # Find list of IDs
        list_IDs_temp = [self.indices[k] for k in idxs]

        # Generate data
        User = self.dataframe.iloc[list_IDs_temp, [0]].to_numpy().reshape(-1)
        Item = self.dataframe.iloc[list_IDs_temp, [1]].to_numpy().reshape(-1)
        rating = self.dataframe.iloc[list_IDs_temp, [2]].to_numpy().reshape(-1)

        return [User, Item], [rating]

    def on_epoch_end(self):
        'Updates indexes after each epoch'
        self.indices = np.arange(len(self.dataframe))
        if self.shuffle == True:
            np.random.shuffle(self.indices)
```
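Before training, it is worth fetching one batch from the generator as a sanity check. A small sketch, assuming `train` is the training dataframe:

```python
gen = DataGenerator(train, batch_size=32)

(users, items), (ratings,) = gen[0]              # first batch
print(users.shape, items.shape, ratings.shape)   # expect (32,), (32,), (32,)
print(len(gen))                                  # number of batches per epoch
```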
Train with early stopping callback
Good, it is time to train the model. Another method to reduce overfitting is early stopping. We create the callback and pass it when fitting the model.
```python
early_stop = EarlyStopping(monitor='val_output_loss', min_delta=0.0001, patience=10)

traindatagenerator = DataGenerator(train, shuffle=False)
val_generator = DataGenerator(val, shuffle=False)

history = model.fit(traindatagenerator,
                    validation_data=val_generator,
                    epochs=200, verbose=2,
                    callbacks=[early_stop])
```
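The introduction also mentioned a dynamic learning rate. One way to get that is a `ReduceLROnPlateau` callback alongside the early stopping one; this is a hedged sketch, not necessarily the exact setup used for the results below:

```python
from tensorflow.keras.callbacks import ReduceLROnPlateau

# Halve the learning rate when the validation loss stops improving for 3 epochs
reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.5,
                              patience=3, min_lr=1e-5)

history = model.fit(traindatagenerator,
                    validation_data=val_generator,
                    epochs=200, verbose=2,
                    callbacks=[early_stop, reduce_lr])
```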
And finally, we have the result:

Wrapping up
The MLP version of collaborative filtering shows very promising results compared to classical matrix factorization. In a future post, we will fuse the two models, i.e. MF and MLP, into a hybrid model, also known as Neural Collaborative Filtering.