
MLP for implicit binary collaborative filtering

In this post, we demonstrate a Keras implementation of implicit collaborative filtering. We also introduce some techniques to improve the model's performance, including weight initialization, a dynamic learning rate, and an early stopping callback.

The implicit data

For demonstration purposes, we use a dataset generated with negative sampling, using the technique described in this post. The data contain user_id, item_id, and an interaction flag (0 = no interaction, 1 = interaction). The transformed dataset looks like this:

user_id    item_id    rating
…          …          …

2,000,000 rows × 3 columns
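The snippets below refer to this table as a pandas DataFrame named dataset. A minimal loading sketch is shown here; the file name is an assumption, and only the three columns listed above are expected.

import pandas as pd

# Hypothetical path: the transformed interactions (positives plus sampled negatives)
# are assumed to be stored as a CSV with the three columns shown above.
dataset = pd.read_csv('implicit_ratings.csv')
print(dataset.shape)             # expected: (2000000, 3)
print(list(dataset.columns))     # ['user_id', 'item_id', 'rating']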

The MLP collaborative filtering model

The model we are going to build has two inputs, users and items. The output is a value in (0, 1) indicating the likelihood of interaction. The model structure is shown below.

Multi-layer perceptron for collaborative filtering

To implement this, we first import relevant libraries.

%tensorflow_version 2.x
import tensorflow as tf
import numpy as np
from tensorflow import keras
from tensorflow.keras import Model
from tensorflow.keras.layers import Input, Dense, Concatenate, Embedding, Dropout, BatchNormalization
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.utils import Sequence
from tensorflow.keras.regularizers import l2, l1, l1_l2
from tensorflow.keras.initializers import RandomUniform, he_normal, he_uniform
from tensorflow.keras.callbacks import EarlyStopping
import math

Inputs and Embedding

Next, we create the input and embedding layers for users and items using Keras' Input and Embedding layers.

n_users, n_movies = len(dataset.user_id.unique()), len(dataset.item_id.unique())
n_latent_factors = 32   # number of latent factors; 32 is an example value
# example choice (not fixed by the post): a small non-negative uniform init,
# consistent with the NonNeg constraint applied below
embedding_init = RandomUniform(minval=0., maxval=0.05)
#movie input and embedding
movie_input = keras.layers.Input(shape=[1], name='Item')
movie_embedding = keras.layers.Embedding(n_movies, n_latent_factors,
                                          embeddings_initializer=embedding_init,
                                          embeddings_regularizer=l2(1e-6),
                                          embeddings_constraint='NonNeg',
                                          name='Movie-Embedding')(movie_input)
movie_vec = keras.layers.Flatten(name='FlattenMovies')(movie_embedding)
#user input and embedding
user_input = keras.layers.Input(shape=[1], name='User')
user_embedding = keras.layers.Embedding(n_users, n_latent_factors,
                                          embeddings_initializer=embedding_init,
                                          embeddings_regularizer=l2(1e-6),
                                          embeddings_constraint='NonNeg',
                                          name='User-Embedding')(user_input)
user_vec = keras.layers.Flatten(name='FlattenUsers')(user_embedding)

In both embedding layers, we use an L2 regularizer and a non-negative constraint to reduce overfitting and to avoid negative values in the embeddings.
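As a quick illustration of what the non-negative constraint does, the snippet below (with made-up weight values) shows that NonNeg simply zeroes out any negative entries after a weight update:

import tensorflow as tf
from tensorflow.keras.constraints import NonNeg

weights = tf.constant([[0.3], [-0.2], [1.5]])   # toy embedding weights
print(NonNeg()(weights).numpy())                # [[0.3], [0.], [1.5]]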

The L2 regularizer adds the “squared magnitude” of the coefficients as a penalty term to the loss function.

    \[l2_{penalty} = l2\sum_{i=0}^{n}x_i^2\]
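To see the penalty in action, here is a small check, with arbitrary example weights, of the l2(1e-6) regularizer used in the embedding layers above:

import tensorflow as tf
from tensorflow.keras.regularizers import l2

reg = l2(1e-6)                            # same factor as in the embedding layers
weights = tf.constant([0.5, -1.0, 2.0])   # arbitrary example weights
penalty = reg(weights)                    # 1e-6 * (0.25 + 1.0 + 4.0)
print(float(penalty))                     # 5.25e-06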

Concatenate and MLP

The idea of the MLP is to let a deep neural network learn the interaction between users and items.

concat = keras.layers.concatenate([movie_vec, user_vec])
mlp = concat
# hidden layers of 64, 32 and 16 units, followed by a single sigmoid output unit
for i in range(3, -1, -1):
    if i == 0:
        mlp = Dense(8**i, activation='sigmoid', kernel_initializer='glorot_normal',
                    name="output")(mlp)
    else:
        mlp = Dense(8*2**i, activation='relu', kernel_initializer='he_uniform')(mlp)
        if i > 2:
            # batch normalization and dropout only after the widest (first) hidden layer
            mlp = BatchNormalization()(mlp)
            mlp = Dropout(0.2)(mlp)
model = Model(inputs=[user_input, movie_input], outputs=[mlp])
model.compile(optimizer='adadelta', loss='binary_crossentropy', metrics=['binary_accuracy'])

We build the MLP part with several layers. By default, Keras' Dense layer is initialized with glorot_uniform; nevertheless, you can set a suitable initializer to improve training. The last layer uses a sigmoid activation for binary classification, while the hidden layers use relu and are initialized with he_uniform.
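For example, if you want to experiment with a different initializer, you can pass either a string or an instance; the layer sizes and value ranges below are illustrative only:

from tensorflow.keras.layers import Dense
from tensorflow.keras.initializers import RandomUniform, he_normal

# equivalent ways to override the default glorot_uniform initializer
layer_a = Dense(64, activation='relu', kernel_initializer='he_normal')
layer_b = Dense(64, activation='relu', kernel_initializer=he_normal())
layer_c = Dense(64, activation='relu',
                kernel_initializer=RandomUniform(minval=-0.05, maxval=0.05))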

Finally, we compile the model with the Adadelta optimizer, a stochastic gradient descent method based on an adaptive learning rate per dimension, which addresses two drawbacks:

  • the continual decay of learning rates throughout training
  • the need for a manually selected global learning rate
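If you prefer the learning rate under your control rather than relying on the 'adadelta' string, you can pass an optimizer instance. This is only a sketch: learning_rate=1.0 matches the form in the original Adadelta paper and is not necessarily Keras' default.

from tensorflow.keras.optimizers import Adadelta

model.compile(optimizer=Adadelta(learning_rate=1.0, rho=0.95),
              loss='binary_crossentropy',
              metrics=['binary_accuracy'])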

Data generator

To generate data for training, we use a data generator. You can follow this post for more details.

class DataGenerator(Sequence):
    def __init__(self, dataframe, batch_size=16, shuffle=True):
        'Initialization'
        self.batch_size = batch_size
        self.dataframe = dataframe
        self.shuffle = shuffle
        self.indices = dataframe.index
        self.on_epoch_end()
    def __len__(self):
        'Denotes the number of batches per epoch'
        return math.floor(len(self.dataframe) / self.batch_size)
    def __getitem__(self, index):
        'Generate one batch of data'
        # Generate indexes of the batch
        idxs = [i for i in range(index*self.batch_size,(index+1)*self.batch_size)]
        # Find list of IDs
        list_IDs_temp = [self.indices[k] for k in idxs]
        # Generate data
        User = self.dataframe.iloc[list_IDs_temp,[0]].to_numpy().reshape(-1)
        Item = self.dataframe.iloc[list_IDs_temp,[1]].to_numpy().reshape(-1)
        rating = self.dataframe.iloc[list_IDs_temp,[2]].to_numpy().reshape(-1)
        #print("u,i,r:", [User, Item],[y])
        return [User, Item],[rating]
    def on_epoch_end(self):
        'Updates indexes after each epoch'
        self.indices = np.arange(len(self.dataframe))
        if self.shuffle == True:
            np.random.shuffle(self.indices)

Train with early stopping callback

Good, it is time to train the model. Another method to reduce overfitting is early stopping. We create the callback and pass it when fitting the model.

early_stop = EarlyStopping(monitor='val_loss', min_delta=0.0001, patience=10)
# train / val are the training and validation splits of the dataset
train_generator = DataGenerator(train, shuffle=False)
val_generator = DataGenerator(val, shuffle=False)
history = model.fit(train_generator, validation_data=val_generator,
                    epochs=200, verbose=2, callbacks=[early_stop])
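The introduction also mentioned a dynamic learning rate. One way to get it is to add a ReduceLROnPlateau callback next to the early-stopping one; the factor, patience, and min_lr values below are illustrative, not tuned.

from tensorflow.keras.callbacks import ReduceLROnPlateau

# halve the learning rate whenever the validation loss stops improving
reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=5, min_lr=1e-5)
history = model.fit(train_generator, validation_data=val_generator,
                    epochs=200, verbose=2, callbacks=[early_stop, reduce_lr])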

Training runs for up to 200 epochs and stops early once the validation loss stops improving, with the losses and accuracies recorded in the history object.
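A minimal sketch for inspecting that training history (matplotlib assumed to be installed):

import matplotlib.pyplot as plt

plt.plot(history.history['loss'], label='train loss')
plt.plot(history.history['val_loss'], label='validation loss')
plt.xlabel('epoch')
plt.ylabel('binary cross-entropy')
plt.legend()
plt.show()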

Wrapping up

The MLP version of collaborative filtering shows very promising results compared to classical matrix factorization. In a future post, we will fuse the two models, i.e. MF and MLP, into a hybrid one, also known as Neural Collaborative Filtering.
