A.I, Data and Software Engineering

Build a simple recommender system with matrix factorization

We will build a recommender system that recommends the top n items for a user using matrix factorization, one of the three most widely used techniques for recommender systems.

Matrix factorization

Suppose we have a rating matrix of \(m\) users and \(n\) items. The rating that user \(u_i\) gives to item \(i_j\) is \(r_{ij}\).

Similar to PCA, the matrix factorization (MF) technique decomposes a (very) large \(m \times n\) matrix into smaller matrices (e.g. \(m \times k\) and \(k \times n\)). While PCA requires a matrix with no missing values, MF can cope with them, for example by first filling in the missing values or by fitting only the ratings that were actually observed.

Latent factors in MF

The two decomposed matrices have smaller dimensions than the original one. Before applying MF, you need to choose the dimension k of the decomposed matrices; k is known as the number of latent factors.

The intuition is that some unknown factors (k of them) influence how users rate items. The good thing is that we do not have to say what exactly these factors are. MF uses the value of k to generate two matrices, known as the user and item embedding matrices.
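To make the shapes concrete, here is a minimal NumPy sketch. It borrows the sizes used later in this post (943 users, 1682 items, k = 20), but the embedding values are random placeholders; in practice MF learns them from the observed ratings.

```python
import numpy as np

# Sizes taken from this post; the values below are random placeholders,
# not learnt embeddings.
m, n, k = 943, 1682, 20

rng = np.random.default_rng(0)
user_emb = rng.normal(size=(m, k))   # one k-dimensional row per user
item_emb = rng.normal(size=(n, k))   # one k-dimensional row per item

# Multiplying the two small matrices yields an m x n matrix of predicted
# ratings, where r_hat[i, j] is the dot product of user i and item j.
r_hat = user_emb @ item_emb.T

print(r_hat.shape)                     # (943, 1682)
print(user_emb.size + item_emb.size)   # parameters stored by the two factors
print(m * n)                           # cells in the full rating matrix
```

The two factor matrices together hold far fewer numbers than the full rating matrix, which is the point of choosing a small k.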

MF with Keras

We implement MF with Keras and TensorFlow 2.0 on the MovieLens dataset. You can refer to this article for downloading and processing MovieLens. In this article, I will reuse some scripts from it for downloading the dataset.

The datasets' URLs are as follows:

Next, we extract and load data to a data frame:
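As a rough sketch of the loading step, assuming the extracted ratings file is tab-separated with no header row (the helper name `load_ratings` is hypothetical, and the column names follow the table in this post). An in-memory sample of five rows stands in for the real file so the snippet runs by itself.

```python
import io

import pandas as pd

# Column names as they appear in the data frame shown in this post.
COLUMNS = ["user_id", "item_id", "rating", "timestamp"]


def load_ratings(source):
    """Read a tab-separated ratings file (a path or a file-like object)."""
    return pd.read_csv(source, sep="\t", names=COLUMNS)


# Stand-in for the extracted file: five rows in the same format.
sample = io.StringIO(
    "196\t242\t3\t881250949\n"
    "186\t302\t3\t891717742\n"
    "22\t377\t1\t878887116\n"
    "244\t51\t2\t880606923\n"
    "166\t346\t1\t886397596\n"
)

df = load_ratings(sample)
print(df.head())
```

With the real dataset you would pass the path of the extracted ratings file instead of the in-memory sample.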

   user_id  item_id  rating  timestamp
0      196      242       3  881250949
1      186      302       3  891717742
2       22      377       1  878887116
3      244       51       2  880606923
4      166      346       1  886397596

The dataset contains 943 users and 1682 items. We reindex the users and items so that they start from 0 (the first index) instead of 1, i.e. every original index is reduced by one.
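A minimal sketch of the reindexing, assuming the column names `user_id` and `item_id` from the table above. The five-row frame is a stand-in for the full dataset.

```python
import pandas as pd

# Five sample rows standing in for the full ratings data frame.
df = pd.DataFrame({
    "user_id": [196, 186, 22, 244, 166],
    "item_id": [242, 302, 377, 51, 346],
    "rating": [3, 3, 1, 2, 1],
})

# Shift the 1-based ids so that they start at 0.
df["user_id"] = df["user_id"] - 1
df["item_id"] = df["item_id"] - 1

# Rows needed in each embedding table: largest zero-based id plus one.
# On this sample the values are small; on the full dataset they should
# match the user and item counts reported in this post.
n_users = int(df["user_id"].max()) + 1
n_items = int(df["item_id"].max()) + 1

print(df.head())
print(n_users, n_items)
```

Zero-based ids matter because an embedding layer looks rows up by integer position, so the largest id must be smaller than the number of rows.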

   user_id  item_id  rating  timestamp
0      195      241       3  881250949
1      185      301       3  891717742
2       21      376       1  878887116
3      243       50       2  880606923
4      165      345       1  886397596

Next, we create train and test sets with 80% and 20% of the original dataset respectively.
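One way to make such a split with pandas alone is sketched below. The toy data frame and the seed are assumptions for illustration, and the original script may well have used a different helper.

```python
import pandas as pd

# Toy stand-in for the ratings data frame (10 rows).
df = pd.DataFrame({
    "user_id": range(10),
    "item_id": range(10),
    "rating": [3, 4, 1, 5, 2, 3, 4, 1, 5, 2],
})

# Randomly pick 80% of the rows for training; the rest form the test set.
train = df.sample(frac=0.8, random_state=42)   # seed chosen arbitrarily
test = df.drop(train.index)

print(len(train), len(test))
```

Fixing `random_state` keeps the split reproducible between runs, which makes the error figures comparable across settings.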

Let's say we set the number of latent factors to 20. You may try other values, e.g. 3, 5 or 10.

We compile the model and also monitor two error types, namely mean absolute error (MAE) and mean squared error (MSE).

The model is summarized as below.

Visualise the model using Keras utils’ plot_model:

Great tool! Now it is time to train our model and log the history:
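The steps above (choosing k, building, compiling and training the model) could look roughly like the sketch below. It is not the original script: the layer names, the Adam optimizer, the MSE loss, the batch size and the number of epochs are all assumptions, and random ratings stand in for the real training split so the snippet runs by itself.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

n_users, n_items = 943, 1682    # dataset sizes reported in this post
n_latent_factors = 20           # the k chosen above

# Each input is a single integer id.
user_input = keras.Input(shape=(1,), name="user")
item_input = keras.Input(shape=(1,), name="item")

# One learnable k-dimensional vector per user and per item.
user_vec = layers.Flatten()(
    layers.Embedding(n_users, n_latent_factors, name="user_embedding")(user_input)
)
item_vec = layers.Flatten()(
    layers.Embedding(n_items, n_latent_factors, name="item_embedding")(item_input)
)

# The predicted rating is the dot product of the two vectors.
rating = layers.Dot(axes=1)([user_vec, item_vec])

model = keras.Model(inputs=[user_input, item_input], outputs=rating)
model.compile(optimizer="adam", loss="mean_squared_error", metrics=["mae", "mse"])
model.summary()

# Random stand-in for the training split (assumption: real ratings go here).
rng = np.random.default_rng(0)
users = rng.integers(0, n_users, size=(1000, 1))
items = rng.integers(0, n_items, size=(1000, 1))
ratings = rng.integers(1, 6, size=(1000, 1)).astype("float32")

history = model.fit([users, items], ratings, epochs=2, batch_size=128, verbose=0)
print(history.history["loss"])
```

`history.history` holds one value per epoch for the loss and for each monitored metric, which is what you would plot to check convergence.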

We now evaluate our model. First, we generate the ratings for each user and item pair on the test set and then we calculate the error.
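The two error measures can be computed by hand as in this small sketch. The actual and predicted ratings here are made-up numbers, not results from the model.

```python
import numpy as np

# Made-up values for illustration only.
y_true = np.array([3.0, 3.0, 1.0, 2.0, 1.0])   # actual test ratings
y_pred = np.array([3.5, 2.0, 1.5, 2.0, 3.0])   # ratings predicted by a model

errors = y_pred - y_true
mae = np.mean(np.abs(errors))   # mean absolute error
mse = np.mean(errors ** 2)      # mean squared error

print(f"MAE = {mae:.3f}, MSE = {mse:.3f}")   # MAE = 0.800, MSE = 1.100
```

MSE penalises the single large miss (a prediction of 3.0 for a rating of 1.0) much more heavily than MAE does, so it is worth monitoring both.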

We have some results from different settings. Remember that the errors are measured on the [1, ..., 5] rating scale.

Learnt Embedding

We can now obtain the two embedding matrices, one for users and one for items.
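A minimal sketch of reading the weights back, assuming the embedding layer was given the hypothetical name `item_embedding`. An untrained single-layer model stands in for the trained one so the snippet runs by itself, so its numbers will not match the summary in this post. That summary counts 1683 rows, which suggests the original layer had one extra row; the sketch simply uses the 1682 items reported earlier.

```python
import pandas as pd
from tensorflow import keras
from tensorflow.keras import layers

# Untrained stand-in model; in practice this is the trained MF model.
model = keras.Sequential([
    keras.Input(shape=(1,)),
    layers.Embedding(1682, 20, name="item_embedding"),
])

# An Embedding layer has a single weight array: one row per id.
item_emb = model.get_layer("item_embedding").get_weights()[0]
print(item_emb.shape)   # (1682, 20)

# Per-dimension statistics, shown for the first five latent factors.
summary = pd.DataFrame(item_emb).describe()
print(summary.iloc[:, :5])
```

The same call with the user embedding layer's name returns the user matrix.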

                 0            1            2            3            4
count  1683.000000  1683.000000  1683.000000  1683.000000  1683.000000
mean      0.774399     0.679642    -0.713351     0.731147     0.647028
std       0.504034     0.491500     0.561679     0.464591     0.519102
min      -2.043083    -0.980162    -3.440306    -1.761205    -1.063968
25%       0.441313     0.367185    -1.112636     0.425561     0.278499
50%       0.772326     0.683421    -0.722607     0.723500     0.656169
75%       1.096993     1.008840    -0.337775     1.020044     1.019403
max       2.922819     2.663551     1.664768     2.312259     2.171595

How to Recommend?

I believe beginners may wonder why we are creating these matrices. What is the use of the matrices we have spent so much time understanding?

Recommending the top n items to a user \(u_i\) is now simple. We take the embedding vector of the user, compute its dot product with the embedding vectors of all movies, and keep the n largest values. The following code returns the top 5 most relevant movie ids.

Now, we recommend 5 movies (ids) for user_id=1:
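A minimal NumPy sketch of this ranking step. The function name `recommend` is hypothetical, and random matrices stand in for the learnt embeddings, so the returned ids are not meaningful recommendations.

```python
import numpy as np


def recommend(user_id, user_emb, item_emb, top_n=5):
    """Return the ids of the top_n items with the highest predicted score."""
    scores = item_emb @ user_emb[user_id]      # one dot product per item
    return np.argsort(scores)[::-1][:top_n]    # ids sorted by descending score


# Random stand-ins for the learnt embedding matrices (assumption).
rng = np.random.default_rng(0)
user_emb = rng.normal(size=(943, 20))
item_emb = rng.normal(size=(1682, 20))

top5 = recommend(1, user_emb, item_emb, top_n=5)
print(top5)   # five zero-based movie ids
```

As a simplification, this ranks every movie, including those the user has already rated; a real system would usually filter those out. The ids are zero-based, so add one to map them back to the original MovieLens ids.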

Conclusion

This post revisits a simple recommender system with matrix factorization using Keras. Note, however, that the learnt embedding matrices contain some negative values. Some applications require the learnt embeddings to be non-negative, which we will address in another post.
