A.I, Data and Software Engineering

Create bipartite graph from a rating matrix


Deep learning on graphs has been gaining attention recently, so this article quickly demonstrates how to use networkx to turn a rating matrix, such as the MovieLens dataset, into graph data.

The rating data

We use rating data from MovieLens. The ratings are loaded into rdata, which is a pandas DataFrame. A separate article covers how to preprocess the MovieLens data.

After processing, the rdata should look like this:

   userId  movieId  rating  timestamp
0     196      242       3  881250949
1     186      302       3  891717742
2      22      377       1  878887116
3     244       51       2  880606923
4     166      346       1  886397596
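As a rough sketch of the loading step, the snippet below reads five sample rows laid out like the MovieLens 100k u.data file (tab-separated, no header row). The inline sample and the column names passed to read_csv are assumptions for illustration; with the real dataset, the file path would replace the in-memory buffer.

```python
import io

import pandas as pd

# Five sample rows in the assumed layout of MovieLens 100k's u.data:
# user id, item id, rating, timestamp, separated by tabs, no header row.
sample = (
    "196\t242\t3\t881250949\n"
    "186\t302\t3\t891717742\n"
    "22\t377\t1\t878887116\n"
    "244\t51\t2\t880606923\n"
    "166\t346\t1\t886397596\n"
)

# Column names are chosen to match the table above (an assumption).
columns = ["userId", "movieId", "rating", "timestamp"]

# With the real dataset, pass the path to u.data instead of the buffer.
rdata = pd.read_csv(io.StringIO(sample), sep="\t", names=columns)
print(rdata.head())
```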

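However, user IDs and movie IDs are both plain integers, so the same number can refer to a user and to a movie. In a graph, those two would collapse into a single node. To keep them apart, we add a prefix to each ID as follows.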

rdata['userId'] = 'u' + rdata['userId'].astype(str)
rdata['movieId'] = 'i' + rdata['movieId'].astype(str)
rdata.head()
  userId movieId  rating  timestamp
0   u196    i242       3  881250949
1   u186    i302       3  891717742
2    u22    i377       1  878887116
3   u244     i51       2  880606923
4   u166    i346       1  886397596
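To see why the prefixes matter, here is a small sketch on made-up ratings. The toy DataFrame and the build_graph helper are hypothetical and only serve the illustration: without prefixes, user 1 and movie 1 become the same node, so the graph ends up with fewer nodes than there are users plus movies.

```python
import networkx as nx
import pandas as pd

# Toy ratings (made up): user 1 rated movies 1 and 2, user 2 rated movie 1.
toy = pd.DataFrame({"userId": [1, 1, 2], "movieId": [1, 2, 1], "rating": [5, 4, 3]})


def build_graph(frame):
    """Build an undirected weighted graph from (userId, movieId, rating) rows."""
    graph = nx.Graph()
    graph.add_weighted_edges_from(
        frame[["userId", "movieId", "rating"]].itertuples(index=False, name=None)
    )
    return graph


# Without prefixes, user 1 and movie 1 share the node key 1.
raw_graph = build_graph(toy)

# With prefixes, users and movies stay separate.
prefixed = toy.copy()
prefixed["userId"] = "u" + prefixed["userId"].astype(str)
prefixed["movieId"] = "i" + prefixed["movieId"].astype(str)
prefixed_graph = build_graph(prefixed)

print(raw_graph.number_of_nodes(), prefixed_graph.number_of_nodes())
```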

Transform the matrix to a bipartite graph

We use networkx to create an undirected, weighted bipartite graph: users form one node set, movies form the other, and each rating becomes a weighted edge.

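import networkx as nx
from networkx import *  # later snippets call info(), radius(), etc. without the nx. prefix
# Create an undirected graph; the name appears in the summary printed below
G = nx.Graph(name='MovieLens Bipartite')
# Add user nodes (bipartite=0) and movie nodes (bipartite=1)
G.add_nodes_from(rdata.userId, bipartite=0)
G.add_nodes_from(rdata.movieId, bipartite=1)
# Add one edge per rating, using the rating as the edge weight
G.add_weighted_edges_from([(uId, mId, rating) for (uId, mId, rating)
              in rdata[['userId', 'movieId', 'rating']].to_numpy()])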
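As a quick sanity check, the stored ratings and the two node sets can be read back from the graph. The sketch below rebuilds a tiny graph from two of the sample rows so that it runs on its own; the variable names weight, users, and movies are only illustrative.

```python
import networkx as nx
import pandas as pd

# Two sample ratings from the table above, already prefixed.
rdata = pd.DataFrame(
    {"userId": ["u196", "u186"], "movieId": ["i242", "i302"], "rating": [3, 3]}
)

G = nx.Graph()
G.add_nodes_from(rdata.userId, bipartite=0)
G.add_nodes_from(rdata.movieId, bipartite=1)
G.add_weighted_edges_from(
    rdata[["userId", "movieId", "rating"]].itertuples(index=False, name=None)
)

# The rating is stored under the "weight" edge attribute.
weight = G["u196"]["i242"]["weight"]

# The "bipartite" node attribute recovers each side of the graph.
users = {n for n, side in G.nodes(data="bipartite") if side == 0}
movies = {n for n, side in G.nodes(data="bipartite") if side == 1}

print(weight, sorted(users), sorted(movies))
```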

Get graph properties

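First, we can get the basic information about the graph:

# Note: info() was removed in NetworkX 3.0; on newer versions, print(G) gives a short summary instead.
print(info(G))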
#Name: MovieLens Bipartite
#Type: Graph
#Number of nodes: 2625
#Number of edges: 100000
#Average degree:  76.1905
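In case info is unavailable in the installed NetworkX version (it was removed in 3.0), the same figures can be computed from the graph directly. This is a minimal sketch: summarize is a hypothetical helper, and the toy graph is made up so the snippet runs on its own.

```python
import networkx as nx


def summarize(graph):
    """Return node count, edge count, and average degree (2 * edges / nodes)."""
    n_nodes = graph.number_of_nodes()
    n_edges = graph.number_of_edges()
    avg_degree = 2 * n_edges / n_nodes if n_nodes else 0.0
    return n_nodes, n_edges, avg_degree


# Toy graph (made up): two users, two movies, three ratings.
toy = nx.Graph(name="Toy Bipartite")
toy.add_weighted_edges_from([("u1", "i1", 5), ("u1", "i2", 4), ("u2", "i1", 3)])

n_nodes, n_edges, avg_degree = summarize(toy)
print(f"Number of nodes: {n_nodes}")
print(f"Number of edges: {n_edges}")
print(f"Average degree: {avg_degree:.4f}")
```

Each undirected edge contributes to the degree of two nodes, which is where the factor of 2 comes from.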

We can now check whether the graph is directed, a multigraph, or bipartite.

G.is_directed(), G.is_multigraph(), is_bipartite(G)
#(False, False, True)
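The bipartite check only says that some valid two-set split exists. To confirm that the users themselves form one side of the split, the bipartite module offers is_bipartite_node_set. A small sketch on a made-up graph, which also shows that a single user-to-user edge can break the property:

```python
import networkx as nx
from networkx.algorithms import bipartite

# Toy graph (made up): every edge joins a user to a movie.
toy = nx.Graph()
toy.add_edges_from([("u1", "i1"), ("u1", "i2"), ("u2", "i1")])

users = {n for n in toy if n.startswith("u")}

# Check that the user nodes are exactly one side of the bipartition.
users_ok = bipartite.is_bipartite_node_set(toy, users)

# Adding a user-user edge creates the odd cycle u1-u2-i1.
toy.add_edge("u1", "u2")
still_bipartite = nx.is_bipartite(toy)

print(users_ok, still_bipartite)
```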

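Next, we can compute distance-based properties (radius, diameter, eccentricity, center, and periphery) and the density. The distance-based functions require a connected graph and raise an error otherwise.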

print("radius: %d" % radius(G))
#radius: 3
print("diameter: %d" % diameter(G))
#diameter: 5
print("eccentricity: %s" % eccentricity(G))
#eccentricity: {'u196': 4, 'u186': 4, 'u22': 4,...}
print("center: %s" % center(G))
#center: ['u6', 'u62', 'u286', 'u200', 'u303',...]
print("periphery: %s" % periphery(G))
#periphery: ['u50', 'u97', 'u284', 'u242',...]
print("density: %s" % density(G))
#density: 0.029036004645760744
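Note that density treats G as a general graph, so it divides the edge count by all n(n-1)/2 possible node pairs, including user-user and movie-movie pairs that can never be connected here. The bipartite module has its own density, which divides by (number of users) x (number of movies) instead. A sketch on a made-up graph to show the difference:

```python
import networkx as nx
from networkx.algorithms import bipartite

# Toy graph (made up): 2 users, 3 movies, 4 ratings.
toy = nx.Graph()
toy.add_edges_from([("u1", "i1"), ("u1", "i2"), ("u2", "i1"), ("u2", "i3")])
users = {n for n in toy if n.startswith("u")}

# General density: 2 * 4 / (5 * 4)
general_density = nx.density(toy)

# Bipartite density: 4 / (2 * 3)
bipartite_density = bipartite.density(toy, users)

print(general_density, bipartite_density)
```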

Visualize the graph

For better visualization, we first assign one of two colours to each node: yellow for users and green for movies.

color_map = []
for node in G.nodes:
  if str(node).startswith('u'):
    color_map.append('yellow')
  else:
    color_map.append('green')

After that, we use networkx to draw the graph with two layouts: spring and bipartite.

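import matplotlib.pyplot as plt

# Compute node positions with the force-directed (spring) layout
pos = nx.spring_layout(G)
plt.figure(3, figsize=(12, 12))
nx.draw(G, pos, node_color=color_map)
plt.show()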
Figure: The graph drawn with the spring layout.
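With thousands of nodes and 100,000 edges, the full drawing may be slow and hard to read. One possible workaround, not part of the original workflow, is to draw only a few users and the movies they rated. In the sketch below, sample_user_subgraph is a hypothetical helper and the toy graph is made up; it relies on the 'u' prefix introduced earlier.

```python
import random

import networkx as nx


def sample_user_subgraph(graph, n_users, seed=0):
    """Return the subgraph induced by n_users random users and the movies they rated."""
    users = sorted(n for n in graph if str(n).startswith("u"))
    rng = random.Random(seed)
    chosen = rng.sample(users, min(n_users, len(users)))
    keep = set(chosen)
    for user in chosen:
        keep.update(graph[user])  # neighbours of a user are movies
    return graph.subgraph(keep).copy()


# Toy graph (made up): three users, three movies.
toy = nx.Graph()
toy.add_weighted_edges_from(
    [("u1", "i1", 5), ("u1", "i2", 4), ("u2", "i1", 3), ("u3", "i3", 2)]
)

H = sample_user_subgraph(toy, n_users=2)
colors = ["yellow" if str(n).startswith("u") else "green" for n in H.nodes]

# H and colors can then be passed to nx.spring_layout and nx.draw as in the snippet above.
print(H.number_of_nodes(), H.number_of_edges())
```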

Alternatively, we can use the classic two-column plot for a bipartite graph, as follows.

X, Y = bipartite.sets(G)
pos = dict()
pos.update( (n, (1, i)) for i, n in enumerate(X) ) # put nodes from X at x=1
pos.update( (n, (2, i)) for i, n in enumerate(Y) ) # put nodes from Y at x=2
nx.draw(G, pos=pos, node_color=color_map)
plt.show()
Figure: Bipartite graph of MovieLens 100k.
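networkx also ships a bipartite_layout function that computes two-column positions directly. A sketch on a made-up graph follows; selecting the users by their 'u' prefix is an assumption that matches the IDs built earlier, and it avoids bipartite.sets, which raises an error when the graph is disconnected.

```python
import networkx as nx

# Toy graph (made up): 2 users, 3 movies.
toy = nx.Graph()
toy.add_edges_from([("u1", "i1"), ("u1", "i2"), ("u2", "i1"), ("u2", "i3")])
users = [n for n in toy if n.startswith("u")]

# Place the users in one column and the remaining nodes (movies) in the other.
pos = nx.bipartite_layout(toy, users)

# Collect the distinct x coordinates used by each side.
user_xs = {round(float(pos[n][0]), 9) for n in users}
movie_xs = {round(float(pos[n][0]), 9) for n in toy if n not in users}
print(user_xs, movie_xs)

# pos can be passed to nx.draw(toy, pos=pos, ...) in place of the hand-built dict.
```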

Now, the graph is ready for your learning algorithm.
