A.I, Data and Software Engineering

Sparse Matrices for Machine Learning quick note


In machine learning, many matrices are sparse. It is essential to know how to handle this kind of matrix.

Sparse vs Dense Matrix

First, it is good to know that sparse matrix looks similar to a normal matrix, with rows, columns or other indexes. But a sparse matrix is comprised of mostly zero (0s) values. They are distinct from dense matrices with mostly non-zero values.

A matrix is sparse if many of its coefficients are zero. The interest in sparsity arises because its exploitation can lead to enormous computational savings and because many large matrix problems that occur in practice are sparse.

The sparsity of a matrix can be quantified with a score, which is the number of zero values in the matrix divided by the total number of elements in the matrix.

$$ sparsity = \frac{zero \space elements}{total \space elements}$$

Below is an example of a small \(3 \times 6\) sparse matrix.

The example has 13 zero values of the 18 elements in the matrix, giving this matrix a sparsity score of 0.722 or about 72%.

Problems with Sparsity

Sparse matrices can cause problems with regards to space and time complexity.

Space Complexity

Very large matrices require a lot of memory, and some very large matrices that we wish to work with are sparse. In practice, most large matrices are sparse — almost all entries are zeros.

In a previous article, we measured the sizes of different objects including a sparse matrix.

An example of a very large matrix is a link matrix from one website to another. The matrix contained is sparse with many more zero values than data values. The problem with representing these sparse matrices as dense matrices is the allocation of each 32-bit or even 64-bit zero value in the matrix including those zero values which do not contain any information.

Time Complexity

Assuming a very large sparse matrix can fit into memory, we will want to perform operations on this matrix.

Simply, if the matrix contains mostly zero-values, i.e. no data, then performing operations across this matrix may take a long time where the bulk of the computation performed will involve adding or multiplying zero values together.

This is a problem of increased time complexity of matrix operations that increases with the size of the matrix. This problem is compounded since even trivial machine learning methods may require many operations on each row, column. Consequently, it can be resulting in vastly longer execution times.

Sparse Matrices in Machine Learning

Sparse matrices turn up a lot in applied machine learning. Now, we will look at some common examples to motivate you to be aware of the issues of sparsity.


Sparse matrices come up in some specific types of data, most notably observations that record the occurrence or count of an activity. Some examples can be:

  • Whether or not a user has watched a movie in a movie catalogue
  • Count of the number of listens of a song in a song catalogue.

Data Preparation

Sparse matrices come up in encoding schemes used in the preparation of data. Three common scheme include:

  • One-hot encoding, used to represent categorical data as sparse binary vectors.
  • Count encoding, used to represent the frequency of words in a vocabulary for a document
  • TF-IDF encoding, used to represent normalized word frequency scores in a vocabulary.

Areas of Study

Some areas of study within machine learning must develop specialized methods to address sparsity directly as the input data is almost always sparse. If there are 100,000 words in the language model, then the feature vector has length 100,000. But for a short email message, almost all the features will have count zero.

Three examples include:

  • Natural language processing for working with documents of text.
  • Recommender systems for working with product usage within a catalogue.
  • Computer vision when working with images that contain lots of black pixels.

Working with Sparse Matrices

The solution to representing and working with sparse matrices is to use an alternate data structure to represent the sparse data. The zero values can be ignored and only the data or non-zero values in the sparse matrix need to be stored or acted upon.

There are multiple data structures that can be used to efficiently construct a sparse matrix; such as Dictionary of Keys ( maps a row and column index to a value), List of Lists (stores each row as a list, with each sublist containing the column index and the value), or Coordinate List (stores a list of tuple with each tuple containing the row index, column index, and the value)

There are also data structures that are more suitable for performing efficient operations; two common examples are CSR and CSC.

  • Compressed Sparse Row. represents sparse matrix using three one-dimensional arrays for the non-zero values, the extents of the rows, and the column indexes.
  • Compressed Sparse Column. Similar to CSR method except the column indices are compressed and read first before the row indices.

We use the Compressed Sparse Row, aka CSR for short, to represent sparse matrices in machine learning for the efficient access and matrix multiplication that it supports.

Sparse Matrices in Python

SciPy provides tools for creating sparse matrices using multiple data structures, as well as tools for converting a dense matrix to a sparse matrix. Many linear algebra NumPy and SciPy functions that operate on NumPy arrays can transparently operate on SciPy sparse arrays. Further, machine learning libraries that use NumPy data structures can also operate transparently on SciPy sparse arrays, such as scikit-learn for general machine learning and Keras for deep learning.

A dense matrix stored in a NumPy array can be converted into a sparse matrix using the CSR representation by calling the csr_matrix() function.

In the example below, we define a 3 x 6 sparse matrix as a dense array, convert it to a CSR sparse representation and then convert it back to a dense array by calling the todense() function.

Running the example first prints the defined dense array, followed by the CSR representation, and then the reconstructed dense matrix.

NumPy does not provide a function to calculate the sparsity of a matrix. Nevertheless, we can calculate it easily by first finding the density of the matrix and subtracting it from one.

The count_nonzero() function returns the number of non-zero elements in a NumPy array. Also, the size property of the array contains the total number of elements in the array. Therefore, we can compute the array sparsity as

$$sparsity = 1 – \frac{\text{count_nonzero}(A)}{A.size}$$

The example below demonstrates how to calculate the sparsity of an array.

Running the example first prints the defined sparse matrix followed by the sparsity of the matrix.

CSR/CSC operators

Now we try to perform some common operators on compressed sparse row matrices to see what happen.

First, we create a dense matrix (10 x 10) and convert it to CSR matrix. We have another rand matrix of size 10×1 to perform operations with the sparse matrix.

Next, we test with add (subtract) with the CSR matrix.

Everything seems normal and the results are all CSR. Now, we multiply a sparse matrix with a column vector.

Interestingly, the result is not CSR anymore, it is now a numpy array representing the dot product.

$$ A.{x}= \left[ \begin{array}{cccc} a_{11} & a_{12} & \ldots & a_{1n}\ \\a_{21} & a_{22} & \ldots & a_{2n}\ \\ \vdots & \vdots & \ddots & \vdots\ \\a_{m1} & a_{m2} & \ldots & a_{mn} \end{array} \right] \left[ \begin{array}{c} x_1\ \\x_2\ \\\vdots\ \\x_n \end{array} \right] = \left[ \begin{array}{c} a_{11}x_1+a_{12}x_2 + \cdots + a_{1n} x_n\ \\a_{21}x_1+a_{22}x_2 + \cdots + a_{2n} x_n\ \\\vdots\ \\a_{m1}x_1+a_{m2}x_2 + \cdots + a_{mn} x_n\ \end{array} \right]. $$

Whereas, it is a element wise multiplication if we apply on dense matrix.

If you try to use np.dot, this may not work as expected because numpy does not understand CRS format.

Instead, you can use @ operator (from Python 3.5 PEP465) to perform the dot product:


There is only a presentation difference between normal matrix and sparse matrix. Using CSR/CRC can save memory but some libraries, such as numpy, do not support performing operators for the format.

Add comment

A.I, Data and Software Engineering

PetaMinds focuses on developing the coolest topics in data science, A.I, and programming, and make them so digestible for everyone to learn and create amazing applications in a short time.

Pin It on Pinterest


You have successfully subscribed to the newsletter

There was an error while trying to send your request. Please try again.

Petaminds will use the information you provide on this form to be in touch with you and to provide updates.