A.I, Data and Software Engineering

Dimension, Dimension, Dimension – Reshape your data

The most basic yet important thing to know when working with a data array is its dimensions. This article covers several common data shapes and reshaping techniques.

Why reshape data?

Imagine that you are starving and are suddenly given a piece of delicious food. You may try to put it all in your mouth (Fig 1a), only to find that this does not help your hunger. So you decide to arrange your food so that it not only fits your mouth but also lets you eat really fast (Fig 1b).

Fig 1a: Good food – Wrong size
Fig 1b: Good food – Good format

That is it! In machine learning, especially when working with neural networks, reshaping makes sure that the dimensions of a data slice match what the next operation expects.

A typical example is matrix multiplication, e.g. the matmul operator in TensorFlow or NumPy. Recall that \(A \times B\) is valid only when the width of matrix A (its number of columns) equals the height of matrix B (its number of rows).

With TensorFlow 2.0, calling matmul on two matrices A and B that are both sized 1 × 3 fails with the error “Matrix size-incompatible”, because the width of A (3) does not equal the height of B (1).
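The same failure can be reproduced in NumPy. The snippet below is a minimal sketch, not the original TensorFlow listing: the values in `A` and `B` are made up for illustration, and NumPy reports the mismatch as a `ValueError` with its own wording rather than TensorFlow's message.

```python
import numpy as np

# Illustrative values; only the shapes matter here.
A = np.array([[1, 2, 3]])  # shape (1, 3)
B = np.array([[4, 5, 6]])  # shape (1, 3)

try:
    np.matmul(A, B)  # width of A (3) != height of B (1)
    failed = False
except ValueError as err:
    failed = True
    print("matmul failed:", err)
```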

To fix that, change the shape of B to 3×1.
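A minimal NumPy sketch of the fix, again with illustrative values. Reshaping `B` to (3, 1) makes the inner dimensions agree, so the product has shape (1, 1). In TensorFlow, `tf.reshape` and `tf.matmul` should play the same roles.

```python
import numpy as np

A = np.array([[1, 2, 3]])                 # shape (1, 3)
B = np.array([[4, 5, 6]]).reshape(3, 1)   # shape (3, 1)

C = np.matmul(A, B)  # (1, 3) x (3, 1) -> (1, 1)
print(C.shape)       # (1, 1)
print(C)             # [[32]]
```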

Data Shapes

Example of data reshaping into three different shapes: (2, 3), (3, 2), and (6, 1)

(n, ) shape

This is a very common shape for a 1-D array, but it confuses many beginners. The shape attribute returns a tuple holding the length of each dimension of the array. There is nothing after the comma because the array has only one dimension, so a single index identifies each element.

You may wonder if there is a (, n) shape? No, I have never seen one like that except in this article. 😀
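As a small illustration of the (n, ) shape, the NumPy sketch below builds a 1-D array with made-up values and inspects its shape, its number of dimensions, and single-index access.

```python
import numpy as np

a = np.array([10, 20, 30, 40])

print(a.shape)  # (4,)  a one-element tuple
print(a.ndim)   # 1     one dimension, so one index per element
print(a[2])     # 30
```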

(1, n) shape

Again, many may ask what the difference is between (n, ) and (1, n). Both hold n elements, but the former is a 1-D array while the latter is a 2-D array, i.e. a matrix with 1 row and n columns. Namely, two indices are needed to identify an element. This shape is also known as a row vector.
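The difference shows up in how elements are indexed. A minimal NumPy sketch with illustrative values:

```python
import numpy as np

a = np.arange(4)         # shape (4,): 1-D array [0, 1, 2, 3]
row = a.reshape(1, -1)   # shape (1, 4): row vector, -1 lets NumPy infer n

print(a.ndim, row.ndim)  # 1 2
print(a[2])              # 2  one index
print(row[0, 2])         # 2  two indices: row 0, column 2
```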

(n, 1) shape

Nothing special: this shape has n rows and 1 column. Nevertheless, it is very important because it represents a column vector. Conveniently, we can use the transpose operator (T in NumPy and tf.transpose in TF) to create a column vector from a row vector.
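A short NumPy sketch of the transpose, with illustrative values. One caveat worth keeping in mind: transposing an (n, ) array leaves it unchanged, so a 1-D array has to be reshaped rather than transposed to become a column vector.

```python
import numpy as np

row = np.array([[1, 2, 3]])   # shape (1, 3): row vector
col = row.T                   # shape (3, 1): column vector

flat = np.array([1, 2, 3])    # shape (3,)
print(flat.T.shape)           # (3,)  transpose has no effect on 1-D arrays
col2 = flat.reshape(-1, 1)    # shape (3, 1)
```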

(m, n) shape

This shape has m rows and n columns. There are several ways to create data of this shape.

If you already have m rows of n elements each, you can form an \(m \times n\) matrix by using vstack.
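For instance, stacking two rows of three elements gives a (2, 3) matrix. The row values below are illustrative.

```python
import numpy as np

r1 = np.array([1, 2, 3])   # shape (3,)
r2 = np.array([4, 5, 6])   # shape (3,)

M = np.vstack([r1, r2])    # stack the rows vertically
print(M.shape)             # (2, 3)
print(M)
# [[1 2 3]
#  [4 5 6]]
```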

Reshape a shape to (almost) any other shape

You can use the reshape function to transform one shape into another. Suppose you want to transform k-dimensional data of shape \((a_1, \dots, a_k)\) into h-dimensional data of shape \((b_1, \dots, b_h)\). This works only when both shapes hold the same total number of elements n.

Condition: \(n = a_1 \times a_2 \times \dots \times a_k = b_1 \times b_2 \times \dots \times b_h \)

The sample code below demonstrates different shapes transformation: (4, ) -> (2, 2) -> (2, 1, 1, 2) -> (1, 2, 2).
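A minimal NumPy sketch of that chain, with illustrative values. Every target shape holds 4 elements, so each step satisfies the condition above. The last lines show, as an added illustration, what happens when it is violated.

```python
import numpy as np

x = np.array([1, 2, 3, 4])   # shape (4,)
a = x.reshape(2, 2)          # shape (2, 2)
b = a.reshape(2, 1, 1, 2)    # shape (2, 1, 1, 2)
c = b.reshape(1, 2, 2)       # shape (1, 2, 2)

for arr in (x, a, b, c):
    print(arr.shape, arr.size)  # size stays 4 throughout

try:
    x.reshape(3, 2)          # 3 * 2 = 6 != 4
    mismatch = False
except ValueError as err:
    mismatch = True
    print("reshape failed:", err)
```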

Quickly flatten data

To transform an array of any shape into a 1-D array, flatten it.
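NumPy offers several ways to do this. Which one the original listing used is not known, so the sketch below shows three common options on an illustrative matrix: `flatten` always returns a copy, `ravel` returns a view when it can, and `reshape(-1)` lets NumPy infer the single dimension.

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])    # shape (2, 3)

f1 = m.flatten()             # copy, shape (6,)
f2 = m.ravel()               # view when possible, shape (6,)
f3 = m.reshape(-1)           # -1 means "infer this length", shape (6,)

f1[0] = 99                   # modifying the copy leaves m untouched
print(m[0, 0])               # 1
print(f3)                    # [1 2 3 4 5 6]
```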

Conclusion

Good control of the shapes of your data is the first step to avoiding many errors related to array and matrix operations. There are also other ways to reshape your data. To go further, read up on NumPy indexing, hstack, and data indexing and slicing (with pandas or TensorFlow).

Happy reshaping! 😀
