A.I, Data and Software Engineering

Quick Benchmark: Colab CPU, GPU, TPU (XLA-CPU)


If you have ever wondered how CPUs, GPUs, and TPUs compare in performance for a machine learning project, this article presents a simple benchmark of the three.

Memory Subsystem Architecture

The Central Processing Unit (CPU), Graphics Processing Unit (GPU), and Tensor Processing Unit (TPU) are processors, each with its own specialized purpose and architecture.

  • CPU: A processor designed to solve any computational problem in a general fashion. Its cache and memory design aim to perform well across general-purpose programming problems.
  • GPU: A processor designed to accelerate the rendering of graphics.
  • TPU: A co-processor designed to accelerate deep learning tasks developed using TensorFlow (a programming framework). It is designed for a high volume of low-precision computation (e.g. as little as 8-bit precision). However, no compilers have been developed that would let the TPU be used for general-purpose programming, so general programming on a TPU requires significant effort.
[Figure: CPU, GPU, and TPU memory subsystem architecture]
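To make "low-precision computation" concrete, here is a hedged pure-Python sketch of an 8-bit dot product. It assumes a simple symmetric per-tensor quantization to the int8 range [-127, 127]; this textbook scheme is chosen only for illustration and is not a description of how TPU hardware or TensorFlow actually quantizes values. The helper names `quantize_int8` and `int8_dot` are hypothetical.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization of floats into the int8 range [-127, 127]."""
    max_abs = max(abs(v) for v in values) or 1.0  # avoid dividing by zero for all-zero input
    scale = max_abs / 127.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def int8_dot(a, b):
    """Approximate a float dot product using integer multiply-accumulates."""
    qa, sa = quantize_int8(a)
    qb, sb = quantize_int8(b)
    acc = sum(x * y for x, y in zip(qa, qb))  # integer-only multiply-accumulate
    return acc * sa * sb  # rescale the integer result back to float


a = [0.5, -1.2, 3.3, 0.05]
b = [2.0, 0.7, -0.4, 1.1]
exact = sum(x * y for x, y in zip(a, b))
approx = int8_dot(a, b)
print(f"full-precision result: {exact:.4f}")
print(f"int8 approximation:    {approx:.4f}")
```

The integer multiply-accumulates are cheap to implement in hardware, and the small error relative to the full-precision result is often acceptable for neural network inference.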

Compute primitive

Each processor's compute primitive operates on a data unit of a different shape:

  • CPU: 1 × 1 data unit (scalar)
  • GPU: 1 × N data unit (vector)
  • TPU: N × N data unit (matrix)
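The three shapes can be illustrated with a conceptual pure-Python sketch. The function names are hypothetical, and real processors execute these primitives in silicon rather than in Python loops; the point is only how much work one step of each primitive covers: one multiply-accumulate for a scalar, N for a vector, and a whole N × N block of results for a matrix.

```python
def scalar_mac(acc, a, b):
    # 1 x 1: a single multiply-accumulate per step (CPU-style scalar primitive)
    return acc + a * b


def vector_mac(acc, a, b):
    # 1 x N: N element-wise multiply-accumulates per step (GPU-style vector primitive)
    return [x + y * z for x, y, z in zip(acc, a, b)]


def matrix_mac(acc, A, B):
    # N x N: update a whole N x N block of results per step (TPU-style matrix primitive)
    n = len(A)
    return [
        [acc[i][j] + sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


print(scalar_mac(0, 2, 3))
print(vector_mac([0, 0], [1, 2], [3, 4]))
print(matrix_mac([[0, 0], [0, 0]], [[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```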

Benchmark CPU, GPU, TPU

We generate some data, perform the same calculation on different hardware, and log the execution time for comparison. The implementation runs on Google Colab, which offers only limited TPU options on its Google Compute Engine backend. See this post for a quick introduction to Google Colab. Specifically, we test on the CPU, the GPU, and XLA_CPU (XLA stands for Accelerated Linear Algebra).

Test on CPU

First, we enable TensorFlow 2.0 and set the logging level.

Then we select the CPU. Note that you'll need to configure the notebook to run on the CPU:

  • Navigate to Edit → Notebook Settings.
  • Select None from the Hardware Accelerator drop-down (Runtime).

Next, we confirm that TensorFlow can connect to the CPU.

After that, we define an operation to test: a 2D convolution (conv2d).
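A 2D convolution slides a small kernel over an image and takes a weighted sum at each position. The sketch below is a single-channel, stride-1, no-padding version in pure Python that only shows the arithmetic; it is not the TensorFlow code used in the benchmark, and `conv2d_valid` is a hypothetical helper. Like most deep learning frameworks, it computes cross-correlation (the kernel is not flipped).

```python
def conv2d_valid(image, kernel):
    """Single-channel 2D convolution (cross-correlation), stride 1, no padding."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = ih - kh + 1, iw - kw + 1
    out = [[0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            # Weighted sum of the kh x kw window whose top-left corner is (i, j)
            out[i][j] = sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh)
                for dj in range(kw)
            )
    return out


image = [[1, 2, 3, 0],
         [0, 1, 2, 3],
         [3, 0, 1, 2],
         [2, 3, 0, 1]]
kernel = [[1, 1],
          [1, 1]]
print(conv2d_valid(image, kernel))
```

Each output value costs kh × kw multiply-accumulates, and every output position is independent of the others, which is why this operation maps well onto the vector and matrix primitives described earlier.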

We run the operation 10 times and log the execution time.
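The timing step can be sketched with Python's standard `timeit` module. This is a hedged stand-in rather than the benchmark's own code: `hypothetical_workload` is a placeholder for the convolution under test, and the warm-up call is our own choice (a first call often includes one-off setup or compilation cost, so excluding it gives steadier numbers).

```python
import timeit


def time_operation(op, repeats=10):
    """Run `op` `repeats` times; return total and per-run wall-clock time in seconds."""
    op()  # warm-up run, excluded from the measurement
    total = timeit.timeit(op, number=repeats)
    return total, total / repeats


def hypothetical_workload():
    # Placeholder standing in for the convolution being benchmarked
    return sum(i * i for i in range(10_000))


total, per_run = time_operation(hypothetical_workload, repeats=10)
print(f"10 runs: {total:.4f} s total, {per_run * 1000:.3f} ms per run")
```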

Test on GPU

Similarly, switch the runtime to use a GPU before running the same code.

The CPU vs GPU result.

Test on TPU (XLA-CPU)

Don’t forget to switch the runtime to TPU.

And the result


We also carried out a couple of other tests in different settings. Under the current configuration of the Google Compute Engine backend, the CPU and TPU appear to perform very similarly, while the GPU outperforms both in this test (roughly 50x faster).
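A speedup such as "~50x" is simply a ratio of execution times. A minimal sketch, assuming each timing is the total in seconds for the same number of runs on each device; `relative_speedups` is a hypothetical helper, and the numbers below are made up to show the arithmetic, not measured results.

```python
def relative_speedups(timings, baseline="CPU"):
    """Return how many times faster each device ran than the baseline device."""
    base = timings[baseline]
    return {device: base / seconds for device, seconds in timings.items()}


# Made-up timings purely to illustrate the arithmetic; NOT measured results.
example = {"CPU": 2.0, "GPU": 0.5, "TPU (XLA-CPU)": 1.6}
for device, factor in relative_speedups(example).items():
    print(f"{device}: {factor:.2f}x")
```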

In the future, we may run another test with a CNN project, the kind of workload the TPU is optimized for. If you are interested, this paper presents a more structured benchmark.


PetaMinds focuses on developing the coolest topics in data science, A.I., and programming, and on making them digestible enough for everyone to learn and create amazing applications in a short time.
