A.I, Data and Software Engineering

Make use of Google Colab and Jupyter notebook


I decided to share this topic while doing research on deep learning on graphs, one of the latest trends in deep learning. One of the challenges I had was the limited processing power of my laptop when processing hundreds of thousands of nodes. Since a new laptop with a good GPU is not cheap (around US$2,000 or more), I decided to dive into the free platform provided by Google (GG).

URL: https://colab.research.google.com/notebooks/welcome.ipynb#recent=true

What is great about Colab?

Colab is a free cloud service based on Jupyter Notebooks for machine learning education and research. It provides a runtime fully configured for deep learning and free-of-charge access to a robust GPU.

At a glance, Colab offers:

  • 12GB GPU
  • 20-50GB online space for storing data
  • 12 hours runtime*: it is crucial to finish each test within this period
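
The numbers above may differ between sessions, so it can be worth checking what your runtime actually offers. Below is a minimal sketch using only the Python standard library; the helper name `disk_report` is hypothetical and not part of Colab.

```python
import shutil

def disk_report(path="/"):
    """Return total, used and free space for `path`, in gigabytes."""
    usage = shutil.disk_usage(path)
    gib = 1024 ** 3
    return {
        "total_gb": usage.total / gib,
        "used_gb": usage.used / gib,
        "free_gb": usage.free / gib,
    }

# Print a one-line summary for the root filesystem of the VM.
report = disk_report("/")
print(f"free: {report['free_gb']:.1f} GB of {report['total_gb']:.1f} GB")
```

Run it in a cell right after connecting, before you start copying datasets.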

But you can do more:

  • Share the project with your colleagues
  • Map your Google Drive into the Colab VM runtime so that notebooks can access it
  • Upload (<250MB) or download files using scripts
  • Sync files from your computer
  • Load files from GitHub (<25MB)
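
As an illustration of downloading a file with a script, here is a small sketch based on the standard library. The function name `fetch` and the URL in the usage comment are placeholders I picked for the example, not something Colab provides.

```python
import shutil
import urllib.request
from pathlib import Path

def fetch(url, dest):
    """Download `url` to `dest` unless the file already exists; return the path."""
    dest = Path(dest)
    if dest.exists():
        return dest  # reuse the earlier download instead of fetching again
    dest.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)  # stream to disk in chunks
    return dest

# Example call (placeholder URL):
# fetch("https://example.com/data.csv", "data/data.csv")
```

Skipping the download when the file is already present saves time when you re-run all cells in the same session.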

Jupyter notebook in Colab

When moving from an IDE like Visual Studio or Eclipse, many feel uncomfortable with Jupyter because they expect to lose code suggestions. Nevertheless, Jupyter notebook does provide suggestions and code completion.

For non-colab notebooks:

  • Tab: to get suggestions
  • Shift-tab: to get docstring
  • Shift-Enter: to run the current cell
  • Ctrl-Shift-P: open the command palette

Figure: code suggestion with Tab

For colab notebook:

  • Ctrl+space: code suggestion and docstring (woohoo).
  • Other shortcuts are like above

Please note that this is a new feature that had not been officially released at the time of writing. You may need to wait for the invitation popup to use the feature.

Accelerate the notebook on colab

The handicap of Colab notebooks is a runtime that will blow up into space every 12 hours! This is why it is so important to reduce the time you need to get your runtime up and running again.

Here are some tips:

Run all cells at once:

  • Ctrl + F9: run all cells at once

Change runtime type:

You can switch to GPU or TPU for your notebook runtime (Runtime > Change runtime type).

As they are quite similar to use, stick to GPU, which performs better in some reports. To confirm that your notebook is running on a GPU:

# An empty string '' means CPU, whereas '/device:GPU:0' means GPU
import tensorflow as tf
tf.test.gpu_device_name()

Figure: Change runtime type GPU / TPU
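
If you do not use TensorFlow, a rough alternative is to ask the Nvidia driver tool directly. The sketch below assumes that `nvidia-smi` is on the PATH when a GPU runtime is attached; the function name `nvidia_gpu_visible` is hypothetical.

```python
import shutil
import subprocess

def nvidia_gpu_visible():
    """Return True if `nvidia-smi -L` runs successfully and lists a GPU."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return False  # no Nvidia driver tools found on this machine
    try:
        result = subprocess.run(
            [exe, "-L"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    # `nvidia-smi -L` prints one line per device, e.g. "GPU 0: ..."
    return result.returncode == 0 and "GPU" in result.stdout

print("GPU visible:", nvidia_gpu_visible())
```

This only tells you that a GPU is attached to the VM, not that your framework is configured to use it.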

Map your GG Drive:

# This cell imports the drive library and mounts your Google Drive as a local drive on the VM.
# You can access your Drive files using the path "/content/gdrive/My Drive/"
from google.colab import drive
drive.mount('/content/gdrive')
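
If the same notebook also runs on your laptop, the mount cell fails there because `google.colab` is not installed. Here is a sketch that mounts Drive only when the notebook appears to be running in Colab. Checking `sys.modules` is a common heuristic rather than an official API, and the names `mount_drive_if_colab`, `IN_COLAB` and `DATA_ROOT` are made up for this example.

```python
import sys

def mount_drive_if_colab(mount_point="/content/gdrive"):
    """Mount Google Drive when running in Colab; return True if mounted."""
    if "google.colab" not in sys.modules:
        return False  # probably a local Jupyter session: nothing to mount
    from google.colab import drive  # only importable inside Colab
    drive.mount(mount_point)
    return True

IN_COLAB = mount_drive_if_colab()
# Fall back to the current directory when working locally.
DATA_ROOT = "/content/gdrive/My Drive" if IN_COLAB else "."
print("data root:", DATA_ROOT)
```

The rest of the notebook can then build paths from `DATA_ROOT` without caring where it runs.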

Upload/Download files

# Upload: paste this code into a cell
from google.colab import files
uploaded = files.upload()
# Download generated files: paste this code into another cell
from google.colab import files
files.download('file.txt')
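
`files.upload()` returns a dictionary that maps each file name to its content as bytes, so you can quickly check what arrived. The helper `upload_sizes` below is a hypothetical name, and the dictionary is a stand-in so the sketch also runs outside Colab.

```python
def upload_sizes(uploaded):
    """Map each uploaded file name to its size in megabytes."""
    return {name: len(data) / (1024 ** 2) for name, data in uploaded.items()}

# Stand-in for the dict returned by files.upload(); the content is fake.
fake_upload = {"file.txt": b"hello colab"}
for name, size_mb in upload_sizes(fake_upload).items():
    print(f"{name}: {size_mb:.6f} MB")
```

In Colab you would pass the `uploaded` variable from the cell above instead of `fake_upload`.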

Reduce manual interactions

Use automation scripts whenever possible. The following example installs cuDNN from an archive that was previously downloaded from Nvidia and saved on Drive for later use, using shell commands in Jupyter cells:

# Extracts the cuDNN files from Drive folder directly to the VM CUDA folders
!tar -xzvf gdrive/My\ Drive/darknet/cuDNN/cudnn-10.0-linux-x64-v7.5.0.56.tgz -C /usr/local/
!chmod a+r /usr/local/cuda/include/cudnn.h
# Now we check the version we just installed. You can comment out this line on future runs
!cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2
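
The same idea can be written in Python so that the extraction is skipped when it has already been done in the current runtime. This is only a sketch: `extract_once` is a hypothetical helper, and it should be used only on archives you trust.

```python
import tarfile
from pathlib import Path

def extract_once(archive, dest, marker):
    """Extract `archive` into `dest` unless `marker` already exists.

    Returns True if extraction ran, False if it was skipped.
    """
    if Path(marker).exists():
        return False  # already extracted in this runtime
    with tarfile.open(archive) as tar:  # compression is auto-detected
        tar.extractall(dest)
    return True

# Example call with the paths used in the shell commands above:
# extract_once(
#     "gdrive/My Drive/darknet/cuDNN/cudnn-10.0-linux-x64-v7.5.0.56.tgz",
#     "/usr/local/",
#     "/usr/local/cuda/include/cudnn.h",
# )
```

Using a file from the archive as the marker means re-running all cells costs almost nothing once the library is in place.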

Copy datasets to VM local filesystem

Colab notebooks sometimes lag when working with Drive files. After logging in to Colab, your working directory is “/content/” (check with the pwd command). You can copy a dataset from Google Drive to the local filesystem:

# Copy files from Google Drive to the VM local filesystem
!cp -r "/content/gdrive/My Drive/data.csv" ./data
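
A Python version of the copy can skip the work when the local copy already exists. The sketch below assumes that `./data` is meant to be a directory; `copy_if_missing` is a hypothetical helper name.

```python
import shutil
from pathlib import Path

def copy_if_missing(src, dest_dir):
    """Copy file `src` into `dest_dir` unless a copy is already there."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / Path(src).name
    if not target.exists():
        shutil.copy2(src, target)  # copy content and file metadata
    return target

# Example call with the paths used above:
# copy_if_missing("/content/gdrive/My Drive/data.csv", "./data")
```

The function returns the local path, so later cells can read the dataset from the VM disk instead of Drive.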

Hope that you find this post helpful. 🙂
