I decided to share this topic while doing research on deep learning on graphs, one of the latest trends in deep learning. One of the challenges I faced was the limited processing power of my laptop when processing hundreds of thousands of nodes. Since a new laptop with a good GPU is not cheap, around $2k+ US, I decided to dive into the free platform provided by Google.
What is great about Colab?
Colab is a free cloud service based on Jupyter Notebooks for machine learning education and research. It provides a runtime fully configured for deep learning and free-of-charge access to a robust GPU.
At a glance, Colab offers:
- A 12GB GPU
- 20-50GB of online storage for your data
- A 12-hour runtime limit: it is crucial to finish each experiment within this period
But you can do more with:
- Share the project with your colleagues
- Map your Google Drive into the Colab VM runtime so notebooks can access it
- Upload (<250MB) or download files using scripts
- Sync files from your computer
- Load files from GitHub (<25MB)
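Because the runtime resets every 12 hours, it pays to persist intermediate results to your mapped Drive so a long experiment can resume instead of restarting. A minimal sketch using pickle (the checkpoint path shown in the comment is a hypothetical example):

```python
import pickle

def save_checkpoint(state, path):
    # Persist intermediate results so a runtime reset does not lose work
    with open(path, "wb") as f:
        pickle.dump(state, f)

def load_checkpoint(path):
    # Restore the saved state after a new runtime starts
    with open(path, "rb") as f:
        return pickle.load(f)

# On Colab, point the path at your mounted Drive, e.g.
# "/content/gdrive/My Drive/checkpoints/state.pkl" (hypothetical path)
```

Saving after each epoch or test run means a 12-hour reset costs you at most one iteration of work.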
Jupyter notebook in Colab
When moving from an IDE like Visual Studio or Eclipse, many people feel uncomfortable in Jupyter because they miss code suggestions. Nevertheless, Jupyter notebooks do provide suggestions and code completion.
For non-Colab notebooks:
- Tab: get suggestions
- Shift+Tab: get the docstring
- Shift+Enter: run the current cell
- Ctrl+Shift+P: open the command palette
For Colab notebooks:
- Ctrl+Space: code suggestions and docstrings (woohoo)
- The other shortcuts are the same as above
Please note that this is a new feature that was not officially released at the time of writing. You may need to wait for an invitation popup before you can use it.
Accelerate the notebook on colab
Colab notebooks have the handicap of a runtime that is wiped every 12 hours. This is why it is so important to reduce the time it takes to get your runtime back up and running.
Here are some tips:
Run all cells at once:
- Ctrl + F9: run all cells at once
Change runtime type:
You can switch your notebook runtime to a GPU or TPU (Runtime > Change runtime type). The two are broadly similar; stick with the GPU, which performs better in some reports. To confirm your notebook is running on a GPU:
# '' means CPU, whereas '/device:GPU:0' means GPU
import tensorflow as tf
tf.test.gpu_device_name()
Map your Google Drive:
# This cell imports the drive library and mounts your Google Drive as a VM local drive.
# You can then access your Drive files under the path "/content/gdrive/My Drive/"
from google.colab import drive
drive.mount('/content/gdrive')
# Upload files - paste this code into a cell:
from google.colab import files
uploaded = files.upload()
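The `uploaded` object returned by `files.upload()` is a dict mapping each filename to its raw bytes. A small sketch of writing those bytes out to the local filesystem (the helper name and target directory are my own, for illustration):

```python
import os

def save_uploaded(uploaded, target_dir="."):
    # `uploaded` maps filenames to raw bytes, as returned by files.upload()
    paths = []
    for name, data in uploaded.items():
        path = os.path.join(target_dir, name)
        with open(path, "wb") as f:
            f.write(data)
        paths.append(path)
    return paths
```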
# Download generated files - paste this code into a cell
from google.colab import files
files.download('generated_file')  # replace with the path of the file to download
Reduce manual interactions
Use automation scripts whenever possible. The following example extracts a cuDNN archive (downloaded once from Nvidia and stored on Drive for reuse) into the VM's CUDA folders; shell commands run in Jupyter cells with a leading !.
# Extracts the cuDNN files from Drive folder directly to the VM CUDA folders
!tar -xzvf gdrive/My\ Drive/darknet/cuDNN/cudnn-10.0-linux-x64-v188.8.131.52.tgz -C /usr/local/
!chmod a+r /usr/local/cuda/include/cudnn.h
# Check the installed version. You can comment this line out on future runs
!cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2
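If you prefer to check the version from Python instead of grep, the same defines can be parsed out of cudnn.h. A sketch, assuming the header follows the usual `#define CUDNN_MAJOR 7` layout (the helper name is my own):

```python
import re

def cudnn_version(header_text):
    # Pull the MAJOR/MINOR/PATCHLEVEL defines out of the cudnn.h text
    parts = {}
    for key in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        m = re.search(r"#define\s+%s\s+(\d+)" % key, header_text)
        parts[key] = int(m.group(1)) if m else None
    return parts

# Usage on Colab, assuming the extraction step above succeeded:
# with open("/usr/local/cuda/include/cudnn.h") as f:
#     print(cudnn_version(f.read()))
```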
Copy datasets to VM local filesystem
Colab notebooks sometimes lag when working with Drive files directly. After logging into Colab, you start in the “/content/” directory (check with the pwd command). You can copy a dataset from Google Drive to the local filesystem:
# Copy files from Google Drive to the VM local filesystem
!cp -r "/content/gdrive/My Drive/data.csv" ./data
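The same copy can also be scripted in Python with shutil, which is handy when you want to create the target directory on the fly or copy a whole dataset folder. A sketch with hypothetical paths and a helper name of my own:

```python
import os
import shutil

def copy_dataset(src, dst_dir):
    # Copy a file (or a whole directory tree) from the mounted Drive to local disk
    os.makedirs(dst_dir, exist_ok=True)
    if os.path.isdir(src):
        # dirs_exist_ok requires Python 3.8+
        return shutil.copytree(src, os.path.join(dst_dir, os.path.basename(src)),
                               dirs_exist_ok=True)
    return shutil.copy(src, dst_dir)

# copy_dataset("/content/gdrive/My Drive/data.csv", "./data")  # hypothetical paths
```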
Hope that you find this post helpful. 🙂