Using an IDE [PyCharm] With GCP AI-Notebooks

When I first started using AI-Notebooks, I had one Jupyter panel open to a notebook and another to a Python file that held my classes and methods. This approach was painful because JupyterLab's Python editor is not the best IDE, and I had to manually reload my module every time I made a change. Eventually, I came up with a setup that lets me write Python code locally in PyCharm (or any other IDE) and have my notebooks pick up the changes automatically. Even methods of class instances that were initialized before a change are updated automatically. The instructions below explain how to re-create this setup.

Instructions

Create a Python Package for Python Modules

These instructions explain how to set up a Python package manually; however, I suggest using a generator like this one. Once your package is set up, you can put your code inside it. Below I explain how to install the package from local source.
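For readers who want to see what the manual route involves, here is a minimal sketch of a package layout that `pip install -e` can work with. It assumes a classic setuptools `setup.py` layout; the helper `create_package_skeleton` and the package name `my_package` are hypothetical and are not taken from the linked instructions or generator.

```python
from pathlib import Path
import tempfile

# Template for a minimal setup.py; only the package name is filled in.
SETUP_PY = '''\
from setuptools import setup, find_packages

setup(
    name="{name}",
    version="0.1.0",
    packages=find_packages(),
)
'''


def create_package_skeleton(root, name):
    """Write <root>/<name>/setup.py and <root>/<name>/<name>/__init__.py."""
    repo = Path(root) / name
    package_dir = repo / name
    package_dir.mkdir(parents=True, exist_ok=True)
    (repo / "setup.py").write_text(SETUP_PY.format(name=name))
    (package_dir / "__init__.py").write_text("")
    return repo


if __name__ == "__main__":
    # Build the skeleton in a throwaway directory and list what was created.
    with tempfile.TemporaryDirectory() as tmp:
        repo = create_package_skeleton(tmp, "my_package")
        for path in sorted(repo.rglob("*")):
            print(path.relative_to(repo))
```

Your own modules would then go inside the inner `my_package/` directory, next to `__init__.py`.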

Setup Rsync

Set up SSH using gcloud and then SSH into your notebook machine by running the commands below. If you launch a new instance, you will have to re-run this part.

# Setup SSH
gcloud compute config-ssh

# ssh into your notebook instance
ssh <notebook-name>.<notebook-region>.<gcp-project>

# In the SSH session
> mkdir ~/remote-repos

# To get the value for <gcp_ssh_username> below run this line
> whoami

On your local machine, create a file called ~/.gcp_notebook_rc with the following lines:

export NB_SERVER=<notebook-name>.<notebook-region>
export GCP_USER=<gcp_ssh_username>

In your package repo, add a shell script at scripts/sync_to_nb.sh:

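#!/usr/bin/env bash
set -e

# Load NB_SERVER and GCP_USER
source ~/.gcp_notebook_rc
# Replace <gcp-project> with your GCP project so the host matches the alias created by gcloud compute config-ssh
rsync -avz --filter=':- .gitignore' <PATH-TO-LOCAL-PACKAGE> "${NB_SERVER}.<gcp-project>:/home/${GCP_USER}/remote-repos/"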

rsync will use the .gitignore file in the package root to determine which files to skip. If you see rsync syncing files that it should not be syncing, add them to the .gitignore. I suggest having all the lines below in your .gitignore.

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg

# Jupyter Notebook
.ipynb_checkpoints

# pyenv
.python-version
venv/

# mypy
.mypy_cache/

# Custom
.git
.idea
.DS_Store
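One way to check which files the filter lets through is to run rsync with `--dry-run`, which lists what would be transferred without copying anything. The sketch below builds the same rsync command in Python with that flag added. The function name `build_rsync_command` and its parameters are hypothetical; `NB_SERVER` and `GCP_USER` are the variables from ~/.gcp_notebook_rc, and `project` stands for the GCP project part of the SSH host alias.

```python
import os
import shlex


def build_rsync_command(local_path, project, dry_run=True, env=None):
    """Return the rsync argument list used to sync a package to the notebook."""
    env = os.environ if env is None else env
    remote = f"{env['NB_SERVER']}.{project}:/home/{env['GCP_USER']}/remote-repos/"
    command = ["rsync", "-avz", "--filter=:- .gitignore"]
    if dry_run:
        # List what would be transferred without copying anything.
        command.append("--dry-run")
    command += [local_path, remote]
    return command


if __name__ == "__main__":
    # Placeholder values for illustration only.
    example_env = {"NB_SERVER": "my-notebook.us-west1-b", "GCP_USER": "me"}
    print(shlex.join(build_rsync_command("./my_package", "my-project", env=example_env)))
```

The returned list can be passed to `subprocess.run(command, check=True)` once the dry-run output looks right.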

Install the Rsync-ed Package From Local Source

To install your repo as a Python package, run the following from a terminal in JupyterLab.

sudo su - <gcp_ssh_username>
pip install -e /home/<gcp_ssh_username>/remote-repos/<package-dir>

This installs your package in editable mode, so Python imports it directly from the source files under remote-repos on the notebook machine. When rsync updates those files, you do not have to re-install the package.
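To confirm that the notebook is importing from the synced source directory rather than from a copied install, you can ask Python where it would load the module from. This is a small sketch using the standard library; the helper name `module_origin` is hypothetical, and `json` only stands in for your own package name.

```python
import importlib.util


def module_origin(module_name):
    """Return the file a top-level module would be imported from, or None if it is not found."""
    spec = importlib.util.find_spec(module_name)
    return None if spec is None else spec.origin


if __name__ == "__main__":
    # Replace "json" with your package name; for an editable install the
    # printed path should point into the remote-repos directory.
    print(module_origin("json"))
```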

Use autoreload to Automatically Refresh Packages

In order to have Jupyter refresh your package on each cell execution, put the following lines in the first cell of the notebook:

%load_ext autoreload
%autoreload 2

*Note: In my experience, this extension only works with packaged modules; if you use it with just a local .py file, it will not work.
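For comparison, this is roughly the manual step that autoreload saves you from: after a source file changes on disk, the module has to be reloaded before the new code is used. The sketch below uses the standard library's `importlib.reload` with a throwaway module; the names `demo_manual_reload` and `scratch_module` are hypothetical. Unlike autoreload, a plain reload does not update instances that were created before the change.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def demo_manual_reload():
    """Edit a module on disk and reload it by hand; returns (before, after)."""
    old_flag = sys.dont_write_bytecode
    sys.dont_write_bytecode = True  # keep cached bytecode out of this demo
    with tempfile.TemporaryDirectory() as tmp:
        module_path = Path(tmp) / "scratch_module.py"
        module_path.write_text("def answer():\n    return 1\n")
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            import scratch_module
            before = scratch_module.answer()
            # Simulate an edit arriving on disk (for example via rsync).
            module_path.write_text("def answer():\n    return 22\n")
            importlib.reload(scratch_module)
            after = scratch_module.answer()
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("scratch_module", None)
            sys.dont_write_bytecode = old_flag
    return before, after


if __name__ == "__main__":
    print(demo_manual_reload())  # prints (1, 22)
```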

Getting PyCharm to Rsync Automatically

Having to rsync manually can be annoying. I found that I would often forget to do it and then be annoyed that my code had not updated. I created a build configuration in PyCharm that runs my rsync script automatically. I also changed the Cmd+S key binding to run this build, since PyCharm saves files automatically anyway.

To add a build configuration, go to Run -> Edit Configurations.

[Image: PyCharm run configuration]