Provisioning AWS Deep Learning AMI-based EC2 for Jupyter

Below are my notes for provisioning a Jupyter server using AWS’ Deep Learning AMI. These notes capture my installation process during the week of 23 July 2018.

My goal for this document is to list the reference points where you can find step-by-step setup instructions for this provisioning task. This post will also comment on the obstacles I ran into during the provisioning and what I did to get past them.

You can find the installation and configuration notes below.

Abbreviations

  • AWS: Amazon Web Services
  • VPC: Virtual Private Cloud
  • EC2: Elastic Compute Cloud
  • IAM: Identity and Access Management
  • AMI: Amazon Machine Image
  • DLAMI-DG: Deep Learning AMI Developer Guide

Requirements

I needed to find a workable configuration for modeling machine learning problems by exploring the use of AWS EC2 instances. The source code was written in Python and contained in a Jupyter notebook.

The baseline performance was defined by running the Python script on a Dell Latitude E7450 with an Intel i7-5600U CPU at 2.60 GHz, 16 GB of RAM, and Windows 10. While this can be a decent configuration for ML modeling, occasionally we may need a system with larger memory capacity or more CPUs for the tasks at hand.

The end-to-end script processing time on the cloud instance should be comparable to, or better than, the baseline workstation.
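To make the run-time comparison repeatable, one simple approach is to wrap the script's entry point with a wall-clock timer. This is a generic sketch, not code from my actual notebook; the function you pass in is whatever your script's main routine happens to be:

```python
import time

def timed_run(func, *args, **kwargs):
    """Run func with the given arguments and report its wall-clock duration."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    minutes, seconds = divmod(elapsed, 60)
    print(f"Elapsed: {int(minutes)}m {seconds:.1f}s")
    return result, elapsed
```

Calling `timed_run(main)` on both the workstation and the cloud instance then gives directly comparable numbers.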

Background and Prerequisite Information

The following tools and assumptions were present prior to the provisioning of the cloud instance.

  • AWS Console with the necessary rights and configuration elements to launch an instance. I had configured a VPC subnet, an IAM role, a security group, and a key pair for setting up the instance.
  • AWS Deep Learning AMI Developer Guide, released June 6, 2018
  • Web browsers
  • PuTTY

AWS Configuration Notes

AMI: I performed the following steps using both the Ubuntu-based and the Amazon Linux-based AMIs with an m5.large general-purpose instance. The AMIs are designed to take advantage of instances with GPUs. I found no issue running either AMI without a GPU; however, some of the pre-supplied tutorial examples probably need tweaking before they will work on a general-purpose instance.

VPC: This exercise requires only a subnet that is accessible via the Internet.

Security Group: I configured the security group to allow only TCP port 22 from any IP address because I planned to use an SSH tunnel to access the Jupyter server.
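For reference, an equivalent ingress rule can be created with the AWS CLI. This is a sketch only; the group ID below is a placeholder for your own security group:

```shell
# Allow inbound SSH (TCP 22) from any IPv4 address.
# sg-0123456789abcdef0 is a placeholder group ID.
aws ec2 authorize-security-group-ingress \
    --group-id sg-0123456789abcdef0 \
    --protocol tcp \
    --port 22 \
    --cidr 0.0.0.0/0
```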

IAM Role: I assign all my AWS instances to an IAM role by default. For this exercise, an IAM role is not critical.

Key Pair: I attached the instance to an existing key pair. The key pair is necessary to access the instance via the SSH protocol.

Provision an instance with the Amazon Deep Learning AMI

Step 1) Create and launch the instance. I used an m5.large instance as the starting point.
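If you prefer the CLI over the console, the launch can be scripted roughly as below. All IDs and names here are placeholders; you would substitute your own subnet, security group, key pair, IAM instance profile, and the Deep Learning AMI ID for your region:

```shell
# All IDs/names are placeholders; look up the Deep Learning AMI ID
# for your region before running.
aws ec2 run-instances \
    --image-id ami-xxxxxxxx \
    --instance-type m5.large \
    --key-name my-key-pair \
    --security-group-ids sg-xxxxxxxx \
    --subnet-id subnet-xxxxxxxx \
    --iam-instance-profile Name=my-instance-role
```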

Step 2) Configure the client workstation to connect to the Jupyter server. I configured my Windows workstation to connect to the Jupyter server using an SSH tunnel. The DLAMI-DG document has a write-up on how to do this for Windows, Linux, and macOS clients (pages 15-20).

See the PuTTY screenshot below for configuring an SSH tunnel.

[Screenshot: PuTTY SSH tunnel configuration (provision_aws_deep_learning_ami-1)]
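On Linux or macOS clients, where PuTTY is not needed, the same tunnel can be opened with OpenSSH. The key path and host below are placeholders; the default user is ubuntu on the Ubuntu AMI and ec2-user on the Amazon Linux AMI:

```shell
# Forward local port 8888 to port 8888 on the Jupyter server.
# -N keeps the session open for forwarding only (no remote shell).
ssh -i ~/.ssh/my-key.pem -N -L 8888:localhost:8888 ubuntu@<instance-public-ip>
```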

Step 3) Run the git command to clone my Python scripts from GitHub to the cloud server.

$ git clone https://github.com/daines-analytics/sandbox-projects.git

Step 4) Activate the Python 3 environment by running the command:

$ source activate python3

Step 5) Because my Python script required the numpy, pandas, scipy, scikit-learn, and matplotlib packages, I needed to install some additional packages.

On the Ubuntu AMI, I ran the command “conda install <package>” to check for or install each package.

On the Amazon Linux AMI, I ran the command “pip install <package>” to check for or install each package.
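To see which of the required packages still need installing before running either command, a quick stdlib-only check works the same on both AMIs. Note that a package's import name can differ from its install name (scikit-learn imports as sklearn):

```python
from importlib.util import find_spec

def missing_packages(names):
    """Return the subset of import names that are not installed."""
    return [name for name in names if find_spec(name) is None]

# Import names for the packages the script requires.
required = ["numpy", "pandas", "scipy", "sklearn", "matplotlib"]
# Feed the result to `conda install` or `pip install` as appropriate.
print(missing_packages(required))
```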

Step 6) Start the Jupyter server by running the command:

$ jupyter notebook

[Screenshot: Jupyter server startup output (provision_aws_deep_learning_ami-2)]
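Since the server is reached only through the SSH tunnel, it can also help to keep Jupyter bound to the loopback interface and skip launching a browser on the server. These are standard Jupyter Notebook flags:

```shell
# Bind to localhost only and do not open a browser on the server.
jupyter notebook --no-browser --ip=127.0.0.1 --port=8888
```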

Step 7) Make a note of the Jupyter server URL and use it in the workstation browser running through the SSH tunnel.

[Screenshot: Jupyter server URL in the workstation browser (provision_aws_deep_learning_ami-3)]

Step 8) Locate the Python script and run it (my own Git folder circled below).

[Screenshot: Jupyter file listing with my own Git folder circled (provision_aws_deep_learning_ami-4)]

Step 9) Compare the script run times. Not rigorously scientific, but probably good enough.

Windows Workstation: 1 hour 50 minutes

Ubuntu/Deep Learning AMI: 1 hour 22 minutes

Amazon Linux/Deep Learning AMI: 1 hour 24 minutes

There you have it! A working Jupyter server on an AWS cloud instance that you can access via a secured protocol.

When compared to a client workstation, the right type of cloud instance can help our modeling effort. For anyone attempting a similar installation, I hope these instructions help in some way. My next step is to further automate the instance creation with a CloudFormation script. I will write down what I run into and share my findings later.