Executing RAPIDS from your computer (without a local GPU)

RAPIDS Series 2: Step-by-step Python configuration for more efficient computing

By Filip Velkov

Setting the stage

Using GPU computing power in data science, especially for training and running deep learning models, is becoming increasingly appealing to our fellow data scientists and engineers. This comes as no surprise when you compare typical model training times on a GPU versus a CPU.

Lately, more and more technologies have emerged that aim to bring GPU processing power into the data science world, mainly so that data scientists can do their work more efficiently. One of these technologies is NVIDIA RAPIDS, a data science framework composed of multiple libraries that share a common goal: executing end-to-end data science pipelines entirely on the GPU.

However, most of us don’t have a laptop with a dedicated NVIDIA GPU, which RAPIDS requires. The usual workaround is to rent a virtual server in the cloud that has a GPU attached to it. This solution has one major setback: since we are working remotely, we cannot easily use the most popular IDEs (code editors), with the exception of Jupyter Notebook, which is relatively easy to set up on a virtual server.

However, not everybody is happy being limited to Jupyter Notebook. We wanted to explore whether there is a way to overcome this. There is, and it’s pretty cool: you can use the remote Python interpreter (the program that executes the instructions written by programmers) from your local IDE, which lets you work with GPU libraries such as RAPIDS from your local environment without having a GPU.

This article presents the steps necessary to set up an Amazon EC2 instance (which is just a fancy name for a virtual server) with a dedicated GPU, create a Python environment with RAPIDS installed on it, and use that environment’s interpreter from PyCharm on your local machine. So let’s get started!

Setting up the EC2 Instance

AWS offers several EC2 instance families with NVIDIA GPUs, including:

  • Amazon EC2 P3 Instances, with up to 8 NVIDIA Tesla V100 GPUs.
  • Amazon EC2 G3 Instances, with up to 4 NVIDIA Tesla M60 GPUs.
  • Amazon EC2 G4 Instances, with up to 4 NVIDIA T4 GPUs.
  • Amazon EC2 P4 Instances, with up to 8 NVIDIA Tesla A100 GPUs.

For our example, we will use a g4dn.xlarge instance, which is a G4 instance that comes with a single NVIDIA T4 GPU, 4 vCPUs, and 16 GiB of RAM.
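If you want to confirm an instance type’s hardware from code instead of the console, the EC2 DescribeInstanceTypes API reports its GPUs, vCPUs, and memory. The following is a minimal sketch, assuming boto3 is installed and AWS credentials and a default region are configured; the helper names summarize_instance_type and describe are hypothetical, and only the response fields they read come from the API.

```python
from typing import Any, Dict


def summarize_instance_type(info: Dict[str, Any]) -> str:
    """One-line summary of one entry from DescribeInstanceTypes' InstanceTypes list."""
    # GpuInfo is missing for instance types without GPUs.
    gpus = info.get("GpuInfo", {}).get("Gpus", [])
    gpu_text = ", ".join(
        f"{gpu['Count']}x {gpu['Manufacturer']} {gpu['Name']}" for gpu in gpus
    ) or "no GPU"
    vcpus = info["VCpuInfo"]["DefaultVCpus"]
    memory_gib = info["MemoryInfo"]["SizeInMiB"] / 1024
    return f"{info['InstanceType']}: {gpu_text}, {vcpus} vCPUs, {memory_gib:g} GiB RAM"


def describe(instance_type: str = "g4dn.xlarge") -> str:
    import boto3  # imported here so the parser above works without boto3 installed

    ec2 = boto3.client("ec2")
    response = ec2.describe_instance_types(InstanceTypes=[instance_type])
    return summarize_instance_type(response["InstanceTypes"][0])


# Usage (requires AWS credentials): print(describe("g4dn.xlarge"))
```

For g4dn.xlarge, the summary should reflect the single T4 GPU, 4 vCPUs, and 16 GiB of RAM described above.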

To start, log into your AWS account, go to the EC2 console, select Instances in the left-hand menu, and click the orange Launch instances button on the right.

You will then be prompted to select an AMI (Amazon Machine Image) that will be used to configure your instance. Some basic images include only the operating system, while others also come with software packages installed on top of it. The latter are useful for more specific tasks, such as building web applications with a particular framework, running a content management system like WordPress, or hosting a database server.

In our case, we will use the community Deep Learning AMI (Ubuntu 18.04) Version 42.1, since it comes with the packages and drivers we need, such as conda, TensorFlow, PyTorch, and CUDA support. As the name suggests, it is optimized for deep learning projects.

Figure 1: Selecting the AMI for the instance
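AMI IDs differ between regions, so instead of copying an ID from the console you can look the image up by name with the EC2 DescribeImages API and take the newest match. A minimal sketch, assuming boto3 with configured credentials; the name pattern is an assumption based on the AMI named above, and newest_image_id and find_ami are hypothetical helpers.

```python
from typing import Any, Dict, List, Optional

# Assumption: wildcard pattern matching the AMI name used in this article.
AMI_NAME_PATTERN = "Deep Learning AMI (Ubuntu 18.04) Version 42.1*"


def newest_image_id(images: List[Dict[str, Any]]) -> Optional[str]:
    """Return the ImageId with the latest CreationDate, or None if there are no images."""
    if not images:
        return None
    # CreationDate is an ISO 8601 string (e.g. "2021-03-01T12:00:00.000Z"),
    # so lexicographic order matches chronological order.
    return max(images, key=lambda image: image["CreationDate"])["ImageId"]


def find_ami(name_pattern: str = AMI_NAME_PATTERN) -> Optional[str]:
    import boto3  # imported here; needs AWS credentials at call time

    ec2 = boto3.client("ec2")
    response = ec2.describe_images(
        Filters=[{"Name": "name", "Values": [name_pattern]}]
    )
    return newest_image_id(response["Images"])
```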

Next, choose the instance type. We recommend the g4dn.xlarge instance type, since at the time of writing it is the cheapest option on which you can run RAPIDS in AWS.

Figure 2: Selecting an appropriate instance type

After completing the remaining user-specific details (defining a network and network interface for your instance, adding storage, configuring security groups so that inbound SSH traffic on port 22 is allowed, and defining the key pair that will be used to access your instance through SSH), you are ready to go. Once you launch it, you should see the instance running and be able to connect to it.
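The same launch can also be scripted with boto3’s EC2 run_instances call. A minimal sketch, assuming boto3 with configured credentials; the AMI ID, key pair name, and security group ID are placeholders you must supply, and build_launch_params and launch are hypothetical helpers. Network and storage settings are left at the account’s and AMI’s defaults.

```python
from typing import Any, Dict


def build_launch_params(ami_id: str, key_name: str, security_group_id: str,
                        instance_type: str = "g4dn.xlarge") -> Dict[str, Any]:
    """Keyword arguments for boto3's EC2 run_instances, launching a single instance."""
    return {
        "ImageId": ami_id,
        "InstanceType": instance_type,
        "KeyName": key_name,  # key pair later used for SSH and by PyCharm
        "SecurityGroupIds": [security_group_id],  # group should allow inbound TCP 22
        "MinCount": 1,
        "MaxCount": 1,
    }


def launch(ami_id: str, key_name: str, security_group_id: str) -> str:
    """Launch the instance and return its ID (requires boto3 and AWS credentials)."""
    import boto3

    ec2 = boto3.client("ec2")
    response = ec2.run_instances(**build_launch_params(ami_id, key_name, security_group_id))
    return response["Instances"][0]["InstanceId"]
```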

To connect, go to the instance summary by selecting the instance from the list of available instances and clicking the ‘Connect’ button. You will find the connection instructions under the SSH client tab. On Windows, you can use PuTTY or Git Bash as an SSH client.

Figure 3: Instructions to SSH into your instance
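On Linux, macOS, or Git Bash, the connection boils down to an ssh call with the key pair file and the ubuntu user. A minimal Python sketch; the helper names are hypothetical, and the host in the example line is a documentation-only address. OpenSSH ignores private keys that other users can read, so the sketch restricts the key’s permissions first, which mirrors chmod 400.

```python
import os
import stat
import subprocess
from typing import List


def ssh_command(key_path: str, host: str, user: str = "ubuntu") -> List[str]:
    """Argument list for an OpenSSH connection to the instance."""
    return ["ssh", "-i", key_path, f"{user}@{host}"]


def restrict_key_permissions(key_path: str) -> None:
    """Make the key readable only by its owner, like `chmod 400` (POSIX systems)."""
    os.chmod(key_path, stat.S_IRUSR)


def connect(key_path: str, host: str) -> int:
    """Open an interactive SSH session; returns ssh's exit code."""
    restrict_key_permissions(key_path)
    return subprocess.call(ssh_command(key_path, host))


# Example with hypothetical values: connect("my-key.pem", "203.0.113.10")
```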

If you accessed the machine successfully, you are on the right track and ready to install RAPIDS on the instance. If not, go back through the steps to make sure nothing was missed.
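A quick way to confirm that the GPU is visible on the instance is nvidia-smi, which ships with the NVIDIA driver. The sketch below, meant to be run with python3 on the instance, queries the GPU name and total memory in CSV form and parses the result; parse_gpu_list and query_gpus are hypothetical helpers, and the exact name and memory reported depend on your instance and driver.

```python
import subprocess
from typing import List, Tuple


def parse_gpu_list(csv_output: str) -> List[Tuple[str, str]]:
    """Parse `nvidia-smi --query-gpu=name,memory.total --format=csv,noheader` output."""
    gpus = []
    for line in csv_output.strip().splitlines():
        name, memory = (field.strip() for field in line.split(",", 1))
        gpus.append((name, memory))
    return gpus


def query_gpus() -> List[Tuple[str, str]]:
    """Run nvidia-smi and return (name, total memory) for each visible GPU."""
    output = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_list(output)


# Usage on the EC2 instance: print(query_gpus())
```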

Installing RAPIDS

If you wish, you may also change some of the arguments of the installation command. Running it creates a new conda environment called rapids-0.18 that has everything you need to run RAPIDS.
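RAPIDS releases of this period were installed into a fresh conda environment with a conda create command like the ones produced by the RAPIDS release selector. The sketch below assembles such a command in Python so the adjustable arguments are explicit; the channel list and the python=3.8 and cudatoolkit=11.0 pins are assumptions to check against the release selector and your driver, and rapids_create_command and create_environment are hypothetical helpers.

```python
import subprocess
from typing import List


def rapids_create_command(env_name: str = "rapids-0.18",
                          rapids_version: str = "0.18",
                          python_version: str = "3.8",
                          cuda_version: str = "11.0") -> List[str]:
    """Assemble a `conda create` command for a RAPIDS environment (assumed channel layout)."""
    channels = ["rapidsai", "nvidia", "conda-forge"]  # assumption: typical RAPIDS channels
    command = ["conda", "create", "-y", "-n", env_name]
    for channel in channels:
        command += ["-c", channel]
    command += [
        f"rapids={rapids_version}",
        f"python={python_version}",
        f"cudatoolkit={cuda_version}",
    ]
    return command


def create_environment(**kwargs: str) -> None:
    """Run the command on the EC2 instance, where the AMI provides conda."""
    subprocess.run(rapids_create_command(**kwargs), check=True)
```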

Setting up PyCharm to use the remote interpreter

This section provides a step-by-step guide to setting up the Python interpreter from the rapids-0.18 conda environment on our EC2 instance as a remote interpreter in PyCharm. Note that SSH interpreters are a PyCharm Professional feature.

1. Start by creating a new Pure Python project. Select the option to use a previously configured interpreter, then click the “…” button to add a new one.

2. From the left menu, select SSH Interpreter. There you will be asked to enter the information PyCharm needs to establish an SSH connection with the remote interpreter. Enter the public IP address of the EC2 instance in the Host field and ubuntu in the Username field. Click ‘Next’.

3. Configure the path to the private key file on your local machine (the .pem file of the key pair you assigned to the instance). Click ‘Next’.

4. Once connected, you will be prompted to enter the path to the desired Python interpreter. Enter /home/ubuntu/anaconda3/envs/rapids-0.18/bin/python. Note that rapids-0.18 is the name of the conda environment that we created in the previous steps; if you created an environment with a different name, use that instead. Click ‘Finish’ and you will be returned to the main window.

5. Lastly, you have the option of setting the path to the folder where the project will be stored remotely on the virtual machine. You can specify a folder in that field or leave the default value. Click ‘Create’ to finish your setup.

Congratulations! You have successfully created a project that will use a remote interpreter located in your EC2 instance.
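A small first script in the new project can confirm both that PyCharm is running code on the remote interpreter from step 4 and that RAPIDS works through it. A minimal sketch, assuming the rapids-0.18 environment from earlier; uses_conda_env and gpu_smoke_test are hypothetical helpers, the check relies on conda’s Linux layout of envs/&lt;name&gt;/bin/python, and the column values are made up for illustration. cudf.DataFrame mirrors much of the pandas API while keeping the data on the GPU.

```python
import sys
from pathlib import PurePosixPath
from typing import Optional


def uses_conda_env(executable: str, env_name: str = "rapids-0.18") -> bool:
    """True if `executable` looks like <conda root>/envs/<env_name>/bin/python*."""
    path = PurePosixPath(executable)
    return (
        path.name.startswith("python")
        and path.parent.name == "bin"
        and path.parent.parent.name == env_name
        and path.parent.parent.parent.name == "envs"
    )


def gpu_smoke_test() -> Optional[float]:
    """Multiply two columns on the GPU with cuDF; None if cuDF is not importable."""
    try:
        import cudf
    except ImportError:
        return None  # not running inside the RAPIDS environment
    df = cudf.DataFrame({"x": [1.0, 2.0, 3.0], "y": [10.0, 20.0, 30.0]})
    df["z"] = df["x"] * df["y"]  # element-wise product computed on the GPU
    return float(df["z"].sum())


print("Interpreter:", sys.executable)
print("RAPIDS env:", uses_conda_env(sys.executable))
print("cuDF result:", gpu_smoke_test())  # 140.0 when run through the remote interpreter
```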

Conclusion

Filip Velkov is a Machine Learning intern at Loka. In addition to ML, Filip is currently exploring the MLOps and DevOps world as part of his internship program. Filip is a team player and eager learner who is always seeking new challenges and opportunities to expand his skillset. When he’s not at his virtual office, you can catch Filip hiking or picking up a foreign language for whatever adventure comes next.

Originally published at https://loka.com.

Loka is a team of elite data engineers & designers who help ship fascinating innovations. Our stories give you a peek into what’s now & next for ML & humanity.
