AWS Elastic Kubernetes Service (EKS)

RAPIDS can be deployed on AWS via the Elastic Kubernetes Service (EKS).

To run RAPIDS you’ll need a Kubernetes cluster with GPUs available.

Prerequisites

First, install the aws and eksctl CLI tools, along with kubectl for managing Kubernetes.

Make sure the aws CLI is configured with your credentials.

$ aws configure
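
Before continuing, it can help to confirm that all three tools are on your PATH and that the configured credentials resolve to the account you expect. The following is a minimal sketch of one way to do that. check_tools is a hypothetical helper name used only for illustration; aws sts get-caller-identity is the AWS CLI call that reports the active identity.

```shell
# check_tools is a hypothetical helper: it reports which CLIs are on PATH
# and returns non-zero if any of them are missing.
check_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found: $tool"
    else
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

if check_tools aws eksctl kubectl; then
  # Prints the account and identity behind the configured credentials.
  aws sts get-caller-identity || echo "Credentials not configured; run: aws configure" >&2
else
  echo "Install the missing tools before continuing." >&2
fi
```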

Create the Kubernetes cluster

Now we can launch a GPU-enabled EKS cluster with eksctl.

Note

You need to create or import a public SSH key before you can run the eksctl create cluster command below. In the AWS console, go to EC2 and open Network & Security > Key Pairs in the side panel. There you can create a key pair or import one you’ve created locally (see the “Actions” dropdown).
If you are not using your default AWS profile, add --profile <your-profile> to the eksctl command.
The --ssh-public-key argument takes the name you assigned to the key when you created it in the AWS console.
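
If you prefer the command line to the console, a locally created public key can also be imported with the AWS CLI before running the eksctl command below. This is a minimal sketch under stated assumptions: the key pair name rapids-key and the key path ~/.ssh/id_rsa.pub are illustrative values, not anything defined by this guide. The name you choose here is the value to pass to --ssh-public-key.

```shell
# Assumptions: KEY_NAME and KEY_FILE are illustrative; adjust both to your setup.
KEY_NAME="${KEY_NAME:-rapids-key}"
KEY_FILE="${KEY_FILE:-$HOME/.ssh/id_rsa.pub}"

if command -v aws >/dev/null 2>&1 && [ -f "$KEY_FILE" ]; then
  # Upload the local public key to EC2 under KEY_NAME.
  # The region must match the one used for the cluster.
  aws ec2 import-key-pair \
    --region us-east-1 \
    --key-name "$KEY_NAME" \
    --public-key-material "fileb://$KEY_FILE"
  # Confirm that the key pair is now registered.
  aws ec2 describe-key-pairs --region us-east-1 --key-names "$KEY_NAME"
else
  echo "Skipping import: aws CLI or $KEY_FILE not found." >&2
fi
```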

$ eksctl create cluster rapids \
                      --nodes 3 \
                      --node-type=g4dn.xlarge \
                      --timeout=40m \
                      --ssh-access \
                      --ssh-public-key <public key ID> \
                      --region us-east-1 \
                      --zones=us-east-1c,us-east-1b,us-east-1d \
                      --auto-kubeconfig

With this command, you’ve launched an EKS cluster called rapids. You’ve specified that it should use nodes of type g4dn.xlarge, which include one NVIDIA T4 GPU each.

When eksctl sees an NVIDIA GPU instance type, it selects the correct EKS-optimized accelerated AMI and installs the NVIDIA Kubernetes device plugin automatically. The EKS-optimized NVIDIA AMI includes the NVIDIA driver, CUDA user-mode driver, and the NVIDIA Container Toolkit.

To access the cluster we need to pull down the credentials. Add --profile <your-profile> if you are not using the default profile.

$ aws eks --region us-east-1 update-kubeconfig --name rapids
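
After pulling the credentials, you may want to confirm that kubectl is pointing at the new cluster and that its nodes are ready. The sketch below assumes the default behaviour of aws eks update-kubeconfig, which names the context after the cluster ARN (ending in cluster/rapids) unless you pass an alias. context_is_cluster is a hypothetical helper used only for this check.

```shell
# context_is_cluster is a hypothetical helper: it succeeds when a kubeconfig
# context name ends in "cluster/<name>", the suffix of an EKS cluster ARN.
context_is_cluster() {
  case "$1" in
    *"cluster/$2") return 0 ;;
    *) return 1 ;;
  esac
}

if command -v kubectl >/dev/null 2>&1; then
  ctx=$(kubectl config current-context 2>/dev/null) || ctx=""
  if context_is_cluster "$ctx" rapids; then
    # Block until every node reports Ready, or give up after five minutes.
    kubectl wait --for=condition=Ready nodes --all --timeout=300s \
      || echo "Nodes did not become Ready in time." >&2
  else
    echo "Current context '$ctx' does not look like the rapids cluster." >&2
  fi
fi
```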

Verify GPU support

Verify that the NVIDIA device plugin Pods are running.

$ kubectl get po -n kube-system -l name=nvidia-device-plugin-ds
NAME                                   READY   STATUS    RESTARTS   AGE
nvidia-device-plugin-daemonset-kv7t5   1/1     Running   0          52m
nvidia-device-plugin-daemonset-rhmvx   1/1     Running   0          52m
nvidia-device-plugin-daemonset-thjhc   1/1     Running   0          52m
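
As a further check, each node should advertise its GPU to the scheduler once the device plugin has registered. The sketch below lists the allocatable nvidia.com/gpu count per node. list_node_gpus is a hypothetical wrapper name; with the g4dn.xlarge nodes used in this guide, each node would be expected to report one GPU.

```shell
# list_node_gpus is a hypothetical wrapper: it prints each node name with the
# number of GPUs the node advertises as allocatable.
list_node_gpus() {
  # The backslash escapes the dots inside the nvidia.com/gpu resource name.
  kubectl get nodes \
    "-o=custom-columns=NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu"
}

if command -v kubectl >/dev/null 2>&1; then
  list_node_gpus || echo "Could not query nodes; check your kubeconfig." >&2
fi
```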

Note

If you need to manage the NVIDIA device plugin version yourself, set eksctl create cluster --install-nvidia-plugin=false ... when creating the cluster and then install the device plugin manually. If you choose to install the NVIDIA GPU Operator on EKS-optimized NVIDIA AMIs, disable the operator’s driver and toolkit installation because those components are already included in the AMI.
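
If you do choose the GPU Operator, the two components named above can be switched off through Helm chart values. The following is a sketch, not a definitive install procedure. It assumes Helm 3 is installed, that kubectl points at the EKS cluster, and that the cluster was created with --install-nvidia-plugin=false. install_gpu_operator is a hypothetical wrapper that only prints the commands unless DRY_RUN=0 is set.

```shell
# install_gpu_operator is a hypothetical wrapper. By default it only prints
# the Helm commands; set DRY_RUN=0 to run them against the current cluster.
install_gpu_operator() {
  if [ "${DRY_RUN:-1}" = "1" ]; then run="echo"; else run=""; fi
  $run helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
  $run helm repo update
  # The driver and container toolkit already ship in the AMI, so turn off
  # the operator's own copies of both.
  $run helm install gpu-operator nvidia/gpu-operator \
    --namespace gpu-operator --create-namespace \
    --set driver.enabled=false \
    --set toolkit.enabled=false
}

install_gpu_operator
```

Review the printed commands first, then rerun with DRY_RUN=0 once they match what you intend to install.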

After you have confirmed the device plugin is running, you are ready to test your cluster.

Let’s create a sample Pod that uses some GPU compute to make sure that everything is working as expected.

$ cat << EOF | kubectl create -f -
apiVersion: v1
kind: Pod
metadata:
  name: cuda-vectoradd
spec:
  restartPolicy: OnFailure
  containers:
  - name: cuda-vectoradd
    image: "nvidia/samples:vectoradd-cuda11.6.0-ubuntu18.04"
    resources:
      limits:
        nvidia.com/gpu: 1
EOF

Once the Pod has completed, check its logs.

$ kubectl logs pod/cuda-vectoradd
[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
Done

If you see Test PASSED in the output, you can be confident that your Kubernetes cluster has GPU compute set up correctly.
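
The same check can be scripted, which is useful if you create clusters repeatedly. The sketch below waits for the Pod to finish and then searches its logs for the Test PASSED line. smoke_test_passed is a hypothetical helper name, and the jsonpath form of kubectl wait assumes kubectl 1.23 or newer.

```shell
# smoke_test_passed is a hypothetical helper: it reads Pod logs on stdin and
# succeeds only if the sample reported "Test PASSED".
smoke_test_passed() {
  grep -q "Test PASSED"
}

if command -v kubectl >/dev/null 2>&1; then
  # Wait for the Pod to reach the Succeeded phase, then inspect its logs.
  if kubectl wait --for=jsonpath='{.status.phase}'=Succeeded \
       pod/cuda-vectoradd --timeout=300s \
     && kubectl logs pod/cuda-vectoradd | smoke_test_passed; then
    echo "GPU smoke test passed"
  else
    echo "GPU smoke test failed or did not complete" >&2
  fi
fi
```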

Next, clean up that Pod.

$ kubectl delete pod cuda-vectoradd
pod "cuda-vectoradd" deleted

Install RAPIDS

Now that you have a GPU-enabled Kubernetes cluster on EKS, you can install RAPIDS with any of the supported methods.

Clean up

When you are finished, you can delete the EKS cluster to stop billing with the following command.

$ eksctl delete cluster --region=us-east-1 --name=rapids
Deleting cluster rapids...