Jupyter Notebook with TensorFlow

Deploy Jupyter Notebook with TensorFlow on CoreWeave Cloud

important

This tutorial is undergoing updates.

Our team is revising this example to work with the current version of TensorFlow, and it will be updated soon.

TensorFlow is a popular deep learning framework that is frequently used with Jupyter notebooks, and it runs significantly faster on GPUs.

In this simple example, an instance of TensorFlow with Jupyter is deployed to CoreWeave Cloud and exposed to the public Internet using Kubernetes.

Prerequisites

This tutorial assumes that you have:

installed kubectl on your system
obtained and configured CoreWeave access credentials

important

Please note that this tutorial is for demonstration purposes only, and should not be used in production due to the lack of HTTPS encryption on the Jupyter service.

Tutorial source code

To follow along with this tutorial, first clone the manifests from GitHub.

Manifests

The tensorflow-jupyter directory contains three YAML files, each of which is a manifest used to deploy a different piece of this example.

Filename	Description
`tensorflow-deployment.yaml`	Defines the Deployment, which dictates how the TensorFlow with Jupyter Pods will be created
`tensorflow-service.yaml`	Defines the Service, which handles networking and public publishing
`jupyter-pvc.yaml`	Defines the PersistentVolumeClaim used for Jupyter notebook storage

The Deployment manifest

In this example, a Kubernetes Deployment manifest is used to deploy a single instance of TensorFlow with Jupyter notebooks.

note

Pods defined in the same Deployment manifest may be scheduled on the same node or on different nodes; placement ultimately depends on node resource availability. If Pods must be co-located, for example to share ephemeral or block storage, this can be controlled with affinity rules.

This example Deployment manifest does not showcase all possible node affinity rules; those shown are included purely for demonstration purposes. The entire affinity stanza may be removed from this example Deployment without breaking the example.

Learn more about scheduling Pods, or see a more advanced Deployment example in Custom Containers.

Replicas

This example Deployment manifest instructs CoreWeave Cloud's Kubernetes control plane to keep exactly one Pod running TensorFlow with Jupyter at all times. The single instance is set by the spec.replicas: 1 key-value pair in the manifest. The Recreate update strategy terminates the existing Pod before a replacement is created, so two instances never run at the same time.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: tensorflow-jupyter
spec:
  strategy:
    type: Recreate
  # Replicas controls the number of instances of the Pod to maintain running at all times
  replicas: 1

Resources

The control plane is also in charge of reserving GPU, CPU, and memory resources on the CoreWeave compute nodes where Pods run. Each instance of Jupyter, running inside a Kubernetes Pod, is allocated 2 GPUs, as defined by the nvidia.com/gpu: 2 entry under the container's resources.limits, seen below.

resources:
  requests:
    cpu: 500m # CPU is measured in millicores; 500m is 0.5 cores
    memory: 256Mi
  limits:
    cpu: 2000m
    memory: 2048Mi
    # GPUs can only be allocated as a limit, which both reserves and limits the number of GPUs the Pod will have access to
    # Making individual Pods resource light is advantageous for bin-packing. In the case of Jupyter, we stick to two GPUs for
    # demonstration purposes
    nvidia.com/gpu: 2
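The values above use Kubernetes quantity suffixes: m means millicores (1000m is one core), and binary suffixes such as Mi and Gi mean 2^20 and 2^30 bytes. The following minimal Python sketch converts these quantities to plain numbers. It assumes only the suffixes used in this tutorial appear. It does not implement the full Kubernetes quantity format, which also allows decimal suffixes such as k, M, and G, as well as exponents.

```python
# Minimal parser for the Kubernetes quantity suffixes used in this tutorial.
# Assumption: only "m" (milli) and the binary suffixes Ki/Mi/Gi/Ti appear;
# decimal suffixes (k, M, G) and exponent forms are NOT handled.
BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def parse_quantity(value: str) -> float:
    """Convert a quantity such as '500m', '2048Mi' or '2' to a plain number."""
    for suffix, factor in BINARY_SUFFIXES.items():
        if value.endswith(suffix):
            return float(value[: -len(suffix)]) * factor  # bytes
    if value.endswith("m"):
        return float(value[:-1]) / 1000  # millicores -> cores
    return float(value)  # unitless count, e.g. GPUs


resources = {
    "requests": {"cpu": "500m", "memory": "256Mi"},
    "limits": {"cpu": "2000m", "memory": "2048Mi", "nvidia.com/gpu": "2"},
}

for kind, values in resources.items():
    for name, quantity in values.items():
        print(f"{kind}.{name}: {quantity} -> {parse_quantity(quantity)}")
```

The same parser also covers the 10Gi storage request in the PVC manifest later in this tutorial.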

Additional Resources

To learn more about requests and limits in Kubernetes, refer to the official Kubernetes documentation.

The Service manifest

In Kubernetes, networking traffic is handled via Services, which are typically defined by their own manifests. In this example, a single Service is defined in the Service manifest tensorflow-service.yaml, which allows the Deployment to be published to the public Internet.

This Service manifest opens TCP port 8888 to the public Internet using a Service of type LoadBalancer:

Service manifest
apiVersion: v1
kind: Service
metadata:
  annotations:
    metallb.universe.tf/address-pool: public
    # Setting a sharing key might save public IP addresses
    # See https://metallb.universe.tf/usage/#ip-address-sharing for more detail
    metallb.universe.tf/allow-shared-ip: example-1
  name: tensorflow-jupyter
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local
  ports:
    - name: notebook
      port: 8888
      protocol: TCP
      targetPort: notebook
  selector:
    app.kubernetes.io/name: tensorflow-jupyter
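A Service forwards traffic only to Pods whose labels include every key-value pair in its selector. Extra labels on the Pod do not matter. The following minimal Python sketch illustrates that equality-based matching rule. The Pod label set is an assumption: it is a hypothetical example of what the full Deployment's Pod template would produce, because the manifest excerpts in this tutorial do not show the Pod labels.

```python
def selector_matches(selector: dict[str, str], pod_labels: dict[str, str]) -> bool:
    """Equality-based selector: each selector key must be present on the Pod
    with the same value; additional Pod labels are ignored."""
    return all(pod_labels.get(key) == value for key, value in selector.items())


service_selector = {"app.kubernetes.io/name": "tensorflow-jupyter"}

# Hypothetical Pod labels; pod-template-hash is added to Pods by Deployments.
pod_labels = {
    "app.kubernetes.io/name": "tensorflow-jupyter",
    "pod-template-hash": "6794bcb465",
}

print(selector_matches(service_selector, pod_labels))
print(selector_matches(service_selector, {"app": "something-else"}))
```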

Connecting the Pod to the Service

To connect the container defined in the Deployment manifest to the Service, the Deployment manifest publishes Jupyter's web interface on the container's TCP port 8888. This port is declared with containerPort: 8888 in the container's ports list and given the name notebook. The Service's targetPort: notebook refers to this named port, and the Service exposes the same port number, 8888, to the Internet through its spec.ports.port value shown above.

Additional Resources

Learn more about how to connect applications to Services in Kubernetes.

Deployment manifest
spec:
  containers:
    - name: tf
      image: tensorflow/tensorflow:2.12.0-gpu-jupyter
      ports:
      - name: notebook
        containerPort: 8888
        protocol: TCP
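Because targetPort is a name rather than a number, Kubernetes resolves it against the ports declared on the selected Pod's containers. The following simplified Python sketch shows that lookup, using the container port list from the excerpt above. It is an illustration only: it ignores protocol matching and the other details that the real endpoint controller handles.

```python
from typing import Union


def resolve_target_port(target_port: Union[int, str], containers: list[dict]) -> int:
    """Return the numeric container port that a Service targetPort refers to.

    Integer targetPorts are used as-is; string targetPorts are matched
    against the `name` of each declared container port.
    """
    if isinstance(target_port, int):
        return target_port
    for container in containers:
        for port in container.get("ports", []):
            if port.get("name") == target_port:
                return port["containerPort"]
    raise ValueError(f"no container port named {target_port!r}")


containers = [
    {
        "name": "tf",
        "image": "tensorflow/tensorflow:2.12.0-gpu-jupyter",
        "ports": [{"name": "notebook", "containerPort": 8888, "protocol": "TCP"}],
    }
]

print(resolve_target_port("notebook", containers))
```

Referring to the port by name means the container's port number can change without editing the Service, as long as the port keeps the same name.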

Additional Resources

To learn more about Kubernetes Services, refer to the official Kubernetes documentation.

The Persistent Volume Claim manifest

Utilizing a persistent volume ensures that files persist, even if the node currently running the Pod fails. In this example, a persistent volume is allocated in order to persist user-uploaded notebooks.

Allocation of persistent volume storage is done via a Kubernetes PersistentVolumeClaim, commonly called a PVC, which requests the storage size and backing storage type (such as SSD or HDD) to be used. In this example, the PVC is defined in the jupyter-pvc.yaml file.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: jupyter-pv-claim
spec:
  # Available storage classes at time of writing are
  # block-nvme-lga1 - New York - NVMe Storage with 3 Replicas
  # block-hdd-lga1 - New York - HDD Storage with 3 Replicas
  storageClassName: block-nvme-lga1
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi

Mounting storage

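The storage type selected here is Block NVMe, with a capacity of 10Gi. Once the claim is created, it is mounted at the /tf/notebooks directory, as specified by the container's volumeMounts stanza in the Deployment manifest. The name: storage entry refers to a volume in the Pod template's volumes list, which is where the jupyter-pv-claim PVC is referenced: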

Deployment manifest
volumeMounts:
  - name: storage
    mountPath: /tf/notebooks
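Only files at or below the mount path are stored on the PVC. Jupyter serves from /tf, as the Pod logs later in this tutorial show. Notebooks saved elsewhere under /tf therefore live on the container's ephemeral filesystem and are lost when the Pod is replaced. The following minimal Python sketch checks whether a path would persist. It assumes the mount path shown above and compares paths lexically, without resolving symlinks.

```python
import posixpath
from pathlib import PurePosixPath

# mountPath from the Deployment's volumeMounts stanza.
MOUNT_PATH = PurePosixPath("/tf/notebooks")


def is_persisted(path: str, mount_path: PurePosixPath = MOUNT_PATH) -> bool:
    """True if `path` lies on the mounted PVC, i.e. at or below the mount path.
    '..' components are collapsed lexically first; symlinks are not resolved."""
    p = PurePosixPath(posixpath.normpath(path))
    return p == mount_path or mount_path in p.parents


for path in ["/tf/notebooks/experiment.ipynb", "/tf/scratch.ipynb", "/tf/notebooks-old/a.ipynb"]:
    print(path, is_persisted(path))
```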

Additional Resources

Learn more about Storage and how to use it on CoreWeave Cloud.

Run the example

Run the following steps to deploy the TensorFlow Deployment and Service.

Apply the resources

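Using kubectl apply, create the PersistentVolumeClaim that the Deployment mounts, then deploy the Deployment and the Service.

 $ kubectl apply -f jupyter-pvc.yaml

 $ kubectl apply -f tensorflow-deployment.yaml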
 deployment.apps/tensorflow-jupyter configured

 $ kubectl apply -f tensorflow-service.yaml
 service/tensorflow-jupyter configured

View created Pods

Use kubectl get pods to list the newly created Pods. The output shows the Deployment creating the requested instance.

 $ kubectl get pods

 NAME                                 READY   STATUS              RESTARTS   AGE
 tensorflow-jupyter-6794bcb465-4czqb  0/1     ContainerCreating   0          2s

After a moment, all Pods should transition from a ContainerCreating state to a Running state.

$ kubectl get pods

NAME                                  READY   STATUS    RESTARTS   AGE
tensorflow-jupyter-6794bcb465-4czqb   1/1     Running   0          6s
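For scripting, kubectl get pods -o json returns the same information as a JSON object whose items each carry status.phase and status.containerStatuses[].ready. The following minimal Python sketch summarizes that output. The sample document is trimmed and hand-written to mirror the Running Pod above; it was not captured from a cluster.

```python
import json


def summarize_pods(pods_json: str) -> dict[str, str]:
    """Map each Pod name to 'PHASE ready/total', from `kubectl get pods -o json` output."""
    summary = {}
    for pod in json.loads(pods_json)["items"]:
        # containerStatuses is absent until the Pod has been scheduled.
        statuses = pod["status"].get("containerStatuses", [])
        ready = sum(1 for status in statuses if status.get("ready"))
        summary[pod["metadata"]["name"]] = f"{pod['status']['phase']} {ready}/{len(statuses)}"
    return summary


# Trimmed, hand-written sample shaped like `kubectl get pods -o json` output.
sample = json.dumps({
    "items": [{
        "metadata": {"name": "tensorflow-jupyter-6794bcb465-4czqb"},
        "status": {"phase": "Running", "containerStatuses": [{"name": "tf", "ready": True}]},
    }]
})

print(summarize_pods(sample))
```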

Getting the Deployment resource also shows that all desired Pods are up and running, along with some additional information about them.

 $ kubectl get deployment

 NAME                 READY   UP-TO-DATE   AVAILABLE   AGE
 tensorflow-jupyter   1/1     1            1           73m
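The READY column compares ready replicas with desired replicas. In kubectl get deployment tensorflow-jupyter -o json output, these values are status.readyReplicas and spec.replicas. The following simplified Python sketch performs that comparison on a trimmed, hand-written sample. When no replica is ready, Kubernetes omits readyReplicas, and an unset spec.replicas defaults to 1. Both cases are handled below.

```python
import json


def deployment_ready(deployment_json: str) -> bool:
    """True when at least the desired number of replicas report ready."""
    deployment = json.loads(deployment_json)
    desired = deployment.get("spec", {}).get("replicas", 1)  # Kubernetes default is 1
    ready = deployment.get("status", {}).get("readyReplicas", 0)  # omitted when zero
    return ready >= desired


# Trimmed, hand-written sample shaped like `kubectl get deployment NAME -o json` output.
sample = json.dumps({
    "metadata": {"name": "tensorflow-jupyter"},
    "spec": {"replicas": 1},
    "status": {"replicas": 1, "readyReplicas": 1, "availableReplicas": 1},
})

print(deployment_ready(sample))
```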

Using kubectl describe on a Pod displays in-depth information about that Pod, which is useful for troubleshooting a Pod that hasn't started yet.

 $ kubectl describe pod tensorflow-jupyter-6794bcb465-4czqb
 ....
 Events:
   Type    Reason                  Age   From                     Message
   ----    ------                  ----  ----                     -------
   Normal  Scheduled               54s   default-scheduler        Successfully assigned tenant-test/tensorflow-jupyter-6794bcb465-4czqb to g04c225
   Normal  SuccessfulAttachVolume  54s   attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-66fa0887-a4c4-4245-a4e2-02200f640fea"
   Normal  Pulling                 50s   kubelet, g04c225         Pulling image "tensorflow/tensorflow:1.15.0-py3-jupyter"
   Normal  Pulled                  24s   kubelet, g04c225         Successfully pulled image "tensorflow/tensorflow:1.15.0-py3-jupyter"
   Normal  Created                 19s   kubelet, g04c225         Created container miner
   Normal  Started                 18s   kubelet, g04c225         Started container miner

Pod labels can also be used to select Pods. For example, the Pods in this Deployment carry the label app.kubernetes.io/name: tensorflow-jupyter, which is also the label the Service selects on. The following command therefore describes every matching Pod:

$ kubectl describe pod -l app.kubernetes.io/name=tensorflow-jupyter

Additional Resources

For more information on Labels and Affinities in CoreWeave Cloud, see Advanced Label Selectors.

Obtain the Jupyter login token

View the Pod's logs to obtain the Jupyter login token. The token is displayed as part of the URL for the served instance. For example:

$ kubectl logs tensorflow-jupyter-6794bcb465-4czqb

WARNING: You are running this container as root, which can cause new files in
mounted volumes to be created as the root user on your host machine.

To avoid this, run the container by specifying your user's userid:

$ docker run -u $(id -u):$(id -g) args...

[I 14:09:12.985 NotebookApp] Writing notebook server cookie secret to /root/.local/share/jupyter/runtime/notebook_cookie_secret
[I 14:09:13.153 NotebookApp] Serving notebooks from local directory: /tf
[I 14:09:13.153 NotebookApp] The Jupyter Notebook is running at:
[I 14:09:13.153 NotebookApp] http://tensorflow-jupyter-6794bcb465-4czqb:8888/?token=a71eb39261e6ef01bdec8867c2c051b0b3aaf31545bfbb84
[I 14:09:13.153 NotebookApp]  or http://127.0.0.1:8888/?token=a71eb39261e6ef01bdec8867c2c051b0b3aaf31545bfbb84

In this example, the token is a71eb39261e6ef01bdec8867c2c051b0b3aaf31545bfbb84.
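When scripting this step, the token can be extracted from the log text with a regular expression. The following minimal Python sketch works on any text that contains a URL ending in ?token=..., such as the output of kubectl logs captured to a string. The sample is the log excerpt shown above.

```python
import re
from typing import Optional

# Matches the token query parameter in the URLs Jupyter prints at startup.
TOKEN_RE = re.compile(r"[?&]token=([^&\s]+)")


def extract_token(log_text: str) -> Optional[str]:
    """Return the first login token found in Jupyter's startup log, or None."""
    match = TOKEN_RE.search(log_text)
    return match.group(1) if match else None


log_excerpt = """\
[I 14:09:13.153 NotebookApp] The Jupyter Notebook is running at:
[I 14:09:13.153 NotebookApp] http://tensorflow-jupyter-6794bcb465-4czqb:8888/?token=a71eb39261e6ef01bdec8867c2c051b0b3aaf31545bfbb84
"""

print(extract_token(log_excerpt))  # a71eb39261e6ef01bdec8867c2c051b0b3aaf31545bfbb84
```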

Obtain the public IP

To find the public IP assigned to the Service, which is used to reach Jupyter, list the Services:

$ kubectl get service

NAME                 TYPE           CLUSTER-IP       EXTERNAL-IP     PORT(S)    AGE
tensorflow-jupyter   LoadBalancer   10.134.100.173   64.79.105.199   8888/TCP   30s
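The EXTERNAL-IP column comes from the Service's status.loadBalancer.ingress list. kubectl can print that value directly with kubectl get service tensorflow-jupyter -o jsonpath='{.status.loadBalancer.ingress[0].ip}'. The following minimal Python sketch reads the same field from -o json output. The sample is trimmed and hand-written to match the output above. It returns None while the address is still pending.

```python
import json
from typing import Optional


def external_ip(service_json: str) -> Optional[str]:
    """Return the first load-balancer ingress IP, or None while it is still pending."""
    service = json.loads(service_json)
    ingress = service.get("status", {}).get("loadBalancer", {}).get("ingress") or []
    return ingress[0].get("ip") if ingress else None


# Trimmed, hand-written sample shaped like `kubectl get service NAME -o json` output.
sample = json.dumps({
    "spec": {"type": "LoadBalancer", "ports": [{"name": "notebook", "port": 8888}]},
    "status": {"loadBalancer": {"ingress": [{"ip": "64.79.105.199"}]}},
})

print(external_ip(sample))  # 64.79.105.199
```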

Log in to Jupyter

In your browser, navigate to http://<EXTERNAL-IP>:8888 and log in with the token obtained from the Pod's logs.
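The following sketch builds the login URL from the two values gathered above. The URL follows the same form Jupyter prints in its logs, http://HOST:8888/?token=TOKEN, with the Pod hostname replaced by the Service's external IP. Because the service uses plain HTTP, the token in this URL travels unencrypted.

```python
from urllib.parse import urlencode, urlunsplit


def jupyter_login_url(external_ip: str, token: str, port: int = 8888) -> str:
    """Build a token login URL of the form http://IP:PORT/?token=TOKEN."""
    return urlunsplit(("http", f"{external_ip}:{port}", "/", urlencode({"token": token}), ""))


print(jupyter_login_url("64.79.105.199", "a71eb39261e6ef01bdec8867c2c051b0b3aaf31545bfbb84"))
# http://64.79.105.199:8888/?token=a71eb39261e6ef01bdec8867c2c051b0b3aaf31545bfbb84
```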

🎉 Congratulations! You've deployed an instance of TensorFlow running a Jupyter notebook on CoreWeave Kubernetes!

tip

The container image includes TensorFlow's example notebooks, which use the GPUs allocated to the Pod.
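To confirm from inside a notebook that TensorFlow can see the GPUs, run a cell like the following. tf.config.list_physical_devices is part of the TensorFlow 2.x API. With the nvidia.com/gpu: 2 limit in this Deployment, two GPU devices would be expected. The import guard is an addition for this sketch only, so that the snippet also runs outside the container.

```python
def visible_gpus() -> list:
    """List the GPU devices TensorFlow can see; empty if TensorFlow is not installed."""
    try:
        import tensorflow as tf
    except ImportError:
        return []
    return tf.config.list_physical_devices("GPU")


gpus = visible_gpus()
print(f"TensorFlow sees {len(gpus)} GPU(s)")
for gpu in gpus:
    print(gpu.name)  # e.g. a device name such as /physical_device:GPU:0
```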
