Jupyter Notebook with TensorFlow
To follow along, please clone the GitHub repository with the example manifests.

Introduction

This example leverages a Deployment to always maintain one instance of TensorFlow with Jupyter. TensorFlow is a highly popular deep learning framework that is greatly accelerated by GPUs. Each instance, called a Pod in Kubernetes terminology, is allocated 2 GPUs.
The Kubernetes Control Plane in the CoreWeave Cloud will ensure that one instance (Pod) of TensorFlow Jupyter is running at all times. The Control Plane will reserve GPU, CPU and RAM on CoreWeave's compute nodes. Pods in the same Deployment can be scheduled on the same or on multiple physical nodes, depending on resource availability. If co-location of Pods is required for some reason, e.g. shared ephemeral or block storage, this can be controlled with affinity rules.
The example Deployment showcases some node affinity rules. These are purely for demonstration purposes, and the entire affinity section can be removed without breaking the example.
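For orientation, the sketch below shows roughly what the relevant parts of such a Deployment look like: the single replica the Control Plane maintains, the per-Pod GPU, CPU and RAM reservation, and an optional node affinity block. It is a simplified sketch, not the manifest from the repository; the affinity label key, GPU class value and CPU/memory figures are illustrative assumptions, so refer to tensorflow-deployment.yaml for the actual values.

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: tensorflow-jupyter
    spec:
      replicas: 1                          # the Control Plane keeps one Pod running
      selector:
        matchLabels:
          app: tensorflow-jupyter
      template:
        metadata:
          labels:
            app: tensorflow-jupyter
        spec:
          affinity:                        # optional, for demonstration only
            nodeAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
                nodeSelectorTerms:
                  - matchExpressions:
                      - key: gpu.nvidia.com/class   # illustrative label key
                        operator: In
                        values:
                          - Quadro_RTX_4000         # illustrative GPU class
          containers:
            - name: tensorflow-jupyter
              image: tensorflow/tensorflow:1.15.0-py3-jupyter
              ports:
                - containerPort: 8888      # Jupyter's web interface
              resources:
                limits:
                  nvidia.com/gpu: 2        # two GPUs per Pod
                  cpu: 4                   # illustrative CPU reservation
                  memory: 16Gi             # illustrative RAM reservation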

Service

A Service of type LoadBalancer is included to show how to publish a Deployment to the public Internet. The Service publishes the web interface of Jupyter to the Internet on port 8888.
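A minimal sketch of such a Service is shown below, assuming the Pods carry an app: tensorflow-jupyter label; the actual names and selector are defined in tensorflow-service.yaml in the repository.

    apiVersion: v1
    kind: Service
    metadata:
      name: tensorflow-jupyter
    spec:
      type: LoadBalancer               # requests a public IP for the Service
      ports:
        - name: notebook
          port: 8888                   # port exposed on the public IP
          targetPort: 8888             # port Jupyter listens on inside the Pod
          protocol: TCP
      selector:
        app: tensorflow-jupyter        # must match the Pod labels in the Deployment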

Persistent Storage

A Persistent Volume is allocated to persist user-uploaded notebooks. The allocation is done via a Persistent Volume Claim requesting the storage size and backing storage type (SSD, HDD). This volume claim is then mounted to the /tf/notebooks directory in the Pod definition. Utilizing a persistent volume ensures that files persist even if the node currently running the Pod fails.
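The sketch below illustrates the idea, assuming a claim named jupyter-pv-claim and a 10Gi request; the claim name, size and storage class are placeholders, and the actual definitions live in the example manifests.

    # Illustrative PersistentVolumeClaim; name, size and storage class are assumptions.
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: jupyter-pv-claim
    spec:
      storageClassName: block-nvme     # placeholder for an SSD- or HDD-backed class
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 10Gi

In the Deployment's Pod template, the claim is then referenced as a volume and mounted at /tf/notebooks, roughly as follows:

    # Excerpt from the Pod template of the Deployment
    volumes:
      - name: notebooks
        persistentVolumeClaim:
          claimName: jupyter-pv-claim
    containers:
      - name: tensorflow-jupyter
        volumeMounts:
          - name: notebooks
            mountPath: /tf/notebooks   # notebooks stored here survive Pod restarts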

Getting Started

After installing kubectl and adding your CoreWeave Cloud access credentials, the following steps will deploy the TensorFlow Jupyter Deployment and Service.
  1. Apply the resources. This can be used both to create new resources and to update existing manifests.

    $ kubectl apply -f tensorflow-deployment.yaml
    deployment.apps/tensorflow-jupyter configured
    $ kubectl apply -f tensorflow-service.yaml
    service/tensorflow-jupyter configured
  2. List the Pods to see the Deployment working to instantiate all our requested instances.

    $ kubectl get pods
    NAME                                  READY   STATUS              RESTARTS   AGE
    tensorflow-jupyter-6794bcb465-4czqb   0/1     ContainerCreating   0          2s
  3. After a little while, all Pods should transition to the Running state.

    $ kubectl get pods
    NAME                                  READY   STATUS    RESTARTS   AGE
    tensorflow-jupyter-6794bcb465-4czqb   1/1     Running   0          6s
  4. The Deployment will also show that all desired Pods are up and running.

    $ kubectl get deployment
    NAME                 READY   UP-TO-DATE   AVAILABLE   AGE
    tensorflow-jupyter   1/1     1            1           73m
  5. Describing a Pod helps troubleshoot one that does not want to start and gives other relevant information about it.

    $ kubectl describe pod tensorflow-jupyter-6794bcb465-4czqb
    ....
    Events:
      Type    Reason                  Age   From                     Message
      ----    ------                  ----  ----                     -------
      Normal  Scheduled               54s   default-scheduler        Successfully assigned tenant-test/tensorflow-jupyter-6794bcb465-4czqb to g04c225
      Normal  SuccessfulAttachVolume  54s   attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-66fa0887-a4c4-4245-a4e2-02200f640fea"
      Normal  Pulling                 50s   kubelet, g04c225         Pulling image "tensorflow/tensorflow:1.15.0-py3-jupyter"
      Normal  Pulled                  24s   kubelet, g04c225         Successfully pulled image "tensorflow/tensorflow:1.15.0-py3-jupyter"
      Normal  Created                 19s   kubelet, g04c225         Created container miner
      Normal  Started                 18s   kubelet, g04c225         Started container miner
  6. The logs can be viewed to retrieve the Jupyter login token.

    $ kubectl logs tensorflow-jupyter-6794bcb465-4czqb

    WARNING: You are running this container as root, which can cause new files in
    mounted volumes to be created as the root user on your host machine.

    To avoid this, run the container by specifying your user's userid:

    $ docker run -u $(id -u):$(id -g) args...

    [I 14:09:12.985 NotebookApp] Writing notebook server cookie secret to /root/.local/share/jupyter/runtime/notebook_cookie_secret
    [I 14:09:13.153 NotebookApp] Serving notebooks from local directory: /tf
    [I 14:09:13.153 NotebookApp] The Jupyter Notebook is running at:
    [I 14:09:13.153 NotebookApp] http://tensorflow-jupyter-6794bcb465-4czqb:8888/?token=a71eb39261e6ef01bdec8867c2c051b0b3aaf31545bfbb84
    [I 14:09:13.153 NotebookApp] or http://127.0.0.1:8888/?token=a71eb39261e6ef01bdec8867c2c051b0b3aaf31545bfbb84
To get the public IP assigned to the Service, simply list all Services.

    $ kubectl get service
    NAME                 TYPE           CLUSTER-IP       EXTERNAL-IP     PORT(S)          AGE
    tensorflow-jupyter   LoadBalancer   10.134.100.173   64.79.105.199   8888:30947/TCP   30s
You can now access Jupyter on http://EXTERNAL-IP:8888 using the login token from the log.
The package includes TensorFlow's example notebooks, which will leverage the GPUs available to the Pod.