Launch GPT DeepSpeed Models using Determined AI
Launch a GPT DeepSpeed model using Determined AI on CoreWeave Cloud
DeepSpeed is an open-source deep learning library for PyTorch optimized for low latency and high throughput training, designed to reduce compute power and memory required to train large distributed models.
In the example below, a minimal GPT-NeoX DeepSpeed distributed training job is launched without the additional features such as tracking, metrics, and visualization that Determined AI offers.
Tutorial source code
To follow along with this example, first clone the CoreWeave GPT DeepSpeed repository to your workstation:
$ git clone --recurse-submodules https://github.com/coreweave/gpt-det-deepseed.git
Prerequisites
This guide assumes that the following steps are already complete:

- You have set up your CoreWeave Kubernetes environment locally
- `git` is installed locally
- Determined AI is installed in your namespace
Setup
The launcher configuration file
The `launcher.yml` configuration file provided in this demo exposes the overall configuration parameters for the experiment.
Determined AI uses its own fork of DeepSpeed, so using that image is recommended.
```yaml
image:
  gpu: liamdetermined/gpt-neox
```
In this example, a wrapper around DeepSpeed called `determined.launch.deepspeed` allows for safe handling of node failure and shutdown.
```yaml
entrypoint:
  - python3
  - -m
  - determined.launch.deepspeed
```
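Putting the image and entrypoint together, a minimal experiment configuration might look like the following sketch. This is illustrative, not the repository's actual `launcher.yml`: the `name` and `resources` values are assumptions, and other sections Determined requires (such as `searcher` and `hyperparameters`) are omitted for brevity.

```yaml
# Illustrative fragment of a Determined experiment configuration.
name: gpt-neox-deepspeed          # hypothetical experiment name
environment:
  image:
    gpu: liamdetermined/gpt-neox  # Determined AI's DeepSpeed fork image
resources:
  slots_per_trial: 8              # GPUs per trial; illustrative value
entrypoint:
  - python3
  - -m
  - determined.launch.deepspeed
```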
Mount path for host file
In `train_deepspeed_launcher.py`, the default mount path is defined as:

```python
shared_hostfile = "/mnt/finetune-gpt-noex/hostfile.txt"
```

Edit this hostfile path to match your mount path.
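Because the hostfile location depends on how your storage is mounted, one option is to read the path from an environment variable instead of hard-coding it. The snippet below is a sketch, not code from the repository; the `HOSTFILE_PATH` variable name is an assumption for illustration.

```python
import os

# Repository default from train_deepspeed_launcher.py.
DEFAULT_HOSTFILE = "/mnt/finetune-gpt-neox/hostfile.txt"


def resolve_hostfile_path():
    """Prefer a HOSTFILE_PATH environment override, else the default.

    HOSTFILE_PATH is a hypothetical variable used for illustration only.
    """
    return os.environ.get("HOSTFILE_PATH", DEFAULT_HOSTFILE)
```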
Dockerfile
The Dockerfile provided in this experiment is used to build the Docker image needed to run the experiment in the cluster. The image may be manually built if customizations are desired.
The Dockerfile uses the following:
- Python 3.8
- PyTorch 1.12.1
- CUDA 11.6
Example Dockerfile
```docker
FROM coreweave/nccl-tests:2022-09-28_16-34-19.392_EDT

ENV DET_PYTHON_EXECUTABLE="/usr/bin/python3.8"
ENV DET_SKIP_PIP_INSTALL="SKIP"

# Run updates and install packages for build
RUN echo "Dpkg::Options { "--force-confdef"; "--force-confnew"; };" > /etc/apt/apt.conf.d/local
RUN apt-get -qq update && \
    apt-get -qq install -y --no-install-recommends software-properties-common && \
    add-apt-repository ppa:deadsnakes/ppa -y && \
    add-apt-repository universe && \
    apt-get -qq update && \
    DEBIAN_FRONTEND=noninteractive apt-get install -y curl tzdata build-essential daemontools && \
    apt-get install -y --no-install-recommends \
        python3.8 \
        python3.8-distutils \
        python3.8-dev \
        python3.8-venv \
        git && \
    apt-get clean

# python3.8 -m ensurepip --default-pip && \
RUN curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
RUN python3.8 get-pip.py
RUN python3.8 -m pip install --no-cache-dir --upgrade pip

ARG PYTORCH_VERSION=1.12.1
ARG TORCHVISION_VERSION=0.13.1
ARG TORCHAUDIO_VERSION=0.12.1
ARG TORCH_CUDA=116
ARG TORCH_INDEX=whl

RUN python3.8 -m pip install --no-cache-dir torch==${PYTORCH_VERSION}+cu${TORCH_CUDA} \
        torchvision==${TORCHVISION_VERSION}+cu${TORCH_CUDA} \
        torchaudio==${TORCHAUDIO_VERSION}+cu${TORCH_CUDA} \
        --extra-index-url https://download.pytorch.org/${TORCH_INDEX}/cu${TORCH_CUDA}

RUN python3.8 -m pip install --no-cache-dir packaging

RUN mkdir -p /tmp/build && \
    cd /tmp/build && \
    git clone https://github.com/NVIDIA/apex && \
    cd apex && \
    python3.8 -m pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./ && \
    cd /tmp && \
    rm -r /tmp/build

#### Python packages
RUN python3.8 -m pip install --no-cache-dir determined==0.19.2
COPY requirements/requirements.txt .
RUN python3.8 -m pip install --no-cache-dir -r requirements.txt
COPY requirements/requirements-onebitadam.txt .
RUN python3.8 -m pip install --no-cache-dir -r requirements-onebitadam.txt
COPY requirements/requirements-sparseattention.txt .
RUN python3.8 -m pip install -r requirements-sparseattention.txt
RUN python3.8 -m pip install --no-cache-dir pybind11
RUN python3.8 -m pip install --no-cache-dir protobuf==3.19.4
RUN update-alternatives --install /usr/bin/python3 python /usr/bin/python3.8 2
RUN echo 2 | update-alternatives --config python
```
Launch the experiment
To run the experiment, invoke `det experiment create` from the root of the cloned repository:
$ det experiment create core_api.yml .
Logging
You can track logs for this experiment using the Determined AI web UI, and visualize metrics using Weights & Biases (WandB). To use WandB, pass your WandB API key in an environment variable called `WANDB_API_KEY`, or modify the function `get_wandb_api_key()` in `deepy.py` to return your API token.
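For example, a minimal `get_wandb_api_key()` that prefers the environment variable could look like the following sketch. This is not the actual implementation in `deepy.py`; it only illustrates the two options described above.

```python
import os


def get_wandb_api_key():
    """Return the WandB API key from the WANDB_API_KEY environment variable.

    Sketch only: to hard-code a token instead, replace the final line with
    something like `return "<your-api-token>"`.
    """
    return os.environ.get("WANDB_API_KEY")
```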
To configure your DeepSpeed experiment to run on multiple nodes, set the `slots_per_trial` option to the total number of GPUs you require. The maximum number of GPUs per node on CoreWeave is 8, so the experiment becomes multi-node once it exceeds this threshold.
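For instance, to run across two nodes of 8 GPUs each, you could request 16 slots. This fragment is illustrative, assuming the standard Determined `resources` section of the experiment configuration:

```yaml
resources:
  slots_per_trial: 16   # 16 GPUs exceeds the 8 GPUs per node, so two nodes are used
```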