You will need `git` installed on your terminal.

Attach the `finetune-gpt-neox` PVC that you created earlier. Make sure that you actually add your PVC to the filebrowser list of mounts!

Search for `determined`. This will bring up the Determined.ai (`determined`) application, which you can then deploy into your cluster.

During deployment, fill in `<YOUR_ACCESS_KEY>` and `<YOUR_SECRET_KEY>`, and select the `LAS1 (Las Vegas)` region.

- `<YOUR_ACCESS_KEY>`: this should be replaced by your actual Object Storage access key
- `<YOUR_SECRET_KEY>`: this should be replaced by your actual Object Storage secret key

You will receive an `ACCESS_KEY` and `SECRET_KEY` once object storage has been configured for you. Contact support for more information.
Click `+` to attach the `finetune-gpt-neox` volume. Mount the `finetune-gpt-neox` volume on the mount path `/mnt/finetune-gpt-neox`.
Make sure to run the `export DET_MASTER=...ord1.ingress.coreweave.cloud:80` command, found in the post-installation notes from the DeterminedAI deployment, prior to running the next command.

Each record in the dataset should be a JSON object with a single `"text"` field. Upload `data.jsonl` to filebrowser under `finetune-gpt-neox`:
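As a minimal sketch of that layout (the file name `data.jsonl` comes from the walkthrough; the example sentences are placeholders), each line is one JSON object carrying only a `"text"` field:

```shell
# Write a tiny example dataset: one JSON object per line, each with a "text" field.
cat > data.jsonl <<'EOF'
{"text": "First example document for finetuning."}
{"text": "Second example document for finetuning."}
EOF

# Sanity-check that every line parses as JSON and carries a "text" key.
python3 - <<'PY'
import json
for line in open("data.jsonl"):
    assert "text" in json.loads(line)
print("ok")
PY
```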
Create a new folder `gpt_finetune` under the `finetune-gpt-neox` folder.

*The `gpt_finetune` directory in filebrowser*

Data for GPT-NeoX training should be preprocessed using `tools/preprocess_data.py`, storing the result under `gpt_finetune`. The preprocessed data will be output at `<data-dir>/<dataset-name>/<dataset-name>_text_document.bin` and `<data-dir>/<dataset-name>/<dataset-name>_text_document.idx`. You will need these paths for the `data-path` field.
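The exact preprocessing command from the original page is not reproduced here. As a non-authoritative sketch, an invocation might look like the following; the flag names (`--input`, `--output-prefix`, `--tokenizer-type`, `--vocab-file`, `--append-eod`) are taken from the public GPT-NeoX repository and may differ in the version you cloned, and the dataset name `mydata` is a placeholder:

```shell
# Placeholder locations; substitute your own data dir and dataset name.
DATA_DIR=/mnt/finetune-gpt-neox/gpt_finetune
DATASET=mydata

# Sketch of the preprocessing call (commented out: it needs the cloned
# gpt-neox sources, a tokenizer file, and the uploaded data.jsonl):
#   python tools/preprocess_data.py \
#     --input "${DATA_DIR}/data.jsonl" \
#     --output-prefix "${DATA_DIR}/${DATASET}/${DATASET}" \
#     --tokenizer-type HFTokenizer \
#     --vocab-file 20B_tokenizer.json \
#     --append-eod

# The resulting files follow the naming convention used for data-path:
echo "${DATA_DIR}/${DATASET}/${DATASET}_text_document.bin"
echo "${DATA_DIR}/${DATASET}/${DATASET}_text_document.idx"
```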
Make sure to run the `export DET_MASTER=...ord1.ingress.coreweave.cloud:80` command, found in the post-installation notes from the DeterminedAI deployment, prior to running the next command.

You can follow the logs of the running task with the `det task logs` command:

```shell
det task logs -f <TASK_NAME_FROM_ABOVE>
```
Navigate in the `determined.ai` source code you cloned previously to the `examples/deepspeed/gpt_neox` directory.

Create the `determined-cluster.yml` file with the content below to configure the cluster for 96 GPUs, at `examples/deepspeed/gpt_neox/gpt_neox_config/determined-cluster.yml`. You may configure or change any of the optimizer values or training configurations to your needs. It is recommended to use the NeoX source code as reference when doing so. Note the `slots_per_trial: 8` setting in the following (next section) experiment configuration file, `finetune-gpt-neox.yml`.

`determined-cluster.yml`:
Create the `finetune-gpt-neox.yml` experiment configuration file in the `examples/deepspeed/gpt_neox` directory.

`finetune-gpt-neox.yml`:

The key parameters here are the number of `batches`, and `slots_per_trial`. We use default values of `100` batches to finetune on, with `50` batches before validation or early stopping, and `96` A40 GPUs.
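Those defaults could be sketched as the following fragment of the experiment file. This is an assumption-laden illustration, not the file from the walkthrough: the field names follow Determined's experiment configuration schema (`resources.slots_per_trial`, `min_validation_period`, `searcher.max_length`), and the metric name is a placeholder.

```shell
# Write an illustrative fragment (not the complete experiment file) showing
# where the three values discussed above would live.
cat > finetune-fragment.yml <<'EOF'
resources:
  slots_per_trial: 96     # 96 A40 GPUs
min_validation_period:
  batches: 50             # validate (and allow early stopping) every 50 batches
searcher:
  name: single
  metric: lm_loss         # placeholder metric name
  max_length:
    batches: 100          # total batches to finetune on
EOF
```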
Optionally, add the `<WANDB_GROUP>` and `<WANDB_TEAM>` variables to your configuration file.