SUNK Parameter Reference

Version: 0.1.0
Type: application

Requirements

Repository: oci://ghcr.io/coreweave/k8s-device-plugin/charts
Name: nvidia-device-plugin
Version: 0.17.0-5c8a50df

Parameters

Key & Description | Type | Default

imagePullSecrets
Image pull secrets to configure if using custom private images.

list
[]
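A minimal values override, assuming the chart passes each entry through as a standard Kubernetes image pull secret reference (a list of objects with a name field). The secret name my-registry-credentials is hypothetical and must already exist in the release namespace.

```yaml
# Hypothetical secret name; create it beforehand, for example with
# kubectl create secret docker-registry.
imagePullSecrets:
  - name: my-registry-credentials
```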

nvidia-device-plugin
Options for the CoreWeave fork of the NVIDIA device plugin chart. This chart builds on NVIDIA's default configuration and uses its default chart values.

object
See default chart values.
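Following the standard Helm convention, values for a dependency are nested under that dependency's chart name, here nvidia-device-plugin. The sketch below assumes the fork keeps the upstream nodeSelector value; the node label is hypothetical.

```yaml
# Everything under this key is passed to the nvidia-device-plugin subchart.
nvidia-device-plugin:
  # Hypothetical label used to restrict the device plugin to GPU nodes.
  nodeSelector:
    example.com/gpu: "true"
```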

operator.affinity
The affinity for the operator deployment.

object
null
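A sketch of an operator.affinity override, assuming the value is passed through unchanged as a standard Kubernetes affinity object. The node label key and value are hypothetical.

```yaml
operator:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              # Hypothetical label identifying nodes reserved for control-plane workloads.
              - key: example.com/node-role
                operator: In
                values:
                  - control-plane
```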

operator.config.operator.nodeSet.failedPodsBackoffGCInterval
The interval between runs of the backoff garbage collection (GC) that checks failed pods.

string
"1m"

operator.config.operator.nodeSet.maxBurstReplicas
A rate limit for starting pods when many pods are created at once. A value that is too high can cause denial-of-service (DoS) issues on the image registry.

int
250

operator.config.operator.nodeSet.statusUpdateBackoffGCInterval
The interval between runs of the backoff garbage collection (GC) that checks node status updates.

string
"1m"

operator.config.operator.nodeSlice.maxNodesPerNodeSlice
The maximum number of nodes that can be in a single nodeSlice.

int
100
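The dotted keys above map to nested YAML in a values file. This sketch overrides the nodeSet and nodeSlice settings; the numbers are illustrative rather than recommendations, and the interval strings are assumed to use Go duration syntax, as the "1m" defaults suggest.

```yaml
operator:
  config:
    operator:
      nodeSet:
        # Run both backoff GC passes less often than the 1m default.
        failedPodsBackoffGCInterval: "5m"
        statusUpdateBackoffGCInterval: "5m"
        # A lower burst reduces load on the image registry during large rollouts.
        maxBurstReplicas: 100
      nodeSlice:
        maxNodesPerNodeSlice: 50
```

A single key can also be set on the command line, for example with --set operator.config.operator.nodeSet.maxBurstReplicas=100.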

operator.image
The image to use for the operator.

object
repository: registry.gitlab.com/coreweave/sunk/operator
tag:
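A sketch that pins the operator image to a specific tag. The tag shown is a hypothetical placeholder; substitute a published release.

```yaml
operator:
  image:
    repository: registry.gitlab.com/coreweave/sunk/operator
    # Hypothetical tag; replace with a real release tag.
    tag: "vX.Y.Z"
```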

operator.leaderElection.enabled
Forces the operator to use leader election even when the number of replicas is 1.
This is useful if you plan to scale the operator after deployment.

bool
false

operator.leaderElection.leaderElectionID
The ID to use for leader election.

string
null
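A sketch that forces leader election on a single replica so the operator can be scaled out later without changing its election setup. The ID is hypothetical; it should presumably be unique among controllers running in the same namespace.

```yaml
operator:
  replicas: 1
  leaderElection:
    enabled: true
    # Hypothetical ID; keep it distinct from other controllers in the namespace.
    leaderElectionID: "sunk-operator-leader"
```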

operator.logLevel
The log level.
Accepts an integer or one of the following zap log level strings:

  • debug
  • info
  • warn
  • error
  • dpanic
  • panic
  • fatal

string
"info"

operator.maxConcurrentReconciles
The maximum number of reconciles the operator runs concurrently.

int
10
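An illustrative override that raises log verbosity for troubleshooting and lowers reconcile concurrency. The specific values are examples only.

```yaml
operator:
  # One of the zap level strings listed above.
  logLevel: "debug"
  # Fewer concurrent reconciles than the default of 10.
  maxConcurrentReconciles: 5
```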

operator.podMonitor.enabled
Enables monitoring through the Prometheus Operator PodMonitor CRD.

bool
true

operator.priorityClassName
The priority class name for the operator.

string
null

operator.replicas
The number of replicas of the operator pod to run.
Leader election is enabled if this value is greater than 1 or if leader election is explicitly enabled with operator.leaderElection.enabled.

int
1
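A sketch of scaling the operator out. Per the description above, a replica count greater than 1 enables leader election without setting operator.leaderElection.enabled; only the elected leader is expected to reconcile while the other replicas stand by.

```yaml
operator:
  # More than one replica implicitly enables leader election.
  replicas: 2
```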

operator.resources
The resource requests and limits for the operator.

object
limits:
  memory: 2Gi
requests:
  cpu: 2
  memory: 2Gi
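An illustrative override with larger requests and limits, using standard Kubernetes resource quantities. Because Helm merges user-supplied maps into the chart defaults, any key left out of the override keeps its default value.

```yaml
operator:
  resources:
    limits:
      memory: 4Gi
    requests:
      cpu: 4
      # Matches the memory limit, as in the defaults.
      memory: 4Gi
```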

operator.tolerations
The tolerations for the operator deployment.

list
[]
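A sketch of an operator.tolerations override, assuming the list is passed through as standard Kubernetes tolerations. The taint key and value are hypothetical.

```yaml
operator:
  tolerations:
    # Hypothetical taint applied to dedicated control-plane nodes.
    - key: example.com/dedicated
      operator: Equal
      value: control-plane
      effect: NoSchedule
```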

priorityClass.enabled
Enable the priority class for the control plane components.

bool
true

priorityClass.value
The value of the priority class. This should generally be high relative to other priority classes, because the control plane components are critical.

int
1000000000
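Kubernetes reserves values above 1000000000 (one billion) for system priority classes, so the default is already the highest value a user-defined PriorityClass can take. The lower value below is illustrative, for clusters where other workloads must be able to outrank these components.

```yaml
priorityClass:
  enabled: true
  # Must not exceed 1000000000 for a user-defined PriorityClass.
  value: 100000000
```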

scheduler.podMonitor.enabled
Enables monitoring through the Prometheus Operator PodMonitor CRD.

bool
true

syncer.podMonitor.enabled
Enables monitoring through the Prometheus Operator PodMonitor CRD.

bool
true
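All three podMonitor flags default to true, which assumes the Prometheus Operator PodMonitor CRD is installed in the cluster; without it, creating PodMonitor resources would presumably fail. A sketch that disables monitoring for every component:

```yaml
# Turn off PodMonitor resources for the operator, scheduler, and syncer.
operator:
  podMonitor:
    enabled: false
scheduler:
  podMonitor:
    enabled: false
syncer:
  podMonitor:
    enabled: false
```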