# Configuring GPU
GPU resources can be configured on a Pipeline Vertex or a MonoVertex.
## Prerequisites
Your cluster must support GPU scheduling and have the appropriate device plugin installed.
See the Kubernetes [device plugins documentation](https://kubernetes.io/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins/) and the [NVIDIA device plugin guide](https://github.com/NVIDIA/k8s-device-plugin) for details.
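Once the device plugin is running, GPU nodes advertise an `nvidia.com/gpu` resource in their status. The excerpt below is a hypothetical node-status snippet showing what to look for; the CPU and memory values are placeholders:

```yaml
# Hypothetical excerpt of a GPU node's status; the nvidia.com/gpu entries are
# advertised by the device plugin, other values are placeholders.
status:
  capacity:
    cpu: "16"
    memory: 64Gi
    nvidia.com/gpu: "1"
  allocatable:
    cpu: "16"
    memory: 64Gi
    nvidia.com/gpu: "1"
```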
## Adding Annotations (If Required)
In most clusters, specifying the GPU resource in the `limits` field is enough. Some environments may require additional annotations for GPU access or monitoring. Check with your administrator if unsure.
Example:
```yaml
vertices:
  - name: gpu-vertex
    metadata:
      annotations:
        mycompany.com/gpu-enabled: "true" # Replace with your annotation key and value
    udf:
      container:
        image: my-ml-image:latest
        resources:
          limits:
            nvidia.com/gpu: 1
```
## Specifying GPU Resource Requests and Limits
Request a GPU by setting the `nvidia.com/gpu` resource in the `limits` field under the `container` section:
```yaml
resources:
  limits:
    nvidia.com/gpu: 1
```
**Important:** For GPUs, Kubernetes requires that `requests` and `limits` be the same (or that you specify only `limits`).
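If you do set both fields explicitly, they must be equal. A minimal sketch (the GPU count is illustrative):

```yaml
resources:
  requests:
    nvidia.com/gpu: 1 # Must equal the limit below
  limits:
    nvidia.com/gpu: 1
```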
## Example: Vertex Requesting a GPU (with Annotations and Node Selector)
```yaml
vertices:
  - name: gpu-vertex
    metadata:
      annotations:
        mycompany.com/gpu-enabled: "true" # Example annotation, use only if required by your cluster
    nodeSelector:
      nvidia.com/gpu.present: "true" # Replace with your cluster's GPU node label
    udf:
      container:
        image: my-ml-image:latest
        resources:
          limits:
            nvidia.com/gpu: 1
```
Adjust the `nodeSelector` and annotations as required by your cluster setup.
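The same `resources` settings apply to a MonoVertex. The sketch below is illustrative: the image names are placeholders, and the GPU limit here happens to be set on the user-defined source container.

```yaml
apiVersion: numaflow.numaproj.io/v1alpha1
kind: MonoVertex
metadata:
  name: gpu-mono-vertex
spec:
  source:
    udsource:
      container:
        image: my-ml-source:latest # Placeholder image
        resources:
          limits:
            nvidia.com/gpu: 1
  sink:
    udsink:
      container:
        image: my-ml-sink:latest # Placeholder image
```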
## Dynamic Resource Allocation (Advanced)
For advanced GPU scheduling using Dynamic Resource Allocation (DRA), see the Kubernetes [Dynamic Resource Allocation](https://kubernetes.io/docs/concepts/scheduling-eviction/dynamic-resource-allocation/) documentation.
## Troubleshooting
If your vertex is not using GPU resources as expected:

- **Pod Pending**: Check for available GPU nodes and device plugin status.
- **Pod does not detect GPU**: Ensure your container image includes the necessary GPU drivers and libraries (e.g., CUDA).
- **Still having issues?** Consult your cluster documentation or administrator.