Using Kueue with HAMi
This guide will help you use Kueue to manage HAMi vGPU resources, including enabling Deployment support, configuring ResourceTransformation, and creating workloads that request vGPU resources.
Prerequisites
Before you begin, ensure that:
- HAMi is installed in your cluster
- Kueue is installed in your cluster
Quick Start
Install Kueue
If Kueue is not already installed, you can install it using:
kubectl apply --server-side -f https://github.com/kubernetes-sigs/kueue/releases/download/v0.15.0/manifests.yaml
After installation, Kueue controllers will run in the kueue-system namespace.
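You can confirm the controllers are up before proceeding:
kubectl get pods -n kueue-system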
Enable Deployment Support in Kueue
Kueue supports managing Deployments, but you need to enable the deployment integration in the Kueue configuration.
Configure Kueue
Edit the Kueue manager configuration to enable Deployment support:
kubectl edit configmap kueue-manager-config -n kueue-system
Add the following configuration:
apiVersion: config.kueue.x-k8s.io/v1beta2
kind: Configuration
integrations:
  frameworks:
  - "deployment"
  - "pod"
Starting with Kueue v0.15, enabling the "deployment" integration implicitly enables "pod", so you don't need to list it. On Kueue v0.14 and earlier, you must enable the "pod" integration explicitly.
After updating the configuration, restart the Kueue manager:
kubectl rollout restart deployment kueue-controller-manager -n kueue-system
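You can confirm the restart completed before moving on:
kubectl rollout status deployment kueue-controller-manager -n kueue-system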
Configure ResourceTransformation
ResourceTransformation allows Kueue to transform resource requests from one form to another. For HAMi vGPU resources, this enables Kueue to track GPU resources using aggregated metrics like nvidia.com/total-gpucores and nvidia.com/total-gpumem instead of individual nvidia.com/gpu requests.
Edit the Kueue manager configuration to add ResourceTransformation:
kubectl edit configmap kueue-manager-config -n kueue-system
Add the ResourceTransformation configuration to the same Configuration object:
apiVersion: config.kueue.x-k8s.io/v1beta2
kind: Configuration
integrations:
  frameworks:
  - "deployment"
  - "pod"
resources:
  transformations:
  - input: nvidia.com/gpucores
    strategy: Replace
    multiplyBy: nvidia.com/gpu
    outputs:
      nvidia.com/total-gpucores: "1"
  - input: nvidia.com/gpumem
    strategy: Replace
    multiplyBy: nvidia.com/gpu
    outputs:
      nvidia.com/total-gpumem: "1"
After updating the configuration, restart the Kueue manager:
kubectl rollout restart deployment kueue-controller-manager -n kueue-system
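To confirm the transformation config was picked up, you can dump the live configuration. The data key shown below matches the default Kueue manifests; it may differ in customized installs:
kubectl get configmap kueue-manager-config -n kueue-system -o jsonpath='{.data.controller_manager_config\.yaml}'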
Create ResourceFlavor
First, create a ResourceFlavor:
apiVersion: kueue.x-k8s.io/v1beta2
kind: ResourceFlavor
metadata:
  name: hami-flavor
spec:
  nodeLabels:
    gpu: "on"
This ResourceFlavor applies only to nodes labeled gpu: "on", i.e., nodes with vGPU functionality enabled.
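If your GPU nodes do not carry this label yet, add it manually (replace <node-name> with an actual node; gpu=on is the label HAMi uses by default, but adjust it if your installation uses a different one):
kubectl label node <node-name> gpu=on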
Create ClusterQueue
Create a ClusterQueue to track HAMi resources:
apiVersion: kueue.x-k8s.io/v1beta2
kind: ClusterQueue
metadata:
  name: hami-queue
spec:
  resourceGroups:
  - coveredResources: ["nvidia.com/gpu", "nvidia.com/total-gpucores", "nvidia.com/total-gpumem"]
    flavors:
    - name: hami-flavor
      resources:
      - name: "nvidia.com/gpu"
        nominalQuota: 80
      - name: "nvidia.com/total-gpucores"
        nominalQuota: 200
      - name: "nvidia.com/total-gpumem"
        nominalQuota: 10240
The ClusterQueue tracks:
- nvidia.com/total-gpucores: total GPU cores across all vGPUs (each unit represents 1% of one GPU's compute)
- nvidia.com/total-gpumem: total GPU memory across all vGPUs (in MiB)
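Note that the quotas are expressed in these transformed units, so they bound the aggregate across all admitted workloads. For example, the nominalQuota: 200 for nvidia.com/total-gpucores above would admit four pods that each request nvidia.com/gpu: 1 and nvidia.com/gpucores: 50 (4 × 1 × 50 = 200), but not a fifth.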
Create LocalQueue
Create a LocalQueue that references the ClusterQueue:
apiVersion: kueue.x-k8s.io/v1beta2
kind: LocalQueue
metadata:
  name: hami-local-queue
  namespace: default
spec:
  clusterQueue: hami-queue
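After applying the manifests above, you can check that the LocalQueue is bound to its ClusterQueue:
kubectl get localqueue hami-local-queue -n default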
Create Workloads Requesting vGPU Resources
Now you can create Deployments that request vGPU resources. Kueue will automatically transform the resource requests using ResourceTransformation.
Basic vGPU Deployment
Here's an example Deployment that requests vGPU resources:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vgpu-deployment
  labels:
    app: vgpu-app
    kueue.x-k8s.io/queue-name: hami-local-queue
spec:
  replicas: 2
  selector:
    matchLabels:
      app: vgpu-app
  template:
    metadata:
      labels:
        app: vgpu-app
    spec:
      containers:
      - name: vgpu-container
        image: nvidia/cuda:11.8.0-base-ubuntu22.04
        command: ["sleep", "infinity"]
        resources:
          limits:
            nvidia.com/gpu: 1
            nvidia.com/gpucores: 50
            nvidia.com/gpumem: 1024
In this example:
- The kueue.x-k8s.io/queue-name label associates the Deployment with the hami-local-queue
- nvidia.com/gpu: 1 requests 1 vGPU
- nvidia.com/gpucores: 50 requests 50% of GPU cores for each vGPU
- nvidia.com/gpumem: 1024 requests 1024 MiB of GPU memory for each vGPU
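Once the Deployment is applied, Kueue tracks its pods as Workload objects. You can check admission and the resulting pods with:
kubectl get workloads -n default
kubectl get pods -n default -l app=vgpu-app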
Deployment with Multiple vGPUs
You can also create a Deployment that requests multiple vGPUs:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: multi-vgpu-deployment
  labels:
    app: multi-vgpu-app
    kueue.x-k8s.io/queue-name: hami-local-queue
spec:
  replicas: 1
  selector:
    matchLabels:
      app: multi-vgpu-app
  template:
    metadata:
      labels:
        app: multi-vgpu-app
    spec:
      containers:
      - name: vgpu-container
        image: nvidia/cuda:11.8.0-base-ubuntu22.04
        command: ["sleep", "infinity"]
        resources:
          limits:
            nvidia.com/gpu: 2
            nvidia.com/gpucores: 30
            nvidia.com/gpumem: 1024
This Deployment requests 2 vGPUs, each with 30% GPU cores and 1024 MiB GPU memory.
Verify Resource Usage
To check the resource usage in the ClusterQueue, inspect the status.flavorsReservation field:
kubectl get clusterqueue hami-queue -o yaml
The status.flavorsReservation shows the current resource consumption:
status:
  flavorsReservation:
  - name: hami-flavor
    resources:
    - name: nvidia.com/total-gpucores
      total: "160"  # Current usage: (2 replicas × 1 GPU × 50 cores) + (1 replica × 2 GPUs × 30 cores) = 160
    - name: nvidia.com/total-gpumem
      total: "4096"  # Current usage: (2 replicas × 1 GPU × 1024 MiB) + (1 replica × 2 GPUs × 1024 MiB) = 4096
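To pull just the reservation numbers without the full object, jsonpath works as well:
kubectl get clusterqueue hami-queue -o jsonpath='{.status.flavorsReservation[0].resources}'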
Resource Transformation Details
Kueue's ResourceTransformation automatically converts HAMi vGPU resource requests:
- nvidia.com/gpu × nvidia.com/gpucores → nvidia.com/total-gpucores
- nvidia.com/gpu × nvidia.com/gpumem → nvidia.com/total-gpumem
For example:
- A Deployment with 2 replicas, each requesting nvidia.com/gpu: 1, nvidia.com/gpucores: 50, and nvidia.com/gpumem: 1024
- will consume nvidia.com/total-gpucores: 100 (2 replicas × 1 GPU × 50 cores) and nvidia.com/total-gpumem: 2048 (2 replicas × 1 GPU × 1024 MiB)
Example
See the example file for a complete working example.