AI Workloads
Training • Inference • Batch • Pipelines
HAMi enables sharing, isolation and scheduling for GPU/NPU/MLU resources so mixed accelerators run efficiently on one platform.
Virtualization • Sharing • Isolation • Scheduling
GPU • NPU • MLU • DCU
The project is developed in the open under CNCF governance, with contributions from a growing global community of companies and individuals.
Use one Kubernetes-native workflow to schedule GPU, NPU, MLU and other AI accelerators.
Allocate memory/core slices precisely for training and inference jobs in mixed workloads, with hard isolation enforced at runtime.
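As a sketch of what such a sliced request can look like (resource names follow HAMi's documented NVIDIA example; exact names and units depend on your device-plugin configuration, and the Pod name here is hypothetical):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-slice-demo          # hypothetical name
spec:
  containers:
    - name: worker
      image: ubuntu:22.04
      resources:
        limits:
          nvidia.com/gpu: 1        # number of virtual GPUs
          nvidia.com/gpumem: 3000  # device-memory slice, in MiB
          nvidia.com/gpucores: 30  # approximate share of compute cores, in percent
```

HAMi then enforces these limits inside the container at runtime, which is the hard isolation referred to above.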
Supports binpack, spread, node-topology-aware, and task-topology-aware scheduling policies to optimize resource utilization and placement.
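Per-task policy selection is typically expressed through Pod annotations. A minimal sketch, assuming annotation keys as described in HAMi's scheduler-policy documentation (verify the exact keys and accepted values against your release):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: policy-demo             # hypothetical name
  annotations:
    hami.io/node-scheduler-policy: "binpack"  # consolidate Pods onto busier nodes
    hami.io/gpu-scheduler-policy: "spread"    # spread slices across GPUs within a node
spec:
  containers:
    - name: worker
      image: ubuntu:22.04
      resources:
        limits:
          nvidia.com/gpu: 1
```

Binpack favors utilization by filling devices before opening new ones; spread trades some packing efficiency for better fault and contention isolation.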
Build on standard interfaces to avoid lock-in and simplify long-term platform evolution.
Zero-change adoption path with Kubernetes-compatible APIs and deployment model.
Community-driven governance and hardware ecosystem support for diverse environments.
Control memory/core usage to improve fairness, reliability and utilization.
Provide consistent metrics and operational visibility across device vendors.
HAMi works through two core paths: GPU virtualization/slicing, and heterogeneous scheduling that carries a request from submission to isolated execution.
View full architecture docs →
The same Pod requests (nvidia.com/gpu plus gpumem/gpucores) enter whole-GPU allocation on the left and HAMi slicing on the right, showing how scheduling semantics change placement.
Each Pod is scheduled with whole-GPU semantics, so the unused portion of that card cannot be shared with another Pod.
Pod C claims an entire GPU, so the remaining capacity on that card becomes stranded.
The same Pod requests are sliced first, then placed by policy to pack, spread, or respect topology locality.
Pod C is sliced and packed onto the most loaded compatible GPU.
Broad accelerator ecosystem across vendors. See docs for full support matrix.
View full supported devices list →
The organizations below are evaluating or using HAMi in production environments.
Submit your organization through the contributor guide process.
See submission instructions →