v2.6.0

June 7, 2025

🚀 Major features

Optimize scheduler log
Support Enflame gcu-share
Support Metax GPU and Metax sGPU
Helm chart: add checksum annotation to restart HAMi components after ConfigMap modification
Support for using RuntimeClass with nvidia devices
Add support for profiling via net/http/pprof package
Add nvidia gpu topology score registry to node
vGPUmonitor: support MigInfo metrics

🐛 Major bug fixes

Fix hang with driver 570+
Fix device memory not counted properly in comfyUI task
Fix cambricon devices not allocated properly
Fix wrong log and container request device count error
Fix inconsistent vgpu-devices-allocated annotations
Fix removing node devices from node manager
Fix: Dynamic GPU partitioning lacks single-GPU-level granularity
Fix device memory count error on cuMallocAsync
Fix scheduler crash when a 'mig' task accidentally runs on a 'hami-core' GPU
Fix multi-process device memory count

📝 Changes

⬆️ Dependencies

Bump docker/build-push-action from 6.11.0 to 6.13.0, by @dependabot, PR #837
Bump golang.org/x/net from 0.26.0 to 0.35.0, by @dependabot, PR #859
Bump aquasecurity/trivy-action from 0.29.0 to 0.30.0, by @dependabot, PR #941
Bump docker/login-action from 3.3.0 to 3.4.0, by @dependabot, PR #942
Bump docker/build-push-action from 6.13.0 to 6.15.0, by @dependabot, PR #899
build(deps): bump docker/build-push-action from 6.15.0 to 6.16.0, by @dependabot, PR #1024
build(deps): bump docker/build-push-action from 6.16.0 to 6.17.0, by @dependabot, PR #1052
build(deps): bump docker/build-push-action from 6.17.0 to 6.18.0, by @dependabot, PR #1091

🔨 Other changes

fix: Enhance GPU metrics collection and error handling in vGPU monitor, by @haitwang-cloud, PR #827
refactor: update service configurations for device plugin and scheduler, by @haitwang-cloud, PR #799
add ut for scheduler/score, by @shijinye, PR #853
add ut for device/metax, by @shijinye, PR #850
Remove duplicate log fields, by @learner0810, PR #860
[docs] Fix default nvidia.resourceCoreName value in config.md, by @chinaran, PR #842
Update libvgpu.so, by @archlitchi, PR #876
update example.png, by @rockpanda, PR #874
support ascend 910B2, by @ouyangluwei163, PR #885
fix docs typos, by @JinVei, PR #869
Accelerate node score calculations using multiple goroutines, by @learner0810, PR #824
Support Metax sGPU for GPU sharing, by @Kyrie336, PR #895
docs: fix broken community links, by @agilgur5, PR #907
add config gpu core isolation policy for webhook, by @lengrongfu, PR #901
feat: support scheduler replicas > 1, by @Azusa-Yuan, PR #898
docs: add syntax highlighting to various code blocks, by @agilgur5, PR #906
Fix UT not being properly executed during CI phase, by @archlitchi, PR #911
typo: fix typos in log and comment, by @popsiclexu, PR #917
feat: Add kube-qps and kube-burst parameters, by @chaunceyjiang, PR #769
docs: Update MAINTAINERS file with current contributor information, by @Nimbus318, PR #918
Nominate chaunceyjiang to reviewer, by @chaunceyjiang, PR #926
build: update dependencies and remove unused cdiapi, by @yxxhero, PR #903
add lengrongfu to reviewers, by @lengrongfu, PR #937
chore: add namespace override for multi-namespace deployments, by @chinaran, PR #924
fix: hygon dcu concurrent creation conflict, by @joy717, PR #921
Fix the wrong description of device registry in protocol.md, by @hurricane1988, PR #910
chore: helm chart support scheduler webhook cert-manager, by @chinaran, PR #951
refactor(scheduler): replace init methods with constructor functions, by @yxxhero, PR #905
add Dependencies policy and Security policy, by @yangshiqi, PR #934
scheduler: fix blocking of the nodeNotify channel when node changes, by @Iceber, PR #964
docs: Update Ascend910 support documentation, by @zhaikangqi331, PR #988
update iluvatar's docs, by @yangshiqi, PR #995
refactor: replace interface{} with any in various files, by @yxxhero, PR #1000
scheduler: fix duplicate handling of the node label selector, by @Iceber, PR #965
refactor(.github/workflows/ci.yaml): Update golangci-lint to v2.0 and modify .golangci.yaml, by @yxxhero, PR #1002
update hami arch, by @wawa0210, PR #1007
Update README.md, by @yowenter, PR #1005
refactor: simplify code by using modern constructs, by @Shouren, PR #978
scheduler: fix removing node devices from node manager, by @Iceber, PR #966
feat: Add support for profiling via net/http/pprof package, by @Shouren, PR #963
Support Enflame gcushare for enflame devices, by @archlitchi, PR #1013
docs: Remove ACTIVE_OOM_KILLER environment variable description, by @chinaran, PR #1015
refactor(vGPUmonitor): change Run to RunE and return errors, by @yxxhero, PR #999
refactored the filter logs and event messages to enhance their clarity, by @Wangmin362, PR #1023
feat: Support for using RuntimeClass with nvidia devices, by @chinaran, PR #1021
fix wrong log and container request device count error, by @Wangmin362, PR #1020
feat: helm chart add checksum annotation for restarting hami component after ConfigMap modification, by @chinaran, PR #1022
fix inconsistent vgpu-devices-allocated annotations #991, by @ouyangluwei163, PR #1012
add Enflame GCU S60 into roadmap, by @winston-zhang-orz, PR #1030
add nvidia-smi command show cuda version info, by @lengrongfu, PR #953
Separate options from client to make the responsibility more clear, by @yangshiqi, PR #938
Add nvidia gpu topology score registry to node, by @lengrongfu, PR #1018
fix(cicd): update ci.yaml to upload coverage to Codecov, by @Shouren, PR #1056
feat(Actions): Add an action to label pr automatically, by @Shouren, PR #1053
fix: Improve Metax GPU usability and fix related issues, by @Kyrie336, PR #1063
fix(chart): support GKE pre-release versions via kubeVersion '-0', by @Nimbus318, PR #1072
fix: Dynamic GPU partitioning lacks single-GPU-level granularity. (#1…, by @Goend, PR #1061
update maintainer information, by @wawa0210, PR #1079
add LIBCUDA_LOG_LEVEL env to device-plugin, by @lengrongfu, PR #1087
fix: missing apiVersion in serviceMonitor dashboard docs, by @ntheanh201, PR #1077
test(pkg/util): Add some unit tests for pkg/util, by @Shouren, PR #1067
feat: vGPUmonitor support MigInfo metrics, by @ouyangluwei163, PR #1048
update hami-core version, by @lengrongfu, PR #1082

Contributors: 🆕 New contributors

rockpanda (@rockpanda)
ouyangluwei163 (@ouyangluwei163)
JinVei (@JinVei)
Shouren (@Shouren)
Kyrie336 (@Kyrie336)
agilgur5 (@agilgur5)
Azusa-Yuan (@Azusa-Yuan)
popsiclexu (@popsiclexu)
hurricane1988 (@hurricane1988)
Iceber (@Iceber)
zhaikangqi331 (@zhaikangqi331)
yowenter (@yowenter)
Wangmin362 (@Wangmin362)
winston-zhang-orz (@winston-zhang-orz)
Goend (@Goend)
ntheanh201 (@ntheanh201)

Full changelog: https://github.com/Project-HAMi/HAMi/compare/v2.5.3...v2.6.0
