v2.7.0
🚀 Major Features
- Metax sGPU topology aware, by @Kyrie336, PR #1193
- NVIDIA Resourcequota, by @FouoF, PR #1359
- Kunlunxin topology-aware scheduling, by @FouoF, PR #1141
- Kunlunxin vxpu support #1016, by @ouyangluwei163 and @archlitchi, PR #1337
- Enflame GCU topology-awareness (#1040), by @zhaikangqi331, PR #1334
- AWS-neuron device and device-core allocation, by @archlitchi, PR #1238
- Aggregated Scheduling Failure Events, by @Wangmin362, PR #1333
🐛 Major Bug Fixes
- fix: Before executing MIG partitioning, suppress NVML usage in o…, by @Goend, PR #1095
- Fix golint-CI, by @archlitchi, PR #1127
- fix: override node score failure for kunlun #1137, by @ouyangluwei163, PR #1138
- fix: Multi-node scoring nodes are inaccurate, by @ouyangluwei163, PR #1147
- fix: An error occurred while creating Iluvatar pod, by @ouyangluwei163, PR #1149
- Fix e2e CI, by @archlitchi, PR #1165
- fix: Add option for overwrite schedulerName, by @Shouren, PR #1163
- fix: using go-safecast to fix incorrect conversion of numbers, by @Shouren, PR #1183
- fix: deal with security issues reported by Trivy in image, by @Shouren, PR #1189
- fix: wrong Pod's UID and empty Pod's name in log of webhook.go, by @Shouren, PR #1092
- fix: concurrent map writes error in scheduler.calcScore #1269, by @Shouren, PR #1270
- fix: release dangling node lock, by @peachest, PR #1271
- fix: fix err which retrieved incorrect NUMA node information issue #1275, by @abstractmj, PR #1276
- fix(security): resolve issues reported by Code scanning in Security, by @Shouren, PR #1280
- fix: fix golangci-lint error, by @DSFans2014, PR #1319
- Fix: device allocation missing containers with no device request, by @FouoF, PR #1299
- fix: update int8Slice to uint8Slice for better type clarity and consistency, by @yxxhero, PR #1357
📝 What's Changed
📚 Documentation
- documentation: add Known Issues for dynamic mig support, by @Goend, PR #1122
- docs: fix broken link, by @lixd, PR #1125
- clearly list supported devices doc references at README, by @FouoF, PR #1155
- docs: update ascend910b-support docs, by @DSFans2014, PR #1321
🔨 Other Changes
- Optimize Fit-in-device logic to make it device-specific, by @archlitchi, PR #1097
- feat(scheduler): make node lock timeout configurable, by @Kevinz857, PR #1117
- feature: mig mode-change #1116, by @ouyangluwei163, PR #1124
- feat: Add new labels in .github/release.yml, by @Shouren, PR #1066
- feat(scheduler-role): use a scoped-down role for scheduler, by @Antvirf, PR #1152
- feat(helm): optionally disable admission webhook, by @Antvirf, PR #1145
- remove redundant metrics for vgpu allocation, by @FouoF, PR #1169
- refactor: clean up code and improve maintainability, by @Wangmin362, PR #1195
- refactor: Ranging over SplitSeq is more efficient, by @Shouren, PR #1239
- feat: NodeLockTimeout set from env, by @miaobyte, PR #1244
- refactor: move watchAndFeedback function to feedback.go, by @miaobyte, PR #1248
- feat: add informer-based pod cache to reduce API server load, by @miaobyte, PR #1250
- feat: Add option to disable device plugin at values.yaml., by @FouoF, PR #1274
- refactor(util/nodelock): replace manual polling with k8s.io/client-go/util/retry, by @mayooot, PR #1252
- refactor: Remove annotation in Devices interfaces, by @Shouren, PR #1343
- feat: update the Ascend910 scheduling policy, by @DSFans2014, PR #1344
- feat(nvidia): default gpucores=100 when memory is exclusive and cores…, by @xrwang8, PR #1354
- Prerelease-v2.6, by @archlitchi, PR #1108
- add new reviewers Shouren and ouyangluwei163, by @wawa0210, PR #1131
- Support topology-awareness for Kunlunxin device, by @archlitchi, PR #1121
- Support Metax sGPU Qos Policy, by @Kyrie336, PR #1123
- add global image for chart, by @calvin0327, PR #1133
- fix: Skip admission webhook when Pod's scheduler is already assigned., by @ghostloda, PR #1041
- Add node configs to docs, by @wylswz, PR #1159
- build(deps): upgrade golang to 1.24.4, by @Shouren, PR #1172
- build(deps): Upgrade golang image in ci to 1.24.4, by @Shouren, PR #1176
- build(deps): Upgrade controller-runtime to 0.21.0, by @Shouren, PR #1171
- build(deps): Bump github.com/NVIDIA/nvidia-container-toolkit, by @Shouren, PR #1170
- Add unit tests for Fit Function for enflame, hygon, metax, mthreads, nvidia, by @Wangmin362, PR #1199
- [Misc] update hami-core version, by @chaunceyjiang, PR #1201
- Improve the impl of DevicePluginConfigs.Nodeconfig overwriting NvidiaConfig, by @FouoF, PR #1158
- Add unit tests for cambricon's Fit Function, by @Wangmin362, PR #1198
- Add unit tests for Ascend's Fit Function, by @Wangmin362, PR #1197
- Fix unnecessary repeated calculation when generating pod resource requests, by @litaixun, PR #1215
- Fix the log message when updating node annotations, by @litaixun, PR #1214
- If the mem applied for the MIG device is the same as the template value, it will result in CardNotFoundCustomFilterRule., by @zgqqiang, PR #1179
- updated dri section to combine text for better readability, by @mpetason, PR #1216
- feat: Add nvidia gpu topology scheduler, by @fyp711, PR #1028
- add issue translate robot, by @wawa0210, PR #1232
- add issue translate robot, by @wawa0210, PR #1234
- perf(util/nodelock): Use clientset Patch instead of Update., by @mayooot, PR #1192
- Update hami-core and fix readme documents, by @archlitchi, PR #1240
- Update hami-core version to fix, by @archlitchi, PR #1256
- [Snyk] Security upgrade tensorflow/tensorflow from latest-gpu to 2.20.0rc0-gpu, by @wawa0210, PR #1243
- feat: Add an action of 'Close stale issue and PRs' in github workflow, by @Shouren, PR #1083
- Welcome fyp711 to become a HAMi member, by @wawa0210, PR #1288
- Add values readme, by @clcc2019, PR #1267
- Support Metax sGPU device health check, by @Kyrie336, PR #1295
- Optimize pkg/util.go and distribute logics to corresponding logics, by @archlitchi, PR #1296
- cleanup: Clear and correct ascend device name, by @FouoF, PR #1315
- bugfix: Nvidia card abnormal pod will still continue to schedule, by @zgqqiang, PR #1336
- Fix CI, add 910B4-1 template and fix vGPUmonitor metrics error, by @archlitchi, PR #1345
- add httpTargetPort to values.yaml, by @flpanbin, PR #1356
- Update kunlunxin documents, by @archlitchi, PR #1366
- update chart version and hami-core, by @archlitchi, PR #1369
Contributors: 🆕 New Contributors
- Kevinz857 (@Kevinz857)
- FouoF (@FouoF)
- Antvirf (@Antvirf)
- wylswz (@wylswz)
- litaixun (@litaixun)
- zgqqiang (@zgqqiang)
- mpetason (@mpetason)
- fyp711 (@fyp711)
- mayooot (@mayooot)
- miaobyte (@miaobyte)
- peachest (@peachest)
- abstractmj (@abstractmj)
- clcc2019 (@clcc2019)
- DSFans2014 (@DSFans2014)
- xrwang8 (@xrwang8)
Full Changelog: https://github.com/Project-HAMi/HAMi/compare/v2.6.1...v2.7.0









