GGUF Quantization Methods

Support status and running speed of the various quantization methods in the GGUF format, by backend:

Feature CPU (AVX2) CPU (ARM NEON) Metal cuBLAS rocBLAS SYCL CLBlast Vulkan Kompute
K-quants ✅ slow ✅ slow
I-quants ✅ slow ✅ slow ✅ slow Partial¹
Multi-GPU N/A N/A N/A
K cache quants ✅ slow Partial⁶ slow
MoE architecture Partial²

Note:

  • ✅: Supported
  • ❓: Unknown (not tested)
  • N/A: Not applicable
  • Partial¹, Partial², Partial⁶: Partially supported
  • 🐢³, 🐢⁴, 🐢⁵: Supported but slow (written as "slow" in the table)
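The table only lists which backends support each method; it does not show what a quantization method actually does to the weights. As a minimal illustration, assuming the simple Q8_0-style layout used by ggml (blocks of 32 weights, each block stored as one scale plus 32 signed 8-bit integers), the sketch below quantizes and dequantizes a list of floats. K-quants and I-quants use more elaborate schemes that this sketch does not reproduce, and the function names are hypothetical, not part of any library API.

```python
from typing import List, Tuple

BLOCK_SIZE = 32  # assumption: Q8_0-style block length

Block = Tuple[float, List[int]]  # (scale, int8 values)


def quantize_q8_0_like(weights: List[float]) -> List[Block]:
    """Hypothetical helper: quantize weights block by block to int8 plus one scale."""
    blocks: List[Block] = []
    for start in range(0, len(weights), BLOCK_SIZE):
        block = weights[start:start + BLOCK_SIZE]
        amax = max(abs(w) for w in block)
        # Map the largest magnitude in the block to 127.
        scale = amax / 127.0
        inv = 1.0 / scale if scale else 0.0  # all-zero block stays zero
        # Note: Python's round() rounds half to even, unlike C's roundf().
        q = [int(round(w * inv)) for w in block]
        blocks.append((scale, q))
    return blocks


def dequantize(blocks: List[Block]) -> List[float]:
    """Hypothetical helper: reconstruct approximate floats as scale * q."""
    out: List[float] = []
    for scale, q in blocks:
        out.extend(scale * v for v in q)
    return out


weights = [0.1 * i - 1.6 for i in range(64)]
restored = dequantize(quantize_q8_0_like(weights))
```

Each weight is reconstructed to within half a quantization step (scale / 2) of its block, which is why a per-block scale keeps the error small even when different parts of a tensor have very different magnitudes.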
posted @ 2024-08-20 09:19  立体风