Speed test of Ling-mini-2.0-GGUF on the MI50

Download the model: https://huggingface.co/inclusionAI/Ling-mini-2.0-GGUF

Download the dedicated llama.cpp build: https://github.com/im0qianqian/llama.cpp/releases

load_backend: loaded RPC backend from /root/build/bin/libggml-rpc.so
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon Graphics (RADV VEGA20) (radv) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: none
load_backend: loaded Vulkan backend from /root/build/bin/libggml-vulkan.so
load_backend: loaded CPU backend from /root/build/bin/libggml-cpu-haswell.so
| model                          |       size |     params | backend    | ngl | threads |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------: | --------------: | -------------------: |
| bailingmoe2 16B.A1B Q4_K - Medium |   9.22 GiB |    16.26 B | RPC,Vulkan |  99 |    1024 |           pp512 |       956.02 ± 10.17 |
| bailingmoe2 16B.A1B Q4_K - Medium |   9.22 GiB |    16.26 B | RPC,Vulkan |  99 |    1024 |           tg128 |        187.39 ± 0.84 |

build: 58fb8dfc (6570)

Test with a longer generation output:

load_backend: loaded RPC backend from /root/build/bin/libggml-rpc.so
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon Graphics (RADV VEGA20) (radv) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: none
load_backend: loaded Vulkan backend from /root/build/bin/libggml-vulkan.so
load_backend: loaded CPU backend from /root/build/bin/libggml-cpu-haswell.so
| model                          |       size |     params | backend    | ngl |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| bailingmoe2 16B.A1B Q4_K - Medium |   9.22 GiB |    16.26 B | RPC,Vulkan |  99 |           pp512 |       956.25 ± 10.76 |
| bailingmoe2 16B.A1B Q4_K - Medium |   9.22 GiB |    16.26 B | RPC,Vulkan |  99 |          tg1024 |        181.47 ± 1.52 |
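
To compare the two runs numerically, the result rows can be parsed with a few lines of Python. This is a minimal sketch and not part of the original test: `parse_rows` is a hypothetical helper, and the rows are copied by hand from the llama-bench output above. It reads the last two columns of each row, because the first table has an extra `threads` column.

```python
# Rows copied from the two llama-bench tables above.
ROWS = """
| bailingmoe2 16B.A1B Q4_K - Medium |   9.22 GiB |    16.26 B | RPC,Vulkan |  99 |    1024 |           pp512 |       956.02 ± 10.17 |
| bailingmoe2 16B.A1B Q4_K - Medium |   9.22 GiB |    16.26 B | RPC,Vulkan |  99 |    1024 |           tg128 |        187.39 ± 0.84 |
| bailingmoe2 16B.A1B Q4_K - Medium |   9.22 GiB |    16.26 B | RPC,Vulkan |  99 |           pp512 |       956.25 ± 10.76 |
| bailingmoe2 16B.A1B Q4_K - Medium |   9.22 GiB |    16.26 B | RPC,Vulkan |  99 |          tg1024 |        181.47 ± 1.52 |
"""


def parse_rows(text):
    """Return (test, mean_tps, stddev_tps) for each table row (hypothetical helper)."""
    results = []
    for line in text.strip().splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        test = cells[-2]  # e.g. "pp512" or "tg1024"
        mean, std = (float(x) for x in cells[-1].split("±"))
        results.append((test, mean, std))
    return results


results = parse_rows(ROWS)
speeds = {test: mean for test, mean, _ in results if test.startswith("tg")}

# Relative drop in generation speed when going from 128 to 1024 output tokens.
slowdown = (speeds["tg128"] - speeds["tg1024"]) / speeds["tg128"]
print(f"tg128 -> tg1024 slowdown: {slowdown:.1%}")
```

With the numbers above, generation slows by only about 3% when the output grows from 128 to 1024 tokens.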

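A generation speed of about 180 tokens/s is very fast. Ling-mini-2.0 is compact yet powerful: it has 16 billion total parameters, but only 1.4 billion are activated per input token (789 million excluding embeddings).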
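
The figures above translate into per-token latency and an activated-parameter ratio with some simple arithmetic. The sketch below is illustrative only: the variable names are made up here, and it uses the 16.26 B parameter count reported by llama-bench together with the 1.4 billion activated parameters quoted above.

```python
# Measured generation speed for the tg1024 test (tokens per second).
TG1024_TPS = 181.47

# Parameter counts in billions: total from the llama-bench table,
# activated per token as quoted in the text.
TOTAL_PARAMS_B = 16.26
ACTIVE_PARAMS_B = 1.4

per_token_ms = 1000.0 / TG1024_TPS                  # average latency per generated token
total_seconds = 1024 / TG1024_TPS                   # time to generate 1024 tokens
active_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B  # share of weights used per token

print(f"latency per token: {per_token_ms:.2f} ms")
print(f"time for 1024 tokens: {total_seconds:.2f} s")
print(f"activated parameters: {active_fraction:.1%} of the total")
```

Less than a tenth of the weights are used for each token, which fits with the high generation speed seen on this card.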

Original article. When reposting, please credit the source: 诺德美地科技

