Download the model: https://huggingface.co/inclusionAI/Ling-mini-2.0-GGUF
Download the dedicated llama.cpp build: https://github.com/im0qianqian/llama.cpp/releases
ggml_metal_library_init: using embedded metal library
ggml_metal_library_init: loaded in 4.547 sec
ggml_metal_device_init: GPU name: Apple M2 Max
ggml_metal_device_init: GPU family: MTLGPUFamilyApple8 (1008)
ggml_metal_device_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_device_init: GPU family: MTLGPUFamilyMetal4 (5002)
ggml_metal_device_init: simdgroup reduction = true
ggml_metal_device_init: simdgroup matrix mul. = true
ggml_metal_device_init: has unified memory = true
ggml_metal_device_init: has bfloat = true
ggml_metal_device_init: use residency sets = true
ggml_metal_device_init: use shared buffers = true
ggml_metal_device_init: recommendedMaxWorkingSetSize = 26800.60 MB
| model | size | params | backend | threads | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | --------------: | -------------------: |
| bailingmoe2 16B.A1B Q4_K - Medium | 9.22 GiB | 16.26 B | Metal,BLAS,RPC | 8 | pp512 | 2094.89 ± 40.78 |
| bailingmoe2 16B.A1B Q4_K - Medium | 9.22 GiB | 16.26 B | Metal,BLAS,RPC | 8 | tg128 | 146.07 ± 0.58 |
build: 58fb8dfc (6570)
Test with a longer generation length (tg1024 instead of tg128):
ggml_metal_library_init: using embedded metal library
ggml_metal_library_init: loaded in 0.006 sec
ggml_metal_device_init: GPU name: Apple M2 Max
ggml_metal_device_init: GPU family: MTLGPUFamilyApple8 (1008)
ggml_metal_device_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_device_init: GPU family: MTLGPUFamilyMetal4 (5002)
ggml_metal_device_init: simdgroup reduction = true
ggml_metal_device_init: simdgroup matrix mul. = true
ggml_metal_device_init: has unified memory = true
ggml_metal_device_init: has bfloat = true
ggml_metal_device_init: use residency sets = true
ggml_metal_device_init: use shared buffers = true
ggml_metal_device_init: recommendedMaxWorkingSetSize = 26800.60 MB
| model | size | params | backend | threads | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | --------------: | -------------------: |
| bailingmoe2 16B.A1B Q4_K - Medium | 9.22 GiB | 16.26 B | Metal,BLAS,RPC | 8 | pp512 | 2134.48 ± 5.25 |
| bailingmoe2 16B.A1B Q4_K - Medium | 9.22 GiB | 16.26 B | Metal,BLAS,RPC | 8 | tg1024 | 130.45 ± 0.59 |
build: 58fb8dfc (6570)
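To make the difference between the two runs concrete, the following is a minimal sketch that parses the token-generation rows copied from the two tables above and computes the relative change in tokens per second. The helper names `parse_row` and `relative_change` are illustrative only and are not part of llama.cpp or llama-bench.

```python
# Token-generation rows copied verbatim from the two llama-bench tables above.
ROWS = """\
| bailingmoe2 16B.A1B Q4_K - Medium | 9.22 GiB | 16.26 B | Metal,BLAS,RPC | 8 | tg128 | 146.07 ± 0.58 |
| bailingmoe2 16B.A1B Q4_K - Medium | 9.22 GiB | 16.26 B | Metal,BLAS,RPC | 8 | tg1024 | 130.45 ± 0.59 |
"""


def parse_row(line):
    """Return (test name, mean t/s, std dev) from one markdown table row."""
    cells = [c.strip() for c in line.strip().strip("|").split("|")]
    test = cells[-2]                      # e.g. "tg128"
    mean, std = (float(x) for x in cells[-1].split("±"))
    return test, mean, std


def relative_change(base, new):
    """Percentage change from base to new (negative means slower)."""
    return (new - base) / base * 100.0


results = {}
for line in ROWS.splitlines():
    test, mean, std = parse_row(line)
    results[test] = (mean, std)

drop = relative_change(results["tg128"][0], results["tg1024"][0])
print(f"tg128 -> tg1024: {drop:.1f}% change in tokens/s")

# Rough wall-clock time to emit 1024 tokens at the measured tg1024 rate.
seconds_for_1024 = 1024 / results["tg1024"][0]
print(f"about {seconds_for_1024:.1f} s to generate 1024 tokens")
```

On these figures, generation throughput is roughly 11% lower at 1024 output tokens than at 128, while prompt processing (pp512) stays at about 2100 t/s in both runs.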
Original article. When reposting, please credit the source: 诺德美地科技
Permalink: Speed test of Ling-mini-2.0-GGUF on M2 Max