dahara1/Qwen2.5-0.5B-Instruct-gguf-japanese-imatrix-128K

dahara1 committed on Nov 28, 2024

Commit a86022d · verified · 1 Parent(s): 216ced7

Update README.md

Files changed (1)

README.md +13 -1

@@ -40,7 +40,6 @@ CUDA_VISIBLE_DEVICES=0 ./llama.cpp/llama.cpp/build/bin/llama-server \
 私のテストプロンプトの実行時間: 3285.17秒
 My test prompt execution time: 3285.17 seconds
 ### Qwen2.5-0.5B-Instruct-Q4_K_Lを使いGPUメモリも更に最適化した版 A version using Qwen2.5-0.5B-Instruct-Q4_K_L with further optimization of GPU memory
 ```
 CUDA_VISIBLE_DEVICES=0 ./llama.cpp/llama.cpp/build/bin/llama-server \
@@ -53,6 +52,19 @@ CUDA_VISIBLE_DEVICES=0 ./llama.cpp/llama.cpp/build/bin/llama-server \
 私のテストプロンプトの実行時間: 2173.36秒
 My test prompt execution time: 2173.36 seconds
+### CUDA指定なし CUDA device not specified
+```
+./llama.cpp/llama.cpp/build/bin/llama-server \
+    -m  ./llama.cpp/qwen/32B/Qwen2.5-32B-Instruct-Q8_0-f16.gguf \
+    -e --temp 0 -fa -c 4096
+```
+私のテストプロンプトの実行時間: 3787.47秒
+My test prompt execution time: 3787.47 seconds
 なお、温度0でも単独でモデルを実行した際と微妙な差異が出るケースを確認してますので再現性が最重要な場合は注意してください
 I have observed cases where, even at temperature 0, the output differs slightly from running the model alone, so take care if reproducibility is paramount.
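
The timing and reproducibility notes above can be checked with a small client script. The following is a minimal sketch, not the author's benchmark. It assumes `llama-server` is listening on its default address `http://127.0.0.1:8080` and that its `/completion` endpoint accepts `prompt`, `n_predict`, and `temperature` and returns the generated text in a `content` field. The helper names (`build_payload`, `post_completion`, `timed_runs`, `all_identical`) are hypothetical and introduced only for illustration.

```python
import json
import time
import urllib.request

# Assumption: llama-server is running locally on its default port.
SERVER_URL = "http://127.0.0.1:8080/completion"


def build_payload(prompt, n_predict=128):
    """Build a /completion request body with temperature 0 (greedy sampling)."""
    return {"prompt": prompt, "n_predict": n_predict, "temperature": 0}


def post_completion(payload, url=SERVER_URL):
    """Send the payload to the server and return the generated text."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))["content"]


def timed_runs(prompt, runs=2, send=post_completion):
    """Run the same prompt several times; return (outputs, seconds per run)."""
    outputs, seconds = [], []
    for _ in range(runs):
        start = time.perf_counter()
        outputs.append(send(build_payload(prompt)))
        seconds.append(time.perf_counter() - start)
    return outputs, seconds


def all_identical(outputs):
    """Return True when every run produced exactly the same text."""
    return len(set(outputs)) <= 1
```

With a server running, calling `timed_runs("your test prompt")` and passing the returned outputs to `all_identical` gives a quick check of whether temperature 0 produced the same text on every run. To compare two server configurations, pass a different `send` function for each, for example one that wraps `post_completion` with another `url`.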