dahara1/Qwen2.5-0.5B-Instruct-gguf-japanese-imatrix-128K


dahara1 committed on Nov 28, 2024

Commit 0547ef2 · verified · 1 Parent(s): 2cb7222

Update README.md

Files changed (1)

README.md +8 -4


@@ -20,7 +20,7 @@ CUDA_VISIBLE_DEVICES=0 ./llama.cpp/llama.cpp/build/bin/llama-server \
     -ngl 10 -ngld 10 -e --temp 0 -fa -c 4096 \
     --draft-max 16 --draft-min 5
 ```
-私のテストプロンプトの実行時間: 2520.65秒
+私のテストプロンプトの実行時間: 2520.65秒
 My test prompt execution time: 2520.65 seconds
@@ -31,10 +31,14 @@ CUDA_VISIBLE_DEVICES=0 ./llama.cpp/llama.cpp/build/bin/llama-server \
     -m  ./llama.cpp/qwen/32B/Qwen2.5-32B-Instruct-Q8_0-f16.gguf \
     -ngl 10 -e --temp 0 -fa -c 4096
 ```
-私のテストプロンプトの実行時間: 3240.36秒
+私のテストプロンプトの実行時間: 3240.36秒
 My test prompt execution time: 3240.36 seconds
-詳細は[llama.cppの公式ページ](https://github.com/ggerganov/llama.cpp/pull/10455)をご覧ください
-For more information, see the official [llama.cpp page](https://github.com/ggerganov/llama.cpp/pull/10455).
+クライアントスクリプトの例は[dahara1/Qwen2.5-3B-Instruct-gguf-japanese-imatrix-128K](https://huggingface.co/dahara1/Qwen2.5-3B-Instruct-gguf-japanese-imatrix-128K)をご覧ください
+See [dahara1/Qwen2.5-3B-Instruct-gguf-japanese-imatrix-128K](https://huggingface.co/dahara1/Qwen2.5-3B-Instruct-gguf-japanese-imatrix-128K) for a client example.
+コマンドの詳細は[llama.cppの公式ページ](https://github.com/ggerganov/llama.cpp/pull/10455)をご覧ください
+For more information about the command options, see the official [llama.cpp page](https://github.com/ggerganov/llama.cpp/pull/10455).
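
The two execution times quoted in the README can be compared directly. The sketch below is only arithmetic on those two reported numbers. The variable names are illustrative, and it assumes both runs used the same test prompt, which the README implies but does not state.

```python
# Timings copied from the README lines in this diff (seconds).
with_draft_s = 2520.65     # command that passes --draft-max / --draft-min
without_draft_s = 3240.36  # command without the draft options

# Relative speedup and share of wall-clock time saved.
speedup = without_draft_s / with_draft_s
time_saved_pct = (without_draft_s - with_draft_s) / without_draft_s * 100

print(f"speedup: {speedup:.2f}x")            # speedup: 1.29x
print(f"time saved: {time_saved_pct:.1f}%")  # time saved: 22.2%
```

This is a single measurement per configuration, so the ratio is best read as a rough indication rather than a benchmark result.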