namespace-Pt/Llama-3-8B-Instruct-80K-QLoRA-Merged-GGUF

namespace-Pt committed on May 1

Commit 12d942d • 1 Parent(s): ba248bd

Update README.md

Files changed (1)

README.md +1 -3

README.md CHANGED

@@ -13,9 +13,7 @@ We extend the context length of Llama-3-8B-Instruct to 80K using QLoRA and 3.5K
 **NOTE**: This repo contains the quantized model of [namespace-Pt/Llama-3-8B-Instruct-80K-QLoRA-Merged](https://huggingface.co/namespace-Pt/Llama-3-8B-Instruct-80K-QLoRA-Merged). The quantization is conducted with [llama.cpp](https://github.com/ggerganov/llama.cpp) (Q4_K_M and Q8_0).
-All the following evaluation results are based on the [UNQUANTIZED MODEL](https://huggingface.co/namespace-Pt/Llama-3-8B-Instruct-80K-QLoRA-Merged). They can be reproduced following instructions [here](https://github.com/FlagOpen/FlagEmbedding/tree/master/Long_LLM/longllm_qlora).
-**NOTE**: After quantization, you may observe quality degradation.
+All the following evaluation results are based on the [UNQUANTIZED MODEL](https://huggingface.co/namespace-Pt/Llama-3-8B-Instruct-80K-QLoRA-Merged). They can be reproduced following instructions [here](https://github.com/FlagOpen/FlagEmbedding/tree/master/Long_LLM/longllm_qlora). However, after quantization, you may observe **quality degradation**.
 ## Needle in a Haystack
 We evaluate the model on the Needle-In-A-HayStack task using the official setting. The blue vertical line indicates the training context length, i.e. 80K.