3Simplex
/

Meta-Llama-3.1-8B-Instruct-gguf

Text Generation

text-generation-inference

Inference Endpoints

Model card Files Files and versions Community

Meta-Llama-3.1-8B-Instruct-gguf / README.md

3Simplex's picture

Update README.md

1e9155a verified 4 months ago

|

history blame contribute delete

1.14 kB

	---
	license: llama3.1
	base_model: meta-llama/Meta-Llama-3.1-8B-Instruct
	pipeline_tag: text-generation
	tags:
	- text-generation-inference
	---
	> [!WARNING]
	> At the time of this release, llama.cpp did not support the rope scaling required for full context (limit is 8192). Soon this will be updated for full 128K functionality.
	> Depriciated models still listed do not have 128k mark.

	> [!NOTE]
	> The new release of llama.cpp and transformers have been applied and the gguf was tested.
	> [Meta-Llama-3.1-8B-Instruct-128k](https://huggingface.co./3Simplex/Meta-Llama-3.1-8B-Instruct-gguf/blob/main/Meta-Llama-3.1-8B-Instruct-128k-Q4_0.gguf)
	> You will need to update llama.cpp and transformers to use the full context.

	![image/png](https://cdn-uploads.huggingface.co/production/uploads/645e666bb5c9a8666d0d99c5/9T9q6k90ZGa5EJKeSMbru.png)


	## Prompt Template

	```
	<\|start_header_id\|>system<\|end_header_id\|>

	{system_prompt}<\|eot_id\|>
	```

	```
	<\|start_header_id\|>user<\|end_header_id\|>

	{user_input}<\|eot_id\|><\|start_header_id\|>assistant<\|end_header_id\|>

	{assistant_response}
	```

	## 128k Context Length
	"llama.context_length": 131072