morgendigital
/

h2ogpt-4096-llama2-13b-chat-GGUF

Model card Files Files and versions Community

h2ogpt-4096-llama2-13b-chat-GGUF / README.md

freefallr's picture

Update README.md

5abf327 about 1 year ago

|

history blame contribute delete

1.84 kB

	---
	license: llama2
	---
	# What is it?
	This is a quantized version of h2oai/h2ogpt-4096-llama2-13b-chat, formatted in GGUF format to be run with llama.cpp and similar inference tools. The convert.py script from llama.cpp was used for the conversion.

	## Available Formats
	\| Format \| Bits \| Use case \|
	\| ---- \| ---- \| ----- \|
	\| q8_0 \| 8 \| Original quant method, 8-bit. \|

	# Original Model Card

	h2oGPT clone of [Meta's Llama 2 13B Chat](https://huggingface.co./meta-llama/Llama-2-13b-chat-hf).

	Try it live on our [h2oGPT demo](https://gpt.h2o.ai) with side-by-side LLM comparisons and private document chat!

	See how it compares to other models on our [LLM Leaderboard](https://evalgpt.ai/)!

	See more at [H2O.ai](https://h2o.ai/)


	## Model Architecture

	```
	LlamaForCausalLM(
	(model): LlamaModel(
	(embed_tokens): Embedding(32000, 5120, padding_idx=0)
	(layers): ModuleList(
	(0-39): 40 x LlamaDecoderLayer(
	(self_attn): LlamaAttention(
	(q_proj): Linear(in_features=5120, out_features=5120, bias=False)
	(k_proj): Linear(in_features=5120, out_features=5120, bias=False)
	(v_proj): Linear(in_features=5120, out_features=5120, bias=False)
	(o_proj): Linear(in_features=5120, out_features=5120, bias=False)
	(rotary_emb): LlamaRotaryEmbedding()
	)
	(mlp): LlamaMLP(
	(gate_proj): Linear(in_features=5120, out_features=13824, bias=False)
	(up_proj): Linear(in_features=5120, out_features=13824, bias=False)
	(down_proj): Linear(in_features=13824, out_features=5120, bias=False)
	(act_fn): SiLUActivation()
	)
	(input_layernorm): LlamaRMSNorm()
	(post_attention_layernorm): LlamaRMSNorm()
	)
	)
	(norm): LlamaRMSNorm()
	)
	(lm_head): Linear(in_features=5120, out_features=32000, bias=False)
	)
	```