smpanaro/pythia-6.9b-AutoGPTQ-4bit-128g

	---
	license: mit
	datasets:
	- wikitext
	---

	[pythia-6.9b](https://huggingface.co./EleutherAI/pythia-6.9b) quantized to 4-bit using [AutoGPTQ](https://github.com/AutoGPTQ/AutoGPTQ).

	To use, first install AutoGPTQ:

	```shell
	pip install auto-gptq
	```

	Then load the model from the hub:
	```python
	from auto_gptq import AutoGPTQForCausalLM

	# Hub repository that holds the 4-bit quantized weights.
	model_name = "smpanaro/pythia-6.9b-AutoGPTQ-4bit-128g"
	# Download (on first use) and load the quantized model.
	model = AutoGPTQForCausalLM.from_quantized(model_name)
	```
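Once the model loads, generating text could look roughly like the sketch below. It is not part of the original instructions and rests on a few assumptions: that the tokenizer can be taken from the base `EleutherAI/pythia-6.9b` repository, that a CUDA GPU is available as `cuda:0`, and that the helper names `load_model_and_tokenizer` and `complete` are purely illustrative.

```python
def load_model_and_tokenizer(
    model_name: str = "smpanaro/pythia-6.9b-AutoGPTQ-4bit-128g",
    tokenizer_name: str = "EleutherAI/pythia-6.9b",  # assumption: base model's tokenizer
    device: str = "cuda:0",  # assumption: a CUDA GPU is available
):
    """Load the quantized model and a matching tokenizer (hypothetical helper)."""
    # Imported lazily so the heavy dependencies are only needed when loading.
    from auto_gptq import AutoGPTQForCausalLM
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(tokenizer_name)
    model = AutoGPTQForCausalLM.from_quantized(model_name, device=device)
    return model, tokenizer


def complete(model, tokenizer, prompt: str, max_new_tokens: int = 32) -> str:
    """Generate a continuation of `prompt` and return it as text (hypothetical helper)."""
    # Tokenize the prompt and move the tensors to the model's device.
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # The first returned sequence contains the prompt followed by the new tokens.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

A call would then be `model, tokenizer = load_model_and_tokenizer()` followed by `complete(model, tokenizer, "Some prompt")`. If the tokenizer files are also present in the quantized repository, passing that name as `tokenizer_name` should work equally well.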


	|Model|4-Bit Perplexity|16-Bit Perplexity|Delta (4-bit minus 16-bit)|
	|--|--|--|--|
	|[smpanaro/pythia-70m-AutoGPTQ-4bit-128g](https://huggingface.co./smpanaro/pythia-70m-AutoGPTQ-4bit-128g)|49.125|-|-|
	|[smpanaro/pythia-160m-AutoGPTQ-4bit-128g](https://huggingface.co./smpanaro/pythia-160m-AutoGPTQ-4bit-128g)|33.4375|23.3024|10.1351|
	|[smpanaro/pythia-410m-AutoGPTQ-4bit-128g](https://huggingface.co./smpanaro/pythia-410m-AutoGPTQ-4bit-128g)|21.4688|13.9838|7.485|
	|[smpanaro/pythia-1b-AutoGPTQ-4bit-128g](https://huggingface.co./smpanaro/pythia-1b-AutoGPTQ-4bit-128g)|12.0391|11.6178|0.4213|
	|[smpanaro/pythia-1.4b-AutoGPTQ-4bit-128g](https://huggingface.co./smpanaro/pythia-1.4b-AutoGPTQ-4bit-128g)|10.9609|10.4391|0.5218|
	|[smpanaro/pythia-2.8b-AutoGPTQ-4bit-128g](https://huggingface.co./smpanaro/pythia-2.8b-AutoGPTQ-4bit-128g)|9.8281|9.0028|0.8253|
	|smpanaro/pythia-6.9b-AutoGPTQ-4bit-128g|8.5078|8.2257|0.2821|


	<sub>Wikitext perplexity, measured as described in the [Hugging Face docs](https://huggingface.co./docs/transformers/en/perplexity). Lower is better.</sub>
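For readers unfamiliar with that procedure, the sketch below illustrates only the bookkeeping of a strided (sliding-window) perplexity evaluation, not the exact script used for the table. The names `strided_windows`, `perplexity`, and the `mean_nll` callback are hypothetical. `mean_nll` stands in for a model forward pass that returns the mean negative log-likelihood of the scored tokens in a window. Weighting each window by its number of scored tokens is an assumption; an unweighted mean over windows is another common variant.

```python
import math
from typing import Callable, Iterator, Tuple


def strided_windows(seq_len: int, max_length: int, stride: int) -> Iterator[Tuple[int, int, int]]:
    """Yield (begin, end, n_scored) for each evaluation window.

    Each window covers tokens [begin, end). Only its last n_scored tokens are
    scored; the earlier tokens serve purely as context.
    """
    prev_end = 0
    for begin in range(0, seq_len, stride):
        end = min(begin + max_length, seq_len)
        yield begin, end, end - prev_end
        prev_end = end
        if end == seq_len:
            break


def perplexity(
    seq_len: int,
    max_length: int,
    stride: int,
    mean_nll: Callable[[int, int, int], float],  # hypothetical stand-in for the model
) -> float:
    """Return exp of the token-weighted average negative log-likelihood."""
    nll_sum = 0.0
    n_tokens = 0
    for begin, end, n_scored in strided_windows(seq_len, max_length, stride):
        # mean_nll returns a per-token average, so scale it back to a sum.
        nll_sum += mean_nll(begin, end, n_scored) * n_scored
        n_tokens += n_scored
    if n_tokens == 0:
        raise ValueError("cannot compute perplexity of an empty sequence")
    return math.exp(nll_sum / n_tokens)
```

With a real model, `mean_nll` would run the window through the model with the context tokens masked out of the loss. A smaller `stride` gives each scored token more context at the cost of more forward passes.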