smpanaro/pythia-160m-AutoGPTQ-4bit-128g

Text Generation

Inference Endpoints

4-bit precision


Create README.md

c2667aa verified 7 months ago


---
license: mit
datasets:
- wikitext
---

[pythia-160m](https://huggingface.co/EleutherAI/pythia-160m) quantized to 4-bit using [AutoGPTQ](https://github.com/AutoGPTQ/AutoGPTQ).

To use, first install AutoGPTQ:

```shell
pip install auto-gptq
```
Then load the model from the hub:

```python
from auto_gptq import AutoGPTQForCausalLM

# Hub repository that holds the 4-bit GPTQ weights.
model_name = "smpanaro/pythia-160m-AutoGPTQ-4bit-128g"
# Downloads the quantized checkpoint (if not cached) and loads it.
model = AutoGPTQForCausalLM.from_quantized(model_name)
```
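
Once the model loads, generating text might look like the sketch below. This is not from the original card. It assumes a CUDA GPU at `cuda:0`, and it assumes the tokenizer can be taken from the base `EleutherAI/pythia-160m` repository. The `generate` helper and its `max_new_tokens` default are hypothetical names chosen for illustration.

```python
def generate(prompt: str, max_new_tokens: int = 32) -> str:
    """Hypothetical helper: complete `prompt` with the 4-bit model."""
    # Imported lazily so that defining the helper does not require a GPU.
    from transformers import AutoTokenizer
    from auto_gptq import AutoGPTQForCausalLM

    model_name = "smpanaro/pythia-160m-AutoGPTQ-4bit-128g"
    # Assumption: reuse the tokenizer of the unquantized base model.
    tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-160m")
    # Assumption: a CUDA device is available.
    model = AutoGPTQForCausalLM.from_quantized(model_name, device="cuda:0")

    # Tokenize and move the tensors to the same device as the model.
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    # AutoGPTQ's generate wrapper takes keyword arguments only.
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate("The capital of France is")` returns the prompt followed by the model's continuation. Loading the model inside the helper keeps the sketch short; for repeated calls, load it once and reuse it.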


| Model | 4-Bit Perplexity | 16-Bit Perplexity | Delta |
|--|--|--|--|
| smpanaro/pythia-160m-AutoGPTQ-4bit-128g | 33.4375 | 23.3024 | 10.1351 |

<sub>Wikitext perplexity measured as in the [Hugging Face docs](https://huggingface.co/docs/transformers/en/perplexity); lower is better.</sub>
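
To make the table easier to read, here is a small sketch of how the Delta column follows from the two perplexity values, and of the usual definition of perplexity as the exponential of the mean per-token negative log-likelihood. The `perplexity` helper and the toy inputs are illustrative assumptions, not the evaluation script used for this card.

```python
import math


def perplexity(token_nlls):
    """exp of the mean per-token negative log-likelihood (natural log)."""
    if not token_nlls:
        raise ValueError("need at least one token")
    return math.exp(sum(token_nlls) / len(token_nlls))


# Values copied from the table above.
ppl_4bit = 33.4375
ppl_16bit = 23.3024

# Absolute gap between the quantized and the 16-bit model (the Delta column).
delta = ppl_4bit - ppl_16bit
# The same gap as a fraction of the 16-bit perplexity.
relative_increase = delta / ppl_16bit

# Toy check with made-up NLLs (not model output): a model that assigns
# probability 1/4 to every token has perplexity 4.
toy_ppl = perplexity([math.log(4)] * 3)

print(f"delta = {delta:.4f}, relative increase = {relative_increase:.1%}")
```

The relative figure shows the cost of quantization more directly than the absolute gap, since perplexity scales differ between models and datasets.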