---
tags:
- text-generation
- 8bit
- 8-bit
- quantization
- compression
inference: False
license: apache-2.0
---
# ethzanalytics/gpt-j-6B-8bit-sharded
This is a sharded version of `hivemind/gpt-j-6B-8bit` for low-RAM loading, e.g., on free Colab runtimes :)
- shards are <= 1000MB each
- a demo notebook showing how to use it [is here](https://colab.research.google.com/gist/pszemraj/1c0b32173df5b1efbdb7a2358ed4195b/generate-text-with-an-llm-sharded-on-huggingface.ipynb)
[![colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/gist/pszemraj/1c0b32173df5b1efbdb7a2358ed4195b/generate-text-with-an-llm-sharded-on-huggingface.ipynb)
**Please refer to the [original model card for hivemind/gpt-j-6B-8bit](https://huggingface.co./hivemind/gpt-j-6B-8bit) for all details.**
## Usage
> **NOTE:** PRIOR to loading the model, you need to "patch" `GPTJForCausalLM` to be compatible with the 8-bit weights. See the original model card above for the actual patch code; a rough sketch of the pattern is shown after the install step below.
Install `transformers`, `accelerate`, and `bitsandbytes` if needed:
```sh
pip install transformers accelerate bitsandbytes
```
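The actual patch code lives in the [original model card](https://huggingface.co./hivemind/gpt-j-6B-8bit). As a hedged sketch of the pattern it uses (monkey-patching the GPT-J block class inside `transformers` before the model is built), it looks roughly like this; the class body here is an illustrative placeholder, not the real quantization code:
```python
import transformers

# ILLUSTRATIVE SKETCH ONLY: the real patch (see the original model card)
# swaps the linear/embedding layers inside each GPT-J block for frozen
# 8-bit versions built on bitsandbytes. This stub only shows the shape
# of the monkey-patch, not the actual conversion logic.
class GPTJBlock(transformers.models.gptj.modeling_gptj.GPTJBlock):
    def __init__(self, config):
        super().__init__(config)
        # ... convert self.attn / self.mlp weights to 8-bit here ...

# Replace the block class *before* from_pretrained() constructs the model,
# so every block is built from the patched class.
transformers.models.gptj.modeling_gptj.GPTJBlock = GPTJBlock
```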
With the patch applied, load the model using `device_map="auto"`:
```python
import transformers
from transformers import AutoTokenizer, GPTJForCausalLM

"""
CODE TO PATCH GPTJForCausalLM GOES HERE
"""

tokenizer = AutoTokenizer.from_pretrained("ethzanalytics/gpt-j-6B-8bit-sharded")

# device_map="auto" lets accelerate place the shards across available devices
model = GPTJForCausalLM.from_pretrained(
    "ethzanalytics/gpt-j-6B-8bit-sharded",
    device_map="auto",
)
```
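Once loaded, the model generates like any other causal LM. A minimal sketch (the prompt text and sampling settings are arbitrary examples):
```python
prompt = "The meaning of life is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# sample up to 64 new tokens; adjust temperature/top_p to taste
outputs = model.generate(
    **inputs,
    max_new_tokens=64,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```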
Take a look at the notebook for details.