LLaMA is a Large Language Model developed by Meta AI.
It was trained on more tokens than previous models, and as a result the smallest version, with 7 billion parameters, performs similarly to GPT-3, which has 175 billion parameters.
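The parameter counts translate directly into memory requirements. The following is a minimal back-of-the-envelope sketch. It assumes 2 bytes per parameter for 16-bit weights and 0.5 bytes for 4-bit weights, and it ignores activations, the attention cache and framework overhead. The dictionary uses the nominal sizes from the model names rather than exact parameter counts.

```python
# Rough storage for the weights alone: parameters x bits per parameter / 8.
# Assumption: nominal sizes taken from the model names, not exact counts.
NOMINAL_PARAMS = {
    "LLaMA-7B": 7e9,
    "LLaMA-13B": 13e9,
    "LLaMA-30B": 30e9,
    "LLaMA-65B": 65e9,
    "GPT-3": 175e9,
}


def weight_gib(num_params: float, bits_per_param: int) -> float:
    """Return the storage needed for the weights in GiB (2**30 bytes)."""
    return num_params * bits_per_param / 8 / 2**30


if __name__ == "__main__":
    for name, params in NOMINAL_PARAMS.items():
        print(f"{name:>10}: {weight_gib(params, 16):7.1f} GiB at 16-bit, "
              f"{weight_gib(params, 4):6.1f} GiB at 4-bit")
    ratio = NOMINAL_PARAMS["GPT-3"] / NOMINAL_PARAMS["LLaMA-7B"]
    print(f"GPT-3 has {ratio:.0f}x the parameters of LLaMA-7B")
```

By this estimate the 16-bit 7B model needs about 13 GiB for its weights, compared with roughly 326 GiB for a 175-billion-parameter model. That gap is why the 4-bit mode mentioned below matters on consumer GPUs.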
This guide covers usage through the official transformers implementation. For 4-bit mode, head over to GPTQ models (4 bit mode).
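For reference, loading a Hugging Face-format checkpoint directly with transformers looks roughly like the sketch below. It is not the web UI's own code. It assumes a transformers release with LLaMA support (4.28 or later), PyTorch, the sentencepiece package used by the LLaMA tokenizer, and a model folder obtained through one of the options below.

```python
def generate(model_dir: str, prompt: str, max_new_tokens: int = 32) -> str:
    """Load a Hugging Face-format LLaMA checkpoint and complete a prompt."""
    # Imports live inside the function so this module can be imported
    # without torch/transformers installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    use_cuda = torch.cuda.is_available()
    device = "cuda" if use_cuda else "cpu"
    # Half precision halves GPU memory; CPU inference is safer in float32.
    dtype = torch.float16 if use_cuda else torch.float32

    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForCausalLM.from_pretrained(model_dir, torch_dtype=dtype)
    model.to(device)
    model.eval()

    # Tokenize, move the tensors to the model's device, then sample.
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

A call might look like generate("models/llama-7b", "The capital of France is"). The folder path here is hypothetical; point it at wherever your converted or downloaded weights live.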
Getting the weights
Option 1: pre-converted weights
- Direct download (recommended):
https://huggingface.co/Neko-Institute-of-Science/LLaMA-7B-HF
https://huggingface.co/Neko-Institute-of-Science/LLaMA-13B-HF
https://huggingface.co/Neko-Institute-of-Science/LLaMA-30B-HF
https://huggingface.co/Neko-Institute-of-Science/LLaMA-65B-HF
- Torrent:
https://github.com/oobabooga/text-generation-webui/pull/530#issuecomment-1484235789
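The direct-download route can also be scripted. The sketch below maps a size to the repository ids from the links above. It uses the huggingface_hub library's snapshot_download, which the guide itself does not mention, so treat that choice as an assumption. The default models path is likewise hypothetical, while the llama-<size> folder name follows the naming used later in this guide.

```python
# Sizes offered by the pre-converted repositories linked above.
SIZES = ("7B", "13B", "30B", "65B")


def repo_id(size: str) -> str:
    """Map a model size such as '7B' to its Hugging Face repository id."""
    size = size.upper()
    if size not in SIZES:
        raise ValueError(f"unknown size {size!r}; expected one of {SIZES}")
    return f"Neko-Institute-of-Science/LLaMA-{size}-HF"


def download(size: str, models_root: str = "text-generation-webui/models") -> str:
    """Download the pre-converted weights into <models_root>/llama-<size>."""
    # Imported lazily so repo_id() works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    target = f"{models_root}/llama-{size.lower()}"
    return snapshot_download(repo_id=repo_id(size), local_dir=target)
```

For example, download("7B") would fetch the 7B repository into text-generation-webui/models/llama-7b. The size check is there so a typo fails fast instead of producing a 404 from the Hub.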
The tokenizer files in the torrent above are outdated, in particular the files called tokenizer_config.json and special_tokens_map.json. Updated versions of those files can be found here: https://huggingface.co/oobabooga/llama-tokenizer
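One way to tell whether a model folder still carries the outdated files is sketched below. It assumes, based on errors commonly reported at the time, that the old tokenizer_config.json names the pre-release class "LLaMATokenizer", which current transformers releases no longer recognize. This is a heuristic, not an official check.

```python
import json
from pathlib import Path

# The two files the note above says were outdated in the torrent release.
TOKENIZER_FILES = ("tokenizer_config.json", "special_tokens_map.json")
# Assumption: up-to-date configs name one of these transformers classes,
# while outdated ones use the pre-release spelling "LLaMATokenizer".
ACCEPTED_CLASSES = {"LlamaTokenizer", "LlamaTokenizerFast"}


def tokenizer_problems(model_dir: str) -> list[str]:
    """Return human-readable problems with a model folder's tokenizer files."""
    folder = Path(model_dir)
    problems = []
    for name in TOKENIZER_FILES:
        if not (folder / name).is_file():
            problems.append(f"missing {name}")
    config_path = folder / "tokenizer_config.json"
    if config_path.is_file():
        config = json.loads(config_path.read_text(encoding="utf-8"))
        found = config.get("tokenizer_class")
        if found not in ACCEPTED_CLASSES:
            problems.append(f"unexpected tokenizer_class {found!r}")
    return problems
```

If tokenizer_problems returns anything, replacing both files with the versions from the oobabooga/llama-tokenizer repository linked above should resolve it.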
Option 2: convert the weights yourself
- Install the protobuf library:
pip install protobuf==3.20.1
- Use the script below to convert the .pth model that you, a fellow academic, downloaded using Meta's official link.
If you have transformers installed in place:
python -m transformers.models.llama.convert_llama_weights_to_hf --input_dir /path/to/LLaMA --model_size 7B --output_dir /tmp/outputs/llama-7b
Otherwise, download convert_llama_weights_to_hf.py first and run:
python convert_llama_weights_to_hf.py --input_dir /path/to/LLaMA --model_size 7B --output_dir /tmp/outputs/llama-7b
- Move the llama-7b folder inside your text-generation-webui/models folder.
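The conversion steps above can be chained from Python, as in the minimal sketch below. The converter module path, its three flags and the /tmp/outputs scratch folder come from the command shown earlier. The helper names and the use of subprocess and shutil are illustrative choices, not part of the guide.

```python
import shutil
import subprocess
import sys
from importlib import metadata
from pathlib import Path

# Module path and flags exactly as in the in-place conversion command above.
CONVERTER = "transformers.models.llama.convert_llama_weights_to_hf"


def protobuf_version():
    """Return the installed protobuf version, or None if it is not installed."""
    try:
        return metadata.version("protobuf")
    except metadata.PackageNotFoundError:
        return None


def convert_command(input_dir: str, size: str, output_dir: str) -> list[str]:
    """Build the argument list for the in-place transformers converter."""
    return [sys.executable, "-m", CONVERTER,
            "--input_dir", input_dir,
            "--model_size", size,
            "--output_dir", output_dir]


def convert_and_install(input_dir: str, size: str, webui_dir: str,
                        output_root: str = "/tmp/outputs") -> Path:
    """Convert the .pth weights, then move the result into <webui_dir>/models."""
    name = f"llama-{size.lower()}"
    out_dir = Path(output_root) / name
    # check=True raises CalledProcessError if the converter fails.
    subprocess.run(convert_command(input_dir, size, str(out_dir)), check=True)
    target = Path(webui_dir) / "models" / name
    shutil.move(str(out_dir), str(target))
    return target
```

Calling protobuf_version() first is a cheap way to confirm the pinned protobuf==3.20.1 is what the current interpreter actually sees. This is also why the converter is launched with sys.executable rather than a bare "python".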
Starting the web UI
python server.py --model llama-7b
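Before launching, it can help to confirm that the model folder is where server.py will look for it. The sketch below is a hypothetical pre-flight check. It assumes a converted checkpoint contains config.json plus at least one weights file in PyTorch (.bin) or safetensors format, and that the command is run from inside the text-generation-webui folder.

```python
import sys
from pathlib import Path

# Assumption: a usable checkpoint has config.json and .bin or .safetensors weights.
WEIGHT_PATTERNS = ("*.bin", "*.safetensors")


def model_ready(models_root: str, name: str) -> bool:
    """Return True if <models_root>/<name> looks like a Hugging Face checkpoint."""
    folder = Path(models_root) / name
    if not (folder / "config.json").is_file():
        return False
    return any(any(folder.glob(pattern)) for pattern in WEIGHT_PATTERNS)


def server_command(name: str) -> list[str]:
    """Arguments equivalent to `python server.py --model <name>`."""
    return [sys.executable, "server.py", "--model", name]
```

For instance, run subprocess.run(server_command("llama-7b"), check=True) only when model_ready("models", "llama-7b") is true. Otherwise, recheck the previous step.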