---
library_name: peft
license: llama2
language:
- ja
pipeline_tag: text-generation
inference: false
tags:
- llama-2
- pytorch
- facebook
- meta
- text-generation-inference
---
# doshisha-mil/llama-2-70b-chat-4bit-japanese-v1

This model is a PEFT fine-tune of Llama-2-Chat 70B, trained on the following Japanese version of the Alpaca dataset:

https://github.com/shi3z/alpaca_ja

## Copyright Notice

Because this model is derived from Meta's Llama 2 series, users of this model must also agree to Meta's license.

https://ai.meta.com/llama/

## How to use

```python
# Log in to the Hugging Face Hub; the base Llama 2 model is gated.
from huggingface_hub import notebook_login

notebook_login()
```


```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Base model that the fine-tuned weights are applied to.
model_id = "meta-llama/Llama-2-70b-chat-hf"

# 4-bit NF4 quantization with double quantization; compute runs in bfloat16.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)

# Load the Japanese fine-tuned PEFT weights on top of the quantized base model.
peft_name = "doshisha-mil/llama-2-70b-chat-4bit-japanese-v1"
model = PeftModel.from_pretrained(
    model,
    peft_name,
    is_trainable=False,  # inference only; set to True to continue fine-tuning
)
model.eval()

device = "cuda:0"

# Prompt: "What is the highest mountain in Japan?"
text = "# Q: 日本一高い山は何ですか？ # A: "
inputs = tokenizer(text, return_tensors="pt").to(device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
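
The example above wraps the question in a `# Q: ... # A: ` prompt and prints the full decoded sequence, prompt included. A minimal sketch of two helpers for building such prompts and keeping only the generated answer might look like the following. The names `build_prompt` and `extract_answer` are hypothetical and not part of this repository, and the code assumes the decoded text either starts with the prompt or at least contains the `# A:` marker.

```python
def build_prompt(question: str) -> str:
    """Wrap a question in the '# Q: ... # A: ' format used in the usage example."""
    return f"# Q: {question.strip()} # A: "


def extract_answer(decoded: str, prompt: str) -> str:
    """Return only the text generated after the prompt.

    Assumption: the decoded output normally begins with the prompt. If it
    does not (for example, whitespace changed during decoding), fall back
    to the text after the first '# A:' marker.
    """
    if decoded.startswith(prompt):
        answer = decoded[len(prompt):]
    else:
        answer = decoded.split("# A:", 1)[-1]
    # Drop any follow-up question the model may have started on its own.
    return answer.split("# Q:", 1)[0].strip()


prompt = build_prompt("日本一高い山は何ですか？")
print(repr(prompt))
```

With the usage example, `text` could be replaced by `build_prompt(...)`, and the decoded string passed through `extract_answer(decoded, text)` before printing.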
## Training procedure

The following `bitsandbytes` quantization config was used during training:
- load_in_8bit: False
- load_in_4bit: True
- llm_int8_threshold: 6.0
- llm_int8_skip_modules: None
- llm_int8_enable_fp32_cpu_offload: False
- llm_int8_has_fp16_weight: False
- bnb_4bit_quant_type: nf4
- bnb_4bit_use_double_quant: True
- bnb_4bit_compute_dtype: float32
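
The list above records `float32` as the 4-bit compute dtype, while the snippet under "How to use" sets `torch.bfloat16`. As a rough illustration, the sketch below writes the 4-bit settings from both places as plain dictionaries and reports the keys that differ. The helper `diff_configs` is hypothetical, and the card does not say whether this difference affects output quality.

```python
# Values copied from the training config list and from the usage example.
training_config = {
    "load_in_4bit": True,
    "bnb_4bit_quant_type": "nf4",
    "bnb_4bit_use_double_quant": True,
    "bnb_4bit_compute_dtype": "float32",
}
inference_config = {
    "load_in_4bit": True,
    "bnb_4bit_quant_type": "nf4",
    "bnb_4bit_use_double_quant": True,
    "bnb_4bit_compute_dtype": "bfloat16",
}


def diff_configs(a: dict, b: dict) -> dict:
    """Map each key whose value differs to its (a, b) pair of values."""
    keys = sorted(a.keys() | b.keys())
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


print(diff_configs(training_config, inference_config))
# prints {'bnb_4bit_compute_dtype': ('float32', 'bfloat16')}
```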

### Framework versions

- PEFT 0.4.0