---
language:
- ja
- en
license: mit
---

# Sarashina2.2-3B

This repository provides large language models trained by [SB Intuitions](https://www.sbintuitions.co.jp/).

## How to use

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline, set_seed

# Load the model in bfloat16 and place it automatically on the available device(s).
model = AutoModelForCausalLM.from_pretrained("sbintuitions/sarashina2.2-3b", torch_dtype=torch.bfloat16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("sbintuitions/sarashina2.2-3b")
generator = pipeline("text-generation", model=model, tokenizer=tokenizer)
set_seed(123)

# Prompt: "Good morning, today's weather is"; sample three continuations.
text = generator(
    "おはようございます、今日の天気は",
    max_length=30,
    do_sample=True,
    pad_token_id=tokenizer.pad_token_id,
    num_return_sequences=3,
)

for t in text:
    print(t)
```
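
The pipeline call above returns a list of dictionaries, one per returned sequence, each with a `generated_text` field. If you prefer to skip the pipeline wrapper, the following is a minimal sketch of the equivalent `tokenizer`/`model.generate` path, reusing the `model` and `tokenizer` loaded above; the sampling settings simply mirror the example and are not an official recommendation.

```python
# Equivalent generation without the pipeline wrapper (sketch).
# Prompt: "Good morning, today's weather is"
inputs = tokenizer("おはようございます、今日の天気は", return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_length=30,
    do_sample=True,
    num_return_sequences=3,
    pad_token_id=tokenizer.pad_token_id,
)
for ids in outputs:
    print(tokenizer.decode(ids, skip_special_tokens=True))
```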

## Model Description

We constructed the Sarashina2.2-3B model, which has about 3 billion parameters (embeddings and the LM head are excluded from this count; see the sketch below), using a three-phase training process.
First, we trained the model on 10 trillion tokens of Japanese, English, and code data extracted from web corpora.
Next, we trained the model on synthetic data to improve its performance on math and coding tasks.
Finally, we trained the model on a small amount of data to enhance its performance on various application tasks.
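
The parameter-count convention above (excluding embeddings and the LM head) can be reproduced roughly as follows. This is a non-authoritative sketch that assumes the standard Hugging Face `PreTrainedModel` accessors (`num_parameters`, `get_input_embeddings`, `get_output_embeddings`) and reuses the `model` object loaded in the snippet above; whether the LM head shares weights with the embeddings is read from the config rather than assumed.

```python
# Rough parameter count matching the convention above (illustrative, not an official script).
total = model.num_parameters()
embed = model.get_input_embeddings().weight.numel()
out_emb = model.get_output_embeddings()
# If the LM head is tied to the input embeddings, its weights are already excluded via `embed`.
tied = getattr(model.config, "tie_word_embeddings", False)
head = 0 if tied or out_emb is None else out_emb.weight.numel()
print(f"Parameters excluding embeddings and LM head: {total - embed - head:,}")
```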

The following table shows the model's performance on Japanese tasks.
For reference, we also present the performance of our previous LLMs.
As shown in the table, our Sarashina2.2-3B outperforms Sarashina2-7B on Japanese QA tasks such as NIILC and JMMLU.
In addition, Sarashina2.2-3B outperforms Sarashina2-70B on Japanese math and coding tasks such as MGSM-ja and JHumanEval.

#### Evaluation in Japanese tasks

| Model | NIILC | JMMLU | MGSM-ja | JHumanEval |
|---|---|---|---|---|
| [Sarashina2-7B](https://huggingface.co./sbintuitions/sarashina2-7b) | 61.4 | 42.5 | 8.4 | 12.8 |
| [Sarashina2-70B](https://huggingface.co./sbintuitions/sarashina2-70b) | **65.4** | **62.7** | 54.0 | 22.0 |
| **[Sarashina2.2-0.5B](https://huggingface.co./sbintuitions/sarashina2.2-0.5b)** | 33.9 | 28.8 | 21.6 | 15.2 |
| **[Sarashina2.2-1B](https://huggingface.co./sbintuitions/sarashina2.2-1b)** | 47.2 | 38.2 | 39.6 | 20.7 |
| **[Sarashina2.2-3B](https://huggingface.co./sbintuitions/sarashina2.2-3b)** | 63.0 | 52.7 | **63.6** | **39.0** |

## Ethical Considerations and Limitations

This repository contains the pre-trained model, which has not yet been tuned to follow instructions.
Therefore, this model may generate meaningless sequences, factually inaccurate statements, or biased/objectionable outputs.
As post-trained (instruction-tuned) Sarashina2.2 models, we have published [Sarashina2.2-0.5B-instruct-v0.1](https://huggingface.co./sbintuitions/sarashina2.2-0.5b-instruct-v0.1), [Sarashina2.2-1B-instruct-v0.1](https://huggingface.co./sbintuitions/sarashina2.2-1b-instruct-v0.1), and [Sarashina2.2-3B-instruct-v0.1](https://huggingface.co./sbintuitions/sarashina2.2-3b-instruct-v0.1).
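
If you need instruction-following behavior, use one of the post-trained models linked above. Below is a minimal sketch, assuming the instruct variants load the same way as the base model and provide a chat template usable via `tokenizer.apply_chat_template`; check their model cards for the recommended usage and generation settings.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "sbintuitions/sarashina2.2-3b-instruct-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16, device_map="auto")

# "Good morning. Please tell me about today's weather."
messages = [{"role": "user", "content": "おはようございます。今日の天気を教えてください。"}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
outputs = model.generate(inputs, max_new_tokens=100)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```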

## License

[MIT License](https://huggingface.co./sbintuitions/sarashina2.2-3b/blob/main/LICENSE)