kurumuz committed 2d9465d (parent: 19eafc4): Create README.md

---
language:
- ja
tags:
- pytorch
- causal-lm
license: apache-2.0
---

# Genji-JP 6B

Please check our blog post for more details, samples, evaluations and more:
[Blogpost](https://colab.research.google.com/drive/1PnWpx02IEUkY8jhLKd_NewUGEXahAska?usp=sharing)

## Model Description

Genji-JP 6B is a model based on EleutherAI's GPT-J 6B and finetuned on our Japanese storytelling dataset. This particular model was trained on Japanese web novels.

| Hyperparameter    | Value |
|-------------------|-------|
| n_parameters      | 6,053,381,344 |
| n_layers          | 28* |
| d_model           | 4,096 |
| d_ff              | 16,384 |
| n_heads           | 16 |
| d_head            | 256 |
| n_ctx             | 2,048 |
| n_vocab           | 50,400 (same tokenizer as GPT-2/3; only 50,257 entries are used) |
| position encoding | [Rotary position encodings (RoPE)](https://arxiv.org/abs/2104.09864) |
| RoPE dimensions   | [64](https://github.com/kingoflolz/mesh-transformer-jax/blob/f2aa66e0925de6593dcbb70e72399b97b4130482/mesh_transformer/layers.py#L223) |

`*` Each layer consists of one feedforward block and one self-attention block.

The model consists of 28 layers with a model dimension of 4,096 and a feedforward dimension of 16,384. The model dimension is split into 16 heads, each with a dimension of 256. Rotary position encodings (RoPE) are applied to 64 dimensions of each head. The model is trained with a tokenization vocabulary of 50,257 tokens, using the same set of BPEs as GPT-2/GPT-3.

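The head geometry above can be illustrated with a short plain-Python sketch of partial RoPE. It derives d_head as d_model / n_heads and rotates only the first 64 dimensions of a single head vector. This is an illustrative reimplementation, not the mesh-transformer-jax code. It assumes two details: the interleaved "rotate every two" pairing of adjacent dimensions, and the conventional frequency base of 10,000. The helper names `rotary_angles`, `apply_partial_rope`, and `dot` are hypothetical.

```python
import math

# Head geometry taken from the hyperparameter table.
D_MODEL = 4096
N_HEADS = 16
D_HEAD = D_MODEL // N_HEADS  # 4096 / 16 = 256, matching d_head
ROTARY_DIM = 64              # RoPE touches only 64 of the 256 dimensions per head


def rotary_angles(position, rotary_dim=ROTARY_DIM, base=10000.0):
    """Rotation angle for each rotated (even, odd) pair at `position` (assumed base 10,000)."""
    return [position / base ** (2 * i / rotary_dim) for i in range(rotary_dim // 2)]


def apply_partial_rope(head_vec, position, rotary_dim=ROTARY_DIM):
    """Rotate adjacent pairs in the first `rotary_dim` dims; copy the remaining dims unchanged."""
    out = list(head_vec)
    for i, theta in enumerate(rotary_angles(position, rotary_dim)):
        x0, x1 = head_vec[2 * i], head_vec[2 * i + 1]
        c, s = math.cos(theta), math.sin(theta)
        out[2 * i] = x0 * c - x1 * s
        out[2 * i + 1] = x1 * c + x0 * s
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


if __name__ == "__main__":
    q = [math.sin(0.1 * j) for j in range(D_HEAD)]
    k = [math.cos(0.2 * j) for j in range(D_HEAD)]
    # Both pairs below have a query-key offset of 3, so their scores match:
    # the rotation encodes relative rather than absolute position.
    s_a = dot(apply_partial_rope(q, 5), apply_partial_rope(k, 2))
    s_b = dot(apply_partial_rope(q, 13), apply_partial_rope(k, 10))
    print(math.isclose(s_a, s_b, abs_tol=1e-9))  # True
```

Leaving 192 of the 256 head dimensions unrotated means part of each query-key score carries no positional signal. That split is a design choice of the original GPT-J configuration, not something this sketch justifies.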
## Training data

GPT-J 6B was pretrained on the [Pile](https://pile.eleuther.ai), a large-scale curated dataset created by EleutherAI. After pretraining, the model was finetuned on our Japanese storytelling dataset of web novels to produce Genji-JP 6B.

44
+ ### How to use
45
+
46
+ ```from transformers import AutoTokenizer, AutoModelForCausalLM
47
+ import torch
48
+
49
+ tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
50
+ model = AutoModelForCausalLM.from_pretrained("NovelAI/genji-jp", torch_dtype=torch.float16, low_cpu_mem_usage=True).eval().cuda()
51
+ text = '''ใ‚ใ‚‰ใ™ใ˜๏ผšใ‚ใชใŸใฏ็•ฐไธ–็•Œใซ่ปข็”Ÿใ—ใฆใ—ใพใ„ใพใ—ใŸใ€‚ๅ‹‡่€…ใจใชใฃใฆใ€ไปฒ้–“ใ‚’ไฝœใ‚Šใ€็•ฐไธ–็•Œใ‚’ๅ†’้™บใ—ใ‚ˆใ†๏ผ
52
+ ***
53
+ ่ปข็”Ÿใ™ใ‚‹ใจใ€ใ‚ใ‚‹่ƒฝๅŠ›ใ‚’ๆ‰‹ใซๅ…ฅใ‚Œใฆใ„ใŸใ€‚ใใ‚Œใฏใ€'''
54
+
55
+ tokens = tokenizer(text, return_tensors="pt").input_ids
56
+ generated_tokens = model.generate(tokens.long().cuda(), use_cache=True, do_sample=True, temperature=1, top_p=0.9, repetition_penalty=1.125, min_length=1, max_length=len(tokens[0]) + 400, pad_token_id=tokenizer.eos_token_id)
57
+ last_tokens = generated_tokens[0]
58
+ generated_text = tokenizer.decode(last_tokens).replace("๏ฟฝ", "")
59
+ print("Generation:\n" + generated_text)
60
+ ```
61
+ When run, this code generates:
62
+ ```
63
+ Generation:
64
+ ใ‚ใ‚‰ใ™ใ˜๏ผšใ‚ใชใŸใฏ็•ฐไธ–็•Œใซ่ปข็”Ÿใ—ใฆใ—ใพใ„ใพใ—ใŸใ€‚ๅ‹‡่€…ใจใชใฃใฆใ€ไปฒ้–“ใ‚’ไฝœใ‚Šใ€็•ฐไธ–็•Œใ‚’ๅ†’้™บใ—ใ‚ˆใ†๏ผ
65
+ ***
66
+ ่ปข็”Ÿใ™ใ‚‹ใจใ€ใ‚ใ‚‹่ƒฝๅŠ›ใ‚’ๆ‰‹ใซๅ…ฅใ‚Œใฆใ„ใŸใ€‚ใใ‚Œใฏใ€ใ€Žไบˆ็Ÿฅใ€ใ ใ€‚้ŽๅŽปใ‹ใ‚‰ๆœชๆฅใฎใ“ใจใ‚’ใ€่ชฐใ‚‚็Ÿฅใ‚‰ใชใ„ๅ‡บๆฅไบ‹ใ‚‚ๅซใ‚ใฆ่ฆ‹้€šใ™ใ“ใจใŒๅ‡บๆฅใ‚‹ใ€‚
67
+ ๆ‚ช้ญ”ใฎๆฌ ็‰‡ใจๅ‘ผใฐใ‚Œใ‚‹ๅฐใ•ใช็ตๆ™ถใ‚’ๅ–ใ‚Š่พผใ‚“ใงใ€ไฝฟๅฝนใ™ใ‚‹ใ“ใจใŒๅ‡บๆฅใ‚‹ใ€‚ไบบใ‚’ๆƒนใใคใ‘ใ€ๅ •่ฝใ•ใ›ใ‚‹ใ€‚ไฝ•ใ‚ˆใ‚Šใ€ไฟบใฏ็”ทใชใ‚“ใฆๅฑ…ใชใ‹ใฃใŸใ—ใ€ๅฅณใซ่ˆˆๅ‘ณใ‚‚ใชใ„ใ€‚โ€ฆโ€ฆใใ‚“ใชใ‚ฏใ‚บใฎ็‰‡ๆฃ’ใ‚’ๆ‹…ใŽไธŠใ’ใ‚‹ๅฅดใŒๅคšใใชใ‚‹ใจๆ€ใ†ใจใ€ใกใ‚‡ใฃใจ่‹ฆใ—ใ„ใ€‚
68
+ ใ ใŒใ€ไธ€้ƒจใฎไบบ้–“ใซใฏๅ”ๅŠ›่€…ใ‚’ๅพ—ใ‚‹ใ“ใจใŒๅ‡บๆฅใ‚‹ใ€‚็›ฎ็ซ‹ใŸใชใ„่ก—ใซใ‚ใ‚‹ๅฏบใฎไธญใงใ€ๅธธใซๅฎถใซๅผ•ใใ“ใ‚‚ใฃใฆใ„ใ‚‹่€ไบบใ€‚ใใ‚“ใชใƒคใƒ„ใฎ้ญ‚ใ‚’ใ‚ณใƒณใƒˆใƒญใƒผใƒซใ™ใ‚‹ใ“ใจใŒๅ‡บๆฅใ‚‹ใฎใ ใ€‚ไพฟๅˆฉใช่ƒฝๅŠ›ใ ใ€‚ใ—ใ‹ใ—ใ€่ฃๅˆ‡ใ‚Š่€…ใฏๅคงๅ‹ขใ„ใ‚‹ใ€‚ๆฐ—ใ‚’ๆŠœใ‘ใฐใ€็‹‚ใ†ใ€‚ใ ใ‹ใ‚‰ๆณจๆ„ใŒๅฟ…่ฆใ ใ€‚
69
+ โ€•โ€•ใ€Œใ‚„ใฃใฆใ‚„ใ‚‹ใ‚ˆใ€
70
+ ใ€€ใ‚ขใƒผใƒญใƒณใฏไธๆ•ตใซ็ฌ‘ใฃใŸใ€‚ใ“ใฎ๏ฟฝ
71
+ ```
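
The `model.generate` call above delegates sampling to Hugging Face Transformers. To make its three sampling knobs concrete (`repetition_penalty=1.125`, `temperature=1`, `top_p=0.9`), here is a minimal pure-Python sketch of one sampling step over a toy logit vector. It is an illustration, not the Transformers implementation, and the helper names are hypothetical. It assumes three things:

- The penalty follows the common CTRL-style rule: positive logits of already-seen tokens are divided by the penalty, and negative ones are multiplied by it.
- The steps run in the order penalty, then temperature, then top-p.
- Tie and boundary handling in the library's real top-p filter may differ from this sketch.

```python
import math
import random


def apply_repetition_penalty(logits, previous_ids, penalty):
    """CTRL-style penalty: shrink positive logits and grow negative logits of tokens already seen."""
    out = list(logits)
    for tok in set(previous_ids):
        out[tok] = out[tok] * penalty if out[tok] < 0 else out[tok] / penalty
    return out


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; subtracting the max keeps exp() from overflowing."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Nucleus filtering: keep the most likely tokens until their mass reaches top_p, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


def sample_next_token(logits, previous_ids, rng, temperature=1.0, top_p=0.9, repetition_penalty=1.125):
    """One step: repetition penalty, temperature softmax, top-p filter, weighted draw."""
    penalized = apply_repetition_penalty(logits, previous_ids, repetition_penalty)
    candidates = top_p_filter(softmax(penalized, temperature), top_p)
    ids = list(candidates)
    return rng.choices(ids, weights=[candidates[i] for i in ids], k=1)[0]


if __name__ == "__main__":
    toy_logits = [2.0, 1.0, 0.5, -1.0]  # hypothetical 4-token vocabulary
    rng = random.Random(0)
    draws = [sample_next_token(toy_logits, previous_ids=[0], rng=rng) for _ in range(1000)]
    # Token 3 lies outside the 0.9 nucleus, so it never appears among the draws.
    print(sorted(set(draws)))
```

With `temperature=1` the softmax is unscaled, so in this configuration the penalty and the top-p cutoff do all of the shaping.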

## Acknowledgements

This project was made possible by compute provided by the [TPU Research Cloud](https://sites.research.google/trc/).

Thanks to [EleutherAI](https://eleuther.ai/) for pretraining the GPT-J 6B model.

Thanks to everyone who contributed to this project!

- [Finetune](https://github.com/finetuneanon)
- [Aero](https://github.com/AeroScripts)
- [Kurumuz](https://github.com/kurumuz)