---
library_name: transformers
license: apache-2.0
language:
- en
- ja
base_model: Qwen/QwQ-32B-Preview
---

# KARAKURI LM 32B Thinking 2501 Experimental

## Model Details

### Model Description

- **Developed by:** [KARAKURI Inc.](https://about.karakuri.ai/)
- **Model type:** Causal Language Model
- **Languages:** Japanese and English
- **License:** Apache 2.0
- **Finetuned from model:** [Qwen/QwQ-32B-Preview](https://huggingface.co/Qwen/QwQ-32B-Preview)
- **Contact:** For questions and comments about the model, please email `[email protected]`
- **Demo:** https://lm.karakuri.cc/

## Usage

### Run the model

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "karakuri-ai/karakuri-lm-32b-thinking-2501-exp"

# Load the model in its native dtype and place it across available devices.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

messages = [
    {"role": "user", "content": "こんにちは。"},  # "Hello."
]

# Build the prompt with the model's chat template.
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=512)

# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```
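Like its QwQ base model, this model tends to emit a long chain of thought before its final answer, so streaming tokens as they are generated can be more convenient than waiting for `generate` to return. Below is a minimal sketch using the `TextStreamer` utility from `transformers`, reusing `model`, `tokenizer`, and `input_ids` from the snippet above; the streaming setup is an illustration, not part of the original card.

```python
from transformers import TextStreamer

# Print tokens to stdout as they are produced, skipping the prompt
# and any special tokens.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
_ = model.generate(input_ids, max_new_tokens=512, streamer=streamer)
```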

## Training Details

### Training Infrastructure

- **Hardware:** The model was trained on a cluster of 16 Amazon EC2 trn1.32xlarge instances.
- **Software:** We used code based on [neuronx-nemo-megatron](https://github.com/aws-neuron/neuronx-nemo-megatron).

## Acknowledgments

This work was supported by the Ministry of Economy, Trade and Industry (METI) and the New Energy and Industrial Technology Development Organization (NEDO) through the [Generative AI Accelerator Challenge (GENIAC)](https://www.meti.go.jp/policy/mono_info_service/geniac/index.html).

## Citation

```bibtex
@misc{karakuri_lm_32b_thinking_2501_exp,
    author    = { {KARAKURI} {I}nc. },
    title     = { {KARAKURI} {LM} 32{B} {T}hinking 2501 {E}xperimental },
    year      = { 2025 },
    url       = { https://huggingface.co/karakuri-ai/karakuri-lm-32b-thinking-2501-exp },
    publisher = { Hugging Face },
    journal   = { Hugging Face repository }
}
```