Correct README about architecture (#5)
- Correct README about architecture (1548fd0cdafb2804d9436577eaed54a054e84034)
Co-authored-by: Hiroaki Mikami <[email protected]>
README.md CHANGED
@@ -13,7 +13,7 @@ library_name: transformers
 ## Model Description
 PLaMo 2 1B is a 1B model pre-trained on English and Japanese datasets, developed by Preferred Elements, Inc.
 
-PLaMo 2 models adapt the [Samba](https://arxiv.org/abs/2406.07522)
+PLaMo 2 models adopt a hybrid architecture similar to [Samba](https://arxiv.org/abs/2406.07522) rather than the plain Transformer architecture. Samba integrates [Mamba](https://arxiv.org/abs/2312.00752), a selective State Space Model (SSM), with sliding window attention, combining their strengths for improved efficiency and performance. The major differences between Samba and PLaMo 2 are 1) adding normalization layers to improve training stability, and 2) using the Mamba2 kernel for computational efficiency.
 
 PLaMo 2 1B is released under Apache License version 2.0.
 
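For context on the architecture the new paragraph describes, below is a minimal, self-contained sketch of a Samba-style hybrid block: a linear-time SSM-style recurrence plus sliding-window attention, each wrapped in its own normalization layer. It is an illustration only, not PLaMo 2's actual implementation; `ToySSM`, `SlidingWindowAttention`, `HybridBlock`, and all dimensions are hypothetical names chosen for this sketch, and the toy recurrence stands in for the far more involved Mamba2 selective-SSM kernel.

```python
# Illustrative sketch of a Samba-style hybrid block -- NOT PLaMo 2's actual code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToySSM(nn.Module):
    """Simplified stand-in for a Mamba-style mixer: a gated per-channel recurrence."""
    def __init__(self, dim: int):
        super().__init__()
        self.in_proj = nn.Linear(dim, 2 * dim)
        self.decay = nn.Parameter(torch.zeros(dim))  # learned per-channel decay (logit)
        self.out_proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, seq, dim)
        u, gate = self.in_proj(x).chunk(2, dim=-1)
        a = torch.sigmoid(self.decay)                     # decay in (0, 1)
        state = torch.zeros_like(u[:, 0])
        outs = []
        for t in range(u.size(1)):                        # O(seq) recurrence, no attention matrix
            state = a * state + (1 - a) * u[:, t]
            outs.append(state)
        h = torch.stack(outs, dim=1) * F.silu(gate)
        return self.out_proj(h)


class SlidingWindowAttention(nn.Module):
    """Causal self-attention restricted to the last `window` tokens."""
    def __init__(self, dim: int, n_heads: int, window: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.window = window

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        seq = x.size(1)
        idx = torch.arange(seq)
        # Block positions that are in the future or further back than the window.
        mask = (idx[None, :] > idx[:, None]) | (idx[None, :] < idx[:, None] - self.window + 1)
        out, _ = self.attn(x, x, x, attn_mask=mask)
        return out


class HybridBlock(nn.Module):
    """SSM sub-block + sliding-window attention sub-block, each pre-normalized."""
    def __init__(self, dim: int = 256, n_heads: int = 4, window: int = 128):
        super().__init__()
        self.norm1, self.ssm = nn.LayerNorm(dim), ToySSM(dim)
        self.norm2, self.attn = nn.LayerNorm(dim), SlidingWindowAttention(dim, n_heads, window)
        self.norm3, self.mlp = nn.LayerNorm(dim), nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x + self.ssm(self.norm1(x))    # linear-time global mixing
        x = x + self.attn(self.norm2(x))   # precise local mixing within the window
        return x + self.mlp(self.norm3(x))


if __name__ == "__main__":
    block = HybridBlock()
    print(block(torch.randn(2, 512, 256)).shape)  # torch.Size([2, 512, 256])
```

The design intent the corrected paragraph points at: the SSM side mixes information across the whole sequence in linear time, the windowed attention handles precise short-range dependencies, and the extra normalization around each mixer is the training-stability change PLaMo 2 adds on top of Samba.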