---
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
base_model: BSC-LT/salamandra-2b
language:
- bg
- ca
- code
- cs
- cy
- da
- de
- el
- en
- es
- et
- eu
- fi
- fr
- ga
- gl
- hr
- hu
- it
- lt
- lv
- mt
- nl
- nn
- 'no'
- oc
- pl
- pt
- ro
- ru
- sh
- sk
- sl
- sr
- sv
- uk
---

![image/png](https://cdn-uploads.huggingface.co/production/uploads/633b489acbdbadd99c0b75ef/rremJczEA0mULGHHKol6S.png)

# Salamandra-2b-fp8 Model Card

This model is the fp8-quantized version of [Salamandra-2b](https://huggingface.co/BSC-LT/salamandra-2b).

The model weights are quantized from FP16 to FP8 (8-bit weights) using the FP8 quantization algorithm 
from [NeuralMagic](https://neuralmagic.com/blog/vllm-brings-fp8-inference-to-the-open-source-community/). 
Inference with this model can be run using [vLLM](https://docs.vllm.ai/en/stable/models/engine_args.html). 
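
To give an intuition for what reducing weights to 8 bits involves, here is a minimal, pure-Python sketch of scaled FP8 rounding. It is not the code used to produce this checkpoint. This card does not state the FP8 format or the scaling granularity, so the sketch assumes the E4M3 format (largest finite value 448) and a single scale per tensor. The function names `round_to_e4m3` and `quantize_dequantize` are hypothetical.

```python
import math

# Assumption: FP8 E4M3 (4 exponent bits, 3 mantissa bits), largest finite value 448.
FP8_E4M3_MAX = 448.0
MIN_NORMAL_EXP = -6   # smallest normal exponent in E4M3
MANTISSA_BITS = 3


def round_to_e4m3(x: float) -> float:
    """Round x to the nearest value representable in E4M3, saturating at +/-448."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    mag = min(abs(x), FP8_E4M3_MAX)
    # frexp gives mag = m * 2**e with 0.5 <= m < 1, so the binade exponent is e - 1.
    _, e = math.frexp(mag)
    exp = max(e - 1, MIN_NORMAL_EXP)          # values below 2**-6 share the subnormal step
    step = 2.0 ** (exp - MANTISSA_BITS)       # spacing between neighbouring FP8 values
    return sign * round(mag / step) * step    # round() is round-half-to-even


def quantize_dequantize(weights):
    """Scale a tensor into the FP8 range, round it, and map it back to floats."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / FP8_E4M3_MAX if max_abs > 0 else 1.0  # one scale for the whole tensor
    return [round_to_e4m3(w / scale) * scale for w in weights], scale


weights = [0.02, -0.5, 0.13, 1.2]
restored, scale = quantize_dequantize(weights)
for original, approx in zip(weights, restored):
    print(f"{original:+.4f} -> {approx:+.4f}")
```

With three mantissa bits, values in the normal range are reproduced with a relative error of at most about 6%, which is the trade-off made for halving the weight storage relative to 16-bit formats.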

Salamandra is a highly multilingual model pre-trained from scratch that comes in three different 
sizes — 2B, 7B and 40B parameters — with their respective base and instruction-tuned variants, 
promoted and financed by the Government of Catalonia through the [Aina Project](https://projecteaina.cat/) 
and the _Ministerio para la Transformación Digital y de la Función Pública_ - Funded by EU – NextGenerationEU 
within the framework of [ILENIA Project](https://proyectoilenia.es/) with reference 2022/TL22/00215337.

This model card corresponds to the fp8-quantized version of Salamandra-2b.

The entire Salamandra family is released under a permissive [Apache 2.0 license](https://www.apache.org/licenses/LICENSE-2.0).

## How to Use

The following example was tested with ``Python 3.9.16``, ``vllm==0.6.3.post1``, ``torch==2.4.0`` and ``torchvision==0.19.0``, though it should also run with
any recent version of these libraries. It shows how to generate a text completion with the model:

```python
from vllm import LLM, SamplingParams

# Load the fp8-quantized checkpoint from the Hugging Face Hub.
model_name = "BSC-LT/salamandra-2b-base-fp8"
llm = LLM(model=model_name)

# Generate a completion for a Catalan prompt.
outputs = llm.generate(
    "El mercat del barri ",
    sampling_params=SamplingParams(temperature=0.5, max_tokens=200),
)
print(outputs[0].outputs[0].text)
```

### Author
International Business Machines (IBM).

### Copyright
International Business Machines (IBM).

### Contact
For further information, please send an email to <[email protected]>.

### Acknowledgements
We appreciate the collaboration with IBM in this work.
Specifically, the IBM team created the fp8-quantized version of the Salamandra-2b model released here.

### Disclaimer
Be aware that the model may contain biases or other unintended distortions. 
When third parties deploy systems or provide services based on this model, or use the model themselves, 
they bear the responsibility for mitigating any associated risks and ensuring compliance with applicable 
regulations, including those governing the use of Artificial Intelligence.

Barcelona Supercomputing Center and International Business Machines shall 
not be held liable for any outcomes resulting from third-party use.

### License
[Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0)