---
license: bigscience-bloom-rail-1.0
language:
- it
---
# Model Card for Model ID

<!-- Provide a quick summary of what the model is/does. -->

This model adapts bloom-1b7 to the Italian language. Italian is not among the languages supported by the BLOOM model, which makes the original model
hard to use for Italian text. We adapt the original BLOOM model using the MAD-X language adaptation strategy.

## Model Details

### Model Description

We adapt bloom-1b7 to the Italian language using the MAD-X language adaptation strategy,
following the procedure proposed in: https://arxiv.org/abs/2212.09535

We use the default script parameters and select a sample of 100,000 Italian examples. The data are sampled from the Filtered Oscar Dataset for
the Italian Language released by Sarti.
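
As an illustration of drawing a fixed-size sample from a large corpus, here is a minimal sketch using reservoir sampling. This is an assumption made for illustration, not necessarily how the sample for this model was selected; the function name `reservoir_sample` and the `seed` parameter are hypothetical.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, seed: int = 0) -> List[T]:
    """Return up to k items drawn uniformly at random from a stream.

    Illustrative only (Algorithm R): the model card does not state
    which sampling procedure was actually used.
    """
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Replace a random slot with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir
```

With a streaming corpus reader, calling `reservoir_sample(examples, 100_000)` would keep memory bounded by the sample size rather than by the corpus size.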

**When you use the adapted LLM, you must also use the tokenizer of the adapted model.**
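
The tokenizer requirement can be met by loading the tokenizer and the model from the same repository. The following is a minimal sketch that assumes the Hugging Face `transformers` library and PyTorch are installed; `MODEL_ID` is a hypothetical placeholder, since this card does not state the repository name, and the helper names are illustrative.

```python
# Hypothetical placeholder: replace with the actual repository of the adapted model.
MODEL_ID = "your-namespace/adapted-bloom-1b7-it"


def load_adapted_model(model_id: str = MODEL_ID):
    """Load the tokenizer and the model from the same adapted checkpoint."""
    # Imported here so the module can be inspected without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Both objects come from the same repository, so the tokenizer
    # is the one that belongs to the adapted model.
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    return tokenizer, model


def generate(prompt: str, tokenizer, model, max_new_tokens: int = 50) -> str:
    """Generate a continuation of the prompt with the adapted model."""
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Example usage (downloads the checkpoint, so it is left commented out):
# tokenizer, model = load_adapted_model()
# print(generate("La capitale dell'Italia è", tokenizer, model))
```

Loading the tokenizer from the original bloom-1b7 repository instead would pair the adapted weights with a different tokenizer, which is the situation the note above warns against.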

- **Developed by:** Pierpaolo Basile, Pierluigi Cassotti, Marco Polignano, Lucia Siciliani, Giovanni Semeraro. Department of Computer Science, University of Bari Aldo Moro, Italy
- **Model type:** BLOOM
- **Language(s) (NLP):** Italian
- **License:** BigScience BLOOM RAIL 1.0

## Citation

Pierpaolo Basile, Pierluigi Cassotti, Marco Polignano, Lucia Siciliani, Giovanni Semeraro. On the impact of Language Adaptation for Large Language Models: A
case study for the Italian language using only open resources. Proceedings of the Ninth Italian Conference on Computational Linguistics (CLiC-it 2023).