|
--- |
|
library_name: peft |
|
license: cc-by-nc-4.0 |
|
base_model: CohereForAI/aya-expanse-8b |
|
tags: |
|
- trl |
|
- sft |
|
- generated_from_trainer |
|
model-index: |
|
- name: results |
|
results: [] |
|
datasets: |
|
- Svngoku/french-multilingual-reward-bench-dpo |
|
language: |
|
- fr |
|
metrics: |
|
- accuracy |
|
- bleu |
|
new_version: Svngoku/Aya-Expanse-8B-French |
|
pipeline_tag: text-generation |
|
--- |
|
|
|
<!-- This model card has been generated automatically according to the information the Trainer had access to. You |
|
should probably proofread and complete it, then remove this comment. --> |
|
|
|
# Aya Expanse 8B French |
|
|
|
<img src="https://huggingface.co./CohereForAI/aya-expanse-8b/resolve/main/aya-expanse-8B.png" width="650" style="margin-left:'auto' margin-right:'auto' display:'block'"/> |
|
|
|
Aya Expanse is an open-weight research release of a model with highly advanced multilingual capabilities. It focuses on pairing a highly performant pre-trained [Command family](https://huggingface.co./CohereForAI/c4ai-command-r-plus) of models with the result of a year’s dedicated research from [Cohere For AI](https://cohere.for.ai/), including [data arbitrage](https://arxiv.org/pdf/2408.14960), [multilingual preference training](https://arxiv.org/abs/2407.02552), [safety tuning](https://arxiv.org/abs/2406.18682), and [model merging](https://arxiv.org/abs/2410.10801). The result is a powerful multilingual large language model serving 23 languages. |
|
|
|
We cover 23 languages: Arabic, Chinese (simplified & traditional), Czech, Dutch, English, French, German, Greek, Hebrew, Hebrew, Hindi, Indonesian, Italian, Japanese, Korean, Persian, Polish, Portuguese, Romanian, Russian, Spanish, Turkish, Ukrainian, and Vietnamese |
|
|
|
This model card corresponds to the 8-billion version of the Aya Expanse model. We also released an 32-billion version which you can find [here](https://huggingface.co./CohereForAI/aya-expanse-32B). |
|
|
|
- Developed by: [Cohere For AI](https://cohere.for.ai/) |
|
- Point of Contact: Cohere For AI: [cohere.for.ai](https://cohere.for.ai/) |
|
- License: [CC-BY-NC](https://cohere.com/c4ai-cc-by-nc-license), requires also adhering to [C4AI's Acceptable Use Policy](https://docs.cohere.com/docs/c4ai-acceptable-use-policy) |
|
- Model: Aya Expanse 8B |
|
- Model Size: 8 billion parameters |
|
|
|
## Model description |
|
|
|
This version have been fine-tuned using `SFT` with huggingface |
|
|
|
## Intended uses & limitations |
|
|
|
More information needed |
|
|
|
## Training and evaluation data |
|
|
|
More information needed |
|
|
|
## Training procedure |
|
|
|
### Training hyperparameters |
|
|
|
The following hyperparameters were used during training: |
|
- learning_rate: 0.001 |
|
- train_batch_size: 16 |
|
- eval_batch_size: 8 |
|
- seed: 42 |
|
- gradient_accumulation_steps: 2 |
|
- total_train_batch_size: 32 |
|
- optimizer: Use paged_adamw_32bit with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments |
|
- lr_scheduler_type: constant |
|
- lr_scheduler_warmup_ratio: 0.05 |
|
- num_epochs: 3 |
|
|
|
### Training results |
|
|
|
|
|
|
|
### Framework versions |
|
|
|
- PEFT 0.13.2 |
|
- Transformers 4.46.0 |
|
- Pytorch 2.1.1+cu121 |
|
- Datasets 3.0.2 |
|
- Tokenizers 0.20.1 |