---
library_name: peft
license: cc-by-nc-4.0
base_model: CohereForAI/aya-expanse-8b
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: results
  results: []
datasets:
- Svngoku/french-multilingual-reward-bench-dpo
language:
- fr
metrics:
- accuracy
- bleu
new_version: Svngoku/Aya-Expanse-8B-French
pipeline_tag: text-generation
---

# Aya Expanse 8B French

Aya Expanse is an open-weight research release of a model with highly advanced multilingual capabilities. It pairs a highly performant pre-trained [Command family](https://huggingface.co./CohereForAI/c4ai-command-r-plus) of models with the result of a year's dedicated research from [Cohere For AI](https://cohere.for.ai/), including [data arbitrage](https://arxiv.org/pdf/2408.14960), [multilingual preference training](https://arxiv.org/abs/2407.02552), [safety tuning](https://arxiv.org/abs/2406.18682), and [model merging](https://arxiv.org/abs/2410.10801). The result is a powerful multilingual large language model serving 23 languages.

The 23 covered languages are: Arabic, Chinese (simplified & traditional), Czech, Dutch, English, French, German, Greek, Hebrew, Hindi, Indonesian, Italian, Japanese, Korean, Persian, Polish, Portuguese, Romanian, Russian, Spanish, Turkish, Ukrainian, and Vietnamese.

This model card corresponds to the 8-billion-parameter version of the Aya Expanse model. We also released a 32-billion-parameter version, which you can find [here](https://huggingface.co./CohereForAI/aya-expanse-32B).

- Developed by: [Cohere For AI](https://cohere.for.ai/)
- Point of Contact: Cohere For AI: [cohere.for.ai](https://cohere.for.ai/)
- License: [CC-BY-NC](https://cohere.com/c4ai-cc-by-nc-license), also requires adhering to [C4AI's Acceptable Use Policy](https://docs.cohere.com/docs/c4ai-acceptable-use-policy)
- Model: Aya Expanse 8B
- Model Size: 8 billion parameters

## Model description

This version has been fine-tuned for French with supervised fine-tuning (`SFT`) using the Hugging Face TRL and PEFT libraries. A sketch of the training setup and an inference example are given at the end of this card.

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.001
- train_batch_size: 16
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 32
- optimizer: paged_adamw_32bit with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: constant
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 3

### Training results

### Framework versions

- PEFT 0.13.2
- Transformers 4.46.0
- Pytorch 2.1.1+cu121
- Datasets 3.0.2
- Tokenizers 0.20.1
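
### Training setup (sketch)

The exact training script is not published with this card. The snippet below is a minimal, hypothetical reconstruction of an SFT run that matches the hyperparameters listed above, using TRL's `SFTTrainer`/`SFTConfig` with a PEFT LoRA adapter. The LoRA settings (`r`, `lora_alpha`, `lora_dropout`) and the dataset split/field handling are assumptions and are not documented in this card.

```python
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

BASE_ID = "CohereForAI/aya-expanse-8b"

tokenizer = AutoTokenizer.from_pretrained(BASE_ID)
model = AutoModelForCausalLM.from_pretrained(
    BASE_ID, torch_dtype=torch.bfloat16, device_map="auto"
)

# Dataset listed in the card metadata; the exact split and preprocessing are not documented.
dataset = load_dataset("Svngoku/french-multilingual-reward-bench-dpo", split="train")

# Assumed LoRA hyperparameters (not taken from this card).
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

# These values mirror the "Training hyperparameters" section above.
args = SFTConfig(
    output_dir="results",
    learning_rate=1e-3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,
    num_train_epochs=3,
    lr_scheduler_type="constant",
    warmup_ratio=0.05,
    optim="paged_adamw_32bit",
    seed=42,
)

trainer = SFTTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    peft_config=peft_config,
    tokenizer=tokenizer,
)
trainer.train()
```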
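
## Inference example (sketch)

Because this release is a PEFT adapter on top of `CohereForAI/aya-expanse-8b`, inference requires loading the base model and attaching the adapter. The snippet below is a minimal sketch: `ADAPTER_ID` is a placeholder for this repository's id, and the dtype/device settings are assumptions rather than documented requirements.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_ID = "CohereForAI/aya-expanse-8b"
ADAPTER_ID = "<this-adapter-repo-id>"  # placeholder: replace with the repo hosting these PEFT weights

tokenizer = AutoTokenizer.from_pretrained(BASE_ID)
base_model = AutoModelForCausalLM.from_pretrained(
    BASE_ID, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base_model, ADAPTER_ID)

# Aya Expanse is a chat model, so format the prompt with its chat template.
messages = [
    {"role": "user", "content": "Explique brièvement ce qu'est l'apprentissage par renforcement."}
]
input_ids = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.3)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

For faster single-file inference, the adapter can also be merged into the base model with `model.merge_and_unload()` before saving, at the cost of no longer being able to detach it.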