---
language:
  - en
  - fr
  - es
  - pt
tags:
  - falcon3
---

Falcon3-3B-Base

The Falcon3 family of Open Foundation Models is a set of pretrained and instruct LLMs ranging from 1B to 10B parameters.

This repository contains Falcon3-3B-Base. It achieves strong results on reasoning, language understanding, instruction following, code and mathematics tasks. Falcon3-3B-Base supports 4 languages (English, French, Spanish, Portuguese) and a context length of up to 8K. Falcon3-3B-Base was pruned (depth + width) from Falcon3-7B-Base and efficiently trained on only 100 GT using a knowledge distillation objective.

⚠️ This is a raw, pretrained model, which should be further finetuned for most use cases.
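
For context, the knowledge distillation objective mentioned above typically combines the standard next-token loss with a soft-target term that pulls the student's token distribution toward the teacher's. The exact loss and hyperparameters used to train Falcon3-3B-Base are not documented here; the following is only a minimal PyTorch sketch of such a soft-target term:

import torch.nn.functional as F

def soft_target_kd_loss(student_logits, teacher_logits, temperature=2.0):
    # Soften both token distributions with the temperature, then measure how far
    # the student's distribution is from the teacher's (KL divergence).
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    kd = F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")
    # Scale by T^2 so gradient magnitudes stay comparable across temperatures.
    return kd * temperature ** 2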

Model Details

  • Architecture (see the config sketch after this list)
    • Transformer based causal decoder only architecture
    • 22 decoder blocks
    • Grouped query attention (GQA) for faster inference: 12 query heads and 4 KV heads
    • Wider head dimension: 256
    • High RoPE value to support long context understanding: 1000042
    • 8k context length
    • 131k vocab size
  • Pruned and healed from Falcon3-7B-Base on only 100 Gigatokens of data comprising web, code, STEM, high-quality and multilingual data, using 2048 H100 GPU chips
  • Supports EN, FR, ES, PT
  • Developed by Technology Innovation Institute
  • License: TII Falcon-LLM License 2.0
  • Model Release Date: December 2024
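
The architecture values above can be verified directly from the published model configuration. A minimal sketch, assuming the checkpoint exposes the usual transformers causal-LM config attributes (check the repository's config.json for the exact names):

from transformers import AutoConfig

# Loads only the configuration file, not the weights.
config = AutoConfig.from_pretrained("tiiuae/Falcon3-3B-Base")

# head_dim is not present in older transformers versions, so fall back to
# hidden_size / num_attention_heads.
head_dim = getattr(config, "head_dim", None) or config.hidden_size // config.num_attention_heads

print("decoder blocks:", config.num_hidden_layers)        # expected: 22
print("query heads:", config.num_attention_heads)         # expected: 12
print("KV heads (GQA):", config.num_key_value_heads)      # expected: 4
print("head dimension:", head_dim)                        # expected: 256
print("RoPE theta:", config.rope_theta)                   # expected: 1000042
print("context length:", config.max_position_embeddings)  # expected: 8192 (8k context)
print("vocab size:", config.vocab_size)                    # expected: ~131k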

Getting started

import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation", 
    model="tiiuae/Falcon3-3B-Base", 
    torch_dtype=torch.bfloat16, 
    device_map="auto"
)
response = pipe("Question: How many hours in one day? Answer: ")
print(response[0]['generated_text'])
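
Alternatively, the tokenizer and model can be loaded explicitly for finer control over generation. A minimal sketch using standard transformers APIs (the generation settings below are illustrative, not recommended values):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/Falcon3-3B-Base"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

prompt = "Question: How many hours in one day? Answer: "
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Greedy decoding with a small generation budget; adjust as needed.
outputs = model.generate(**inputs, max_new_tokens=32, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))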

Benchmarks

We report in the following table our internal pipeline benchmarks:

| Category | Benchmark | Llama3.2-3B | Qwen2.5-3B | Minitron-4B | Falcon3-3B-Base |
|---|---|---|---|---|---|
| General | MMLU (5-shot) | 56.1 | 65.6 | 58.6 | 55.5 |
| General | MMLU-PRO (5-shot) | 24.9 | 31.99 | 26.21 | 28.77 |
| General | IFEval | 12.83 | 27 | 22.81 | 27.67 |
| Math | GSM8K (5-shot) | 26.68 | 68.99 | 25.7 | 63.91 |
| Math | MATH (4-shot) | 1.39 | 8.43 | 1.73 | 9.38 |
| Reasoning | Arc Challenge (25-shot) | 50.76 | 55.54 | 50.34 | 54.86 |
| Reasoning | GPQA (0-shot) | 27.49 | 27.53 | 38.6 | 31.15 |
| Reasoning | MUSR (0-shot) | 35.24 | 43.03 | 42.13 | 37.5 |
| Reasoning | BBH (3-shot) | 38.59 | 46.12 | 40.85 | 44.23 |
| CommonSense Understanding | PIQA (0-shot) | 77.42 | 78.89 | 78.29 | 75.62 |
| CommonSense Understanding | SciQ (0-shot) | 92.7 | 95.6 | 96.1 | 93.1 |
| CommonSense Understanding | Winogrande (0-shot) | 69.69 | 68.82 | 68.35 | 64.64 |
| CommonSense Understanding | OpenbookQA (0-shot) | 43.2 | 42.2 | 43 | 39.4 |

Citation

If the Falcon3 family of models was helpful to your work, feel free to cite us.

@misc{Falcon3,
    title = {The Falcon 3 family of Open Models},
    author = {TII Team},
    month = {December},
    year = {2024}
}