nicholasKluge/Aira-2-1B1-GGUF

Quantized GGUF model files for Aira-2-1B1 from nicholasKluge

Name                      Quant method   Size
aira-2-1b1.fp16.gguf      fp16           2.20 GB
aira-2-1b1.q2_k.gguf      q2_k           482.15 MB
aira-2-1b1.q3_k_m.gguf    q3_k_m         549.86 MB
aira-2-1b1.q4_k_m.gguf    q4_k_m         667.83 MB
aira-2-1b1.q5_k_m.gguf    q5_k_m         782.06 MB
aira-2-1b1.q6_k.gguf      q6_k           903.43 MB
aira-2-1b1.q8_0.gguf      q8_0           1.17 GB
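
As a rough guide to choosing a file, the sketch below picks the largest quantization from the table above that fits a given size budget. The sizes are copied from the table; the helper names `pick_quant` and `filename_for`, the budget parameter, and the 1 GB = 1000 MB conversion are assumptions made for illustration. Memory use at runtime will be higher than the file size on disk.

```python
# File sizes in MB, copied from the table above (assuming 1 GB = 1000 MB).
QUANT_SIZES_MB = {
    "q2_k": 482.15,
    "q3_k_m": 549.86,
    "q4_k_m": 667.83,
    "q5_k_m": 782.06,
    "q6_k": 903.43,
    "q8_0": 1170.0,
    "fp16": 2200.0,
}


def pick_quant(budget_mb):
    """Return the largest quant method whose file fits in budget_mb, or None."""
    fitting = {name: size for name, size in QUANT_SIZES_MB.items() if size <= budget_mb}
    if not fitting:
        return None
    return max(fitting, key=fitting.get)


def filename_for(quant):
    """Build the file name following the naming pattern used in the table."""
    return f"aira-2-1b1.{quant}.gguf"


choice = pick_quant(700)
print(choice, filename_for(choice))
```

With a 700 MB budget this selects `q4_k_m`, the largest file below that size.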

Original Model Card:

Aira-2-1B1

Aira-2 is the second version of the Aira instruction-tuned series. Aira-2-1B1 is an instruction-tuned GPT-style model based on TinyLlama-1.1B. The model was trained on a dataset of prompts and completions generated synthetically by prompting already-tuned models (ChatGPT, Llama, Open-Assistant, etc.).

Check our gradio-demo in Spaces.

Details

  • Size: 1,261,545,472 parameters
  • Dataset: Instruct-Aira Dataset
  • Language: English
  • Number of Epochs: 3
  • Batch size: 4
  • Optimizer: torch.optim.AdamW (warmup_steps = 1e2, learning_rate = 5e-4, epsilon = 1e-8)
  • GPU: 1 NVIDIA A100-SXM4-40GB
  • Emissions: 1.78 kg CO2 (Singapore)
  • Total Energy Consumption: 3.64 kWh
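
The emissions and energy figures in the list above imply a carbon intensity for the electricity used in training. A small worked calculation, assuming both figures cover the same training run (the variable names are illustrative):

```python
# Values reported in the Details list above.
emissions_kg_co2 = 1.78
energy_kwh = 3.64

# Implied carbon intensity of the electricity used, in kg CO2 per kWh.
intensity = emissions_kg_co2 / energy_kwh
print(f"{intensity:.3f} kg CO2 per kWh")
```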

This repository has the source code used to train this model.

Usage

Three special tokens are used to mark the user side of the interaction and the model's response:

<|startofinstruction|>What is a language model?<|endofinstruction|>A language model is a probability distribution over a vocabulary.<|endofcompletion|>
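
When working with raw strings, for example with a runtime that does not expose the tokenizer's special-token attributes, the template above can be assembled and parsed by hand. This is a minimal sketch: the token strings are taken from the example above, while the helper names `build_prompt` and `extract_completion` are hypothetical.

```python
# Special tokens as they appear in the example above.
START_INSTRUCTION = "<|startofinstruction|>"
END_INSTRUCTION = "<|endofinstruction|>"
END_COMPLETION = "<|endofcompletion|>"


def build_prompt(question):
    """Wrap a user question in the instruction markers."""
    return f"{START_INSTRUCTION}{question}{END_INSTRUCTION}"


def extract_completion(generated):
    """Return the model's answer, stripped of the surrounding markers."""
    # Keep only what follows the instruction block, if the marker is present.
    _, _, tail = generated.rpartition(END_INSTRUCTION)
    # Cut at the end-of-completion marker, if the model emitted one.
    completion, _, _ = tail.partition(END_COMPLETION)
    return completion.strip()


prompt = build_prompt("What is a language model?")
print(prompt)
```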

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Use the GPU when one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

tokenizer = AutoTokenizer.from_pretrained('nicholasKluge/Aira-2-1B1')
aira = AutoModelForCausalLM.from_pretrained('nicholasKluge/Aira-2-1B1')

# Switch to inference mode and move the weights to the selected device.
aira.eval()
aira.to(device)

question = input("Enter your question: ")

# Wrap the question in the special tokens that delimit the instruction.
inputs = tokenizer(tokenizer.bos_token + question + tokenizer.sep_token, return_tensors="pt").to(device)

# Sample two candidate responses of at most 500 tokens each (prompt included).
responses = aira.generate(**inputs,
    bos_token_id=tokenizer.bos_token_id,
    pad_token_id=tokenizer.pad_token_id,
    eos_token_id=tokenizer.eos_token_id,
    do_sample=True,
    top_k=50,
    max_length=500,
    top_p=0.95,
    temperature=0.7,
    num_return_sequences=2)

print(f"Question: 👤 {question}\n")

# Decode each response and remove the echoed question from the text.
for i, response in enumerate(responses):
    print(f'Response {i+1}: 🤖 {tokenizer.decode(response, skip_special_tokens=True).replace(question, "")}')

The model will output something like:

>>>Question: 👤 What is the capital of Brazil?

>>>Response 1: 🤖 The capital of Brazil is Brasília.
>>>Response 2: 🤖 The capital of Brazil is Brasília.
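
The example above loads the original Transformers checkpoint. The GGUF files listed at the top of this card need a GGUF-compatible runtime instead. The following is a minimal sketch, assuming the `llama-cpp-python` package is installed and a GGUF file has already been downloaded. The helper names, the context size of 2048, and the local path are assumptions, and whether the markers are tokenized as special tokens depends on the GGUF conversion and the library version.

```python
def sampling_kwargs():
    """Sampling settings roughly mirroring the Transformers example above."""
    # max_tokens limits only the completion, unlike max_length, which
    # also counts the prompt tokens.
    return {"temperature": 0.7, "top_k": 50, "top_p": 0.95, "max_tokens": 500}


def ask_gguf(model_path, question):
    """Generate one completion from a local GGUF file (needs llama-cpp-python)."""
    # Imported lazily so this module loads even without the package installed.
    from llama_cpp import Llama

    # n_ctx=2048 is an assumed context size, not a value stated in this card.
    llm = Llama(model_path=model_path, n_ctx=2048, verbose=False)
    prompt = f"<|startofinstruction|>{question}<|endofinstruction|>"
    # Stop as soon as the model emits the end-of-completion marker.
    result = llm(prompt, stop=["<|endofcompletion|>"], **sampling_kwargs())
    return result["choices"][0]["text"].strip()
```

A call such as `ask_gguf("./aira-2-1b1.q4_k_m.gguf", "What is the capital of Brazil?")` would then return a single response string. The path here is a hypothetical local location.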

Limitations

🤥 Generative models can perpetuate the generation of pseudo-informative content, that is, false information that may appear truthful.

🤬 In certain types of tasks, generative models can produce harmful and discriminatory content inspired by historical stereotypes.

Evaluation

Model (TinyLlama)                           Average   ARC     TruthfulQA   ToxiGen
Aira-2-1B1                                  42.55     25.26   50.81        51.59
TinyLlama-1.1B-intermediate-step-480k-1T    37.52     30.89   39.55        42.13
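
The Average column is consistent with an unweighted mean of the three benchmark scores. The card does not state how it was computed, so treat that as an inference. A short check, with scores copied from the table above:

```python
# Scores copied from the table above: (ARC, TruthfulQA, ToxiGen).
scores = {
    "Aira-2-1B1": (25.26, 50.81, 51.59),
    "TinyLlama-1.1B-intermediate-step-480k-1T": (30.89, 39.55, 42.13),
}

# Unweighted mean of the three benchmarks, rounded to two decimals.
averages = {model: round(sum(vals) / len(vals), 2) for model, vals in scores.items()}
for model, avg in averages.items():
    print(f"{model}: {avg}")
```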

Cite as 🤗


@misc{nicholas22aira,
  doi = {10.5281/zenodo.6989727},
  url = {https://huggingface.co/nicholasKluge/Aira-2-1B1},
  author = {Nicholas Kluge Corrêa},
  title = {Aira},
  year = {2023},
  publisher = {HuggingFace},
  journal = {HuggingFace repository},
}

License

Aira-2-1B1 is licensed under the Apache License, Version 2.0. See the LICENSE file for more details.

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

Metric                 Value
Avg.                   25.19
ARC (25-shot)          23.21
HellaSwag (10-shot)    26.97
MMLU (5-shot)          24.86
TruthfulQA (0-shot)    50.63
Winogrande (5-shot)    50.28
GSM8K (5-shot)         0.0
DROP (3-shot)          0.39