---
language:
  - en
license: cc-by-nc-4.0
model-index:
  - name: Fimbulvetr-11B-v2
    results:
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: AI2 Reasoning Challenge (25-Shot)
          type: ai2_arc
          config: ARC-Challenge
          split: test
          args:
            num_few_shot: 25
        metrics:
          - type: acc_norm
            value: 70.14
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co./spaces/open-llm-leaderboard/open_llm_leaderboard?query=Sao10K/Fimbulvetr-11B-v2
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: HellaSwag (10-Shot)
          type: hellaswag
          split: validation
          args:
            num_few_shot: 10
        metrics:
          - type: acc_norm
            value: 87.79
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co./spaces/open-llm-leaderboard/open_llm_leaderboard?query=Sao10K/Fimbulvetr-11B-v2
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MMLU (5-Shot)
          type: cais/mmlu
          config: all
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 66.83
            name: accuracy
        source:
          url: >-
            https://huggingface.co./spaces/open-llm-leaderboard/open_llm_leaderboard?query=Sao10K/Fimbulvetr-11B-v2
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: TruthfulQA (0-shot)
          type: truthful_qa
          config: multiple_choice
          split: validation
          args:
            num_few_shot: 0
        metrics:
          - type: mc2
            value: 63.43
        source:
          url: >-
            https://huggingface.co./spaces/open-llm-leaderboard/open_llm_leaderboard?query=Sao10K/Fimbulvetr-11B-v2
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Winogrande (5-shot)
          type: winogrande
          config: winogrande_xl
          split: validation
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 82.95
            name: accuracy
        source:
          url: >-
            https://huggingface.co./spaces/open-llm-leaderboard/open_llm_leaderboard?query=Sao10K/Fimbulvetr-11B-v2
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: GSM8k (5-shot)
          type: gsm8k
          config: main
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 64.67
            name: accuracy
        source:
          url: >-
            https://huggingface.co./spaces/open-llm-leaderboard/open_llm_leaderboard?query=Sao10K/Fimbulvetr-11B-v2
          name: Open LLM Leaderboard
---

[Image: Fox1]

Cute girl to catch your attention.

https://huggingface.co./Sao10K/Fimbulvetr-11B-v2-GGUF <------ GGUF
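
As a rough illustration (not from the original card), here is a minimal sketch of running one of those GGUF quants with llama-cpp-python; the quant filename is a placeholder rather than a file confirmed to exist in that repo, and the sampling settings are only examples. It uses the Alpaca prompt format described further down this card.

```python
# Sketch: running a GGUF quant of Fimbulvetr-11B-v2 via llama-cpp-python.
# The filename below is a placeholder - use whichever quant you actually
# downloaded from the GGUF repo linked above.
from llama_cpp import Llama

llm = Llama(
    model_path="./fimbulvetr-11b-v2.Q4_K_M.gguf",  # placeholder filename
    n_ctx=4096,  # context window; adjust to your hardware
)

# Alpaca-style prompt (see the prompt format section below).
prompt = (
    "### Instruction:\n"
    "Write a short scene set in a snowy forest.\n\n"
    "### Response:\n"
)

out = llm(prompt, max_tokens=256, temperature=0.8, stop=["### Instruction:"])
print(out["choices"][0]["text"])
```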

Fimbulvetr-v2 - A Solar-Based Model


4/4 Status Update:

Got a few requests about wanting to support me: https://ko-fi.com/sao10k

Anyway, status on v3 - halted for the time being; I'm mainly working on the dataset. It's a pain, to be honest. The data I have isn't up to my standard yet. It's good, just not good enough.


Prompt Formats - Alpaca or Vicuna. Either one works fine.

Recommended SillyTavern Presets - Universal Light

Alpaca:

```
### Instruction:
<Prompt>

### Input:
<Insert Context Here>

### Response:
```

Vicuna:

```
System: <Prompt>

User: <Input>

Assistant:
```
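
As a concrete illustration (not part of the original card), here is a minimal sketch of prompting the model with the Alpaca template above via the Hugging Face transformers library; the generation settings and example prompt text are arbitrary, not recommendations from the author.

```python
# Sketch: prompting Fimbulvetr-11B-v2 with the Alpaca format via transformers.
# Assumes transformers + accelerate are installed and there is enough VRAM for fp16.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Sao10K/Fimbulvetr-11B-v2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Build an Alpaca-style prompt following the template above.
prompt = (
    "### Instruction:\n"
    "Continue the story in the same tone.\n\n"
    "### Input:\n"
    "The snow had not stopped falling for three days.\n\n"
    "### Response:\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.8,  # illustrative sampling settings only
)

# Print only the newly generated tokens.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```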

Changelogs:

25/2 - Repo renamed to remove "test" from the name; model card redone. The model is officially out.
15/2 - Heavy testing complete. Good feedback.


Rant - Kept For Historical Reasons

Ramble to meet minimum length requirements:

Tbh I wonder if this shit is even worth doing. Like I'm just some broke guy lmao, I've spent so much. And for what? I guess cred. Feels good when a model gets good feedback, but it seems like I'm invisible sometimes. I should probably be advertising myself and my models in other places, but I rarely have the time to. Probably just internal jealousy sparking up here and now. Whatever I guess.

Anyway, the EMT vocation I'm doing is cool except it pays peanuts, damn bruh, 1.1k per month lmao. Government too broke to pay for shit. Pays the bills I suppose.

Anyway cool beans, I'm either going to continue the Solar Train or go to Mixtral / Yi when I get paid.

You still here?


Open LLM Leaderboard Evaluation Results

Detailed results can be found on the Open LLM Leaderboard: https://huggingface.co./spaces/open-llm-leaderboard/open_llm_leaderboard?query=Sao10K/Fimbulvetr-11B-v2

| Metric | Value |
|---|---|
| Avg. | 72.63 |
| AI2 Reasoning Challenge (25-Shot) | 70.14 |
| HellaSwag (10-Shot) | 87.79 |
| MMLU (5-Shot) | 66.83 |
| TruthfulQA (0-shot) | 63.43 |
| Winogrande (5-shot) | 82.95 |
| GSM8k (5-shot) | 64.67 |