metadata
language:
  - zh
  - en
license: apache-2.0
model-index:
  - name: tigerbot-13b-base
    results:
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: AI2 Reasoning Challenge (25-Shot)
          type: ai2_arc
          config: ARC-Challenge
          split: test
          args:
            num_few_shot: 25
        metrics:
          - type: acc_norm
            value: 53.84
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co./spaces/HuggingFaceH4/open_llm_leaderboard?query=TigerResearch/tigerbot-13b-base
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: HellaSwag (10-Shot)
          type: hellaswag
          split: validation
          args:
            num_few_shot: 10
        metrics:
          - type: acc_norm
            value: 77.05
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co./spaces/HuggingFaceH4/open_llm_leaderboard?query=TigerResearch/tigerbot-13b-base
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MMLU (5-Shot)
          type: cais/mmlu
          config: all
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 53.57
            name: accuracy
        source:
          url: >-
            https://huggingface.co./spaces/HuggingFaceH4/open_llm_leaderboard?query=TigerResearch/tigerbot-13b-base
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: TruthfulQA (0-shot)
          type: truthful_qa
          config: multiple_choice
          split: validation
          args:
            num_few_shot: 0
        metrics:
          - type: mc2
            value: 44.06
        source:
          url: >-
            https://huggingface.co./spaces/HuggingFaceH4/open_llm_leaderboard?query=TigerResearch/tigerbot-13b-base
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Winogrande (5-shot)
          type: winogrande
          config: winogrande_xl
          split: validation
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 74.98
            name: accuracy
        source:
          url: >-
            https://huggingface.co./spaces/HuggingFaceH4/open_llm_leaderboard?query=TigerResearch/tigerbot-13b-base
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: GSM8k (5-shot)
          type: gsm8k
          config: main
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 17.06
            name: accuracy
        source:
          url: >-
            https://huggingface.co./spaces/HuggingFaceH4/open_llm_leaderboard?query=TigerResearch/tigerbot-13b-base
          name: Open LLM Leaderboard
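
The six result entries above share one shape, so they can be generated from a compact list instead of written by hand. The following is a minimal sketch, assuming only the field names visible in the metadata above; the helper name `make_result` and the constant `LEADERBOARD_URL` are hypothetical and not part of any Hugging Face API.

```python
# Source URL copied verbatim from the metadata above.
LEADERBOARD_URL = (
    "https://huggingface.co./spaces/HuggingFaceH4/open_llm_leaderboard"
    "?query=TigerResearch/tigerbot-13b-base"
)


def make_result(dataset_name, dataset_type, split, num_few_shot,
                metric_type, value, config=None, metric_name=None):
    """Build one model-index result entry (hypothetical helper)."""
    dataset = {"name": dataset_name, "type": dataset_type}
    if config is not None:
        # Some entries (e.g. HellaSwag) carry no config key.
        dataset["config"] = config
    dataset["split"] = split
    dataset["args"] = {"num_few_shot": num_few_shot}

    metric = {"type": metric_type, "value": value}
    if metric_name is not None:
        # The TruthfulQA mc2 entry above has no metric name.
        metric["name"] = metric_name

    return {
        "task": {"type": "text-generation", "name": "Text Generation"},
        "dataset": dataset,
        "metrics": [metric],
        "source": {"url": LEADERBOARD_URL, "name": "Open LLM Leaderboard"},
    }


arc = make_result("AI2 Reasoning Challenge (25-Shot)", "ai2_arc", "test", 25,
                  "acc_norm", 53.84, config="ARC-Challenge",
                  metric_name="normalized accuracy")
truthfulqa = make_result("TruthfulQA (0-shot)", "truthful_qa", "validation", 0,
                         "mc2", 44.06, config="multiple_choice")

model_index = [{"name": "tigerbot-13b-base", "results": [arc, truthfulqa]}]
```

Serializing `model_index` back to YAML would need a YAML library, which this sketch leaves out on purpose.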

TigerBot

A cutting-edge foundation for your very own LLM.

💻 GitHub • 🌐 TigerBot • 🤗 Hugging Face

Quick Start

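  • Method 1: use the model through transformers (weights are fetched from the Hugging Face Hub by model ID)

    • Clone the TigerBot repo

      git clone https://github.com/TigerResearch/TigerBot.git
      
    • Run the inference script from inside the cloned TigerBot directory

      python infer.py --model_path TigerResearch/tigerbot-13b-base-v2 --model_type base
      
  • Method 2: download the weights first, then run from the local copy

    • Clone the TigerBot repo

      git clone https://github.com/TigerResearch/TigerBot.git
      
    • Install Git LFS: git lfs install

    • Download the weights from Hugging Face or ModelScope (one of the two commands is enough; both clone into the same directory name)

      git clone https://huggingface.co./TigerResearch/tigerbot-13b-base-v2
      git clone https://www.modelscope.cn/TigerResearch/tigerbot-13b-base-v2.git
      
    • Run the inference script, pointing --model_path at the downloaded directory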

      python infer.py --model_path tigerbot-13b-base-v2 --model_type base --max_generate_length 64
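
For readers who want to call transformers directly instead of going through infer.py, the following is a minimal sketch. It is not taken from infer.py. It assumes the checkpoint loads with the standard `AutoTokenizer` and `AutoModelForCausalLM` classes and that `torch` and `accelerate` are installed (`device_map="auto"` needs `accelerate`). The function name `generate` is illustrative, and treating `max_new_tokens=64` as the counterpart of `--max_generate_length 64` is also an assumption.

```python
MODEL_ID = "TigerResearch/tigerbot-13b-base-v2"  # or a local path to the cloned weights


def generate(prompt, model_path=MODEL_ID, max_new_tokens=64):
    """Load the model and return a plain-text continuation of `prompt`."""
    # Imported lazily so this module can be imported without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForCausalLM.from_pretrained(
        model_path,
        torch_dtype="auto",   # use the dtype stored in the checkpoint config
        device_map="auto",    # place weights on available devices (needs accelerate)
    )

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate("Once upon a time")` downloads the weights on first use. Because this is a base model, the prompt is continued as plain text rather than answered as a chat turn.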
      

Open LLM Leaderboard Evaluation Results

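Detailed results are available on the Open LLM Leaderboard.

The average below covers seven benchmarks, including DROP.

| Metric | Value |
| --- | --- |
| Avg. | 52.11 |
| ARC (25-shot) | 53.84 |
| HellaSwag (10-shot) | 77.05 |
| MMLU (5-shot) | 53.57 |
| TruthfulQA (0-shot) | 44.06 |
| Winogrande (5-shot) | 74.98 |
| GSM8K (5-shot) | 17.06 |
| DROP (3-shot) | 44.21 |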

Open LLM Leaderboard Evaluation Results

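Detailed results are available on the Open LLM Leaderboard.

The average below covers the six benchmarks listed; DROP is not included.

| Metric | Value |
| --- | --- |
| Avg. | 53.42 |
| AI2 Reasoning Challenge (25-Shot) | 53.84 |
| HellaSwag (10-Shot) | 77.05 |
| MMLU (5-Shot) | 53.57 |
| TruthfulQA (0-shot) | 44.06 |
| Winogrande (5-shot) | 74.98 |
| GSM8k (5-shot) | 17.06 |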
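
The two averages differ only in whether DROP is counted. As a quick check, the sketch below recomputes both from the per-benchmark scores listed above; the variable and function names are illustrative.

```python
# Per-benchmark scores copied from the results above.
scores = {
    "ARC": 53.84,
    "HellaSwag": 77.05,
    "MMLU": 53.57,
    "TruthfulQA": 44.06,
    "Winogrande": 74.98,
    "GSM8K": 17.06,
}
DROP = 44.21  # only present in the seven-benchmark results


def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


avg_six = mean(scores.values())
avg_seven = mean([*scores.values(), DROP])

print(f"{avg_seven:.2f}")  # 52.11
print(f"{avg_six:.4f}")    # 53.4267
```

The seven-benchmark mean matches the reported 52.11. The six-benchmark mean is about 53.4267, which rounds to 53.43, so the reported 53.42 is consistent with truncation or with averaging unrounded per-task scores. That is an observation from the numbers, not a statement of how the leaderboard computes it.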