---
license: mit
datasets:
  - TIGER-Lab/SKGInstruct
language:
  - en
---

# StructLM: Towards Building Generalist Models for Structured Knowledge Grounding

Project Page: https://tiger-ai-lab.github.io/StructLM/

Paper: arXiv link not yet announced

Code: https://github.com/TIGER-AI-Lab/StructLM

## Introduction

StructLM is a series of open-source large language models (LLMs) fine-tuned for structured knowledge grounding (SKG) tasks.

We release 3 models:

| Size | Model |
|-----|---------------------------------------------------------------|
| 7B | StructLM-7B |
| 13B | StructLM-13B |
| 34B | StructLM-34B |

## Training Data

These models are trained on the 🤗 SKGInstruct dataset, an instruction-tuning dataset containing a mixture of 19 SKG tasks combined with 🤗 SlimOrca. Check out the dataset card for more details.

## Training Procedure

The models are fine-tuned from the CodeLlama-Instruct-hf models as base models. Each model is trained for 3 epochs, and the best checkpoint is selected.

## Evaluation

The models are evaluated using open-ended and multiple-choice math problems from several datasets. Here are the results:

| Model | Decoding | GSM | MATH | AQuA | NumG | SVA | Mat | Sim | SAT | MMLU | AVG |
|---|---|---|---|---|---|---|---|---|---|---|---|
| MAmmoTH-7B | CoT | 50.5 | 10.4 | 43.7 | 44.0 | 47.3 | 9.2 | 18.9 | 32.7 | 39.9 | 33.0 |
| | PoT | 51.6 | 28.7 | 43.3 | 52.3 | 65.1 | 41.9 | 48.2 | 39.1 | 44.6 | 46.1 |
| | Hybrid | 53.6 | 31.5 | 44.5 | 61.2 | 67.7 | 46.3 | 41.2 | 42.7 | 42.6 | 47.9 |
| MAmmoTH-Coder-7B | CoT | 22.4 | 7.9 | 36.2 | 36.0 | 37.0 | 8.2 | 7.2 | 32.7 | 34.6 | 24.7 |
| | PoT | 58.8 | 32.1 | 47.2 | 57.1 | 71.1 | 53.9 | 44.6 | 40.0 | 47.8 | 50.3 |
| | Hybrid | 59.4 | 33.4 | 47.2 | 66.4 | 71.4 | 55.4 | 45.9 | 40.5 | 48.3 | 52.0 |
| MAmmoTH-13B | CoT | 56.3 | 12.9 | 45.3 | 45.6 | 53.8 | 11.7 | 22.4 | 43.6 | 42.3 | 37.1 |
| | PoT | 61.3 | 32.6 | 48.8 | 59.6 | 72.2 | 48.5 | 40.3 | 46.8 | 45.4 | 50.6 |
| | Hybrid | 62.0 | 34.2 | 51.6 | 68.7 | 72.4 | 49.2 | 43.2 | 46.8 | 47.6 | 52.9 |
| MAmmoTH-Coder-13B | CoT | 32.1 | 10.2 | 40.6 | 36.2 | 43.0 | 9.6 | 10.1 | 40.9 | 36.6 | 28.8 |
| | PoT | 64.3 | 35.2 | 46.8 | 54.2 | 73.2 | 60.0 | 44.2 | 48.2 | 48.2 | 52.7 |
| | Hybrid | 64.7 | 36.3 | 46.9 | 66.8 | 73.7 | 61.5 | 47.1 | 48.6 | 48.3 | 54.9 |
| MAmmoTH-Coder-33B | CoT | 34.3 | 11.6 | 39.0 | 36.2 | 44.6 | 10.8 | 10.9 | 46.4 | 42.9 | 30.7 |
| | PoT | 72.3 | 42.8 | 53.8 | 59.6 | 84.0 | 64.7 | 50.6 | 58.6 | 52.7 | 59.9 |
| | Hybrid | 72.7 | 43.6 | 54.7 | 71.6 | 84.3 | 65.4 | 51.8 | 60.9 | 53.8 | 62.1 |
| MAmmoTH-70B | CoT | 72.4 | 21.1 | 57.9 | 58.9 | 71.6 | 20.0 | 31.9 | 57.3 | 52.1 | 49.2 |
| | PoT | 76.7 | 40.1 | 60.2 | 64.3 | 81.7 | 55.3 | 45.3 | 64.1 | 53.5 | 60.1 |
| | Hybrid | 76.9 | 41.8 | 65.0 | 74.4 | 82.4 | 55.6 | 51.4 | 66.4 | 56.7 | 63.4 |

## Usage

You can use the models through Hugging Face's Transformers library. Check our GitHub repo for the evaluation code: https://github.com/TIGER-AI-Lab/StructLM

## Prompt Format

For this 7B model, the prompt format is:

```
[INST] <<SYS>>
You are an AI assistant that specializes in analyzing and reasoning
over structured information. You will be given a task, optionally
with some structured knowledge input. Your answer must strictly
adhere to the output format, if specified.
<</SYS>>
{instruction} [/INST]
```
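As an illustration of filling this template and generating with Transformers, here is a minimal sketch. It is not the official evaluation code. The repository id `TIGER-Lab/StructLM-7B`, the helper names `build_prompt` and `generate`, and the greedy decoding settings are assumptions for illustration. The system prompt keeps the line breaks exactly as shown above; check the GitHub repo if exact whitespace matters for your use.

```python
# Hypothetical helpers: the repo id, function names, and generation settings are
# illustrative assumptions, not taken from the StructLM evaluation code.

SYSTEM_PROMPT = (
    "You are an AI assistant that specializes in analyzing and reasoning\n"
    "over structured information. You will be given a task, optionally\n"
    "with some structured knowledge input. Your answer must strictly\n"
    "adhere to the output format, if specified."
)

# Template reproduced from the prompt format above; {instruction} is filled at call time.
PROMPT_TEMPLATE = "[INST] <<SYS>>\n" + SYSTEM_PROMPT + "\n<</SYS>>\n{instruction} [/INST]"


def build_prompt(instruction: str) -> str:
    """Insert the task text (instruction plus any linearized input) into the template."""
    return PROMPT_TEMPLATE.format(instruction=instruction)


def generate(instruction: str, model_id: str = "TIGER-Lab/StructLM-7B",
             max_new_tokens: int = 256) -> str:
    """Load the model with Transformers and decode greedily (assumed settings)."""
    # Imported lazily so build_prompt works without transformers/torch installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")
    inputs = tokenizer(build_prompt(instruction), return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Drop the prompt tokens and decode only the newly generated continuation.
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `generate(task_text)` downloads the weights on first use, so it needs network access and enough memory to hold a 7B model.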

To linearize structured input of various types during training, we follow the linearization procedures from [UnifiedSKG](https://arxiv.org/pdf/2201.05966.pdf), so using this format during prompting will be most effective.
To see concrete examples of this linearization, you can directly reference the 🤗 [SKGInstruct Dataset](https://huggingface.co/datasets/TIGER-Lab/SKGInstruct).
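As a rough illustration of what linearizing a table means, the sketch below flattens a header and rows into one string in the `col : ... row 1 : ...` style used by TAPEX-derived pipelines. That exact scheme and the function name `linearize_table` are assumptions here; the SKGInstruct examples remain the authoritative reference for the format StructLM saw during training.

```python
from typing import Sequence


def linearize_table(header: Sequence[str], rows: Sequence[Sequence[object]]) -> str:
    """Flatten a table into one line.

    Assumes a TAPEX-style 'col : h1 | h2 row 1 : v1 | v2 ...' scheme; verify the
    exact format against SKGInstruct before relying on it.
    """
    parts = ["col : " + " | ".join(str(h) for h in header)]
    # Rows are numbered from 1, with cells separated by ' | '.
    for idx, row in enumerate(rows, start=1):
        parts.append(f"row {idx} : " + " | ".join(str(cell) for cell in row))
    return " ".join(parts)


print(linearize_table(["Year", "Title"], [[2001, "A"], [2003, "B"]]))
# col : Year | Title row 1 : 2001 | A row 2 : 2003 | B
```

The resulting string would then be placed inside the `{instruction}` slot alongside the task question.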

## Intended Uses
These models are trained for research purposes. They are designed to be proficient in interpreting linearized structured input. Downstream uses can potentially include various applications requiring the interpretation of structured data.

## Limitations
While we have tried to build an SKG-specialized model that generalizes, we have shown that this is a challenging domain, and the model may lack the performance characteristics needed for direct use in chat or other applications.


## Citation
If you use the models, data, or code from this project, please cite the original paper: